High-Dimensional Dynamic Pricing under Non-Stationarity: Learning and Earning with Change-Point Detection
We consider a high-dimensional dynamic pricing problem under non-stationarity, where a firm sells products to T sequentially arriving consumers that behave according to an unknown demand model with potential changes at unknown times. The demand model is assumed to be a high-dimensional generalized linear model (GLM), allowing for a feature vector in ℝ^d that encodes products and consumer information. To achieve optimal revenue (i.e., least regret), the firm needs to learn and exploit the unknown GLMs while monitoring for potential change-points. To tackle such a problem, we first design a novel penalized likelihood-based online change-point detection algorithm for high-dimensional GLMs, which is the first algorithm in the change-point literature that achieves optimal minimax localization error rate for high-dimensional GLMs. A change-point detection assisted dynamic pricing (CPDP) policy is further proposed and achieves a near-optimal regret of order O(s√(Υ_T T)log(Td)), where s is the sparsity level and Υ_T is the number of change-points. This regret is accompanied with a minimax lower bound, demonstrating the optimality of CPDP (up to logarithmic factors). In particular, the optimality with respect to Υ_T is seen for the first time in the dynamic pricing literature, and is achieved via a novel accelerated exploration mechanism. Extensive simulation experiments and a real data application on online lending illustrate the efficiency of the proposed policy and the importance and practical value of handling non-stationarity in dynamic pricing.
READ FULL TEXT