Model-based controlled learning of MDP policies with an application to lost-sales inventory control

11/30/2020
by Willem van Jaarsveld, et al.

Recent literature has established that neural networks can represent good MDP policies across a range of stochastic dynamic models in supply chain and logistics. To overcome limitations of the model-free algorithms typically employed to learn such neural network policies, a model-based algorithm is proposed that incorporates variance reduction techniques. For the classical lost-sales inventory model, the algorithm learns neural network policies that are superior to those learned using model-free algorithms, while also outperforming heuristic benchmarks. The algorithm may be a promising candidate for other stochastic dynamic problems in supply chain and logistics.
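
To make the setting concrete, here is a minimal sketch of a lost-sales inventory simulation together with one generic variance reduction device, common random numbers. It is not the algorithm proposed in the paper, whose details the abstract does not give. The function names, the base-stock heuristic, the order of events within a period, the cost parameters, and the uniform demand distribution are all assumptions chosen for illustration.

```python
import random


def simulate_lost_sales(base_stock, demands, lead_time=2,
                        holding_cost=1.0, penalty_cost=4.0):
    """Average per-period cost of a base-stock policy under lost sales.

    Assumed event order per period: place order, receive the oldest
    outstanding order, observe demand, lose any unmet demand.
    """
    on_hand = base_stock
    pipeline = [0] * lead_time  # outstanding orders, oldest first
    total_cost = 0.0
    for demand in demands:
        # Order up to the base-stock level using the inventory position.
        position = on_hand + sum(pipeline)
        pipeline.append(max(0, base_stock - position))
        # The oldest outstanding order arrives.
        on_hand += pipeline.pop(0)
        # Unmet demand is lost rather than backordered.
        sales = min(on_hand, demand)
        lost = demand - sales
        on_hand -= sales
        total_cost += holding_cost * on_hand + penalty_cost * lost
    return total_cost / len(demands)


def sample_demands(n_periods, seed, max_demand=10):
    """Hypothetical demand model: i.i.d. uniform integers on [0, max_demand]."""
    rng = random.Random(seed)
    return [rng.randint(0, max_demand) for _ in range(n_periods)]


def cost_difference_crn(level_a, level_b, n_periods=1000, seed=0):
    """Compare two policies on the SAME demand path (common random numbers)."""
    demands = sample_demands(n_periods, seed)
    return (simulate_lost_sales(level_a, demands)
            - simulate_lost_sales(level_b, demands))
```

Evaluating both policies on the same demand path removes the noise that would come from drawing different demands for each, so the estimated cost difference typically has lower variance than a comparison on independent samples. For example, `cost_difference_crn(12, 15)` estimates how much more (or less) a base-stock level of 12 costs per period than a level of 15 under the assumed demand model.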
