In the field of quantitative trading, it is common practice to transform...
In high-dimensional time-series analysis, it is essential to have a set ...
Optimal execution is a sequential decision-making problem for cost-savin...
The delayed feedback problem is one of the imperative challenges in onli...
The exploration/exploitation (E&E) dilemma lies at the core of interac...
It is a popular belief that model-based Reinforcement Learning (RL) is m...
A chatbot that converses like a human should be goal-oriented (i.e., be
In machine learning, it is observed that probabilistic predictions somet...
Click-through rate (CTR) prediction has been one of the most central pro...
Model-free reinforcement learning methods such as the Proximal Policy
We study a generalized contextual-bandits problem, where there is a stat...