We derive the first finite-time logarithmic regret bounds for Bayesian b...
We study the finite-horizon offline reinforcement learning (RL) problem....
Many practical applications, such as recommender systems and learning to...
Fixed-budget best-arm identification (BAI) is a bandit problem where the...
A contextual bandit is a popular and practical framework for online lear...
We develop a meta-learning framework for simple regret minimization in b...
Graph Neural Networks (GNNs) have emerged as powerful tools to encode gr...
Mean rewards of actions are often correlated. The form of these correlat...
Graph Neural Networks (GNNs) have achieved state-of-the-art performance ...
Logical reasoning over Knowledge Graphs (KGs) is a fundamental technique...
Knowledge Graphs (KGs) are ubiquitous structures for information storage...
We study the problem of Robust Outlier Arm Identification (ROAI), where ...
This paper studies the problem of adaptively sampling from K distributio...
In many practical problems, a learning agent may want to learn the best ...
We consider the problem of active coarse ranking, where the goal is to s...
The probability that a user will click a search result depends both on i...
We propose stochastic rank-1 bandits, a class of online learning problem...
A search engine recommends to the user a list of web pages. The user exa...
The dueling bandit problem is a variation of the classical multi-armed b...