Multi-Tenant Cross-Slice Resource Orchestration: A Deep Reinforcement Learning Approach
In a software-defined radio access network (RAN), a major challenge is supporting diverse services for mobile users (MUs) over a common physical network infrastructure. Network slicing is a promising solution for tailoring the network to such service requests. This paper considers a software-defined RAN in which a limited number of channels are auctioned across scheduling slots to the MUs of multiple service providers (SPs), i.e., the tenants. Each SP behaves selfishly to maximize its expected long-term payoff from competing with the other SPs over the orchestration of channel access opportunities for its MUs, which request both mobile-edge computing and traditional cellular services in the slices. This problem is modelled as a stochastic game, in which the decisions of an SP depend on the network dynamics as well as on the control policies of its competitors. We propose an abstract stochastic game to approximate the Nash equilibrium. The selfish behaviour of an SP can then be characterized by a single-agent Markov decision process (MDP). To simplify decision making, we linearly decompose the per-SP MDP and derive an online scheme based on deep reinforcement learning to approach the optimal abstract control policies. Numerical experiments show significant performance gains from our scheme.
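The idea of linearly decomposing the per-SP MDP can be illustrated with a minimal sketch. It is not the scheme from the paper: it assumes the SP's payoff is additive across its MUs, replaces the deep network with small lookup tables, and uses hypothetical names throughout (`DecomposedQ`, `value`, `update`). Under those assumptions, the SP-level action value is approximated by a sum of per-MU values, each learned online from that MU's local state, action, and payoff.

```python
from collections import defaultdict


class DecomposedQ:
    """Hypothetical tabular stand-in: Q_SP(s, a) is approximated by sum_n Q_n(s_n, a_n)."""

    def __init__(self, num_mus, alpha=0.5, gamma=0.9):
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        # One independent Q-table per MU, keyed by (local_state, local_action).
        self.tables = [defaultdict(float) for _ in range(num_mus)]

    def value(self, states, actions):
        """SP-level value as the sum of the per-MU values."""
        return sum(t[(s, a)] for t, s, a in zip(self.tables, states, actions))

    def update(self, n, state, action, payoff, next_state, next_actions):
        """One online Q-learning step for MU n, using only its local information."""
        table = self.tables[n]
        best_next = max(table[(next_state, a)] for a in next_actions)
        target = payoff + self.gamma * best_next
        table[(state, action)] += self.alpha * (target - table[(state, action)])


# Usage: two MUs, each observing one transition.
q = DecomposedQ(num_mus=2)
q.update(0, "s", 1, 1.0, "s", [0, 1])
q.update(1, "t", 0, 2.0, "t", [0, 1])
total = q.value(["s", "t"], [1, 0])
```

The point of the decomposition in this sketch is that each table grows with one MU's local state space rather than with the joint state space of all MUs, which is what makes the per-SP problem tractable to learn online.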