Learning to Dynamically Select Cost Optimal Schedulers in Cloud Computing Environments
The operational cost of a cloud computing platform is one of the most significant Quality of Service (QoS) criteria for schedulers, crucial to keep up with the growing computational demands. Several data-driven deep neural network (DNN)-based schedulers have been proposed in recent years that outperform alternative approaches by providing scalable and effective resource management for dynamic workloads. However, state-of-the-art schedulers rely on advanced DNNs with high computational requirements, implying high scheduling costs. In non-stationary contexts, the most sophisticated schedulers may not always be required, and it may be sufficient to rely on low-cost schedulers to temporarily save operational costs. In this work, we propose MetaNet, a surrogate model that predicts the operational costs and scheduling overheads of a large number of DNN-based schedulers and chooses one on-the-fly to jointly optimize job scheduling and execution costs. This facilitates improvements in execution costs, energy usage and service level agreement violations of up to 11
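The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `CostEstimate`, `select_scheduler` and `toy_surrogate` are hypothetical, the surrogate here is a hand-written stand-in rather than a trained model, and summing the two predicted costs with equal weight is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class CostEstimate:
    """Predicted costs for one scheduler (hypothetical structure)."""
    execution_cost: float   # predicted cost of running the jobs
    scheduling_cost: float  # predicted overhead of running the scheduler itself

    @property
    def total(self) -> float:
        # Assumption: both costs share a unit and are weighted equally.
        return self.execution_cost + self.scheduling_cost


# A surrogate maps the current system state to per-scheduler cost estimates.
Surrogate = Callable[[Sequence[float]], Dict[str, CostEstimate]]


def select_scheduler(state: Sequence[float],
                     surrogate: Surrogate) -> Tuple[str, CostEstimate]:
    """Return the scheduler with the lowest predicted total cost."""
    estimates = surrogate(state)
    if not estimates:
        raise ValueError("surrogate returned no candidate schedulers")
    best = min(estimates, key=lambda name: estimates[name].total)
    return best, estimates[best]


def toy_surrogate(state: Sequence[float]) -> Dict[str, CostEstimate]:
    """Hand-written stand-in for a learned surrogate (illustrative numbers).

    The expensive scheduler reduces execution cost more as load grows,
    but always carries a higher scheduling overhead.
    """
    load = sum(state) / len(state)
    return {
        "heavy_dnn": CostEstimate(execution_cost=10.0 - 4.0 * load,
                                  scheduling_cost=3.0),
        "light_heuristic": CostEstimate(execution_cost=10.0 - 1.0 * load,
                                        scheduling_cost=0.5),
    }


if __name__ == "__main__":
    for state in ([0.2, 0.2], [0.9, 1.0]):
        name, estimate = select_scheduler(state, toy_surrogate)
        print(state, name, round(estimate.total, 2))
```

With these made-up numbers, the low-overhead scheduler is chosen under light load and the heavier one only when its predicted savings in execution cost exceed its extra scheduling overhead, which is the trade-off the abstract describes.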