A Fault Tolerant Elastic Resource Management Framework Towards High Availability of Cloud Services
Cloud computing has become indispensable to digital services, and its usage has grown exponentially. However, a tremendous surge in demand for cloud resources threatens service availability, resulting in outages, performance degradation, load imbalance, and excessive power consumption. Existing approaches mainly address the problem by using multiple clouds and running multiple replicas of a virtual machine (VM), which incurs a high operational cost. This paper proposes a Fault Tolerant Elastic Resource Management (FT-ERM) framework that addresses the problem from a different perspective, by inducing high availability in servers and VMs. Specifically, (1) an online failure predictor is developed to anticipate failure-prone VMs based on predicted resource contention; (2) the operational status of each server is monitored with the help of a power analyser, a resource estimator, and a thermal analyser to proactively identify any failure due to server overloading or overheating; and (3) failure-prone VMs are assigned to the proposed fault-tolerance unit, composed of a decision matrix and a safe box, which triggers VM migration and handles any outage beforehand while maintaining the desired level of availability for cloud users. The proposed framework is evaluated and compared against state-of-the-art approaches in experiments using two real-world datasets. FT-ERM improved the availability of the services up to 34.47 62.4
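
As a rough illustration of steps (1) and (2), the following minimal sketch flags an overloaded or overheated server and selects failure-prone VMs from predicted resource contention. It is not the FT-ERM implementation: the class names, the greedy selection rule, and the 0.9 utilisation and 80 °C thresholds are all assumptions made for this example, and the predicted demands are taken as given rather than produced by the paper's predictor.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    predicted_cpu: float  # predicted demand in cores (assumed input)
    predicted_mem: float  # predicted demand in GB (assumed input)


@dataclass
class Server:
    name: str
    cpu_capacity: float   # cores
    mem_capacity: float   # GB
    temperature_c: float  # current temperature reading
    vms: list = field(default_factory=list)


def predicted_utilisation(server, vms):
    """Return the larger of predicted CPU and memory utilisation (0..1+)."""
    cpu = sum(vm.predicted_cpu for vm in vms) / server.cpu_capacity
    mem = sum(vm.predicted_mem for vm in vms) / server.mem_capacity
    return max(cpu, mem)


def server_status(server, util_threshold=0.9, temp_threshold=80.0):
    """Flag overloading and overheating; both thresholds are hypothetical."""
    return {
        "overloaded": predicted_utilisation(server, server.vms) > util_threshold,
        "overheated": server.temperature_c > temp_threshold,
    }


def failure_prone_vms(server, util_threshold=0.9):
    """Greedily pick VMs to migrate until predicted contention is resolved.

    The VM with the largest predicted resource share is selected first;
    this ordering is an assumption, not the paper's decision matrix.
    """
    def share(vm):
        return max(vm.predicted_cpu / server.cpu_capacity,
                   vm.predicted_mem / server.mem_capacity)

    remaining = sorted(server.vms, key=share, reverse=True)
    flagged = []
    while remaining and predicted_utilisation(server, remaining) > util_threshold:
        flagged.append(remaining.pop(0))
    return flagged


# Usage: an 8-core, 32 GB host whose VMs are predicted to need 9 cores.
host = Server("host-1", cpu_capacity=8, mem_capacity=32, temperature_c=70.0,
              vms=[VM("a", 4, 8), VM("b", 3, 8), VM("c", 2, 4)])
status = server_status(host)
flagged = [vm.name for vm in failure_prone_vms(host)]
print(status, flagged)
```

In this sketch the flagged VMs would be the candidates handed to a fault-tolerance unit for migration; how FT-ERM actually ranks and places them is determined by its decision matrix and safe box, which are not modelled here.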