High-Performance Mining of COVID-19 Open Research Datasets for Text Classification and Insights in Cloud Computing Environments

09/16/2020
by   Jie Zhao, et al.
0

COVID-19 global pandemic is an unprecedented health crisis. Since the outbreak, many researchers around the world have produced an extensive collection of literatures. For the research community and the general public to digest, it is crucial to analyse the text and provide insights in a timely manner, which requires a considerable amount of computational power. Clouding computing has been widely adopted in academia and industry in recent years. In particular, hybrid cloud is gaining popularity since its two-fold benefits: utilising existing resource to save cost and using additional cloud service providers to gain assess to extra computing resources on demand. In this paper, we developed a system utilising the Aneka PaaS middleware with parallel processing and multi-cloud capability to accelerate the ETL and article categorising process using machine learning technology on a hybrid cloud. The result is then persisted for further referencing, searching and visualising. Our performance evaluation shows that the system can help with reducing processing time and achieving linear scalability. Beyond COVID-19, the application might be used directly in broader scholarly article indexing and analysing.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset