Metabolomics in the Cloud: Scaling Computational Tools to Big Data
Background: Metabolomics datasets are becoming increasingly large and complex, with multiple types of algorithms and workflows needed to process and analyse the data. A cloud infrastructure with portable software tools can provide much-needed resources, enabling faster processing of much larger datasets than would be possible at any individual lab. The PhenoMeNal project has developed such an infrastructure, allowing users to run analyses on local or commercial cloud platforms. We have examined the computational scaling behaviour of the PhenoMeNal platform using four different implementations across 1-1000 virtual CPUs (vCPUs) and two common metabolomics tools.

Results: Our results show that data which takes up to 4 days to process on a standard desktop computer can be processed in just 10 min on the largest cluster. Improved runtimes come at the cost of decreased efficiency, with all platforms falling below 80% efficiency as the number of vCPUs increases. An economic analysis revealed that running on large-scale cloud platforms is cost-effective compared to traditional desktop systems.

Conclusions: Overall, cloud implementations of PhenoMeNal show excellent scalability for standard metabolomics computing tasks on a range of platforms, making them a compelling choice for research computing in metabolomics.
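The trade-off between runtime and efficiency described above can be illustrated with the conventional definitions of speedup (baseline runtime divided by parallel runtime) and parallel efficiency (speedup divided by the increase in vCPUs). The following is a minimal sketch under those textbook definitions, which may differ from the exact metrics used in the study; the function names and the 50-vCPU example are hypothetical, and only the 4 days and 10 min figures come from the abstract.

```python
def speedup(t_baseline, t_parallel):
    """Ratio of baseline runtime to parallel runtime (same time unit for both)."""
    if t_baseline <= 0 or t_parallel <= 0:
        raise ValueError("runtimes must be positive")
    return t_baseline / t_parallel


def efficiency(t_baseline, t_parallel, n_vcpus, baseline_vcpus=1):
    """Parallel efficiency: achieved speedup relative to the ideal linear speedup.

    Assumes the baseline ran on `baseline_vcpus` vCPUs (hypothetical default: 1).
    A value of 1.0 means perfect linear scaling.
    """
    if n_vcpus <= 0 or baseline_vcpus <= 0:
        raise ValueError("vCPU counts must be positive")
    return speedup(t_baseline, t_parallel) * baseline_vcpus / n_vcpus


# Figures quoted in the abstract: up to 4 days on a desktop, 10 min on the largest cluster.
desktop_minutes = 4 * 24 * 60
cluster_minutes = 10
print(f"speedup: {speedup(desktop_minutes, cluster_minutes):.0f}x")

# Hypothetical illustration of the 80% threshold: a job that takes 100 min on
# 1 vCPU and 2.5 min on 50 vCPUs runs at 80% efficiency.
print(f"efficiency: {efficiency(100.0, 2.5, 50):.0%}")
```

Because the abstract does not state how many cores the desktop baseline used, the speedup computed from its figures should not be converted directly into an efficiency value without that information.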