An Extensible, Scalable Spark Platform for Alignment-free Genomic Analysis – Version 1
Alignment-free similarity/distance functions are a computationally convenient alternative to alignment-based methods for tasks in Computational Biology such as classification and taxonomy. Their computation is a Big Data problem that has been largely ignored, which limits their potentially vast impact. We provide the first user-friendly, extensible and scalable Spark platform for their computation, including (a) statistical significance tests of their output and (b) useful novel indications about their day-to-day use. Our contribution addresses an acute need in Alignment-free sequence analysis.
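
To make the notion of an alignment-free distance concrete, the following minimal sketch compares two sequences through their k-mer counts, with no alignment step. It is an illustration only and not the platform's API: the function names `kmer_counts` and `euclidean_distance`, the choice of Euclidean distance, and the single-machine Python setting are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def euclidean_distance(a: str, b: str, k: int) -> float:
    """Euclidean distance between the k-mer count vectors of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for a k-mer it has not seen, so the union of
    # keys covers every coordinate where the two vectors can differ.
    return sqrt(sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys()))
```

At genomic scale the counting step dominates the cost, since the number of distinct k-mers grows quickly with k and with the size of the input; this is presumably what makes a distributed framework such as Spark attractive for the task.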