Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking for Everyone

by Zhen Xu et al.

Obtaining standardized, crowdsourced benchmarks of computational methods is a major issue in scientific communities. Dedicated frameworks enabling fair, continuous benchmarking in a unified environment are yet to be developed. Here we introduce Codabench, an open-source, community-driven platform for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench is open to everyone, free of charge, and allows benchmark organizers to fairly compare submissions under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features that make organizing benchmarks flexible, easy, and reproducible. First, it supports code submission and data submission for testing on dedicated compute workers, which can be supplied by the benchmark organizers; this makes the system scalable at low cost for the platform providers. Second, Codabench benchmarks are created from self-contained bundles: zip files containing a full description of the benchmark in a configuration file (following a well-defined schema), documentation pages, data, and ingestion and scoring programs, making benchmarks reusable and portable. The Codabench documentation includes many example bundles that can serve as templates. Third, Codabench runs each task's environment in a Docker container to make results reproducible. Codabench has been used internally and externally in more than 10 applications during the past 6 months. As illustrative use cases, we introduce 4 diverse benchmarks covering Graph Machine Learning, Cancer Heterogeneity, Clinical Diagnosis and Reinforcement Learning.
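To make the self-contained bundle idea concrete, here is a minimal sketch in Python of assembling such a bundle: a zip file holding a configuration file, a documentation page, and a nested scoring program. The file names and YAML fields (`competition.yaml`, `title`, `pages`, `tasks`) are illustrative placeholders, not the actual Codabench schema, which is defined in the Codabench documentation.

```python
import io
import zipfile

def make_bundle(path="my_benchmark.zip"):
    """Assemble an illustrative benchmark bundle as a single zip file.

    The structure below (a YAML config, a documentation page, and a
    scoring program packaged as a nested zip) mirrors the bundle idea
    described in the abstract; all names are hypothetical examples.
    """
    config = (
        "title: My Benchmark\n"
        "pages:\n"
        "  - overview.md\n"
        "tasks:\n"
        "  - scoring_program: scoring.zip\n"
    )
    with zipfile.ZipFile(path, "w") as bundle:
        # Configuration file describing the benchmark
        bundle.writestr("competition.yaml", config)
        # Documentation page shown to participants
        bundle.writestr("overview.md", "# Overview\nDescribe the benchmark here.\n")
        # Scoring program packaged as a nested zip inside the bundle
        scoring = io.BytesIO()
        with zipfile.ZipFile(scoring, "w") as sp:
            sp.writestr("score.py", "print('score: 1.0')\n")
        bundle.writestr("scoring.zip", scoring.getvalue())
    return path
```

Because the bundle is a single archive with everything the benchmark needs, it can be copied, versioned, and re-uploaded elsewhere, which is what makes benchmarks reusable and portable.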


