Evan Floden & Robrecht Cannoodt
Oct 14, 2025

Enabling Single-Cell Benchmarks at Scale: The OpenProblems.bio, Viash, and Seqera Partnership

Single-Cell: The Interface of ML and Bio

Single-cell analysis sits at the intersection of machine learning and biology. Advances in microfluidic technology have transformed data collection, creating large, tabular datasets well suited to machine-learning (ML) methods.

OpenProblems.bio formalizes evaluating approaches into living, community‑run benchmarks (curated tasks, datasets, methods, and metrics) and then executes them as robust pipelines. Computational methods are implemented as Viash components, converted into Nextflow modules, and then assembled into Nextflow workflows. These workflows are run on Seqera Platform, giving teams elastic scale and shared visibility while staying portable across both cloud and HPC environments.

Discover OpenProblems.bio

Bringing the ML and Bio Communities Together

One area where cultural differences between ML and bioinformatics communities can emerge is in preferred computational environments. Interactive environments like Jupyter notebooks enable rapid experimentation and interactive model development. In contrast, workflow management systems like Nextflow provide robust error handling, reproducibility, and scalable execution for complex, multi-step analyses across diverse computing environments.

OpenProblems is designed to reconcile these cultures. It transforms core challenges in single-cell analysis into standardized benchmarks with transparent metrics and openly published results. Drawing inspiration from breakthrough ML competitions like ImageNet, OpenProblems covers a growing set of single-cell task families, including dimensionality reduction (global structure preservation), denoising (recovery of simulated missing counts), and perturbation prediction (as demonstrated in the NeurIPS 2023 challenge). Each task provides standardized data loaders and metrics for true apples-to-apples comparisons.

The “why now” is simple: single‑cell datasets are big, tabular, and noisy, making them perfectly suited to ML methods. However, the scale and complexity of these datasets demand rigorous workflows to ensure reproducible results. OpenProblems borrows the best of past ML and bio benchmarking traditions and adapts them to single‑cell.

Enabling Community-Driven Single-Cell Benchmarks

OpenProblems.bio is possible because three complementary technologies let the community contribute naturally while maintaining reproducibility and scale. In short: Viash and Nextflow deliver portability and reproducibility; Seqera provides scale and collaboration.


1. OpenProblems.bio: The Community

OpenProblems.bio provides the scientific framework that makes fair comparison possible:

  • Community‑run, living benchmarks for single‑cell applications with formalized tasks, curated datasets, and clear metrics.
  • Open results & governance with public repos, published methods/results, and documented decision‑making.
  • Built-in quality controls: beyond results, the framework bakes in control methods, QC reports, and input/output validation so issues are detected as early as possible, keeping contributions trustworthy and reproducible at scale.

2. Viash: The Critical Bridge

Viash is what makes the entire OpenProblems ecosystem possible by solving the notebook-to-pipeline problem:

  • Bridges scripts to pipelines: accepts Python/R scripts, wraps them as components, and compiles them to Nextflow modules for fair, repeatable comparisons without boilerplate.
  • Lowers the barrier to entry so new contributors don’t need deep Docker or Nextflow expertise to get started.
  • End‑to‑end versioning for components, containers, and workflows, plus dependency management across repositories to keep methods auditable and reusable over time.

Viash allows ML scientists to contribute their methods without needing to become familiar with the complex workflows OpenProblems uses for reproducibility.
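To make this concrete, here is a sketch of what a Viash component config might look like for a hypothetical denoising method. The component name, file paths, and container image are illustrative, and field names follow recent Viash releases; consult the Viash documentation for the exact schema:

```yaml
# config.vsh.yaml — hypothetical component wrapping a contributor's script
name: my_denoiser
description: Wraps an existing Python analysis script as a reusable component.
arguments:
  - name: --input
    type: file
    description: Input dataset with simulated missing counts.
    required: true
  - name: --output
    type: file
    direction: output
    description: Denoised output dataset.
    required: true
resources:
  - type: python_script
    path: script.py        # the contributor's original Python script
engines:
  - type: docker
    image: python:3.11     # pinned container for reproducibility
runners:
  - type: executable       # run standalone for local testing
  - type: nextflow         # compile into a Nextflow module
```

From a config like this, Viash can build both a standalone executable for local debugging and a Nextflow module ready to slot into an OpenProblems workflow, which is what lets contributors stay in their scripting environment of choice.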

3. Seqera: Scalable, Flexible Execution

The final piece is execution infrastructure that can handle community-scale benchmarking. Seqera Cloud provides elasticity, queueing/retries, and shared observability across AWS, GCP, and maintainer-operated HPC without changing the underlying Viash components or Nextflow pipelines:

  • Elastic scale on AWS Batch and Google Batch, using simple resource tags to pick the right horsepower per task: lowcpu, midcpu, highcpu, lowmem, midmem, highmem, gpu, biggpu.
  • Multiple execution environments per campaign: we stand up the right backends and grant access to task leaders so teams can run benchmarks without setting up infrastructure themselves.
  • HPC when a team needs full control: in some cases we configure an HPC execution environment and still surface runs in one place for transparency.
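These resource tags typically map to Nextflow process labels. As a sketch, a `nextflow.config` for a given compute environment might translate them into concrete resources like this (the numbers are purely illustrative, not the values OpenProblems actually uses):

```groovy
// nextflow.config — illustrative label-to-resource mapping
process {
  withLabel:lowcpu  { cpus = 2;  memory = 8.GB  }
  withLabel:midcpu  { cpus = 8;  memory = 16.GB }
  withLabel:highcpu { cpus = 32; memory = 64.GB }
  withLabel:lowmem  { memory = 4.GB   }
  withLabel:midmem  { memory = 32.GB  }
  withLabel:highmem { memory = 256.GB }
  withLabel:gpu     { accelerator = 1 } // exact GPU request form depends on the executor
}
```

Because the labels live in the components and the mappings live in the environment config, the same benchmark can run on AWS Batch, Google Batch, or HPC simply by swapping the config.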

Open Science with Complete Traceability

OpenProblems keeps results, methods, and decisions public to earn trust. The same component and workflow contracts that enable fair comparison also provide traceability: versioned containers, tested I/O, and reproducible runs. Many organizations extend these with SBOMs, vulnerability scans, and audit trails to speed security reviews and regulatory documentation. (Data Intuitive often helps teams institutionalize those controls.)

Looking Forward: The Future of BioML Collaboration

OpenProblems.bio represents more than just a benchmarking platform; it embodies a new model for how scientific communities can collaborate across disciplinary boundaries. As biological data grows in scale and AI methods become increasingly sophisticated, platforms like this will become essential infrastructure. The initiative demonstrates that when communities unite around shared challenges, innovations exceed what any single group could achieve alone.

This collaborative approach enables the community to move fast without breaking reproducibility. By bridging bioinformatics tradition with ML innovation through Viash components, Nextflow workflows, and Seqera Cloud execution, OpenProblems creates scalable infrastructure that connects scientific ambition with computational reality in the age of AI.

Get Involved

  • Join OpenProblems.bio: Contribute a method or propose a task. Wrap your tool as a Viash component, plug into a standardized task, and see it evaluated with transparent metrics alongside peers. Everything lives in public repos with clear docs and governance so you can jump in quickly.

  • Package methods once, run anywhere with Viash: Turn a notebook or script into a versioned, containerized component today. Compile to a Nextflow module, slot into a workflow, and keep your users on laptops, HPC, or cloud without re‑engineering every time.

  • Run benchmarks at scale with Seqera: If you’re coordinating a community benchmark, let researchers run without wrestling with infrastructure. Set up AWS, GCP, or HPC execution environments, grant the right access, and use tags to match resources to jobs, then monitor runs in one place.

Community and Sponsors

None of this works without the community of method authors, dataset curators, task leaders, and reviewers who contribute code, issues, and discussion. OpenProblems is hosted in the open, with core support from the Chan Zuckerberg Initiative (CZI) and Helmholtz Munich, and engineering and enterprise enablement from Data Intuitive.


Seqera provides the execution infrastructure that can handle community-scale benchmarking. Try it now