Jeferyd Yepes García
Sep 22, 2025

MAGFlow/BIgMAG: The Batman & Robin of the Metagenomics Field

Anyone tasked with building Metagenome-Assembled Genomes (MAGs) from whole genome sequencing (WGS) data knows how overwhelming it can be to select the right pipeline or set of tools. In the metagenomics field, there are at least 33 bioinformatics pipelines aimed at recovering MAGs (see 2Pipe), and you may wonder: do I obtain the same results with different pipelines? How do they compare in terms of genome quality and completeness? Do my bins (draft MAGs) meet the Minimum Information about a Metagenome-Assembled Genome (MIMAG) criteria? Is the taxonomic assignment of a bin shared across pipeline outputs? For instance, say you've run three different binning tools and a handful of quality-assessment programs: how do you keep track of all those results without drowning in TSV files?

Don’t panic, MAGFlow/BIgMAG has come to the rescue. This is an integrated tool composed of a Nextflow pipeline (MAGFlow) to orchestrate software execution and an interactive dashboard (Board Integrating Metagenome-Assembled Genomes, aka BIgMAG) that uses MAGFlow’s output to rapidly provide a consistent summary to evaluate the quality of your assembled bins. This combination eases routine MAG analysis by streamlining the software execution through ready-to-run commands, allowing the benchmarking of different pipelines or binning software on the fly. And wait, there’s even more: MAGFlow/BIgMAG can also be used as a tool to perform an exploratory analysis of bins coming from multiple samples, regardless of whether they were assembled using the same or different pipelines. Below, I'll outline the core functionalities that make this approach powerful.

MAGFlow: The Dark Knight

MAGFlow acts like an organized dark knight: silent but trustworthy. Instead of running tools such as BUSCO, CheckM2, GUNC, QUAST, and GTDB-Tk2 on your bins one by one, you let MAGFlow run them all for you. It then collects their outputs into one neat, consistent TSV file. No more wondering which file contains the completeness scores or where that contamination estimate went: it's all in one place.
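To make the "one consistent TSV" idea concrete, here is a minimal sketch of joining per-tool tables on a shared bin identifier. It is not MAGFlow's actual code: the column names (bin, completeness, contamination, pass_gunc) and the example values are hypothetical, and real BUSCO, CheckM2, or GUNC reports have their own layouts.

```python
import csv
import io

# Hypothetical per-tool outputs keyed by a shared "bin" column.
# Real tool reports use their own column names and layouts.
checkm2_tsv = (
    "bin\tcompleteness\tcontamination\n"
    "bin_001\t95.2\t1.3\n"
    "bin_002\t61.0\t7.8\n"
)
gunc_tsv = (
    "bin\tpass_gunc\n"
    "bin_001\tTrue\n"
    "bin_002\tFalse\n"
)


def read_tsv(text):
    """Parse TSV text into a dict mapping bin name -> remaining columns."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        name = row.pop("bin")
        rows[name] = row
    return rows


def merge_tables(*tables):
    """Outer-join tables on bin name; missing values become empty strings."""
    columns = []
    for table in tables:
        for row in table.values():
            for col in row:
                if col not in columns:
                    columns.append(col)
    merged = []
    for name in sorted({n for table in tables for n in table}):
        record = {"bin": name}
        record.update({col: "" for col in columns})
        for table in tables:
            record.update(table.get(name, {}))
        merged.append(record)
    return merged


def to_tsv(records):
    """Serialize merged records back to TSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(records[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


merged = merge_tables(read_tsv(checkm2_tsv), read_tsv(gunc_tsv))
print(to_tsv(merged))
```

The outer join keeps a bin even when one tool produced no row for it, which is usually what you want when comparing tools that may skip or fail on individual bins.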

The best part? MAGFlow is reproducible and scalable. Whether you've got 50 or 500 bins per sample, it'll keep things tidy and consistent, and you can re-run the same workflow on future projects without headaches. If the pipeline fails for any reason, you can restart the execution from where it stopped with Nextflow's -resume option. This is thanks to Nextflow, which keeps evolving with every release (it even has more informative error messages now, thank you developers!).

Running MAGFlow is as simple as executing your good old nextflow run:

nextflow run MAGFlow/main.nf -profile apptainer --files 'my_samples/*' --outdir 'output_magflow'

💡 Note: several profiles and input modes are available, along with optional parameters to tailor the execution, because MAGFlow follows the nf-core guidelines. Please check the instructions in the repository to take full advantage of the perks that those guidelines bring to the table.

BIgMAG: The Boy Wonder

Once MAGFlow has done the heavy lifting, BIgMAG steps in as your visualization specialist. This interactive dashboard, built with Plotly and Dash, turns messy, cluttered data into visualizations you can actually make sense of, or into publication-ready figures (citations are always appreciated).

Do you want to compare how different pipelines or binning tools performed? BIgMAG has you covered with bar plots, clustering heatmaps, scatterplots, and more. Do you need to check which bins pass quality thresholds across multiple samples? BIgMAG makes that a matter of a couple of clicks. Curious about statistical comparisons between workflows? BIgMAG goes beyond visualization by including built-in statistical tests such as Welch's ANOVA and Kruskal-Wallis. Instead of manually stitching together plots in R, Python, or MS Excel, you get an interactive and intuitive interface that feels more like browsing a modern app than wrangling bioinformatics outputs.
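For readers curious about what a Kruskal-Wallis comparison between pipelines actually computes, the following is a small pure-Python sketch of the tie-corrected H statistic. It is an illustration only, not BIgMAG's implementation: the pipeline names and completeness values are made up, and the p-value step (comparing H to a chi-squared distribution with k - 1 degrees of freedom) is left out. In practice a library routine such as scipy.stats.kruskal would be the usual choice.

```python
from collections import Counter, defaultdict


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Compute the tie-corrected Kruskal-Wallis H for a dict label -> values."""
    labels, values = [], []
    for label, vals in groups.items():
        labels.extend([label] * len(vals))
        values.extend(vals)
    n = len(values)
    rank_sums = defaultdict(float)
    for label, rank in zip(labels, average_ranks(values)):
        rank_sums[label] += rank
    h = 12 / (n * (n + 1)) * sum(
        rank_sums[label] ** 2 / len(vals) for label, vals in groups.items()
    ) - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    ties = Counter(values)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Hypothetical completeness values (%) for bins from three pipelines.
completeness = {
    "pipeline_A": [91.0, 85.5, 78.2, 96.4],
    "pipeline_B": [62.3, 70.1, 55.0, 81.7],
    "pipeline_C": [88.9, 93.2, 79.5, 90.3],
}
print("H =", round(kruskal_wallis_h(completeness), 3))
```

Because the test works on ranks rather than raw values, it suits quality metrics such as completeness or contamination, which are bounded and often far from normally distributed.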

Like MAGFlow, BIgMAG is launched with a single command:

BIgMAG/app.py -p 8050 'final_df.tsv'

💡 Note: a few straightforward installation steps are required before running this command; please check the repository. The final_df.tsv is the single file provided by MAGFlow.
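If you ever want to inspect that file outside the dashboard, a rough sketch of sorting bins into MIMAG quality tiers could look like the one below. The completeness and contamination cut-offs follow the MIMAG standard (high: >90% and <5%; medium: >=50% and <10%; low: <50% and <10%), but the column names and rows are hypothetical and may not match the real final_df.tsv. The sketch also ignores the rRNA and tRNA requirements that MIMAG adds for high-quality drafts, so the top tier is labeled as a candidate only.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt; the real final_df.tsv column names may differ.
example_tsv = (
    "bin\tsample\tcompleteness\tcontamination\n"
    "bin_001\tpipeline_A\t96.4\t1.2\n"
    "bin_002\tpipeline_A\t72.0\t4.9\n"
    "bin_003\tpipeline_B\t35.5\t2.0\n"
    "bin_004\tpipeline_B\t88.1\t12.3\n"
)


def mimag_tier(completeness, contamination):
    """Assign a MIMAG tier from completeness and contamination (both in %)."""
    if completeness > 90 and contamination < 5:
        # MIMAG also requires 23S, 16S and 5S rRNA genes and >= 18 tRNAs.
        return "high-candidate"
    if completeness >= 50 and contamination < 10:
        return "medium"
    if contamination < 10:
        return "low"
    return "too-contaminated"


def tier_counts(tsv_text):
    """Count bins per (sample, tier) pair."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        tier = mimag_tier(float(row["completeness"]), float(row["contamination"]))
        counts[(row["sample"], tier)] += 1
    return counts


for (sample, tier), n in sorted(tier_counts(example_tsv).items()):
    print(sample, tier, n, sep="\t")
```

To try this on real output, replace example_tsv with the contents of your own file and adjust the column names to whatever the header actually contains.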

To Sum Up

The magic of the Batman & Robin of the metagenomics field comes from an integrated design: MAGFlow automates the workflow and keeps your analysis reproducible, while BIgMAG helps you explore and communicate your results. Together, they make it much easier to benchmark pipelines and binning tools, check bin quality at scale, and share results with collaborators.

Memories of a Nextflow Enthusiast

How did I realize the potential of Nextflow? Let me tell you a short story:

Back in 2022, when I started my doctoral studies, I developed a Nextflow wrapper for a metagenomics pipeline that includes a taxonomic classification step with Kraken2. The pipeline worked smoothly and has sped up my analyses ever since. Kraken2 relies on an indexed database that can be either built by the user or downloaded from a repository maintained by Benjamin Langmead (a great effort that the whole metagenomics community acknowledges). However, the database files we had been using from that repository turned out to be corrupted: they were significantly smaller than the correct indexed database files, so we needed to re-run our analysis. We felt a tiny heart attack when we saw the announcement about the corrupted files! Fortunately, the Nextflow wrapper we had designed for the pipeline saved the day: it was just a matter of changing the path to point to the correct database, and we had the results in less than a day.

💡 Disclaimer: the author of this article is the main developer of MAGFlow/BIgMAG; however, this tool was recognized as a SIB Remarkable Output 2024, and a poster showcasing its capabilities was named Best Poster in Bioinformatics at the Life Sciences Switzerland (LS2) Annual Meeting 2025.

This post was contributed by a Nextflow Ambassador. Ambassadors are passionate individuals who support the Nextflow community.