MultiQC turns eight years old!
MultiQC is an open-source bioinformatics tool used to generate quality-control HTML reports from analysis logs. It understands outputs from over 120 different pieces of software and works with any number of samples. In a single command, you can summarise your analysis workflow results and spot any problematic samples.
When I joined Seqera in 2022, MultiQC was added to the Seqera portfolio. We are committed to maintaining MultiQC as free open-source software, available to everyone in the community. Indeed, we intend to do much more than mere maintenance - improving and extending functionality in the months and years to come.
The first commit for MultiQC was made back on August 4th 2015, making today its eighth birthday! 🥳 🎂 As with the recent Nextflow birthday celebration, it’s a good opportunity to look back and see how far it’s come.
I feel very lucky to be writing this today - that first commit was written purely because I wanted to solve a problem I was facing at the time: quality control of a large number of samples at the Swedish National Genomics Infrastructure (part of SciLifeLab Sweden). I suspected that it could be useful for others, but I never expected the level of popularity it would achieve - enough to shape my entire career.
I’m very proud of how far MultiQC has come, and of the small part that it plays in the amazing research of scientists all around the world: MultiQC has now been cited in over 3,700 scientific papers. The numbers keep climbing every year, as do the numbers of repository issues (1,283 at time of writing), pull requests (680) and GitHub stars (1,035).
By way of a birthday present to the community, I’m happy to release version 1.15 of MultiQC today. As with most releases, it contains some bug fixes and support for some new tools. However, what I’m most excited about is a relatively minor tweak to the procedure used for searching files (pull request #1904).
The MultiQC runtime broadly comprises three stages:
- File search, where input directories are searched recursively and files are checked against the search patterns for all ~130 supported bioinformatics tools.
- Running modules, where each module is given a shortlist of files that are iterated through and parsed.
- Report generation, where the parsed data is used to create the final HTML report.
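To make the three stages concrete, here is a minimal Python sketch of the same shape. It is an illustration only, not MultiQC's actual code: the function names (`search_files`, `run_modules`, `build_report`) and the two filename patterns are hypothetical, and the "parsing" step just records file sizes.

```python
import fnmatch
import os
from collections import defaultdict

# Hypothetical filename patterns, one per tool (not MultiQC's real search patterns).
SEARCH_PATTERNS = {
    "fastqc": "*_fastqc.zip",
    "samtools_stats": "*.stats.txt",
}


def search_files(root, patterns=SEARCH_PATTERNS):
    """Stage 1: walk `root` recursively and shortlist matching files per tool."""
    shortlist = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            for tool, pattern in patterns.items():
                if fnmatch.fnmatch(name, pattern):
                    shortlist[tool].append(os.path.join(dirpath, name))
    return shortlist


def run_modules(shortlist):
    """Stage 2: each 'module' iterates through its shortlist and parses the files.

    Parsing is a stand-in here: we only record each file's size in bytes.
    """
    parsed = {}
    for tool, paths in shortlist.items():
        parsed[tool] = {os.path.basename(p): os.path.getsize(p) for p in sorted(paths)}
    return parsed


def build_report(parsed):
    """Stage 3: render the parsed data as a tiny HTML table."""
    rows = [
        f"<tr><td>{tool}</td><td>{sample}</td><td>{size}</td></tr>"
        for tool, samples in sorted(parsed.items())
        for sample, size in samples.items()
    ]
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

Calling `build_report(run_modules(search_files("results/")))` on a directory of your choice chains the stages together. Note that the inner loop in stage 1 runs once per file per tool, which is why the search cost grows quickly with both the number of files and the number of supported tools.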
The amount of time spent on each phase will differ from person to person (you can use the --profile-runtime flag to check yourself). Large analysis outputs can produce millions of files, meaning that for most users the file search stage represents a significant percentage of the total run time.
Pull request #1904 is small (a functional change), but its impact is large. By some rearrangement of logic and addition of in-memory caching, the file search is sped up somewhere in the region of 5-7 fold.
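For readers curious about why in-memory caching helps here, the sketch below shows the general idea in Python. It is not the code from the pull request: the `read_head` and `match_tools` helpers and the three content patterns are hypothetical, and I'm assuming a search that needs to peek at the top of each file. When many tools each want to check the same file, caching means the file is read from disk once rather than once per tool.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def read_head(path, num_bytes=4096):
    """Read the start of a file once; repeat calls for the same path hit the cache."""
    with open(path, "r", errors="replace") as handle:
        return handle.read(num_bytes)


# Hypothetical content patterns: tool name -> text expected near the top of its log.
CONTENT_PATTERNS = {
    "tool_a": "Tool A version",
    "tool_b": "Tool B summary",
    "tool_c": "Tool C report",
}


def match_tools(path, patterns=CONTENT_PATTERNS):
    """Return the tools whose content pattern appears in the head of the file."""
    return [tool for tool, needle in patterns.items() if needle in read_head(path)]
```

After `match_tools(some_path)` has run, `read_head.cache_info()` reports one miss (the single disk read) and two hits (the other two patterns served from memory). The trade-off is memory use and the assumption that files don't change during a run, which seems reasonable for a one-shot report.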
To put this into context, the file search time for MultiQC on the test-data repository dropped from 21.31s down to 2.95s. This shaves minutes off every continuous integration test, and could have a material impact for users running MultiQC with large analysis result sets.
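As a quick sanity check on those two timings, the arithmetic can be done in a couple of lines of Python. The variable names are just for illustration; the numbers are the ones quoted above.

```python
before_s = 21.31  # file search time before the change, in seconds
after_s = 2.95    # file search time after the change, in seconds

speedup = before_s / after_s
saved = before_s - after_s

print(f"{speedup:.1f}x faster, {saved:.2f}s saved per run")
# prints "7.2x faster, 18.36s saved per run"
```

For this particular dataset the result sits at the top end of the range, and the saving applies to each MultiQC run, so it accumulates across every job in a test suite.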
What I find particularly exciting about the pull request above is that it came from Vladislav Savelyev. Vlad has been contributing to MultiQC since 2017, making him the fourth most prolific committer to the repository. Today I’m delighted to announce that Vlad will be joining Seqera as a full-time developer to work specifically on MultiQC.
Vlad is well known in the bioinformatics community, having cut his teeth working on the popular SPAdes Genome Assembler and developing QUAST (Quality Assessment Tool for Genome Assemblies) at St Petersburg State University. More recently he’s worked at the University of Melbourne Centre for Cancer Research and the Centre for Population Genomics at the Garvan Institute, building major production-grade analysis workflows for human genomics.
Given the impact that Vlad has already had on MultiQC, I am thrilled that he’s joining us at Seqera to work full-time on the project. I hope that this will help us to propel MultiQC to a new level - stay tuned to see what happens next.