Introducing the MultiQC Config Wizard

MultiQC pulls together the output of bioinformatics tools into a single report. By default it tries to be sensible: detect what's there, build appropriate plots, pick reasonable defaults. But once you start running it in production you'll want to customize things: renaming samples, reordering modules, dropping General Stats columns you don't care about, and tweaking plot defaults.

For all of that, you point MultiQC at a YAML config file. There's a lot you can customize, so the config file has a lot of options: about 200 at the time of writing, now neatly organized across 14 sections and 54 groups. Documentation has been a bit patchy in the past, so there's been quite a bit of trial and error. Worse, the majority of MultiQC users don't know what's possible!

So, to try to fix this, we built a MultiQC config wizard! It's a static HTML page with no signup and no install. Open it, browse every option in the sidebar, fill in the form, and watch the generated YAML appear on the right. If you already have a multiqc_config.yaml, paste it in and the wizard will explain each line, flag anything that's wrong, and suggest fixes.

In this post I’ll drill into how it works, because it’s kind of nerdy and fun and I like JSON Schema.

Pydantic, All the Way Down

The wizard is generated from a Pydantic model within the MultiQC codebase called MultiQCConfig. It was added in 2024-25 as part of Vlad's refactoring of the MultiQC internals. The model has been the source of truth for MultiQC's configuration for a while: one entry per option, with type hints, defaults, descriptions, and examples defined alongside the code that uses them. As of v1.35, the model also carries layout metadata, which means we can use it to drive a UI.

Pydantic gives you most of what you need for this out of the box. A field like:

title: Optional[str] = cfg(
    default=None,
    description="Report title. Printed as the page header.",
    examples=["My Project"],
)

…lands in the JSON schema with its description, examples, type, and nullability. Pydantic's model_json_schema() does the heavy lifting of turning Python types into a fully-resolved JSON Schema document, including the nested submodels.
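To make that concrete, here's a minimal sketch that uses plain Pydantic v2 `Field` rather than MultiQC's `cfg()` wrapper. `DemoConfig` is a hypothetical one-field stand-in, not the real MultiQCConfig model.

```python
from typing import Optional

from pydantic import BaseModel, Field


class DemoConfig(BaseModel):
    """Hypothetical one-field stand-in for MultiQCConfig."""

    title: Optional[str] = Field(
        default=None,
        description="Report title. Printed as the page header.",
        examples=["My Project"],
    )


# model_json_schema() returns a plain dict that can be dumped straight to JSON.
schema = DemoConfig.model_json_schema()
title_schema = schema["properties"]["title"]

# Optional[str] is expressed as an anyOf with a string branch and a null branch.
branch_types = sorted(branch["type"] for branch in title_schema["anyOf"])

print(branch_types)
print(title_schema["description"])
print(title_schema["examples"])
```

The description and examples travel with the field, so anything that reads the schema (a form, a docs page, an editor) sees the same text.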

What Pydantic doesn't give you is layout information. The list is overwhelming by itself, so to make it more manageable I wanted to group related config options together into a hierarchy. To do this, the wizard needs to know that title and report_comment are both in the "Report" section under the "Header" group, that data_format belongs in "Output" under "Files", and so on.

`section()` and `group()`

To annotate the organization, we used two Python context managers to tag fields as they're defined. This is much nicer than maintaining a separate sidecar config, as it avoids drift over time and gives very clean code:

with section("Report"):
    with group("Header"):
        title: Optional[str] = cfg(...)
        subtitle: Optional[str] = cfg(...)
        intro_text: Optional[str] = cfg(..., multiline=True)

    with group("Comments"):
        report_comment: Optional[str] = cfg(...)

cfg() is a thin wrapper around Pydantic's Field that reads the active section and group from the context managers and attaches them to json_schema_extra. Every field's JSON schema entry then carries its section, its group, and any UI hints we want to layer on top (e.g. multiline=True for textareas).

The pattern keeps the model file readable. The sections themselves now read like a table of contents. Adding a new field is a one-line change in the right with block, and the wizard, the docs, and the JSON schema all pick it up automatically. The layout of the code effectively translates to the layout of the JSON schema, which in turn powers the docs and the config wizard.

The Wizard, in One HTML File

The frontend is a single HTML file. It pulls in a handful of CDN libraries: js-yaml for YAML parsing, Ajv for client-side JSON Schema validation, Monaco for the code editor view, and highlight.js for syntax highlighting the example snippets. Everything else is vanilla JavaScript.

A Python script reads the Pydantic-generated JSON schema, walks the section/group/property tree, and substitutes the data into a template. The result is a single self-contained page where the wizard data is embedded as JSON inside the HTML itself. Each form control is chosen from the field's schema type. Booleans render as a two-button toggle with the schema default pre-selected; if you don't change it, the field is left out of the generated YAML. Small enums (up to four options) use the same toggle, while larger ones fall back to a dropdown. Union types (like Union[bool, List[str]]) get a type-picker so you can switch between modes.
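A stripped-down sketch of that build step might look like the following. It's standard library only, and nearly everything in it is an assumption for illustration: the `section` and `group` schema keys, the widget names, the `__WIZARD_DATA__` placeholder, and the `demo_*` options are hypothetical, and it ignores `$ref`s that a real Pydantic schema can contain.

```python
import json
from typing import Any, Dict, List

SMALL_ENUM_MAX = 4  # "up to four options", as described above


def pick_widget(prop: Dict[str, Any]) -> str:
    """Map one JSON Schema property to a (hypothetical) widget name."""
    branches = [b for b in prop.get("anyOf", []) if b.get("type") != "null"]
    if len(branches) > 1:
        return "type-picker"  # a real union, e.g. Union[bool, List[str]]
    if len(branches) == 1:
        prop = branches[0]  # Optional[X]: decide on the single non-null branch
    if prop.get("type") == "boolean":
        return "toggle"
    if "enum" in prop:
        return "toggle" if len(prop["enum"]) <= SMALL_ENUM_MAX else "dropdown"
    return "text"


def build_tree(schema: Dict[str, Any]) -> Dict[str, Dict[str, List[dict]]]:
    """Group properties as section -> group -> list of field entries."""
    tree: Dict[str, Dict[str, List[dict]]] = {}
    for name, prop in schema.get("properties", {}).items():
        entry = {"name": name, "widget": pick_widget(prop), "default": prop.get("default")}
        sec = prop.get("section", "Other")
        grp = prop.get("group", "General")
        tree.setdefault(sec, {}).setdefault(grp, []).append(entry)
    return tree


def render_page(template: str, tree: Dict[str, Any]) -> str:
    """Embed the tree as JSON; escape '</' so it cannot close the script tag."""
    payload = json.dumps(tree).replace("</", "<\\/")
    return template.replace("__WIZARD_DATA__", payload)


schema = {
    "properties": {
        "title": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": None,
                  "section": "Report", "group": "Header"},
        "demo_flag": {"type": "boolean", "default": False,
                      "section": "Output", "group": "Files"},
        "demo_mode": {"enum": ["a", "b", "c", "d", "e"],
                      "section": "Output", "group": "Files"},
        "demo_union": {"anyOf": [{"type": "boolean"},
                                 {"type": "array", "items": {"type": "string"}}],
                       "section": "Output", "group": "Files"},
    }
}
template = '<script id="wizard-data" type="application/json">__WIZARD_DATA__</script>'

tree = build_tree(schema)
page = render_page(template, tree)
print([field["widget"] for field in tree["Output"]["Files"]])
```

Keeping the widget choice in one small function means the form logic stays declarative: change the schema, rebuild the page, and the controls follow.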

The whole thing weighs about 470 KB. It ships inside the MultiQC repo at docs/multiqc_config_wizard.html, and we host the latest version at seqera.io/multiqc_config_wizard.

Same Model, Same Docs

While we were at it, we pointed the docs generation at the same Pydantic model. scripts/generate_config_docs.py produces docs/markdown/config_schema.md: a reference page with every config option, organized by the same sections and groups as the wizard, with type info, defaults, and YAML examples pulled from each field's examples=[] metadata.
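The docs side is the same walk with a different output format. Here's a minimal sketch, not the real scripts/generate_config_docs.py: the heading levels, the `section`/`group` schema keys, and the output layout are my assumptions, and it only handles scalar examples.

```python
import json
from typing import Any, Dict, List


def render_docs(schema: Dict[str, Any]) -> str:
    """Render a Markdown reference grouped by section and group."""
    tree: Dict[str, Dict[str, List[tuple]]] = {}
    for name, prop in schema["properties"].items():
        sec = prop.get("section", "Other")
        grp = prop.get("group", "General")
        tree.setdefault(sec, {}).setdefault(grp, []).append((name, prop))

    lines: List[str] = []
    for sec, groups in tree.items():
        lines.append(f"## {sec}")
        for grp, fields in groups.items():
            lines.append(f"### {grp}")
            for name, prop in fields:
                lines.append(f"#### `{name}`")
                if "description" in prop:
                    lines.append(prop["description"])
                lines.append(f"Default: `{json.dumps(prop.get('default'))}`")
                for example in prop.get("examples", []):
                    # JSON scalars are valid YAML values, so json.dumps
                    # is enough for simple examples.
                    lines.append(f"Example: `{name}: {json.dumps(example)}`")
    return "\n".join(lines)


schema = {
    "properties": {
        "title": {
            "description": "Report title. Printed as the page header.",
            "default": None,
            "examples": ["My Project"],
            "section": "Report",
            "group": "Header",
        }
    }
}

markdown = render_docs(schema)
print(markdown)
```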

You can find the rendered version in the MultiQC docs. Previously this lived in a hand-maintained Markdown file that fell out of sync the moment anyone added a new option. Now it's regenerated on every release, and the schema-drift tests catch any field that's missing from one place or the other.
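A drift check of this kind can be very small. The sketch below is only a guess at the idea, not the actual MultiQC test suite: it assumes each option appears in the generated Markdown as a heading of the form `#### option_name` in backticks, and `old_option` is a made-up name.

```python
import re
from typing import Any, Dict, Set


def documented_options(markdown: str) -> Set[str]:
    """Collect option names from headings like: #### `option_name`"""
    return set(re.findall(r"^#### `([^`]+)`$", markdown, flags=re.MULTILINE))


def find_drift(schema: Dict[str, Any], markdown: str) -> Dict[str, Set[str]]:
    """Report options present on one side but not the other."""
    in_schema = set(schema["properties"])
    in_docs = documented_options(markdown)
    return {
        "missing_from_docs": in_schema - in_docs,
        "missing_from_schema": in_docs - in_schema,
    }


schema = {"properties": {"title": {}, "subtitle": {}}}
markdown = "#### `title`\nReport title.\n#### `old_option`\nNo longer exists."

drift = find_drift(schema, markdown)
print(drift)
```

In a test, asserting that both sets are empty is enough to fail the build whenever someone adds an option without regenerating the docs.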

Native VS Code Validation

The third thing generated from the Pydantic model is multiqc/utils/config_schema.json: a plain JSON Schema document. We've published it to schemastore.org, which has a useful property. If you open a file with one of the standard MultiQC config names (multiqc_config.yaml, multiqc_config.yml, .multiqc_config.yaml, and friends) in a JSON-Schema-aware editor, the schema gets pulled in automatically.

In VS Code that means inline validation, autocomplete for option names, hover-help with each option's description, and red squiggles under any value that doesn't match the expected type. So from a single Pydantic model, we get three places where it shows up for the user: the wizard, the docs, and the editor.

MultiQC config validation in VS Code

Try It Out

The wizard is live at seqera.io/multiqc_config_wizard. If you've got an old multiqc_config.yaml lying around that you've never been quite sure about, paste it in and see what comes back. The reference docs are at docs.seqera.io/multiqc/config_schema, and the VS Code validation works on any file matching the standard config names.

All shipping in MultiQC v1.35: github.com/MultiQC/MultiQC/releases/tag/v1.35.
