nf-core/metaboigniter

Pre-processing of mass spectrometry-based metabolomics data with quantification and identification based on MS1 and MS2 data.

Introduction

nf-core/metaboigniter is a bioinformatics pipeline that ingests raw mass spectrometry data in mzML format, typically in the form of peak lists and MS2 spectral data, for comprehensive metabolomics analysis. The key stages involve centroiding, feature detection, adduct detection, alignment, and linking, which progressively refine and align the data. The pipeline can also perform requantification to compensate for missing values and leverages MS2Query for compound identification based on MS2 data, outputting a comprehensive list of detected and potentially identified metabolites.

nf-core/metaboigniter workflow

Centroiding: Converts the continuous mass spectra into a series of discrete points.
Feature Detection: Identifies unique signals or 'features' in the spectra.
Adduct Detection: Identifies adduct ions, which are formed by the interaction of the sample with the ion source.
Alignment: Ensures that the same features across different samples are matched together.
Linking: Establishes connections between features across different ionization modes or adducts.
Requantification: Fills in missing values in the data set for a more complete analysis.
Identification: Uses MS2Query and SIRIUS to identify compounds based on their MS2 spectral data.
Output Generation: Produces a comprehensive list of detected and potentially identified metabolites.

nf-core/metaboigniter metro map

Usage

Note

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,level,type,msfile
CONTROL_REP1,MS1,normal,mzML_POS_Quant/X2_Rep1.mzML
CONTROL_REP2,MS1,normal,mzML_POS_Quant/X2_Rep2.mzML
POOL_MS2,MS2,normal,mzML_POS_ID/POOL_MS2.mzML

Each row in this CSV file represents a unique sample, with the details provided in the columns.

sample: This column should contain unique names for each sample. No two samples should share the same name in this column.
level: This column should specify the level of mass spectrometry data contained in each sample file. This can be 'MS1' for files containing only MS1 data, 'MS2' for files containing only MS2 data, and 'MS12' for files containing both MS1 and MS2 data.
type: This column can contain any descriptor of your choice, such as 'normal', 'disease', etc. This is usually used to provide some classification or group identification to your samples.
msfile: This column should contain the path to the mzML file for each sample.
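
The column rules above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: the function name `validate_samplesheet` is hypothetical, and the `.mzML` extension check is an assumption based on the example paths shown above.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample", "level", "type", "msfile")
ALLOWED_LEVELS = {"MS1", "MS2", "MS12"}


def validate_samplesheet(text):
    """Return a list of problems found in samplesheet text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    seen = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        sample = (row["sample"] or "").strip()
        level = (row["level"] or "").strip()
        msfile = (row["msfile"] or "").strip()
        if not sample:
            problems.append(f"line {line_no}: empty sample name")
        elif sample in seen:
            problems.append(f"line {line_no}: duplicate sample '{sample}'")
        seen.add(sample)
        if level not in ALLOWED_LEVELS:
            problems.append(f"line {line_no}: level '{level}' is not MS1, MS2 or MS12")
        # Assumption: input files carry an .mzML extension (case-insensitive check).
        if not msfile.lower().endswith(".mzml"):
            problems.append(f"line {line_no}: '{msfile}' does not look like an mzML file")
    return problems


EXAMPLE = """sample,level,type,msfile
CONTROL_REP1,MS1,normal,mzML_POS_Quant/X2_Rep1.mzML
CONTROL_REP2,MS1,normal,mzML_POS_Quant/X2_Rep2.mzML
POOL_MS2,MS2,normal,mzML_POS_ID/POOL_MS2.mzML
"""

print(validate_samplesheet(EXAMPLE))
```

The check works on text rather than on a file path so that it can be tried without any data on disk; it does not verify that the listed files exist.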

Now, you can run the pipeline using:

nextflow run nf-core/metaboigniter \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>

Warning

Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
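
One way to follow this advice is to keep all parameters in a JSON file and pass it with -params-file. The sketch below builds such a file's content; the helper names `build_params` and `to_params_json` and the output directory `results` are hypothetical, while the parameter names are taken from the parameter list of this pipeline.

```python
import json


def build_params(input_csv, outdir, **overrides):
    """Collect pipeline parameters in a plain dictionary."""
    params = {"input": input_csv, "outdir": outdir}
    params.update(overrides)
    return params


def to_params_json(params):
    """Serialise the parameters as JSON text suitable for a params file."""
    return json.dumps(params, indent=2, sort_keys=True)


params = build_params(
    "samplesheet.csv",
    "results",  # hypothetical output directory
    run_ms2query=True,
    algorithm_mtd_mass_error_ppm_featurefindermetabo_openms=20,
)

# Save this text as params.json (or any name of your choice).
print(to_params_json(params))
```

With the text saved as params.json, the run command would take the form `nextflow run nf-core/metaboigniter -profile docker -params-file params.json`.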

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/metaboigniter was originally written by Payam Emami. The DSL2 version was developed with significant contributions from Axel Walter and Efi Kontou.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #metaboigniter channel (you can join with this invite).

Citations

If you use nf-core/metaboigniter for your analysis, please cite it using the following doi: 10.5281/zenodo.4743790

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

nf-core/metaboigniter pipeline parameters

Pre-processing of mass spectrometry-based metabolomics data

Total Parameters: 229

Generic controls

8 parameters

Mapping and Identification

45 parameters

mz_tolerance_pyopenms

mz tolerance (ppm) for finding C13

Type: number

Default: 20
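
Several tolerances in this parameter list are given in ppm. As a worked illustration of what such a value means in absolute terms, the sketch below converts a ppm tolerance into Da at a given m/z. The function names are hypothetical, and this is only the standard ppm arithmetic, not the pipeline's own matching code.

```python
def ppm_to_da(mz, ppm):
    """Absolute width in Da of a ppm tolerance at a given m/z."""
    return mz * ppm / 1e6


def within_ppm(observed_mz, expected_mz, ppm):
    """True if observed_mz lies within +/- ppm of expected_mz."""
    return abs(observed_mz - expected_mz) <= ppm_to_da(expected_mz, ppm)


# A 20 ppm tolerance at m/z 500 corresponds to about 0.01 Da.
print(ppm_to_da(500.0, 20))
print(within_ppm(500.005, 500.0, 20))
print(within_ppm(500.02, 500.0, 20))
```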

rt_tolerance_pyopenms

rt tolerance for finding C13

Type: number

Default: 5

ms2_use_feature_ionization

If set, detected adduct will be used in identification

Type: boolean

sirius_sirius_ppm_max

Maximum allowed mass deviation for decomposing masses (ppm)

Type: number

Default: 10

sirius_sirius_ppm_max_ms2

Maximum allowed mass deviation for decomposing masses in MS2 (ppm). If not specified, the same value as for MS1 is used.

Type: number

Default: 10

sirius_sirius_no_recalibration

Disable recalibration of input spectra

Type: boolean

sirius_sirius_profile

Name of the configuration profile

Type: string

Default: "default"

Options:

"default"
"qtof"
"orbitrap"
"fticr"

sirius_sirius_candidates

The number of formula candidates in the SIRIUS output

Type: integer

Default: 10

sirius_sirius_candidates_per_ion

Minimum number of candidates in the output for each ionization. Set to force output of results for each possible ionization, even if not part of highest ranked results.

Type: integer

Default: 1

sirius_sirius_ions_considered

The ion type/adduct of the MS/MS data. Example: [M+H]+, [M-H]-, [M+Cl]-, [M+Na]+, [M]+. You can also provide a comma-separated list of adducts.

Type: string

Default: "[M+H]+,[M+K]+,[M+Na]+,[M+H-H2O]+,[M+H-H4O2]+,[M+NH4]+,[M-H]-,[M+Cl]-,[M-H2O-H]-,[M+Br]-"

sirius_sirius_db

Search formulas in the Union of the given databases db-name1,db-name2,db-name3. If no database is given all possible molecular formulas will be respected (no database is used). Example: possible DBs: ALL,BIO,PUBCHEM,MESH,HMDB,KNAPSACK,CHEBI,PUBMED,KEGG,HSDB,MACONDA,METACYC,GNPS,ZINCBIO,UNDP,YMDB,PLANTCYC,NORMAN,ADDITIONAL,PUBCHEMANNOTATIONBIO,PUBCHEMANNOTATIONDRUG,PUBCHEMANNOTATIONSAFETYANDTOXIC,PUBCHEMANNOTATIONFOOD,KEGGMINE,ECOCYCMINE,YMDBMINE

Type: string

sirius_runpassatutto

If set, passatutto will be run

Type: boolean

sirius_fingerid_db

Search structures in the Union of the given databases db-name1,db-name2,db-name3. If no database is given all possible molecular formulas will be respected (no database is used). Example: possible DBs: ALL,BIO,PUBCHEM,MESH,HMDB,KNAPSACK,CHEBI,PUBMED,KEGG,HSDB,MACONDA,METACYC,GNPS,ZINCBIO,UNDP,YMDB,PLANTCYC,NORMAN,ADDITIONAL,PUBCHEMANNOTATIONBIO,PUBCHEMANNOTATIONDRUG,PUBCHEMANNOTATIONSAFETYANDTOXIC,PUBCHEMANNOTATIONFOOD,KEGGMINE,ECOCYCMINE,YMDBMINE

Type: string

sirius_email

E-mail for your SIRIUS account.

Type: string

sirius_password

Password for your SIRIUS account.

Type: string

sirius_split

If set, SIRIUS will be run in parallel. See mgf_splitmgf_pyopenms parameter for segmentation

Type: boolean

split_consensus_parts

For running MS2 mapping in parallel set this higher than 1

Type: integer

Default: 20

run_ms2query

If set, MS2Query will be run

Type: boolean

sirius_runfid

If set, FingerID will be run. This has to be run together with run_sirius

Type: boolean

run_sirius

If set, SIRIUS will be run

Type: boolean

Annotation

16 parameters

algorithm_metabolitefeaturedeconvolution_charge_min_metaboliteadductdecharger_openms

Minimal possible charge

Type: integer

Default: 1

algorithm_metabolitefeaturedeconvolution_charge_max_metaboliteadductdecharger_openms

Maximal possible charge

Type: integer

Default: 1

algorithm_metabolitefeaturedeconvolution_charge_span_max_metaboliteadductdecharger_openms

Maximal range of charges for a single analyte, i.e. observing q1=[5,6,7] implies span=3. Setting this to 1 will only find adduct variants of the same charge

Type: integer

Default: 1
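
The span in the description above is simply the number of consecutive charge states covered by the observed charges. A tiny sketch of that arithmetic, with a hypothetical function name:

```python
def charge_span(charges):
    """Number of consecutive charge states covered by the observed charges."""
    return max(charges) - min(charges) + 1


# The example from the description: q1 = [5, 6, 7] implies span = 3.
print(charge_span([5, 6, 7]))
```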

algorithm_metabolitefeaturedeconvolution_q_try_metaboliteadductdecharger_openms

Try different values of charge for each feature according to the above settings ('heuristic' [does not test all charges, just the likely ones] or 'all' ), or leave feature charge untouched ('feature').

Type: string

Default: "feature"

Options:

"feature"
"heuristic"
"all"

algorithm_metabolitefeaturedeconvolution_retention_max_diff_metaboliteadductdecharger_openms

Maximum allowed RT difference between any two features if their relation shall be determined

Type: number

Default: 1

algorithm_metabolitefeaturedeconvolution_retention_max_diff_local_metaboliteadductdecharger_openms

Maximum allowed RT difference between two co-features, after adduct shifts have been accounted for (if you do not have any adduct shifts, this value should be equal to 'retention_max_diff', otherwise it should be smaller!)

Type: number

Default: 1

algorithm_metabolitefeaturedeconvolution_mass_max_diff_metaboliteadductdecharger_openms

Maximum allowed mass tolerance per feature. Defines a symmetric tolerance window around the feature. When looking at possible feature pairs, the allowed feature-wise errors are combined for consideration of possible adduct shifts. For ppm tolerances, each window is based on the respective observed feature mz (instead of putative experimental mzs causing the observed one)!

Type: number

Default: 5

algorithm_metabolitefeaturedeconvolution_unit_metaboliteadductdecharger_openms

Unit of the 'max_difference' parameter

Type: string

Default: "ppm"

Options:

"Da"
"ppm"

algorithm_metabolitefeaturedeconvolution_max_neutrals_metaboliteadductdecharger_openms

Maximal number of neutral adducts (q=0) allowed. Add them in the 'potential_adducts' section!

Type: integer

Default: 1

algorithm_metabolitefeaturedeconvolution_use_minority_bound_metaboliteadductdecharger_openms

Prune the considered adduct transitions by transition probabilities.

Type: boolean

Default: true

algorithm_metabolitefeaturedeconvolution_max_minority_bound_metaboliteadductdecharger_openms

Limits allowed adduct compositions and changes between compositions in the underlying graph optimization problem by introducing a probability-based threshold: the minority bound sets the maximum count of the least probable adduct (according to 'potential_adducts' param) within a charge variant with maximum charge only containing the most likely adduct otherwise. E.g., for 'charge_max' 4 and 'max_minority_bound' 2 with most probable adduct being H+ and least probable adduct being Na+, this will allow adduct compositions of '2(H+),2(Na+)' but not of '1(H+),3(Na+)'. Further, adduct compositions/changes less likely than '2(H+),2(Na+)' will be discarded as well.

Type: integer

Default: 1

algorithm_metabolitefeaturedeconvolution_min_rt_overlap_metaboliteadductdecharger_openms

Minimum overlap of the convex hulls' RT intersection measured against the union from two features (if CHs are given)

Type: number

Default: 0.66
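
One reading of this description is an intersection-over-union ratio of the two features' RT ranges. The sketch below illustrates that reading only; it is an assumption about the intent and not the OpenMS implementation, and the function name is hypothetical.

```python
def rt_overlap_ratio(range_a, range_b):
    """Length of the RT intersection divided by the length of the RT union."""
    a_start, a_end = range_a
    b_start, b_end = range_b
    intersection = max(0.0, min(a_end, b_end) - max(a_start, b_start))
    # For overlapping ranges the union is the full covered span. For disjoint
    # ranges the intersection is 0, so the ratio is 0 either way.
    union = max(a_end, b_end) - min(a_start, b_start)
    return intersection / union if union > 0 else 0.0


# Two features eluting at 10-20 s and 12-22 s overlap by 8 s out of 12 s.
ratio = rt_overlap_ratio((10.0, 20.0), (12.0, 22.0))
print(ratio >= 0.66)
```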

algorithm_metabolitefeaturedeconvolution_intensity_filter_metaboliteadductdecharger_openms

Enable the intensity filter, which will only allow edges between two equally charged features if the intensity of the feature with less likely adducts is smaller than that of the other feature. It is not used for features of different charge.

Type: boolean

adducts_pos

Possible positive adducts for adduct detection, in the format adduct:charge:probability

Type: string

Default: "H:+:0.6 Na:+:0.1 NH4:+:0.1 H-1O-1:+:0.1 H-3O-2:+:0.1"

adducts_neg

Possible negative adducts for adduct detection, in the format adduct:charge:probability

Type: string

Default: "H-1:-:0.8 H-3O-1:-:0.2"
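
The adduct:charge:probability format used by adducts_pos and adducts_neg can be checked with a few lines of code before a run. This is a minimal sketch with hypothetical function names. Both default strings happen to sum to a probability of 1; whether the tool requires this is not stated here.

```python
def parse_adducts(spec):
    """Split a space-separated adduct string into (formula, sign, probability) tuples."""
    adducts = []
    for token in spec.split():
        formula, sign, probability = token.split(":")
        if sign not in ("+", "-"):
            raise ValueError(f"unexpected charge sign in '{token}'")
        adducts.append((formula, sign, float(probability)))
    return adducts


def total_probability(spec):
    """Sum of the probabilities listed in an adduct string."""
    return sum(probability for _, _, probability in parse_adducts(spec))


ADDUCTS_POS = "H:+:0.6 Na:+:0.1 NH4:+:0.1 H-1O-1:+:0.1 H-3O-2:+:0.1"
ADDUCTS_NEG = "H-1:-:0.8 H-3O-1:-:0.2"

print(parse_adducts(ADDUCTS_NEG))
print(round(total_probability(ADDUCTS_POS), 6))
```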

Alignment and Linking

70 parameters

algorithm_max_num_peaks_considered_mapalignerposeclustering_openms

The maximal number of peaks/features to be considered per map. To use all, set to '-1'.

Type: integer

Default: 1000

algorithm_superimposer_mz_pair_max_distance_mapalignerposeclustering_openms

Maximum of m/z deviation of corresponding elements in different maps. This condition applies to the pairs considered in hashing.

Type: number

Default: 0.5

algorithm_superimposer_num_used_points_mapalignerposeclustering_openms

Maximum number of elements considered in each map (selected by intensity). Use this to reduce the running time and to disregard weak signals during alignment. For using all points, set this to -1.

Type: integer

Default: 2000

algorithm_pairfinder_ignore_charge_mapalignerposeclustering_openms

false [default]: pairing requires equal charge state (or at least one unknown charge '0'); true: Pairing irrespective of charge state

Type: boolean

algorithm_pairfinder_distance_rt_max_difference_mapalignerposeclustering_openms

Never pair features with a larger RT distance (in seconds).

Type: number

Default: 100

algorithm_pairfinder_distance_mz_max_difference_mapalignerposeclustering_openms

Never pair features with larger m/z distance (unit defined by 'unit')

Type: number

Default: 0.3

algorithm_pairfinder_distance_mz_unit_mapalignerposeclustering_openms

Unit of the 'max_difference' parameter

Type: string

Default: "Da"

Options:

"Da"
"ppm"

algorithm_mz_unit_featurelinkerunlabeledkd_openms

Unit of m/z tolerance

Type: string

Default: "ppm"

Options:

"ppm"
"Da"

algorithm_nr_partitions_featurelinkerunlabeledkd_openms

Number of partitions in m/z space

Type: integer

Default: 100

algorithm_warp_enabled_featurelinkerunlabeledkd_openms

Whether or not to internally warp feature RTs using LOWESS transformation before linking (reported RTs in results will always be the original RTs)

Type: boolean

Default: true

algorithm_warp_rt_tol_featurelinkerunlabeledkd_openms

Width of RT tolerance window (sec)

Type: number

Default: 100

algorithm_warp_mz_tol_featurelinkerunlabeledkd_openms

m/z tolerance (in ppm or Da)

Type: number

Default: 5

algorithm_link_rt_tol_featurelinkerunlabeledkd_openms

Width of RT tolerance window (sec)

Type: number

Default: 30

algorithm_link_mz_tol_featurelinkerunlabeledkd_openms

m/z tolerance (in ppm or Da)

Type: number

Default: 10

algorithm_link_charge_merging_featurelinkerunlabeledkd_openms

Whether to disallow charge mismatches (Identical), allow linking charge zero (i.e., unknown charge state) with every charge state (With_charge_zero), or disregard charges (Any).

Type: string

Default: "With_charge_zero"

Options:

"Identical"
"With_charge_zero"
"Any"

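The three charge-merging options described for algorithm_link_charge_merging_featurelinkerunlabeledkd_openms can be expressed as a small compatibility check. This sketch follows the wording of the description only; the function name is hypothetical and it is not the OpenMS code.

```python
def charges_compatible(charge_a, charge_b, mode):
    """Decide whether two features may be linked under a charge-merging mode."""
    if mode == "Any":
        return True
    if mode == "Identical":
        return charge_a == charge_b
    if mode == "With_charge_zero":
        # Charge 0 stands for an unknown charge state and matches everything.
        return charge_a == charge_b or charge_a == 0 or charge_b == 0
    raise ValueError(f"unknown mode: {mode}")


print(charges_compatible(1, 0, "With_charge_zero"))
print(charges_compatible(1, 2, "With_charge_zero"))
```
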
algorithm_link_adduct_merging_featurelinkerunlabeledkd_openms

Whether to only allow the same adduct for linking (Identical), also allow linking features with adduct-free ones (With_unknown_adducts), or disregard adducts (Any).

Type: string

Default: "Any"

Options:

"Identical"
"With_unknown_adducts"
"Any"

Re-Quantification

18 parameters

extract_mz_window_featurefindermetaboident_openms

m/z window size for chromatogram extraction (unit: ppm if 1 or greater, else Da/Th)

Type: number
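
The unit rule for extract_mz_window_featurefindermetaboident_openms (ppm if 1 or greater, else Da/Th) can be made explicit with a short conversion. This is a sketch of the rule as worded in the description, with a hypothetical function name.

```python
def mz_window_in_da(window, mz):
    """Interpret the window value and return its width in Da at the given m/z."""
    if window >= 1:
        # Values of 1 or greater are read as ppm.
        return mz * window / 1e6
    # Values below 1 are already in Da/Th.
    return window


print(mz_window_in_da(10, 400.0))    # 10 ppm at m/z 400
print(mz_window_in_da(0.01, 400.0))  # 0.01 Da, independent of m/z
```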

extract_n_isotopes_featurefindermetaboident_openms

Number of isotopes to include in each peptide assay.

Type: integer

detect_peak_width_featurefindermetaboident_openms

Expected elution peak width in seconds, for smoothing (Gauss filter). Also determines the RT extraction window, unless set explicitly via 'extract:rt_window'.

Type: number

model_type_featurefindermetaboident_openms

Type of elution model to fit to features

Type: string

Default: "symmetric"

Options:

"symmetric"
"asymmetric"
"none"

emgscoring_max_iteration_featurefindermetaboident_openms

Maximum number of iterations for EMG fitting.

Type: integer

emgscoring_init_mom_featurefindermetaboident_openms

Alternative initial parameters for fitting through method of moments.

Type: boolean

Quantification

43 parameters

algorithm_signal_to_noise_peakpickerhires_openms

Minimal signal-to-noise ratio for a peak to be picked (0.0 disables SNT estimation!)

Type: number

algorithm_common_noise_threshold_int_featurefindermetabo_openms

Intensity threshold below which peaks are regarded as noise.

Type: number

Default: 10

algorithm_common_chrom_peak_snr_featurefindermetabo_openms

Minimum signal-to-noise a mass trace should have.

Type: number

Default: 3

algorithm_common_chrom_fwhm_featurefindermetabo_openms

Expected chromatographic peak width (in seconds).

Type: number

Default: 5

algorithm_mtd_mass_error_ppm_featurefindermetabo_openms

Allowed mass deviation (in ppm).

Type: number

Default: 20

algorithm_mtd_reestimate_mt_sd_featurefindermetabo_openms

Enables dynamic re-estimation of m/z variance during mass trace collection stage.

Type: boolean

Default: true

algorithm_mtd_quant_method_featurefindermetabo_openms

Method of quantification for mass traces. For LC data 'area' is recommended, 'median' for direct injection data. 'max_height' simply uses the most intense peak in the trace.

Type: string

Default: "area"

Options:

"area"
"median"
"max_height"

algorithm_epd_enabled_featurefindermetabo_openms

Enable splitting of isobaric mass traces by chromatographic peak detection. Disable for direct injection.

Type: boolean

Default: true

algorithm_epd_width_filtering_featurefindermetabo_openms

Enable filtering of unlikely peak widths. The fixed setting filters out mass traces outside the [min_fwhm, max_fwhm] interval (set parameters accordingly!). The auto setting filters with the 5 and 95% quantiles of the peak width distribution.

Type: string

Default: "fixed"

Options:

"off"
"fixed"
"auto"

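The "auto" setting of algorithm_epd_width_filtering_featurefindermetabo_openms is described as filtering with the 5% and 95% quantiles of the peak width distribution. The sketch below illustrates that idea with Python's standard library. The function names are hypothetical, and the quantile interpolation method is an assumption that may differ from what OpenMS does internally.

```python
import statistics


def auto_width_bounds(fwhms):
    """Return the 5% and 95% quantiles of a list of peak widths."""
    # n=20 yields cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(fwhms, n=20, method="inclusive")
    return cuts[0], cuts[-1]


def filter_by_width(fwhms):
    """Keep only the widths that lie between the 5% and 95% quantiles."""
    low, high = auto_width_bounds(fwhms)
    return [width for width in fwhms if low <= width <= high]


widths = [float(w) for w in range(1, 102)]  # toy widths of 1..101 seconds
kept = filter_by_width(widths)
print(min(kept), max(kept))
```
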
algorithm_ffm_enable_rt_filtering_featurefindermetabo_openms

Require sufficient overlap in RT while assembling mass traces. Disable for direct injection data.

Type: boolean

Default: true

algorithm_ffm_isotope_filtering_model_featurefindermetabo_openms

Remove/score candidate assemblies based on isotope intensities. SVM isotope models for metabolites were trained with either 2% or 5% RMS error. For peptides, an averagine cosine scoring is used. Select the appropriate noise model according to the quality of measurement or MS device.

Type: string

Default: "metabolites (5% RMS)"

Options:

"metabolites (2% RMS)"
"metabolites (5% RMS)"
"peptides"
"none"

algorithm_ffm_mz_scoring_13c_featurefindermetabo_openms

Use the 13C isotope peak position (~1.003355 Da) as the expected shift in m/z for isotope mass traces (highly recommended for lipidomics!). Disable for general metabolites (as described in Kenar et al. 2014, MCP.).

Type: boolean
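
To illustrate the 13C spacing mentioned in the description, the sketch below lists the expected m/z positions of the first few isotope traces when each step is one 13C shift divided by the charge. The function name is hypothetical, and this only shows the arithmetic, not the scoring performed by the tool.

```python
C13_SHIFT_DA = 1.003355  # value quoted in the parameter description


def isotope_mzs(mono_mz, charge, n_isotopes):
    """Expected m/z of the monoisotopic trace and its following isotope traces."""
    return [mono_mz + k * C13_SHIFT_DA / charge for k in range(n_isotopes)]


print(isotope_mzs(200.0, 1, 3))
print(isotope_mzs(200.0, 2, 3))
```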

Input/output options

5 parameters

Institutional config options

6 parameters

Max job request options

3 parameters

Generic options

15 parameters
