nf-core/metaboigniter
Introduction
nf-core/metaboigniter is a bioinformatics pipeline that ingests raw mass spectrometry data in mzML format, typically in the form of peak lists and MS2 spectral data, for comprehensive metabolomics analysis. The key stages involve centroiding, feature detection, adduct detection, alignment, and linking, which progressively refine and align the data. The pipeline can also perform requantification to compensate for missing values and leverages MS2Query for compound identification based on MS2 data, outputting a comprehensive list of detected and potentially identified metabolites.
- Centroiding: Converts the continuous mass spectra into a series of discrete points.
- Feature Detection: Identifies unique signals or 'features' in the spectra.
- Adduct Detection: Identifies adduct ions, which are formed by the interaction of the sample with the ion source.
- Alignment: Ensures that the same features across different samples are matched together.
- Linking: Establishes connections between features across different ionization modes or adducts.
- Requantification: Fills in missing values in the data set for a more complete analysis.
- Identification: Uses MS2Query and SIRIUS to identify compounds based on their MS2 spectral data.
- Output Generation: Produces a comprehensive list of detected and potentially identified metabolites.
Usage
Note
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
sample,level,type,msfile
CONTROL_REP1,MS1,normal,mzML_POS_Quant/X2_Rep1.mzML
CONTROL_REP2,MS1,normal,mzML_POS_Quant/X2_Rep2.mzML
POOL_MS2,MS2,normal,mzML_POS_ID/POOL_MS2.mzML
Each row in this CSV file represents a unique sample, with the details provided in the columns.
- sample: This column should contain unique names for each sample. No two samples should share the same name in this column.
- level: This column should specify the level of mass spectrometry data contained in each sample file. This can be 'MS1' for files containing only MS1 data, 'MS2' for files containing only MS2 data, and 'MS12' for files containing both MS1 and MS2 data.
- type: This column can contain any descriptor of your choice, such as 'normal', 'disease', etc. This is usually used to provide some classification or group identification to your samples.
- msfile: This column should contain the path to the mzML file for each sample.
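The column rules above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: the function name check_samplesheet and the wording of the messages are hypothetical, and the pipeline's own validation may be stricter.

```python
import csv
import io

ALLOWED_LEVELS = {"MS1", "MS2", "MS12"}
REQUIRED_COLUMNS = ["sample", "level", "type", "msfile"]


def check_samplesheet(text):
    """Return a list of human-readable problems found in the samplesheet text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    seen = set()
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        if row["sample"] in seen:
            problems.append(f"row {row_number}: duplicate sample '{row['sample']}'")
        seen.add(row["sample"])
        if row["level"] not in ALLOWED_LEVELS:
            problems.append(f"row {row_number}: unknown level '{row['level']}'")
        if not row["msfile"]:
            problems.append(f"row {row_number}: empty msfile path")
    return problems


example = """sample,level,type,msfile
CONTROL_REP1,MS1,normal,mzML_POS_Quant/X2_Rep1.mzML
CONTROL_REP2,MS1,normal,mzML_POS_Quant/X2_Rep2.mzML
POOL_MS2,MS2,normal,mzML_POS_ID/POOL_MS2.mzML
"""
print(check_samplesheet(example))
```

An empty list means that no duplicate sample names, unknown levels, or empty paths were found.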
Now, you can run the pipeline using:
nextflow run nf-core/metaboigniter \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR>
Warning
Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
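One way to follow this advice is to collect the parameters in a JSON file and pass it with -params-file, which accepts JSON or YAML. The sketch below only builds the file content and prints a matching command; the file name params.json, the output directory results, and the docker profile are placeholders chosen for illustration.

```python
import json

# Parameter names are taken from the parameter documentation; values are examples.
params = {
    "input": "samplesheet.csv",
    "outdir": "results",          # placeholder output directory
    "polarity": "positive",
    "requantification": True,
    "identification": True,
    "run_ms2query": True,
}

params_text = json.dumps(params, indent=2)
print(params_text)  # save this text as params.json

# Command that would consume the file (profile chosen as an example).
print("nextflow run nf-core/metaboigniter -profile docker -params-file params.json")
```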
For more details and further functionality, please refer to the usage documentation and the parameter documentation.
Pipeline output
To see the results of an example test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.
Credits
nf-core/metaboigniter was originally written by Payam Emami. The DSL2 version was developed with significant contributions from Axel Walter and Efi Kontou.
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #metaboigniter channel (you can join with this invite).
Citations
If you use nf-core/metaboigniter for your analysis, please cite it using the following doi: 10.5281/zenodo.4743790
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
nf-core/metaboigniter pipeline parameters
Pre-processing of mass spectrometry-based metabolomics data
Total Parameters: 229
Generic controls
8 parameters
skip_centroiding
If true, the centroiding will be skipped
Type: boolean
Default: true
skip_alignment
If true, the alignment will be skipped
Type: boolean
skip_adduct_detection
If true, the adduct detection will be skipped
Type: boolean
requantification
If set to true, requantification will be performed
Type: boolean
identification
If set to true, identification will be performed. Remember to set identification specific parameters
Type: boolean
polarity
Polarity of the data
Type: string
Default: "positive"
- "positive"
- "negative"
parallel_linking
If set, the linking will be performed in parallel, see nr_partitions in linking
Type: boolean
ms2_collection_model
Set whether MS2 spectra were collected in all of the MS1 data files ("paired"). If there is a separate MS2 file, set to "separate"
Type: string
Default: "paired"
- "separate"
- "paired"
Mapping and Identification
45 parameters
offline_model_ms2query
If set, the workflow expects the models to be in models_dir_ms2query
Type: boolean
models_dir_ms2query
If running offline, this directory has to contain all the files necessary for running MS2Query
Type: string
Default: "models"
train_library_ms2query
If set, the model training will be performed using library_path_ms2query
Type: boolean
library_path_ms2query
path to ms2query library
Type: string
mgf_splitmgf_pyopenms
If higher than one, parameter files will be split into the selected number of parts. Identification will then be performed on each part separately
Type: integer
Default: 1
mz_tolerance_pyopenms
mz tolerance (ppm) for finding C13
Type: number
Default: 20
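A tolerance in ppm scales with m/z, so the same setting gives a wider absolute window for heavier ions. The sketch below illustrates that arithmetic only; how the pipeline actually searches for C13 peaks is not described here, and the helper names are hypothetical. The 13C/12C mass difference of about 1.003355 Da is a physical constant, not a pipeline setting.

```python
C13_C12_MASS_DIFFERENCE = 1.003355  # Da, approximate mass difference 13C - 12C


def ppm_to_da(mz, ppm):
    """Absolute m/z tolerance in Da for a relative tolerance given in ppm."""
    return mz * ppm / 1e6


def looks_like_c13_peak(mono_mz, candidate_mz, charge=1, ppm=20.0):
    """True if candidate_mz lies within the ppm window around the expected C13 peak."""
    expected = mono_mz + C13_C12_MASS_DIFFERENCE / charge
    return abs(candidate_mz - expected) <= ppm_to_da(expected, ppm)


print(ppm_to_da(500.0, 20))                # tolerance in Da at m/z 500
print(looks_like_c13_peak(500.0, 501.0034))
```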
rt_tolerance_pyopenms
rt tolerance for finding C13
Type: number
Default: 5
annotate_ids_with_subelements_pyopenms
Store the map index of the sub-feature in the peptide ID.
Type: boolean
Default: true
measure_from_subelements_pyopenms
Match using RT and m/z of sub-features instead of consensus RT and m/z. A consensus feature matches if any of its sub-features matches.
Type: boolean
Default: true
ignore_msms_mapping_charge_pyopenms
When mapping MS2 precursors to consensus elements, ignore the charge. Especially beneficial in negative mode, if the charges of the consensus features and the spectra differ
Type: boolean
Default: false
ms2_use_feature_ionization
If set, detected adduct will be used in identification
Type: boolean
ms2_feature_selection
whether feature quality or intensity should be used for feature selection
Type: string
Default: "quality"
- "quality"
- "intensity"
ms2_normalized_intensity
If true, normalized intensity will be used for selecting the best feature
Type: boolean
Default: true
ms2_iterations
Number of iterations that should be performed to extract the C13 isotope pattern. If no peak is found (C13 distance) the function will abort. Be careful with noisy data - since this can lead to wrong isotope patterns
Type: integer
Default: 3
ms2_ppm_map
PPM for detecting MS C13
Type: number
Default: 10
sirius_project_maxmz
Only consider compounds with a precursor m/z lower than or equal to this maximum m/z. All other compounds in the input file are ignored.
Type: number
Default: -1
sirius_project_loglevel
Set logging level of the Jobs SIRIUS will execute. Valid values: SEVERE, WARNING, INFO, FINER, ALL
Type: string
Default: "WARNING"
- "SEVERE"
- "WARNING"
- "INFO"
- "FINER"
- "ALL"
sirius_project_ignore_formula
Ignore given molecular formula in internal .ms format, while processing.
Type: boolean
sirius_sirius_ppm_max
Maximum allowed mass deviation in ppm for decomposing masses (ppm)
Type: number
Default: 10
sirius_sirius_ppm_max_ms2
Maximum allowed mass deviation in ppm for decomposing masses in MS2 (ppm). If not specified, the same value as for MS1 is used.
Type: number
Default: 10
sirius_sirius_tree_timeout
Time out in seconds per fragmentation tree computations. 0 for an infinite amount of time
Type: number
Default: 100
sirius_sirius_compound_timeout
Time out in seconds for the computation of a single compound. 0 for an infinite amount of time
Type: number
Default: 100
sirius_sirius_no_recalibration
Disable recalibration of input spectra
Type: boolean
sirius_sirius_profile
Name of the configuration profile
Type: string
Default: "default"
- "default"
- "qtof"
- "orbitrap"
- "fticr"
sirius_sirius_formulas
Specify the neutral molecular formula of the measured compound to compute its tree or a list of candidate formulas the method should discriminate. Omit this option if you want to consider all possible molecular formulas
Type: string
sirius_sirius_ions_enforced
The iontype/adduct of the MS/MS data. Example: [M+H]+, [M-H]-, [M+Cl]-, [M+Na]+, [M]+. You can also provide a comma separated list of adducts.
Type: string
sirius_sirius_candidates
The number of formula candidates in the SIRIUS output
Type: integer
Default: 10
sirius_sirius_candidates_per_ion
Minimum number of candidates in the output for each ionization. Set to force output of results for each possible ionization, even if not part of highest ranked results.
Type: integer
Default: 1
sirius_sirius_elements_considered
Set the allowed elements for rare element detection. Write SBrClBSe to allow the elements S,Br,Cl,B and Se.
Type: string
Default: "SBrClBSe"
sirius_sirius_elements_enforced
Enforce elements for molecular formula determination. Write CHNOPSCl to allow the elements C, H, N, O, P, S and Cl. Add numbers in brackets to restrict the minimal and maximal allowed occurrence of these elements: CHNOP[5]S[8]Cl[1-2]. When one number is given then it is interpreted as upper bound.
Type: string
Default: "CHNOP"
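The element syntax described for sirius_sirius_elements_considered and sirius_sirius_elements_enforced can be read mechanically. The parser below is a sketch written for this documentation, not code from SIRIUS; parse_elements is a hypothetical name, and None is used here to mean that a bound was not given.

```python
import re

# An element symbol, optionally followed by [max] or [min-max].
TOKEN = re.compile(r"([A-Z][a-z]?)(?:\[(\d+)(?:-(\d+))?\])?")


def parse_elements(spec):
    """Map each element symbol to a (min, max) pair; None means not restricted."""
    bounds = {}
    position = 0
    while position < len(spec):
        match = TOKEN.match(spec, position)
        if match is None:
            raise ValueError(f"cannot parse {spec!r} at offset {position}")
        symbol, first, second = match.groups()
        if first is None:
            bounds[symbol] = (None, None)
        elif second is None:
            bounds[symbol] = (None, int(first))  # a single number is an upper bound
        else:
            bounds[symbol] = (int(first), int(second))
        position = match.end()
    return bounds


print(parse_elements("CHNOP[5]S[8]Cl[1-2]"))
print(list(parse_elements("SBrClBSe")))
```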
sirius_sirius_no_isotope_score
Disable isotope pattern score.
Type: boolean
sirius_sirius_no_isotope_filter
Disable molecular formula filter. When filtering is enabled, molecular formulas are excluded if their theoretical isotope pattern does not match the measured one, even if their MS/MS pattern has a high score.
Type: boolean
sirius_sirius_ions_considered
the iontype/adduct of the MS/MS data. Example: [M+H]+, [M-H]-, [M+Cl]-, [M+Na]+, [M]+. You can also provide a comma separated list of adducts.
Type: string
Default: "[M+H]+,[M+K]+,[M+Na]+,[M+H-H2O]+,[M+H-H4O2]+,[M+NH4]+,[M-H]-,[M+Cl]-,[M-H2O-H]-,[M+Br]-"
sirius_sirius_db
Search formulas in the Union of the given databases db-name1,db-name2,db-name3. If no database is given all possible molecular formulas will be respected (no database is used). Example: possible DBs: ALL,BIO,PUBCHEM,MESH,HMDB,KNAPSACK,CHEBI,PUBMED,KEGG,HSDB,MACONDA,METACYC,GNPS,ZINCBIO,UNDP,YMDB,PLANTCYC,NORMAN,ADDITIONAL,PUBCHEMANNOTATIONBIO,PUBCHEMANNOTATIONDRUG,PUBCHEMANNOTATIONSAFETYANDTOXIC,PUBCHEMANNOTATIONFOOD,KEGGMINE,ECOCYCMINE,YMDBMINE
Type: string
sirius_runpassatutto
If set, passatutto will be run
Type: boolean
sirius_fingerid_db
Search structures in the Union of the given databases db-name1,db-name2,db-name3. If no database is given all possible molecular formulas will be respected (no database is used). Example: possible DBs: ALL,BIO,PUBCHEM,MESH,HMDB,KNAPSACK,CHEBI,PUBMED,KEGG,HSDB,MACONDA,METACYC,GNPS,ZINCBIO,UNDP,YMDB,PLANTCYC,NORMAN,ADDITIONAL,PUBCHEMANNOTATIONBIO,PUBCHEMANNOTATIONDRUG,PUBCHEMANNOTATIONSAFETYANDTOXIC,PUBCHEMANNOTATIONFOOD,KEGGMINE,ECOCYCMINE,YMDBMINE
Type: string
sirius_sirius_solver
For GUROBI and CPLEX environment variables need to be configured.
Type: string
Default: "CLP"
sirius_email
E-mail for your SIRIUS account.
Type: string
sirius_password
Password for your SIRIUS account.
Type: string
sirius_split
If set, SIRIUS will be run in parallel. See mgf_splitmgf_pyopenms parameter for segmentation
Type: boolean
split_consensus_parts
For running MS2 mapping in parallel set this higher than 1
Type: integer
Default: 20
run_ms2query
If set, MS2Query will be run
Type: boolean
sirius_runfid
If set, FingerID will be run. This has to be run together with run_sirius
Type: boolean
run_sirius
If set, SIRIUS will be run
Type: boolean
run_umapped_spectra
If set, identification will be performed on unmapped MS2 spectra
Type: boolean
Annotation
16 parameters
algorithm_metabolitefeaturedeconvolution_charge_min_metaboliteadductdecharger_openms
Minimal possible charge
Type: integer
Default: 1
algorithm_metabolitefeaturedeconvolution_charge_max_metaboliteadductdecharger_openms
Maximal possible charge
Type: integer
Default: 1
algorithm_metabolitefeaturedeconvolution_charge_span_max_metaboliteadductdecharger_openms
Maximal range of charges for a single analyte, i.e. observing q1=[5,6,7] implies span=3. Setting this to 1 will only find adduct variants of the same charge
Type: integer
Default: 1
algorithm_metabolitefeaturedeconvolution_q_try_metaboliteadductdecharger_openms
Try different values of charge for each feature according to the above settings ('heuristic' [does not test all charges, just the likely ones] or 'all' ), or leave feature charge untouched ('feature').
Type: string
Default: "feature"
- "feature"
- "heuristic"
- "all"
algorithm_metabolitefeaturedeconvolution_retention_max_diff_metaboliteadductdecharger_openms
Maximum allowed RT difference between any two features if their relation shall be determined
Type: number
Default: 1
algorithm_metabolitefeaturedeconvolution_retention_max_diff_local_metaboliteadductdecharger_openms
Maximum allowed RT difference between two co-features, after adduct shifts have been accounted for (if you do not have any adduct shifts, this value should be equal to 'retention_max_diff', otherwise it should be smaller!)
Type: number
Default: 1
algorithm_metabolitefeaturedeconvolution_mass_max_diff_metaboliteadductdecharger_openms
Maximum allowed mass tolerance per feature. Defines a symmetric tolerance window around the feature. When looking at possible feature pairs, the allowed feature-wise errors are combined for consideration of possible adduct shifts. For ppm tolerances, each window is based on the respective observed feature mz (instead of putative experimental mzs causing the observed one)!
Type: number
Default: 5
algorithm_metabolitefeaturedeconvolution_unit_metaboliteadductdecharger_openms
Unit of the 'max_difference' parameter
Type: string
Default: "ppm"
- "Da"
- "ppm"
algorithm_metabolitefeaturedeconvolution_max_neutrals_metaboliteadductdecharger_openms
Maximal number of neutral adducts (q=0) allowed. Add them in the 'potential_adducts' section!
Type: integer
Default: 1
algorithm_metabolitefeaturedeconvolution_use_minority_bound_metaboliteadductdecharger_openms
Prune the considered adduct transitions by transition probabilities.
Type: boolean
Default: true
algorithm_metabolitefeaturedeconvolution_max_minority_bound_metaboliteadductdecharger_openms
Limits allowed adduct compositions and changes between compositions in the underlying graph optimization problem by introducing a probability-based threshold: the minority bound sets the maximum count of the least probable adduct (according to 'potential_adducts' param) within a charge variant with maximum charge only containing the most likely adduct otherwise. E.g., for 'charge_max' 4 and 'max_minority_bound' 2 with most probable adduct being H+ and least probable adduct being Na+, this will allow adduct compositions of '2(H+),2(Na+)' but not of '1(H+),3(Na+)'. Further, adduct compositions/changes less likely than '2(H+),2(Na+)' will be discarded as well.
Type: integer
Default: 1
algorithm_metabolitefeaturedeconvolution_min_rt_overlap_metaboliteadductdecharger_openms
Minimum overlap of the convex hulls' RT intersection measured against the union from two features (if CHs are given)
Type: number
Default: 0.66
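Read literally, this overlap is an intersection-over-union ratio of the two features' RT extents. The sketch below assumes each convex hull can be reduced to a single (start, end) RT interval in seconds; the exact computation inside OpenMS may differ, and rt_overlap is a hypothetical helper.

```python
def rt_overlap(a, b):
    """Intersection over union of two RT intervals given as (start, end) in seconds."""
    intersection = min(a[1], b[1]) - max(a[0], b[0])
    if intersection <= 0:
        return 0.0  # the intervals do not overlap
    union = max(a[1], b[1]) - min(a[0], b[0])
    return intersection / union


overlap = rt_overlap((100.0, 130.0), (110.0, 140.0))
print(overlap, overlap >= 0.66)
```

With the default of 0.66, two features whose elution windows share only half of their combined span would not pass.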
algorithm_metabolitefeaturedeconvolution_intensity_filter_metaboliteadductdecharger_openms
Enable the intensity filter, which will only allow edges between two equally charged features if the intensity of the feature with less likely adducts is smaller than that of the other feature. It is not used for features of different charge.
Type: boolean
algorithm_metabolitefeaturedeconvolution_default_map_label_metaboliteadductdecharger_openms
Label of map in output consensus file where all features are put by default
Type: string
Default: "decharged features"
adducts_pos
Possible positive adducts for adduct detection in the format adduct:charge:probability
Type: string
Default: "H:+:0.6 Na:+:0.1 NH4:+:0.1 H-1O-1:+:0.1 H-3O-2:+:0.1"
adducts_neg
Possible negative adducts for adduct detection in the format adduct:charge:probability
Type: string
Default: "H-1:-:0.8 H-3O-1:-:0.2"
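The adduct strings are space-separated entries of the form adduct:charge:probability. The sketch below splits the two default values into their parts and sums the probabilities; parse_adducts is a hypothetical helper, and whether the probabilities are required to sum to 1 is not stated here (the defaults happen to).

```python
import math


def parse_adducts(spec):
    """Split an adduct string into (formula, charge sign, probability) tuples."""
    adducts = []
    for entry in spec.split():
        formula, charge, probability = entry.split(":")
        adducts.append((formula, charge, float(probability)))
    return adducts


positive = parse_adducts("H:+:0.6 Na:+:0.1 NH4:+:0.1 H-1O-1:+:0.1 H-3O-2:+:0.1")
negative = parse_adducts("H-1:-:0.8 H-3O-1:-:0.2")

for name, adducts in (("positive", positive), ("negative", negative)):
    total = sum(probability for _, _, probability in adducts)
    print(name, len(adducts), math.isclose(total, 1.0))
```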
Alignment and Linking
70 parameters
algorithm_max_num_peaks_considered_mapalignerposeclustering_openms
The maximal number of peaks/features to be considered per map. To use all, set to '-1'.
Type: integer
Default: 1000
algorithm_superimposer_mz_pair_max_distance_mapalignerposeclustering_openms
Maximum of m/z deviation of corresponding elements in different maps. This condition applies to the pairs considered in hashing.
Type: number
Default: 0.5
algorithm_superimposer_rt_pair_distance_fraction_mapalignerposeclustering_openms
Within each of the two maps, the pairs considered for pose clustering must be separated by at least this fraction of the total elution time interval (i.e., max - min).
Type: number
Default: 0.1
algorithm_superimposer_num_used_points_mapalignerposeclustering_openms
Maximum number of elements considered in each map (selected by intensity). Use this to reduce the running time and to disregard weak signals during alignment. For using all points, set this to -1.
Type: integer
Default: 2000
algorithm_superimposer_scaling_bucket_size_mapalignerposeclustering_openms
The scaling of the retention time interval is being hashed into buckets of this size during pose clustering. A good choice for this would be a bit smaller than the error you would expect from repeated runs.
Type: number
Default: 0.005
algorithm_superimposer_shift_bucket_size_mapalignerposeclustering_openms
The shift at the lower (respectively, higher) end of the retention time interval is being hashed into buckets of this size during pose clustering. A good choice for this would be about the time between consecutive MS scans.
Type: number
algorithm_superimposer_max_shift_mapalignerposeclustering_openms
Maximal shift which is considered during histogramming (in seconds). This applies for both directions.
Type: number
algorithm_superimposer_max_scaling_mapalignerposeclustering_openms
Maximal scaling which is considered during histogramming. The minimal scaling is the reciprocal of this.
Type: number
algorithm_superimposer_dump_buckets_mapalignerposeclustering_openms
[DEBUG] If non-empty, base filename where hash table buckets will be dumped to. A serial number for each invocation will be appended automatically.
Type: string
algorithm_superimposer_dump_pairs_mapalignerposeclustering_openms
[DEBUG] If non-empty, base filename where the individual hashed pairs will be dumped to (large!). A serial number for each invocation will be appended automatically.
Type: string
algorithm_pairfinder_second_nearest_gap_mapalignerposeclustering_openms
Only link features whose distance to the second nearest neighbors (for both sides) is larger by 'second_nearest_gap' than the distance between the matched pair itself.
Type: number
algorithm_pairfinder_use_identifications_mapalignerposeclustering_openms
Never link features that are annotated with different peptides (features without ID's always match; only the best hit per peptide identification is considered).
Type: boolean
algorithm_pairfinder_ignore_charge_mapalignerposeclustering_openms
false [default]: pairing requires equal charge state (or at least one unknown charge '0'); true: Pairing irrespective of charge state
Type: boolean
algorithm_pairfinder_ignore_adduct_mapalignerposeclustering_openms
false: pairing requires equal adducts (or at least one without adduct annotation); true [default]: Pairing irrespective of adducts
Type: boolean
Default: true
algorithm_pairfinder_distance_rt_max_difference_mapalignerposeclustering_openms
Never pair features with a larger RT distance (in seconds).
Type: number
Default: 100
algorithm_pairfinder_distance_rt_exponent_mapalignerposeclustering_openms
Normalized RT differences ([0-1], relative to 'max_difference') are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow)
Type: number
algorithm_pairfinder_distance_rt_weight_mapalignerposeclustering_openms
Final RT distances are weighted by this factor
Type: number
algorithm_pairfinder_distance_mz_max_difference_mapalignerposeclustering_openms
Never pair features with larger m/z distance (unit defined by 'unit')
Type: number
Default: 0.3
algorithm_pairfinder_distance_mz_unit_mapalignerposeclustering_openms
Unit of the 'max_difference' parameter
Type: string
Default: "Da"
- "Da"
- "ppm"
algorithm_pairfinder_distance_mz_exponent_mapalignerposeclustering_openms
Normalized ([0-1], relative to 'max_difference') m/z differences are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow)
Type: number
algorithm_pairfinder_distance_mz_weight_mapalignerposeclustering_openms
Final m/z distances are weighted by this factor
Type: number
algorithm_pairfinder_distance_intensity_exponent_mapalignerposeclustering_openms
Differences in relative intensity ([0-1]) are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow)
Type: number
algorithm_pairfinder_distance_intensity_weight_mapalignerposeclustering_openms
Final intensity distances are weighted by this factor
Type: number
algorithm_pairfinder_distance_intensity_log_transform_mapalignerposeclustering_openms
Log-transform intensities? If disabled, d = |int_f2 - int_f1| / int_max. If enabled, d = |log(int_f2 + 1) - log(int_f1 + 1)| / log(int_max + 1)
Type: string
Default: "disabled"
- "enabled"
- "disabled"
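The two intensity-distance formulas quoted above can be compared directly. This is a small sketch of the formulas as written, not OpenMS code; the function name is hypothetical. Because the same logarithm appears in the numerator and denominator, the choice of log base does not change the result.

```python
import math


def intensity_distance(int_f1, int_f2, int_max, log_transform=False):
    """Relative intensity distance between two features, in the range [0, 1]."""
    if log_transform:
        numerator = abs(math.log(int_f2 + 1) - math.log(int_f1 + 1))
        return numerator / math.log(int_max + 1)
    return abs(int_f2 - int_f1) / int_max


# Two features that differ by a factor of 100 in intensity.
print(intensity_distance(1e4, 1e6, 1e6))
print(intensity_distance(1e4, 1e6, 1e6, log_transform=True))
```

The log-transformed variant compresses large intensity ratios, so the same pair of features ends up closer together than with the linear formula.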
algorithm_mz_unit_featurelinkerunlabeledkd_openms
Unit of m/z tolerance
Type: string
Default: "ppm"
- "ppm"
- "Da"
algorithm_nr_partitions_featurelinkerunlabeledkd_openms
Number of partitions in m/z space
Type: integer
Default: 100
algorithm_warp_enabled_featurelinkerunlabeledkd_openms
Whether or not to internally warp feature RTs using LOWESS transformation before linking (reported RTs in results will always be the original RTs)
Type: boolean
Default: true
algorithm_warp_rt_tol_featurelinkerunlabeledkd_openms
Width of RT tolerance window (sec)
Type: number
Default: 100
algorithm_warp_mz_tol_featurelinkerunlabeledkd_openms
m/z tolerance (in ppm or Da)
Type: number
Default: 5
algorithm_warp_max_pairwise_log_fc_featurelinkerunlabeledkd_openms
Maximum absolute log10 fold change between two compatible signals during compatibility graph construction. Two signals from different maps will not be connected by an edge in the compatibility graph if absolute log fold change exceeds this limit (they might still end up in the same connected component, however). Note: this does not limit fold changes in the linking stage, only during RT alignment, where we try to find high-quality alignment anchor points. Setting this to a value < 0 disables the FC check.
Type: number
Default: 0.5
algorithm_warp_min_rel_cc_size_featurelinkerunlabeledkd_openms
Only connected components containing compatible features from at least max(2, (warp_min_occur * number_of_input_maps)) input maps are considered for computing the warping function
Type: number
Default: 0.5
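The two warp filters described above reduce to simple arithmetic. The sketch below follows the parameter descriptions only; both function names are hypothetical, intensities are assumed to be positive, and how OpenMS rounds the component-size threshold is not specified here.

```python
import math


def fold_change_compatible(intensity_a, intensity_b, max_pairwise_log_fc=0.5):
    """True if two positive intensities may be connected in the compatibility graph."""
    if max_pairwise_log_fc < 0:
        return True  # a negative limit disables the fold-change check
    return abs(math.log10(intensity_a / intensity_b)) <= max_pairwise_log_fc


def min_maps_for_component(min_rel_cc_size, number_of_input_maps):
    """Smallest number of input maps a connected component must cover."""
    return max(2, min_rel_cc_size * number_of_input_maps)


print(fold_change_compatible(1000.0, 3000.0))  # ratio of 3
print(fold_change_compatible(1000.0, 4000.0))  # ratio of 4
print(min_maps_for_component(0.5, 10))
```

A limit of 0.5 on the log10 scale corresponds to an intensity ratio of roughly 3.16 in either direction.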
algorithm_warp_max_nr_conflicts_featurelinkerunlabeledkd_openms
Allow up to this many conflicts (features from the same map) per connected component to be used for alignment (-1 means allow any number of conflicts)
Type: integer
Default: 0
algorithm_link_rt_tol_featurelinkerunlabeledkd_openms
Width of RT tolerance window (sec)
Type: number
Default: 30
algorithm_link_mz_tol_featurelinkerunlabeledkd_openms
m/z tolerance (in ppm or Da)
Type: number
Default: 10
algorithm_link_charge_merging_featurelinkerunlabeledkd_openms
whether to disallow charge mismatches (Identical), allow to link charge zero (i.e., unknown charge state) with every charge state, or disregard charges (Any).
Type: string
Default: "With_charge_zero"
- "Identical"
- "With_charge_zero"
- "Any"
algorithm_link_adduct_merging_featurelinkerunlabeledkd_openms
whether to only allow the same adduct for linking (Identical), also allow linking features with adduct-free ones, or disregard adducts (Any).
Type: string
Default: "Any"
- "Identical"
- "With_unknown_adducts"
- "Any"
algorithm_distance_rt_exponent_featurelinkerunlabeledkd_openms
Normalized RT differences ([0-1], relative to 'max_difference') are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow)
Type: number
Default: 1
algorithm_distance_rt_weight_featurelinkerunlabeledkd_openms
Final RT distances are weighted by this factor
Type: number
Default: 1
algorithm_distance_mz_exponent_featurelinkerunlabeledkd_openms
Normalized ([0-1], relative to 'max_difference') m/z differences are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow)
Type: number
Default: 2
algorithm_distance_mz_weight_featurelinkerunlabeledkd_openms
Final m/z distances are weighted by this factor
Type: number
Default: 1
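The exponent and weight parameters describe a per-dimension distance term: the difference is normalized by the maximum allowed difference, raised to the exponent, then multiplied by the weight. The sketch below shows one such term; it assumes the maximum difference equals the linking tolerance of that dimension, and it does not attempt to show how OpenMS combines the RT, m/z, and intensity terms.

```python
def distance_component(difference, max_difference, exponent=1.0, weight=1.0):
    """Weighted, normalized distance term for one dimension (RT or m/z)."""
    normalized = abs(difference) / max_difference  # in [0, 1] for allowed pairs
    return weight * normalized ** exponent


# RT term: 15 s apart with a 30 s tolerance, exponent 1, weight 1.
print(distance_component(15.0, 30.0, exponent=1, weight=1))
# m/z term: 5 ppm apart with a 10 ppm tolerance, exponent 2, weight 1.
print(distance_component(5.0, 10.0, exponent=2, weight=1))
```

With an exponent of 2, small m/z deviations are penalized less than proportionally, while deviations near the tolerance still approach the full weight.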
algorithm_distance_intensity_exponent_featurelinkerunlabeledkd_openms
Differences in relative intensity ([0-1]) are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow)
Type: number
Default: 1
algorithm_distance_intensity_weight_featurelinkerunlabeledkd_openms
Final intensity distances are weighted by this factor
Type: number
Default: 1
algorithm_distance_intensity_log_transform_featurelinkerunlabeledkd_openms
Log-transform intensities? If disabled, d = |int_f2 - int_f1| / int_max. If enabled, d = |log(int_f2 + 1) - log(int_f1 + 1)| / log(int_max + 1)
Type: string
Default: "enabled"
- "enabled"
- "disabled"
algorithm_lowess_span_featurelinkerunlabeledkd_openms
Fraction of datapoints (f) to use for each local regression (determines the amount of smoothing). Choosing this parameter in the range .2 to .8 usually results in a good fit.
Type: number
Default: 0.666666666666667
algorithm_lowess_num_iterations_featurelinkerunlabeledkd_openms
Number of robustifying iterations for lowess fitting.
Type: integer
Default: 3
algorithm_lowess_delta_featurelinkerunlabeledkd_openms
Nonnegative parameter which may be used to save computations (recommended value is 0.01 of the range of the input, e.g. for data ranging from 1000 seconds to 2000 seconds, it could be set to 10). Setting a negative value will automatically do this.
Type: number
Default: -1
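The recommended delta is 1% of the input range, and a negative setting asks for that value to be chosen automatically. A minimal sketch of that rule, with a hypothetical helper name and retention times in seconds:

```python
def lowess_delta(rt_values, delta=-1.0):
    """Return the delta to use: the given value, or 1% of the RT range if negative."""
    if delta >= 0:
        return delta
    return 0.01 * (max(rt_values) - min(rt_values))


# Data ranging from 1000 s to 2000 s, as in the description above.
print(lowess_delta([1000.0, 1250.0, 2000.0]))
print(lowess_delta([1000.0, 1250.0, 2000.0], delta=3.0))
```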
algorithm_lowess_interpolation_type_featurelinkerunlabeledkd_openms
Method to use for interpolation between datapoints computed by lowess. 'linear': Linear interpolation. 'cspline': Use the cubic spline for interpolation. 'akima': Use an akima spline for interpolation
Type: string
Default: "cspline"
- "linear"
- "cspline"
- "akima"
algorithm_lowess_extrapolation_type_featurelinkerunlabeledkd_openms
Method to use for extrapolation outside the data range. 'two-point-linear': Uses a line through the first and last point to extrapolate. 'four-point-linear': Uses a line through the first and second point to extrapolate in front and a line through the last and second-to-last point in the end. 'global-linear': Uses a linear regression to fit a line through all data points and uses it for extrapolation.
Type: string
Default: "four-point-linear"
- "two-point-linear"
- "four-point-linear"
- "global-linear"
keep_subelements_featurelinkerunlabeledkd_openms
For consensusXML input only: If set, the sub-features of the inputs are transferred to the output.
Type: boolean
invert_maprttransformer_openms
Invert transformation (approximately) before applying it
Type: boolean
store_original_rt_maprttransformer_openms
Store the original retention times (before transformation) as meta data in the output file
Type: boolean
model_type_maprttransformer_openms
Type of model
Type: string
Default: "none"
- "none"
- "linear"
- "b_spline"
- "lowess"
- "interpolated"
model_linear_symmetric_regression_maprttransformer_openms
Perform linear regression on 'y - x' vs. 'y + x', instead of on 'y' vs. 'x'.
Type: boolean
model_linear_x_weight_maprttransformer_openms
Weight x values
Type: string
Default: "x"
- "1/x"
- "1/x2"
- "ln(x)"
- "x"
model_linear_y_weight_maprttransformer_openms
Weight y values
Type: string
Default: "y"
- "1/y"
- "1/y2"
- "ln(y)"
- "y"
model_linear_x_datum_min_maprttransformer_openms
Minimum x value
Type: number
Default: 1e-15
model_linear_x_datum_max_maprttransformer_openms
Maximum x value
Type: number
Default: 1000000000000000
model_linear_y_datum_min_maprttransformer_openms
Minimum y value
Type: number
Default: 1e-15
model_linear_y_datum_max_maprttransformer_openms
Maximum y value
Type: number
Default: 1000000000000000
model_b_spline_wavelength_maprttransformer_openms
Determines the amount of smoothing by setting the number of nodes for the B-spline. The number is chosen so that the spline approximates a low-pass filter with this cutoff wavelength. The wavelength is given in the same units as the data; a higher value means more smoothing. '0' sets the number of nodes to twice the number of input points.
Type: number
model_b_spline_num_nodes_maprttransformer_openms
Number of nodes for B-spline fitting. Overrides 'wavelength' if set (to two or greater). A lower value means more smoothing.
Type: integer
model_b_spline_extrapolate_maprttransformer_openms
Method to use for extrapolation beyond the original data range. 'linear': Linear extrapolation using the slope of the B-spline at the corresponding endpoint. 'b_spline': Use the B-spline (as for interpolation). 'constant': Use the constant value of the B-spline at the corresponding endpoint. 'global_linear': Use a linear fit through the data (which will most probably introduce discontinuities at the ends of the data range).
Type: string
Default: "linear"
- "linear"
- "b_spline"
- "constant"
- "global_linear"
model_b_spline_boundary_condition_maprttransformer_openms
Boundary condition at B-spline endpoints: 0 (value zero), 1 (first derivative zero) or 2 (second derivative zero)
Type: integer
model_lowess_span_maprttransformer_openms
Fraction of datapoints (f) to use for each local regression (determines the amount of smoothing). Choosing this parameter in the range .2 to .8 usually results in a good fit.
Type: number
model_lowess_num_iterations_maprttransformer_openms
Number of robustifying iterations for lowess fitting.
Type: integer
model_lowess_delta_maprttransformer_openms
Nonnegative parameter which may be used to save computations (recommended value is 0.01 of the range of the input, e.g. for data ranging from 1000 seconds to 2000 seconds, it could be set to 10). Setting a negative value will automatically do this.
Type: number
Default: -1
model_lowess_interpolation_type_maprttransformer_openms
Method to use for interpolation between datapoints computed by lowess. 'linear': Linear interpolation. 'cspline': Use the cubic spline for interpolation. 'akima': Use an akima spline for interpolation
Type: string
Default: "cspline"
- "linear"
- "cspline"
- "akima"
model_lowess_extrapolation_type_maprttransformer_openms
Method to use for extrapolation outside the data range. 'two-point-linear': Uses a line through the first and last point to extrapolate. 'four-point-linear': Uses a line through the first and second point to extrapolate in front and a line through the last and second-to-last point in the end. 'global-linear': Uses a linear regression to fit a line through all data points and uses it for extrapolation.
Type: string
Default: "four-point-linear"
- "two-point-linear"
- "four-point-linear"
- "global-linear"
model_interpolated_interpolation_type_maprttransformer_openms
Type of interpolation to apply.
Type: string
Default: "cspline"
- "linear"
- "cspline"
- "akima"
model_interpolated_extrapolation_type_maprttransformer_openms
Type of extrapolation to apply: two-point-linear: use the first and last data point to build a single linear model, four-point-linear: build two linear models on both ends using the first two / last two points, global-linear: use all points to build a single linear model. Note that global-linear may not be continuous at the border.
Type: string
Default: "two-point-linear"
- "two-point-linear"
- "four-point-linear"
- "global-linear"
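To make the three extrapolation types concrete, here is a minimal pure-Python sketch that follows the descriptions above. It is not the OpenMS implementation; the helper names and the example points are hypothetical.

```python
def line_through(p, q):
    """Return a function for the straight line through points p and q."""
    (x0, y0), (x1, y1) = p, q
    slope = (y1 - y0) / (x1 - x0)
    return lambda x: y0 + slope * (x - x0)


def least_squares_line(points):
    """Return a function for the ordinary least-squares line through all points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def extrapolate(points, x, mode="two-point-linear"):
    """Extrapolate y at x, where x lies outside the range covered by points."""
    pts = sorted(points)
    if pts[0][0] <= x <= pts[-1][0]:
        raise ValueError("x is inside the data range; interpolate instead")
    if mode == "two-point-linear":
        # One line through the first and last data point.
        return line_through(pts[0], pts[-1])(x)
    if mode == "four-point-linear":
        # Separate lines at each end, built from the first two / last two points.
        if x < pts[0][0]:
            return line_through(pts[0], pts[1])(x)
        return line_through(pts[-2], pts[-1])(x)
    if mode == "global-linear":
        # One regression line through all points; it need not pass through
        # the end points, hence the possible jump at the border.
        return least_squares_line(pts)(x)
    raise ValueError(f"unknown extrapolation type: {mode}")


points = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (3.0, 6.0)]
for mode in ("two-point-linear", "four-point-linear", "global-linear"):
    print(mode, extrapolate(points, 4.0, mode))
```

With these example points, the global fit does not pass through the last data point (3, 6), which illustrates why 'global-linear' may not be continuous at the border.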
Re-Quantification
18 parameters
extract_mz_window_featurefindermetaboident_openms
m/z window size for chromatogram extraction (unit: ppm if 1 or greater, else Da/Th)
Type: number
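The unit rule for this window (ppm if 1 or greater, otherwise Da/Th) can be illustrated with a short conversion helper. This is a sketch of the rule as worded above, with a hypothetical function name; it does not show how OpenMS positions the window around the target m/z.

```python
def mz_window_to_da(window, mz):
    """Convert an m/z window setting to Da at a given m/z.

    Values of 1 or greater are read as ppm, smaller values as Da/Th.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if window >= 1:
        return mz * window * 1e-6  # ppm -> Da at this m/z
    return window  # already in Da/Th


print(mz_window_to_da(10, 500.0))    # 10 ppm at m/z 500
print(mz_window_to_da(0.01, 500.0))  # 0.01 Da, independent of m/z
```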
extract_rt_window_featurefindermetaboident_openms
RT window size (in sec.) for chromatogram extraction. If set, this parameter takes precedence over 'extract:rt_quantile'.
Type: number
extract_n_isotopes_featurefindermetaboident_openms
Number of isotopes to include in each peptide assay.
Type: integer
extract_isotope_pmin_featurefindermetaboident_openms
Minimum probability for an isotope to be included in the assay for a peptide. If set, this parameter takes precedence over 'extract:n_isotopes'.
Type: number
detect_peak_width_featurefindermetaboident_openms
Expected elution peak width in seconds, for smoothing (Gauss filter). Also determines the RT extraction window, unless set explicitly via 'extract:rt_window'.
Type: number
detect_min_peak_width_featurefindermetaboident_openms
Minimum elution peak width. Absolute value in seconds if 1 or greater, else relative to 'peak_width'.
Type: number
detect_signal_to_noise_featurefindermetaboident_openms
Signal-to-noise threshold for OpenSWATH feature detection
Type: number
model_type_featurefindermetaboident_openms
Type of elution model to fit to features
Type: string
Default: "symmetric"
- "symmetric"
- "asymmetric"
- "none"
model_add_zeros_featurefindermetaboident_openms
Add zero-intensity points outside the feature range to constrain the model fit. This parameter sets the weight given to these points during model fitting; '0' to disable.
Type: number
model_unweighted_fit_featurefindermetaboident_openms
Suppress weighting of mass traces according to theoretical intensities when fitting elution models
Type: boolean
model_no_imputation_featurefindermetaboident_openms
If fitting the elution model fails for a feature, set its intensity to zero instead of imputing a value from the initial intensity estimate
Type: boolean
model_each_trace_featurefindermetaboident_openms
Fit elution model to each individual mass trace
Type: boolean
model_check_min_area_featurefindermetaboident_openms
Lower bound for the area under the curve of a valid elution model
Type: number
model_check_boundaries_featurefindermetaboident_openms
Time points corresponding to this fraction of the elution model height have to be within the data region used for model fitting
Type: number
model_check_width_featurefindermetaboident_openms
Upper limit for acceptable widths of elution models (Gaussian or EGH), expressed in terms of modified (median-based) z-scores. '0' to disable. Not applied to individual mass traces (parameter 'each_trace').
Type: number
model_check_asymmetry_featurefindermetaboident_openms
Upper limit for acceptable asymmetry of elution models (EGH only), expressed in terms of modified (median-based) z-scores. '0' to disable. Not applied to individual mass traces (parameter 'each_trace').
Type: number
emgscoring_max_iteration_featurefindermetaboident_openms
Maximum number of iterations for EMG fitting.
Type: integer
emgscoring_init_mom_featurefindermetaboident_openms
Alternative initial parameters for fitting through method of moments.
Type: boolean
Quantification
43 parameters
algorithm_signal_to_noise_peakpickerhires_openms
Minimal signal-to-noise ratio for a peak to be picked (0.0 disables SNT estimation!)
Type: number
algorithm_spacing_difference_gap_peakpickerhires_openms
The extension of a peak is stopped if the spacing between two subsequent data points exceeds 'spacing_difference_gap * min_spacing'. 'min_spacing' is the smaller of the two spacings from the peak apex to its two neighboring points. '0' to disable the constraint. Not applicable to chromatograms.
Type: number
algorithm_spacing_difference_peakpickerhires_openms
Maximum allowed difference between points during peak extension, in multiples of the minimal difference between the peak apex and its two neighboring points. If this difference is exceeded a missing point is assumed (see parameter 'missing'). A higher value implies a less stringent peak definition, since individual signals within the peak are allowed to be further apart. '0' to disable the constraint. Not applicable to chromatograms.
Type: number
algorithm_missing_peakpickerhires_openms
Maximum number of missing points allowed when extending a peak to the left or to the right. A missing data point occurs if the spacing between two subsequent data points exceeds 'spacing_difference * min_spacing'. 'min_spacing' is the smaller of the two spacings from the peak apex to its two neighboring points. Not applicable to chromatograms.
Type: integer
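The three spacing parameters above interact during peak extension. The sketch below restates the rules from the descriptions as a classification of a single step; it is an assumption-laden illustration, not the PeakPickerHiRes code. The function names and the example thresholds (1.5 and 4.0) are hypothetical and are not documented defaults.

```python
def min_spacing(left_mz, apex_mz, right_mz):
    """Smaller of the two spacings from the peak apex to its neighbours."""
    return min(apex_mz - left_mz, right_mz - apex_mz)


def classify_step(spacing, min_sp, spacing_difference, spacing_difference_gap):
    """Classify the spacing between two subsequent data points.

    'stop'    : spacing exceeds spacing_difference_gap * min_spacing
    'missing' : spacing exceeds spacing_difference * min_spacing
    'ok'      : neither constraint is violated
    A value of 0 disables the corresponding constraint.
    """
    if spacing_difference_gap > 0 and spacing > spacing_difference_gap * min_sp:
        return "stop"
    if spacing_difference > 0 and spacing > spacing_difference * min_sp:
        return "missing"
    return "ok"


min_sp = min_spacing(10.0, 10.5, 11.5)
for spacing in (0.6, 1.0, 2.5):
    print(spacing, classify_step(spacing, min_sp, 1.5, 4.0))
```

In this reading, the number of 'missing' steps tolerated on each side would then be capped by the 'missing' parameter.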
algorithm_report_fwhm_peakpickerhires_openms
Add metadata for FWHM (as floatDataArray named 'FWHM' or 'FWHM_ppm', depending on param 'report_FWHM_unit') for each picked peak.
Type: boolean
algorithm_report_fwhm_unit_peakpickerhires_openms
Unit of FWHM. Either absolute in the unit of input, e.g. 'm/z' for spectra, or relative as ppm (only sensible for spectra, not chromatograms).
Type: string
Default: "relative"
- "relative"
- "absolute"
algorithm_signaltonoise_max_intensity_peakpickerhires_openms
maximal intensity considered for histogram construction. By default, it will be calculated automatically (see auto_mode). Only provide this parameter if you know what you are doing (and change 'auto_mode' to '-1')! All intensities EQUAL/ABOVE 'max_intensity' will be added to the LAST histogram bin. If you choose 'max_intensity' too small, the noise estimate might be too small as well. If chosen too big, the bins become quite large (which you could counter by increasing 'bin_count', which increases runtime). In general, the Median-S/N estimator is more robust to a manual max_intensity than the MeanIterative-S/N.
Type: integer
algorithm_signaltonoise_auto_max_stdev_factor_peakpickerhires_openms
parameter for 'max_intensity' estimation (if 'auto_mode' == 0): mean + 'auto_max_stdev_factor' * stdev
Type: number
algorithm_signaltonoise_auto_max_percentile_peakpickerhires_openms
parameter for 'max_intensity' estimation (if 'auto_mode' == 1): the 'auto_max_percentile'-th percentile
Type: integer
algorithm_signaltonoise_auto_mode_peakpickerhires_openms
method to use to determine maximal intensity: -1 --> use 'max_intensity'; 0 --> 'auto_max_stdev_factor' method (default); 1 --> 'auto_max_percentile' method
Type: integer
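The three 'auto_mode' settings select how 'max_intensity' is obtained. A rough sketch of that selection is shown below. The function is hypothetical, the default factor and percentile in the signature are illustrative only, and the choice of population standard deviation and nearest-rank percentile is an assumption, since the parameter text does not specify either.

```python
import math
import statistics


def estimate_max_intensity(intensities, auto_mode=0, max_intensity=None,
                           auto_max_stdev_factor=3.0, auto_max_percentile=95):
    """Pick the maximal intensity used for histogram construction."""
    if auto_mode == -1:
        # Use the manually supplied value.
        if max_intensity is None:
            raise ValueError("auto_mode -1 requires an explicit max_intensity")
        return max_intensity
    if auto_mode == 0:
        # mean + auto_max_stdev_factor * stdev (population stdev assumed here)
        mean = statistics.mean(intensities)
        return mean + auto_max_stdev_factor * statistics.pstdev(intensities)
    if auto_mode == 1:
        # Nearest-rank percentile (assumed definition).
        ordered = sorted(intensities)
        rank = max(1, math.ceil(auto_max_percentile / 100 * len(ordered)))
        return ordered[rank - 1]
    raise ValueError(f"unknown auto_mode: {auto_mode}")


values = [1, 2, 3, 4, 5]
print(estimate_max_intensity(values, auto_mode=0))
print(estimate_max_intensity(values, auto_mode=1))
print(estimate_max_intensity(values, auto_mode=-1, max_intensity=4))
```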
algorithm_signaltonoise_win_len_peakpickerhires_openms
window length in Thomson
Type: number
algorithm_signaltonoise_bin_count_peakpickerhires_openms
number of bins for intensity values
Type: integer
algorithm_signaltonoise_min_required_elements_peakpickerhires_openms
minimum number of elements required in a window (otherwise it is considered sparse)
Type: integer
algorithm_signaltonoise_noise_for_empty_window_peakpickerhires_openms
noise value used for sparse windows
Type: number
Default: 100000000000000000000
algorithm_common_noise_threshold_int_featurefindermetabo_openms
Intensity threshold below which peaks are regarded as noise.
Type: number
Default: 10
algorithm_common_chrom_peak_snr_featurefindermetabo_openms
Minimum signal-to-noise a mass trace should have.
Type: number
Default: 3
algorithm_common_chrom_fwhm_featurefindermetabo_openms
Expected chromatographic peak width (in seconds).
Type: number
Default: 5
algorithm_mtd_mass_error_ppm_featurefindermetabo_openms
Allowed mass deviation (in ppm).
Type: number
Default: 20
algorithm_mtd_reestimate_mt_sd_featurefindermetabo_openms
Enables dynamic re-estimation of m/z variance during mass trace collection stage.
Type: boolean
Default: true
algorithm_mtd_quant_method_featurefindermetabo_openms
Method of quantification for mass traces. For LC data 'area' is recommended, 'median' for direct injection data. 'max_height' simply uses the most intense peak in the trace.
Type: string
Default: "area"
- "area"
- "median"
- "max_height"
algorithm_mtd_trace_termination_criterion_featurefindermetabo_openms
Termination criterion for the extension of mass traces. In 'outlier' mode, trace extension cancels if a predefined number of consecutive outliers are found (see trace_termination_outliers parameter). In 'sample_rate' mode, trace extension in both directions stops if ratio of found peaks versus visited spectra falls below the 'min_sample_rate' threshold.
Type: string
Default: "outlier"
- "outlier"
- "sample_rate"
algorithm_mtd_trace_termination_outliers_featurefindermetabo_openms
Mass trace extension in one direction cancels if this number of consecutive spectra with no detectable peaks is reached.
Type: integer
Default: 5
algorithm_mtd_min_sample_rate_featurefindermetabo_openms
Minimum fraction of scans along the mass trace that must contain a peak.
Type: number
Default: 0.5
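The two termination criteria can be compared with a toy model of trace extension in one direction, where each spectrum either contains a matching peak or not. This is a simplified sketch based on the parameter descriptions, not the OpenMS mass trace detection code; the function name and the boolean input format are hypothetical.

```python
def extend_trace(hits, criterion="outlier", max_outliers=5, min_sample_rate=0.5):
    """Return how many spectra are visited before extension stops.

    hits: iterable of booleans, True if a peak was found in that spectrum.
    """
    hits = list(hits)
    found = 0
    consecutive_misses = 0
    for visited, hit in enumerate(hits, start=1):
        if hit:
            found += 1
            consecutive_misses = 0
        else:
            consecutive_misses += 1
        # 'outlier': stop after a run of spectra without a detectable peak.
        if criterion == "outlier" and consecutive_misses >= max_outliers:
            return visited
        # 'sample_rate': stop when found/visited drops below the threshold.
        if criterion == "sample_rate" and found / visited < min_sample_rate:
            return visited
    return len(hits)


hits = [True, True, False, False, True, False, False, False]
print(extend_trace(hits, "outlier", max_outliers=3))
print(extend_trace(hits, "sample_rate", min_sample_rate=0.5))
```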
algorithm_mtd_min_trace_length_featurefindermetabo_openms
Minimum expected length of a mass trace (in seconds).
Type: number
Default: 5
algorithm_mtd_max_trace_length_featurefindermetabo_openms
Maximum expected length of a mass trace (in seconds). Set to a negative value to disable maximal length check during mass trace detection.
Type: number
Default: -1
algorithm_epd_enabled_featurefindermetabo_openms
Enable splitting of isobaric mass traces by chromatographic peak detection. Disable for direct injection.
Type: boolean
Default: true
algorithm_epd_width_filtering_featurefindermetabo_openms
Enable filtering of unlikely peak widths. The fixed setting filters out mass traces outside the [min_fwhm, max_fwhm] interval (set parameters accordingly!). The auto setting filters with the 5 and 95% quantiles of the peak width distribution.
Type: string
Default: "fixed"
- "off"
- "fixed"
- "auto"
algorithm_epd_min_fwhm_featurefindermetabo_openms
Minimum full-width-at-half-maximum of chromatographic peaks (in seconds). Ignored if parameter width_filtering is off or auto.
Type: number
Default: 1
algorithm_epd_max_fwhm_featurefindermetabo_openms
Maximum full-width-at-half-maximum of chromatographic peaks (in seconds). Ignored if parameter width_filtering is off or auto.
Type: number
Default: 60
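The width filter settings described above can be sketched as follows. The 'fixed' branch keeps widths inside the closed interval [min_fwhm, max_fwhm]; the 'auto' branch derives the bounds from the 5% and 95% quantiles. The nearest-rank quantile used here is an assumption, and the function name is hypothetical.

```python
import math


def filter_by_width(fwhms, mode="fixed", min_fwhm=1.0, max_fwhm=60.0):
    """Return the peak widths (in seconds) that pass the width filter."""
    fwhms = list(fwhms)
    if mode == "off" or not fwhms:
        return fwhms
    if mode == "auto":
        ordered = sorted(fwhms)

        def quantile(p):
            # Nearest-rank quantile (assumed definition).
            return ordered[max(0, math.ceil(p * len(ordered)) - 1)]

        # In 'auto' mode the explicit min_fwhm / max_fwhm values are ignored.
        min_fwhm, max_fwhm = quantile(0.05), quantile(0.95)
    elif mode != "fixed":
        raise ValueError(f"unknown width_filtering mode: {mode}")
    return [w for w in fwhms if min_fwhm <= w <= max_fwhm]


widths = [0.5, 2.0, 5.0, 30.0, 90.0]
print(filter_by_width(widths, "fixed"))
print(filter_by_width(widths, "off"))
```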
algorithm_epd_masstrace_snr_filtering_featurefindermetabo_openms
Apply post-filtering by signal-to-noise ratio after smoothing.
Type: boolean
algorithm_ffm_local_rt_range_featurefindermetabo_openms
RT range where to look for coeluting mass traces
Type: number
Default: 10
algorithm_ffm_local_mz_range_featurefindermetabo_openms
MZ range where to look for isotopic mass traces
Type: number
Default: 6.5
algorithm_ffm_charge_lower_bound_featurefindermetabo_openms
Lowest charge state to consider
Type: integer
Default: 1
algorithm_ffm_charge_upper_bound_featurefindermetabo_openms
Highest charge state to consider
Type: integer
Default: 1
algorithm_ffm_report_summed_ints_featurefindermetabo_openms
Set to true for a feature intensity summed up over all traces rather than using monoisotopic trace intensity alone.
Type: boolean
algorithm_ffm_enable_rt_filtering_featurefindermetabo_openms
Require sufficient overlap in RT while assembling mass traces. Disable for direct injection data.
Type: boolean
Default: true
algorithm_ffm_isotope_filtering_model_featurefindermetabo_openms
Remove/score candidate assemblies based on isotope intensities. SVM isotope models for metabolites were trained with either 2% or 5% RMS error. For peptides, an averagine cosine scoring is used. Select the appropriate noise model according to the quality of measurement or MS device.
Type: string
Default: "metabolites (5% RMS)"
- "metabolites (2% RMS)"
- "metabolites (5% RMS)"
- "peptides"
- "none"
algorithm_ffm_mz_scoring_13c_featurefindermetabo_openms
Use the 13C isotope peak position (~1.003355 Da) as the expected shift in m/z for isotope mass traces (highly recommended for lipidomics!). Disable for general metabolites (as described in Kenar et al. 2014, MCP.).
Type: boolean
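As a small worked example of the 13C shift mentioned above, the expected m/z of the n-th isotope trace is the monoisotopic m/z plus n times the mass shift divided by the charge. The helper below is a hypothetical illustration of that arithmetic only; it does not reproduce how OpenMS scores candidate traces.

```python
C13_C12_MASS_DIFF = 1.003355  # Da, value quoted in the parameter description


def expected_isotope_mz(monoisotopic_mz, isotope_index, charge=1):
    """Expected m/z of the isotope trace with the given index (0 = monoisotopic)."""
    if charge < 1 or isotope_index < 0:
        raise ValueError("charge must be >= 1 and isotope_index >= 0")
    return monoisotopic_mz + isotope_index * C13_C12_MASS_DIFF / charge


# First isotope trace of a singly charged feature at m/z 200.
print(expected_isotope_mz(200.0, 1))
# For charge 2, the spacing between isotope traces halves.
print(expected_isotope_mz(200.0, 1, charge=2))
```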
algorithm_ffm_use_smoothed_intensities_featurefindermetabo_openms
Use LOWESS intensities instead of raw intensities.
Type: boolean
Default: true
algorithm_ffm_report_convex_hulls_featurefindermetabo_openms
Augment each reported feature with the convex hull of the underlying mass traces (increases featureXML file size considerably).
Type: boolean
algorithm_ffm_remove_single_traces_featurefindermetabo_openms
Remove unassembled traces (single traces).
Type: boolean
algorithm_ffm_mz_scoring_by_elements_featurefindermetabo_openms
Use the m/z range of the assumed elements to detect isotope peaks. An expected m/z range is computed from the isotopes of the assumed elements. If enabled, this ignores 'mz_scoring_13C'.
Type: boolean
algorithm_ffm_elements_featurefindermetabo_openms
Elements assumed to be present in the sample (this influences isotope detection).
Type: string
Default: "CHNOPS"
Input/output options
5 parameters
Define where the pipeline should find input data and save output data.
input (required)
Path to comma-separated file containing information about the samples in the experiment.
Type: string
Pattern: ^\S+\.csv$
outdir (required)
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
Type: string
email
Email address for completion summary.
Type: string
Pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
multiqc_title
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
Type: string
save_intermeds
Save intermediate files
Type: boolean
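Parameters from this reference can be collected in a params file instead of being passed one by one on the command line. The sketch below writes such a file as JSON. The parameter names are taken from this page; the chosen values, the file names, and the helper function are only an example.

```python
import json

# Example values only; adjust them to your data and instrument.
params = {
    "input": "samplesheet.csv",
    "outdir": "results",
    "save_intermeds": False,
    "algorithm_common_noise_threshold_int_featurefindermetabo_openms": 10,
    "algorithm_common_chrom_fwhm_featurefindermetabo_openms": 5,
    "algorithm_mtd_mass_error_ppm_featurefindermetabo_openms": 20,
    "algorithm_mtd_quant_method_featurefindermetabo_openms": "area",
}


def render_params(values):
    """Serialise the parameter dictionary as pretty-printed JSON."""
    return json.dumps(values, indent=2)


if __name__ == "__main__":
    with open("params.json", "w", encoding="utf-8") as handle:
        handle.write(render_params(params))
```

The resulting file can then be supplied with Nextflow's -params-file option, for example: nextflow run nf-core/metaboigniter -profile <your profile> -params-file params.json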
Institutional config options
6 parameters
Parameters used to describe centralised config profiles. These should not be edited.
custom_config_version
Git commit id for Institutional configs.
Type: string
Default: "master"
custom_config_base
Base directory for Institutional configs.
Type: string
Default: "https://raw.githubusercontent.com/nf-core/configs/master"
config_profile_name
Institutional config name.
Type: string
config_profile_description
Institutional config description.
Type: string
config_profile_contact
Institutional config contact information.
Type: string
config_profile_url
Institutional config URL link.
Type: string
Max job request options
3 parameters
Set the top limit for requested resources for any single job.
max_cpus
Maximum number of CPUs that can be requested for any single job.
Type: integer
Default: 16
max_memory
Maximum amount of memory that can be requested for any single job.
Type: string
Pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Default: "128.GB"
max_time
Maximum amount of time that can be requested for any single job.
Type: string
Pattern: ^(\d+\.?\s*(s|m|h|d|day)\s*)+$
Default: "240.h"
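The patterns listed for max_memory and max_time can be checked before launching a run. The following sketch applies the two regular expressions exactly as given above using Python's re module; the helper names are hypothetical.

```python
import re

# Patterns copied from the max_memory and max_time entries above.
MAX_MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
MAX_TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|d|day)\s*)+$")


def is_valid_memory(value):
    """True if value matches the max_memory pattern (e.g. '128.GB')."""
    return MAX_MEMORY_PATTERN.match(value) is not None


def is_valid_time(value):
    """True if value matches the max_time pattern (e.g. '240.h')."""
    return MAX_TIME_PATTERN.match(value) is not None


for candidate in ("128.GB", "64 GB", "64G"):
    print(candidate, is_valid_memory(candidate))
for candidate in ("240.h", "2d 6h", "ten hours"):
    print(candidate, is_valid_time(candidate))
```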
Generic options
15 parameters
Less common options for the pipeline, typically set in a config file.
help
Display help text.
Type: boolean
version
Display version and exit.
Type: boolean
publish_dir_mode
Method used to save pipeline results to output directory.
Type: string
Default: "copy"
- "symlink"
- "rellink"
- "link"
- "copy"
- "copyNoFollow"
- "move"
email_on_fail
Email address for completion summary, only when pipeline fails.
Type: string
Pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
plaintext_email
Send plain-text email instead of HTML.
Type: boolean
monochrome_logs
Do not use coloured log outputs.
Type: boolean
hook_url
Incoming hook URL for messaging service
Type: string
validate_params
Boolean whether to validate parameters against the schema at runtime
Type: boolean
Default: true
validationShowHiddenParams
Show all params when using --help
Type: boolean
validationFailUnrecognisedParams
Validation of parameters fails when an unrecognised parameter is found.
Type: boolean
validationLenientMode
Validation of parameters in lenient mode.
Type: boolean
max_multiqc_email_size
File size limit when attaching MultiQC reports to summary emails.
Type: string
Pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Default: "25.MB"
multiqc_config
Custom config file to supply to MultiQC.
Type: string
multiqc_logo
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file
Type: string
multiqc_methods_description
Custom MultiQC yaml file containing HTML including a methods description.
Type: string