A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
General Statistics
Sample Name | mCpG | mCHG | mCHH | C's | Dups (Bismark) | Unique | Aligned (reads) | Aligned (%) | Trimmed bases | Dups (FastQC) | GC | Avg len | Median len | Failed | Seqs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MethylC-Seq_mm_fc_1wk_SRR921767_1 | 77.0% | 0.5% | 0.5% | 1568.0 | 2.9% | 83.5M | 86.0M | 86.0% | 2.6% | 8.8% | 21.0% | 98bp | 100bp | 8% | 99.9M |
MethylC-Seq_mm_fc_1wk_SRR921768_1 | 77.0% | 0.5% | 0.5% | 1559.9 | 2.9% | 83.2M | 85.7M | 85.7% | 2.9% | 9.3% | 21.0% | 97bp | 100bp | 17% | 99.9M |
MethylC-Seq_mm_fc_1wk_SRR921769_1 | 77.0% | 0.5% | 0.5% | 321.4 | 1.6% | 17.2M | 17.5M | 85.7% | 3.0% | 6.0% | 21.0% | 97bp | 100bp | 8% | 20.4M |
MethylC-Seq_mm_fc_1wk_SRR921770_1 | 77.0% | 0.5% | 0.5% | 1562.8 | 2.9% | 83.4M | 85.9M | 85.9% | 2.8% | 8.7% | 21.0% | 97bp | 100bp | 8% | 99.9M |
MethylC-Seq_mm_fc_2wk_SRR921694_1 | 75.7% | 0.9% | 1.1% | 1928.2 | 7.0% | 111.7M | 120.1M | 78.4% | 13.5% | 15.5% | 26.0% | 92bp | 100bp | 17% | 153.1M |
MethylC-Seq_mm_fc_2wk_SRR921695_1 | 75.8% | 0.9% | 1.1% | 1930.1 | 7.0% | 111.8M | 120.2M | 78.4% | 13.6% | 14.6% | 26.0% | 92bp | 100bp | 8% | 153.4M |
MethylC-Seq_mm_fc_2wk_SRR921696_1 | 75.7% | 0.9% | 1.1% | 1366.7 | 6.1% | 84.1M | 89.6M | 77.1% | 20.7% | 11.5% | 25.0% | 86bp | 94bp | 8% | 116.2M |
MethylC-Seq_mm_fc_2wk_SRR921773_1 | 75.8% | 0.9% | 1.1% | 1472.5 | 6.3% | 91.6M | 97.7M | 76.3% | 23.6% | 10.9% | 25.0% | 85bp | 94bp | 8% | 128.1M |
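The Bismark columns in this table appear to be related by simple arithmetic. The sketch below checks this for the first row, assuming that the aligned percentage is taken relative to the Seqs column and that the first Dups column is the share of aligned reads removed by deduplication. Both are inferences from the figures, not statements from the report, and the helper name `percent` is hypothetical. Because the inputs are already rounded to one decimal place, the results can differ from the table by about 0.1.

```python
def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Values for MethylC-Seq_mm_fc_1wk_SRR921767_1, in millions of reads.
seqs = 99.9     # Seqs column
aligned = 86.0  # first Aligned column (read count)
unique = 83.5   # Unique column (reads left after deduplication)

# Assumption: aligned percentage = aligned reads / total sequences.
aligned_pct = percent(aligned, seqs)

# Assumption: duplicate percentage = reads removed / aligned reads.
dups_pct = percent(aligned - unique, aligned)

print(f"Aligned: {aligned_pct}%  Dups: {dups_pct}%")
```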
Bismark
0.14.4
Maps bisulfite-converted sequence reads and determines cytosine methylation states.
URL: http://www.bioinformatics.babraham.ac.uk/projects/bismark
DOI: 10.1093/bioinformatics/btr167
Alignment Rates
Deduplication
Strand Alignment
All samples were run with --directional mode; alignments to complementary strands (CTOT, CTOB) were ignored.
Cytosine Methylation
M-Bias
This plot shows the average percentage methylation and coverage at each position across reads. See the Bismark user guide for more information on how these numbers are generated.
Cutadapt
1.8
Finds and removes adapter sequences, primers, poly-A tails, and other types of unwanted sequences.
URL: https://cutadapt.readthedocs.io
DOI: 10.14806/ej.17.1.200
Filtered Reads
This plot shows the number of reads (SE) / pairs (PE) removed by Cutadapt.
Trimmed Sequence Lengths (3')
This plot shows the number of reads from which a given length of adapter was trimmed at the 3' end.
Obs/Exp shows the raw counts divided by the number expected due to sequencing errors. A defined peak may be related to adapter length.
See the cutadapt documentation for more information on how these numbers are generated.
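As a rough illustration of the Obs/Exp view, the sketch below divides an observed count by an expected count for each trimmed length. The function name `obs_exp` and all numbers are hypothetical; in a real report the counts and expected values would come from the Cutadapt log rather than being typed in by hand.

```python
def obs_exp(counts: dict[int, int], expected: dict[int, float]) -> dict[int, float]:
    """Return observed/expected ratios keyed by trimmed length.

    Lengths with a missing or zero expected value are skipped to avoid
    dividing by zero.
    """
    ratios = {}
    for length, observed in counts.items():
        exp = expected.get(length, 0.0)
        if exp > 0:
            ratios[length] = observed / exp
    return ratios


# Hypothetical values: trimmed length -> number of reads.
counts = {3: 1200, 4: 310, 20: 950}
# Hypothetical values: trimmed length -> reads expected from sequencing errors alone.
expected = {3: 1500.0, 4: 375.0, 20: 0.5}

ratios = obs_exp(counts, expected)
```

In this made-up example the ratio at length 20 is far above 1, which is the kind of defined peak that may correspond to the adapter length.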
FastQC: trimmed
0.11.2
Quality control tool for high throughput sequencing data.
URL: http://www.bioinformatics.babraham.ac.uk/projects/fastqc
Sequence Counts
Sequence counts for each sample. Duplicate read counts are an estimate only.
This plot shows the total number of reads, broken down into unique and duplicate reads where possible (only more recent versions of FastQC report duplication information).
You can read more about duplicate calculation in the FastQC documentation. A small part has been copied here for convenience:
Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.
The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.
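The rules quoted above can be sketched as follows. This is a simplified reading of the text, not FastQC's actual implementation: the function name `duplication_percent` and the `track_limit` parameter are hypothetical, and any further adjustment FastQC applies to its reported figure is omitted.

```python
from collections import Counter
from typing import Iterable


def duplication_percent(reads: Iterable[str], track_limit: int = 100_000) -> float:
    """Estimate the percentage of duplicate reads among tracked sequences."""
    counts = Counter()
    for index, read in enumerate(reads):
        # Reads longer than 75 bp are compared on their first 50 bp only.
        key = read[:50] if len(read) > 75 else read
        if key in counts:
            # Already-tracked sequences are counted to the end of the file.
            counts[key] += 1
        elif index < track_limit:
            # New sequences are only tracked if they first appear early on.
            counts[key] = 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every occurrence beyond the first of each sequence is a duplicate.
    return 100.0 * (total - len(counts)) / total
```

For example, `duplication_percent(["ACGT", "ACGT", "TTTT", "GGGG"])` counts one duplicate among four tracked reads.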
Sequence Quality Histograms
The mean quality value across each base position in the read.
To enable multiple samples to be plotted on the same graph, only the mean quality scores are plotted (unlike the box plots seen in FastQC reports).
Taken from the FastQC help:
The y-axis on the graph shows the quality scores. The higher the score, the better the base call. The background of the graph divides the y axis into very good quality calls (green), calls of reasonable quality (orange), and calls of poor quality (red). The quality of calls on most platforms will degrade as the run progresses, so it is common to see base calls falling into the orange area towards the end of a read.
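A minimal sketch of how a mean quality per base position could be computed from FASTQ quality strings. It assumes Phred+33 encoding, and the function name `mean_quality_by_position` is hypothetical. It illustrates the statistic only and is not how FastQC or this report derives it.

```python
def mean_quality_by_position(quality_strings: list[str], offset: int = 33) -> list[float]:
    """Return the mean Phred score at each read position.

    Reads may differ in length; each position is averaged over the reads
    that are long enough to cover it.
    """
    sums: list[int] = []
    counts: list[int] = []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            if pos == len(sums):
                # First read to reach this position: extend the accumulators.
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset  # assumption: Phred+33 encoding
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


# 'I' encodes Phred 40 and '5' encodes Phred 20 under Phred+33.
means = mean_quality_by_position(["II5", "I5"])
```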
Per Sequence Quality Scores
The number of reads at each average quality score. Shows whether a subset of reads has poor quality.
From the FastQC help:
The per sequence quality score report allows you to see if a subset of your sequences have universally low quality values. It is often the case that a subset of sequences will have universally poor quality, however these should represent only a small percentage of the total sequences.
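The distribution described above can be sketched by averaging the quality of each read and counting reads per score. The sketch assumes Phred+33 encoding and bins each mean by truncating it to a whole number. Both are assumptions made for illustration, and the function name `per_sequence_quality_counts` is hypothetical.

```python
from collections import Counter


def per_sequence_quality_counts(quality_strings: list[str], offset: int = 33) -> dict[int, int]:
    """Count reads by their mean Phred score, truncated to an integer."""
    histogram = Counter()
    for qual in quality_strings:
        if not qual:
            continue  # skip empty reads rather than divide by zero
        mean = sum(ord(char) - offset for char in qual) / len(qual)
        histogram[int(mean)] += 1  # assumption: bin by truncation
    return dict(sorted(histogram.items()))


# 'I' encodes Phred 40 and '5' encodes Phred 20 under Phred+33.
quality_counts = per_sequence_quality_counts(["IIII", "II55", "5555", "5555"])
```

A cluster of reads at the low end of this histogram would be the subset of poor-quality sequences that the help text refers to.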
Per Base Sequence Content
The proportion of each base position for which each of the four normal DNA bases has been called.
To enable multiple samples to be shown in a single plot, the base composition data is shown as a heatmap. The colours represent the balance between the four bases: an even distribution should give an even muddy brown colour. Hover over the plot to see the percentage of the four bases under the cursor.
To see the data as a line plot, as in the original FastQC graph, click on a sample track.
From the FastQC help:
Per Base Sequence Content plots out the proportion of each base position in a file for which each of the four normal DNA bases has been called.
In a random library you would expect that there would be little to no difference between the different bases of a sequence run, so the lines in this plot should run parallel with each other. The relative amount of each base should reflect the overall amount of these bases in your genome, but in any case they should not be hugely imbalanced from each other.
It's worth noting that some types of library will always produce biased sequence composition, normally at the start of the read. Libraries produced by priming using random hexamers (including nearly all RNA-Seq libraries) and those which were fragmented using transposases inherit an intrinsic bias in the positions at which reads start. This bias does not concern an absolute sequence, but instead provides enrichment of a number of different K-mers at the 5' end of the reads. Whilst this is a true technical bias, it isn't something which can be corrected by trimming and in most cases doesn't seem to adversely affect the downstream analysis.
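To make the per-base proportions concrete, here is a minimal sketch that computes the percentage of A, C, G and T at each read position. The function name `base_content_by_position` is hypothetical, and leaving non-ACGT calls such as N out of the denominator is an assumption of this sketch rather than documented FastQC behaviour.

```python
from collections import Counter


def base_content_by_position(reads: list[str]) -> list[dict[str, float]]:
    """Return the percentage of each of A, C, G, T at every read position."""
    per_position: list[Counter] = []
    for read in reads:
        for pos, base in enumerate(read.upper()):
            if pos == len(per_position):
                per_position.append(Counter())
            per_position[pos][base] += 1

    result = []
    for counter in per_position:
        # Assumption: only the four normal bases count towards the total.
        total = sum(counter[base] for base in "ACGT")
        result.append(
            {base: (100.0 * counter[base] / total if total else 0.0) for base in "ACGT"}
        )
    return result


content = base_content_by_position(["ACGT", "AAGT", "ACGN"])
```

In an unbiased library the four percentages at each position would stay roughly constant along the read; the 5' bias described above would show up as the first few positions deviating from the rest.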