Outputs

This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.

Pipeline overview

The pipeline is built using Nextflow and processes the data using the steps presented in the main README file.
Briefly, its goal is to process single-cell ChIP-seq data from inDrop protocol.

The directories listed below will be created in the output directory after the pipeline has finished.

Barcode Matching

Barcodes are composed of three indexes originating from three different libraries.

If 40% or more sequences are correctly barcoded ("Barcoded" in the legend), barcoding step went well. If it is between 20% and 40% be carrefull and if it less, barcoding step went wrong.

Bowtie2

Bowtie2 software is used to aligned barcodes to index reference libraries.

SE mapped uniquely : successfully aligned sequence to a unique index.
SE multimapped : sequence aligned on multiple indexes.
SE not aligned : sequence corresponding to none index.

A drop in successfull alignement percentage is observed after the first index. It is due to the sequential ligation of indexes (first the index 1 is ligated, then the 2, then the 3) that leads to a loss at each round. It is attempted that arround 75% of the first index map correctly, arround 50% for the second and arround 40% for the third.

STAR

STAR software is used to aligned reads to a reference genome.

Uniquely mapped : successfully aligned reads to one single locus.
Mapped to multiple loci : accepted reads aligned on 2 to 10 loci.
Mapped too many loci : reads aligned to more than 10 loci.
Unmapped: too many mismatches : reads having an alignement of too poor quality
Unmapped: too short : less than 66% of reads length (R1+R2) are correctly aligned on the genome.
Unmapped other: other : other reasons than "too short" or "too many" like for example due to a foreign genome contamination or if reads came from a from a higly repeated region.

Alignement is made on all the initial reads.
Reads passing alignment are uniquely mapped or mapped to multiple loci.
If this two made up to 60%, the alignemnt is a success. If they made between 60 and 40%, be carrefull and if they made less than 40% the sample may have a problem.

Alignment Scores'

Summary of alignment scores accross the pipeline.

Deduplicated : Correct unique reads.
Window duplicates : Remove duplicates by window (if R2 is unmapped).
RT duplicates : Removed RT duplicates which are reads having similar barcode, R2 position but a different R1 position.
PCR duplicates : Removed PCR duplicates which correspond to reads having exactly the same barcode, R1 position and R2 position on the genome.
Uniquely mapped not barcoded : Correctly aligned reads but having an undetermined barcode.
Mapped to multiple loci : Reads aligned on 2 to 10 loci.
Unmapped : All unmaped reads (too many mismatches + too short + other on STAR results).

Deduplicated reads
More than 10% : Success
5-10% : Warning
Less than 5%: Danger
RT duplicates
Less than 40% : Success
40-50% : Warning
More than 50% : Danger
PCR duplicates
Less than 40% : Success
40-50% : Warning
More than 50% : Danger
Window duplicates
Less than 10% : Success
10-15% : Warning
More than 15% : Danger

Read distributions across cells

Overview of read distribution across cells. If you observed a bimodal distribution, your data may contain empty droplets as first pics represent the proportion of cells having a low number of reads. Higher these pics are, higher the number of cells having a low number of reads is. This information can be used to define a threshold to remove empty droplets identified by a low number of reads.

MultiQC

MultiQC is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available within the report data directory.

The pipeline has special steps which allow the software versions used to be reported in the MultiQC output for future traceability.

Output directory: multiqc

multiqc_report.html
- MultiQC report - a standalone HTML file that can be viewed in your web browser.
multiqc_data/
- Directory containing parsed statistics from the different tools used in the pipeline.

For more information about how to use MultiQC reports, see http://multiqc.info.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

output.md

output.md

Outputs

Pipeline overview

Barcode Matching

Bowtie2

STAR

Alignment Scores'

Read distributions across cells

MultiQC

Files

output.md

Latest commit

History

output.md

File metadata and controls

Outputs

Pipeline overview

Barcode Matching

Bowtie2

STAR

Alignment Scores'

Read distributions across cells

MultiQC