Run the pipeline on the local machine, with conda environments, using:
snakemake --cores {cores} --resources ncbi_connection=1 --use-conda --conda-frontend mamba all
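If the run is launched from a wrapper script, the same invocation can be assembled programmatically. The sketch below is illustrative only: `build_snakemake_command` is a hypothetical helper, not part of the pipeline, and it uses only the flags shown in the command above.

```python
import shlex


def build_snakemake_command(cores, target="all"):
    """Return the documented snakemake invocation as an argument list.

    Hypothetical helper for illustration; the pipeline does not ship it.
    """
    if int(cores) < 1:
        raise ValueError("cores must be a positive integer")
    return [
        "snakemake",
        "--cores", str(cores),
        "--resources", "ncbi_connection=1",
        "--use-conda",
        "--conda-frontend", "mamba",
        target,
    ]


if __name__ == "__main__":
    # Print the command line for 8 cores; pass the list to subprocess.run to execute it.
    print(shlex.join(build_snakemake_command(8)))
```

Returning a list rather than a string keeps the arguments safe to hand to `subprocess.run` without shell quoting.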
For examples of different config files, see parameter_templates/.
The pipeline supports either STAR or bowtie2 for alignment. Select one in config.yml with `aligner: star` or `aligner: bowtie2`.
Currently the pipeline only accepts paired-end data, except for bam file inputs.
It expects `readlength: {the_readlength}` in config.yml. Optionally, adapters can be specified in config.yml for alignment (see parameter_templates/adapter_trimming.yml); the type of adapter present is passed on to cutadapt.
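As a rough illustration of the settings described above, the following sketch checks a config that has already been loaded into a Python dict. `check_alignment_config` is a hypothetical helper, and the requirement that `readlength` be a positive integer is an assumption; the pipeline's own validation, if any, may differ.

```python
ALLOWED_ALIGNERS = {"star", "bowtie2"}  # the two aligner values named in this README


def check_alignment_config(config):
    """Return a list of problems found in the alignment-related settings."""
    problems = []

    aligner = config.get("aligner")
    if aligner not in ALLOWED_ALIGNERS:
        problems.append(
            f"aligner must be one of {sorted(ALLOWED_ALIGNERS)}, got {aligner!r}"
        )

    # Assumption: readlength is a positive integer (bool is excluded explicitly
    # because it is a subclass of int in Python).
    readlength = config.get("readlength")
    if (
        isinstance(readlength, bool)
        or not isinstance(readlength, int)
        or readlength <= 0
    ):
        problems.append(f"readlength must be a positive integer, got {readlength!r}")

    return problems


if __name__ == "__main__":
    for problem in check_alignment_config({"aligner": "star", "readlength": 100}):
        print(problem)
```

An empty list means both settings look plausible; each string in a non-empty list names one setting to fix.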
With `metadata_files: srr` in config.yml, metadata.tsv needs SRRxxxxxxxx accessions as sample_ids.
With `metadata_files: fastq` in config.yml, metadata.tsv needs R1 and R2 columns with paths to the respective fastq files.
With `metadata_files: bam` in config.yml, metadata.tsv needs a bam column with paths to the respective bam files.
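The three input types can be summarised as a mapping from the `metadata_files` value to the columns metadata.tsv must carry. The sketch below uses that mapping to check a metadata table. `check_input_metadata` is a hypothetical helper, and the pattern `SRR` followed by digits is an assumption based on the SRRxxxxxxxx placeholder.

```python
import csv
import io
import re

# Columns each input type needs in metadata.tsv, as described in this README.
REQUIRED_COLUMNS = {
    "srr": ["sample_id"],
    "fastq": ["R1", "R2"],
    "bam": ["bam"],
}
SRR_PATTERN = re.compile(r"^SRR\d+$")  # assumption: "SRR" followed by digits only


def check_input_metadata(metadata_files, tsv_text):
    """Return a list of problems for the given input type and metadata table."""
    if metadata_files not in REQUIRED_COLUMNS:
        raise ValueError(f"unknown metadata_files value: {metadata_files!r}")

    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    header = reader.fieldnames or []

    problems = [
        f"missing column: {column}"
        for column in REQUIRED_COLUMNS[metadata_files]
        if column not in header
    ]

    # For SRR input, every sample_id should look like an SRR accession.
    if metadata_files == "srr" and "sample_id" in header:
        for row in rows:
            sample_id = row["sample_id"] or ""
            if not SRR_PATTERN.match(sample_id):
                problems.append(f"not an SRR accession: {sample_id!r}")

    return problems
```

For example, `check_input_metadata("fastq", open("metadata.tsv").read())` would report a missing R1 or R2 column before any alignment job is started.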
Runs QC and generates a count matrix. This is a very basic workflow whose output is ready for downstream analysis.
The pipeline runs peak calling using MACS2, and then runs the data through various peak detection methods:
- IDR
- PePr
- DEQ
- Thor
- Genrich
Pileup summaries from these analysis tools are then saved. The pipeline strictly requires a unique matched input control for each condition.
The metadata.tsv file requires one line per sample with the following columns:
- sample_id
- condition
- method: IP or Input
- matching_input_control: the sample_id of the matching input control
config.yml requires the following term:
- control_condition: the condition that all other conditions are compared against
See parameter_templates/ripseq_config.yml and parameter_templates/ripseq_metadata.tsv for an example template.
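The requirements above can be checked before a run. The sketch below is one possible reading, not the pipeline's own validation: `check_ripseq_metadata` is a hypothetical helper, and "unique matched input control for each condition" is interpreted here as "no Input sample is used by IP samples from more than one condition", which is an assumption.

```python
import csv
import io


def check_ripseq_metadata(tsv_text, control_condition):
    """Return a list of problems in a RIP-seq style metadata table.

    Assumes the sample_id, condition, method and matching_input_control
    columns are all present (a missing column raises KeyError).
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    problems = []

    method_of = {row["sample_id"]: row["method"] for row in rows}
    conditions = {row["condition"] for row in rows}
    if control_condition not in conditions:
        problems.append(
            f"control_condition {control_condition!r} is not a condition in metadata.tsv"
        )

    conditions_using = {}  # Input sample_id -> conditions whose IP samples use it
    for row in rows:
        if row["method"] not in ("IP", "Input"):
            problems.append(f"{row['sample_id']}: method must be IP or Input")
        if row["method"] != "IP":
            continue
        control = row["matching_input_control"]
        if method_of.get(control) != "Input":
            problems.append(f"{row['sample_id']}: {control!r} is not an Input sample")
        conditions_using.setdefault(control, set()).add(row["condition"])

    # Assumed interpretation of "unique": one Input must not serve two conditions.
    for control, used_by in sorted(conditions_using.items()):
        if len(used_by) > 1:
            problems.append(f"{control}: shared by conditions {sorted(used_by)}")

    return problems
```

The function returns an empty list when every IP sample points at an existing Input sample, no Input is shared across conditions, and `control_condition` appears in the condition column.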
The pipeline runs the Bullseye C-to-T editing pipeline and performs analysis on the data, plotting pileups on edited genes, optionally together with other bam files.
The metadata.tsv file requires one line per sample with the following columns:
- sample_id
- condition
- method: IP or Input
- matching_{condition}: One column per condition providing the matching samples of the same condition
config.yml requires the following terms:
- complex_comparisons: allowing comparisons of one condition against multiple conditions
- simple_comparisons: allowing comparisons of pairs of conditions
- display_order: determining the order of conditions in the plots
See parameter_templates/stamp_config.yml and parameter_templates/stamp_metadata.tsv for an example template.
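Because the metadata needs one `matching_{condition}` column per condition, a quick check can list any that are missing. The sketch below is illustrative: `missing_matching_columns` is a hypothetical helper, and it assumes the column suffix is exactly the value found in the condition column. It does not inspect the config terms, whose exact format is defined by the templates.

```python
import csv
import io


def missing_matching_columns(tsv_text):
    """Return the matching_{condition} columns that metadata.tsv lacks."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    header = reader.fieldnames or []

    # One matching_<condition> column is expected for every distinct condition.
    conditions = sorted({row["condition"] for row in rows})
    return [
        f"matching_{condition}"
        for condition in conditions
        if f"matching_{condition}" not in header
    ]
```

An empty result means every condition in the table has its matching column.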
The following dependencies are downloaded automatically by the pipeline when running with conda:
- snakemake
- sra-tools
- parallel-fastq-dump
- samtools
- [bedtools](http://bioinformatics.oxfordjournals.org/content/26/6/841.short)
- pybedtools
- tidyverse
- pandas
- STAR
- bowtie2
- cutadapt
- fastqc
- multiqc
- biomaRt
- GenomicRanges
- Rsamtools
- rtracklayer
- ggseqlogo