Are there any Cut & Run analysis specific parameters implemented in bcbio? #3037

leiendeckerlu · 2019-12-11T13:23:11Z

Hi there,

It's my understanding that the ChIP-seq pipeline can be used for most of a CUT&RUN data analysis. However, certain steps, such as CUT&RUN-specific peak calling (e.g. with SEACR), need specific input files.

So my question boils down to this: are there predefined parameters in bcbio that I can set for CUT&RUN data analysis?

Many thanks & best,

Lukas

mjsteinbaugh · 2019-12-12T15:25:19Z

We're looking into using the Henikoff Lab software (https://research.fhcrc.org/henikoff/en/methods.html) at my company, and I would be up for helping test integration of CUT&RUN and CUT&TAG into bcbio.

mjsteinbaugh · 2019-12-12T15:25:40Z

See also here: https://github.com/Henikoff/Cut-and-Run

roryk · 2019-12-12T16:06:38Z

Hi everyone, we aren't really set up to handle CUT&RUN in bcbio, but we'd totally accept some pull requests to implement it. It looks like https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1802-4 is using MACS to do the peak calling. It would be good to figure out whether we could define some CUT&RUN-specific settings and use one of the standard peak callers, along with whatever downstream processing needs to happen. The Henikoff stuff looks pretty abandoned to me.

mjsteinbaugh · 2019-12-12T16:23:09Z

@roryk We're likely going to be working on this for the next couple of months, so I can potentially put some code together for a future pull request once we start plugging away on this.

roryk · 2019-12-12T16:28:47Z

Okey doke, let me know if I can be of any help. If all we need to do for the peak calling is set some parameters, that is easy to implement.

roryk · 2019-12-12T16:34:06Z

It would be good to flesh out here what would have to happen to implement it, which will let folks who have more experience with it chime in and help everyone come up with a good plan. Some recent papers:

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1802-4

The Henikoff paper:

https://elifesciences.org/articles/46314

has this blurb:

The size distribution of libraries was determined by Agilent 4200 TapeStation analysis, and libraries were mixed to achieve equal representation as desired aiming for a final concentration as recommended by the manufacturer. Paired-end Illumina sequencing was performed on the barcoded libraries following the manufacturer’s instructions. Paired-end reads were aligned using Bowtie2 version 2.2.5 with options: --local --very-sensitive-local --no-unal --no-mixed --no-discordant --phred33 -I 10 -X 700. For MACS2 peak calling, parameters used were macs2 callpeak -t input_file -p 1e-5 -f BEDPE/BED (Paired End vs. Single End sequencing data) --keep-dup all -n out_name. Some datasets showed contamination by sequences of undetermined origin consisting of the sequence (TA)n. To avoid cross-mapping, we searched blastn for TATATATATATATATATATATATAT against hg19, collapsed the overlapping hits into 34,832 regions and intersected with sequencing datasets, keeping only the fragments that did not overlap any of these regions.
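To make the quoted settings concrete, here is a rough Python sketch that assembles the two command lines from the blurb. The helper names (`bowtie2_cmd`, `macs2_cmd`) and the file names are made up for illustration and are not part of bcbio. The `-x`, `-1`, `-2` and `-S` arguments are standard Bowtie2 input/output options that the blurb does not mention but a real invocation would need.

```python
from typing import List


def bowtie2_cmd(index: str, fq1: str, fq2: str, out_sam: str) -> List[str]:
    """Build a Bowtie2 command using the alignment options quoted from the paper.

    The helper name and its arguments are hypothetical; only the flags
    come from the quoted methods text.
    """
    return [
        "bowtie2",
        "--local", "--very-sensitive-local",
        "--no-unal", "--no-mixed", "--no-discordant",
        "--phred33",
        "-I", "10", "-X", "700",   # min/max fragment length for valid pairs
        "-x", index,               # genome index prefix (assumed input)
        "-1", fq1, "-2", fq2,      # paired-end FASTQ files (assumed input)
        "-S", out_sam,             # SAM output path (assumed)
    ]


def macs2_cmd(treatment: str, name: str, paired: bool = True) -> List[str]:
    """Build a MACS2 callpeak command using the parameters quoted from the paper."""
    return [
        "macs2", "callpeak",
        "-t", treatment,
        "-p", "1e-5",
        "-f", "BEDPE" if paired else "BED",  # paired-end vs. single-end input
        "--keep-dup", "all",
        "-n", name,
    ]


if __name__ == "__main__":
    # Print the commands rather than running them, so nothing needs to be installed.
    print(" ".join(bowtie2_cmd("hg19", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample.sam")))
    print(" ".join(macs2_cmd("sample.bedpe", "sample")))
```

Returning argument lists instead of shell strings means they could be passed straight to `subprocess.run` without quoting problems. How such settings would actually be wired into bcbio's configuration is left open here.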

roryk · 2019-12-12T16:35:21Z

So it looks like we could implement it with just a bit of option twiddling. @leiendeckerlu do you have any experience with this type of data? Does this look reasonable to you?

roryk added enhancement discussion labels Dec 12, 2019

naumenko-sa mentioned this issue May 29, 2020

bcbio priorities #3242

naumenko-sa closed this as completed May 29, 2020
