Are there any CUT&RUN-specific analysis parameters implemented in bcbio? #3037
We're looking into using the Henikoff Lab software (https://research.fhcrc.org/henikoff/en/methods.html) at my company, and I'd be happy to help test integration of CUT&RUN and CUT&TAG into bcbio.
See also here: https://github.com/Henikoff/Cut-and-Run
Hi everyone, we aren't really set up to handle CUT&RUN data in bcbio, but we'd totally accept pull requests to implement it. It looks like https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1802-4 uses MACS for peak calling, so ideally we'd define some CUT&RUN-specific settings for one of the standard peak callers, plus whatever downstream processing needs to happen. The Henikoff tools look pretty abandoned to me.
@roryk We're likely going to be working on this for the next couple of months, so I can potentially put some code together for a future pull request once we start plugging away on it.
Okey doke, let me know if I can be of any help. If all we need for peak calling is to set some parameters, that is easy to implement.
It would be good to flesh out here what would have to happen to implement it; that will let folks with more experience chime in and help everyone come up with a good plan. Some recent papers:
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1802-4
The Henikoff paper, https://elifesciences.org/articles/46314, has this blurb:
"The size distribution of libraries was determined by Agilent 4200 TapeStation analysis, and libraries were mixed to achieve equal representation as desired aiming for a final concentration as recommended by the manufacturer. Paired-end Illumina sequencing was performed on the barcoded libraries following the manufacturer's instructions. Paired-end reads were aligned using Bowtie2 version 2.2.5 with options: --local --very-sensitive-local --no-unal --no-mixed --no-discordant --phred33 -I 10 -X 700. For MACS2 peak calling, parameters used were macs2 callpeak -t input_file -p 1e-5 -f BEDPE/BED (Paired End vs. Single End sequencing data) --keep-dup all -n out_name. Some datasets showed contamination by sequences of undetermined origin consisting of the sequence (TA)n. To avoid cross-mapping, we searched blastn for TATATATATATATATATATATATAT against hg19, collapsed the overlapping hits into 34,832 regions and intersected with sequencing datasets, keeping only the fragments that did not overlap any of these regions."
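The (TA)n filtering step in that blurb amounts to an interval-overlap filter: keep only fragments that overlap none of the collapsed repeat regions. Below is a minimal Python sketch of that computation, assuming fragments and regions are half-open (chrom, start, end) tuples as in BED. `drop_overlapping` is a hypothetical helper, not part of bcbio; in a real pipeline `bedtools intersect -v` does the same job.

```python
from bisect import bisect_left
from collections import defaultdict


def drop_overlapping(fragments, regions):
    """Return the fragments that overlap none of the regions.

    Intervals are half-open (chrom, start, end) tuples, as in BED.
    """
    # Collapse regions per chromosome into sorted, non-overlapping intervals.
    by_chrom = defaultdict(list)
    for chrom, start, end in sorted(regions):
        merged = by_chrom[chrom]
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    starts = {c: [s for s, _ in ivs] for c, ivs in by_chrom.items()}

    kept = []
    for chrom, start, end in fragments:
        intervals = by_chrom.get(chrom, [])
        # Regions before index i all start before the fragment ends.
        i = bisect_left(starts.get(chrom, []), end)
        # Merged regions are disjoint and sorted, so region i-1 has the
        # largest end among them; only it can reach past `start`.
        overlaps = i > 0 and intervals[i - 1][1] > start
        if not overlaps:
            kept.append((chrom, start, end))
    return kept
```

For example, with regions `[("chr1", 150, 160), ("chr1", 155, 170)]`, the fragment `("chr1", 100, 200)` is dropped while `("chr1", 250, 300)` is kept. Merging the regions first keeps each lookup to a single binary search.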
So it looks like we could implement it with just a few options twiddling. @leiendeckerlu do you have any experience with this type of data? Does this look reasonable to you?
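If it really is mostly option twiddling, the settings quoted from the Henikoff paper could be collected into argument lists. The following is a minimal sketch, assuming the caller supplies file paths. The function names are hypothetical, and this is not how bcbio currently builds its commands. Apart from the quoted flags, it uses only the standard Bowtie2 input/output options `-x`, `-1`, `-2` and `-S`.

```python
from typing import List

# Alignment options quoted from the Henikoff lab eLife paper.
BOWTIE2_CUTRUN_OPTS = [
    "--local", "--very-sensitive-local", "--no-unal", "--no-mixed",
    "--no-discordant", "--phred33", "-I", "10", "-X", "700",
]


def bowtie2_cmd(index: str, fq1: str, fq2: str, out_sam: str) -> List[str]:
    """Paired-end Bowtie2 command using the CUT&RUN alignment options."""
    return ["bowtie2", *BOWTIE2_CUTRUN_OPTS,
            "-x", index, "-1", fq1, "-2", fq2, "-S", out_sam]


def macs2_cmd(treatment: str, name: str, paired: bool = True) -> List[str]:
    """MACS2 callpeak command using the CUT&RUN peak-calling options.

    BEDPE is used for paired-end input, BED for single-end input.
    """
    fmt = "BEDPE" if paired else "BED"
    return ["macs2", "callpeak", "-t", treatment, "-p", "1e-5",
            "-f", fmt, "--keep-dup", "all", "-n", name]
```

Returning lists rather than shell strings means they can be passed straight to `subprocess.run` without quoting issues. For instance, `" ".join(macs2_cmd("sample.bedpe", "sample"))` gives `macs2 callpeak -t sample.bedpe -p 1e-5 -f BEDPE --keep-dup all -n sample`.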
Hi there,
It's my understanding that the ChIP-seq pipeline can be used for Cut&Run data analysis to a large extent. However, certain steps, such as Cut&Run-specific peak calling with SEACR, need specific input files.
So my question boils down to this: are there predefined parameters implemented that I can set for Cut&Run data analysis?
Many thanks & best,
Lukas
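SEACR works from a bedGraph of fragment coverage rather than from aligned reads. One CUT&RUN-specific step a pipeline would therefore need is turning paired-end fragments into coverage. This is usually done with `bedtools genomecov -bg`. The sketch below only illustrates the computation, assuming fragments are half-open (chrom, start, end) tuples. `fragments_to_bedgraph` is a hypothetical helper, and SEACR's exact arguments should be checked against its README.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def fragments_to_bedgraph(
    fragments: Iterable[Tuple[str, int, int]],
) -> List[Tuple[str, int, int, int]]:
    """Compute fragment coverage as bedGraph rows (chrom, start, end, depth).

    Only rows with depth > 0 are emitted, and adjacent rows with equal
    depth are merged, similar to `bedtools genomecov -bg`.
    """
    # Sweep-line events: +1 where a fragment starts, -1 where it ends.
    events: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in fragments:
        events[chrom][start] += 1
        events[chrom][end] -= 1

    rows: List[Tuple[str, int, int, int]] = []
    for chrom in sorted(events):
        depth, prev = 0, None
        for pos in sorted(events[chrom]):
            if prev is not None and depth > 0 and pos > prev:
                last = rows[-1] if rows else None
                if last and last[0] == chrom and last[2] == prev and last[3] == depth:
                    # Same depth continues: extend the previous row.
                    rows[-1] = (chrom, last[1], pos, depth)
                else:
                    rows.append((chrom, prev, pos, depth))
            depth += events[chrom][pos]
            prev = pos
    return rows
```

Writing each row as a tab-separated line produces a bedGraph file. That file could then be given to SEACR as the target track, with either an IgG control bedGraph or a numeric threshold.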