tborrman/DNA-rep

DNA-rep

Code for DNA replication projects

References

Das S et al. 2015. Replication timing is regulated by the number of MCMs loaded at origins.

BioNano

Trial filtering and segmenting protocol for BioNano data

Example: Sync_HeLA_1708

./direction_bed_seg_new_format.py -b input.bnx -x input.xmap -o output.bed

Requirements:

numpy version >= 1.10.2

Input:

  • -b : BioNano .bnx molecule file
  • -x : .xmap alignment file

Output:

  • -o : (Example: Sync_HeLA_1708_direction.bed) BED-like file in which each row is a segment of filtered red label data, with the following columns:

    • chrom
    • start
    • end
    • molecule ID
    • fork direction ('+' = rightward moving fork, '-' = leftward moving fork)
    • direction strength
    • sum of red label signal in segment

    Filtering:
    Keep data only from molecules that contain a red label with at least 5 neighboring red labels within 20 kb. These 5 neighboring red labels and their associated neighbors are kept.

    Segmenting:
    A molecule is split into segments wherever the distance to the next label is > 30 kb.

  • .bedGraph for unfiltered red label tracks

  • .bedGraph for filtered red label tracks
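
The filtering and segmenting rules above can be sketched in Python as follows. This is a minimal illustration, not the code in direction_bed_seg_new_format.py. The function names `filter_labels` and `segment` are hypothetical, label positions are assumed to be integer bp coordinates along a single molecule, "within 20kb" is read as within 20 kb on either side of a label, and "associated neighbors" is read as every label inside that window.

```python
from bisect import bisect_left, bisect_right


def filter_labels(positions, min_neighbors=5, window=20_000):
    """Keep labels that belong to a dense cluster (assumed interpretation).

    A label is an anchor if at least `min_neighbors` other labels lie within
    `window` bp of it. Anchors and all labels inside their window are kept.
    """
    pos = sorted(positions)
    keep = set()
    for p in pos:
        lo = bisect_left(pos, p - window)
        hi = bisect_right(pos, p + window)
        if hi - lo - 1 >= min_neighbors:  # subtract 1 for the label itself
            keep.update(range(lo, hi))
    return [pos[i] for i in sorted(keep)]


def segment(positions, max_gap=30_000):
    """Split labels into (start, end) segments wherever a gap exceeds max_gap."""
    segments = []
    current = []
    for p in sorted(positions):
        if current and p - current[-1] > max_gap:
            segments.append((current[0], current[-1]))
            current = []
        current.append(p)
    if current:
        segments.append((current[0], current[-1]))
    return segments
```

For example, a molecule with six labels spaced 3 kb apart followed by isolated labels far downstream keeps only the six clustered labels after filtering, and `segment` would break the unfiltered labels at each gap larger than 30 kb.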

Viewing output tracks on IGV:

[Image: IGV screenshot of the output tracks]

To generate the BAM file for the "Sync_HeLA_1708 segments" track above:

./cleanbed_for_bed.py Sync_HeLA_1708_direction.bed > Sync_HeLA_1708_direction_clean.bed
sort -k1,1V -k2,2n Sync_HeLA_1708_direction_clean.bed > Sync_HeLA_1708_direction_clean_sorted.bed

bedtools bedtobam -i Sync_HeLA_1708_direction_clean_sorted.bed -g hg19.genome > Sync_HeLA_1708_direction.bam
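# Legacy samtools (< 1.3) syntax. With samtools >= 1.3 use instead:
#   samtools sort -o Sync_HeLA_1708_direction_sort.bam Sync_HeLA_1708_direction.bam
samtools sort Sync_HeLA_1708_direction.bam Sync_HeLA_1708_direction_sort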
samtools index Sync_HeLA_1708_direction_sort.bam

WARNING

One problem in the code is that it is not always the first green label of a molecule in the .bnx that maps to refStartPos in the .xmap (it is usually the first, but it can be the second, third, fourth, etc.). Likewise, for molecules in the '-' direction, the green label that maps to refEndPos may be the second, third, or fourth label of the molecule rather than the first. These cases shift the results slightly, since I assume it is ALWAYS the first green label that maps to the nick position. The average distance between labels is only ~9 kb, so this does not make much of a difference when analyzing data at 100 kb resolution. However, this will need to be fixed.
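
One possible way to detect the affected molecules is to read the alignment string of each .xmap record and check which molecule label is the first to align. The sketch below is not part of this repository. It assumes the alignment field has the form `(ref,query)(ref,query)...` with 1-based label indices, and the helper names and the 9 kb average spacing used for the rough shift estimate are illustrative only.

```python
import re

# Matches one "(refIndex,queryIndex)" pair in the assumed alignment string format.
_PAIR = re.compile(r"\((\d+),(\d+)\)")


def aligned_pairs(alignment):
    """Parse '(ref,query)(ref,query)...' into a list of (ref_idx, query_idx)."""
    return [(int(r), int(q)) for r, q in _PAIR.findall(alignment)]


def skipped_leading_labels(alignment):
    """Number of molecule labels before the first aligned one.

    0 means the first label of the molecule is aligned, which is the case
    the current code assumes. Works for either orientation because it only
    looks at the smallest aligned query index.
    """
    pairs = aligned_pairs(alignment)
    if not pairs:
        raise ValueError("no aligned label pairs found")
    return min(q for _, q in pairs) - 1


def approx_shift_bp(alignment, mean_label_spacing=9_000):
    """Rough size of the coordinate shift, using the ~9 kb average spacing."""
    return skipped_leading_labels(alignment) * mean_label_spacing
```

A record for which `skipped_leading_labels` returns a value greater than 0 is one where the first-label assumption does not hold, and `approx_shift_bp` gives an order-of-magnitude idea of the resulting offset.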

PARALLELIZE

For faster processing, split the .bnx file and run direction_bed_seg_new_format.py on the pieces in parallel:

split -d -a 3 -l 350000 input.bnx input_split_bnx_
./direction_bed_multi_controller.py

direction_bed_multi_controller.py is specific to an LSF environment.
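
Outside of LSF, the same idea can be sketched with the Python standard library on a single machine. This is a hypothetical stand-in, not a port of direction_bed_multi_controller.py: the function names, the `.bed` output naming, and the default of 4 workers are assumptions, and only the -b, -x and -o options shown earlier are used.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(chunks, xmap, script="./direction_bed_seg_new_format.py"):
    """Build one command line per split .bnx chunk.

    Each chunk writes to '<chunk>.bed' (an assumed naming scheme).
    """
    return [[script, "-b", chunk, "-x", xmap, "-o", chunk + ".bed"]
            for chunk in chunks]


def run_all(commands, max_workers=4):
    """Run the commands concurrently and return their exit codes in order.

    Threads are sufficient here because each worker only waits on a subprocess.
    """
    def run(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))
```

A typical call would collect the chunks with `sorted(glob.glob("input_split_bnx_*"))`, pass them to `build_commands` together with `input.xmap`, and then check that every exit code returned by `run_all` is 0 before concatenating the per-chunk outputs.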
