jessicaStorer88/RepeatMasker_library_CHM13

RepeatMasker library : custom human repeats

RepeatMasker was developed by Arian Smit and Robert Hubley
Please refer to: Smit, AFA, Hubley, R. & Green, P "RepeatMasker" at
http://www.repeatmasker.org

The completion of the human genome (T2T-CHM13 project) allowed for a complete repeat analysis of the centromeres and telomeres therein. In order to investigate new repetitive elements, a new repeat pipeline was developed (Fig. 1) [1].

This RepeatMasker-based pipeline was used to generate new models not only for the T2T-CHM13 project, but also for the human Y chromosome [2], the X and Y chromosomes of apes [3], and ape autosomes [4]. The investigation of these genomes allowed not only for the discovery of novel satellite sequences, but also for updates to the taxonomic labels of select repeats. The combined repeat library produced by all of these analyses, intended for the investigation of human repeats, can be found in this repository.

Figure 1: A discovery workflow afforded comprehensive annotations of a complete human genome as part of Hoyt et al. 2022. Workflow implemented to obtain updated repeat models and the derivation of RepeatMasker Annotations 2 (RMv2), consisting of combined and polished RM annotations submitted to Dfam (A) and applied to T2T-CHM13 and GRCh38 as RepeatMaskerv2 tracks. Workflow consisted of multiple iterations of RepeatMasker and RepeatModeler. (A) The components intersected during manual curation (B) include CAT/gene annotations, segmental duplications, repeats masked using Dfam (v3.3) repeat models, tandem repeat arrays identified as gaps in annotations >10 kbp and overlap with ULTRA tandem repeat models. (C) Repeat model polishing was derived from a compilation of RepeatMasker output (previous repeat models; HM1), RepeatMasker 2 output (updated models; RM annotation 2), and gap entries. Additional and previously unclassified family entries identified from RMv2 were further filtered following multiple sequence alignment (MSA) among members of the predicted category.

Building a custom repeat library for human repeat analyses
Some human models produced as part of the T2T-CHM13 analysis [1] are part of the Dfam [5] database, released in v3.6. To produce a non-redundant repeat library for use in RepeatMasker that also includes the models produced for the human Y [2] and the ape autosomal and X/Y [3,4] analyses, the following pipeline is suggested:

  1. Make a directory for the new libraries     
    $ mkdir ~/TEproject/RMplusSpeciesLib/
  2. Copy the RepeatMasker libraries to a new location     
    $ cp -r /usr/local/RepeatMasker-4.1.4/Libraries/ ~/TEproject/RMplusSpeciesLib/
        NOTE: RepeatMasker v4.1.4 and v4.1.5 come with precompiled libraries that include the human repeats submitted to Dfam as part of the T2T-CHM13 project [1]
  3. Append the .embl file to the existing library     
    $ famdb.py -i ~/TEproject/RMplusSpeciesLib/Libraries/RepeatMaskerlib.h5 append humanAutoXYape.embl --name 'your_favorite_name'
        NOTE: the stage and taxonomic label information is included for each entry.
        famdb.py is included as part of the RepeatMasker package.
        For additional famdb.py usage, please refer to the Dfam-consortium FamDB GitHub page.
  4. Run RepeatMasker with the appended library     
    $ RepeatMasker -libdir ~/TEproject/RMplusSpeciesLib/Libraries/ -s -species human yourGenome.fa

Please refer to the RepeatMasker manual for additional program settings.
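
The four steps above can be collected into one script. The following is a minimal sketch, not an official part of this repository: it reuses the commands and placeholder names exactly as written in the steps, while the variable names, the `run` helper, and the `DRY_RUN` switch are assumptions introduced for illustration. By default it only prints the commands so the paths can be checked before anything is copied or modified.

```shell
#!/bin/sh
set -eu

# Locations from the steps above; override them to match your installation.
RM_HOME="${RM_HOME:-/usr/local/RepeatMasker-4.1.4}"
WORKDIR="${WORKDIR:-$HOME/TEproject/RMplusSpeciesLib}"
# Library file name as written in step 3; confirm it matches the file
# actually present in your Libraries directory (spelling and case).
H5_NAME="${H5_NAME:-RepeatMaskerlib.h5}"
EMBL="${EMBL:-humanAutoXYape.embl}"
LIB_NAME="${LIB_NAME:-your_favorite_name}"   # placeholder from step 3
GENOME="${GENOME:-yourGenome.fa}"            # placeholder from step 4
DRY_RUN="${DRY_RUN:-1}"                      # hypothetical switch: 1 = print only

# Print each command; execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Steps 1-3: create the directory, copy the libraries, append the .embl file.
build_library() {
    run mkdir -p "$WORKDIR"
    run cp -r "$RM_HOME/Libraries" "$WORKDIR/"
    run famdb.py -i "$WORKDIR/Libraries/$H5_NAME" append "$EMBL" --name "$LIB_NAME"
}

# Step 4: run RepeatMasker against the appended library.
mask_genome() {
    run RepeatMasker -libdir "$WORKDIR/Libraries/" -s -species human "$GENOME"
}

build_library
mask_genome
```

Saved under a name of your choice, the script prints the four commands when run as is. Setting `DRY_RUN=0` in the environment executes them, which assumes `famdb.py` and `RepeatMasker` are on your `PATH`.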

References

  1. Hoyt, S. J. et al. (2022). From telomere to telomere: The transcriptional and epigenetic state of human repeat elements. Science, 376(6588), eabk3112. https://doi.org/10.1126/science.abk3112
  2. Rhie, A. et al. (2023). The complete sequence of a human Y chromosome. Nature, 621(7978), 344–354. https://doi.org/10.1038/s41586-023-06457-y
  3. Makova, K. D. et al. (2023). The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes. bioRxiv, 2023.11.30.569198. https://doi.org/10.1101/2023.11.30.569198
  4. Yoo, D. et al. (2024). Complete sequencing of ape genomes. bioRxiv, 2024.07.31.605654. https://doi.org/10.1101/2024.07.31.605654
  5. Storer, J. et al. (2021). The Dfam community resource of transposable element families, sequence models, and genome annotations. Mobile DNA, 12, 2. https://doi.org/10.1186/s13100-020-00230-y