TAD (truncated average depth) and Trimmed mean depth #34

Closed
jianshu93 opened this issue May 29, 2020 · 1 comment
Comments

@jianshu93

Dear Ben,
I am now comparing different methods for calculating genome coverage.

Our group proposed a method, called TAD80, to calculate the abundance of each MAG binned from assembled contigs:

The abundance of each genome species (representative of a cluster after dereplication) was estimated using the MAG of highest genome quality as representative. For each metagenomic dataset, the sequencing depth was estimated per position (Bowtie (Langmead and Salzberg, 2012) for mapping short reads to MAGs; bedtools (Quinlan and Hall, 2010) for calculating coverage from the mapped BAM file) and truncated to the central 80% (BedGraph.tad.rb (Rodriguez-R and Konstantinidis, 2016)), a metric hereafter termed TAD (truncated average sequencing depth). Abundance was then estimated as TAD80 normalized by the genome equivalents of the metagenomic dataset. Three steps are needed to calculate TAD80:

  1. Map reads to the MAG with a mapping tool (bwa or bowtie2) and produce a sorted BAM file
  2. Calculate coverage for each position:
    bedtools genomecov -ibam MAG_sorted.bam >> MAG.bedtools.cov.txt
  3. Calculate TAD80 using the script BedGraph.tad.rb (https://github.com/lmrodriguezr/enveomics/blob/master/Scripts/BedGraph.tad.rb):
    BedGraph.tad.rb -i lab5_MAG.001.bedtools.cov.txt -r 0.8
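
To make the truncation in step 3 concrete, here is a minimal Python sketch of a truncated average depth computed from BedGraph-style input. It is not a port of BedGraph.tad.rb: the function names `read_bedgraph` and `truncated_average_depth`, the `central` parameter (mirroring `-r 0.8`), and the rounding of the cut points are all assumptions made for illustration. It also assumes the input is four-column BedGraph that includes zero-depth runs (what `bedtools genomecov` writes with `-bga`), since positions missing from the file cannot be counted.

```python
from typing import Iterable, List, Tuple


def read_bedgraph(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse BedGraph lines (chrom, start, end, depth) into (depth, run_length) pairs."""
    runs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and optional header lines
        _chrom, start, end, depth = line.split("\t")[:4]
        runs.append((int(float(depth)), int(end) - int(start)))
    return runs


def truncated_average_depth(runs: List[Tuple[int, int]], central: float = 0.8) -> float:
    """Mean depth over the central fraction of positions, ranked by depth.

    Assumption: an equal number of positions is dropped from each tail,
    rounded down. The real script may round differently.
    """
    total = sum(length for _, length in runs)
    # Small epsilon guards against float error such as 0.9999999999999998.
    skip = int(total * (1.0 - central) / 2.0 + 1e-9)
    lo, hi = skip, total - skip
    if hi <= lo:
        return 0.0
    depth_sum = 0
    seen = 0  # number of ranked positions consumed so far
    for depth, length in sorted(runs):
        # Overlap of this run's rank interval [seen, seen + length) with [lo, hi).
        a = max(seen, lo)
        b = min(seen + length, hi)
        if b > a:
            depth_sum += depth * (b - a)
        seen += length
    return depth_sum / (hi - lo)


if __name__ == "__main__":
    toy = [
        "contig_1\t0\t2\t0",
        "contig_1\t2\t8\t10",
        "contig_1\t8\t10\t100",
    ]
    print(truncated_average_depth(read_bedgraph(toy), central=0.8))
```

On the ten toy positions, one position is dropped from each tail, so a single zero-depth and a single high-depth position are ignored before averaging. Working on (depth, run length) pairs rather than expanding every base keeps memory use small for large genomes.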

For the CoverM genome trimmed mean method, if I understand it correctly:

You do a similar thing to TAD80 (choosing --trim-min 0.1 and --trim-max 0.9), but you also exclude the first and last 75 bp of each contig to avoid poor mapping at the ends (edge effects):

coverm genome -d ./try_MAGs_1 -x fasta -b ./mapping_bam/MAG.001.bam --min-covered-fraction 0.001 -m trimmed_mean --trim-min 0.1 --trim-max 0.9 --contig-end-exclusion 75

The directory try_MAGs_1 contains only MAG.001.fasta; MAG.001.bam was generated by mapping reads to MAG.001.fasta.
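
As a way of stating my understanding precisely, here is a minimal Python sketch of a trimmed mean with contig-end exclusion. It is only my reading of the options, not CoverM's source code: the function name `trimmed_mean_depth`, the per-contig depth lists, the handling of contigs shorter than twice the exclusion length, and the rounding of the trim indices are all assumptions.

```python
from typing import Dict, List


def trimmed_mean_depth(
    contig_depths: Dict[str, List[int]],
    trim_min: float = 0.1,
    trim_max: float = 0.9,
    end_exclusion: int = 75,
) -> float:
    """Pool per-base depths across contigs, drop contig ends, then trim both tails."""
    pooled: List[int] = []
    for depths in contig_depths.values():
        if len(depths) <= 2 * end_exclusion:
            # Assumption: a contig with no interior left contributes nothing.
            continue
        pooled.extend(depths[end_exclusion:len(depths) - end_exclusion])
    pooled.sort()
    n = len(pooled)
    # Assumption: indices are rounded down; epsilon guards against float error.
    lo = int(n * trim_min + 1e-9)
    hi = int(n * trim_max + 1e-9)
    kept = pooled[lo:hi]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    # Toy contig: two poorly covered bases at each end, a ten-base interior.
    toy = {"contig_1": [1, 1] + [0, 0] + [10] * 6 + [100, 100] + [1, 1]}
    print(trimmed_mean_depth(toy, end_exclusion=2))  # ends excluded
    print(trimmed_mean_depth(toy, end_exclusion=0))  # ends included
```

With the ends excluded, the toy contig reduces to a ten-base interior and the calculation becomes a plain 10%/90% trimmed mean. With `end_exclusion=0`, the low-depth end bases enter the ranking and pull the result down, which is the edge effect the exclusion is meant to avoid.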

My question is: is TAD80 basically the same thing as the CoverM trimmed mean (--trim-min 0.1 and --trim-max 0.9)? My understanding is that it is. They might not be exactly the same, but the general workflow and logic for calculating the average coverage of a MAG are the same, right (except for the 75 bp excluded at both ends of each contig)?

I know you are on vacation. Please feel free to answer the question whenever you have time.

Thank you very much,

Best,

Jianshu

@wwood
Owner

wwood commented Jul 13, 2020

Addressed in f0840a5 - thanks. Let me know if there are further disparities.

@wwood wwood closed this as completed Jul 13, 2020
alienzj pushed a commit to alienzj/CoverM that referenced this issue Mar 17, 2021
Fixes wwood#34.

Reported by: Jianshu Zhao.