Dsl2 sex determination #1035

Merged
merged 42 commits into from
Mar 20, 2024
Changes from 35 commits
Commits
42 commits
ce47f48
adding base version
jbv2 Oct 6, 2023
4dd74a3
first draft of sex determination subworkflow
jbv2 Oct 20, 2023
14c1325
missing adding parameters
jbv2 Nov 3, 2023
bedc01f
adding module configs
jbv2 Dec 8, 2023
4b12727
linting
jbv2 Dec 8, 2023
7073bfe
Adding sex determine subworfflow
jbv2 Dec 8, 2023
f530965
Merge branch 'dev' into dsl2-sex_determination
jbv2 Dec 8, 2023
25c465e
rerunning prettier after merge
jbv2 Dec 8, 2023
24d0a88
fixing linting
jbv2 Dec 8, 2023
1c2cb37
Apply suggestions from code review
jbv2 Jan 26, 2024
b805c83
adding parameters for samtools depth
jbv2 Jan 26, 2024
c9d4788
Header option
jbv2 Jan 26, 2024
4d1f765
importing properly samtools module
jbv2 Jan 26, 2024
b36bc45
trying to add bed from reference file
jbv2 Jan 26, 2024
839b81c
adding bed to reference index
jbv2 Feb 16, 2024
3dda4e1
breaking inputs inside the module with the reference outputs
jbv2 Feb 16, 2024
09e800f
linting changes
jbv2 Feb 16, 2024
38c1592
Merge branch 'dev' into dsl2-sex_determination
jbv2 Feb 23, 2024
79f097e
prettier
jbv2 Feb 23, 2024
b35a487
linting
jbv2 Feb 23, 2024
e78b791
removing tab
jbv2 Feb 23, 2024
e1e8853
fixing passing ids, bams as list
jbv2 Mar 8, 2024
cd3e744
linting
jbv2 Mar 8, 2024
7d908e0
Apply suggestions from code review
jbv2 Mar 18, 2024
7f6b3cf
Apply suggestions from code review
jbv2 Mar 18, 2024
3aa9482
aligning comments
jbv2 Mar 18, 2024
f394152
consistent naming of sexdeterrmine
jbv2 Mar 18, 2024
ef5fd44
Removing commented lines code
jbv2 Mar 18, 2024
c9fb55e
Adding manual tests
jbv2 Mar 18, 2024
bc53f26
Adding sexdeterrmine to CI test
jbv2 Mar 18, 2024
ade2cdb
Matching current dev nf-core template
jbv2 Mar 18, 2024
8a01c0c
Apply suggestions from code review
jbv2 Mar 18, 2024
384bc90
prettier
jbv2 Mar 18, 2024
47838c7
Merge branch 'dev' into dsl2-sex_determination
TCLamnidis Mar 19, 2024
cabc09e
including "addNewMetaFromAttributes"
jbv2 Mar 20, 2024
ddb4d78
removing subworflows
jbv2 Mar 20, 2024
6ebc965
lint
jbv2 Mar 20, 2024
40c6237
lint
jbv2 Mar 20, 2024
d03589b
fix linting
jbv2 Mar 20, 2024
9f7f8a1
Merge branch 'dev' into dsl2-sex_determination
TCLamnidis Mar 20, 2024
f66e0fd
remove dumpsoftwareversions. linting
TCLamnidis Mar 20, 2024
624ba69
Update nextflow.config
TCLamnidis Mar 20, 2024
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -33,6 +33,7 @@ jobs:
- "-profile test,docker --mapping_tool bowtie2 --damagecalculation_tool mapdamage --damagecalculation_mapdamage_downsample 100"
- "-profile test,docker --skip_preprocessing"
- "-profile test_humanbam,docker --run_mtnucratio --run_contamination_estimation_angsd --snpcapture_bed 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'"
- "-profile test_humanbam,docker --run_sexdeterrmine"
- "-profile test_multiref,docker" ## TODO add damage manipulation here instead once it goes multiref
steps:
- name: Check out pipeline code
5 changes: 5 additions & 0 deletions CITATIONS.md
@@ -103,8 +103,13 @@
> QualiMap Okonechnikov, K., Conesa, A., & García-Alcalde, F. (2016). Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics, 32(2), 292–294. Download: http://qualimap.bioinfo.cipf.es/

- [DamageProfiler](https://doi.org/10.1093/bioinformatics/btab190)

> DamageProfiler Neukamm, J., Peltzer, A., & Nieselt, K. (2020). DamageProfiler: Fast damage pattern calculation for ancient DNA. In Bioinformatics (btab190). doi: [10.1093/bioinformatics/btab190](https://doi.org/10.1093/bioinformatics/btab190). Download: https://github.com/Integrative-Transcriptomics/DamageProfiler

- [Sex.DetERRmine.py](http://dx.doi.org/10.1038/s41467-018-07483-5)

> Sex.DetERRmine.py Lamnidis, T.C. et al., 2018. Ancient Fennoscandian genomes reveal origin and spread of Siberian ancestry in Europe. Nature Communications, 9(1), p.5018. Available at: http://dx.doi.org/10.1038/s41467-018-07483-5. Download: https://github.com/TCLamnidis/Sex.DetERRmine

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
23 changes: 23 additions & 0 deletions conf/modules.config
@@ -909,6 +909,29 @@ process {
]
}

//
// RUN SEXDETERRMINE
//
withName: SAMTOOLS_DEPTH_SEXDETERRMINE {
tag = { "${meta1.reference}|${meta1.sample_id}_${meta1.library_id}" }
ext.prefix = { "${meta2.id}_samtoolsdepth" }
ext.args = '-aa -q30 -Q30 -H'
publishDir = [
enabled: false
]
}

withName: SEXDETERRMINE {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
ext.prefix = { "${meta.reference}_sexdeterrmine" }
publishDir = [
path: { "${params.outdir}/sex_determination/" },
mode: params.publish_dir_mode,
pattern: '*{_sexdeterrmine}*',
enabled: true
]
}

//
// LIBRARY MERGE
//
2 changes: 1 addition & 1 deletion conf/test_humanbam.config
@@ -36,7 +36,7 @@ params {

// TODO Reactivate sexDet and genotyping params when those steps get implemented.
// //Sex Determination
// sexdeterrmine_bedfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'
sexdeterrmine_bedfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'
// // Genotyping
// pileupcaller_bedfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'
// pileupcaller_snpfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K_covered_in_JK2067_downsampled_s0.1.numeric_chromosomes.snp'
14 changes: 14 additions & 0 deletions docs/development/manual_tests.md
@@ -110,6 +110,12 @@ Tool Specific combinations
- single reference: with damage manipulation (pmd + trimming), on pmd filtered data ✅
- multi reference: no damage manipulation ✅

- Sex determination

- With sexdeterrmine

- with default parameters

### Multi-reference tests

```bash
@@ -738,3 +744,11 @@ nextflow run main.nf -profile test,docker --outdir ./results -w work/ -resume --
## NOTE: PG tags are repeated for each chromosome in the reference, times each library! Maybe there's some flag missing from samtools MERGE runs?
nextflow run main.nf -profile test_multiref,docker --outdir ./results -w work/ -resume --genotyping_source 'raw' -ansi-log false -dump-channels
```

### Run Sexdeterrmine

```bash
## Running sex determination subworkflow from deduplicated bams
## Expect: sex_determination/ directory with a TSV summary table (one row per BAM file).
nextflow run main.nf -profile test_humanbam,arm,docker --outdir ./results --run_sexdeterrmine
```
16 changes: 16 additions & 0 deletions docs/output.md
@@ -540,3 +540,19 @@ is a tool which calculates a variety of standard 'aDNA' metrics from a BAM file.
</details>

[ANGSD](http://www.popgen.dk/angsd/index.php/ANGSD) is a software package for analysing next-generation sequencing data. Among other functions, ANGSD can estimate contamination on chromosomes present in a single copy, e.g. the X chromosome in humans with an XY karyotype. To do this, we first generate a binary count file for the X chromosome (`angsd`) and then perform Fisher's exact test to obtain a p-value and a jackknife to estimate contamination (`contamination`). Contamination is estimated with Method of Moments (MOM) and Maximum Likelihood (ML) for both Method1 and Method2. Method1 compares the total number of minor and major reads at SNP sites with the number of minor and major reads at adjacent sites, assuming independent errors between reads and sites, while Method2 samples only one read at each site to remove this assumption. The results of all methods for each library, as well as their respective standard errors, are summarised in `nuclear_contamination.txt` and `nuclear_contamination_mqc.json`.

### Sex Determination

<details markdown="1">
<summary>Output files</summary>

- `sex_determination/`: contains the output of the sex determination run, a single `.tsv` table with one row per BAM file. For each file it reports the sample name, the number of autosomal SNPs, the number of SNPs on the X and Y chromosomes, the number of reads mapping to the autosomes and to the X and Y chromosomes, the relative coverage on the X and Y chromosomes, and the standard error of each relative coverage. If the `sexdeterrmine_bedfile` option has not been provided, the error bars cannot be trusted!
</details>

#### Sex.DetERRmine

Sex.DetERRmine calculates the coverage of your mapped reads on the X and Y chromosomes relative to the coverage on the autosomes (X-/Y-rate). This metric can be thought of as the number of copies of chromosomes X and Y that is found within each cell, relative to the autosomal copies. The number of autosomal copies is assumed to be two, meaning that an X-rate of 1.0 means there are two X chromosomes in each cell, while 0.5 means there is a single copy of the X chromosome per cell. Human females have two copies of the X chromosome and no Y chromosome (XX), while human males have one copy of each of the X and Y chromosomes (XY).
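As a worked illustration of the X-/Y-rate described above, the sketch below divides the per-site coverage on a sex chromosome by the per-site autosomal coverage. It is a minimal sketch, not the Sex.DetERRmine.py source: the helper `relative_rate` is hypothetical, and "reads" is assumed to mean the summed depth over the covered target sites (for example, as summarised from `samtools depth` output).

```python
def relative_rate(chrom_reads: int, chrom_sites: int, aut_reads: int, aut_sites: int) -> float:
    """Per-site coverage on a sex chromosome divided by per-site autosomal coverage.

    Hypothetical helper for illustration; 'reads' is assumed to be summed depth over sites.
    """
    chrom_cov = chrom_reads / chrom_sites  # mean depth per site on the sex chromosome
    aut_cov = aut_reads / aut_sites        # mean depth per site on the autosomes
    return chrom_cov / aut_cov


# Hypothetical counts: autosomal coverage of 2.0x, X coverage of 1.0x, Y coverage of 1.0x.
x_rate = relative_rate(chrom_reads=500, chrom_sites=500, aut_reads=20_000, aut_sites=10_000)
y_rate = relative_rate(chrom_reads=50, chrom_sites=50, aut_reads=20_000, aut_sites=10_000)
print(f"X-rate={x_rate:.2f}, Y-rate={y_rate:.2f}")  # X-rate=0.50, Y-rate=0.50
```

With the autosomes assumed to be present in two copies, rates near 0.5/0.5 are consistent with an XY karyotype, while rates near 1.0/0.0 are consistent with XX.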

When a bedfile of specific sites is provided, Sex.DetERRmine runs much faster, and the error bars it reports around each relative coverage estimate become meaningful. For these errors to be trustworthy, the sites included in the bedfile should be spaced far enough apart that a single sequencing read cannot overlap multiple sites. Without a bedfile, every covered position is considered, so neighbouring positions share reads and are not independent observations; in that case the reported error should be ignored. When a suitable bedfile is provided, each observation of a covered site is independent, and the error around the coverage is equal to the binomial error estimate. This error is then propagated during the calculation of relative coverage for the X and Y chromosomes.
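The two requirements in the paragraph above, well-spaced sites and a binomial error that is propagated into the relative coverage, can be sketched as follows. This is a hedged illustration under stated assumptions, not a copy of Sex.DetERRmine.py: the helper names are hypothetical, only the autosomes and one sex chromosome contribute to the read total, and the two binomial errors are combined with first-order propagation for a ratio while ignoring their covariance.

```python
import math


def sites_are_independent(positions: list[int], max_read_length: int) -> bool:
    """Return True if consecutive sites on one chromosome are at least
    max_read_length bp apart, so that no single read can overlap two sites."""
    ordered = sorted(positions)
    return all(b - a >= max_read_length for a, b in zip(ordered, ordered[1:]))


def binomial_se(k: int, n: int) -> float:
    """Standard error of the proportion k/n under a binomial model."""
    p = k / n
    return math.sqrt(p * (1 - p) / n)


def rate_with_error(chrom_reads: int, chrom_sites: int,
                    aut_reads: int, aut_sites: int) -> tuple[float, float]:
    """Relative coverage of one sex chromosome versus the autosomes, plus its
    propagated standard error (covariance between the two proportions ignored)."""
    total = chrom_reads + aut_reads
    # Per-site share of all reads for each chromosome class, and its binomial error.
    a = (chrom_reads / total) / chrom_sites
    b = (aut_reads / total) / aut_sites
    sa = binomial_se(chrom_reads, total) / chrom_sites
    sb = binomial_se(aut_reads, total) / aut_sites
    rate = a / b
    # First-order error propagation for the ratio a / b.
    err = math.sqrt((sa / b) ** 2 + (a * sb / b**2) ** 2)
    return rate, err


rate, err = rate_with_error(chrom_reads=500, chrom_sites=500, aut_reads=20_000, aut_sites=10_000)
print(f"X-rate={rate:.3f} +/- {err:.3f}")
```

Without a bedfile, neighbouring positions are 1 bp apart, so `sites_are_independent` would return `False` for any realistic read length, which is why the reported error should be ignored in that case.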

> Note that in nf-core/eager this will be run on single- and double-stranded variants of the same library separately. This can also help to assess differential contamination between libraries.
25 changes: 8 additions & 17 deletions modules.json
@@ -170,6 +170,11 @@
"git_sha": "6b0e4fe14ca1b12e131f64608f0bbaf36fd11451",
"installed_by": ["modules"]
},
"samtools/depth": {
"branch": "master",
"git_sha": "a1ffbc1fd87bd5a829e956cc26ec9cc53af3e817",
"installed_by": ["modules"]
},
"samtools/faidx": {
"branch": "master",
"git_sha": "ce0b1aed7d504883061e748f492a31bf44c5777c",
@@ -214,25 +219,11 @@
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["modules"]
}
}
},
"subworkflows": {
"nf-core": {
"bam_docounts_contamination_angsd": {
"branch": "master",
"git_sha": "cfd937a668919d948f6fcbf4218e79de50c2f36f",
"installed_by": ["subworkflows"]
},
"bam_split_by_region": {
"sexdeterrmine": {
"branch": "master",
"git_sha": "cfd937a668919d948f6fcbf4218e79de50c2f36f",
"installed_by": ["subworkflows"]
},
"fastq_align_bwaaln": {
"branch": "master",
"git_sha": "cfd937a668919d948f6fcbf4218e79de50c2f36f",
"installed_by": ["subworkflows"]
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
}
}
},
7 changes: 7 additions & 0 deletions modules/nf-core/custom/dumpsoftwareversions/environment.yml

24 changes: 24 additions & 0 deletions modules/nf-core/custom/dumpsoftwareversions/main.nf

37 changes: 37 additions & 0 deletions modules/nf-core/custom/dumpsoftwareversions/meta.yml

39 changes: 39 additions & 0 deletions modules/nf-core/samtools/depth/main.nf
