Commit
Smetanin Alexander committed Apr 23, 2020 (1 parent: 9800b73, commit a0fb78d)
Showing 7 changed files with 739 additions and 107 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,43 @@ | ||
# pylae | ||
Local ancestry estimation | ||
Python for local ancestry estimation | ||
|
||
### Data preparation stage: | ||
(will be performed by the script itself in the future) | ||
|
||
1. If the input is a set of .bed, .bim and .fam files, convert it to VCF using plink (the `--out` value is a prefix; plink appends `.vcf` to it): | ||
```bash | ||
plink2 --bfile <bfile_prefix> --recode vcf --out <vcf_file> | ||
``` | ||
|
||
2. Calculate SNP allele frequencies for each population group using bcftools. | ||
The sample-to-group mapping is defined in the file `configs/vcf_groups.txt`: | ||
|
||
```bash | ||
cat <vcf_file> | bcftools view -c 1 -Ou | bcftools +fill-tags -Ou -- -S configs/vcf_groups.txt -t AF | bcftools query -H -f "%CHROM %POS %ID %AF_<group> %AF_Mediterranean %AF_NativeAmerican %AF_NorthEastAsian %AF_NorthernEuropean %AF_Oceanian %AF_SouthAfrican %AF_SouthEastAsian %AF_SouthWestAsian %AF_SubsaharanAfrican\n" > <group>.<sample>.txt | ||
``` | ||
|
||
If the VCF file is (b)gzipped, use tabix from the samtools/HTSlib project. | ||
|
||
3. Run the main script. | ||
Currently supported modes: `bayes` and `fb`. | ||
Note: `fb` is around 20 times slower than `bayes`. | ||
|
||
```bash | ||
python3 src/process_individuals.py --mode fb --window-len 200 <group>.<sample>.txt | ||
``` | ||
|
||
Example pipeline: | ||
```bash | ||
plink2 --bfile America.QuechuaCandelaria_3.txt_GENO --recode vcf --out America.QuechuaCandelaria_3_GENO | ||
|
||
cat America.QuechuaCandelaria_3_GENO.vcf | bcftools view -c 1 -Ou | bcftools +fill-tags -Ou -- -S vcf_groups.txt -t AF | bcftools query -H -f "%CHROM %POS %ID %AF_QuechuaCandelaria_3 %AF_Mediterranean %AF_NativeAmerican %AF_NorthEastAsian %AF_NorthernEuropean %AF_Oceanian %AF_SouthAfrican %AF_SouthEastAsian %AF_SouthWestAsian %AF_SubsaharanAfrican\n" > "QuechuaCandelaria_3.GA002786.txt" | ||
|
||
python3 src/process_individuals.py --mode fb --window-len 200 "QuechuaCandelaria_3.GA002786.txt" | ||
``` | ||
|
||
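The `-f` format string passed to `bcftools query` above follows a regular pattern: three fixed columns, then one `%AF_<group>` column for the sample's own group and one for each reference group. A minimal sketch of building that string programmatically is shown below. The helper name `build_query_format` is hypothetical and not part of this repository; the group names are copied from the example pipeline.

```python
def build_query_format(sample_group, reference_groups):
    """Build a `bcftools query -f` string with one AF_<group> column per group.

    Hypothetical helper for illustration; it is not part of pylae.
    """
    columns = ["%CHROM", "%POS", "%ID", f"%AF_{sample_group}"]
    # Append the reference groups in the order given, skipping a duplicate
    # of the sample's own group if it happens to be listed.
    columns += [f"%AF_{group}" for group in reference_groups if group != sample_group]
    # The trailing backslash-n is left as two literal characters:
    # bcftools itself interprets it as a newline.
    return " ".join(columns) + "\\n"


reference_groups = [
    "Mediterranean", "NativeAmerican", "NorthEastAsian", "NorthernEuropean",
    "Oceanian", "SouthAfrican", "SouthEastAsian", "SouthWestAsian",
    "SubsaharanAfrican",
]

fmt = build_query_format("QuechuaCandelaria_3", reference_groups)
print(fmt)
```

The printed string can be pasted into the `-f "..."` argument, which avoids retyping the group list by hand for every new sample group.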
## Estimated performance: | ||
For a VCF file with around 120k SNPs: | ||
|mode|exec time, min| ? | ||
|--|--|--| | ||
|fb | 20 | ? | | ||
|bayes| 1 | ? | |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
;SouthWestAsian;NorthernEuropean;Mediterranean;SouthAfrican;SubsaharanAfrican;NativeAmerican;NorthEastAsian;Oceanian;SouthEastAsian | ||
SouthWestAsian;0;0.03249898;0.03262916;0.08696902;0.08696902;0.07102692;0.06377022;0.08496066;0.07167506 | ||
NorthernEuropean;0.03249898;0;0.00743894;0.083706606;0.083706606;0.067764506;0.060507806;0.081698246;0.068412646 | ||
Mediterranean;0.03262916;0.00743894;0;0.08383678;0.08383678;0.06789468;0.06063798;0.08182842;0.06854282 | ||
SouthAfrican;0.08696902;0.083706606;0.08383678;0;0.06;0.1131131;0.1058564;0.12704684;0.11376124 | ||
SubsaharanAfrican;0.08696902;0.083706606;0.08383678;0.06;0;0.1131131;0.1058564;0.12704684;0.11376124 | ||
NativeAmerican;0.07102692;0.067764506;0.06789468;0.1131131;0.1131131;0;0.0463173;0.09783194;0.08454634 | ||
NorthEastAsian;0.06377022;0.060507806;0.06063798;0.1058564;0.1058564;0.0463173;0;0.09057524;0.07728964 | ||
Oceanian;0.08496066;0.081698246;0.08182842;0.12704684;0.12704684;0.09783194;0.09057524;0;0.0895936 | ||
SouthEastAsian;0.07167506;0.068412646;0.06854282;0.11376124;0.11376124;0.08454634;0.07728964;0.0895936;0 |
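The file above appears to be a semicolon-separated, labelled square matrix of pairwise values between population groups. How the project itself reads it is not shown in this diff, so the following is only a sketch, with hypothetical function names, of loading such a matrix and checking two properties: that every row label also appears as a column label, and that the matrix is symmetric. The three-group excerpt is copied from the file.

```python
import csv
import io


def load_labelled_matrix(text, delimiter=";"):
    """Parse a labelled matrix into (column_labels, {row: {col: value}})."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    columns = rows[0][1:]  # the first header cell is empty
    matrix = {}
    for row in rows[1:]:
        label, values = row[0], row[1:]
        if len(values) != len(columns):
            raise ValueError(f"row {label!r} has {len(values)} values, expected {len(columns)}")
        matrix[label] = dict(zip(columns, (float(v) for v in values)))
    return columns, matrix


def check_matrix(columns, matrix, tol=1e-9):
    """Return a list of problems found; the list is empty if none are found."""
    problems = [f"row label {label!r} has no matching column"
                for label in matrix if label not in columns]
    # Compare each unordered pair once.
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            if a in matrix and b in matrix and abs(matrix[a][b] - matrix[b][a]) > tol:
                problems.append(f"asymmetric entry for {a} / {b}")
    return problems


excerpt = """\
;SouthWestAsian;NorthernEuropean;Mediterranean
SouthWestAsian;0;0.03249898;0.03262916
NorthernEuropean;0.03249898;0;0.00743894
Mediterranean;0.03262916;0.00743894;0
"""

columns, matrix = load_labelled_matrix(excerpt)
problems = check_matrix(columns, matrix)
print(problems)
```

A check like this catches a row label that does not match its column label, which would otherwise cause a silent lookup failure at runtime.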
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,136 @@ | ||
Mediterranean_Mediterranean1 Mediterranean | ||
Mediterranean_Mediterranean2 Mediterranean | ||
Mediterranean_Mediterranean3 Mediterranean | ||
Mediterranean_Mediterranean4 Mediterranean | ||
Mediterranean_Mediterranean5 Mediterranean | ||
Mediterranean_Mediterranean6 Mediterranean | ||
Mediterranean_Mediterranean7 Mediterranean | ||
Mediterranean_Mediterranean8 Mediterranean | ||
Mediterranean_Mediterranean9 Mediterranean | ||
Mediterranean_Mediterranean10 Mediterranean | ||
Mediterranean_Mediterranean11 Mediterranean | ||
Mediterranean_Mediterranean12 Mediterranean | ||
Mediterranean_Mediterranean13 Mediterranean | ||
Mediterranean_Mediterranean14 Mediterranean | ||
Mediterranean_Mediterranean15 Mediterranean | ||
NativeAmerican_NativeAmerican1 NativeAmerican | ||
NativeAmerican_NativeAmerican2 NativeAmerican | ||
NativeAmerican_NativeAmerican3 NativeAmerican | ||
NativeAmerican_NativeAmerican4 NativeAmerican | ||
NativeAmerican_NativeAmerican5 NativeAmerican | ||
NativeAmerican_NativeAmerican6 NativeAmerican | ||
NativeAmerican_NativeAmerican7 NativeAmerican | ||
NativeAmerican_NativeAmerican8 NativeAmerican | ||
NativeAmerican_NativeAmerican9 NativeAmerican | ||
NativeAmerican_NativeAmerican10 NativeAmerican | ||
NativeAmerican_NativeAmerican11 NativeAmerican | ||
NativeAmerican_NativeAmerican12 NativeAmerican | ||
NativeAmerican_NativeAmerican13 NativeAmerican | ||
NativeAmerican_NativeAmerican14 NativeAmerican | ||
NativeAmerican_NativeAmerican15 NativeAmerican | ||
NorthEastAsian_NorthEastAsian1 NorthEastAsian | ||
NorthEastAsian_NorthEastAsian2 NorthEastAsian | ||
NorthEastAsian_NorthEastAsian3 NorthEastAsian | ||
NorthEastAsian_NorthEastAsian4 NorthEastAsian | ||
NorthEastAsian_NorthEastAsian5 NorthEastAsian | ||
NorthEastAsian_NorthEastAsian6 NorthEastAsian | ||
NorthEastAsian_NorthEastAsian7 NorthEastAsian | ||
NorthEastAsian_NorthEastAsian8 NorthEastAsian | ||
NorthEastAsian_NorthEastAsian9 NorthEastAsian | ||
NorthEastAsian_NorthEastAsian10 NorthEastAsian | ||
NorthEastAsian_NorthEastAsian11 NorthEastAsian | ||
NorthEastAsian_NorthEastAsian12 NorthEastAsian | ||
NorthEastAsian_NorthEastAsian13 NorthEastAsian | ||
NorthEastAsian_NorthEastAsian14 NorthEastAsian | ||
NorthEastAsian_NorthEastAsian15 NorthEastAsian | ||
NorthernEuropean_NorthernEuropean1 NorthernEuropean | ||
NorthernEuropean_NorthernEuropean2 NorthernEuropean | ||
NorthernEuropean_NorthernEuropean3 NorthernEuropean | ||
NorthernEuropean_NorthernEuropean4 NorthernEuropean | ||
NorthernEuropean_NorthernEuropean5 NorthernEuropean | ||
NorthernEuropean_NorthernEuropean6 NorthernEuropean | ||
NorthernEuropean_NorthernEuropean7 NorthernEuropean | ||
NorthernEuropean_NorthernEuropean8 NorthernEuropean | ||
NorthernEuropean_NorthernEuropean9 NorthernEuropean | ||
NorthernEuropean_NorthernEuropean10 NorthernEuropean | ||
NorthernEuropean_NorthernEuropean11 NorthernEuropean | ||
NorthernEuropean_NorthernEuropean12 NorthernEuropean | ||
NorthernEuropean_NorthernEuropean13 NorthernEuropean | ||
NorthernEuropean_NorthernEuropean14 NorthernEuropean | ||
NorthernEuropean_NorthernEuropean15 NorthernEuropean | ||
Oceanian_Oceanian1 Oceanian | ||
Oceanian_Oceanian2 Oceanian | ||
Oceanian_Oceanian3 Oceanian | ||
Oceanian_Oceanian4 Oceanian | ||
Oceanian_Oceanian5 Oceanian | ||
Oceanian_Oceanian6 Oceanian | ||
Oceanian_Oceanian7 Oceanian | ||
Oceanian_Oceanian8 Oceanian | ||
Oceanian_Oceanian9 Oceanian | ||
Oceanian_Oceanian10 Oceanian | ||
Oceanian_Oceanian11 Oceanian | ||
Oceanian_Oceanian12 Oceanian | ||
Oceanian_Oceanian13 Oceanian | ||
Oceanian_Oceanian14 Oceanian | ||
Oceanian_Oceanian15 Oceanian | ||
QuechuaCandelaria_3_GA002786 QuechuaCandelaria_3 | ||
SouthAfrican_SouthAfrican1 SouthAfrican | ||
SouthAfrican_SouthAfrican2 SouthAfrican | ||
SouthAfrican_SouthAfrican3 SouthAfrican | ||
SouthAfrican_SouthAfrican4 SouthAfrican | ||
SouthAfrican_SouthAfrican5 SouthAfrican | ||
SouthAfrican_SouthAfrican6 SouthAfrican | ||
SouthAfrican_SouthAfrican7 SouthAfrican | ||
SouthAfrican_SouthAfrican8 SouthAfrican | ||
SouthAfrican_SouthAfrican9 SouthAfrican | ||
SouthAfrican_SouthAfrican10 SouthAfrican | ||
SouthAfrican_SouthAfrican11 SouthAfrican | ||
SouthAfrican_SouthAfrican12 SouthAfrican | ||
SouthAfrican_SouthAfrican13 SouthAfrican | ||
SouthAfrican_SouthAfrican14 SouthAfrican | ||
SouthAfrican_SouthAfrican15 SouthAfrican | ||
SouthEastAsian_SouthEastAsian1 SouthEastAsian | ||
SouthEastAsian_SouthEastAsian2 SouthEastAsian | ||
SouthEastAsian_SouthEastAsian3 SouthEastAsian | ||
SouthEastAsian_SouthEastAsian4 SouthEastAsian | ||
SouthEastAsian_SouthEastAsian5 SouthEastAsian | ||
SouthEastAsian_SouthEastAsian6 SouthEastAsian | ||
SouthEastAsian_SouthEastAsian7 SouthEastAsian | ||
SouthEastAsian_SouthEastAsian8 SouthEastAsian | ||
SouthEastAsian_SouthEastAsian9 SouthEastAsian | ||
SouthEastAsian_SouthEastAsian10 SouthEastAsian | ||
SouthEastAsian_SouthEastAsian11 SouthEastAsian | ||
SouthEastAsian_SouthEastAsian12 SouthEastAsian | ||
SouthEastAsian_SouthEastAsian13 SouthEastAsian | ||
SouthEastAsian_SouthEastAsian14 SouthEastAsian | ||
SouthEastAsian_SouthEastAsian15 SouthEastAsian | ||
SouthWestAsian_SouthWestAsian1 SouthWestAsian | ||
SouthWestAsian_SouthWestAsian2 SouthWestAsian | ||
SouthWestAsian_SouthWestAsian3 SouthWestAsian | ||
SouthWestAsian_SouthWestAsian4 SouthWestAsian | ||
SouthWestAsian_SouthWestAsian5 SouthWestAsian | ||
SouthWestAsian_SouthWestAsian6 SouthWestAsian | ||
SouthWestAsian_SouthWestAsian7 SouthWestAsian | ||
SouthWestAsian_SouthWestAsian8 SouthWestAsian | ||
SouthWestAsian_SouthWestAsian9 SouthWestAsian | ||
SouthWestAsian_SouthWestAsian10 SouthWestAsian | ||
SouthWestAsian_SouthWestAsian11 SouthWestAsian | ||
SouthWestAsian_SouthWestAsian12 SouthWestAsian | ||
SouthWestAsian_SouthWestAsian13 SouthWestAsian | ||
SouthWestAsian_SouthWestAsian14 SouthWestAsian | ||
SouthWestAsian_SouthWestAsian15 SouthWestAsian | ||
SubsaharanAfrican_SubsaharanAfrican1 SubsaharanAfrican | ||
SubsaharanAfrican_SubsaharanAfrican2 SubsaharanAfrican | ||
SubsaharanAfrican_SubsaharanAfrican3 SubsaharanAfrican | ||
SubsaharanAfrican_SubsaharanAfrican4 SubsaharanAfrican | ||
SubsaharanAfrican_SubsaharanAfrican5 SubsaharanAfrican | ||
SubsaharanAfrican_SubsaharanAfrican6 SubsaharanAfrican | ||
SubsaharanAfrican_SubsaharanAfrican7 SubsaharanAfrican | ||
SubsaharanAfrican_SubsaharanAfrican8 SubsaharanAfrican | ||
SubsaharanAfrican_SubsaharanAfrican9 SubsaharanAfrican | ||
SubsaharanAfrican_SubsaharanAfrican10 SubsaharanAfrican | ||
SubsaharanAfrican_SubsaharanAfrican11 SubsaharanAfrican | ||
SubsaharanAfrican_SubsaharanAfrican12 SubsaharanAfrican | ||
SubsaharanAfrican_SubsaharanAfrican13 SubsaharanAfrican | ||
SubsaharanAfrican_SubsaharanAfrican14 SubsaharanAfrican | ||
SubsaharanAfrican_SubsaharanAfrican15 SubsaharanAfrican |
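Each line of the groups file above pairs a sample ID with a population group, which is the mapping handed to `bcftools +fill-tags` through `-S`. As a rough sketch (the function name `parse_groups` is hypothetical, and whitespace-separated columns are assumed from the listing), the mapping can be parsed and summarised before running the pipeline, for example to confirm that each group has the expected number of samples. The sample lines are copied from the file.

```python
from collections import Counter


def parse_groups(lines):
    """Map sample ID -> group from whitespace-separated 'sample group' lines."""
    sample_to_group = {}
    for line_number, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns, got {len(parts)}")
        sample, group = parts
        if sample in sample_to_group:
            raise ValueError(f"line {line_number}: duplicate sample {sample!r}")
        sample_to_group[sample] = group
    return sample_to_group


example = """\
Mediterranean_Mediterranean1 Mediterranean
Mediterranean_Mediterranean2 Mediterranean
QuechuaCandelaria_3_GA002786 QuechuaCandelaria_3
"""

mapping = parse_groups(example.splitlines())
counts = Counter(mapping.values())
for group, n in sorted(counts.items()):
    print(group, n)
```

On the full file, the counts would be expected to show 15 samples for each reference group and a single sample for `QuechuaCandelaria_3`, matching the listing.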