v0.1
Smetanin Alexander committed Apr 23, 2020
1 parent 9800b73 commit a0fb78d
Showing 7 changed files with 739 additions and 107 deletions.
43 changes: 42 additions & 1 deletion README.md
@@ -1,2 +1,43 @@
# pylae
Local ancestry estimation
Python for local ancestry estimation

### Data preparation stage:
(will be performed by the script itself in the future)

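1. If the input is a PLINK binary fileset (.bed, .bim, .fam), convert it to VCF with plink2. Note that `--out` takes an output prefix, so the result is written to `<vcf_file>.vcf`: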
```bash
plink2 --bfile <bfile_prefix> --recode vcf --out <vcf_file>
```

2. Calculate SNP allele frequencies for each population group using bcftools.
The assignment of samples to groups is defined in the file `configs/vcf_groups.txt`:

```bash
cat <vcf_file> | bcftools view -c 1 -Ou | bcftools +fill-tags -Ou -- -S configs/vcf_groups.txt -t AF | bcftools query -H -f "%CHROM %POS %ID %AF_<group> %AF_Mediterranean %AF_NativeAmerican %AF_NorthEastAsian %AF_NorthernEuropean %AF_Oceanian %AF_SouthAfrican %AF_SouthEastAsian %AF_SouthWestAsian %AF_SubsaharanAfrican\n" > <group>.<sample>.txt
```

If the VCF file is (b)gzipped, use `tabix` from samtools/htslib.

3. Run the main script on the resulting frequency table.
Currently supported modes: `bayes`, `fb`.
Note: `fb` is around 20 times slower than `bayes`.

```bash
python3 src/process_individuals.py --mode fb --window-len 200 <group>.<sample>.txt
```

Example pipeline:
```bash
plink2 --bfile America.QuechuaCandelaria_3.txt_GENO --recode vcf --out America.QuechuaCandelaria_3_GENO

cat America.QuechuaCandelaria_3_GENO.vcf | bcftools view -c 1 -Ou | bcftools +fill-tags -Ou -- -S vcf_groups.txt -t AF | bcftools query -H -f "%CHROM %POS %ID %AF_QuechuaCandelaria_3 %AF_Mediterranean %AF_NativeAmerican %AF_NorthEastAsian %AF_NorthernEuropean %AF_Oceanian %AF_SouthAfrican %AF_SouthEastAsian %AF_SouthWestAsian %AF_SubsaharanAfrican\n" > "QuechuaCandelaria_3.GA002786.txt"

python3 src/process_individuals.py --mode fb --window-len 200 "QuechuaCandelaria_3.GA002786.txt"
```
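
The `-f` format string in the pipeline above lists one `%AF_<group>` tag per group: first the group of the studied sample, then the nine reference groups. As a minimal sketch, the string could be assembled from the group names instead of being typed by hand. The function name `build_query_format` and the constant `REFERENCE_GROUPS` are hypothetical and are not part of pylae:

```python
# Reference groups in the order used by the README's bcftools query command.
REFERENCE_GROUPS = [
    "Mediterranean", "NativeAmerican", "NorthEastAsian", "NorthernEuropean",
    "Oceanian", "SouthAfrican", "SouthEastAsian", "SouthWestAsian",
    "SubsaharanAfrican",
]


def build_query_format(sample_group, reference_groups=REFERENCE_GROUPS):
    """Build a `bcftools query -f` string with one AF column per group.

    Hypothetical helper for illustration; pylae itself may build this differently.
    """
    columns = ["%CHROM", "%POS", "%ID", f"%AF_{sample_group}"]
    columns += [f"%AF_{group}" for group in reference_groups]
    # bcftools expects the two characters backslash + n, not a real newline.
    return " ".join(columns) + "\\n"


fmt = build_query_format("QuechuaCandelaria_3")
print(fmt)
```

The printed string can be pasted, in double quotes, after `bcftools query -H -f`.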

## Estimated performance:
For a VCF file with around 120k SNPs:

|mode|exec time, min| ? |
|--|--|--|
|fb | 20 | ? |
|bayes| 1 | ? |
10 changes: 10 additions & 0 deletions configs/matrix.csv
@@ -0,0 +1,10 @@
;SouthWestAsian;NorthernEuropean;Mediterranean;SouthAfrican;SubsaharanAfrican;NativeAmerican;NorthEastAsian;Oceanian;SouthEastAsian
SouthWestAsian;0;0.03249898;0.03262916;0.08696902;0.08696902;0.07102692;0.06377022;0.08496066;0.07167506
NorthernEuropean;0.03249898;0;0.00743894;0.083706606;0.083706606;0.067764506;0.060507806;0.081698246;0.068412646
Mediterranean;0.03262916;0.00743894;0;0.08383678;0.08383678;0.06789468;0.06063798;0.08182842;0.06854282
SouthAfrican;0.08696902;0.083706606;0.08383678;0;0.06;0.1131131;0.1058564;0.12704684;0.11376124
SubsaharanAfrican;0.08696902;0.083706606;0.08383678;0.06;0;0.1131131;0.1058564;0.12704684;0.11376124
NativeAmerican;0.07102692;0.067764506;0.06789468;0.1131131;0.1131131;0;0.0463173;0.09783194;0.08454634
NorthEastAsian;0.06377022;0.060507806;0.06063798;0.1058564;0.1058564;0.0463173;0;0.09057524;0.07728964
Oceanian;0.08496066;0.081698246;0.08182842;0.12704684;0.12704684;0.09783194;0.09057524;0;0.0895936
SouthEastAsian;0.07167506;0.068412646;0.06854282;0.11376124;0.11376124;0.08454634;0.07728964;0.0895936;0
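
The file above is a semicolon-delimited square matrix of pairwise values between populations, with group names in both the header row and the first column. A minimal sketch of loading it and sanity-checking it (row labels match the header, zero diagonal, symmetry) might look as follows. The functions `load_matrix` and `check_matrix` are hypothetical, how pylae actually reads the file is not shown in this commit, and the embedded text is only the top-left 3x3 corner of the matrix:

```python
import csv
import io
import math

# Top-left 3x3 corner of configs/matrix.csv, embedded to keep the sketch self-contained.
SAMPLE = """;SouthWestAsian;NorthernEuropean;Mediterranean
SouthWestAsian;0;0.03249898;0.03262916
NorthernEuropean;0.03249898;0;0.00743894
Mediterranean;0.03262916;0.00743894;0
"""


def load_matrix(handle):
    """Read a ';'-delimited labelled matrix into {row_label: {col_label: value}}."""
    rows = [row for row in csv.reader(handle, delimiter=";") if row]
    header = rows[0][1:]  # first cell of the header row is empty
    matrix = {row[0]: dict(zip(header, map(float, row[1:]))) for row in rows[1:]}
    return header, matrix


def check_matrix(header, matrix):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = [f"label mismatch: {name}" for name in sorted(set(header) ^ set(matrix))]
    known = [name for name in header if name in matrix]
    for i, a in enumerate(known):
        if matrix[a][a] != 0:
            problems.append(f"non-zero diagonal: {a}")
        for b in known[i + 1:]:
            if not math.isclose(matrix[a][b], matrix[b][a]):
                problems.append(f"asymmetric pair: {a}/{b}")
    return problems


header, matrix = load_matrix(io.StringIO(SAMPLE))
print(check_matrix(header, matrix))
```

To check the real file, pass `open("configs/matrix.csv", newline="")` instead of the `io.StringIO` object.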
136 changes: 136 additions & 0 deletions configs/vcf_groups.txt
@@ -0,0 +1,136 @@
Mediterranean_Mediterranean1 Mediterranean
Mediterranean_Mediterranean2 Mediterranean
Mediterranean_Mediterranean3 Mediterranean
Mediterranean_Mediterranean4 Mediterranean
Mediterranean_Mediterranean5 Mediterranean
Mediterranean_Mediterranean6 Mediterranean
Mediterranean_Mediterranean7 Mediterranean
Mediterranean_Mediterranean8 Mediterranean
Mediterranean_Mediterranean9 Mediterranean
Mediterranean_Mediterranean10 Mediterranean
Mediterranean_Mediterranean11 Mediterranean
Mediterranean_Mediterranean12 Mediterranean
Mediterranean_Mediterranean13 Mediterranean
Mediterranean_Mediterranean14 Mediterranean
Mediterranean_Mediterranean15 Mediterranean
NativeAmerican_NativeAmerican1 NativeAmerican
NativeAmerican_NativeAmerican2 NativeAmerican
NativeAmerican_NativeAmerican3 NativeAmerican
NativeAmerican_NativeAmerican4 NativeAmerican
NativeAmerican_NativeAmerican5 NativeAmerican
NativeAmerican_NativeAmerican6 NativeAmerican
NativeAmerican_NativeAmerican7 NativeAmerican
NativeAmerican_NativeAmerican8 NativeAmerican
NativeAmerican_NativeAmerican9 NativeAmerican
NativeAmerican_NativeAmerican10 NativeAmerican
NativeAmerican_NativeAmerican11 NativeAmerican
NativeAmerican_NativeAmerican12 NativeAmerican
NativeAmerican_NativeAmerican13 NativeAmerican
NativeAmerican_NativeAmerican14 NativeAmerican
NativeAmerican_NativeAmerican15 NativeAmerican
NorthEastAsian_NorthEastAsian1 NorthEastAsian
NorthEastAsian_NorthEastAsian2 NorthEastAsian
NorthEastAsian_NorthEastAsian3 NorthEastAsian
NorthEastAsian_NorthEastAsian4 NorthEastAsian
NorthEastAsian_NorthEastAsian5 NorthEastAsian
NorthEastAsian_NorthEastAsian6 NorthEastAsian
NorthEastAsian_NorthEastAsian7 NorthEastAsian
NorthEastAsian_NorthEastAsian8 NorthEastAsian
NorthEastAsian_NorthEastAsian9 NorthEastAsian
NorthEastAsian_NorthEastAsian10 NorthEastAsian
NorthEastAsian_NorthEastAsian11 NorthEastAsian
NorthEastAsian_NorthEastAsian12 NorthEastAsian
NorthEastAsian_NorthEastAsian13 NorthEastAsian
NorthEastAsian_NorthEastAsian14 NorthEastAsian
NorthEastAsian_NorthEastAsian15 NorthEastAsian
NorthernEuropean_NorthernEuropean1 NorthernEuropean
NorthernEuropean_NorthernEuropean2 NorthernEuropean
NorthernEuropean_NorthernEuropean3 NorthernEuropean
NorthernEuropean_NorthernEuropean4 NorthernEuropean
NorthernEuropean_NorthernEuropean5 NorthernEuropean
NorthernEuropean_NorthernEuropean6 NorthernEuropean
NorthernEuropean_NorthernEuropean7 NorthernEuropean
NorthernEuropean_NorthernEuropean8 NorthernEuropean
NorthernEuropean_NorthernEuropean9 NorthernEuropean
NorthernEuropean_NorthernEuropean10 NorthernEuropean
NorthernEuropean_NorthernEuropean11 NorthernEuropean
NorthernEuropean_NorthernEuropean12 NorthernEuropean
NorthernEuropean_NorthernEuropean13 NorthernEuropean
NorthernEuropean_NorthernEuropean14 NorthernEuropean
NorthernEuropean_NorthernEuropean15 NorthernEuropean
Oceanian_Oceanian1 Oceanian
Oceanian_Oceanian2 Oceanian
Oceanian_Oceanian3 Oceanian
Oceanian_Oceanian4 Oceanian
Oceanian_Oceanian5 Oceanian
Oceanian_Oceanian6 Oceanian
Oceanian_Oceanian7 Oceanian
Oceanian_Oceanian8 Oceanian
Oceanian_Oceanian9 Oceanian
Oceanian_Oceanian10 Oceanian
Oceanian_Oceanian11 Oceanian
Oceanian_Oceanian12 Oceanian
Oceanian_Oceanian13 Oceanian
Oceanian_Oceanian14 Oceanian
Oceanian_Oceanian15 Oceanian
QuechuaCandelaria_3_GA002786 QuechuaCandelaria_3
SouthAfrican_SouthAfrican1 SouthAfrican
SouthAfrican_SouthAfrican2 SouthAfrican
SouthAfrican_SouthAfrican3 SouthAfrican
SouthAfrican_SouthAfrican4 SouthAfrican
SouthAfrican_SouthAfrican5 SouthAfrican
SouthAfrican_SouthAfrican6 SouthAfrican
SouthAfrican_SouthAfrican7 SouthAfrican
SouthAfrican_SouthAfrican8 SouthAfrican
SouthAfrican_SouthAfrican9 SouthAfrican
SouthAfrican_SouthAfrican10 SouthAfrican
SouthAfrican_SouthAfrican11 SouthAfrican
SouthAfrican_SouthAfrican12 SouthAfrican
SouthAfrican_SouthAfrican13 SouthAfrican
SouthAfrican_SouthAfrican14 SouthAfrican
SouthAfrican_SouthAfrican15 SouthAfrican
SouthEastAsian_SouthEastAsian1 SouthEastAsian
SouthEastAsian_SouthEastAsian2 SouthEastAsian
SouthEastAsian_SouthEastAsian3 SouthEastAsian
SouthEastAsian_SouthEastAsian4 SouthEastAsian
SouthEastAsian_SouthEastAsian5 SouthEastAsian
SouthEastAsian_SouthEastAsian6 SouthEastAsian
SouthEastAsian_SouthEastAsian7 SouthEastAsian
SouthEastAsian_SouthEastAsian8 SouthEastAsian
SouthEastAsian_SouthEastAsian9 SouthEastAsian
SouthEastAsian_SouthEastAsian10 SouthEastAsian
SouthEastAsian_SouthEastAsian11 SouthEastAsian
SouthEastAsian_SouthEastAsian12 SouthEastAsian
SouthEastAsian_SouthEastAsian13 SouthEastAsian
SouthEastAsian_SouthEastAsian14 SouthEastAsian
SouthEastAsian_SouthEastAsian15 SouthEastAsian
SouthWestAsian_SouthWestAsian1 SouthWestAsian
SouthWestAsian_SouthWestAsian2 SouthWestAsian
SouthWestAsian_SouthWestAsian3 SouthWestAsian
SouthWestAsian_SouthWestAsian4 SouthWestAsian
SouthWestAsian_SouthWestAsian5 SouthWestAsian
SouthWestAsian_SouthWestAsian6 SouthWestAsian
SouthWestAsian_SouthWestAsian7 SouthWestAsian
SouthWestAsian_SouthWestAsian8 SouthWestAsian
SouthWestAsian_SouthWestAsian9 SouthWestAsian
SouthWestAsian_SouthWestAsian10 SouthWestAsian
SouthWestAsian_SouthWestAsian11 SouthWestAsian
SouthWestAsian_SouthWestAsian12 SouthWestAsian
SouthWestAsian_SouthWestAsian13 SouthWestAsian
SouthWestAsian_SouthWestAsian14 SouthWestAsian
SouthWestAsian_SouthWestAsian15 SouthWestAsian
SubsaharanAfrican_SubsaharanAfrican1 SubsaharanAfrican
SubsaharanAfrican_SubsaharanAfrican2 SubsaharanAfrican
SubsaharanAfrican_SubsaharanAfrican3 SubsaharanAfrican
SubsaharanAfrican_SubsaharanAfrican4 SubsaharanAfrican
SubsaharanAfrican_SubsaharanAfrican5 SubsaharanAfrican
SubsaharanAfrican_SubsaharanAfrican6 SubsaharanAfrican
SubsaharanAfrican_SubsaharanAfrican7 SubsaharanAfrican
SubsaharanAfrican_SubsaharanAfrican8 SubsaharanAfrican
SubsaharanAfrican_SubsaharanAfrican9 SubsaharanAfrican
SubsaharanAfrican_SubsaharanAfrican10 SubsaharanAfrican
SubsaharanAfrican_SubsaharanAfrican11 SubsaharanAfrican
SubsaharanAfrican_SubsaharanAfrican12 SubsaharanAfrican
SubsaharanAfrican_SubsaharanAfrican13 SubsaharanAfrican
SubsaharanAfrican_SubsaharanAfrican14 SubsaharanAfrican
SubsaharanAfrican_SubsaharanAfrican15 SubsaharanAfrican
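
Each line of the file above maps a sample ID (first column) to its group (second column): 15 samples for each of the nine reference groups plus the single studied individual, 136 lines in total. A minimal sketch for parsing the mapping and counting samples per group, useful for catching typos before running `bcftools +fill-tags -S`, is given below. The function `parse_groups` is hypothetical, and the embedded text is a four-line excerpt of the file:

```python
from collections import Counter

# Four-line excerpt of configs/vcf_groups.txt, embedded to keep the sketch self-contained.
SAMPLE_LINES = """Mediterranean_Mediterranean1 Mediterranean
Mediterranean_Mediterranean2 Mediterranean
QuechuaCandelaria_3_GA002786 QuechuaCandelaria_3
SouthAfrican_SouthAfrican1 SouthAfrican
"""


def parse_groups(text):
    """Return {sample_id: group} from whitespace-separated two-column text."""
    mapping = {}
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 columns, got {len(fields)}")
        sample, group = fields
        if sample in mapping:
            raise ValueError(f"line {number}: duplicate sample {sample}")
        mapping[sample] = group
    return mapping


groups = parse_groups(SAMPLE_LINES)
counts = Counter(groups.values())
for group, n_samples in sorted(counts.items()):
    print(group, n_samples)
```

For the real file, read its contents with `open("configs/vcf_groups.txt").read()` and pass them to `parse_groups`.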