import plink-ng as a git dependency #16

Closed
244 commits
d40ec8e
Simulation framework and classes
mlamkin7 Jan 22, 2022
e94ea45
added GeneticMarker class
mlamkin7 Jan 24, 2022
5a8ad8a
Completed Classes
mlamkin7 Jan 26, 2022
41a4a76
Restructure of repo
mlamkin7 Feb 8, 2022
17e724f
Refactor repository
mlamkin7 Feb 9, 2022
60472da
Completed Simulator (Needs to be tested)
mlamkin7 Feb 9, 2022
f14ab6a
Merged haptools updates
mlamkin7 Feb 9, 2022
fb7c4c9
Fixed formatting
mlamkin7 Feb 9, 2022
e44bacc
Increased runtime by precalculating randomized events in the simulati…
mlamkin7 Feb 13, 2022
bb8afae
Hacked together karyogram code
mlamkin7 Feb 13, 2022
38b091a
Hacked together code
mlamkin7 Feb 13, 2022
9eba3b5
initiating haptools readme pages
Feb 14, 2022
4d73de8
fixing minor readme typos
Feb 14, 2022
2c9da55
minor readme changes
Feb 14, 2022
f65204b
adding simgenotypes readme
Feb 14, 2022
06fcaaa
Update README.md
mlamkin7 Feb 15, 2022
e41bfee
Update README.md
mlamkin7 Feb 15, 2022
058b0e8
Update README.md
mlamkin7 Feb 15, 2022
2c99a25
Update README.md
mlamkin7 Feb 15, 2022
ec872bb
Update README.md
mlamkin7 Feb 15, 2022
94fa0fe
Update README.md
mlamkin7 Feb 15, 2022
b5d1bc2
Merge pull request #14 from aryarm/mgymrek-docs-admixsim
mlamkin7 Feb 27, 2022
3f8f662
update file locations and fixed cM -> M
mlamkin7 Feb 27, 2022
663af36
Merge remote-tracking branch 'origin/admix-sim' into admix-sim
mlamkin7 Feb 27, 2022
453075a
removed snakemake
mlamkin7 Mar 1, 2022
8c84401
update poetry to v1.2
aryarm Mar 6, 2022
4563bde
add pgenlib as a dependency
aryarm Mar 7, 2022
87a7a5e
oops - specify it as a tag instead of rev
aryarm Mar 7, 2022
1e9c929
Merge pull request #17 from aryarm/admix-sim
gymreklab Mar 15, 2022
18786eb
adding haplotype classes
Mar 15, 2022
ee62fa0
correct version of simgenotype
mlamkin7 Mar 16, 2022
d54ba73
handle SNP haplotype finding in transform
Mar 16, 2022
884225f
initial simulator code
Mar 16, 2022
b12f04a
adding simphenotype readme
Mar 16, 2022
1a4747c
adding simphenotype readme
Mar 16, 2022
07aac6a
adding simphenotype readme
Mar 16, 2022
0b1099b
adding simphenotype readme
Mar 16, 2022
0ce8574
update dependencies in lock
aryarm Apr 1, 2022
f33a6c3
install pandas
aryarm Apr 2, 2022
482db62
decide that we actually dont need pandas, after all
aryarm Apr 2, 2022
a8d5af8
update black to fix incompatibility with click
aryarm Apr 2, 2022
8c042fb
add covariates class to data module
aryarm Apr 2, 2022
a50b90c
separate python standard library dependencies from external dependencies
aryarm Apr 2, 2022
0756290
allow for filtering multiallelic variants in genotypes class
aryarm Apr 2, 2022
dc7cad0
warn user if the data module loaded zero variants
aryarm Apr 2, 2022
c14fd91
add md file to the docs via steps in #18
aryarm Apr 2, 2022
ebaf7ed
reference md files external from docs/
aryarm Apr 2, 2022
6bd5ea5
remove header b/c it already appears in the README
aryarm Apr 8, 2022
f71b853
support gz and bz2 file extensions (see #19)
aryarm Apr 9, 2022
435ef2c
fmt with black
aryarm Apr 10, 2022
e1b4e17
update and lock dependencies
aryarm Apr 11, 2022
3454ad6
add matplotlib dep (resolves #22)
aryarm Apr 11, 2022
427f030
remove pytabix dependency and use pysam instead
aryarm Apr 11, 2022
a5877a9
separate imports into blocks
aryarm Apr 11, 2022
a8c263c
move mpl outside of docs dep section
aryarm Apr 11, 2022
e290f2e
Merge pull request #21 from gymrek-lab/feat/add-md-docs
gymreklab Apr 13, 2022
7827e3d
reduce memory in Genotypes.read (see #19)
aryarm Apr 13, 2022
18f5a97
use logging instead of assertions in data module (see #19)
aryarm Apr 13, 2022
47f61e6
add iterate function to data classes (see #19)
aryarm Apr 13, 2022
81be2be
convert assertions in phens and covars classes to logs
aryarm Apr 13, 2022
d001916
fmt with black
aryarm Apr 13, 2022
133777b
switch to using namedtuple in iterate function
aryarm Apr 13, 2022
ed28bd9
fmt with black
aryarm Apr 13, 2022
af26af6
Merge pull request #20 from gymrek-lab/feat/multi-allelic
aryarm Apr 13, 2022
cb3f43b
Merge pull request #24 from gymrek-lab/fix/dependencies
aryarm Apr 13, 2022
30f3f75
resolve merge conflicts from haplotype_classes
aryarm Apr 13, 2022
d059079
move files out of directories
aryarm Apr 13, 2022
af7e5c3
resolve docs after moving files
aryarm Apr 13, 2022
71faf08
remove conflict markers in pyproject
aryarm Apr 13, 2022
71c2448
Merge pull request #27 from gymrek-lab/ref/directories
aryarm Apr 13, 2022
6ade14c
VCF updates
mlamkin7 Apr 13, 2022
c6c66e5
fixing relative import
Apr 13, 2022
7ff1b6b
Start of VCF implementation
mlamkin7 Apr 13, 2022
b79afac
fixing relative imports
Apr 13, 2022
f5ca258
fixed merge conflict
mlamkin7 Apr 13, 2022
39de906
Merge branch 'karyogram' into feat/vcf_output
mlamkin7 Apr 13, 2022
0a32585
Merge pull request #28 from gymrek-lab/feat/vcf_output
mlamkin7 Apr 13, 2022
2365418
VCF and HAP file examples
s041629 Apr 14, 2022
dc3c3ce
copy variant module from happler
aryarm Apr 14, 2022
3d033d9
cleaning up karyogram code
Apr 14, 2022
aa2f682
Merge branch 'karyogram' of https://github.com/gymrek-lab/haptools in…
Apr 14, 2022
b8c74e2
Added specifying chromosome functionality to simulating genotypes.
mlamkin7 Apr 14, 2022
3ea4e83
DAT example files
s041629 Apr 14, 2022
5600ba5
Merge pull request #29 from gymrek-lab/feat/vcf_output
mlamkin7 Apr 14, 2022
89cc333
Example DAT files
s041629 Apr 14, 2022
5a11f98
Merge pull request #31 from gymrek-lab/karyogram
gymreklab Apr 14, 2022
420903d
cleaning up karyogram code
Apr 14, 2022
769a106
Merge pull request #32 from gymrek-lab/karyogram
gymreklab Apr 14, 2022
9a0d40c
Admixed individuals from the US
s041629 Apr 14, 2022
9a0fa20
start on work for haplotype parser
aryarm Apr 14, 2022
93e28a4
solving karyogram color issues when color not specified
Apr 14, 2022
e06dba6
Fixed error with regex parsing of chromosomes
mlamkin7 Apr 14, 2022
36ca43e
Merge pull request #34 from gymrek-lab/feat/vcf_output
mlamkin7 Apr 14, 2022
2b88859
adding centromeres
Apr 14, 2022
fcf2b1e
cleaning up karyogram user interface
Apr 14, 2022
b13a5d7
cleaning up karyogram user interface
Apr 14, 2022
1f3b67d
Merge pull request #35 from gymrek-lab/karyogram
gymreklab Apr 14, 2022
7bcd48a
catching user errors in karyogram
Apr 14, 2022
6f36202
Merge pull request #36 from gymrek-lab/karyogram
gymreklab Apr 14, 2022
84873cd
fixing broken links in readme to docs that moved
Apr 14, 2022
c789595
adding karyogram docs page
Apr 14, 2022
4b4cfa6
adding karyogram docs page
Apr 14, 2022
6834440
adding test karyogram image
Apr 14, 2022
b73ab38
adding test karyogram image
Apr 14, 2022
84045b2
typos in karyogram docs
Apr 14, 2022
e8ccbff
typos in karyogram docs
Apr 14, 2022
92e42a2
adding example breakpoints
Apr 14, 2022
3644aed
Merge pull request #37 from gymrek-lab/docs
gymreklab Apr 14, 2022
8dc7bd0
Fixed floating point error
mlamkin7 Apr 15, 2022
1391b0c
continue implementing Haplotypes.read and Haplotypes.iterate methods
aryarm Apr 17, 2022
d488acd
Merge branch 'main' of https://github.com/gymrek-lab/haptools into fe…
mlamkin7 Apr 19, 2022
c51467b
create Haplotype and Variant classes for storing lines from .haps files
aryarm Apr 19, 2022
c8428fa
create specific section in docs for file formats
aryarm Apr 19, 2022
2015215
fix issues with commands not appearing in toc of docs
aryarm Apr 19, 2022
1879e04
add docs for .hap haplotypes file format
aryarm Apr 19, 2022
0a2af60
copy variant module from happler
aryarm Apr 14, 2022
7c8d182
start on work for haplotype parser
aryarm Apr 14, 2022
99071f8
continue implementing Haplotypes.read and Haplotypes.iterate methods
aryarm Apr 17, 2022
c5500fe
create Haplotype and Variant classes for storing lines from .haps files
aryarm Apr 19, 2022
32bd815
create specific section in docs for file formats
aryarm Apr 19, 2022
600e032
fix issues with commands not appearing in toc of docs
aryarm Apr 19, 2022
8cb274a
add docs for .hap haplotypes file format
aryarm Apr 19, 2022
dc63ed2
Merge branch 'feat/haplotypes' of github.com:gymrek-lab/haptools into…
aryarm Apr 19, 2022
8f856b0
rename hap data files
aryarm Apr 19, 2022
91856b4
create new example hap files with beta added
aryarm Apr 19, 2022
1f24949
Added pulse events
mlamkin7 Apr 19, 2022
8a71d2b
Merge pull request #39 from gymrek-lab/feat/vcf_output
mlamkin7 Apr 19, 2022
5aa0deb
change allele to str in hap format spec
aryarm Apr 19, 2022
2dc7a74
fixed error in arguments for simulate gt
mlamkin7 Apr 19, 2022
465f6ec
Merge pull request #40 from gymrek-lab/feat/vcf_output
mlamkin7 Apr 19, 2022
f5fea3a
Update cuba.dat
gymreklab Apr 19, 2022
ade8b0d
Update README.md
gymreklab Apr 19, 2022
a62a03b
correct type-hinting of return of Haplotypes.iterate
aryarm Apr 19, 2022
e06b7f9
renaming sim_admixture to sim_genotypes to match cmd name
Apr 19, 2022
a784e6b
use fname property in Haplotypes.write
aryarm Apr 19, 2022
b324fe1
updating help messages for simgenotype
Apr 19, 2022
64c92af
renaming sim_genotype
Apr 19, 2022
5175676
Merge pull request #41 from gymrek-lab/simgenotypes-docs
gymreklab Apr 19, 2022
cf82d4f
start handling extras in Haplotypes class
aryarm Apr 21, 2022
555deba
store variants as tuple intead of list in Haplotype class
aryarm Apr 21, 2022
ec69ae7
rewrite from_hap_spec to automatically use properties from subclasses
aryarm Apr 21, 2022
0eff78d
define new haplotype class for haptools
aryarm Apr 21, 2022
41b104e
updated chroms default values
mlamkin7 Apr 21, 2022
d8bb990
Merge pull request #42 from gymrek-lab/feat/vcf_output
mlamkin7 Apr 21, 2022
0973654
Update simgenotype.md
mlamkin7 Apr 21, 2022
5ba8f78
check header lines in Haplotypes.read
aryarm Apr 21, 2022
54a0617
add docs for usage of the .hap file
aryarm Apr 21, 2022
4f7c7fa
fmt with black
aryarm Apr 21, 2022
b196736
rebuild api docs with haplotypes.py
aryarm Apr 21, 2022
3e2a426
add examples for Haplotypes class
aryarm Apr 22, 2022
7e86aaf
validate that all extras are there in Haplotypes.check_ex_header
aryarm Apr 22, 2022
6232929
make _fmt a private field
aryarm Apr 23, 2022
62ab36b
convert iterate to __iter__ in data module
aryarm Apr 23, 2022
b369539
add more examples and docs to haplotypes class
aryarm Apr 23, 2022
f2fe5ac
add example hap files to docs
aryarm Apr 23, 2022
effb035
create smaller hap example files
aryarm Apr 23, 2022
b3d05ff
add HaplotypeTests class to testing module
aryarm Apr 23, 2022
fb27999
call __iter__ from read in Haplotypes class
aryarm Apr 23, 2022
3cca45f
use basic.hap in haplotypes examples
aryarm Apr 23, 2022
054a01b
add indexed basic hap and test example.hap.gz
aryarm Apr 23, 2022
5080728
test Haplotypes.write() method
aryarm Apr 23, 2022
e2bf695
add header lines to example.hap
aryarm Apr 23, 2022
4480b19
reformat with black -- oops
aryarm Apr 24, 2022
6d3b598
require sorting of line type symbols for indexed hap files
aryarm Apr 24, 2022
68b3047
Delete nohup.out
mlamkin7 Apr 28, 2022
f0e173c
Delete nohup.out
mlamkin7 Apr 28, 2022
394fa20
Updated documentation
mlamkin7 May 5, 2022
c3074c7
Merge branch 'main' of https://github.com/gymrek-lab/haptools into fe…
mlamkin7 May 5, 2022
b5ca354
Merge branch 'feat/vcf_output' of https://github.com/gymrek-lab/hapto…
mlamkin7 May 5, 2022
6a7c5ee
add Extra object encoding extra fields in Haplotypes module
aryarm May 9, 2022
2edcb52
revise hap test data files to pass tests
aryarm May 9, 2022
122f062
Merge branch 'feat/haplotypes' of github.com:gymrek-lab/haptools into…
aryarm May 9, 2022
68ba119
add docs for new extra field declarations in header
aryarm May 9, 2022
001a28e
Preallocate np array when loading genotypes
aryarm May 9, 2022
daaeadf
retest genotypes module after changes
aryarm May 10, 2022
eb641c0
create transform subcommand
aryarm May 10, 2022
95a1619
create TestGenotypes class in testing module
aryarm May 10, 2022
2e33f4a
test variant selection in Genotypes class
aryarm May 10, 2022
e54143a
refmt with black
aryarm May 10, 2022
60bda2b
create Data.unset() to check if data is unset
aryarm May 10, 2022
56ea690
add variants param to Genotypes.load()
aryarm May 10, 2022
e830119
output from a file path in transform subcommand
aryarm May 11, 2022
c1b55ff
create Genotypes class that also stores REF/ALT
aryarm May 11, 2022
db74659
create Haplotype.transform function
aryarm May 11, 2022
b72d1d3
create Haplotypes.transform function and add tests
aryarm May 11, 2022
e084ea8
write Haplotypes to a VCF
aryarm May 11, 2022
2c1dc3c
refmt with black and get rid of HaplotypesGT class
aryarm May 11, 2022
9e83254
clean up transform docs
aryarm May 11, 2022
6bad9d8
warn against importing at the top of __main__
aryarm May 11, 2022
4384cb8
clean up duplicated code in Genotypes class
aryarm May 13, 2022
259aaee
add Genotypes._prephased attr to ignore phasing while debugging
aryarm May 13, 2022
e72f2d3
allow for discarding samples that are missing genotypes
aryarm May 13, 2022
13c06e7
add more docs and messages to Genotypes and Haplotypes classes
aryarm May 13, 2022
1410315
require GenotypeRefAlt instance as input to Haplotypes.transform
aryarm May 13, 2022
8b502d1
Incomplete VCF output
mlamkin7 May 14, 2022
8ccb7d2
refmt with black
aryarm May 14, 2022
75e75be
prelim code for other gts readers
aryarm May 14, 2022
34a839d
Merge pull request #45 from gymrek-lab/feat/transform
aryarm May 14, 2022
9b4393a
Merge branch 'feat/haplotypes' into main
aryarm May 14, 2022
d092a1e
fix TypeError in sim_genotype._write_vcf
aryarm May 18, 2022
95ddb6e
Merge branch 'main' of https://github.com/gymrek-lab/haptools into fe…
mlamkin7 Jun 9, 2022
f1432d4
need to stash changes
mlamkin7 Jun 9, 2022
13af0a5
fixed merging issues
mlamkin7 Jun 9, 2022
09ba1ed
Completed output vcf
mlamkin7 Jun 15, 2022
5efd815
Completed Output VCF and one test case
mlamkin7 Jun 16, 2022
6c81f8d
Added validation to input files for sim genotype
mlamkin7 Jun 17, 2022
f1a2f12
Merge pull request #53 from gymrek-lab/feat/vcf_output
mlamkin7 Jun 27, 2022
229b285
Added 1000genomes sampleinfo file for input.
mlamkin7 Jun 29, 2022
e4aa6e2
Validation of map files' fields during execution.
mlamkin7 Jun 29, 2022
49998d8
Updated with Arya's recommendations on pull request
mlamkin7 Jun 29, 2022
738eaeb
Merge branch 'main' of https://github.com/gymrek-lab/haptools into fe…
mlamkin7 Jun 29, 2022
8a931f9
Updated docs to include simgenotypes
mlamkin7 Jun 29, 2022
a760af3
Fixed bug with isfile instead of isdir
mlamkin7 Jun 30, 2022
07e51b9
Merge pull request #56 from gymrek-lab/feat/vcf_output
mlamkin7 Jun 30, 2022
09a757c
Updated Example in __main__.py
mlamkin7 Jun 30, 2022
8f91078
Update simgenotype.md
mlamkin7 Jun 30, 2022
8e95499
Update simgenotype.md
mlamkin7 Jun 30, 2022
2b5e501
Update simgenotype.md
mlamkin7 Jun 30, 2022
d525dce
Update simgenotype.md
mlamkin7 Jun 30, 2022
4bdece4
Update simgenotype.md
mlamkin7 Jun 30, 2022
80ce432
Working example.
mlamkin7 Jun 30, 2022
54ed5a4
working example
mlamkin7 Jun 30, 2022
d76c013
Merge pull request #57 from gymrek-lab/feat/vcf_output
mlamkin7 Jun 30, 2022
703d36e
Merge branch 'main' of https://github.com/gymrek-lab/haptools into fe…
mlamkin7 Jun 30, 2022
062afdc
Merge pull request #58 from gymrek-lab/feat/vcf_output
mlamkin7 Jun 30, 2022
f021d3a
minor cleanups to simgenotype output messages
Jun 30, 2022
9e63e06
Merge pull request #59 from gymrek-lab/test-simgenotype
mlamkin7 Jun 30, 2022
d4b10ad
stashing changes
mlamkin7 Jun 30, 2022
c011f60
Fixed merge conflicts and added minor changes
mlamkin7 Jun 30, 2022
48363a1
Updated 1000 genomes documentation under formats
mlamkin7 Jun 30, 2022
3d99787
fixed simgenotype --help command output
mlamkin7 Jun 30, 2022
f17edd5
Merge pull request #60 from gymrek-lab/feat/vcf_output
mlamkin7 Jun 30, 2022
ac5aa13
adding apoe haplotype example
Jul 1, 2022
db2aa61
changing apoe4 to hg19
Jul 1, 2022
7e32b51
updating apoe4 example in transform docs
Jul 1, 2022
d35c1ec
Merge pull request #61 from gymrek-lab/test-transform
gymreklab Jul 1, 2022
0dee04c
Added SAMPLE Format Field
mlamkin7 Jul 2, 2022
a26a01f
Merge pull request #62 from gymrek-lab/feat/vcf_output
mlamkin7 Jul 2, 2022
c1fabfb
Fixed issue in test_outputvcf.py
mlamkin7 Jul 2, 2022
3acdbc1
Merge pull request #63 from gymrek-lab/feat/vcf_output
mlamkin7 Jul 2, 2022
295c40b
update poetry to v1.2
aryarm Mar 6, 2022
c595077
add pgenlib as a dependency
aryarm Mar 7, 2022
5343191
oops - specify it as a tag instead of rev
aryarm Mar 7, 2022
803f8e0
update pyproject to poetry-core >=1.1.0b1
aryarm Jul 9, 2022
Fixed error with regex parsing of chromosomes
mlamkin7 committed Apr 14, 2022

commit e06dba62487967d85fcad520a7d3d624a18d990d
6 changes: 2 additions & 4 deletions haptools/sim_admixture.py
@@ -87,7 +87,6 @@ def simulate_gt(model_file, coords_dir, chroms, popsize, seed=None):
Return

"""
- # TODO new parameter for range of chroms we want to work on
# initialize seed used for breakpoints
if seed:
np.random.seed(seed)
@@ -103,7 +102,7 @@ def simulate_gt(model_file, coords_dir, chroms, popsize, seed=None):
coords = []

def numeric_alpha(x):
- chrom = re.search(r'(?<=chr)X|\d+', x).group()
+ chrom = re.search(r'(?<=chr)(X|\d+)', x).group()
if chrom == 'X':
return 23
else:
@@ -113,7 +112,7 @@ def numeric_alpha(x):
# remove all chr files not found in chroms list
all_coord_files = glob.glob(f'{coords_dir}/*.map')
all_coord_files = [coord_file for coord_file in all_coord_files \
- if re.search(r'(?<=chr)X|\d+', coord_file).group() in chroms]
+ if re.search(r'(?<=chr)(X|\d+)', coord_file).group() in chroms]
all_coord_files.sort(key=numeric_alpha)

# coords list has form chroms x coords
@@ -221,7 +220,6 @@ def write_breakpoints(samples, breakpoints, out):
return breakpoints

def _simulate(samples, pops, pop_fracs, pop_gen, chroms, coords, end_coords, recomb_probs, prev_gen_samples=None):
- # TODO incorporate chroms variable from sim_genotype in order to limit range of admixture done on
# convert chroms to integer and change X to 23
chroms = [int(chrom) if chrom != 'X' else 23 for chrom in chroms]
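
Note on the regex change: in the old pattern the alternation binds more loosely than the lookbehind, so `(?<=chr)X|\d+` reads as "an `X` preceded by `chr`, or any run of digits anywhere". A digit earlier in the path can therefore win the search. Grouping the alternation as `(?<=chr)(X|\d+)` applies the lookbehind to both branches. The sketch below illustrates the difference. The file paths are hypothetical, `chrom_label` is a helper introduced only for this illustration, and the `int(chrom)` branch of `numeric_alpha` is an assumption because the diff cuts off after `else:`.

```python
import re

# Patterns copied from the diff above.
OLD_PATTERN = r'(?<=chr)X|\d+'
NEW_PATTERN = r'(?<=chr)(X|\d+)'


def chrom_label(pattern, path):
    """Hypothetical helper: return the first match of pattern in path."""
    return re.search(pattern, path).group()


def numeric_alpha(path):
    """Sort key modelled on the diff: X maps to 23, digits to their int value.

    The int() branch is assumed; the diff does not show the else body.
    """
    chrom = chrom_label(NEW_PATTERN, path)
    return 23 if chrom == 'X' else int(chrom)


# Hypothetical map files whose directory name happens to contain a digit.
files = ['maps_v2/chrX.map', 'maps_v2/chr12.map', 'maps_v2/chr3.map']

# Old pattern: the bare \d+ branch matches the '2' in 'maps_v2' first.
old_labels = [chrom_label(OLD_PATTERN, f) for f in files]

# New pattern: both branches must follow 'chr'.
new_labels = [chrom_label(NEW_PATTERN, f) for f in files]

sorted_files = sorted(files, key=numeric_alpha)

print(old_labels)
print(new_labels)
print(sorted_files)
```

With paths that contain no other digits (for example `chr12.map` on its own), both patterns return the same label, which is likely why the bug only surfaced for some directory layouts.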