Mgymrek docs admixsim #14

Merged 10 commits, Feb 27, 2022
5 changes: 4 additions & 1 deletion .gitignore
@@ -9,4 +9,7 @@ __pycache__
# pytest cache
.pytest_cache
# poetry
dist/
dist/

# OSX
*.DS_Store*
29 changes: 27 additions & 2 deletions README.md
@@ -4,7 +4,32 @@
Please wait until we have published our first tagged release before using our code.

# haptools
Simulate phenotypes for fine-mapping. Use real variants to simulate real, biological LD patterns.
The Snakemake pipeline in the `snakemake/` directory uses the results of the simulation to test several fine-mapping methods, including FINEMAP and SuSiE.

Haptools is a collection of tools for simulating and analyzing genotypes and phenotypes while taking into account haplotype information. It is particularly designed for analysis of individuals with admixed ancestries, although the tools can also be used for non-admixed individuals.

Homepage: https://haptools.readthedocs.io/

## Installation

UNDER CONSTRUCTION

## Haptools utilities

Haptools consists of multiple utilities listed below. Click on a utility to see more detailed usage information.

* [`haptools simgenotype`](haptools/simgenotype/README.md): Simulate genotypes for admixed individuals under user-specified demographic histories.

* [`haptools simphenotype`](haptools/simphenotype/README.md): Simulate a complex trait, taking into account local-ancestry- or haplotype-specific effects. `haptools simphenotype` takes a VCF file as input and outputs simulated phenotypes for each sample.

* [`haptools karyogram`](haptools/karyogram/README.md): Visualize a "chromosome painting" of local ancestry labels based on breakpoints output by `haptools simgenotype`.

Outputs produced by these utilities are compatible with each other. For example, `haptools simgenotype` outputs a VCF file with local ancestry information annotated for each variant. This VCF file can be used as input to `haptools simphenotype` to simulate phenotype information. `haptools simgenotype` also outputs a list of local ancestry breakpoints, which can be visualized using `haptools karyogram`.


## Contributing

If you are interested in contributing to `haptools`, please get in touch by submitting a GitHub issue or contacting us at [email protected].



3 changes: 3 additions & 0 deletions haptools/karyogram/README.md
@@ -0,0 +1,3 @@
# Haptools karyogram

UNDER CONSTRUCTION
File renamed without changes.
File renamed without changes.
86 changes: 86 additions & 0 deletions haptools/simgenotype/README.md
@@ -0,0 +1,86 @@
# Haptools simgenotype

`haptools simgenotype` takes as input a reference set of haplotypes in VCF format and a user-specified admixture model. It outputs a VCF file with simulated genotypes for admixed individuals, as well as a breakpoints file that can be used for visualization.

## Basic usage

```
haptools simgenotype \
--invcf REFVCF \
--sample_info SAMPLEINFOFILE \
--model MODELFILE \
--map GENETICMAP \
--out OUTPREFIX
```

Detailed information about each option, and example commands using publicly available files, are shown below.

## Detailed usage

* `--invcf` - Input VCF file used to simulate specific haplotypes for the resulting samples
* `--sample_info` - File used to map samples in `REFVCF` to populations found in `MODELFILE`
* `--model` - Parameters for simulating admixture across generations
* `--map` - .map file used to determine recombination events during the simulation
* `--out` - Output prefix of the form `/path/to/output`, which results in the VCF file `output.vcf.gz` and the breakpoints file `output.bp`

## File formats

Model Format

Structure of model.dat file

* `num_samples` - Total number of samples to be output by the simulator (`num_samples*2` haplotypes)
* `num_generations` - Number of generations to simulate admixture; must be > 0
* `*_freq` - Frequency of each population in the simulated samples

```
{num_samples} Admixed Pop_label1 Pop_label2 ... Pop_labeln
{num_generations} {admixed_freq} {pop_label1_freq} {pop_label2_freq} ... {pop_labeln_freq}
```

Example model.dat file

```
40 Admixed CEU YRI
6 0 0.2 0.8
```
Simulating 6 generations in this case means that the first generation has population frequencies `Admixed=0, CEU=0.2, YRI=0.8`, and the remaining generations (2-6) have population frequencies `Admixed=1, CEU=0, YRI=0`.
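
The layout described above can be read with a few lines of Python. This is only a sketch under stated assumptions, not the parser used by `haptools`: the names `Generation` and `parse_model` are hypothetical, fields are assumed to be whitespace-separated, and every line after the header is assumed to describe one generation entry.

```python
from dataclasses import dataclass


@dataclass
class Generation:
    """One row of the model file (hypothetical helper type)."""
    number: int   # value of the {num_generations} column
    freqs: dict   # population label -> frequency


def parse_model(text):
    """Parse model.dat contents into (num_samples, list of Generation).

    Assumes whitespace-separated fields, as in the example file above.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    num_samples = int(header[0])
    labels = header[1:]  # e.g. ["Admixed", "CEU", "YRI"]
    generations = []
    for row in rows:
        num_gen = int(row[0])
        if num_gen <= 0:
            raise ValueError("num_generations must be > 0")
        freqs = [float(x) for x in row[1:]]
        if len(freqs) != len(labels):
            raise ValueError("expected one frequency per population label")
        generations.append(Generation(num_gen, dict(zip(labels, freqs))))
    return num_samples, generations


# The example model.dat file from this README.
example = "40 Admixed CEU YRI\n6 0 0.2 0.8\n"
num_samples, generations = parse_model(example)
```

For the example file, `num_samples` is 40 (so 80 haplotypes) and the single row maps `Admixed`, `CEU` and `YRI` to 0, 0.2 and 0.8.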

Map Format

* `chr` - Chromosome of the coordinate (1-22, X)
* `var` - Variant identifier
* `pos cM` - Position in centimorgans
* `pos bp` - Base pair coordinate

```
{chr}\t{var}\t{pos cM}\t{pos bp}
```
Beagle Genetic Maps used in simulation (GRCh38): http://bochet.gcc.biostat.washington.edu/beagle/genetic_maps/
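
A record in the four-column layout above can be split as follows. This is a minimal sketch, not code from `haptools`: `MapRecord` and `parse_map_line` are hypothetical names, and the sample line is made up for illustration rather than taken from a real genetic map.

```python
from typing import NamedTuple


class MapRecord(NamedTuple):
    """One line of a .map file (hypothetical helper type)."""
    chrom: str     # kept as a string so that "X" is representable
    var: str       # variant identifier
    pos_cm: float  # position in centimorgans
    pos_bp: int    # base pair coordinate


def parse_map_line(line):
    """Split one tab-separated .map line into a typed record."""
    chrom, var, pos_cm, pos_bp = line.rstrip("\n").split("\t")
    return MapRecord(chrom, var, float(pos_cm), int(pos_bp))


# Made-up record for illustration only.
record = parse_map_line("1\trs0000001\t0.5\t100000\n")
```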


Outfile Format

* `Sample Header` - Name of the sample, following the structure `Sample_{number}_{hap}`, e.g. `Sample_10_1` for sample number 10, haplotype 1
* `pop` - Population label, given as the index of the population in the dat file; in the example above, CEU = 1 and YRI = 2
* `chr` - Chromosome (1-22, X)

```
Sample Header
{pop}\t{chr}\t{pos bp}
...
Sample Header 2
...
```
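
The breakpoints layout above can be grouped by sample with a short reader. This is a sketch under assumptions, not the `haptools` implementation: `parse_breakpoints` is a hypothetical name, a line is treated as a sample header when it contains no tab character, and the sample data is made up for illustration.

```python
def parse_breakpoints(text):
    """Group breakpoint rows under the sample header that precedes them.

    Returns a dict mapping each sample header to a list of
    (pop, chr, pos_bp) tuples, in file order.
    """
    samples = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) == 1:
            # Assumption: header lines are the only lines without tabs.
            current = fields[0]
            samples[current] = []
        else:
            if current is None:
                raise ValueError("breakpoint row found before any sample header")
            pop, chrom, pos_bp = fields
            samples[current].append((int(pop), chrom, int(pos_bp)))
    return samples


# Made-up breakpoints for two haplotypes of one sample.
example = (
    "Sample_1_1\n"
    "1\t1\t50000\n"
    "2\t1\t120000\n"
    "Sample_1_2\n"
    "2\t1\t120000\n"
)
breakpoints = parse_breakpoints(example)
```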

## Examples

Example Command
```
haptools simgenotype \
--invcf 1000Genomes.vcf.gz \
--sample_info /path/to/sampleinfo.csv \
--model /path/to/model/file.dat \
--map /path/to/plink/file/ \
--out /path/to/output
```
3 changes: 3 additions & 0 deletions haptools/simphenotype/README.md
@@ -0,0 +1,3 @@
# Haptools simphenotype

UNDER CONSTRUCTION