Finish testing malaria.em and add wrapper script #56

Open
a-hubbard opened this issue Jan 22, 2025 · 1 comment

Comments

@a-hubbard
Collaborator

Shazia's update on her malaria.em work, copied from the "brain dump" Google Doc:

  • Got the malaria.em code base running (fixed minor bugs in the code) - this is now in the PlasmoGenEpi GitHub repository
    • Spoke to Max about putting into plasmogenepi r-universe to see if it builds
  • Tested malaria.em on test data included in package, Vietnam SNP VCF from PGEforge and Laos allele table from PGEcore
    • Test data: 200 samples + 2 loci
    • Vietnam SNP VCF: 97 samples + 8 loci
      • Very rough run to see when it crashes (runtime grows quickly with the number of possible haplotype combinations)
      • Ran it several times, incrementing the number of bi-allelic loci, and found a rough upper limit of ~9-10 loci, beyond which it gets really slow
    • Laos data: 25 samples + 99 microhaps
      • A few runs to see how it works on microhap data (e.g. incrementing the number of microhaps - but this depends on how diverse the microhaps are)
        • 6 loci works, and we started to cross-check against expected results (this data was simulated by Nick, so we can calculate the ground-truth allele freqs and phased genotypes). Some COI estimates and phased genotypes didn’t match expected - but this was not surprising because we chose the first 6 loci at random, and there may not be sufficient resolution if the loci are not diverse enough
      • So then selected the top 6 loci by highest He (expected heterozygosity) and ran malaria.em (still running as of 15:30 Thursday!) - results TBD
        • Rationale: the use case is drug-resistance (DR) dhps mutants, which need 6-locus multi-locus analyses, so we want to get this running, and it will also be material for the PGEforge tutorial
  • End-to-end dirty scripts running malaria.em on all datasets mentioned above
  • Skeleton of PGEforge tutorial of how to run malaria.em (cleanest for the Laos allele table)
  • Functions to wrangle malaria.em output into gt_freq_summary (population-level multilocus genotype frequency estimates + standard error) and gt_phase_summary (phased genotypes per sample + posterior probability). These are not yet in the proper documented PGEcore module script format
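
For reference, the "top 6 loci by highest He" selection mentioned in the list above could be sketched roughly as follows. This is only an illustration in Python, not the script that was actually run: the `(sample_id, locus, allele)` row layout and the function names are assumptions, and allele frequencies are estimated naively from allele presence per sample, ignoring COI and within-sample proportions.

```python
from collections import Counter, defaultdict


def expected_heterozygosity(allele_counts):
    """He = 1 - sum(p_i^2), given a mapping of allele -> count."""
    total = sum(allele_counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in allele_counts.values())


def top_loci_by_he(rows, n_top=6):
    """Rank loci by He and return the n_top most diverse ones.

    rows: iterable of (sample_id, locus, allele) tuples.
    This layout is hypothetical, not the PGEcore allele table schema.
    """
    seen = set()
    counts = defaultdict(Counter)
    for sample_id, locus, allele in rows:
        key = (sample_id, locus, allele)
        if key in seen:
            continue  # count each allele at most once per sample and locus
        seen.add(key)
        counts[locus][allele] += 1
    he = {locus: expected_heterozygosity(c) for locus, c in counts.items()}
    # Highest He first; ties are broken by locus name for reproducibility.
    ranked = sorted(he, key=lambda locus: (-he[locus], locus))
    return ranked[:n_top], he


# Toy data, made up purely for illustration.
toy_rows = [
    ("s1", "L1", "A"), ("s2", "L1", "A"),
    ("s1", "L2", "A"), ("s2", "L2", "B"),
    ("s1", "L3", "A"), ("s2", "L3", "B"), ("s3", "L3", "C"),
]
top_two, he_by_locus = top_loci_by_he(toy_rows, n_top=2)
```

A finite-sample correction (multiplying He by n/(n-1)) is left out to keep the sketch short; it would not change the ranking much unless sample counts differ a lot between loci.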

End of copied text.
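
As a back-of-envelope illustration of why runtime depends so strongly on the number of loci in the update above, the size of the haplotype search space can be counted as below. This is a rough upper bound under stated assumptions, not a description of how malaria.em enumerates haplotypes internally; the function names are made up for this sketch.

```python
from math import comb, prod


def n_haplotypes(alleles_per_locus):
    """Number of distinct multilocus haplotypes: product of allele counts."""
    return prod(alleles_per_locus)


def n_haplotype_multisets(alleles_per_locus, coi):
    """Number of unordered sets (with repetition) of `coi` haplotypes.

    Standard multiset count: C(H + k - 1, k) for H haplotypes and COI k.
    """
    h = n_haplotypes(alleles_per_locus)
    return comb(h + coi - 1, coi)


# Bi-allelic SNPs: each locus contributes a factor of 2.
for n_loci in (2, 6, 8, 10):
    loci = [2] * n_loci
    print(n_loci, n_haplotypes(loci), n_haplotype_multisets(loci, coi=3))
```

For microhaplotypes the per-locus factor is the number of observed alleles rather than 2, so a handful of diverse loci can reach the same search-space size as many SNPs.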

@shaziaruybal let us know if you have capacity to work on this, or if you need to pass it off.

@shaziaruybal
Collaborator

will aim to finish this and PR into PGEcore, but if I need extra support will reach out :) @a-hubbard
