functions to get binary matrix for geno + pheno data #11

Open
katholt opened this issue Jan 24, 2025 · 6 comments
@katholt
Contributor

katholt commented Jan 24, 2025

Created getBinMat, which takes genotype & phenotype tables (as output by the genotype 'parser' and 'import_ncbi_ast' functions) and outputs a new data frame with binary (1/0) variables for:

  • phenotype for a specified drug (two columns: R, NWT)
  • genetic markers relevant to a specified list of drug classes (presence/absence, one column per marker)
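
A minimal sketch of the idea in base R, on toy data. This is not the actual getBinMat code: the function name `bin_mat_sketch`, the column names (`id`, `drug`, `pheno`, `marker`, `drug_class`) and the rule "NWT = I or R" are assumptions made for illustration.

```r
# Toy inputs; all column names here are assumptions for illustration only.
pheno_table <- data.frame(
  id = c("s1", "s2", "s3"),
  drug = c("ciprofloxacin", "ciprofloxacin", "ciprofloxacin"),
  pheno = c("R", "I", "S"),
  stringsAsFactors = FALSE
)
geno_table <- data.frame(
  id = c("s1", "s1", "s2"),
  marker = c("gyrA_S83L", "blaTEM-1", "qnrS1"),
  drug_class = c("Quinolones", "Beta-lactams", "Quinolones"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not the package function.
bin_mat_sketch <- function(geno, pheno, antibiotic, drug_class_list,
                           sir_col = "pheno") {
  # Binary phenotype: R = resistant; NWT assumed here to mean I or R
  p <- pheno[pheno$drug == antibiotic, , drop = FALSE]
  out <- data.frame(
    id = p$id,
    R = as.integer(p[[sir_col]] == "R"),
    NWT = as.integer(p[[sir_col]] %in% c("I", "R")),
    stringsAsFactors = FALSE
  )
  # Binary genotype: one presence/absence column per marker in the chosen classes
  g <- geno[geno$drug_class %in% drug_class_list, , drop = FALSE]
  for (m in unique(g$marker)) {
    out[[m]] <- as.integer(out$id %in% g$id[g$marker == m])
  }
  out
}

bin_mat <- bin_mat_sketch(geno_table, pheno_table, "ciprofloxacin", "Quinolones")
print(bin_mat)
```

In this sketch a sample with a phenotype but no rows in the genotype table (s3) gets 0 for every marker, so "not genotyped" and "no markers detected" are indistinguishable.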

Added examples to readme

To do

  1. add optional list of strains that were subjected to genotyping, so we can fully account for those in which no markers were detected
  2. if no drug_class_list specified, determine relevant drug classes to get markers for
  3. generalise to deal with multiple drugs, or all drugs present in the phenotype table
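
One possible way to handle to-do item 1, sketched in base R. The argument name `genotyped` and the choice to code non-genotyped strains as NA are assumptions, not a decided design: strains on the genotyped list with no marker rows get 0, and strains not on the list get NA.

```r
# Hypothetical inputs: marker hits, plus the IDs of every strain that was genotyped
geno_table <- data.frame(
  id = c("s1", "s2"),
  marker = c("gyrA_S83L", "qnrS1"),
  stringsAsFactors = FALSE
)
genotyped_ids <- c("s1", "s2", "s3")    # s3 was genotyped, no markers found
pheno_ids <- c("s1", "s2", "s3", "s4")  # s4 has a phenotype but was not genotyped

# Hypothetical helper, not part of the package.
marker_matrix <- function(geno, sample_ids, genotyped = NULL) {
  out <- data.frame(id = sample_ids, stringsAsFactors = FALSE)
  for (m in unique(geno$marker)) {
    present <- as.integer(sample_ids %in% geno$id[geno$marker == m])
    if (!is.null(genotyped)) {
      # Absence (0) is only meaningful for strains known to have been genotyped
      present[!(sample_ids %in% genotyped)] <- NA_integer_
    }
    out[[m]] <- present
  }
  out
}

markers <- marker_matrix(geno_table, pheno_ids, genotyped_ids)
print(markers)
```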
@katholt katholt self-assigned this Jan 24, 2025
@katholt
Contributor Author

katholt commented Jan 24, 2025

Added option 'keep_assay_values' (default FALSE)

If set to TRUE, the mic and disk columns for the specified antibiotic will be included in the output matrix

  • currently this just assumes there are columns labelled 'mic' and/or 'disk' and adds these to the output
  • this works with the output of 'import_ncbi_ast', but we will need to do one of the following:
  1. decide whether to rename these to include the drug name (as was done for the 'R' and 'NWT' binary output columns) - CHANGED so all column headers are generic ('pheno', 'R', 'WT') with no drug name
  2. try to auto-identify columns of class 'mic' or 'disk' instead, or
  3. let the user specify quantitative column/s to retain - DONE

@katholt
Contributor Author

katholt commented Jan 24, 2025

Added parameter keep_assay_values_from=c("mic", "disk") so users can specify which phenotype columns to retain in the output

There could be scenarios where this is all you want for pheno data, and you don't want binary phenotype output (or don't have appropriate columns to calculate it), so we should make it possible to output ONLY assay data + binary genotype
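
A rough base R sketch of how the named assay columns could be carried into the output. The helper `add_assay_values`, the `id` join column and the toy values are assumptions for illustration; only the parameter name `keep_assay_values_from` and its default come from this thread.

```r
# Toy phenotype table with a 'mic' column but no 'disk' column
pheno_table <- data.frame(
  id = c("s1", "s2"),
  pheno = c("R", "S"),
  mic = c(32, 0.25),
  stringsAsFactors = FALSE
)
# Toy binary matrix, standing in for the function's output so far
bin_mat <- data.frame(
  id = c("s1", "s2"),
  R = c(1L, 0L),
  NWT = c(1L, 0L),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not the package code.
add_assay_values <- function(bin_mat, pheno,
                             keep_assay_values_from = c("mic", "disk")) {
  # Keep only the requested columns that actually exist in the phenotype table
  keep <- intersect(keep_assay_values_from, names(pheno))
  if (length(keep) == 0) return(bin_mat)
  merge(bin_mat, pheno[, c("id", keep), drop = FALSE], by = "id", all.x = TRUE)
}

with_assay <- add_assay_values(bin_mat, pheno_table)
print(with_assay)
```

Requested columns that are missing (here 'disk') are silently skipped in this sketch; a real implementation might prefer to warn.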

@natacha-couto
Collaborator

does this now include the measurement sign?

@katholt
Contributor Author

katholt commented Jan 24, 2025

hi @natacha-couto

This function doesn't currently interpret MIC or disk data into S/I/R.

The input pheno_table is expected to already have a column with interpreted phenotypes; the name of this column must be specified as a parameter.

@katholt
Contributor Author

katholt commented Jan 24, 2025

TO DO: add option to return S/I/R values also (helpful for downstream plots/analysis including solo_ppv_stats and upset plots) - DONE

@katholt
Contributor Author

katholt commented Jan 25, 2025

updated import_ncbi_ast to interpret against ecoff and report in column 'ecoff' coded as WT/NWT

should now update getBinMat to use this if available, rather than defining NWT column based on I/R vs S - DONE
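
The fallback logic described above could look roughly like this in base R. The helper name `define_nwt` and the column names `pheno` and `ecoff` as defaults are assumptions for illustration, not the actual implementation.

```r
# Hypothetical helper: prefer the ECOFF call if present, else fall back to S/I/R
define_nwt <- function(pheno, sir_col = "pheno", ecoff_col = "ecoff") {
  if (ecoff_col %in% names(pheno)) {
    # 'ecoff' column assumed to be coded WT/NWT; NA stays NA
    as.integer(pheno[[ecoff_col]] == "NWT")
  } else {
    # Fallback: treat I or R as non-wildtype, keeping missing calls as NA
    sir <- pheno[[sir_col]]
    nwt <- as.integer(sir %in% c("I", "R"))
    nwt[is.na(sir)] <- NA_integer_
    nwt
  }
}

with_ecoff <- data.frame(pheno = c("S", "S", "R"),
                         ecoff = c("WT", "NWT", "NWT"),
                         stringsAsFactors = FALSE)
without_ecoff <- data.frame(pheno = c("S", "I", "R", NA),
                            stringsAsFactors = FALSE)

nwt_a <- define_nwt(with_ecoff)
nwt_b <- define_nwt(without_ecoff)
```

The second row of `with_ecoff` shows why the distinction matters: an isolate can be clinically S yet NWT against the ECOFF, which the I/R-based fallback would miss.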
