function to check sample IDs across geno/pheno tables #9

Open
jhawkey opened this issue Jan 23, 2025 · 2 comments

jhawkey commented Jan 23, 2025

Helper function to check that the geno/pheno files can be merged: do all the sample names in one exist in the other? Print a warning if samples are missing from either table.

jhawkey self-assigned this Jan 23, 2025

katholt commented Jan 23, 2025

Need a helper function, e.g. compareGenoPhenoID:

  • input = pheno data frame and the name of its ID column (otherwise take the first column, or try to find 'sample' or 'biosample'); same for the geno data frame
  • compare the sample IDs, make a list of unique entries that appear in both
  • report the number of overlapping IDs and the number unique to each table
  • return copies of the geno and pheno data frames that each contain the overlapping samples only

This helper function can be used by getBinMat and lots of other functions as a starting point
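
The steps listed above could be sketched roughly as follows. This is only an illustration, not the function committed to helpers.R: the argument names (`geno_col`, `pheno_col`), the shape of the returned list, and the toy data frames are all assumptions, and the ID column simply falls back to the first column when no name is given.

```r
# Hypothetical sketch of compareGenoPhenoID; argument names and the
# return structure are assumptions, not the package's actual API.
compareGenoPhenoID <- function(geno, pheno, geno_col = NULL, pheno_col = NULL) {
  # Fall back to the first column when no ID column name is given
  if (is.null(geno_col)) geno_col <- names(geno)[1]
  if (is.null(pheno_col)) pheno_col <- names(pheno)[1]

  geno_ids <- unique(as.character(geno[[geno_col]]))
  pheno_ids <- unique(as.character(pheno[[pheno_col]]))

  # IDs present in both tables, and IDs unique to each table
  overlap <- intersect(geno_ids, pheno_ids)
  geno_only <- setdiff(geno_ids, pheno_ids)
  pheno_only <- setdiff(pheno_ids, geno_ids)

  message(length(overlap), " sample IDs in both tables; ",
          length(geno_only), " only in geno; ",
          length(pheno_only), " only in pheno")
  if (length(geno_only) > 0 || length(pheno_only) > 0) {
    warning("Some samples are missing from one of the tables")
  }

  # Return the ID sets plus copies of both tables restricted to the overlap
  list(
    overlap = overlap,
    geno_only = geno_only,
    pheno_only = pheno_only,
    geno = geno[as.character(geno[[geno_col]]) %in% overlap, , drop = FALSE],
    pheno = pheno[as.character(pheno[[pheno_col]]) %in% overlap, , drop = FALSE]
  )
}

# Toy data for illustration only
geno <- data.frame(biosample = c("S1", "S2", "S3"), marker = c("m1", "m2", "m3"))
pheno <- data.frame(sample = c("S2", "S3", "S4"), pheno = c("R", "S", "R"))
res <- compareGenoPhenoID(geno, pheno)
```

Returning the filtered data frames together with the ID sets would let a caller such as getBinMat proceed with matched tables while still being able to inspect which samples were dropped.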


jhawkey commented Jan 23, 2025

Made a simple function to check sample IDs, in helpers.R (commit 23bf22b).

It still needs to be made more generic to handle different options for the ID column names.
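
One possible way to handle the column-name options is a small lookup helper like the sketch below. The function name `findSampleIDColumn`, the case-insensitive matching, and the lookup order (explicit name, then 'sample' or 'biosample', then the first column) are assumptions made for illustration and are not taken from the commit.

```r
# Hypothetical helper: choose the sample ID column of a data frame.
# Lookup order is an assumption: explicit name, then common names,
# then the first column.
findSampleIDColumn <- function(df, id_col = NULL,
                               candidates = c("sample", "biosample")) {
  # An explicitly named column must exist
  if (!is.null(id_col)) {
    if (!id_col %in% names(df)) stop("Column '", id_col, "' not found")
    return(id_col)
  }
  # Case-insensitive match against the candidate names, in order
  hit <- match(tolower(candidates), tolower(names(df)))
  hit <- hit[!is.na(hit)]
  if (length(hit) > 0) return(names(df)[hit[1]])
  # Otherwise fall back to the first column
  names(df)[1]
}

# Toy data for illustration only
col_a <- findSampleIDColumn(data.frame(x = 1, BioSample = "a"))
col_b <- findSampleIDColumn(data.frame(id = "a", val = 1))
```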
