-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorporate uniprot id into gene info #160
Incorporate uniprot id into gene info #160
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the commented out regex GX expectation still meant to be there? - If you wanted to check if the list members matched this regex you could do that by treating the field as nested and using a JSON schema validation expectation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-approving this, but please add the file version number in the four places Brad flagged. Otherwise we will default to whatever the latest version of the file is, and that can result in unexpected data surprises and murky provenance...
tests/transform/test_gene_info.py
Outdated
output_df = gene_info.add_uniprot_id_to_gene_info(gene_info_df, uniprot_df) | ||
|
||
# Read in expected dataframe and convert uniprotkb_accessions to list | ||
expected_df = pd.read_csv( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matter of opinion (so, optional): It might be cleaner to save the expected output as a json file so no conversion code is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So true, I like your suggestion better than what I implemented, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a little bit of moving code around but it's almost there!
tests/transform/test_gene_info.py
Outdated
@@ -227,6 +233,23 @@ class TestTransformGeneInfo: | |||
"Fail with bad data type in tep_adi_info's is_adi column", | |||
"Fail with bad data type in tep_adi_info's is_tep column", | |||
] | |||
unpiprot_input_files = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this section goes away since the test for the function no longer exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm if this section should go away, or stay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just cleanup of unused variables and files for testing, everything else looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 Pre-approving pending finishing up Jaclyn's comments
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, approved!
AG-1392 - Incorporate uniprot ID into gene_info
Adding
uniprotkb_accessions
column to thegene_info
dataset using the uniprot to ensembl mapping file (syn54113663).Changes
config.yaml
andtest_config.yaml
to include the uniprot mapping datasetgene_info
tests to include auniprotkb_accessions
columngene_info
output assets to include auniprotkb_accessions
columnensg_to_uniprot_mapping_good.tsv
to thegene_info
testing assetsuniprotkb_accessions
column