Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporate uniprot id into gene info #160

Conversation

beatrizsaldana
Copy link
Member

@beatrizsaldana beatrizsaldana commented Nov 23, 2024

AG-1392 - Incorporate uniprot ID into gene_info

Adding uniprotkb_accessions column to the gene_info dataset using the uniprot to ensembl mapping file (syn54113663).

Changes

  1. Updated config.yaml and test_config.yaml to include the uniprot mapping dataset
  2. Updated existing gene_info tests to include a uniprotkb_accessions column
  3. Updated gene_info output assets to include a uniprotkb_accessions column
  4. Added ensg_to_uniprot_mapping_good.tsv to the gene_info testing assets
  5. Added gx validation for the uniprotkb_accessions column

@beatrizsaldana beatrizsaldana marked this pull request as ready for review November 23, 2024 00:19
@beatrizsaldana beatrizsaldana requested a review from a team as a code owner November 23, 2024 00:19
Copy link
Contributor

@BWMac BWMac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the commented out regex GX expectation still meant to be there? - If you wanted to check if the list members matched this regex you could do that by treating the field as nested and using a JSON schema validation expectation.

Copy link
Contributor

@JessterB JessterB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-approving this, but please add the file version number in the four places Brad flagged. Otherwise we will default to whatever the latest version of the file is, and that can result in unexpected data surprises and murky provenance...

@beatrizsaldana beatrizsaldana requested a review from BWMac November 25, 2024 19:36
output_df = gene_info.add_uniprot_id_to_gene_info(gene_info_df, uniprot_df)

# Read in expected dataframe and convert uniprotkb_accessions to list
expected_df = pd.read_csv(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matter of opinion (so, optional): It might be cleaner to save the expected output as a json file so no conversion code is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So true, I like your suggestion better than what I implemented, thank you!

Copy link
Contributor

@jaclynbeck-sage jaclynbeck-sage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a little bit of moving code around but it's almost there!

@@ -227,6 +233,23 @@ class TestTransformGeneInfo:
"Fail with bad data type in tep_adi_info's is_adi column",
"Fail with bad data type in tep_adi_info's is_tep column",
]
unpiprot_input_files = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section goes away since the test for the function no longer exists?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm if this section should go away, or stay?

Copy link
Contributor

@jaclynbeck-sage jaclynbeck-sage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just cleanup of unused variables and files for testing, everything else looks great!

Copy link
Contributor

@BWMac BWMac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Pre-approving pending finishing up Jaclyn's comments

Copy link
Contributor

@jaclynbeck-sage jaclynbeck-sage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, approved!

@beatrizsaldana beatrizsaldana merged commit ae4cf82 into dev Nov 27, 2024
9 checks passed
@beatrizsaldana beatrizsaldana deleted the beatrizsaldana/AG-1392/ADT-Incorporate-uniprot-ID-into-gene_info branch December 2, 2024 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants