Add DrugBank labels #335

Merged
merged 6 commits into from
Sep 26, 2024

@gaurav gaurav commented Aug 10, 2024

This PR adds DrugBank labels (from DrugBank v5.1.12). It appears to close #332, though I'm not sure how; an earlier change in PR #279 may be what actually fixed it.

Should be merged after PR #279.

@gaurav gaurav marked this pull request as ready for review August 13, 2024 05:47
@gaurav gaurav requested a review from cbizon August 13, 2024 05:48

@cbizon cbizon left a comment

I dunno about this one. Are the entries in DrugBank really Drugs from our POV? They look like active ingredients (small molecules etc) to me.

I changed it to biolink:Drug previously.

gaurav commented Sep 23, 2024

> I dunno about this one. Are the entries in DrugBank really Drugs from our POV? They look like active ingredients (small molecules etc) to me.

I think you're right. This PR uses the DrugBank Open Vocabulary file, and most of the names are generic names like ibuprofen, Ibuprofen piconol, captopril, VTP-194204, Etanercept, Erythropoietin, WRR-99, Zofin, Krill Oil, MK-886, BMS-833923 and others. I figured it made sense to categorize all of these as drugs, as a way of grouping everything from small molecules to protein hormones to organic substances that have some sort of medical benefit. But if our criterion for "Drug" is a specific formulation (e.g. "acetaminophen 5 mg capsule"), then yeah, these would not make sense. I'm not sure we can uniformly say these are all small molecules, but I think most of them are, so I've reverted the type for DrugBank entries from biolink:Drug back to biolink:ChemicalEntity (d38ce21). I've also made a note for us to check for other small molecules/chemical entities that might have accidentally ended up in Drug.txt (#348).
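
For illustration, here is a minimal sketch of how labels might be pulled from a vocabulary CSV and tagged with the type discussed above. It is not the code in this PR: `read_drugbank_labels` is a hypothetical helper, the `DrugBank ID` and `Common name` column names and the `DRUGBANK` CURIE prefix are assumptions, and the sample rows use placeholder IDs rather than real DrugBank accessions.

```python
import csv
import io

DRUGBANK_PREFIX = "DRUGBANK"               # assumed CURIE prefix
DRUGBANK_TYPE = "biolink:ChemicalEntity"   # type settled on in this thread (d38ce21)


def read_drugbank_labels(csv_file):
    """Yield (curie, label, biolink_type) for each row that has an ID and a name.

    Hypothetical helper; the column names are assumptions about the file layout.
    """
    for row in csv.DictReader(csv_file):
        drugbank_id = (row.get("DrugBank ID") or "").strip()
        name = (row.get("Common name") or "").strip()
        if drugbank_id and name:
            yield f"{DRUGBANK_PREFIX}:{drugbank_id}", name, DRUGBANK_TYPE


# Placeholder rows: the names come from the comment above, the IDs are made up.
sample = io.StringIO(
    "DrugBank ID,Common name\n"
    "DB-EXAMPLE-1,ibuprofen\n"
    "DB-EXAMPLE-2,Etanercept\n"
    "DB-EXAMPLE-3,\n"
)
labels = list(read_drugbank_labels(sample))
for curie, label, biolink_type in labels:
    print(curie, label, biolink_type, sep="\t")
```

Keeping the type in one constant means a later decision to split DrugBank entries into finer categories would only touch one place in a sketch like this.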

Incidentally, in addition to the DrugBank ID, many of the 16,581 chemicals in the DrugBank download have a UNII, CAS number or InChI Key. I don't think we can use those to categorize DrugBank entries any better, and I'm not sure whether we want to include those concords, but I wanted to mention it in case it's useful.
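
If we ever want to see how many entries carry each of those identifiers, a rough sketch like the following could tally them. The column names (`UNII`, `CAS`, `Standard InChI Key`) are assumptions about the download's layout, `count_xref_coverage` is a hypothetical helper, and the sample rows are placeholders, not real DrugBank data.

```python
import csv
import io
from collections import Counter

# Assumed column names for the cross-reference identifiers.
XREF_COLUMNS = ("UNII", "CAS", "Standard InChI Key")


def count_xref_coverage(csv_file):
    """Count total rows and, per identifier column, rows with a non-empty value."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts["rows"] += 1
        for column in XREF_COLUMNS:
            if (row.get(column) or "").strip():
                counts[column] += 1
    return counts


# Placeholder rows with made-up identifier values.
sample = io.StringIO(
    "DrugBank ID,Common name,CAS,UNII,Standard InChI Key\n"
    "DB-EXAMPLE-1,ibuprofen,cas-1,unii-1,inchikey-1\n"
    "DB-EXAMPLE-2,Etanercept,cas-2,unii-2,\n"
    "DB-EXAMPLE-3,Krill Oil,,,\n"
)
coverage = count_xref_coverage(sample)
for key in ("rows", *XREF_COLUMNS):
    print(f"{key}: {coverage[key]}")
```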

@gaurav gaurav requested a review from cbizon September 23, 2024 18:15
Base automatically changed from babel-1.6 to master September 23, 2024 18:15
@gaurav gaurav merged commit e6cc26d into master Sep 26, 2024
gaurav added a commit that referenced this pull request Sep 26, 2024

Successfully merging this pull request may close these issues.

DrugBank ID being dropped between untyped compendium and final ChemicalEntity compendium