
Found document where drugcentral value is array, but items are for different chemicals #191

Open
colleenXu opened this issue Jan 1, 2025 · 3 comments


colleenXu commented Jan 1, 2025

As part of biothings/biothings.api#368, I'm reviewing documents where drugcentral is an array to see whether this data structure makes sense.

I've come across an example where one of the drugcentral items looks like it's in the wrong document: https://mychem.info/v1/chem/CPZBLNMUGSZIPR-NVXWUHKLSA-N

  • the rest of the data in the document is for palonosetron
  • But the 2nd item in the drugcentral array is for aurothioglucose, which is a completely different chemical
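During this kind of review, one rough way to surface candidates is to count how many distinct drugcentral.xrefs.umlscui values a document's drugcentral array contains. This is a sketch, not an established check. More than one CUI does not prove the items are different chemicals, because closely related substances can carry different CUIs. A hit is therefore only a candidate for manual review. The function names are hypothetical. The code assumes each item may carry xrefs.umlscui as a string or a list of strings. The example document is trimmed to the two CUIs cited later in this thread.

```python
from typing import Any


def distinct_umls_cuis(doc: dict[str, Any]) -> set[str]:
    """Collect the distinct drugcentral.xrefs.umlscui values in one document."""
    drugcentral = doc.get("drugcentral", [])
    # drugcentral can be a single object or an array, so normalize to a list.
    items = drugcentral if isinstance(drugcentral, list) else [drugcentral]
    cuis: set[str] = set()
    for item in items:
        cui = item.get("xrefs", {}).get("umlscui")
        if cui is None:
            continue  # no UMLS cross-reference on this item to compare
        # Assumption: umlscui may be a string or a list of strings.
        cuis.update(cui if isinstance(cui, list) else [cui])
    return cuis


def needs_review(doc: dict[str, Any]) -> bool:
    """Flag documents whose drugcentral items point at more than one UMLS concept."""
    return len(distinct_umls_cuis(doc)) > 1


# Trimmed-down stand-in for a real document (only the fields used here).
example = {
    "_id": "IZTQOLKUZKXIRV-YRVFCXMDSA-N",
    "drugcentral": [
        {"xrefs": {"umlscui": "C0008328"}},  # cholecystokinin
        {"xrefs": {"umlscui": "C0037167"}},  # sincalide
    ],
}
print(needs_review(example))  # True
```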

colleenXu commented Jan 1, 2025

I'm not sure about these, because they're kinda similar but also kinda not:


colleenXu commented Jan 3, 2025

Here's an example of downstream effects (BTE querying and processing responses):

Take https://mychem.info/v1/chem/IZTQOLKUZKXIRV-YRVFCXMDSA-N. It has two drugcentral items whose bioactivity contents differ slightly (I'm pointing out one difference here):

  • cholecystokinin (drugcentral.xrefs.umlscui == C0008328). Only this item has bioactivity objects with action_type = AGONIST; their UniProt IDs are P32239 and P30551.
  • sincalide (drugcentral.xrefs.umlscui == C0037167)

If I start from the chemical ID for sincalide (C0037167) with action_type=AGONIST, BTE will send a query like this, and cholecystokinin (C0008328)'s data will be returned, even though nothing should be returned for sincalide (C0037167) alone. BTE will then link sincalide to the returned UniProt IDs (i.e., create edges), which isn't the desired behavior. It's understandable, though, because the two chemicals' drugcentral data are combined in one document.

Query from chemical ID for sincalide

curl --location --globoff 'https://mychem.info/v1/query?fields=drugcentral.bioactivity&size=1000&with_total=true&jmespath=drugcentral.bioactivity%7C[%3Faction_type%3D%3D%60AGONIST%60%20%7C%7C%20action_type%3D%3D%60PARTIAL%20AGONIST%60]&always_list=drugcentral.bioactivity&filter=drugcentral.bioactivity.action_type%3A(%22AGONIST%22%20OR%20%22PARTIAL%20AGONIST%22)' \
--header 'Content-Type: application/json' \
--data '{
    "q": ["C0037167"],
    "scopes": "drugcentral.xrefs.umlscui"
}'
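Because of the percent-encoding, the jmespath and filter parameters in the URL above are hard to read. The sketch below decodes the query-string parameters using only Python's standard urllib.parse. It parses the URL locally and sends no request. The helper name decode_params is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Forward (chemical -> target) query URL from the curl command above, copied verbatim.
URL = (
    "https://mychem.info/v1/query?fields=drugcentral.bioactivity&size=1000"
    "&with_total=true&jmespath=drugcentral.bioactivity%7C[%3Faction_type%3D%3D%60AGONIST%60"
    "%20%7C%7C%20action_type%3D%3D%60PARTIAL%20AGONIST%60]&always_list=drugcentral.bioactivity"
    "&filter=drugcentral.bioactivity.action_type%3A(%22AGONIST%22%20OR%20%22PARTIAL%20AGONIST%22)"
)


def decode_params(url: str) -> dict[str, str]:
    """Return each query-string parameter percent-decoded (first value only)."""
    # parse_qs percent-decodes values and returns a list per key.
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


params = decode_params(URL)
for key, value in params.items():
    print(f"{key} = {value}")
# params["jmespath"] is:
#   drugcentral.bioactivity|[?action_type==`AGONIST` || action_type==`PARTIAL AGONIST`]
```

The same function works on the reverse-direction URL further down in this thread.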

Note: The query's jmespath could be modified to also match on the chemical's UMLS ID. But I'd rather not do that, because then this type of query wouldn't be a batch query anymore. It would also kinda cover up this data problem...
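The sincalide mismatch can be reproduced with a toy model that contrasts two behaviors. In the first, the scope matches the whole document and the bioactivity filter then runs over every drugcentral item in it. In the second, bioactivity comes only from the item whose own CUI matched. The second behavior is roughly what restricting the jmespath to the query's UMLS ID would achieve: it returns nothing for sincalide, but it hides the merged document instead of fixing it. The document is a simplified stand-in, and the sincalide item's empty bioactivity list is an assumption, since the thread only says that item has no AGONIST entries. The function names are hypothetical, and the code does not reproduce how BioThings evaluates queries.

```python
from typing import Any

# Action types the BTE query filters on (from the query above).
AGONIST_TYPES = {"AGONIST", "PARTIAL AGONIST"}

# Toy stand-in for IZTQOLKUZKXIRV-YRVFCXMDSA-N, reduced to the fields used here.
DOC: dict[str, Any] = {
    "drugcentral": [
        {
            "xrefs": {"umlscui": "C0008328"},  # cholecystokinin
            "bioactivity": [
                {"action_type": "AGONIST", "uniprot": [{"uniprot_id": "P32239"}]},
                {"action_type": "AGONIST", "uniprot": [{"uniprot_id": "P30551"}]},
            ],
        },
        {"xrefs": {"umlscui": "C0037167"}, "bioactivity": []},  # sincalide (assumed empty)
    ]
}


def agonist_targets(bioactivities: list[dict[str, Any]]) -> list[str]:
    """UniProt IDs of bioactivity entries whose action_type is (partial) agonist."""
    return [
        u["uniprot_id"]
        for b in bioactivities
        if b["action_type"] in AGONIST_TYPES
        for u in b["uniprot"]
    ]


def document_level(doc: dict[str, Any], cui: str) -> list[str]:
    """Match the document if any item has the CUI, then filter ALL items' bioactivity."""
    items = doc["drugcentral"]
    if not any(item["xrefs"]["umlscui"] == cui for item in items):
        return []
    return agonist_targets([b for item in items for b in item["bioactivity"]])


def item_level(doc: dict[str, Any], cui: str) -> list[str]:
    """Only use bioactivity from the drugcentral item(s) whose own CUI matched."""
    return agonist_targets(
        [b for item in doc["drugcentral"] if item["xrefs"]["umlscui"] == cui
         for b in item["bioactivity"]]
    )


print(document_level(DOC, "C0037167"))  # ['P32239', 'P30551']  (cholecystokinin's targets leak in)
print(item_level(DOC, "C0037167"))      # []
```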

colleenXu commented

FYI, this problem doesn't happen if I query in the "reverse" direction.

If I start from the UniProt ID (e.g., P32239 for action_type=AGONIST), BTE will send a query like this, and only cholecystokinin (C0008328)'s data will be returned (not sincalide's, C0037167).

Query from UniProt ID

curl --location --globoff 'https://mychem.info/v1/query?size=1000&fields=drugcentral.bioactivity%2Cdrugcentral.xrefs.umlscui&jmespath_exclude_empty=true&always_list=drugcentral.bioactivity&jmespath=drugcentral.bioactivity%7C[%3F(action_type%3D%3D%60AGONIST%60%20%7C%7C%20action_type%3D%3D%60PARTIAL%20AGONIST%60)%20%26%26%20length(uniprot[%3Funiprot_id%3D%3D%27P32239%27])%20%3E%20%600%60]&filter=drugcentral.bioactivity.action_type%3A(%22AGONIST%22%20OR%20%22PARTIAL%20AGONIST%22)%20AND%20_exists_%3Adrugcentral.xrefs.umlscui' \
--header 'Content-Type: application/json' \
--data '{
    "q": ["P32239"],
    "scopes": "drugcentral.bioactivity.uniprot.uniprot_id"
}'
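The reverse-direction outcome reported above fits the same toy model. Here the filter runs on each bioactivity entry, keeping only entries that target the UniProt ID, much like the length(uniprot[?uniprot_id=='P32239']) > 0 condition. Each match is then attributed to the xrefs.umlscui of the drugcentral item that owns it. The sincalide item has no matching entry, so it never appears. This sketch only models the result the comment describes; it is not BioThings' implementation. The function name and the toy document (including sincalide's empty bioactivity list) are assumptions.

```python
from typing import Any

# Toy document: same shape as the forward-direction sketch, trimmed to one target.
DOC: dict[str, Any] = {
    "drugcentral": [
        {
            "xrefs": {"umlscui": "C0008328"},  # cholecystokinin
            "bioactivity": [
                {"action_type": "AGONIST", "uniprot": [{"uniprot_id": "P32239"}]},
            ],
        },
        {"xrefs": {"umlscui": "C0037167"}, "bioactivity": []},  # sincalide (assumed empty)
    ]
}


def chemicals_targeting(
    doc: dict[str, Any],
    uniprot_id: str,
    action_types: frozenset[str] = frozenset({"AGONIST", "PARTIAL AGONIST"}),
) -> list[str]:
    """Return the CUIs of drugcentral items that have a matching bioactivity entry."""
    hits: list[str] = []
    for item in doc["drugcentral"]:
        for b in item["bioactivity"]:
            if b["action_type"] in action_types and any(
                u["uniprot_id"] == uniprot_id for u in b["uniprot"]
            ):
                hits.append(item["xrefs"]["umlscui"])
                break  # one matching entry is enough to attribute this item
    return hits


print(chemicals_targeting(DOC, "P32239"))  # ['C0008328']
```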
