This repository has been archived by the owner on Apr 12, 2023. It is now read-only.

investigate OpenFDA as a data source for compound relationships #131

Open
andrewsu opened this issue Oct 27, 2020 · 1 comment

andrewsu commented Oct 27, 2020

The OpenFDA API may be a useful source of relationships between compounds and other entities (including protein targets). (EDIT 2020-10-28: separated out the issue of compound properties from OpenFDA into another ticket: biothings/biothings_explorer#132.)

The OpenFDA API is described at https://open.fda.gov/apis/. The primary key is the UNII. For example, the substance record for afatinib (UNII: 41UD74L59M) is at https://api.fda.gov/other/substance.json?search=unii:%2241UD74L59M%22 .
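
As a minimal sketch of how such a query URL can be assembled, the snippet below wraps the search value in double quotes and percent-encodes it, mirroring the afatinib URL above. The helper name `substance_search_url` is made up for illustration; only the endpoint and the `search=field:"value"` pattern come from the example URL.

```python
from urllib.parse import quote

# Endpoint taken from the example URL above.
SUBSTANCE_ENDPOINT = "https://api.fda.gov/other/substance.json"


def substance_search_url(field: str, value: str) -> str:
    """Build a search URL that matches `value` in `field` (hypothetical helper)."""
    # Wrap the value in double quotes, then percent-encode it (" becomes %22).
    quoted_value = quote('"' + value + '"', safe="")
    return f"{SUBSTANCE_ENDPOINT}?search={field}:{quoted_value}"


afatinib_url = substance_search_url("unii", "41UD74L59M")
print(afatinib_url)
```

The same helper should work for `uuid` searches, since UUID characters need no escaping.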

The Relationships section has a variety of relationships to other entities. Among these are TARGET->INHIBITOR relationships to proteins that the drug targets (including specific protein variants). (Yes, we have drug targets through other resources, but consulting an FDA API seems to be quite valuable from a provenance standpoint.) The current challenge is that the protein targets are also indexed by UNII IDs, and I'm not aware of any resource to translate those to other protein identifiers (e.g., UniProt).

image

In some cases, there is additional quantitative information on those relationships. For example, in the record for acalabrutinib (https://api.fda.gov/other/substance.json?search=uuid:%229d4c7efc-e2a1-4b9c-8d57-559ef5dd30b2%22), the "acalabrutinib inhibits BTK" edge also carries qualifiers stating that the IC50 is 5 nM and that it is an irreversible inhibitor:

image

Referencing is very good, with statement-level references to primary literature, databases, documents from FDA/WHO, etc.

(ref: email from Tyler Peryea to me, 2020-10-26)

@andrewsu

The explanation below is paraphrased from an email from Tyler. It seems that all the information is there to get official FDA info on drug-target relationships. It would be great to figure out whether this can be set up as a BTE-compatible SmartAPI...


Getting to UniProt IDs takes several steps, but it’s usually possible. In the case below, the link to the target appears in the relationships section:

        {
          "uuid": "94491d1b-6f00-458e-9beb-647d85ced760",
          "type": "TARGET->INHIBITOR",
          "interaction_type": "IRREVERSIBLE INHIBITOR",
          "references": [
            "e2678158-4b39-423b-a846-7f3fd2aad252"
          ],
          "related_substance": {
            "uuid": "a43b9029-c670-4bc6-9732-30d690bf89de",
            "refuuid": "00b709bc-7f21-4daf-91bc-3e17f9d26ab7",
            "name": "RECEPTOR TYROSINE-PROTEIN KINASE ERBB-2",
            "unii": "7J9Y28299R",
            "linking_id": "7J9Y28299R",
            "ref_pname": "RECEPTOR TYROSINE-PROTEIN KINASE ERBB-2",
            "substance_class": "reference"
          }
        },
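
A rough sketch of pulling the target edges out of a list of relationship objects like the one above. The function name `target_relationships` and the second sample entry are hypothetical; the field names (`type`, `interaction_type`, `related_substance`, `refuuid`, `unii`, `name`) are the ones shown in the snippet.

```python
from typing import Any


def target_relationships(
    relationships: list[dict[str, Any]],
    rel_type: str = "TARGET->INHIBITOR",
) -> list[dict[str, Any]]:
    """Keep relationships of `rel_type` and flatten the fields needed later."""
    summaries = []
    for rel in relationships:
        if rel.get("type") != rel_type:
            continue
        related = rel.get("related_substance", {})
        summaries.append(
            {
                "name": related.get("name"),
                "unii": related.get("unii"),
                "refuuid": related.get("refuuid"),
                # Not every relationship is assumed to carry this qualifier.
                "interaction_type": rel.get("interaction_type"),
            }
        )
    return summaries


sample_relationships = [
    {
        "type": "TARGET->INHIBITOR",
        "interaction_type": "IRREVERSIBLE INHIBITOR",
        "related_substance": {
            "refuuid": "00b709bc-7f21-4daf-91bc-3e17f9d26ab7",
            "name": "RECEPTOR TYROSINE-PROTEIN KINASE ERBB-2",
            "unii": "7J9Y28299R",
        },
    },
    # Made-up non-target entry, only here to show the filter at work.
    {"type": "HYPOTHETICAL->OTHER", "related_substance": {"name": "PLACEHOLDER"}},
]

targets = target_relationships(sample_relationships)
print(targets)
```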

refuuid is the real link to the raw record. unii and linking_id can also be used, but I’d stick to “refuuid” if it exists, and use that as a uuid search:

https://api.fda.gov/other/substance.json?search=uuid:%2200b709bc-7f21-4daf-91bc-3e17f9d26ab7%22
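
The preference described above (use `refuuid` as a `uuid` search when present, otherwise fall back to the UNII) could be sketched like this. `related_substance_search` is a hypothetical helper name, and the fallback order is just one reading of the advice above.

```python
from urllib.parse import quote

SUBSTANCE_ENDPOINT = "https://api.fda.gov/other/substance.json"


def related_substance_search(related: dict) -> str:
    """Return a search term for the full record of a related substance."""
    # Prefer refuuid (searched as uuid); fall back to the UNII if it is missing.
    for search_field, key in (("uuid", "refuuid"), ("unii", "unii")):
        value = related.get(key)
        if value:
            return f'{search_field}:"{value}"'
    raise ValueError("related_substance has neither refuuid nor unii")


related = {
    "refuuid": "00b709bc-7f21-4daf-91bc-3e17f9d26ab7",
    "unii": "7J9Y28299R",
}
term = related_substance_search(related)
# Percent-encode the double quotes; keep the colon between field and value.
record_url = f"{SUBSTANCE_ENDPOINT}?search={quote(term, safe=':')}"
print(record_url)
```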

That’s a link to the full associated record (I used the refuuid as the uuid search, but could have used the UNII instead). If you look inside that record’s “codes” section, you see this:

      "codes": [
        {
          "uuid": "b00b343d-3b90-4a31-af67-acc23a5d3cce",
          "code": "P04626",
          "type": "PRIMARY",
          "url": "http://www.uniprot.org/uniprot/P04626",
          "code_system": "UNIPROT",
          "references": [
            "fc3833da-a6a5-4d4f-9f07-6c8b072f865c"
          ]
        },
        {
          "uuid": "80cc2e1c-9338-4f96-8d97-457ace835fe8",
          "code": "HER2/neu",
          "type": "PRIMARY",
          "code_system": "WIKIPEDIA",
          "references": [
            "f1d699eb-295a-055d-d931-985b19340bcf"
          ]
        },
        {
          "uuid": "2d924ccd-1c7a-43ee-9e5c-38798992bd7c",
          "code": "7J9Y28299R",
          "type": "PRIMARY",
          "code_system": "FDA UNII"
        }
      ],

A few things to note here:

  • Not all target substances have UNIPROT links, but most do.
  • The type of “PRIMARY” implies that it’s the primary code for that substance record, but there are other markers of specificity like “ALTERNATIVE”, “GENERIC”, etc. So pay attention to that field.
  • A record can have more than one UNIPROT code, but it won’t typically have more than one PRIMARY UNIPROT code.
  • There are other code systems that may also be present. In general, the “codes” section is kind of a treasure trove of great linking and classification information for targets, drug substances, etc.
  • The data is not infallible. It’s hand-entered and manually curated by FDA staff and associates based on publicly available data sources. There are holes and sometimes mistakes, but we’re always trying to improve the data and process.
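
Taking the notes above into account, a sketch of extracting UniProt accessions from a record’s “codes” list might separate PRIMARY codes from the rest and tolerate records that have none. The function name `uniprot_codes` is hypothetical; the field names and sample values are copied from the snippet above.

```python
def uniprot_codes(codes: list[dict]) -> tuple[list[str], list[str]]:
    """Split UNIPROT codes into (primary, other) accession lists."""
    primary, other = [], []
    for entry in codes:
        if entry.get("code_system") != "UNIPROT":
            continue
        # Codes may have types other than PRIMARY (e.g. ALTERNATIVE, GENERIC).
        if entry.get("type") == "PRIMARY":
            primary.append(entry["code"])
        else:
            other.append(entry["code"])
    return primary, other


sample_codes = [
    {"code": "P04626", "type": "PRIMARY", "code_system": "UNIPROT"},
    {"code": "HER2/neu", "type": "PRIMARY", "code_system": "WIKIPEDIA"},
    {"code": "7J9Y28299R", "type": "PRIMARY", "code_system": "FDA UNII"},
]

primary, other = uniprot_codes(sample_codes)
print(primary, other)
```

Returning lists rather than a single value keeps the caller responsible for deciding what to do when a target has no UniProt code, or more than one.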

@andrewsu changed the title from "investigate OpenFDA as a data source" to "investigate OpenFDA as a data source for compound relationships" on Oct 28, 2020