clinicaltrials.gov #40

Jiros · 2020-03-31T16:17:26Z

For more information and comprehensive guidance, see the excellent article by Kirsten Langendorf:
https://www.s-cubed-global.com/news/covidgraph-nerds-response-to-the-pandemic

Repo

https://github.com/covidgraph/data_clinical-trials-gov

Description

Suggested by - lynnehansen

Add data about clinical trials. There are a few databases where the results of clinical trials are published. The most relevant general-purpose databases are clinicaltrials.gov and clinicaltrialsregister.eu.

There might be two more data sources:

- collections of clinical trials relevant to specific areas such as COVID-19
- monitoring of ongoing clinical trials by the competent authorities (such as https://www.pei.de/EN)

Data Sources

https://clinicaltrials.gov/
https://www.clinicaltrialsregister.eu/

Note

All clinical studies registered on https://clinicaltrials.gov/ that are related to COVID-19.

Dependencies

None

motey · 2020-04-28T08:38:47Z

With https://clinicaltrials.gov/ct2/download_studies?down_chunk=1,
https://clinicaltrials.gov/ct2/download_studies?down_chunk=2 and so on, one can download all the raw data in XML format.
The only thing is that I can't find any information about how many chunks there are :)

Source: https://clinicaltrials.gov/ct2/resources/download#DownloadMultipleRecords
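Because the number of chunks is unknown, one option is to keep requesting chunk 1, 2, 3 and so on until a request comes back empty. Below is a minimal Python sketch of that loop. `chunk_url`, `iter_chunks` and the injected `fetch` callable are hypothetical names. The stop condition is an unverified assumption: it relies on an out-of-range chunk yielding an empty or failed response. `max_chunks` is an arbitrary safety bound.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://clinicaltrials.gov/ct2/download_studies"


def chunk_url(n: int) -> str:
    """Build the download URL for chunk number n (chunks start at 1)."""
    return f"{BASE_URL}?{urlencode({'down_chunk': n})}"


def iter_chunks(
    fetch: Callable[[str], Optional[bytes]], max_chunks: int = 500
) -> Iterator[bytes]:
    """Yield raw chunk payloads until fetch() signals there are no more.

    ASSUMPTION: fetch returns None or b"" once n is past the last chunk.
    How the real endpoint behaves for out-of-range chunks is not verified.
    max_chunks is an arbitrary upper bound so the loop always terminates.
    """
    for n in range(1, max_chunks + 1):
        payload = fetch(chunk_url(n))
        if not payload:  # empty or failed download: treat as end of data
            return
        yield payload
```

In practice, `fetch` could wrap `urllib.request.urlopen` and return `None` on an HTTP error. Each payload would then be saved and its XML parsed.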

paltusplintus · 2020-04-28T11:02:36Z

I would suggest starting by loading the basic trial description data (only for trials relevant to COVID) from the clinicaltrials.gov API endpoint (see the attached Cypher query):

load_covid_trials_clintrials_gov.txt

Unfortunately, max_rnk for this query is limited to 1000. If there are more than 1000 trials in total, the query should be split into several more specific queries (e.g. one per trial phase), with 'expr=covid' updated accordingly.
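The comment above suggests splitting the query per phase. A swapped-in alternative is to page the results with `min_rnk`/`max_rnk` windows. This only helps if the 1000 limit applies per request window rather than to the absolute rank. That is an assumption: if `max_rnk` itself is capped, only the per-phase split works. The endpoint URL, the parameter names and the helpers `rank_windows`/`page_urls` are assumptions to check against the API description linked below. For example, the 1095 COVID studies reported later in this thread would need two windows.

```python
from typing import List, Tuple
from urllib.parse import urlencode

# ASSUMPTION: legacy study_fields endpoint and parameter names; verify against
# https://clinicaltrials.gov/api/gui/ref/api_urls before relying on them.
STUDY_FIELDS_URL = "https://clinicaltrials.gov/api/query/study_fields"
PAGE_LIMIT = 1000  # per-request limit mentioned in the thread


def rank_windows(total: int, page: int = PAGE_LIMIT) -> List[Tuple[int, int]]:
    """Split ranks 1..total into inclusive (min_rnk, max_rnk) windows of at most `page`."""
    return [(lo, min(lo + page - 1, total)) for lo in range(1, total + 1, page)]


def page_urls(expr: str, fields: List[str], total: int) -> List[str]:
    """One request URL per rank window, all using the same search expression."""
    urls = []
    for lo, hi in rank_windows(total):
        params = {
            "expr": expr,
            "fields": ",".join(fields),
            "min_rnk": lo,
            "max_rnk": hi,
            "fmt": "json",
        }
        urls.append(f"{STUDY_FIELDS_URL}?{urlencode(params)}")
    return urls
```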

After the basic info is loaded as nodes, some parts of it (e.g. PrimaryOutcomeMeasure) could already be parsed and linked to other data in the graph.
Additional data from clinicaltrials.gov could then be loaded per trial (NCTId) with the following query:

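MATCH (ct:ClinicalTrial)
// NCTId is stored as a single-element list, hence [0]
CALL apoc.load.json('https://clinicaltrials.gov/api/query/full_studies?expr=' + ct.NCTId[0] + '&fmt=json') YIELD value
// carry ct through WITH so the stored data can be linked back to the trial
WITH ct, value.FullStudiesResponse.FullStudies AS studies
UNWIND studies AS study
// add code to store in Neo
RETURN study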

Description of the API: https://clinicaltrials.gov/api/gui/ref/api_urls
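The Cypher above concatenates the id directly into the URL. Below is a Python sketch of the same per-trial request using only the standard library. It lets `urlencode` handle escaping and tolerates a response that contains no `FullStudies` key. The key names follow the Cypher query (`FullStudiesResponse.FullStudies`). Treating a missing key as "no studies" is an assumption, and `full_study_url`/`extract_studies` are hypothetical helper names.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode

FULL_STUDIES_URL = "https://clinicaltrials.gov/api/query/full_studies"


def full_study_url(nct_id: str) -> str:
    """URL for one trial; urlencode escapes the id instead of raw concatenation."""
    return f"{FULL_STUDIES_URL}?{urlencode({'expr': nct_id, 'fmt': 'json'})}"


def extract_studies(payload: str) -> List[Dict[str, Any]]:
    """Return the FullStudies list from a JSON response body, or [] if absent.

    ASSUMPTION: a response without matches simply lacks the FullStudies key.
    """
    response = json.loads(payload).get("FullStudiesResponse", {})
    return response.get("FullStudies", [])
```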

KirstenLangendorf · 2020-04-29T12:11:36Z

I have started to write the missing code (// add code to store in Neo), unwinding the relevant info from the JSON being returned. Do you suggest adding this additional info as nodes or as properties of the ClinicalTrial nodes? I guess from your text it should be new nodes linked to ClinicalTrial nodes?

paltusplintus · 2020-04-29T14:45:22Z

> Do you suggest adding this additional info as nodes or as properties of the ClinicalTrial nodes? I guess from your text it should be new nodes linked to ClinicalTrial nodes?

Yes, I suggest separate nodes linked to ClinicalTrial, especially for the data that could be linked to other data in the graph. What comes to mind: endpoints and inclusion/exclusion criteria. If you feel that some of the data is not relevant for linking, we could leave it as properties for now and refactor the graph later if we need to link this data.
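Below is a minimal sketch of that split for primary outcomes. It assumes the legacy full_studies layout (`Study.ProtocolSection.IdentificationModule.NCTId`, `OutcomesModule.PrimaryOutcomeList.PrimaryOutcome[].PrimaryOutcomeMeasure`), which should be checked against a real response. The label `PrimaryOutcome`, the relationship `HAS_PRIMARY_OUTCOME` and the helper `outcome_row` are hypothetical names. Python builds one parameter row per trial, and a single UNWIND query creates the outcome nodes and links them to the trial. Scalar fields can stay as properties.

```python
from typing import Any, Dict

# Hypothetical label and relationship names. ASSUMPTION: NCTId is stored as a
# plain string here (not as the single-element list used in the query above).
LINK_OUTCOMES = """
UNWIND $rows AS row
MERGE (ct:ClinicalTrial {NCTId: row.nct_id})
WITH ct, row
UNWIND row.outcomes AS measure
MERGE (o:PrimaryOutcome {measure: measure})
MERGE (ct)-[:HAS_PRIMARY_OUTCOME]->(o)
"""


def outcome_row(study: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one FullStudies entry into a parameter row for LINK_OUTCOMES.

    ASSUMPTION: field paths follow the legacy full_studies JSON layout.
    Missing modules yield an empty outcome list instead of raising.
    """
    protocol = study["Study"]["ProtocolSection"]
    nct_id = protocol["IdentificationModule"]["NCTId"]
    outcomes = (
        protocol.get("OutcomesModule", {})
        .get("PrimaryOutcomeList", {})
        .get("PrimaryOutcome", [])
    )
    return {
        "nct_id": nct_id,
        "outcomes": [o["PrimaryOutcomeMeasure"] for o in outcomes if "PrimaryOutcomeMeasure" in o],
    }
```

With the official neo4j Python driver, this could run as `session.run(LINK_OUTCOMES, rows=[outcome_row(s) for s in studies])`. Merging outcome nodes on the raw measure text only deduplicates identical strings. Messy free text would still need normalising before outcomes from different trials actually connect.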

motey · 2020-04-29T14:53:16Z

> I have started to write the missing code (// add code to store in Neo), unwinding the relevant info from the JSON being returned.

Awesome!
Hint: to integrate the data into the main graph later, a Docker image would be great. See https://github.com/covidgraph/data_template and https://github.com/covidgraph/motherlode for more information. If you have any questions, ping me (@tim.bleimehl:meet.dzd-ev.de).

KirstenLangendorf · 2020-04-29T14:54:59Z

OK, thanks. BTW, there seem to be 1095 studies containing COVID. I have downloaded the JSON and will use that instead of the URL, which has the limit of 1000.
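Reading the downloaded file instead of the URL sidesteps the 1000-result limit. Below is a minimal Python sketch. It assumes the saved file has the same `FullStudiesResponse.FullStudies` layout that the apoc.load.json query above reads, and `load_local_studies` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_local_studies(path: Path) -> List[Dict[str, Any]]:
    """Read a saved full_studies JSON response and return its study entries.

    ASSUMPTION: the file uses the same FullStudiesResponse.FullStudies layout
    as the live API response; a file without it yields an empty list.
    """
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    return data.get("FullStudiesResponse", {}).get("FullStudies", [])
```

Within Neo4j itself, `apoc.load.json` can read a local `file:///` URL instead, provided `apoc.import.file.enabled=true` is set in the configuration.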

KirstenLangendorf · 2020-05-06T19:31:56Z

Hi, sorry, but I have been busy with daily work and needed to get my head around the JSON input data. I have made a first attempt. For COVID studies I could not find any results yet. PrimaryOutcomeMeasure entries are created as nodes, but the data is a bit messy. I have written my script in a Jupyter notebook (attached), using my own local graph for testing (that can be changed). Comments/feedback are more than welcome. I am happy to do more scripting, extending/changing what I have made so far. EligibilityCriteria could be added as a property; the in/exclusion criteria tend to be non-standard too. I would also appreciate feedback on the scripting :-) (it is not part of my daily work).
@tim.bleimehl:meet.dzd-ev.de I think I need a bit of help if you need the stuff differently.
clinicaltrials.ipynb.zip
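Because the criteria are free text, one hedged first step before turning them into nodes is to split each EligibilityCriteria value at its "Inclusion Criteria" / "Exclusion Criteria" headings. Below is a minimal Python sketch. It assumes those headings appear on their own lines. Text with no headings, or text before the first heading, is kept under "unsorted". `split_criteria` is a hypothetical name.

```python
import re
from typing import Dict, List

# A heading is a line such as "Inclusion Criteria:" (case-insensitive, colon optional).
HEADING = re.compile(
    r"^\s*(inclusion|exclusion)\s+criteria\s*:?\s*$", re.IGNORECASE | re.MULTILINE
)


def split_criteria(text: str) -> Dict[str, List[str]]:
    """Split free-text EligibilityCriteria into inclusion/exclusion item lists.

    Each non-empty line becomes one item, with leading "-" or "*" bullets removed.
    ASSUMPTION: headings sit on their own lines; anything else goes to "unsorted".
    """
    sections: Dict[str, List[str]] = {"inclusion": [], "exclusion": [], "unsorted": []}
    matches = list(HEADING.finditer(text))
    chunks = [("unsorted", text[: matches[0].start()] if matches else text)]
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append((m.group(1).lower(), text[m.end():end]))
    for key, body in chunks:
        for line in body.splitlines():
            item = line.strip().lstrip("-*").strip()
            if item:
                sections[key].append(item)
    return sections
```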

motey · 2020-05-07T06:20:44Z

@KirstenLangendorf Great work! The JSON is a mess (why is every single attribute value wrapped in a list :D ?) but it looks like you tamed it 😎
I could set up a repository with a bit of boilerplate code (Python/Docker setup), where you can then paste your queries in. Would that help you? I would try to make it this afternoon or tomorrow morning.
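One way to tame the list wrapping in a Python loader is to unwrap single-element lists recursively before building query parameters. Below is a minimal sketch. `unwrap` is a hypothetical helper, not something the data template provides.

```python
from typing import Any


def unwrap(value: Any) -> Any:
    """Recursively replace single-element lists with their only element.

    Lists with zero or several elements stay lists (their items are still
    unwrapped), so genuinely multi-valued fields are not flattened by accident.
    """
    if isinstance(value, dict):
        return {k: unwrap(v) for k, v in value.items()}
    if isinstance(value, list):
        items = [unwrap(v) for v in value]
        return items[0] if len(items) == 1 else items
    return value
```

The trade-off is that a field with one value in one record and several in another ends up as a string in one case and a list in the other. Fields known to be multi-valued (conditions, phases) may be better kept as lists explicitly.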

KirstenLangendorf · 2020-05-07T06:42:49Z

> @KirstenLangendorf Great work! The JSON is a mess (why is every single attribute value wrapped in a list :D ?) but it looks like you tamed it 😎
> I could set up a repository with a bit of boilerplate code (Python/Docker setup), where you can then paste your queries in. Would that help you? I would try to make it this afternoon or tomorrow morning.

Let me try it out. No rush - tomorrow is a Danish bank holiday. I think I will add Eligibility as nodes too. Saw the presentation by Martin Preusse and it seems that you are using Machine Learning type tools to combine messy data. Which tools are you using?

There is more data in ClinicalTrials.gov, and hopefully also some study results at some point. What is the best way for me to find out which data is important for the rest of the graph? Will that be by reading the use cases?

motey · 2020-05-07T06:58:10Z

> Saw the presentation by Martin Preusse and it seems that you are using Machine Learning type tools to combine messy data. Which tools are you using?

The ML/NLP team is still in an experimentation/PoC phase (as far as I can keep track of that at the moment). If you are interested, I can invite you to the chat group.

> There is more data in ClinicalTrials.gov, and hopefully also some study results at some point. What is the best way for me to find out which data is important for the rest of the graph? Will that be by reading the use cases?

AFAIK there is currently no standardized process to determine that. A discussion in the CovidGraph chat group would be the most purposeful way for now.

KirstenLangendorf · 2020-05-07T07:01:30Z

> Saw the presentation by Martin Preusse and it seems that you are using Machine Learning type tools to combine messy data. Which tools are you using?
>
> The ML/NLP team is still in an experimentation/PoC phase (as far as I can keep track of that at the moment). If you are interested, I can invite you to the chat group.

Yes please, thank you :-)

> There is more data in ClinicalTrials.gov, and hopefully also some study results at some point. What is the best way for me to find out which data is important for the rest of the graph? Will that be by reading the use cases?
>
> AFAIK there is currently no standardized process to determine that. A discussion in the CovidGraph chat group would be the most purposeful way for now.

OK, I will look out for it there.

motey · 2020-05-08T09:35:39Z

> Saw the presentation by Martin Preusse and it seems that you are using Machine Learning type tools to combine messy data. Which tools are you using?
>
> The ML/NLP team is still in an experimentation/PoC phase (as far as I can keep track of that at the moment). If you are interested, I can invite you to the chat group.
> Yes please, thank you :-)

Just saw you are already in the group :) (CovidGraph Data Analysis)

KirstenLangendorf · 2020-05-12T05:20:00Z

> @KirstenLangendorf Great work! The JSON is a mess (why is every single attribute value wrapped in a list :D ?) but it looks like you tamed it 😎
> I could set up a repository with a bit of boilerplate code (Python/Docker setup), where you can then paste your queries in. Would that help you? I would try to make it this afternoon or tomorrow morning.

Hi Tim,
I have time this weekend to work on covidgraph in case I should try out the python/docker setup.

motey · 2020-05-12T07:48:30Z

Hi Kirsten,

you can start with https://github.com/covidgraph/data_template by clicking "Use this template" in the GitHub web interface.
Basically, you have to copy your queries into https://github.com/covidgraph/data_template/blob/master/dataloader/main.py

If you need any further help with Git, Docker or Python, just ping me in the chat.

mpreusse · 2020-05-14T19:36:06Z

@KirstenLangendorf I can also help with the data loading template!

KirstenLangendorf · 2020-05-15T05:14:15Z

> @KirstenLangendorf I can also help with the data loading template!

Thanks:-) I will start looking at the loading tomorrow. I am at work today.

Ok, couldn't help it. Had to look :-)
The documentation is in https://github.com/covidgraph/data_template. I have created a repository from it, https://github.com/KirstenLangendorf/load_clinical_trials_gov, and will fill it in tomorrow.

Do I just paste my queries in after line 22 of
https://github.com/KirstenLangendorf/load_clinical_trials_gov/blob/master/dataloader/main.py and delete the rest?

...OK, I will read through the instructions and get back to you once I have everything in my GitHub repository.

KirstenLangendorf · 2020-05-17T10:14:27Z

@tim and @mpreusse I have now put the scripts in the dataloader folder: load_data, and data_profile for the stats queries.
I have written a bit in the README.

https://github.com/KirstenLangendorf/load_clinical_trials_gov

I need help with the rest, since I am not quite sure how to make it execute and publish in the right way.

motey · 2020-05-17T11:12:11Z

@KirstenLangendorf Cool! I will have a deeper look at it tomorrow, fork it and try to bring it into an executable state.

mpreusse · 2020-05-17T11:38:56Z

@KirstenLangendorf That looks great! @motey tell me if I can help. It looks similar to e.g. the text fragger: there are no downloads, only Cypher queries. Pretty long ones, though 😄

KirstenLangendorf · 2020-05-17T13:06:50Z

> @KirstenLangendorf That looks great! @motey tell me if I can help. It looks similar to e.g. the text fragger: there are no downloads, only Cypher queries. Pretty long ones, though 😄

I know the queries are long, but that was to avoid calling the ClinicalTrials.gov JSON several times.
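If the loading ever moves into Python, a hypothetical alternative to one long query per call is to fetch each trial's JSON once, cache it, and pass the same parsed document to several shorter queries as a parameter. This is not what the current loader does. Below is a minimal sketch using `functools.lru_cache`. `make_cached_loader` and the injected `fetch_text` callable are illustrative names.

```python
import json
from functools import lru_cache
from typing import Any, Callable, Dict


def make_cached_loader(fetch_text: Callable[[str], str], maxsize: int = 2048):
    """Wrap a fetch function so each URL is downloaded and parsed at most once.

    The cached dict is shared between callers, so it must not be mutated.
    """

    @lru_cache(maxsize=maxsize)
    def load(url: str) -> Dict[str, Any]:
        return json.loads(fetch_text(url))

    return load
```

The cache trades memory for fewer requests. With about 1100 trials, `maxsize=2048` keeps every response, and smaller values evict the least recently used ones.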

motey · 2020-05-18T12:44:59Z

@KirstenLangendorf Hi Kirsten. I have done the following things today:

- renamed data_profile and load_data to data_profile.cypher and load_data.cypher
- created a function in main.py to read in your queries from the file data_profile.cypher
- created a main function in main.py to run your queries
- created a pipeline that builds a Docker image when there is a new release of the repository (aka git tag) and pushes the container to Docker Hub at covidgraph/data-clinical_trials_gov (see .github/workflows/build_container_prd.yml)
- updated the readme.md
- forked your whole repo to covidgraph/data_clinical-trials-gov and made you an admin (full rights). This was needed to allow me to add Docker Hub credentials and to have the repo in the same scheme as the others. If that is an issue for you, just let me know and we can find another solution.

Could you test the repo against a Neo4j DB? I am too lazy to set up a local Neo4j instance with APOC :)

If the tests are successful, I can integrate your script into the covidgraph dataloader pipeline 🚀
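The thread does not show how main.py reads the .cypher files. Below is a hypothetical sketch of one way to do it. It splits the file on semicolons outside string literals, backtick identifiers and `//` line comments, then runs each statement through any session object that has a `run(query)` method. With the official neo4j Python driver, `GraphDatabase.driver(uri, auth=(user, password))` followed by `driver.session()` provides such a session. `/* */` block comments are not handled.

```python
from pathlib import Path
from typing import List, Optional, Protocol


class Session(Protocol):
    """The only method used here; a neo4j driver session provides run(query)."""

    def run(self, query: str): ...


def split_statements(text: str) -> List[str]:
    """Split Cypher text on semicolons outside quotes, backticks and // comments."""
    statements: List[str] = []
    current: List[str] = []
    quote: Optional[str] = None
    i = 0
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\" and quote != "`":  # keep an escaped character as-is
                current.append(text[i:i + 2])
                i += 2
                continue
            if ch == quote:
                quote = None
        elif ch in "'\"`":
            quote = ch
        elif text.startswith("//", i):  # line comment: drop up to end of line
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
            continue
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def run_cypher_file(session: Session, path: Path) -> int:
    """Run every statement of a .cypher file in order; return how many ran."""
    statements = split_statements(path.read_text(encoding="utf-8"))
    for stmt in statements:
        session.run(stmt)
    return len(statements)
```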

KirstenLangendorf · 2020-05-19T10:33:29Z

> Could you test the repo against a Neo4j DB? I am too lazy to set up a local Neo4j instance with APOC :)
>
> If the tests are successful, I can integrate your script into the covidgraph dataloader pipeline 🚀

I installed the Docker app.
In my terminal I ran docker pull covidgraph/data-clinical_trials_gov
and then
docker build -t data-clinical_trials_gov .
which returns this error:
error checking context: 'can't stat '/Users/Kirsten/.Trash''.
I tried to google it but couldn't find a fix. @motey Do you know what to do?

motey · 2020-05-19T13:39:13Z

Stupid question, but did you try it with sudo? :=)

KirstenLangendorf · 2020-05-19T13:41:58Z

> Stupid question, but did you try it with sudo? :=)

Nope, I can try. It reported the same error :-(

I couldn't see your message on Riot (it was encrypted).

Jiros assigned motey Jun 26, 2020

Jiros transferred this issue from covidgraph/documentation Dec 7, 2020

Jiros added the labels "Type: Data Source" (to identify an issue as a data source) and "Status: In Dev" (this issue has been moved to the Dev environment for testing) Dec 7, 2020
