clinicaltrials.gov #40
Comments
With Source: https://clinicaltrials.gov/ct2/resources/download#DownloadMultipleRecords |
I would suggest starting by loading the basic trial description data (only for trials relevant to COVID) from the clinicaltrials.gov API endpoint (see the attached Cypher query): load_covid_trials_clintrials_gov.txt Unfortunately, max_rnk for this query is limited to 1000, so when there are more than 1000 trials in total, the query should be split into several more specific queries (e.g. per trial phase), with 'expr=covid' updated accordingly. After the basic info is loaded as nodes, some parts of it (e.g. PrimaryOutcomeMeasure) could already be parsed and linked to other data in the graph. MATCH (ct:ClinicalTrial) Description of the API: https://clinicaltrials.gov/api/gui/ref/api_urls |
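As an alternative to splitting by trial phase, the 1000-record limit could also be worked around by requesting consecutive rank windows. The following is only a sketch in Python: it assumes the API accepts a `min_rnk` parameter alongside `max_rnk`, and the `study_fields` endpoint URL, the field names, and the helper names `rank_windows` and `build_urls` are assumptions for illustration, not taken from the attached query. It only builds URLs and does not call the API.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the API description before relying on it.
BASE_URL = "https://clinicaltrials.gov/api/query/study_fields"


def rank_windows(total, page_size=1000):
    """Yield (min_rnk, max_rnk) pairs that together cover ranks 1..total."""
    for start in range(1, total + 1, page_size):
        yield start, min(start + page_size - 1, total)


def build_urls(expr, total, fields, page_size=1000):
    """Build one request URL per rank window (no network access here)."""
    urls = []
    for min_rnk, max_rnk in rank_windows(total, page_size):
        params = {
            "expr": expr,
            "fields": ",".join(fields),
            "min_rnk": min_rnk,  # assumed parameter name
            "max_rnk": max_rnk,
            "fmt": "json",
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


# Hypothetical total of 2500 matching trials, split into three requests.
for url in build_urls("covid", 2500, ["NCTId", "PrimaryOutcomeMeasure"]):
    print(url)
```

Each URL could then be passed to whatever loads the JSON into the graph, one window at a time.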
I have started to write the missing code (// add code to store Neo), unwinding the relevant info from the JSON being returned. Do you suggest adding this additional info as nodes or as properties on the ClinicalTrial nodes? I guess from your text it should be new nodes linked to the ClinicalTrial nodes? |
Yes, I suggest separate nodes linked to ClinicalTrial, especially for data that could be linked to other data in the graph. What comes to my mind: endpoints and inclusion/exclusion criteria. If you feel that some of the data is not relevant for linking, we could leave it as properties for now and refactor the graph in the future if we need to link this data. |
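To make the nodes-versus-properties split concrete, here is a small Python sketch, not the actual loader. It assumes each trial is available as a dict keyed by field name; `split_trial`, the `EligibilityCriteria` and `BriefTitle` keys, and the sample identifier are hypothetical. Fields chosen for linking become rows for separate nodes, and everything else stays as ClinicalTrial properties.

```python
def split_trial(trial, linked_fields, id_field="NCTId"):
    """Separate a trial dict into node properties and rows for linked nodes."""
    props = {k: v for k, v in trial.items() if k not in linked_fields}
    linked = []
    for field in linked_fields:
        values = trial.get(field)
        if values is None:
            continue
        if not isinstance(values, list):
            values = [values]  # treat a single value like a one-item list
        for value in values:
            linked.append(
                {"trial_id": trial[id_field], "label": field, "value": value}
            )
    return props, linked


# Made-up example record, for illustration only.
trial = {
    "NCTId": "NCT00000000",
    "BriefTitle": "Example study",
    "PrimaryOutcomeMeasure": ["Mortality at day 28", "Time to recovery"],
    "EligibilityCriteria": "Adults with confirmed infection",
}
props, linked = split_trial(trial, ["PrimaryOutcomeMeasure", "EligibilityCriteria"])
print(props)
print(linked)
```

Moving a field from properties to linked nodes later would then only mean adding its name to the list passed as `linked_fields`.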
Awesome! |
OK, thanks. BTW, there seem to be 1095 studies containing COVID. I have downloaded the JSON and will use that instead of the URL, which has the limit of 1000. |
Hi, sorry, but I have been busy with daily work and needed to get my head around the JSON input data. I have made a first attempt. For COVID studies I could not find any results yet. PrimaryOutcomeMeasure values are created as nodes, but the data is a bit messy. I wrote my script in a Jupyter notebook (attached), using my own local graph for testing (that can be changed). Comments/feedback are |
@KirstenLangendorf Great work! The JSON is a mess (why is every single attribute value wrapped in a list? :D) but it looks like you tamed it 😎 |
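For the list-wrapped attribute values, one possible clean-up step is to unwrap single-element lists before loading. This is only a sketch in Python, not what the attached notebook does; `unwrap` and the sample record (including the made-up identifier) are hypothetical.

```python
def unwrap(record):
    """Replace single-element lists by their only element; leave the rest as-is."""
    return {
        key: value[0] if isinstance(value, list) and len(value) == 1 else value
        for key, value in record.items()
    }


# Made-up example record, for illustration only.
raw = {
    "Rank": 1,
    "NCTId": ["NCT00000000"],
    "PrimaryOutcomeMeasure": ["Mortality at day 28", "Time to recovery"],
    "Condition": [],
}
print(unwrap(raw))
```

Lists with several values, and empty lists, are deliberately kept as lists so that multi-valued fields such as PrimaryOutcomeMeasure can still be unwound into separate nodes.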
Let me try it out. No rush - tomorrow is a Danish bank holiday. I think I will add Eligibility as nodes too. I saw the presentation by Martin Preusse, and it seems that you are using machine-learning-type tools to combine messy data. Which tools are you using? There is more data in ClinicalTrials.gov - and hopefully also some study results at some point. What is the best way for me to get information about the important data needed for the rest of the graph? Would that be reading the use cases? |
The ML/NLP team is still in an experimentation/PoC phase (as far as I can keep track of that atm). If you are interested, I can invite you to the chat group.
AFAIK there is no standardized process to determine that atm. A discussion in the CovidGraph chat group would be the most useful way atm. |
|
Just saw you are already in the group :) (CovidGraph Data Analysis) |
Hi Tim, |
Hi Kirsten, you can start with https://github.com/covidgraph/data_template by clicking "Use this template" in the GitHub web interface. If you need any further help with git, Docker or Python, just ping me in the chat. |
@KirstenLangendorf I can also help with the data loading template! |
Thanks :-) I will start looking at the loading tomorrow. I am at work today. Ok, couldn't help it. Had to look :-) Do I just paste the queries I have in after line 22 (delete the rest)? in ..ok I will read through the instructions and get back to you once I have everything in my GitHub template. |
@tim and @mpreusse I have now put the script in the dataloader folder: load_data, and data_profile for the stats queries. https://github.com/KirstenLangendorf/load_clinical_trials_gov I need help with the rest since I am not quite sure how to make it execute and publish in the right way. |
@KirstenLangendorf Cool! I will have a deeper look at it tomorrow, fork it and try to bring it into an executable state. |
@KirstenLangendorf That looks great! @motey tell me if I can help. It looks similar to e.g. the text fragger: there are no downloads, only Cypher queries. Pretty long ones though 😄 |
I know the queries are long, but it was to avoid calling the ClinicalTrials.gov JSON several times. |
@KirstenLangendorf Hi Kirsten. I have done the following things today:
Could you test the repo against a Neo4j DB? I am too lazy to set up a local Neo4j instance with APOC :) If the tests are successful, I can integrate your script into the covidgraph dataloader pipeline 🚀 |
Installed the Docker app. |
Stupid question, but did you try it with sudo :=) ? |
Nope - I can try. It reported the same error :-( Couldn't see your message on Riot - encrypted |
For more information and comprehensive guidance, see the excellent article by Kirsten Langendorf:
https://www.s-cubed-global.com/news/covidgraph-nerds-response-to-the-pandemic
Repo
https://github.com/covidgraph/data_clinical-trials-gov
Description
Suggested by - lynnehansen
Add data about clinical trials. There are a few databases where the results of clinical trials are published. The most relevant general-purpose databases are clinicaltrials.gov and clinicaltrialsregister.eu.
There might be two more datasources:
Data Sources
https://clinicaltrials.gov/
https://www.clinicaltrialsregister.eu/
Note
All clinical studies registered on https://clinicaltrials.gov/ related to covid19.
Dependencies
None