Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support back in for GEO IDs #155

Merged
merged 11 commits into from
May 16, 2023
Merged

Add support back in for GEO IDs #155

merged 11 commits into from
May 16, 2023

Conversation

ejseqera
Copy link
Contributor

@ejseqera ejseqera commented May 10, 2023

Closes #104

Adding support for GEO IDs back in by:

  • Resolving GSM* (GEO sample accessions) and GSE* (GEO series/experiment IDs which contain multiple GSM IDs) IDs differently
  • Updating the API endpoint for GSM IDs (to use esearch) and resolve to SRA IDs directly
  • Use esummary to resolve GSE IDs -> GDS UIDs (with db: gds) then esearch to resolve GDS UIDs -> GSM UIDs -> SRA
  • Updated relevant documentation and removed error handling added in v1.7 to catch GEO IDs

Has been tested on the following GEO IDs:
GSM4907283
GSM465244
GSM465247
GSE215376
GSE18729

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs- [ ] If necessary, also make a PR on the nf-core/fetchngs branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@ejseqera ejseqera added this to the 1.10.0 milestone May 10, 2023
@ejseqera ejseqera self-assigned this May 10, 2023
@ejseqera ejseqera changed the title Geoids Add support back in for GEO IDs May 10, 2023
@github-actions
Copy link

github-actions bot commented May 10, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit dc72345

+| ✅ 149 tests passed       |+
#| ❔   3 tests were ignored |#
!| ❗   7 tests had warnings |!

❗ Test warnings:

  • files_exist - File not found: conf/igenomes.config
  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml
  • files_exist - File not found: lib/WorkflowFetchngs.groovy
  • nextflow_config - Config manifest.version should end in dev: '1.10.0'
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your prefered methods description, e.g. add publication citation for this pipeline
  • schema_description - No description provided in schema for parameter: dbgap_key

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/fetchngs/fetchngs/.github/workflows/awstest.yml

✅ Tests passed:

Run details

  • nf-core/tools version 2.8
  • Run at 2023-05-16 02:21:51

@ejseqera ejseqera requested a review from drpatelh May 10, 2023 23:23
@ejseqera
Copy link
Contributor Author

ejseqera commented May 12, 2023

We'll need to update the test-datasets for fetchngs to include a GSM ID (maybe GSE ID aswell?) for a small dataset

@drpatelh
Copy link
Member

drpatelh commented May 13, 2023

This is the GEO id I was using previously for testing GSM4907283 before we removed support (117480 raw reads). See this commit. If we don't have smaller, better options we can just add this back into the same file. Be nice to test for an id with multiple runs per experiment.

@drpatelh
Copy link
Member

drpatelh commented May 14, 2023

Added GSM4907283 back in nf-core/test-datasets@45bdee9

But as I mentioned in the previous comment, be nice to find another small GEO id that is composed of multiple runs for a given experiment.

@ejseqera
Copy link
Contributor Author

Made a PR to add GSE214215 for a series/experiment that is composed of 2 SRA runs, see nf-core/test-datasets#887.

CHANGELOG.md Outdated Show resolved Hide resolved
@drpatelh drpatelh mentioned this pull request May 15, 2023
@ejseqera ejseqera requested a review from drpatelh May 16, 2023 02:31
Copy link
Member

@drpatelh drpatelh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gorgeous!

@drpatelh drpatelh merged commit 7a5417d into nf-core:dev May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants