Update to use img-annotation v5.3 #44

Open
wants to merge 45 commits into
base: master

Conversation

kaijli
Contributor

@kaijli kaijli commented Feb 12, 2025

Updated workflow according to these tickets:

#36 annotation update to 5.3 -genomad

In October, JGI is updating to annotation 5.3. To sync with that, we need to add a WDL for genomad that gets run as part of the annotation pipeline.

Recommend setting this up as a separate WDL and importing that WDL into full_annotation.wdl.

WDL task to wrap genomad.sh https://code.jgi.doe.gov/img/img-pipelines/img-annotation-pipeline/-/blob/5.3/bin/genomad/genomad.sh?ref_type=heads

Program and DB file will need to be added to the info file.

There should be a boolean flag to execute this.
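
A minimal sketch of what this ticket describes, assuming WDL 1.0. The file name `genomad.wdl`, the input names, the `genomad_execute` flag name, and the `genomad.sh` argument order are all placeholders, not taken from the actual pipeline; check `genomad.sh` for its real interface.

```wdl
version 1.0

# genomad.wdl (hypothetical file name): standalone task wrapping genomad.sh
task genomad {
  input {
    File   input_fasta
    String genomad_db   # assumed: path to the geNomad database
    String container    # assumed: image that ships genomad.sh
  }
  command <<<
    set -euo pipefail
    # Placeholder invocation: the real arguments of genomad.sh are not shown here.
    genomad.sh ~{input_fasta} ~{genomad_db}
  >>>
  output {
    File log = stdout()
  }
  runtime {
    docker: container
  }
}
```

```wdl
version 1.0

# full_annotation.wdl (excerpt sketch): import the task and gate it on a boolean
import "genomad.wdl" as gm

workflow annotation {
  input {
    File    input_fasta
    String  genomad_db
    String  container
    Boolean genomad_execute = true   # hypothetical flag name
  }
  if (genomad_execute) {
    call gm.genomad {
      input:
        input_fasta = input_fasta,
        genomad_db  = genomad_db,
        container   = container
    }
  }
  output {
    # Optional because the call sits inside a conditional
    File? genomad_log = genomad.log
  }
}
```

Because the call sits inside an `if` block, its outputs become optional (`File?`) in the importing workflow, so downstream tasks have to tolerate a missing value when the flag is off.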

#37 IMG 5.3 update - databases

database updates-

  • Pfam version 37.0
    /global/dna/projectdirs/microbial/omics/databases/Pfam/Pfam-A/37.0
  • img-nr
    /global/dna/projectdirs/microbial/omics/databases/IMG-NR/20240916
  • product lookup files
    /global/cfs/cdirs/m3408/refdata/img/Product_Name_Mappings/20250123/pfam.tsv
    /global/cfs/cdirs/m3408/refdata/img/Product_Name_Mappings/20250123/kegg.tsv

software updates-

  • LAST
    1. Update container to use LAST version 1584
    2. Add -m 180 to the lastal argument in the ko_ec task (line 198). This is separate from the -m argument for lastal_img_nr_ko_ec_gene_phylo_hit_selector.py
    3. Memory should be 256 GB to get half a Perlmutter node, based on Marcel's testing
  • genomad: being tested in "annotation update to 5.3 -genomad" (#36)
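
The LAST changes above could look roughly like the following in a WDL 1.0 task. This is a sketch only: the task name, input names, thread count, and output file name are assumptions, and the other `lastal` options in the real `ko_ec` task are not reproduced here. Only `-m 180` and the 256 GB memory request come from the ticket.

```wdl
version 1.0

task ko_ec_lastal_sketch {
  input {
    File   input_fasta     # assumed: protein FASTA for one split
    String nr_db           # assumed: lastdb prefix for IMG-NR
    Int    threads = 16    # assumed value
    String container       # assumed: image with LAST version 1584
  }
  command <<<
    set -euo pipefail
    # -m 180 is the new lastal setting from the ticket. It is distinct from the
    # -m argument of lastal_img_nr_ko_ec_gene_phylo_hit_selector.py.
    lastal -f BlastTab+ -m 180 -P ~{threads} ~{nr_db} ~{input_fasta} > lastal.tsv
  >>>
  output {
    File last_blasttab = "lastal.tsv"
  }
  runtime {
    docker: container
    memory: "256 GB"   # half a Perlmutter node, per the ticket
    cpu:    threads
  }
}
```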

Sample run at jaws id 98507

kaijli and others added 30 commits October 21, 2024 07:39
- Separate installation of system upgrade / ca-certificates / other tools into different RUN instructions
- Switch tRNAscan's download path to GitHub because trna.ucsc.edu's SSL certificate can't be verified.
- Fix crt-cli-imgap-version-main's build directory
@kaijli kaijli requested a review from aclum February 12, 2025 22:24
@kaijli kaijli self-assigned this Feb 12, 2025
@kaijli kaijli linked an issue Feb 12, 2025 that may be closed by this pull request
@kaijli
Contributor Author

kaijli commented Feb 12, 2025

Some things of note:
This PR is the minimum working product. Here are some more things that can be addressed:

  • The Dockerfile looks messy because I was dealing with CA certificate issues caused by some combination of VPN, proxy, and openjdk-11-jdk disagreeing with Apple silicon.
    • I changed a lot of the wget/curl calls to ADD, which solved some of my issues but may not be best practice, so I will go back and sort through them.
  • Do I need to update the workflow figure? I've updated the documentation so far.
  • Is this a change to incorporate into NMDC EDGE or just automation?
  • Do we keep the existing version of genomad on NMDC EDGE as its own separate entity?

@kaijli kaijli linked an issue Feb 12, 2025 that may be closed by this pull request
@kaijli kaijli requested a review from chienchi February 12, 2025 22:37
@chienchi
Contributor

  1. For the Docker issue, I think we should eventually use a GitHub Action to automatically build the image and push it to the GH registry (#92). Hopefully this avoids building it on Mac ARM-based architecture in an emulated environment.
  2. The workflow figure is a pretty high-level abstraction and is probably still appropriate after the update.
  3. Please create a ticket for updating the annotation workflow to 5.3 in NMDC-EDGE.
  4. genomad can be turned on or off in this updated annotation workflow. The NMDC-EDGE version may need an update to the corresponding output visualization, so we can turn genomad off in the updated annotation workflow for now. Whether to keep or remove the standalone genomad on NMDC-EDGE depends on the need to run it individually, and we should discuss that.

@aclum
Collaborator

aclum commented Feb 18, 2025

At some point we missed a patch version bump to INFERNAL 1.1.4 (Dec 2020) and a minor version bump to HMMER 3.3.2. Fix the HMMER version by updating the ENV to 3.3.2 on line 68 of the Dockerfile.
COG 2014: I've copied the files over to /global/cfs/cdirs/m3408/refdata/img/COG/2014

Please make the genomad version info in the info file look the same as what Marcel has:
geNomad Programs Used: seqkit v2.9.0; geNomad version 1.8.1
geNomad DBs Used: geNomad db v1.7
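
One way to emit exactly these two lines from a WDL 1.0 task is sketched below. The task name, input names, and output file name are hypothetical; the version defaults are the ones quoted above, and how the lines get merged into the existing info file is left out.

```wdl
version 1.0

task genomad_info_sketch {
  input {
    String seqkit_version     = "2.9.0"
    String genomad_version    = "1.8.1"
    String genomad_db_version = "1.7"
    String container          # assumed: any image with a POSIX shell
  }
  command <<<
    set -euo pipefail
    # Write the two lines in the same layout as the reference info file
    printf 'geNomad Programs Used: seqkit v%s; geNomad version %s\n' \
      "~{seqkit_version}" "~{genomad_version}" > genomad_info.txt
    printf 'geNomad DBs Used: geNomad db v%s\n' \
      "~{genomad_db_version}" >> genomad_info.txt
  >>>
  output {
    File info = "genomad_info.txt"
  }
  runtime {
    docker: container
  }
}
```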

Genomad should be run on splits and then merged, not as its own step.

It would be great to update the image to include GeNomad.

@aclum
Collaborator

aclum commented Feb 18, 2025

We also need to look at how genomad is being run. It looks like you are running it on the entire assembly, and I suspect Marcel is running it on splits and then merging.
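
Running on splits and then merging could be structured along these lines in WDL 1.0. This is a sketch under stated assumptions, not Marcel's implementation: the task names, the `genomad.sh` invocation, the per-split output file, and the plain `cat` merge are all placeholders for whatever the 5.3 pipeline actually does.

```wdl
version 1.0

task genomad_split {
  input {
    File   split_fasta
    String container
  }
  command <<<
    set -euo pipefail
    # Placeholder invocation and output capture: check genomad.sh for its real interface.
    genomad.sh ~{split_fasta} > genomad_hits.tsv
  >>>
  output {
    File hits = "genomad_hits.tsv"
  }
  runtime {
    docker: container
  }
}

task merge_genomad {
  input {
    Array[File] split_hits
    String      container
  }
  command <<<
    set -euo pipefail
    # Assumed merge strategy: simple concatenation of per-split results
    cat ~{sep=" " split_hits} > genomad_merged.tsv
  >>>
  output {
    File merged = "genomad_merged.tsv"
  }
  runtime {
    docker: container
  }
}

workflow genomad_on_splits {
  input {
    Array[File] split_fastas
    String      container
    Boolean     genomad_execute = true   # hypothetical flag name
  }
  if (genomad_execute) {
    # One genomad call per split, then a single merge over all split outputs
    scatter (split in split_fastas) {
      call genomad_split {
        input: split_fasta = split, container = container
      }
    }
    call merge_genomad {
      input: split_hits = genomad_split.hits, container = container
    }
  }
  output {
    File? genomad_merged = merge_genomad.merged
  }
}
```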

RUN \
wget https://github.com/Larofeticus/hpc_hmmsearch/archive/a8d641046729328fdda97331d527edb2ce81510a.zip && \
unzip a8d641046729328fdda97331d527edb2ce81510a.zip && \
Collaborator

@mhuntemann is this correct, or are you still using the old commit ID a8d641046729328fdda97331d527edb2ce81510a?

Contributor

The old commit won't compile with HMMER version 3.3.2, but please confirm.

Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IMG 5.3 update - databases annotation update to 5.3 -genomad
4 participants