Update to use img-annotation v5.3 #44

Open
wants to merge 45 commits into
base: master
Changes from 35 commits
Commits
ba5fe26
add jgi genomad integration
kaijli Oct 21, 2024
8538e3f
untested, but integrated genomad to annotation_full
kaijli Oct 22, 2024
8fd8947
awaiting jaws image for testing
kaijli Nov 3, 2024
7cfd74a
debugging
kaijli Nov 14, 2024
a81ac85
debugging
kaijli Nov 15, 2024
9a319ae
update genomad container
kaijli Dec 2, 2024
df9a915
Merge branch 'master' into 36-annotation-update-to-53-genomad
kaijli Dec 2, 2024
eb4659b
updating index
kaijli Dec 2, 2024
fed91f2
update readme
kaijli Dec 2, 2024
43e0600
updated some documentation
kaijli Dec 4, 2024
80c742a
testing different call methods
kaijli Dec 4, 2024
a8d523a
completed run, fixing file names
kaijli Dec 6, 2024
2bebccb
trying entrypoint.sh script
kaijli Dec 7, 2024
a85b8e6
successful genomad run, testing full annotation
kaijli Dec 10, 2024
9f3aff0
clean up commented code
kaijli Dec 10, 2024
8586e89
testing new container
kaijli Dec 18, 2024
402f26f
added changes from ticket 249
kaijli Jan 9, 2025
e2a90f1
update memory and -m 180 for ko_ec
kaijli Jan 14, 2025
c9e97e0
testing genomad sed in job
kaijli Jan 14, 2025
40984a3
push from nersc for shutdown
kaijli Jan 27, 2025
30a9a92
update databases in index.rst
kaijli Jan 27, 2025
8bbe6e3
push to test locally
kaijli Jan 29, 2025
0f220ac
working on updating docker image
kaijli Feb 3, 2025
5a6f9f6
fix some warnings and testing add vs run and ca certs
kaijli Feb 4, 2025
5669bea
working on cert issues
kaijli Feb 5, 2025
2bea8f6
successful genomad file renaming
kaijli Feb 5, 2025
f502523
Merge branch '36-annotation-update-to-53-genomad' of https://github.c…
kaijli Feb 5, 2025
ce2eabf
Update Dockerfile
poeli Feb 8, 2025
23e0fd3
changing layers for apt
kaijli Feb 10, 2025
298af40
Merge branch 'poeli-patch-1' into 36-annotation-update-to-53-genomad
kaijli Feb 10, 2025
edca75c
finally got an image to build
kaijli Feb 11, 2025
5692986
test new image with lastal update
kaijli Feb 11, 2025
23ab582
Merge branch '36-annotation-update-to-53-genomad' of https://github.c…
kaijli Feb 11, 2025
08f628a
successful run of full workflow
kaijli Feb 12, 2025
4dc1571
update LAST version in readme
kaijli Feb 12, 2025
937fea2
update hmmer version to 3.3.2. infernal version to 1.1.4.
Feb 26, 2025
8695bd1
minor version bumps and genomad info changes
kaijli Feb 26, 2025
18d435f
Merge branch '36-annotation-update-to-53-genomad' of https://github.c…
kaijli Feb 26, 2025
11d9fd1
fix typo on Dockerfile and add openjdk which required by CRT tool
Feb 26, 2025
828f43f
for hmmer version 3.3.2 the hpc_hmmsearch should use the code in mast…
Feb 26, 2025
adbc5ab
remove add commands for use with github action
kaijli Feb 28, 2025
5a3a538
change img annotation pipeline calling
kaijli Feb 28, 2025
62709f4
using git clone instead of pulling zip file. need to update to shas
kaijli Feb 28, 2025
4a83a42
forgot infernal prefix
kaijli Feb 28, 2025
4691cc4
forgot v for transcan
kaijli Feb 28, 2025
209 changes: 135 additions & 74 deletions Dockerfile
@@ -1,115 +1,172 @@
FROM debian:bullseye as buildbase
FROM debian:bullseye AS buildbase

# Update and clean package lists
RUN apt-get -y update \
&& apt-get -y upgrade \
&& apt-get -y clean

# Install CA certificates
RUN apt-get -y update && apt-get -y install ca-certificates
# RUN apt-get -y install ca-certificates
RUN update-ca-certificates --fresh

# Install OpenJDK
# original: RUN apt-get -y install openjdk-11-jdk
# for building on arm / mac machine for amd
RUN apt-get -y update && apt-get install -y openjdk-11-jdk:amd64
# potential fix with openjdk:19-alpine following this comment, if we want
# to use wget instead of ADD (which is better practice)
# https://forums.docker.com/t/how-to-make-wget-run-in-docker/140555/6

# Install essential packages
RUN apt-get -y install \
git \
gcc \
make \
wget \
time \
autoconf \
unzip \
curl \
libz-dev \
g++

RUN apt-get -y update && apt-get -y install git gcc make wget time autoconf unzip curl

RUN apt-get -y install libz-dev
#
# Build prodigal
########## Build prodigal
#
FROM buildbase as prodigal
FROM buildbase AS prodigal
#4/20/23 Marcel is using a patched version, get from NERSC instead of offical repo
RUN \
cd /opt && \
wget http://portal.nersc.gov/dna/metagenome/assembly/prodigal_2.6.3_patched/prodigal && \
chmod 755 prodigal

#RUN git clone --branch v2.6.3 https://github.com/hyattpd/Prodigal

#RUN cd Prodigal && make install
ADD --chmod=755 http://portal.nersc.gov/dna/metagenome/assembly/prodigal_2.6.3_patched/prodigal /opt/
# RUN \
# cd /opt && \
# wget http://portal.nersc.gov/dna/metagenome/assembly/prodigal_2.6.3_patched/prodigal && \
# chmod 755 prodigal

#RUN git clone --branch v2.6.3 https://github.com/hyattpd/Prodigal
#RUN cd Prodigal && make install


# Build trnascan 2.0.08
######### Build trnascan
#
FROM buildbase as trnascan
FROM buildbase AS trnascan
ADD https://github.com/UCSC-LoweLab/tRNAscan-SE/archive/refs/tags/v2.0.12.tar.gz .

RUN wget http://trna.ucsc.edu/software/trnascan-se-2.0.12.tar.gz
# RUN wget https://github.com/UCSC-LoweLab/tRNAscan-SE/archive/refs/tags/v2.0.12.tar.gz

RUN \
tar xzvf trnascan-se-2.0.12.tar.gz && \
cd tRNAscan-SE-2.0 && \
tar -xzf v2.0.12.tar.gz && \
cd tRNAscan-SE-2.0.12 && \
./configure --prefix=/opt/omics/programs/tRNAscan-SE/tRNAscan-SE-2.0.12/ && \
make && make install

#
# Build HMMER 3.1b2 with HPC enhancements from Arndt
########## Build HMMER 3.1b2 with HPC enhancements from Arndt
#
FROM buildbase as hmm
FROM buildbase AS hmm

ENV V=3.1b2
ADD http://eddylab.org/software/hmmer/hmmer-$V.tar.gz /opt/
RUN \
cd /opt && \
wget http://eddylab.org/software/hmmer/hmmer-$V.tar.gz && \
tar -zxvf hmmer-$V.tar.gz && \
# wget http://eddylab.org/software/hmmer/hmmer-$V.tar.gz && \
tar -zxf hmmer-$V.tar.gz && \
cd hmmer-$V && ./configure --prefix /opt/omics/programs/hmmer/ && \
make && make install

# get and extract commit sha a8d641046729328fdda97331d527edb2ce81510a of master branch of modification file, copy into hmmer source code
RUN \
wget https://github.com/Larofeticus/hpc_hmmsearch/archive/a8d641046729328fdda97331d527edb2ce81510a.zip && \
unzip a8d641046729328fdda97331d527edb2ce81510a.zip && \
Collaborator

@mhuntemann is this correct, or are you still using the old commit ID a8d641046729328fdda97331d527edb2ce81510a?

Contributor

The old commit won't compile with HMMER version 3.3.2, but please confirm.

cp /hpc_hmmsearch-*/hpc_hmmsearch.c /opt/hmmer-3.1b2/src && \
cp /hpc_hmmsearch-*/hpc_hmmsearch.c /opt/hmmer-$V/src && \
cd /opt/hmmer-$V/src && \
gcc -std=gnu99 -O3 -fomit-frame-pointer -fstrict-aliasing -march=core2 -fopenmp -fPIC -msse2 -DHAVE_CONFIG_H -I../easel -I../libdivsufsort -I../easel -I. -I. -o hpc_hmmsearch.o -c hpc_hmmsearch.c && \
gcc -std=gnu99 -O3 -fomit-frame-pointer -fstrict-aliasing -march=core2 -fopenmp -fPIC -msse2 -DHAVE_CONFIG_H -L../easel -L./impl_sse -L../libdivsufsort -L. -o hpc_hmmsearch hpc_hmmsearch.o -lhmmer -leasel -ldivsufsort -lm && \
cp hpc_hmmsearch /opt/omics/programs/hmmer/bin/ && \
/opt/omics/programs/hmmer/bin/hpc_hmmsearch -h
# Build last 1456

########## Build last 1584
#
FROM buildbase as last

RUN apt-get -y install g++

FROM buildbase AS last

# RUN \
# wget https://gitlab.com/mcfrith/last/-/archive/1584/last-1584.tar.gz && \
# tar -zxf last-1584.tar.gz
ADD https://gitlab.com/mcfrith/last/-/archive/1584/last-1584.tar.gz .
# RUN curl -L https://gitlab.com/mcfrith/last/-/archive/1584/last-1584.tar.gz
# RUN tar -zxf last-1584.tar.gz
RUN \
git clone --depth 1 --branch 1456 https://gitlab.com/mcfrith/last && \
cd last && \
tar -zxf last-1584.tar.gz && \
cd last-1584 && \
make && \
make prefix=/opt/omics/programs/last install

# Build infernal 1.1.3
# RUN \
# git clone --depth 1 --branch 1584 https://gitlab.com/mcfrith/last && \
# cd last && \
# make && \
# make prefix=/opt/omics/programs/last install

########## Build infernal 1.1.3
#
FROM buildbase as infernal
FROM buildbase AS infernal

RUN \
wget http://eddylab.org/infernal/infernal-1.1.3.tar.gz && \
tar xzf infernal-1.1.3.tar.gz
tar -zxf infernal-1.1.3.tar.gz

RUN \
cd infernal-1.1.3 && \
./configure --prefix=/opt/omics/programs/infernal/infernal-1.1.3 && \
make && make install

#
# IMG scripts and tools v 5.1.14, repo is public 4/2023. Add split.py from bfoster1/img-omics:0.1.12 (md5sum 21fb20bf430e61ce55430514029e7a83)
########## IMG scripts and tools v 5.1.14, repo is public 4/2023. Add split.py from bfoster1/img-omics:0.1.12 (md5sum 21fb20bf430e61ce55430514029e7a83)
#
FROM buildbase as img
FROM buildbase AS img

RUN \
cd /opt && \
git clone -b scaffold-lineage https://code.jgi.doe.gov/img/img-pipelines/img-annotation-pipeline
# RUN \
# cd /opt && \
# git clone --depth 1 --branch 5.3 https://code.jgi.doe.gov/img/img-pipelines/img-annotation-pipeline

ADD https://code.jgi.doe.gov/img/img-pipelines/img-annotation-pipeline/-/archive/5.3.0/img-annotation-pipeline-5.3.0.tar.gz /opt/

RUN \
cd /opt && \
curl https://code.jgi.doe.gov/official-jgi-workflows/jgi-wdl-pipelines/img-omics/-/raw/83c5483f0fd8afc43a2956ed065bffc08d8574da/bin/split.py > split.py && \
chmod 755 split.py
cd /opt && \
tar -zxvf img-annotation-pipeline-5.3.0.tar.gz
# && \
# mkdir img-annotation-pipeline && \
# mv img-annotation-pipeline-5.3.0/* img-annotation-pipeline/ && \
# ls img-annotation-pipeline

# MetaGeneMark version was updated for img annotation pipeline 5.1.*
ADD --chmod=755 https://code.jgi.doe.gov/official-jgi-workflows/jgi-wdl-pipelines/img-omics/-/raw/83c5483f0fd8afc43a2956ed065bffc08d8574da/bin/split.py /opt/
# RUN \
# cd /opt && \
# curl https://code.jgi.doe.gov/official-jgi-workflows/jgi-wdl-pipelines/img-omics/-/raw/83c5483f0fd8afc43a2956ed065bffc08d8574da/bin/split.py > split.py && \
# chmod 755 split.py

########## MetaGeneMark version was updated for img annotation pipeline 5.1.*

ADD http://portal.nersc.gov/dna/metagenome/assembly/gms2_linux_64.v1.14_1.25_lic.tar.gz /opt/
RUN \
cd /opt && \
wget http://portal.nersc.gov/dna/metagenome/assembly/gms2_linux_64.v1.14_1.25_lic.tar.gz && \
tar -zxvf gms2_linux_64.v1.14_1.25_lic.tar.gz && \
#chmod -R 755 omics && \
rm gms2_linux_64.v1.14_1.25_lic.tar.gz

RUN apt-get update && apt-get install -y openjdk-11-jdk
# get CRT version 1.8.4
# RUN \
# cd /opt && \
# wget http://portal.nersc.gov/dna/metagenome/assembly/gms2_linux_64.v1.14_1.25_lic.tar.gz && \
# tar -zxvf gms2_linux_64.v1.14_1.25_lic.tar.gz && \
# #chmod -R 755 omics && \
# rm gms2_linux_64.v1.14_1.25_lic.tar.gz

#
########## get CRT version 1.8.4
ADD https://code.jgi.doe.gov/img/img-pipelines/crt-cli-imgap-version/-/archive/main/crt-cli-imgap-version-main.zip .
RUN \
wget https://code.jgi.doe.gov/img/img-pipelines/crt-cli-imgap-version/-/archive/main/crt-cli-imgap-version-main.zip && \
unzip crt-cli-imgap-version-main.zip && \
cd crt-cli-imgap-version-main && \
# wget https://code.jgi.doe.gov/img/img-pipelines/crt-cli-imgap-version/-/archive/main/crt-cli-imgap-version-main.zip && \
unzip -q crt-cli-imgap-version-main.zip && \
cd crt-cli-imgap-version-main/src && \
javac *.java && \
jar cfe CRT-CLI.jar crt *.class && \
cp CRT-CLI.jar /opt/.
@@ -119,45 +176,51 @@ RUN \
#
# Build the final image
#
FROM buildbase as conda
FROM buildbase AS conda

# Install Miniconda
########## Install Miniconda
#
RUN \
wget -q https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
bash ./Miniconda3-latest-Linux-x86_64.sh -b -p /miniconda3
# RUN \
# replaced by ADD
# wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \

ENV PATH /miniconda3/bin:/miniconda3/condabin:$PATH
ADD https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh .
RUN bash ./Miniconda3-latest-Linux-x86_64.sh -b -p /miniconda3

RUN conda config --add channels conda-forge && conda config --add channels bioconda && conda config --add channels anaconda
ENV PATH=/miniconda3/bin:/miniconda3/condabin:$PATH

RUN conda config --add channels conda-forge && conda config --add channels bioconda && conda config --add channels anaconda
RUN conda install -y conda-forge::ca-certificates
RUN conda install -y curl git wget jq parallel pyyaml openjdk perl-getopt-long bc procps-ng

RUN conda clean -y -a

#
# Install Cromwell v49
########## Install Cromwell v49
#
FROM buildbase as cromwell
FROM buildbase AS cromwell

RUN \
mkdir -p /opt/omics/bin && \
cd /opt/omics/bin && \
wget -q https://github.com/broadinstitute/cromwell/releases/download/49/cromwell-49.jar && \
ln -sf cromwell-49.jar cromwell.jar
RUN mkdir -p /opt/omics/bin
ADD https://github.com/broadinstitute/cromwell/releases/download/49/cromwell-49.jar /opt/omics/bin/
RUN ln -sf cromwell-49.jar cromwell.jar
# RUN \
# mkdir -p /opt/omics/bin && \
# cd /opt/omics/bin && \
# wget -q https://github.com/broadinstitute/cromwell/releases/download/49/cromwell-49.jar && \
# ln -sf cromwell-49.jar cromwell.jar

FROM buildbase

ENV PERL5LIB '/opt/omics/lib'
ENV PERL5LIB='/opt/omics/lib'
COPY --from=conda /miniconda3 /miniconda3

# conda shell.posix activate
ENV PATH '/miniconda3/bin:/miniconda3/condabin:/opt/omics/bin:/opt/omics/bin/functional_annotation:/opt/omics/bin/qc/post-annotation:/opt/omics/bin/qc/pre-annotation:/opt/omics/bin/structural_annotation:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
ENV CONDA_PREFIX '/miniconda3'
ENV CONDA_EXE '/miniconda3/bin/conda'
ENV _CE_M ''
ENV _CE_CONDA ''
ENV CONDA_PYTHON_EXE '/miniconda3/bin/python'
ENV PATH='/miniconda3/bin:/miniconda3/condabin:/opt/omics/bin:/opt/omics/bin/functional_annotation:/opt/omics/bin/qc/post-annotation:/opt/omics/bin/qc/pre-annotation:/opt/omics/bin/structural_annotation:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
ENV CONDA_PREFIX='/miniconda3'
ENV CONDA_EXE='/miniconda3/bin/conda'
ENV _CE_M=''
ENV _CE_CONDA=''
ENV CONDA_PYTHON_EXE='/miniconda3/bin/python'

COPY --from=cromwell /opt/omics/bin/ /opt/omics/bin/

@@ -172,12 +235,11 @@ COPY --from=last /opt/omics/programs/last/ /opt/omics/programs/last
COPY --from=last /opt/omics/programs/last /opt/omics/programs/last
COPY --from=img /opt/CRT-CLI.jar /opt/omics/programs/CRT/CRT-CLI.jar
COPY --from=img /opt/split.py /opt/omics/bin/split.py
#COPY --from=img /opt/omics/programs/tmhmm-2.0c /opt/omics/programs/tmhmm-2.0c

COPY --from=infernal /opt/omics/programs/infernal /opt/omics/programs/infernal/
COPY --from=img /opt/img-annotation-pipeline/bin/ /opt/omics/bin/
COPY --from=img /opt/img-annotation-pipeline-5.3.0/bin/ /opt/omics/bin/
COPY --from=img /opt/gms2_linux_64 /opt/omics/programs/gms2_linux_64
COPY --from=img /opt/img-annotation-pipeline/VERSION /opt/omics/VERSION
COPY --from=img /opt/img-annotation-pipeline-5.3.0/VERSION /opt/omics/VERSION
RUN \
mkdir /opt/omics/lib && cd /opt/omics/lib && \
ln -s ../programs/tRNAscan-SE/tRNAscan-SE-2.0.12/lib/tRNAscan-SE/* .
@@ -203,4 +265,3 @@ RUN \
ln -s /opt/omics/programs/infernal/infernal-1.1.3/bin/cmscan

#COPY --from=img /opt/omics /opt/omics3/
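
A note on the Cromwell stage in the Dockerfile diff above: the new `RUN ln -sf cromwell-49.jar cromwell.jar` is no longer preceded by `cd /opt/omics/bin`. Assuming the stage sets no `WORKDIR` (none is visible in this diff), that command would run from `/`, so the link would be created outside `/opt/omics/bin` and would not resolve. The shell sketch below reproduces the effect in a scratch directory; the `root` variable and the scratch layout are illustrative only and are not part of the PR.

```shell
# Simulate the stage layout in a scratch directory (hypothetical paths).
root="$(mktemp -d)"
mkdir -p "$root/opt/omics/bin"
touch "$root/opt/omics/bin/cromwell-49.jar"

# Link created from the working directory: it lands outside opt/omics/bin
# and its relative target does not exist there, so it dangles.
(cd "$root" && ln -sf cromwell-49.jar cromwell.jar)

# Link created with an explicit destination: it sits next to the jar,
# so the relative target resolves.
ln -sf cromwell-49.jar "$root/opt/omics/bin/cromwell.jar"

[ -e "$root/cromwell.jar" ] && echo "cwd link resolves" || echo "cwd link dangles"
[ -e "$root/opt/omics/bin/cromwell.jar" ] && echo "explicit link resolves" || echo "explicit link dangles"
```

If that reading is right, giving the link an explicit destination (for example `ln -sf cromwell-49.jar /opt/omics/bin/cromwell.jar`) or restoring the `cd` would keep `cromwell.jar` inside the directory that the final stage copies with `COPY --from=cromwell /opt/omics/bin/ /opt/omics/bin/`.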

4 changes: 3 additions & 1 deletion README.md
@@ -36,8 +36,9 @@ A JSON file containing the following:
- CRT-CLI 1.8.4 (Public domain software, last official version is 1.2)
- Prodigal 2.6.3_patched (GNU GPL v3)
- GeneMarkS-2 >= 1.25 ([Academic license for GeneMark family software](http://topaz.gatech.edu/GeneMark/license_download.cgi))
- Last >= 1456 (GNU GPL v3)
- Last >= 1584 (GNU GPL v3)
- HMMER 3.1b2 (3-clause BSD, [thread optimized hmmsearch](https://github.com/Larofeticus/hpc_hmmsearch))
- GeNomad 1.8.1 (GNU GPL v3, pulled from [IMG Annotation Pipeline repo](https://code.jgi.doe.gov/img/img-pipelines/img-annotation-pipeline))


#### Databases used (+ their licenses):
@@ -49,3 +50,4 @@ A JSON file containing the following:
- SUPERFAMILY (permissive/custom); [more info](http://reusabledata.org/supfam)
- Pfam (public domain/ CC0 1.0); [more info](http://reusabledata.org/pfam)
- Cath-FunFam (permissive/CC BY 4.0); [more info](http://reusabledata.org/cath)
- GeNomad DB v1.7 (permissive/CC BY 4.0); [more info](https://zenodo.org/records/10594875)
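
Several downloads in the Dockerfile diff moved from `wget` to `ADD <url>`, and one commit message notes the need to pin to SHAs. As written in the diff, neither form checks the integrity of what was fetched. As a minimal sketch, assuming GNU coreutils `sha256sum` is available in the build stage, a downloaded archive could be checked against a recorded digest before it is unpacked. The function name `verify_sha256` and the scratch file are hypothetical; no real digests from this PR are used.

```shell
# verify_sha256 FILE EXPECTED
# Succeeds only if FILE's SHA-256 digest equals EXPECTED (hypothetical helper).
verify_sha256() {
    # sha256sum -c expects "<digest>  <path>"; --status suppresses output
    # and reports the result through the exit code only.
    printf '%s  %s\n' "$2" "$1" | sha256sum -c --status -
}

# Self-contained demonstration with a scratch file instead of a real download.
tmp="$(mktemp)"
printf 'example payload\n' > "$tmp"
good="$(sha256sum "$tmp" | cut -d' ' -f1)"   # digest recorded ahead of time
bad="$(printf '%064d' 0)"                    # 64 zeros: a deliberately wrong digest

verify_sha256 "$tmp" "$good" && echo "digest matches"
verify_sha256 "$tmp" "$bad" || echo "digest mismatch"
```

In a `RUN` step this check would sit between the download and the `tar` call, with the expected digest recorded alongside the version number so that a changed upstream file fails the build instead of being unpacked silently.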