Update to use img-annotation v5.3 #44

Open
wants to merge 45 commits into
base: master
Commits
ba5fe26
add jgi genomad integration
kaijli Oct 21, 2024
8538e3f
untested, but integrated genomad to annotation_full
kaijli Oct 22, 2024
8fd8947
awaiting jaws image for testing
kaijli Nov 3, 2024
7cfd74a
debugging
kaijli Nov 14, 2024
a81ac85
debugging
kaijli Nov 15, 2024
9a319ae
update genomad container
kaijli Dec 2, 2024
df9a915
Merge branch 'master' into 36-annotation-update-to-53-genomad
kaijli Dec 2, 2024
eb4659b
updating index
kaijli Dec 2, 2024
fed91f2
update readme
kaijli Dec 2, 2024
43e0600
updated some documentation
kaijli Dec 4, 2024
80c742a
testing different call methods
kaijli Dec 4, 2024
a8d523a
completed run, fixing file names
kaijli Dec 6, 2024
2bebccb
trying entrypoint.sh script
kaijli Dec 7, 2024
a85b8e6
successful genomad run, testing full annotation
kaijli Dec 10, 2024
9f3aff0
clean up commented code
kaijli Dec 10, 2024
8586e89
testing new container
kaijli Dec 18, 2024
402f26f
added changes from ticket 249
kaijli Jan 9, 2025
e2a90f1
update memory and -m 180 for ko_ec
kaijli Jan 14, 2025
c9e97e0
testing genomad sed in job
kaijli Jan 14, 2025
40984a3
push from nersc for shutdown
kaijli Jan 27, 2025
30a9a92
update databases in index.rst
kaijli Jan 27, 2025
8bbe6e3
push to test locally
kaijli Jan 29, 2025
0f220ac
working on updating docker image
kaijli Feb 3, 2025
5a6f9f6
fix some warnings and testing add vs run and ca certs
kaijli Feb 4, 2025
5669bea
working on cert issues
kaijli Feb 5, 2025
2bea8f6
successful genomad file renaming
kaijli Feb 5, 2025
f502523
Merge branch '36-annotation-update-to-53-genomad' of https://github.c…
kaijli Feb 5, 2025
ce2eabf
Update Dockerfile
poeli Feb 8, 2025
23e0fd3
changing layers for apt
kaijli Feb 10, 2025
298af40
Merge branch 'poeli-patch-1' into 36-annotation-update-to-53-genomad
kaijli Feb 10, 2025
edca75c
finally got an image to build
kaijli Feb 11, 2025
5692986
test new image with lastal update
kaijli Feb 11, 2025
23ab582
Merge branch '36-annotation-update-to-53-genomad' of https://github.c…
kaijli Feb 11, 2025
08f628a
successful run of full workflow
kaijli Feb 12, 2025
4dc1571
update LAST version in readme
kaijli Feb 12, 2025
937fea2
update hmmer version to 3.3.2. infernal version to 1.1.4.
Feb 26, 2025
8695bd1
minor version bumps and genomad info changes
kaijli Feb 26, 2025
18d435f
Merge branch '36-annotation-update-to-53-genomad' of https://github.c…
kaijli Feb 26, 2025
11d9fd1
fix typo on Dockerfile and add openjdk which required by CRT tool
Feb 26, 2025
828f43f
for hmmer version 3.3.2 the hpc_hmmsearch should use the code in mast…
Feb 26, 2025
adbc5ab
remove add commands for use with github action
kaijli Feb 28, 2025
5a3a538
change img annotation pipeline calling
kaijli Feb 28, 2025
62709f4
using git clone instead of pulling zip file. need to update to shas
kaijli Feb 28, 2025
4a83a42
forgot infernal prefix
kaijli Feb 28, 2025
4691cc4
forgot v for transcan
kaijli Feb 28, 2025
221 changes: 141 additions & 80 deletions Dockerfile
@@ -1,195 +1,258 @@
FROM debian:bullseye as buildbase
FROM debian:bullseye AS buildbase

# Update and clean package lists
RUN apt-get -y update \
&& apt-get -y upgrade \
&& apt-get -y clean

# Install CA certificates
RUN apt-get -y update && apt-get -y install ca-certificates
RUN update-ca-certificates --fresh

# Install OpenJDK
# for building on arm / mac machine for amd, use `openjdk-11-jdk:amd64`
RUN apt-get -y update && apt-get install -y openjdk-11-jdk
# potential fix with openjdk:19-alpine following this comment, if we want
# to use wget instead of ADD (which is better practice) for building on MacOS
# https://forums.docker.com/t/how-to-make-wget-run-in-docker/140555/6

# Install essential packages
RUN apt-get -y install \
git \
gcc \
make \
wget \
time \
autoconf \
unzip \
curl \
libz-dev \
g++

RUN apt-get -y update && apt-get -y install git gcc make wget time autoconf unzip curl

RUN apt-get -y install libz-dev
#
# Build prodigal
########## Build prodigal
#
FROM buildbase as prodigal
FROM buildbase AS prodigal
#4/20/23 Marcel is using a patched version, get from NERSC instead of official repo

# ADD --chmod=755 http://portal.nersc.gov/dna/metagenome/assembly/prodigal_2.6.3_patched/prodigal /opt/
RUN \
cd /opt && \
wget http://portal.nersc.gov/dna/metagenome/assembly/prodigal_2.6.3_patched/prodigal && \
chmod 755 prodigal

#RUN git clone --branch v2.6.3 https://github.com/hyattpd/Prodigal

#RUN cd Prodigal && make install

#RUN cd Prodigal && make install


# Build trnascan 2.0.08
######### Build trnascan
#
FROM buildbase as trnascan
FROM buildbase AS trnascan
ENV trnascan_ver=2.0.12
# ADD https://github.com/UCSC-LoweLab/tRNAscan-SE/archive/refs/tags/v${trnascan_ver}.tar.gz .

RUN wget http://trna.ucsc.edu/software/trnascan-se-2.0.12.tar.gz
RUN git clone --depth 1 --branch v${trnascan_ver} https://github.com/UCSC-LoweLab/tRNAscan-SE
# RUN wget https://github.com/UCSC-LoweLab/tRNAscan-SE/archive/refs/tags/v${trnascan_ver}.tar.gz

RUN \
tar xzvf trnascan-se-2.0.12.tar.gz && \
cd tRNAscan-SE-2.0 && \
./configure --prefix=/opt/omics/programs/tRNAscan-SE/tRNAscan-SE-2.0.12/ && \
# tar -xzf v${trnascan_ver}.tar.gz && \
# cd tRNAscan-SE-${trnascan_ver} && \
cd tRNAscan-SE && \
./configure --prefix=/opt/omics/programs/tRNAscan-SE/ && \
make && make install

#
# Build HMMER 3.1b2 with HPC enhancements from Arndt
########## Build HMMER 3.3.2
#
FROM buildbase as hmm
FROM buildbase AS hmm

ENV V=3.1b2
ENV hmm_ver=3.3.2
# ADD http://eddylab.org/software/hmmer/hmmer-${hmm_ver}.tar.gz /opt/
RUN \
cd /opt && \
wget http://eddylab.org/software/hmmer/hmmer-$V.tar.gz && \
tar -zxvf hmmer-$V.tar.gz && \
cd hmmer-$V && ./configure --prefix /opt/omics/programs/hmmer/ && \
wget http://eddylab.org/software/hmmer/hmmer-${hmm_ver}.tar.gz && \
tar -zxf hmmer-${hmm_ver}.tar.gz && \
cd hmmer-${hmm_ver} && ./configure --prefix /opt/omics/programs/hmmer/ && \
make && make install

# get and extract commit sha a8d641046729328fdda97331d527edb2ce81510a of master branch of modification file, copy into hmmer source code
## for hmmer version 3.3.2 the hpc_hmmsearch should use the code in master branch
RUN \
wget https://github.com/Larofeticus/hpc_hmmsearch/archive/a8d641046729328fdda97331d527edb2ce81510a.zip && \
unzip a8d641046729328fdda97331d527edb2ce81510a.zip && \
Collaborator

@mhuntemann is this correct, or are you still using the old commit id a8d641046729328fdda97331d527edb2ce81510a?

Contributor

The old commit won't compile with HMMER version 3.3.2, but please confirm.
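
The question in this thread is whether the download should follow `master` or stay pinned to a commit id. Once a commit that compiles against HMMER 3.3.2 is confirmed, one option is to refuse any ref that is not a full commit id. A minimal sketch, assuming a hypothetical helper `archive_url` (not part of this PR) and the same GitHub archive URL form the Dockerfile already uses:

```shell
#!/bin/sh
# Hypothetical helper (not in the PR): print the GitHub archive URL for a
# repository pinned to a full 40-character commit id. Branch names such as
# "master" are rejected so the build cannot silently drift.
archive_url() {
    repo="$1"   # owner/name, e.g. Larofeticus/hpc_hmmsearch
    ref="$2"    # expected: 40 lowercase hex characters

    if [ "${#ref}" -ne 40 ]; then
        echo "refusing unpinned ref: $ref" >&2
        return 1
    fi
    case "$ref" in
        *[!0-9a-f]*)
            echo "ref is not a hex commit id: $ref" >&2
            return 1
            ;;
    esac
    printf 'https://github.com/%s/archive/%s.zip\n' "$repo" "$ref"
}

# Example (assumption: the chosen commit has been confirmed to compile):
#   wget "$(archive_url Larofeticus/hpc_hmmsearch <confirmed-commit-id>)"
```

The helper only builds the URL; whether a given commit builds against HMMER 3.3.2 still has to be confirmed by the maintainers.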

cp /hpc_hmmsearch-*/hpc_hmmsearch.c /opt/hmmer-3.1b2/src && \
cd /opt/hmmer-$V/src && \
wget https://github.com/Larofeticus/hpc_hmmsearch/archive/master.zip && \
unzip master.zip && \
cp /hpc_hmmsearch-*/hpc_hmmsearch.c /opt/hmmer-${hmm_ver}/src && \
cd /opt/hmmer-${hmm_ver}/src && \
gcc -std=gnu99 -O3 -fomit-frame-pointer -fstrict-aliasing -march=core2 -fopenmp -fPIC -msse2 -DHAVE_CONFIG_H -I../easel -I../libdivsufsort -I../easel -I. -I. -o hpc_hmmsearch.o -c hpc_hmmsearch.c && \
gcc -std=gnu99 -O3 -fomit-frame-pointer -fstrict-aliasing -march=core2 -fopenmp -fPIC -msse2 -DHAVE_CONFIG_H -L../easel -L./impl_sse -L../libdivsufsort -L. -o hpc_hmmsearch hpc_hmmsearch.o -lhmmer -leasel -ldivsufsort -lm && \
cp hpc_hmmsearch /opt/omics/programs/hmmer/bin/ && \
/opt/omics/programs/hmmer/bin/hpc_hmmsearch -h
# Build last 1456

########## Build last 1584
#
FROM buildbase as last
FROM buildbase AS last
ENV last_ver=1584

# ADD https://gitlab.com/mcfrith/last/-/archive/${last_ver}/last-${last_ver}.tar.gz .

RUN apt-get -y install g++
# RUN \
# tar -zxf last-${last_ver}.tar.gz && \
# cd last-${last_ver} && \
# make && \
# make prefix=/opt/omics/programs/last install

RUN \
git clone --depth 1 --branch 1456 https://gitlab.com/mcfrith/last && \
git clone --depth 1 --branch ${last_ver} https://gitlab.com/mcfrith/last && \
cd last && \
make && \
make prefix=/opt/omics/programs/last install

# Build infernal 1.1.3
########## Build infernal 1.1.4
#
FROM buildbase as infernal
FROM buildbase AS infernal

RUN \
wget http://eddylab.org/infernal/infernal-1.1.3.tar.gz && \
tar xzf infernal-1.1.3.tar.gz
ENV infernal_ver=1.1.4

# RUN \
# wget http://eddylab.org/infernal/infernal-${infernal_ver}.tar.gz && \
# tar -zxf infernal-${infernal_ver}.tar.gz

RUN git clone --depth 1 --branch infernal-${infernal_ver} https://github.com/EddyRivasLab/infernal

RUN \
cd infernal-1.1.3 && \
./configure --prefix=/opt/omics/programs/infernal/infernal-1.1.3 && \
cd infernal && \
./configure --prefix=/opt/omics/programs/infernal/ && \
make && make install

#
# IMG scripts and tools v 5.1.14, repo is public 4/2023. Add split.py from bfoster1/img-omics:0.1.12 (md5sum 21fb20bf430e61ce55430514029e7a83)
########## IMG scripts and tools v 5.1.14, repo is public 4/2023. Add split.py from bfoster1/img-omics:0.1.12 (md5sum 21fb20bf430e61ce55430514029e7a83)
#
FROM buildbase as img
FROM buildbase AS img

ENV IMG_annotation_pipeline_ver=5.3.0

RUN \
cd /opt && \
git clone -b scaffold-lineage https://code.jgi.doe.gov/img/img-pipelines/img-annotation-pipeline
git clone --depth 1 --branch ${IMG_annotation_pipeline_ver} https://code.jgi.doe.gov/img/img-pipelines/img-annotation-pipeline

RUN \
cd /opt && \
curl https://code.jgi.doe.gov/official-jgi-workflows/jgi-wdl-pipelines/img-omics/-/raw/83c5483f0fd8afc43a2956ed065bffc08d8574da/bin/split.py > split.py && \
chmod 755 split.py
cd /opt && \
curl https://code.jgi.doe.gov/official-jgi-workflows/jgi-wdl-pipelines/img-omics/-/raw/83c5483f0fd8afc43a2956ed065bffc08d8574da/bin/split.py > split.py && \
chmod 755 split.py

# MetaGeneMark version was updated for img annotation pipeline 5.1.*
# ADD https://code.jgi.doe.gov/img/img-pipelines/img-annotation-pipeline/-/archive/${IMG_annotation_pipeline_ver}/img-annotation-pipeline-${IMG_annotation_pipeline_ver}.tar.gz /opt/
# RUN \
# cd /opt && \
# tar -zxvf img-annotation-pipeline-${IMG_annotation_pipeline_ver}.tar.gz

# ADD --chmod=755 https://code.jgi.doe.gov/official-jgi-workflows/jgi-wdl-pipelines/img-omics/-/raw/83c5483f0fd8afc43a2956ed065bffc08d8574da/bin/split.py /opt/

#
########## MetaGeneMark version was updated for img annotation pipeline 5.1.*

# ADD http://portal.nersc.gov/dna/metagenome/assembly/gms2_linux_64.v1.14_1.25_lic.tar.gz /opt/
RUN \
cd /opt && \
wget http://portal.nersc.gov/dna/metagenome/assembly/gms2_linux_64.v1.14_1.25_lic.tar.gz && \
tar -zxvf gms2_linux_64.v1.14_1.25_lic.tar.gz && \
#chmod -R 755 omics && \
rm gms2_linux_64.v1.14_1.25_lic.tar.gz

RUN apt-get update && apt-get install -y openjdk-11-jdk
# get CRT version 1.8.4


#
########## get CRT version 1.8.4
# ADD https://code.jgi.doe.gov/img/img-pipelines/crt-cli-imgap-version/-/archive/main/crt-cli-imgap-version-main.zip .
# RUN \
# # wget https://code.jgi.doe.gov/img/img-pipelines/crt-cli-imgap-version/-/archive/main/crt-cli-imgap-version-main.zip && \
# unzip -q crt-cli-imgap-version-main.zip && \
# cd crt-cli-imgap-version-main/src && \
# javac *.java && \
# jar cfe CRT-CLI.jar crt *.class && \
# cp CRT-CLI.jar /opt/.

ENV CRT_ver=1.8.4

RUN \
wget https://code.jgi.doe.gov/img/img-pipelines/crt-cli-imgap-version/-/archive/main/crt-cli-imgap-version-main.zip && \
unzip crt-cli-imgap-version-main.zip && \
cd crt-cli-imgap-version-main && \
git clone --depth 1 --branch ${CRT_ver} https://code.jgi.doe.gov/img/img-pipelines/crt-cli-imgap-version && \
cd crt-cli-imgap-version/src && \
javac *.java && \
jar cfe CRT-CLI.jar crt *.class && \
cp CRT-CLI.jar /opt/.



#
# Build the final image
#
FROM buildbase as conda
FROM buildbase AS conda

# Install Miniconda
########## Install Miniconda
#
# ADD https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh .
RUN \
wget -q https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
bash ./Miniconda3-latest-Linux-x86_64.sh -b -p /miniconda3

ENV PATH /miniconda3/bin:/miniconda3/condabin:$PATH
ENV PATH=/miniconda3/bin:/miniconda3/condabin:$PATH

RUN conda config --add channels conda-forge && conda config --add channels bioconda && conda config --add channels anaconda

RUN conda install -y conda-forge::ca-certificates
RUN conda install -y curl git wget jq parallel pyyaml openjdk perl-getopt-long bc procps-ng

RUN conda clean -y -a

#
# Install Cromwell v49
########## Install Cromwell v49
#
FROM buildbase as cromwell
FROM buildbase AS cromwell
ENV cromwell_ver=49

# RUN mkdir -p /opt/omics/bin
# ADD https://github.com/broadinstitute/cromwell/releases/download/${cromwell_ver}/cromwell-${cromwell_ver}.jar /opt/omics/bin/
# RUN ln -sf cromwell-${cromwell_ver}.jar cromwell.jar
RUN \
mkdir -p /opt/omics/bin && \
cd /opt/omics/bin && \
wget -q https://github.com/broadinstitute/cromwell/releases/download/49/cromwell-49.jar && \
ln -sf cromwell-49.jar cromwell.jar
wget -q https://github.com/broadinstitute/cromwell/releases/download/${cromwell_ver}/cromwell-${cromwell_ver}.jar && \
ln -sf cromwell-${cromwell_ver}.jar cromwell.jar

FROM buildbase

ENV PERL5LIB '/opt/omics/lib'
ENV PERL5LIB='/opt/omics/lib'
COPY --from=conda /miniconda3 /miniconda3

# conda shell.posix activate
ENV PATH '/miniconda3/bin:/miniconda3/condabin:/opt/omics/bin:/opt/omics/bin/functional_annotation:/opt/omics/bin/qc/post-annotation:/opt/omics/bin/qc/pre-annotation:/opt/omics/bin/structural_annotation:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
ENV CONDA_PREFIX '/miniconda3'
ENV CONDA_EXE '/miniconda3/bin/conda'
ENV _CE_M ''
ENV _CE_CONDA ''
ENV CONDA_PYTHON_EXE '/miniconda3/bin/python'
ENV PATH='/miniconda3/bin:/miniconda3/condabin:/opt/omics/bin:/opt/omics/bin/functional_annotation:/opt/omics/bin/qc/post-annotation:/opt/omics/bin/qc/pre-annotation:/opt/omics/bin/structural_annotation:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
ENV CONDA_PREFIX='/miniconda3'
ENV CONDA_EXE='/miniconda3/bin/conda'
ENV _CE_M=''
ENV _CE_CONDA=''
ENV CONDA_PYTHON_EXE='/miniconda3/bin/python'

COPY --from=cromwell /opt/omics/bin/ /opt/omics/bin/

COPY --from=prodigal /opt/prodigal /opt/omics/programs/prodigal

COPY --from=trnascan /opt/omics/programs/tRNAscan-SE /opt/omics/programs/tRNAscan-SE
#COPY --from=trnascan /usr/local/lib /opt/omics/programs/tRNAscan-SE/tRNAscan-SE-2.0.12/lib/
#COPY --from=trnascan /usr/local/lib /opt/omics/programs/tRNAscan-SE/tRNAscan-SE-${trnascan_ver}/lib/

COPY --from=hmm /opt/omics/programs/hmmer/ /opt/omics/programs/hmmer
COPY --from=last /opt/omics/programs/last/ /opt/omics/programs/last

COPY --from=last /opt/omics/programs/last /opt/omics/programs/last
COPY --from=img /opt/CRT-CLI.jar /opt/omics/programs/CRT/CRT-CLI.jar
COPY --from=img /opt/split.py /opt/omics/bin/split.py
#COPY --from=img /opt/omics/programs/tmhmm-2.0c /opt/omics/programs/tmhmm-2.0c

COPY --from=infernal /opt/omics/programs/infernal /opt/omics/programs/infernal/
COPY --from=img /opt/img-annotation-pipeline/bin/ /opt/omics/bin/
COPY --from=img /opt/gms2_linux_64 /opt/omics/programs/gms2_linux_64
COPY --from=img /opt/img-annotation-pipeline/VERSION /opt/omics/VERSION
RUN \
mkdir /opt/omics/lib && cd /opt/omics/lib && \
ln -s ../programs/tRNAscan-SE/tRNAscan-SE-2.0.12/lib/tRNAscan-SE/* .
ln -s ../programs/tRNAscan-SE/lib/tRNAscan-SE/* .

#link things to the bin directory

RUN \
cd /opt/omics/bin &&\
ln -s ../programs/gms2_linux_64/gms2.pl &&\
ln -s ../programs/gms2_linux_64/gmhmmp2 &&\
ln -s ../programs/infernal/infernal-1.1.3/bin/cmsearch && \
ln -s ../programs/tRNAscan-SE/tRNAscan-SE-2.0.12/bin/tRNAscan-SE && \
ln -s ../programs/infernal/bin/cmsearch && \
ln -s ../programs/tRNAscan-SE/bin/tRNAscan-SE && \
ln -s ../programs/last/bin/lastal && \
ln -s ../programs/CRT/CRT-CLI.jar CRT-CLI.jar && \
ln -s ../programs/prodigal &&\
@@ -198,9 +261,7 @@ RUN \
#make sure tRNAscan can see cmsearch and cmscan

RUN \
cd /opt/omics/programs/tRNAscan-SE/tRNAscan-SE-2.0.12/bin/ &&\
ln -s /opt/omics/programs/infernal/infernal-1.1.3/bin/cmsearch && \
ln -s /opt/omics/programs/infernal/infernal-1.1.3/bin/cmscan

#COPY --from=img /opt/omics /opt/omics3/
cd /opt/omics/programs/tRNAscan-SE/bin/ &&\
ln -s /opt/omics/programs/infernal/bin/cmsearch && \
ln -s /opt/omics/programs/infernal/bin/cmscan
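
Because this diff moves Infernal and tRNAscan-SE from versioned install prefixes to unversioned ones, every `ln -s` target in the final stage has to change with them. A minimal sketch of a check for leftover dangling links, assuming a hypothetical function `broken_links` that is not part of this PR:

```shell
#!/bin/sh
# Hypothetical check (not in the PR): print every symlink directly under a
# directory whose target does not exist. Returns 1 if any link is broken.
broken_links() {
    dir="$1"
    status=0
    for link in "$dir"/*; do
        # -L is true for a symlink; -e follows the link, so it is false
        # when the target is missing.
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            printf '%s\n' "$link"
            status=1
        fi
    done
    return "$status"
}

# Example (assumption: run inside the built image):
#   broken_links /opt/omics/bin
#   broken_links /opt/omics/programs/tRNAscan-SE/bin
```

Run inside the built image, a check like this would flag any link that still points at an old path such as `infernal-1.1.3/bin/cmsearch`.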

8 changes: 5 additions & 3 deletions README.md
@@ -32,12 +32,13 @@ A JSON file containing the following:
#### Third party software used (+ their licenses)
- Conda (3-clause BSD)
- tRNAscan-SE >= 2.0.12 (GNU GPL v3)
- Infernal 1.1.3 (BSD)
- Infernal 1.1.4 (BSD)
- CRT-CLI 1.8.4 (Public domain software, last official version is 1.2)
- Prodigal 2.6.3_patched (GNU GPL v3)
- GeneMarkS-2 >= 1.25 ([Academic license for GeneMark family software](http://topaz.gatech.edu/GeneMark/license_download.cgi))
- Last >= 1456 (GNU GPL v3)
- HMMER 3.1b2 (3-clause BSD, [thread optimized hmmsearch](https://github.com/Larofeticus/hpc_hmmsearch))
- Last >= 1584 (GNU GPL v3)
- HMMER 3.3.2 (3-clause BSD, [thread optimized hmmsearch](https://github.com/Larofeticus/hpc_hmmsearch))
- GeNomad 1.8.1 (GNU GPL v3, pulled from [IMG Annotation Pipeline repo](https://code.jgi.doe.gov/img/img-pipelines/img-annotation-pipeline))


#### Databases used (+ their licenses):
@@ -49,3 +50,4 @@ A JSON file containing the following:
- SUPERFAMILY (permissive/custom); [more info](http://reusabledata.org/supfam)
- Pfam (public domain/ CC0 1.0); [more info](http://reusabledata.org/pfam)
- Cath-FunFam (permissive/CC BY 4.0); [more info](http://reusabledata.org/cath)
- GeNomad DB v1.7 (permissive/CC BY 4.0); [more info](https://zenodo.org/records/10594875)
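
The README version list and the Dockerfile `ENV` values are maintained by hand, so they can drift apart. A minimal sketch of a check that compares a tool's reported version with the documented one, assuming a hypothetical helper `has_version` (not part of this PR) and that each tool prints its version in its help or version output:

```shell
#!/bin/sh
# Hypothetical helper (not in the PR): run a command and succeed only if its
# output contains the expected version string.
has_version() {
    expected="$1"
    shift
    # Capture stdout and stderr together, since some tools print their
    # banner on stderr. A non-zero exit from the tool is ignored here.
    output=$("$@" 2>&1) || true
    case "$output" in
        *"$expected"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Examples (assumptions about each tool's banner, to be run in the image):
#   has_version "3.3.2" hmmsearch -h     || echo "HMMER version mismatch"
#   has_version "1.1.4" cmsearch -h      || echo "Infernal version mismatch"
#   has_version "1584"  lastal --version || echo "LAST version mismatch"
```

This is a substring match, so it is only a coarse guard: a short expected string like `1584` would also match unrelated text that happens to contain it.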