Merge branch 'main' into feature/update-loadable-refs-to-read_group
mcrusch authored Nov 1, 2024
2 parents 75d6a1f + 181f36a commit e969838
Showing 60 changed files with 740 additions and 227 deletions.
14 changes: 12 additions & 2 deletions .github/workflows/docker_branches.yml
@@ -19,7 +19,9 @@ jobs:
id: meta
uses: docker/metadata-action@v3
with:
images: ${{ env.REPO_LOWER }}
images: |
${{ env.REPO_LOWER }}
ghcr.io/${{ env.REPO_LOWER }}
tags: |
type=ref,event=branch,prefix=branch-
type=ref,event=pr
@@ -35,8 +37,16 @@ jobs:
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
-
name: Login to GitHub Container Registry
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ secrets.GH_USERNAME }}
password: ${{ secrets.GH_TOKEN }}
- name: Push to Docker Hub
uses: docker/build-push-action@v4
with:
push: true
tags: ${{ steps.meta.outputs.tags }}
tags: |
${{ steps.meta.outputs.tags }}
48 changes: 24 additions & 24 deletions Dockerfile
@@ -13,23 +13,24 @@ RUN apt-get update \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*

RUN pip3 install --user --ignore-installed \
RUN pip3 install --ignore-installed \
--prefix /usr/local \
cwlref-runner \
html5lib

RUN cd /tmp \
&& wget https://github.com/lh3/bwa/releases/download/v0.7.13/bwa-0.7.13.tar.bz2 \
&& echo "559b3c63266e5d5351f7665268263dbb9592f3c1c4569e7a4a75a15f17f0aedc *bwa-0.7.13.tar.bz2" | sha256sum --check \
&& tar xf bwa-0.7.13.tar.bz2 \
&& cd bwa-0.7.13 \
&& wget https://github.com/lh3/bwa/releases/download/v0.7.17/bwa-0.7.17.tar.bz2 \
&& echo "de1b4d4e745c0b7fc3e107b5155a51ac063011d33a5d82696331ecf4bed8d0fd *bwa-0.7.17.tar.bz2" | sha256sum --check \
&& tar xf bwa-0.7.17.tar.bz2 \
&& cd bwa-0.7.17 \
&& make -j$(nproc) \
&& mv bwa /usr/local/bin

RUN cd /tmp \
&& wget https://github.com/alexdobin/STAR/archive/2.7.1a.tar.gz \
&& echo "9a35bf4e8a12bec505e11132bc53f94671f596584a6a0dd8f237120dd0df740e *2.7.1a.tar.gz" | sha256sum --check \
&& tar xf 2.7.1a.tar.gz \
&& mv STAR-2.7.1a/bin/Linux_x86_64_static/STAR /usr/local/bin
&& wget https://github.com/alexdobin/STAR/archive/refs/tags/2.7.10a.tar.gz \
&& echo "af0df8fdc0e7a539b3ec6665dce9ac55c33598dfbc74d24df9dae7a309b0426a *2.7.10a.tar.gz" | sha256sum --check \
&& tar xf 2.7.10a.tar.gz \
&& mv STAR-2.7.10a/bin/Linux_x86_64_static/STAR /usr/local/bin

# bz2 and lzma support is for CRAM files. curses is for `samtools tview`.
RUN cd /tmp \
@@ -68,11 +69,13 @@ ENV PATH /opt/gradle/bin:${PATH}
COPY bin /tmp/xenocp/bin
COPY src /tmp/xenocp/src
COPY dependencies /tmp/xenocp/dependencies
COPY gradle /tmp/xenocp/gradle
COPY gradlew /tmp/xenocp/gradlew
COPY build.gradle /tmp/xenocp/build.gradle
COPY settings.gradle /tmp/xenocp/settings.gradle

RUN cd /tmp/xenocp \
&& gradle installDist \
&& ./gradlew installDist \
&& cp -r build/install/xenocp /opt

FROM ubuntu:20.04
@@ -88,17 +91,14 @@ RUN apt-get update \
file \
&& rm -rf /var/lib/apt/lists/*

ENV PATH /root/.local/bin:$PATH

COPY --from=builder /root/.local /root/.local
COPY --from=builder /usr/local/bin/bwa /usr/local/bin/bwa
COPY --from=builder /usr/local/bin/STAR /usr/local/bin/STAR
COPY --from=builder /usr/local/bin/samtools /usr/local/bin/samtools
COPY --from=builder /usr/local/bin/sambamba /usr/local/bin/sambamba
COPY --from=builder /opt/picard /opt/picard
COPY --from=builder /opt/xenocp /opt/xenocp
COPY --from=builder /opt/xenocp/bin/* /usr/local/bin/

COPY cwl /opt/xenocp/cwl

ENTRYPOINT ["cwl-runner", "--parallel", "--outdir", "results", "--no-container", "/opt/xenocp/cwl/xenocp.cwl"]
COPY --chmod=755 --from=builder /usr/local/bin/cwl* /usr/local/bin/
COPY --chmod=755 --from=builder /usr/local/lib /usr/local/lib/
COPY --chmod=755 --from=builder /usr/local/bin/bwa /usr/local/bin/bwa
COPY --chmod=755 --from=builder /usr/local/bin/STAR /usr/local/bin/STAR
COPY --chmod=755 --from=builder /usr/local/bin/samtools /usr/local/bin/samtools
COPY --chmod=755 --from=builder /usr/local/bin/sambamba /usr/local/bin/sambamba
COPY --chmod=755 --from=builder /opt/picard /opt/picard
COPY --chmod=755 --from=builder /opt/xenocp /opt/xenocp
COPY --chmod=755 --from=builder /opt/xenocp/bin/* /usr/local/bin/

COPY --chmod=755 cwl /opt/xenocp/cwl
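The bwa and STAR build steps above pin each download to a SHA-256 digest by piping a `<digest> *<file>` line into `sha256sum --check`, so a corrupted or tampered archive fails the build. The following is a minimal sketch of the same check using only the Python standard library. `parse_check_line` and `verify_sha256` are hypothetical helper names and are not part of XenoCP.

```python
import hashlib
from pathlib import Path


def parse_check_line(line: str) -> tuple[str, str]:
    """Split a sha256sum check line ("<hex digest> *<filename>") into its parts.

    A leading '*' on the filename marks binary mode; a plain "<hex>  <file>"
    (text mode) line is accepted as well.
    """
    digest, name = line.strip().split(maxsplit=1)
    if name.startswith("*"):
        name = name[1:]
    return digest.lower(), name


def verify_sha256(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Hash the file in chunks so large archives are never fully loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex.lower()
```

To check an archive, pass the check line to `parse_check_line` and hand the resulting digest and file name to `verify_sha256`. The function returns `False` on a mismatch. `sha256sum --check` would report `FAILED` in that case and exit non-zero.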
176 changes: 134 additions & 42 deletions README.md
@@ -1,11 +1,36 @@
# XenoCP

- [XenoCP](#xenocp)
- [Quick Start](#quick-start)
- [Introduction to XenoCP](#introduction-to-xenocp)
- [Reference Files](#reference-files)
- [BWA for DNA Reads](#bwa-for-dna-reads)
- [STAR for RNA Reads](#star-for-rna-reads)
- [Local Usage without Docker](#local-usage-without-docker)
- [Prerequisites](#prerequisites)
- [Obtain and Build XenoCP](#obtain-and-build-xenocp)
- [Inputs](#inputs)
- [Run](#run)
- [Local Usage with Docker](#local-usage-with-docker)
- [Build Docker image](#build-docker-image)
- [Run](#run-1)
- [Singularity as a Docker alternative](#singularity-as-a-docker-alternative)
- [WDL workflow](#wdl-workflow)
- [WDL reference files](#wdl-reference-files)
- [Running WDL](#running-wdl)
- [Evaluate test data results](#evaluate-test-data-results)
- [St. Jude Cloud](#st-jude-cloud)
- [Availability](#availability)
- [Seeking help](#seeking-help)
- [Citing XenoCP](#citing-xenocp)
- [Common Issues](#common-issues)

XenoCP is a tool for cleansing mouse reads in xenograft BAMs.
XenoCP can be easily incorporated into any workflow, as it takes a BAM file
as input and efficiently cleans up the mouse contamination. The output is a clean
human BAM file that could be used for downstream genomic analysis.

## Getting started
## Quick Start

XenoCP can be run in the cloud on DNAnexus at
https://platform.dnanexus.com/app/stjude_xenocp
@@ -39,7 +64,38 @@ XenoCP workflow:
<!--![Alt text](images/xenocp_workflow2.png) -->
<img src="images/xenocp_workflow2.png" width="500">

## Prerequisites
## Reference Files

XenoCP performs mapping against the host genome, so it requires an index of the
host reference genome built for the mapper being used.

A common use case is cleansing DNA reads with a mouse host. For this use case,
you can download a BWA index for MGSCv37 from
http://ftp.stjude.org/pub/software/xenocp/reference/MGSCv37

To build your own reference files, first download the FASTA file for your genome
assembly. Then, create the index for your mapper:

### BWA for DNA Reads

```
$ bwa index -p $FASTA $FASTA
```

### STAR for RNA Reads

Download an annotation file in GTF format (e.g., from GENCODE), and then run:

```
$ STAR --runMode genomeGenerate --genomeDir STAR --genomeFastaFiles $FASTA --sjdbGTFfile $ANNOTATION --sjdbOverhang 125
```

## Local Usage without Docker

### Prerequisites

First, install the following prerequisites. Note that if you are only using one
of the two mappers, bwa and STAR, you can omit the other.

* [bwa] =0.7.13
* [STAR] =2.7.1a
@@ -73,28 +129,25 @@ disabled.
[zlib]: https://www.zlib.net/
[sambamba]: http://lomereiter.github.io/sambamba/

### Obtain and Build XenoCP


## Local usage


### Obtain XenoCP

Clone XenoCP from GitHub:
Clone XenoCP from GitHub:
```
git clone https://github.com/stjude/XenoCP.git
```

### Build XenoCP

Once the prerequisites are satisfied, build XenoCP using Gradle.
Build XenoCP using Gradle:

```
$ gradle installDist
```

Add the artifacts under `build/install/xenocp/lib` to your Java `CLASSPATH`.
Add the artifacts under `build/install/xenocp/bin` to your `PATH`.
Add `build/install/xenocp/bin` to your `PATH` and the JARs under `build/install/xenocp/lib` to your Java `CLASSPATH`:

```
export PATH="$PATH:$(pwd)/build/install/xenocp/bin"
export CLASSPATH="$CLASSPATH:$(pwd)/build/install/xenocp/lib/*"
```

### Inputs

@@ -113,8 +166,8 @@ aligner: "bwa aln"
For example, for bwa alignment, a prefix of `MGSCv37.fa` assumes that
the following files exist in the same directory:
`MGSCv37.fa.amb`, `MGSCv37.fa.ann`, `MGSCv37.fa.bwt`,
`MGSCv37.fa.pac`, and `MGSCv37.fa.sa`.
For STAR alignment, `ref_db_prefix` should be a directory and
`MGSCv37.fa.pac`, and `MGSCv37.fa.sa`. `index` should be the path to that folder.
For STAR alignment, `index` should be a directory and
it would assume the following files exist in the directory:
`chrLength.txt`, `chrNameLength.txt`, `chrName.txt`, `chrStart.txt`,
`exonGeTrInfo.tab`, `exonInfo.tab`, `geneInfo.tab`, `Genome`,
@@ -134,25 +187,8 @@ output_prefix: xenocp-
output_extension: bam
```
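For bwa, the `index` directory and `ref_db_prefix` together must resolve to the five index files listed above. The following is a minimal sketch that reports which of those files are missing before a run is launched. It assumes that the files sit directly inside the `index` directory and are named `<ref_db_prefix><suffix>`. `missing_bwa_index_files` is a hypothetical helper and is not part of XenoCP.

```python
from pathlib import Path

# Suffixes produced by `bwa index -p <prefix>`, matching the file list above.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(index_dir: str, ref_db_prefix: str) -> list[str]:
    """Return the expected BWA index files that do not exist.

    Assumes the files live directly in `index_dir`, e.g. MGSCv37.fa.amb.
    """
    base = Path(index_dir)
    expected = [base / f"{ref_db_prefix}{suffix}" for suffix in BWA_INDEX_SUFFIXES]
    return [str(p) for p in expected if not p.is_file()]
```

For example, `missing_bwa_index_files("/path/to/reference", "MGSCv37.fa")` returns an empty list when all five files are present.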

### Create Reference Files

Download the FASTA file for your genome assembly and run the following commands to create other files:
#### BWA reference files
```
$ bwa index -p $FASTA $FASTA
```
#### STAR reference files
In addition the genomic FASTA, STAR reference should use an annotation file (e.g. gencode).
```
$ STAR --runMode genomeGenerate --genomeDir STAR --genomeFastaFiles $FASTA --sjdbGTFfile $ANNOTATION --sjdbOverhang 125
```

[CWL inputs]: https://www.commonwl.org/user_guide/02-1st-example/index.html

### Download MGSCv37 reference files

Reference files are provided for version MGSCv37 of mouse and are available from http://ftp.stjude.org/pub/software/xenocp/reference/MGSCv37

### Run

XenoCP uses [CWL] to describe its workflow.
@@ -162,12 +198,12 @@ Then run the following.

```
$ mkdir results
$ cwltool --outdir results cwl/xenocp.cwl sample_data/input_data/inputs_local.yml
$ cwltool --preserve-environment CLASSPATH --no-container --outdir results cwl/xenocp.cwl sample_data/input_data/inputs_local.yml
```

[CWL]: https://www.commonwl.org/

## Docker
## Local Usage with Docker

XenoCP provides a [Dockerfile] that builds an image with all the included
dependencies. To use this image, install [Docker] for your platform.
@@ -184,10 +220,10 @@ $ docker build --tag xenocp .

### Run

The Docker image uses `cwl-runner cwl/xenocp.cwl` as its entrypoint.
The Docker image does not provide an entrypoint.

The image assumes three working directories: `/data` for inputs, `/references` for
reference files, and `/results` for outputs. `/data` and `/references` can be
The image assumes three working directories: `/data` for inputs, `/reference` for
reference files, and `/results` for outputs. `/data` and `/reference` can be
read-only, whereas `/results` needs write access.

The paths given in the input parameters file must be from inside the
@@ -197,13 +233,16 @@ container, not the host, e.g.,
bam:
class: File
path: /data/sample.bam
ref_db_prefix: /reference/ref.fa
ref_db_prefix: ref.fa
index:
class: Directory
path: /reference
aligner: "bwa aln"
```
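Paths in the inputs file are resolved inside the container, so each host path has to be rewritten against the bind mounts passed to `docker run`. The following is a minimal sketch of that translation. It assumes the three mounts used in the example command (`/data`, `/reference`, `/results`). `MOUNTS` and `to_container_path` are hypothetical names, and the host directories are placeholders.

```python
from pathlib import PurePosixPath

# Hypothetical host -> container bind mounts mirroring the --mount flags of the
# example `docker run` command. Replace the host paths with your own.
MOUNTS = {
    "/home/user/XenoCP/sample_data/input_data": "/data",
    "/path/to/reference": "/reference",
    "/home/user/XenoCP/results": "/results",
}


def to_container_path(host_path: str, mounts: dict[str, str] = MOUNTS) -> str:
    """Translate a host path into the path the container sees.

    The longest matching mount source wins, so nested mounts resolve correctly.
    Raises ValueError if the path is not under any mounted directory.
    """
    p = PurePosixPath(host_path)
    for src in sorted(mounts, key=len, reverse=True):
        try:
            rel = p.relative_to(src)  # component-wise, so /a/b2 is not under /a/b
        except ValueError:
            continue
        return str(PurePosixPath(mounts[src]) / rel)
    raise ValueError(f"{host_path} is not inside any mounted directory")
```

With these mounts, `to_container_path("/path/to/reference")` returns `/reference`, which is the value to use for `index.path`.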

The following is an example `run` command where files are stored in `test/{data,reference}`. Outputs are saved in `test/results`.
The following is an example `run` command where the data files are stored in the current directory under `sample_data/input_data`. Outputs are saved in `results` in the current directory. The path to the reference files on the host machine needs to be provided.

This example assumes you are running against Mus musculus (genome build MGSCv37). Set the path to the folder containing your reference data
This example assumes you are running against *Mus musculus* (genome build MGSCv37). Set the path to the folder containing your reference data
and run the following command to produce output from the included sample data. Test output for comparison is located at `sample_data/output_data`.

```
@@ -212,12 +251,65 @@ $ docker run \
--mount type=bind,source=$(pwd)/sample_data/input_data,target=/data,readonly \
--mount type=bind,source=/path/to/reference,target=/reference,readonly \
--mount type=bind,source=$(pwd)/results,target=/results \
xenocp \
ghcr.io/stjude/xenocp:latest \
cwl-runner \
--parallel \
--outdir results \
--no-container \
/opt/xenocp/cwl/xenocp.cwl \
/data/inputs.yml
```

### Singularity as a Docker alternative

Singularity is an HPC-friendly container runtime that can serve as an alternative to Docker. Running XenoCP under Singularity is experimental: `singularity` is not a drop-in replacement for Docker, and many applications require modification to run fully under it. This alternative is provided on a best-effort basis. If you encounter problems, please open an issue on this repository with details, and the maintainers will try to help where possible.

```
$ mkdir $(pwd)/results
# --containall isolates the container from the host.
# -W sets a working directory with sufficient free space.
$ singularity run \
--containall \
-W /path/to/directory \
-B $(pwd)/sample_data/input_data:/data \
-B /path/to/reference:/reference \
-B $(pwd)/results:/results \
docker://ghcr.io/stjude/xenocp:latest \
cwl-runner \
--parallel \
--outdir results \
--no-container \
/opt/xenocp/cwl/xenocp.cwl \
/data/inputs.yml
```

Note: when running Singularity on an HPC system, problems can arise if the
default temporary file location, `/tmp`, is small. To avoid this, pass
`-W <dir>` to Singularity to redirect temporary files to a
larger directory `<dir>`.

Note: by default, `singularity` makes many host resources (for example, the user's home directory and environment variables) available inside the container, in contrast to Docker's default isolation. This tends to cause conflicts and errors when running Docker-based workflows, so we recommend always passing the `--containall` option to Singularity.

[Dockerfile]: ./Dockerfile

## WDL workflow

XenoCP includes a [WDL](https://github.com/openwdl/wdl) workflow implementation. This can be run locally or on a supported HPC system. It can also use Docker or Singularity for containerization.

### WDL reference files

As of v1.2, WDL does not support directory inputs. Therefore the reference files provided to the WDL workflow must be compressed (`.tar.gz`) before running. The compressed reference files can be downloaded from [Zenodo](https://zenodo.org/uploads/10162103).
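Packaging an existing reference directory as a `.tar.gz` archive takes one call to Python's standard `tarfile` module. The following is a minimal sketch. The archive layout the workflow expects is an assumption here, specifically whether files sit at the archive root or under a top-level folder. Compare against the Zenodo archives before relying on it. `pack_reference_dir` is a hypothetical helper and is not part of XenoCP.

```python
import tarfile
from pathlib import Path


def pack_reference_dir(ref_dir: str, out_path: str, top_level: bool = True) -> Path:
    """Compress the contents of `ref_dir` into a gzip-compressed tarball.

    With top_level=True the files are stored under a folder named after
    `ref_dir` (e.g. MGSCv37/MGSCv37.fa.bwt); otherwise at the archive root.
    Write `out_path` outside `ref_dir` so the archive does not include itself.
    """
    src = Path(ref_dir)
    out = Path(out_path)
    with tarfile.open(out, "w:gz") as tar:
        if top_level:
            tar.add(src, arcname=src.name)  # adds the directory recursively
        else:
            for child in sorted(src.iterdir()):
                tar.add(child, arcname=child.name)
    return out
```

For a hypothetical reference folder named `MGSCv37`, the shell equivalent of the default (`top_level=True`) layout is `tar -czf MGSCv37_bwa.tar.gz MGSCv37`.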

### Running WDL

To run the WDL workflow, you will need a WDL engine. We suggest [miniwdl](https://github.com/chanzuckerberg/miniwdl). The [Cromwell](https://github.com/broadinstitute/cromwell/) engine should also work, but it is untested with XenoCP.

After acquiring the reference files for your chosen aligner, you can run the sample data through the WDL workflow with the following command.

```
miniwdl run https://raw.githubusercontent.com/stjude/XenoCP/main/wdl/workflows/xenocp.wdl input_bam=https://github.com/stjude/XenoCP/raw/main/sample_data/input_data/SJRB001_X.subset.bam input_bai=https://github.com/stjude/XenoCP/raw/main/sample_data/input_data/SJRB001_X.subset.bam.bai reference_tar_gz=MGSCv37_bwa.tar.gz aligner='bwa aln'
```

This runs all of the steps on the local machine with Docker. The WDL runner `miniwdl` supports alternative execution modes, such as the [Singularity](https://miniwdl.readthedocs.io/en/latest/runner_backends.html#singularity-beta) container engine and the [Slurm](https://github.com/miniwdl-ext/miniwdl-slurm) and [LSF](https://github.com/adthrasher/miniwdl-lsf) batch systems. Alternative execution modes can be specified using `miniwdl`'s [configuration system](https://miniwdl.readthedocs.io/en/latest/runner_reference.html#configuration).

## Evaluate test data results

If you have [bcftools] and a [GRCh37-lite] reference file, the following will show two variants in the input file.
1 change: 1 addition & 0 deletions RELEASE.md
@@ -3,3 +3,4 @@
* [ ] Update version in `dx_app/dxapp.json`.
* [ ] Update `wdl/tools/xenocp.wdl` with version.
* [ ] Update `wdl/workflows/xenocp.wdl` with version.
* [ ] Update `build.gradle` with version.
5 changes: 5 additions & 0 deletions bin/java.sh
@@ -1,5 +1,10 @@
#!/usr/bin/env bash

# If the classpath is already set, then delegate directly to java
if [ "$CLASSPATH" != "" ]; then exec java "$@"; fi

# Otherwise, build an appropriate classpath
# This section assumes you are running inside the container
for arg in "$@"; do
case $arg in
org.stjude.compbio.*)