diff --git a/content/home/src/_static/images/howto_guides/workflows/quickStart/image16.png b/content/home/src/_static/images/howto_guides/workflows/quickStart/image16.png new file mode 100644 index 0000000..fef1809 Binary files /dev/null and b/content/home/src/_static/images/howto_guides/workflows/quickStart/image16.png differ diff --git a/content/home/src/_static/images/howto_guides/workflows/quickStart/image17.png b/content/home/src/_static/images/howto_guides/workflows/quickStart/image17.png new file mode 100644 index 0000000..f04f487 Binary files /dev/null and b/content/home/src/_static/images/howto_guides/workflows/quickStart/image17.png differ diff --git a/content/home/src/_static/images/howto_guides/workflows/quickStart/image18.png b/content/home/src/_static/images/howto_guides/workflows/quickStart/image18.png new file mode 100644 index 0000000..f004956 Binary files /dev/null and b/content/home/src/_static/images/howto_guides/workflows/quickStart/image18.png differ diff --git a/content/home/src/_static/images/howto_guides/workflows/quickStart/image19.png b/content/home/src/_static/images/howto_guides/workflows/quickStart/image19.png new file mode 100644 index 0000000..43b0337 Binary files /dev/null and b/content/home/src/_static/images/howto_guides/workflows/quickStart/image19.png differ diff --git a/content/home/src/_static/images/howto_guides/workflows/quickStart/image19_bck.png b/content/home/src/_static/images/howto_guides/workflows/quickStart/image19_bck.png new file mode 100644 index 0000000..78ba815 Binary files /dev/null and b/content/home/src/_static/images/howto_guides/workflows/quickStart/image19_bck.png differ diff --git a/content/home/src/_static/images/howto_guides/workflows/quickStart/image20.png b/content/home/src/_static/images/howto_guides/workflows/quickStart/image20.png new file mode 100644 index 0000000..f371cbf Binary files /dev/null and b/content/home/src/_static/images/howto_guides/workflows/quickStart/image20.png differ diff --git 
a/content/home/src/_static/images/howto_guides/workflows/quickStart/image20_bck.png b/content/home/src/_static/images/howto_guides/workflows/quickStart/image20_bck.png new file mode 100644 index 0000000..55a2290 Binary files /dev/null and b/content/home/src/_static/images/howto_guides/workflows/quickStart/image20_bck.png differ diff --git a/content/home/src/_static/images/howto_guides/workflows/quickStart/image21.png b/content/home/src/_static/images/howto_guides/workflows/quickStart/image21.png new file mode 100644 index 0000000..70ab459 Binary files /dev/null and b/content/home/src/_static/images/howto_guides/workflows/quickStart/image21.png differ diff --git a/content/home/src/_static/images/howto_guides/workflows/quickStart/image22.png b/content/home/src/_static/images/howto_guides/workflows/quickStart/image22.png new file mode 100644 index 0000000..1e42b57 Binary files /dev/null and b/content/home/src/_static/images/howto_guides/workflows/quickStart/image22.png differ diff --git a/content/home/src/_static/images/howto_guides/workflows/quickStart/image22_bck.png b/content/home/src/_static/images/howto_guides/workflows/quickStart/image22_bck.png new file mode 100644 index 0000000..dae135a Binary files /dev/null and b/content/home/src/_static/images/howto_guides/workflows/quickStart/image22_bck.png differ diff --git a/content/home/src/_static/images/howto_guides/workflows/quickStart/image23.png b/content/home/src/_static/images/howto_guides/workflows/quickStart/image23.png new file mode 100644 index 0000000..c9a87b9 Binary files /dev/null and b/content/home/src/_static/images/howto_guides/workflows/quickStart/image23.png differ diff --git a/content/home/src/_static/images/howto_guides/workflows/quickStart/metag_test_results_overview.png b/content/home/src/_static/images/howto_guides/workflows/quickStart/metag_test_results_overview.png new file mode 100644 index 0000000..b1ddd77 Binary files /dev/null and 
b/content/home/src/_static/images/howto_guides/workflows/quickStart/metag_test_results_overview.png differ diff --git a/content/home/src/_static/images/howto_guides/workflows/quickStart/retrieve_sra.png b/content/home/src/_static/images/howto_guides/workflows/quickStart/retrieve_sra.png new file mode 100644 index 0000000..e9d4118 Binary files /dev/null and b/content/home/src/_static/images/howto_guides/workflows/quickStart/retrieve_sra.png differ diff --git a/content/home/src/howto_guides/run_workflows.md b/content/home/src/howto_guides/run_workflows.md index aba7a09..948afe5 100644 --- a/content/home/src/howto_guides/run_workflows.md +++ b/content/home/src/howto_guides/run_workflows.md @@ -1,87 +1,89 @@ # Running the Workflows +![](../_static/images/howto_guides/workflows/quickStart/image1.png) ## NMDC EDGE Quick Start User Guide -![](../_static/images/howto_guides/workflows/quickStart/image1.png) -### Register for an account +### Register for an account + +1. Visit the homepage for the NMDC EDGE platform using the link below.\ +https://nmdc-edge.org/home -Users must register for an account within the NMDC EDGE platform or login using the user's ORCiD account. +2. Click on "ORCiD LOGIN" to log in to your account on the NMDC EDGE platform. -![](../_static/images/howto_guides/workflows/quickStart/image2.png) + ![](../_static/images/howto_guides/workflows/quickStart/image16.png) -![](../_static/images/howto_guides/workflows/quickStart/image3.png) +3. Log in using your ORCiD and ORCiD password. If you do not have an ORCiD, click on "Register Now" and follow the instructions to set up an ORCiD account. -### User Profile + ![](../_static/images/howto_guides/workflows/quickStart/image17.png) -Once logged in, the green button with the user's initials on the right provides a drop-down menu which allows the user to manage their projects and uploads; there is also a button which allows users to edit their profile. 
On this profile page, there are two options: 1) the option to receive email notification of a project's status (OFF by default) and 2) the option to change the user's password (also OFF by default). +4. If you are logging in for the first time, click on "My Profile" and optionally provide your First Name, Last Name, and Email. You can also set the "Project Status Notification" to ON (OFF by default). If ON, notifications about your workflow runs will be sent to the Email you provided. Click on "Save Changes". + + ![](../_static/images/howto_guides/workflows/quickStart/image18.png) -![](../_static/images/howto_guides/workflows/quickStart/image4.png) ### Upload data -Two options are available for users to upload their own data to process through the workflows. The first is using the button in the left menu bar. The second is through the drop-down menu shown when clicking the green button with the user's initials on the right. Either button will open a window which allows the user to drag and drop files or browse for the user's data files. (There are also some datasets in the Public Data folder for users to test the platform.) +You can upload your own data to process through the workflows. Click on "Upload Files" in the left menu bar. This will open a window which allows you to drag and drop files or browse for your data files. If you do not have a dataset to test, you can download this [**test data**](https://portal.nersc.gov/cfs/m3408/test_data/SRR7877884/SRR7877884-int-0.1.fastq.gz) and upload it to the NMDC EDGE platform. -![](../_static/images/howto_guides/workflows/quickStart/image5.png) +Additionally, there are some datasets in the Public Data folder for you to test within the NMDC EDGE platform. 
-### Running a single workflow +![](../_static/images/howto_guides/workflows/quickStart/image19.png) -To run a workflow, the user must provide: +Alternatively, you can select "Retrieve SRA Data" in the left menu bar and input an NCBI SRA accession number to pull data directly from SRA. +![](../_static/images/howto_guides/workflows/quickStart/retrieve_sra.png) -1. A unique Project/Run Name with no spaces (underscores are fine). +### Running a single metagenomics workflow -2. A description is optional, but recommended. +Click on "Metagenomics" then select the "Run a Single Workflow" option. -3. The user then selects the workflow desired from the drop-down menu. + ![](../_static/images/howto_guides/workflows/quickStart/image20.png) -4. For metagenomic/metatranscriptomic data, the user must also select if the input data is interleaved or separate files for the paired reads. +To run a single workflow, the user must provide: -5. Then the input file(s) from the available list of files. +1. A unique Project/Run Name with no spaces (underscores are fine). -6. The user should click "Submit. +2. A description (optional, but recommended). -> ![](../_static/images/howto_guides/workflows/quickStart/image6.png) +3. The workflow desired from the drop-down menu. -Note: Clicking on the buttons to the right of the data input blanks -opens a box called "Select a file" to allow the user to find the desired files (shown in purple) from previously run -projects, the public data folder, or user uploaded files. +4. Select if the input data is interleaved (YES by default). If the data is paired select NO and it will allow you to upload both forward and reverse files. -![](../_static/images/howto_guides/workflows/quickStart/image7.png) +5. Then select the input file(s). 
Clicking on the button to the right of the "interleaved FASTQ #1" (as indicated in the image above) opens a box called "Select a file" (as indicated in the image below) to allow the user to find the desired files, either from the public data folder, or files that were uploaded by the user. + + ![](../_static/images/howto_guides/workflows/quickStart/image21.png) + +6. Click "Submit" to start a workflow run. ### Running multiple workflows -1. Another option is to select "Run Multiple Workflows" if the user - desires to run more than one of the metagenomic workflows or the - entire metagenomic pipeline. +1. Another option is to select "Run Multiple Workflows" if you + desire to run the entire metagenomic pipeline that includes multiple workflows. 2. Enter a **unique** Project/Run Name with no spaces (underscores are fine). -3. A description is optional, but recommended. +3. A description (optional, but recommended). -4. The user must also select if the input data is interleaved or - separate files for the paired reads. +4. Select if the input data is interleaved (YES by default). If the data is paired select NO and it will allow you to upload both forward and reverse files. -> ![](../_static/images/howto_guides/workflows/quickStart/image8.png) +> ![](../_static/images/howto_guides/workflows/quickStart/image22.png) -All five of the metagenomic workflows are "ON" by default, but the user -can select to turn off any workflows not desired. The pipeline uses the -output of each workflow as the input for subsequent workflows. (Note: -Some workflows require input data from prior workflows, so turning one -workflow off may result in other workflows also automatically turning -off.) Then the user can click "Submit." -![](../_static/images/howto_guides/workflows/quickStart/image9.png) +5. Then select the input file(s). 
Clicking on the button to select "interleaved FASTQ #1" opens a box called "Select a file" (as shown in the image below) to allow the user to find the desired files, either from the public data folder, or files uploaded by the user. + ![](../_static/images/howto_guides/workflows/quickStart/image21.png) + +6. Click "Submit" to start a metagenome workflow run. ### Output -1. The link for 'My Projects' opens the list of projects for that user +1. The link for "My Projects" opens the list of projects for that user 2. Links (in the purple circles) are provided to share projects, make projects public, or delete projects -3. The "Status" column shows whether the job is in the queue (gray), submitted (purple), running (yellow), has failed (red) or completed (green). If a project fails, a log will give the error messages for troubleshooting. +3. The "Status" column shows whether the job is in the queue (gray), submitted (blue), running (yellow), has failed (red) or completed (green). If a project fails, a log will give the error messages for troubleshooting. -4. Clicking on the icon to the left of a project name opens up the results page for that project. +4. Clicking on the icon in the "Result" field opens up the results page for that project. -> ![](../_static/images/howto_guides/workflows/quickStart/image10.png) +> ![](../_static/images/howto_guides/workflows/quickStart/image23.png) ### Project Summary (Results) @@ -89,26 +91,15 @@ The project summary page will show three categories. Clicking on the bar or tab 1. General contains the project run information. -2. "Workflow" Result contains the tabular/visual output. - -3. Browser/Download Outputs contains all the output files available for downloading. There may be several folders. 
- -> ![](../_static/images/howto_guides/workflows/quickStart/image11.png) - -This example shows the results of a ReadsQC workflow run which shows run time under the General tab, the workflow results of quality trimming and filtering under the ReadsQC Results tab, and the files available for download (shown in purple) under the Browser/Download Outputs tab. +2. "Workflow" Result contains the tabular/visual output for each of the workflows that were run. -![](../_static/images/howto_guides/workflows/quickStart/image12.png) +3. Download Outputs contains all the output files available for downloading. There may be several folders. +> ![](../_static/images/howto_guides/workflows/quickStart/metag_test_results_overview.png) -The full Metagenome pipeline or "Multiple Workflow" run results show -the results of each workflow under a separate tab and the associated -files available for download are in separate workflow folders under the -Browser/Download Outputs tab. +This example shows the results of a metagenome workflow run which shows run time under the General tab, the workflow results of each individual metagenome workflow, and the files available for download under the Download Outputs tab. -![](../_static/images/howto_guides/workflows/quickStart/image13.png) - - -As a second example, the next two figures show the results from the Read-based Taxonomy Classification workflow. The summary includes classified reads and the number of species identified for all of the selected taxonomy classifiers. The top ten organisms identified by each tool at three taxonomic levels is also provided. Tabs for each of the classification tools providing more in-depth results are in the Detail section. Krona plots are generated for the results at each of the three taxonomic levels for each of the tools and can also be found in the Detail section. Full results files (beyond the Top 10) and the graphics are available for download. 
+As a second example, the next two figures show the results from the Read-based Taxonomy Classification workflow. The summary includes classified reads and the number of species identified for all of the selected taxonomy classifiers. A list of the top ten organisms identified by each tool at three taxonomic levels is also provided. Tabs for each of the classification tools providing more in-depth results are in the Detail section. Krona plots are generated for the results at each of the three taxonomic levels for each of the tools and these can also be found in the Detail section. Full results files (beyond the Top 10) and the graphics are available for download in the "Download Outputs" section. ![](../_static/images/howto_guides/workflows/quickStart/image14.png) @@ -118,8 +109,6 @@ As a second example, the next two figures show the results from the Read-based T ## Metagenomics Workflows ### ReadsQC -![](../_static/images/howto_guides/workflows/readsQC/image2.png) - #### Overview This workflow performs quality control on raw Illumina reads to @@ -131,66 +120,16 @@ contaminants. Currently, this workflow is available in [GitHub](https://github.com/microbiomedata/ReadsQC) and can be run from -the command line. (CLI instructions and requirements are found -[here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/1_RQC_index.html).) +the command line. Alternatively, this workflow can be run in [NMDC EDGE](https://nmdc-edge.org/). -#### Input - -Metagenome ReadsQC requires paired-end Illumina data as an interleaved -file or as separate pairs of FASTQ files. - -- **Acceptable file formats:** .fastq, .fq, .fastq.gz, .fq.gz - -#### Details - -This workflow performs quality control on raw Illumina reads using -rqcfilter2. The workflow performs quality trimming, artifact removal, -linker trimming, adapter trimming, and spike-in removal using bbduk, and -performs human/cat/dog/mouse/microbe removal using bbmap. 
Full -documentation can be found in -[ReadtheDocs](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/1_RQC_index.html). - -#### Software Versions - -- rqcfilter2 (BBTools v38.94) - -- bbduk (BBTools v38.94) - -- bbmap (BBTools v38.94) - -#### Output - -Multiple output files are provided by the workflow; the primary files -are shown below. The full list of output files can be found in -[ReadtheDocs](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/1_RQC_index.html). - - - - - - - - - - - - - - - - - - -
-| Primary Output Files | Description |
-| --- | --- |
-| Filtered Sequencing Reads | Cleaned paired-end data in interleaved format (.fastq.gz) |
-| QC statistics (2 files) | Reads QC summary statistics (.txt) |
- #### Running the Reads QC Workflow in NMDC EDGE Select a workflow -1. From the Metagenomics category in the left menu bar, select 'Run a - Single Workflow'. +1. From the Metagenomics tab in the left menu bar, select "Run a + Single Workflow". 2. Enter a **unique** project name with no spaces (underscores are fine). @@ -203,7 +142,7 @@ Select a workflow Input -ReadsQC requires paired-end Illumina data in FASTQ format as the input; +ReadsQC requires Illumina data in FASTQ format as the input; the file can be interleaved and can be compressed. **Acceptable file formats:** .fastq, .fq, .fastq.gz, .fq.gz @@ -231,13 +170,13 @@ run time information. ![](../_static/images/howto_guides/workflows/readsQC/image5.png) -The ReadsQC Result section shows the data input and provides a variety +The ReadsQC Result section provides a variety of metrics including the number of reads and bases before and after trimming and filtering. ![](../_static/images/howto_guides/workflows/readsQC/image6.png) -The Browser/Download Output section provides output files available to +The Download Output section provides output files available to download. The clean data will be in an interleaved .fq.gz file. General QC statistics are in the filterStats.txt file. @@ -246,8 +185,6 @@ QC statistics are in the filterStats.txt file. ### Read-based Taxonomy Classification -![](../_static/images/howto_guides/workflows/readBasedTaxonomy/image2.png) - #### Overview This workflow takes in Illumina sequencing files (single-end or @@ -258,65 +195,10 @@ classification tools. Currently, this workflow is available in [GitHub](https://github.com/microbiomedata/ReadbasedAnalysis) and can be -run from the command line. (CLI instructions and requirements are found -[here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/2_ReadAnalysis_index.html).) +run from the command line. Alternatively, this workflow can be run in [NMDC EDGE](https://nmdc-edge.org/). 
-#### Input - -The Metagenome Read-based Taxonomy Classification workflow requires -Illumina data and can accept data as an interleaved file or as separate -pairs of FASTQ files. Interleaved data will be treated as single-end -reads. (It is highly recommended to input clean data from the ReadsQC -workflow.) - -- **Acceptable file formats:** .fastq, .fq, .fastq.gz, .fq.gz - -#### Details - -To create a community profile, this workflow utilizes three taxonomy -classification tools: GOTTCHA2, Kraken2, and Centrifuge. These tools -vary in levels of specificity and sensitivity. Each tool has a separate -reference database. These databases (152 GB) are built into NMDC EDGE. -Users can select one, two, or all three of the classification tools to -run in the workflow. Full documentation can be found in -[ReadtheDocs](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/2_ReadAnalysis_index.html). - -#### Software Versions - -- GOTTCHA2 v2.1.6 - -- Kraken2 v2.0.8 - -- Centrifuge v1.0.4 - - -#### Output - -Multiple output files are provided by the workflow; the primary files -are shown below. The full list of output files can be found in -[ReadtheDocs](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/2_ReadAnalysis_index.html). - - - - - - - - - - - - - - - - - - -
-| Primary Output Files | Description |
-| --- | --- |
-| Profiling results for each tool | Tabular results of the profile for each tool (.tsv) |
-| Krona plots for each tool | Interactive graphic file (.html) |
- #### Running the Read-based Taxonomy Classification Workflow in NMDC EDGE Select a workflow @@ -350,7 +232,7 @@ file formats:** .fastq, .fq, .fastq.gz, .fq.gz 6. Additional data files (of the same type--interleaved or separate) can be added with the button below. -7. Click the button to the right of the input blank for data to select +7. Click the button to the right of the input blank to select the data file for the analysis. (If there are separate files, there will be two input blanks.) A box called 'Select a File' will open to allow the user to find the desired file(s) from previously run @@ -380,7 +262,7 @@ tool. ![](../_static/images/howto_guides/workflows/readBasedTaxonomy/image9.png) -The Browser/Download Output section provides output files available to +The Download Output section provides output files available to download. Each tool has a separate folder for the results from that tool. Full tabular results are in the largest .tsv file and the interactive Krona plots (.html files) open in a separate browser window. @@ -389,82 +271,24 @@ interactive Krona plots (.html files) open in a separate browser window. ### Assembly -![](../_static/images/howto_guides/workflows/metagenomeAssembly/image2.png) - #### Overview -This workflow takes in paired-end Illumina data, runs error correction, +This workflow takes in Illumina data, runs error correction, assembly, and assembly validation. #### Running the Workflow Currently, this workflow is available in [GitHub](https://github.com/microbiomedata/metaAssembly) and can be run -from the command line. (CLI instructions and requirements are found -[here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/3_MetaGAssemly_index.html).) +from the command line. Alternatively, this workflow can be run in [NMDC EDGE](https://nmdc-edge.org/). -#### Input - -Metagenome Assembly requires paired-end Illumina data as an interleaved -file or as separate pairs of FASTQ files. 
The recommended input is the -output from the ReadsQC workflow. - -- **Acceptable file formats:** .fastq, .fq, .fastq.gz, .fq.gz - #### Details This workflow takes in paired-end Illumina reads and performs error -correction using bbcms. Then the corrected reads are assembled using -metaSPAdes. After assembly, the reads are mapped back to the contigs -using bbmap for coverage information. Full documentation can be found in -[ReadtheDocs.](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/3_MetaGAssemly_index.html) - -#### Software Versions and Parameters - -- bbcms (BBTools v38.94) - -- metaSpades v3.15.0 - -- bbmap (BBTools v38.94) - -#### Output - -Multiple output files are provided by the workflow; the primary files -are shown below. The full list of output files can be found in -[ReadtheDocs](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/3_MetaGAssemly_index.html). - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-| Primary Output Files | Description |
-| --- | --- |
-| Assembly Contigs | Final assembly contigs (assembly_contigs.fna) |
-| Assembly Scaffolds | Final assembly scaffolds (assembly_scaffolds.fna) |
-| Assembly AGP | An AGP format file which describes the assembly |
-| Assembly Coverage BAM | Sorted bam file of reads mapping back to the final assembly |
-| Assembly Coverage Stats | Assembled contigs coverage information |
+correction. Then the corrected reads are assembled using +metaSPAdes. After assembly, the reads are mapped back to the contigs for coverage information. #### Running the Metagenome Assembly Workflow in NMDC EDGE @@ -497,7 +321,7 @@ to input clean data from the ReadsQC workflow.) 6. Additional data files (of the same type--interleaved or separate) can be added with the button below. -7. Click the button to the right of the input blank for data to select +7. Click the button to the right of the input blank to select the data file for the analysis. (If there are separate files, there will be two input blanks.) A box called 'Select a File' will open to allow the user to find the desired file(s) from previously run @@ -519,7 +343,7 @@ the assembly. ![](../_static/images/howto_guides/workflows/metagenomeAssembly/image7.png) -The Browser/Download Output section provides output files available to +The Download Output section provides output files available to download. The primary result is the assembly_contigs.fna file which can also be the input for the Metagenome Annotation workflow. The pairedMapped_sorted.bam file along with the assembled contigs file can @@ -530,8 +354,6 @@ be the input for the MAGs Generation workflow. ### Annotation -![](../_static/images/howto_guides/workflows/metagenomeAnnotation/image2.png) - #### Overview This workflow takes assembled metagenomes and generates structural and @@ -541,96 +363,16 @@ functional annotations. Currently, this workflow is available in [GitHub](https://github.com/microbiomedata/mg_annotation/) and can be -run from the command line. (CLI instructions and requirements are found -[here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/4_MetaGAnnotation_index.html).) +run from the command line. Alternatively, this workflow can be run in [NMDC EDGE.](https://nmdc-edge.org/) -#### Input - -Metagenome Annotation requires assembled contigs in a FASTA file. 
This -input can be the output from the Metagenome Assembly workflow and this -is recommended. - -- **Acceptable file formats:** .fasta, .fa, .fna, .fasta.gz, .fa.gz, - .fna.gz - #### Details The workflow uses a number of open-source tools and databases to -generate the structural and functional annotations. The input assembly -is first split into 10MB splits to be processed in parallel. Depending -on the workflow engine configuration, the split can be processed in -parallel. Each split is first structurally annotated, then those results -are used for the functional annotation. The structural annotation uses -tRNAscan_se, RFAM, CRT, Prodigal and GeneMarkS. These results are merged -to create a consensus structural annotation. The resulting GFF is the -input for functional annotation which uses multiple protein family -databases (SMART, COG, TIGRFAM, SUPERFAMILY, Pfam and Cath-FunFam) along -with custom HMM models. The functional predictions are created using -Last and HMM. These annotations are also merged into a consensus GFF -file. Finally, the respective split annotations are merged together to -generate a single structural annotation file and single functional -annotation file. In addition, several summary files are generated in TSV -format. Full documentation can be found in -[ReadtheDocs](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/4_MetaGAnnotation_index.html). - -#### Software Versions - -- Conda - -- tRNAscan-SE \>= 2.0 - -- Infernal 1.1.2 - -- CRT-CLI 1.8 - -- Prodigal 2.6.3 - -- GeneMarkS-2 \>= 1.07 - -- Last \>= 983 - -- HMMER 3.1b2 - -- TMHMM 2.0 - -#### Output - -Multiple output files are provided by the workflow; the primary files -are shown below. The full list of output files can be found in -[ReadtheDocs.](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/4_MetaGAnnotation_index.html) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-| Primary Output Files | Description |
-| --- | --- |
-| Structural Annotation | Consensus structural annotation file from multiple tools (.gff) |
-| Functional Annotation | Consensus functional annotation file from multiple tools (.gff) |
-| KEGG summary | KEGG gene function tabular summary (.tsv) |
-| EC summary | Enzyme Commission tabular summary (.tsv) |
-| Gene phylogeny summary | Gene phylogeny tabular summary (.tsv) |
+generate the structural and functional annotations. The input assembly is +first structurally annotated, then those results +are used for the functional annotation. #### Running the Metagenome Annotation Workflow in NMDC EDGE @@ -657,7 +399,7 @@ the assembled contigs from the Metagenome Assembly workflow.) **Acceptable file formats:** .fasta, .fa, .fna, .fasta.gz, .fa.gz, .fna.gz. -5. Click the button to the right of the input blank for data to select +5. Click the button to the right of the input blank to select the data file for the analysis. (If there are separate files, there will be two input blanks.) A box called 'Select a File' will open to allow the user to find the desired file(s) from previously run @@ -680,7 +422,7 @@ workflow. ![](../_static/images/howto_guides/workflows/metagenomeAnnotation/image6.png) -The Browser/Download Output section provides output files available to +The Download Output section provides output files available to download. The primary results are the functional annotation and the structural annotation files (.gff). The functional annotation file is required input for the MAGs Generation workflow along with the assembled @@ -690,8 +432,6 @@ contigs. ### MAGs Generation -![](../_static/images/howto_guides/workflows/MAGs/image2.png) - #### Overview This workflow classifies contigs into bins and the resulting bins are @@ -703,78 +443,18 @@ and a lineage is assigned to each bin of high or medium quality. Currently, this workflow is available in [GitHub](https://github.com/microbiomedata/metaMAGs) and can be run from -the command line. (CLI instructions and requirements are found -[here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/5_MAG_index.html).) +the command line. 
Alternatively, this workflow can be run in [NMDC EDGE.](https://nmdc-edge.org/) -#### Input - -This workflow requires assembled contigs in a FASTA file, the read -mapping file from the assembly (SAM or BAM), a functional annotation of -the assembly in a GFF file. - -- **Acceptable file formats:** assembled contigs (.fasta, .fa, or - .fna); read mapping to assembly (.sam.gz or .bam); Functional - annotation (.gff) - #### Details -The workflow is based on IMG metagenome binning pipeline and has been -modified specifically for the NMDC project. For all processed -metagenomes, it classifies contigs into bins using MetaBat2. Next, the +For all processed metagenomes, it classifies contigs into bins. Next, the bins are refined using the functional Annotation file (GFF) from the Metagenome Annotation workflow and optional contig lineage information. The completeness of and the contamination present in the bins are -evaluated by CheckM and bins are assigned a quality level (High Quality -(HQ), Medium Quality (MQ), Low Quality (LQ)) based on MiMAG standards. -In the end, GTDB-Tk is used to assign lineage for HQ and MQ bins. The -required GTDB-Tk database is incorporated into NMDC EDGE. Full -documentation can be found in -[ReadtheDocs](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/5_MAG_index.html). - -#### Software Versions - -- Biopython v1.74  - -- Sqlite  - -- Pymysql  - -- requests  - -- samtools \> v1.9 (License: MIT License) - -- Metabat2 v2.15  - -- CheckM v1.1.2 - -- GTDB-TK v1.2.0 - -- FastANI v1.3 - -- FastTree v2.1.10  - -#### Output - -Multiple output files are provided by the workflow; the primary files -are shown below. The full list of output files can be found in -[ReadtheDocs.](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/5_MAG_index.html) - - - - - - - - - - - - - - -
Primary Output Files | Description
hqmq-metabat-bins.zip | Bins of contigs rated high or medium quality with an assigned lineage
+evaluated, and bins are assigned a quality level (High Quality +(HQ), Medium Quality (MQ), Low Quality (LQ)). #### Running the Metagenome Assembled Genomes (MAGs) Workflow in NMDC EDGE @@ -801,17 +481,17 @@ workflows. **Acceptable file formats:** assembled contigs (.fasta, .fa, or .fna); read mapping to assembly (.sam.gz or .bam); functional annotation (.gff) -5. Click the button to the right of the blank for Input Contig File. A +5. Click the button to the right of the blank for the Input Contig File. A box called 'Select a File' will open to allow the user to find the desired file from a previously run assembly project, the public data folder, or a file uploaded by the user. -6. Click the button to the right of the blank for Input Sam/Bam File. A +6. Click the button to the right of the blank for the Input Sam/Bam File. A box called 'Select a File' will open to allow the user to find the read mapping file from a previously run assembly project, the public data folder, or a file uploaded by the user. -7. Click the button to the right of the blank for Input GFF File. A box +7. Click the button to the right of the blank for the Input GFF File. A box called 'Select a File' will open to allow the user to find the desired file(s) from a previously run annotation project, the public data folder, or a file uploaded by the user. @@ -835,278 +515,9 @@ high quality or medium quality. ![](../_static/images/howto_guides/workflows/MAGs/image6.png) -The Browser/Download Output section provides output files available to +The Download Output section provides output files available to download. The primary output file is the zipped file with all bins -determined to be high quality or medium quality (hqmq-metabat-bins.zip). +determined to be high quality or medium quality (hqmq.zip).
![](../_static/images/howto_guides/workflows/MAGs/image7.png) -### Running multiple workflows or the full metagenomic pipeline with a single input - -## Metatranscriptomics Workflow -![](../_static/images/howto_guides/workflows/metaT/image1.png) - -### Overview - -The metatranscriptome (metaT) workflow takes in raw metatranscriptome -data, filters the data for quality, removes rRNA reads, then assembles -and annotates the transcripts. The data is mapped back to the genomic -features in the transcripts and RPKMs ((Reads Per Kilobase of transcript -per Million mapped reads) are calculated for each feature in the -functional annotation file. - -### Running the Workflow - -Currently, this workflow can be run in [NMDC -EDGE](https://nmdc-edge.org/home) or from the command line. (CLI -instructions and requirements are found -[here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/6_MetaT_index.html).) - -### Input - -Metatranscriptomics requires paired-end Illumina data as an interleaved -file or as separate pairs of FASTQ files. - -- **Acceptable file formats:** .fastq, .fq, .fastq.gz, .fq.gz - -### Details - -MetaT is a workflow designed to analyze metatranscriptomes, and this -workflow builds upon other NMDC workflows for processing input -sequencing data. The metatranscriptomics workflow takes in raw RNA -sequencing data and quality filters the reads using the ReadsQC -workflow. Then the MetaT workflow filters out ribosomal RNA reads (using -the SILVA rRNA database) and separates interleaved files into separate -pairs of files using bbduk (BBTools). After the filtering steps, the -reads are assembled into transcripts using MEGAHIT and transcripts are -annotated using the [Metagenome Annotation NMDC -Workflow](https://github.com/microbiomedata/mg_annotation) which -produces GFF functional annotation files. 
Features are counted with -[Subread's featureCounts](http://subread.sourceforge.net/) which assigns -mapped reads to genomic features and generates RPKMs for each feature in -a GFF file for sense and antisense reads. - -### Software Versions - -- BBTools v38.44 - -- hisat2 v2.1 - -- Python v3.7.6. - -- featureCounts v2.0.1 - -- R v3.6.0 - -- edgeR v3.28.1 - -- pandas v1.0.5 - -- gffutils v0.10.1 - -### Output - -The table below lists the primary output files. The main outputs are the -assembled transcripts and annotated features file. Several annotation -files are also available to download. - - - - - - - - - - - - - - - - - - -
Primary Output Files | Description
INPUT_NAME.contigs.fa | Assembled transcripts
rpkm_sorted_features.tsv | Feature table sorted by RPKM
- -### Running the Metatranscriptomics Workflow in NMDC EDGE - -Select a workflow - -1. From the Metatranscriptomics category in the left menu bar, select - 'Run a Single Workflow'. - -2. Enter a **unique** project name with no spaces - (underscores are fine). - -3. A description is optional, but helpful. - -4. Select 'Metatranscriptome' from the dropdown menu under Workflow. - -> ![](../_static/images/howto_guides/workflows/metaT/image2.png) - -Input - -The metatranscriptome workflow requires paired-end Illumina data in -FASTQ format as the input; the file can be interleaved and can be -compressed. **Acceptable file formats:** .fastq, .fq, .fastq.gz, .fq.gz - -5. The default setting is for the raw data to be in an interleaved - format (paired reads interleaved into one file). If the raw data is - paired reads in separate files (forward and reverse), click 'No'. - -6. Additional data files (of the same type--interleaved or separate) - can be added with the button below. - -7. Click the button to the right of the input blank to select the data - file for the analysis. (If there are separate files, there will be - two input blanks.) A 'Select a File' box will open to allow the user - to find the desired file(s) from previously run projects, the public - data folder, or files uploaded by the user. - -8. Click 'Submit' when ready to run the workflow. - -> ![](../_static/images/howto_guides/workflows/metaT/image3.png) - -Output - -The General section of the output shows which workflow was run, the run -time information, and the Project Configuration - -![](../_static/images/howto_guides/workflows/metaT/image4.png) - -The Metatranscriptome Result section includes a table of the top 100 -RPKM results from the overall metatranscriptome data file sorted by -RPKM. Selecting the header of each column will sort this data by that -column. This section also includes a button to quickly download a tsv -file of all detected features in the input dataset for further analysis. 
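The RPKM values reported in the feature table follow the standard definition spelled out in the overview (Reads Per Kilobase of transcript per Million mapped reads). As a minimal sketch of that arithmetic only, and not the workflow's own code (which relies on featureCounts), the function name `rpkm` and the example numbers below are hypothetical:

```python
def rpkm(feature_reads: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads.

    Standard definition: reads assigned to the feature, divided by the
    feature length in kilobases and by the total mapped reads in millions.
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million)
    return feature_reads * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical feature: 500 reads on a 2,000 bp feature in a
# library with 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
print(f"RPKM = {example_rpkm:.1f}")
```

Normalizing by both feature length and library size is what makes values comparable across features of different lengths and across samples of different sequencing depth.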
- -![](../_static/images/howto_guides/workflows/metaT/image5.png) - -The Browser/Download Output section provides all output files available -to download. The output contigs can be found in the assembly folder and -the tsv file of all detected features sorted by RPKM is available under -the metat_output folder. - -![](../_static/images/howto_guides/workflows/metaT/image6.png) - -## Natural Organic Matter Workflow - -![](../_static/images/howto_guides/workflows/NOM/image1.png) - -### Overview - -This workflow takes FTICR mass spectrometry data collected from organic -extracts to determine the molecular formulas of natural organic -biomolecules in the input sample. - -### Running the Workflow - -Currently, this workflow can be run in [NMDC -EDGE](https://nmdc-edge.org/home) or from the command line. (CLI -instructions and requirements are found -[here](https://nmdc-workflow-documentation.readthedocs.io/en/latest/chapters/9_NOM_index.html).) - -### Input - -The input for this workflow is the output from a massSpec experiment (a -massSpec list) which includes a minimum of two columns of data: a -mass-to-charge ratio (m/z) and a signal intensity (Intensity) column for -every feature in the analysis. A calibration file of molecular formula -references is also required when running the workflow via command line. -(This calibration file is built into NMDC EDGE.) - -**Acceptable file formats:** .raw, .tsv, .csv, .xlsx - -### Details - -Direct Infusion Fourier Transform Ion Cyclotron Resonance mass -spectrometry (DI FTICR-MS) data undergoes signal processing and -molecular formula assignment leveraging EMSL's CoreMS framework. Raw -time domain data is transformed into the m/z domain using Fourier -Transform and Ledford equation. Data is denoised followed by peak -picking, recalibration using an external reference list of known -compounds, and searched against a dynamically generated molecular -formula library with a defined molecular search space. 
The confidence -scores for all the molecular formula candidates are calculated based on -the mass accuracy and fine isotopic structure, and the best candidate -assigned as the highest score. This workflow will not work as reliably -with Orbitrap mass spectrometry data. - -### Software Versions - -- CoreMS (2-clause BSD) - -- Click (BSD 3-Clause "New" or "Revised" License) - -### Output - -The primary output file is the Molecular Formula Data Table (in a .csv -file). - - - - - - - - - - - - - - -
Primary Output Files | Description
INPUT_NAME.csv | m/z, Peak height, Peak Area, Molecular Formula IDs, Confidence Score, etc.
- -### Running the Natural Organic Matter Workflow in NMDC EDGE - -Select a workflow - -1. From the Organic Matter category in the left menu bar, select 'Run a - Single Workflow'. - -2. Enter a **unique** project name with no spaces - (underscores are fine). - -3. A description is optional, but helpful. - -4. Select 'EnviroMS' from the dropdown menu under Workflow. - -> ![](../_static/images/howto_guides/workflows/NOM/image2.png) - -Input - -The Natural Organic Matter workflow input is the output from a massSpec -experiment (a massSpec list) with a minimum of two columns of data: a -mass-to-charge ratio (m/z) and a signal intensity (Intensity) column for -every feature in the analysis. **Acceptable file formats:** .tsv, .csv, -.raw, .xlsx - -5. Click the button to the right of the input blank for data to select - the data file for the analysis. (If there are separate files, there - will be two input blanks.) A box called 'Select a File' will open to - allow the user to find the desired file(s) from the public data - folder or files uploaded by the user. - -6. Additional input files can be added by clicking the 'Add file' - button to create additional input blanks. - -7. Once all the input files have been selected, click 'Submit'. - -> ![](../_static/images/howto_guides/workflows/NOM/image3.png) - -Output - -The General section of the output shows which workflow was run and the -run time information. The Project Configuration can be seen by clicking -the three dots in the bracket. - -![](../_static/images/howto_guides/workflows/NOM/image4.png) - -The Browser/Download Output section provides output files available to -download. The primary output files are: the Molecular Formula Data-Table -(.csv file) containing m/z measurements, Peak height, Peak Area, -Molecular Formula Identification, Ion Type, and Confidence Score. 
- -![](../_static/images/howto_guides/workflows/NOM/image5.png) diff --git a/content/home/src/tutorials/run_workflows.md b/content/home/src/tutorials/run_workflows.md index c3a1d0d..b9efca7 100644 --- a/content/home/src/tutorials/run_workflows.md +++ b/content/home/src/tutorials/run_workflows.md @@ -1,4 +1,4 @@ -# Running the Workflows +# Running the NMDC Workflows in NMDC EDGE Tutorials ## NMDC EDGE QuickStart @@ -10,178 +10,106 @@ >NMDC EDGE QuickStart Tutorial Practice > ->Task 1: Create an NMDC EDGE account with either your email address or your ORCiD account. +>Task 1: Create an NMDC EDGE account with your ORCiD information. > >Task 2: Download the small interleaved [data file](https://portal.nersc.gov/cfs/m3408/test_data/SRR7877884/SRR7877884-int-0.1.fastq.gz) listed here. (Note: This is paired-end data with the pairs interleaved together into a single file.) Upload the file to NMDC EDGE. > ->Task 3: Click the user icon (in the top right corner with your initials) and under “Files”, click on “Manage Uploads”. Verify that the file you uploaded is there. (Note: Later you can delete uploads that are no longer needed.) -> ->Task 4 (optional): Click the user icon and under “Account”, click on “Profile”. Edit your account to receive email notification of project status by clicking “ON”. +>Task 3 (optional): Click on "My Profile". Edit your account to receive email notification of project status by clicking “ON”. ## Metagenomics ### ReadsQC -
- >NMDC EDGE Metagenome ReadsQC Tutorial Practice > ->Task: Log into NMDC EDGE and run the Metagenome ReadsQC workflow using the dataset uploaded in the QuickStart tutorial. +>Task: Log into NMDC EDGE and run the Metagenome ReadsQC workflow using the dataset uploaded in the QuickStart tutorial. When the run has finished, answer the questions below using the workflow results. > >    Question 1: How many reads were in the input file? How many bases were in the input file? > >    Question 2: How many reads were in the output file? How many bases were in the output file? > ->    Question 3: What file in the output would be used in the next workflow? +>    Question 3: Which output file would you then use as the input for a subsequent workflow if you wanted QC'ed data? ### Read-based Taxonomy Classification -
- >NMDC EDGE Metagenome Read-based Taxonomy Classification Tutorial Practice > ->Task: -run the Metagenome Read-based Taxonomy Classification workflow with all three taxonomy classification tools. (Note: All three tools are selected by default. While a user can opt to turn off one or two tools, it is recommended to run all three.) Use the clean data output file from the project run in the ReadsQC Tutorial (the file ending in .anqdpht.fq.gz). In this case, the file will be treated as single-end reads. +>Task: Run the Metagenome Read-based Taxonomy Classification workflow with all three taxonomy classification tools. (Note: All three tools are selected by default. While a user can opt to turn off one or two tools, it is recommended to run all three.) Use the clean data output file from the project run in the ReadsQC Tutorial. When the run finishes, answer the questions below using the workflow outputs. > ->    Question 1: How many of the Top 10 species are called by more than one tool? +>    Question 1: How many of the Top 10 **species** are identified by more than one tool? > ->    Question 2: List the **genera** that are called by all three tools in the Top 10. +>    Question 2: List the **genera** that are identified by all three tools within the Top 10. > ->    Question 3: From the Krona plot shown from the taxonomy classification tool Centrifuge results at **species level**, what percentage of the sample is estimated to be _Pseudomonas aeruginosa_? +>    Question 3: From the Krona plot shown from the Centrifuge results at the **species level**, what percentage of the sample is estimated to be _Pseudomonas aeruginosa_? ### Assembly -
- >NMDC EDGE Metagenome Assembly Tutorial Practice > ->Task: Log into NMDC EDGE and run the Metagenome Assembly workflow. Use the clean data output file from the project run in the ReadsQC Tutorial (the file ending in .anqdpht.fq.gz). In this case, the file is interleaved paired data and only one file is required for input. +>Task: Log into NMDC EDGE and run the Metagenome Assembly workflow. Use the clean data output file from the project run in the ReadsQC Tutorial as the input. In this case, the file is interleaved paired data and only one file is required for input. When the run finishes, answer the questions below using the workflow outputs. > >    Question 1: How many contigs were generated from the assembly? > >    Question 2: How many scaffolds were generated from the assembly? -> ->    Question 3: Download the covstats.txt file. From the top of the file, what percentage of the reads map back to the assembled contigs? ### Annotation -
- >NMDC EDGE Metagenome Annotation Tutorial Practice > ->Task: Log into NMDC EDGE and run the Metagenome Annotation workflow. Use the assebled contigs which are output from the project run in the Assembly Tutorial (assembled_contigs.fna). +>Task: Log into NMDC EDGE and run the Metagenome Annotation workflow. Use the assembled contigs which are output from the project run in the Assembly Tutorial (assembled_contigs.fna). When the run finishes, answer the questions below using the workflow outputs. > >    Question 1: How many contigs had genes called (sequences_with_genes)? > ->    Question 2: How many coding sequences (genes) were called by Prodigal? How many were called by GeneMark? +>    Question 2: Can you find coding sequences (genes) that were predicted by Prodigal? By GeneMark? > >    Question 3: What is the coding density of this metagenome? ### MAGs Generation -
- >NMDC EDGE MAGs Generation Tutorial Practice > ->Task: Log into NMDC EDGE and run the Metagenome MAGs workflow. Use the assebled contigs and the read mapping file which are output from the project run in the Assembly Tutorial (assembled_contigs.fna and pairedMapped_sorted.bam) and the combined functional annotation file fromt he Annotation Tutorial (the file ending in fuctional_annotation.gff) +>Task: Log into NMDC EDGE and run the Metagenome MAGs workflow. Use the assembled contigs and the read mapping file which are output from the project run in the Assembly Tutorial (assembled_contigs.fna and pairedMapped_sorted.bam) and the combined functional annotation file from the Annotation Tutorial (the file ending in fuctional_annotation.gff). When the run finishes, answer the questions below using the workflow outputs. > ->    Question 1: Calculate the percentage of the contigs were binned. +>    Question 1: Calculate the percentage of the contigs that were binned. > >    Question 2: How many bins were determined to be high quality (HQ)? How many bins were determined to be medium quality (MQ)? > ->    Question 3: What is the organism identified from genome in the bin which is most complete and has the least contamination (the highest quality bin)? (Note: Scroll to the far right of the summary table in the results to get the species assignment.) - -### Running multiple workflows or the full metagenomic pipeline with a single input - -
- ->NMDC EDGE Full Metagenome Piepline Tutorial Practice -> ->Task: Log into NMDC EDGE and run the full Metagenome pipeline (multiple workflows- all workflows are selected by default). Use the same dataset uploaded in the QuickStart tutorial. Check your results for the full pipeline against your results from the previous tutorials. The results should be identical/nearly identical to the results from the previous tutorials for each individual workflow with the added benefit of submitting all the workflows with a single input of raw sequencing data. -> ->If you want to just test the full pipeline and not the individual workflows, [download this data set.](https://portal.nersc.gov/cfs/m3408/test_data/SRR7877884/SRR7877884-int-0.1.fastq.gz) Upload the file to NMDC EDGE and run the full pipeline. - - -## Metatranscriptomics +>    Question 3: What is the organism that was identified from the bin that is most complete and has the least contamination (the highest quality bin)? (Note: Scroll to the far right of the summary table in the results to get the species assignment.) -
- - - - - ->NMDC EDGE Metatranscriptomics Tutorial Practics -> ->Task 1: Download the small interleaved [data file](https://nmdc-edge.org/publicdata/metaT/test_smaller_interleave.fastq.gz) listed here. (Note: This is paired-end data with the pairs interleaved together into a single file.) -> ->Task 2: Log into NMDC EDGE and upload the file. -> ->Task 3: Click the user icon (in the top right corner with your initials) and under “Files”, click on “Manage Uploads”. Verify that the file you uploaded is there. (Note: Later you can delete uploads that are no longer needed.) -> ->Task 4: Run the MetaT single workflow with this dataset in your upload folder. When the analysis is complete, the Top_features summary table under Metatranscriptome Result tab shows the proteins assigned to transcripts with the highest rpkm values. The full results (rpkm_sorted_features.tsv) can be downloaded from the metat_output folder under the Browser/Download output tab. The assembled transcripts and the annotation fules can also be downloaded from the respective folders under the Browser/Download output tab. -> ->    Question 1: What product (protein) is assigned to the transcript with the highest rpkm value? (Note: Scroll to the far right to see these results.) -> ->    Question 2: Download the contigs.fa (transcripts) file. How many transcripts were assembled? -> ->    Question 3: Download the rpkm_sorted_features.tsv file. How many transcripts were assigned a product (protein) that is **not hypothetical**? ## Answers to Tutorial Questions -### Metegenomics ReadsQC ->Question 1: Input contained 4,496,774 reads and 674,516,100 bases. +### Metagenomics ReadsQC +>Question 1: Input file contained 4,496,774 reads and 674,516,100 bases. > >Question 2: Output contained 3,353,438 reads and 487,250,239 bases. > ->Question 3: For this project, the clean, filtered data is in the output file called SRR7877884-int-0.1.anqdpht.fastq.gz. 
+>Question 3: For this project, the clean, filtered data is in the output file called SRR7877884-int-0.1.filtered.gz. -### Metegenomics Read-based Taxonomy Classification +### Metagenomics Read-based Taxonomy Classification >Question 1: There are seven species called by more than one taxonomy tool: *Pseudomonas aeruginosa, Salmonella enterica, Listeria monocytogenes, Enterococcus faecalis, Lactobacillus fermentum, Bacillus subtilis, and Escherichia coli.* > >Question 2: There are four genera called by all three taxonomy classification tools: *Pseudomonas, Bacillus, Enterococcus, and Lactobacillus.* > ->Question 3: The Krona plot shows that Centrifuge estimates that 12% of the sample is *Pseudomonas aeruginosa." +>Question 3: The Krona plot shows that Centrifuge estimates that 12% of the sample is *Pseudomonas aeruginosa.* -### Metegenomics Assembly +### Metagenomics Assembly >Question 1: 3,196 contigs were assembled. > >Question 2: 3,141 scaffolds were created. -> ->Question 3: -### Metegenomics Annotation +### Metagenomics Annotation >Question 1: 3,031 contigs had genes called. > ->Question 2: 2,495 CDS (coding sequences or genes) were called by GeneMark and 936 CDS were called by Prodigal. +>Question 2: 2,495 CDS (coding sequences or genes) were called by GeneMark and 936 CDS were called by Prodigal. Several should be seen in the output table. > >Question 3: The coding density of the metagenome is 89.15%. ### Metagenome MAGs ->Question 1: 24% of the contigs were binned. +>Question 1: 24% of the contigs were binned (binned contigs divided by total number of binned and unbinned contigs). > >Question 2: One bin was determined to be high quality and five bins were determined to be medium quality. >
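The MAGs Question 1 answer divides the binned contigs by the total of binned and unbinned contigs. A minimal sketch of that calculation follows; the function name `percent_binned` and the contig counts are hypothetical and chosen only to illustrate the formula, since the tutorial reports the resulting percentage but not the underlying counts:

```python
def percent_binned(binned_contigs: int, unbinned_contigs: int) -> float:
    """Percentage of contigs placed in bins.

    Computed as binned / (binned + unbinned) * 100, matching the
    description given in the MAGs Question 1 answer.
    """
    total = binned_contigs + unbinned_contigs
    if total == 0:
        raise ValueError("no contigs to evaluate")
    return 100.0 * binned_contigs / total


# Hypothetical counts for illustration only; read the real values
# from the summary in your own MAGs workflow results.
binned, unbinned = 240, 760
print(f"{percent_binned(binned, unbinned):.0f}% of the contigs were binned")
```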