Quick Usage Guide
David Gaylord edited this page Sep 28, 2021 · 19 revisions
- Run the script
bin/digest_and_ingest.sh
with the FASTA proteome files you wish to digest and ingest as type Genome. For example, to add to genomes:
bin/digest_and_ingest.sh file1.fasta file2.fasta ...
- Run the script
bin/digest_and_ingest_specialized_assembly.sh
with the FASTA proteome files you wish to digest and ingest as type Specialized Assembly. For example, to add to specialized assemblies:
bin/digest_and_ingest_specialized_assembly.sh MAGfile1.fasta MAGfile2.fasta ...
- Run the script
bin/digest_and_ingest_metagenome.sh
with the FASTA proteome files you wish to digest and ingest as type Meta-omic Assembly, along with the annotation file for the assemblies. For example:
bin/digest_and_ingest_metagenome.sh meta-omic-file.fasta
These scripts read the FASTA files and run digestions on their sequences. You should see a fair amount of output as the files are processed.
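To picture what "digesting" a FASTA proteome means, here is a minimal Python sketch. It is not METATRYP's implementation: it assumes the common trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, and a hypothetical minimum peptide length of 6. The function names `read_fasta` and `tryptic_digest` are made up for illustration.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def tryptic_digest(sequence, min_length=6):
    """Split a protein after K or R, except when the next residue is P.

    min_length is an assumed cutoff, not a documented METATRYP setting.
    """
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in peptides if len(p) >= min_length]


fasta_lines = [">example_protein", "MGFPCNRLSHQAIAEAIGSTR", "AAK"]
for header, sequence in read_fasta(fasta_lines):
    print(header, tryptic_digest(sequence))
```

With this made-up record, the sketch yields the two peptides used in the query examples on this page (MGFPCNR and LSHQAIAEAIGSTR); the short fragment AAK falls below the assumed length cutoff.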
-
Query just one data type, here the specialized assembly type:
bin/query_by_sequence.sh --sequence LSHQAIAEAIGSTR --type sa
-
Query all data categories:
bin/query_by_sequence.sh --sequence MGFPCNR --type all
- Data type parameters (default is genomes): invoke with the flag --type.
  - --type g: genomes
  - --type m: meta-omic assemblies
  - --type sa: specialized assemblies
  - --type all: all types
- Perform optional LCA analysis (if lineages exist in the database): invoke with the flag --lca.
bin/query_by_sequence.sh --sequence MGFPCNR --type all --lca
- For more information on adding taxonomic lineages to the database, see the Taxonomic Lineages Section below.
- For the Genome data category, list all the taxa in that category:
bin/list_taxons.sh
- For the Specialized Assembly data category, list all the taxa:
bin/list_specialized_assemblies.sh
- For the Meta-omic Assemblies data category, list all the meta-omic files loaded:
bin/list_metaomic_assemblies.sh
- For the Meta-omic Assemblies data category, list all the taxa loaded:
bin/list_metaomic_assembly_taxons.sh
-
Generate redundancy tables for Genomes:
- By entering taxon IDs on the command line:
bin/generate_redundancy_tables.sh --taxon-ids syn8102 syn7502 syn7503 --output-dir exampleRedundancyTables
- By inputting a file that contains a list of taxon IDs (one taxon ID per line):
bin/generate_redundancy_tables.sh --taxon-id-file taxon_id_list.txt --output-dir exampleRedundancyTables
-
Generate redundancy tables for Specialized Assemblies:
- By entering specialized assembly IDs on the command line:
bin/generate_redundancy_tables_specialized_assembly.sh --sa-ids TARA_RED_MAG_00113 TARA_SOC_MAG_00005 --output-dir exampleRedundancyTables
- By inputting a file that contains a list of taxon IDs (one taxon ID per line):
bin/generate_redundancy_tables_specialized_assembly.sh --sa-id-file sa_id_list.txt --output-dir exampleRedundancyTables
-
View the resulting files in the output directory (here, exampleRedundancyTables):
- counts.csv contains the counts of redundant peptides.
- union_percents.csv contains the values in counts.csv divided by the number of unique peptides in the union of the digestions of a taxon pair.
- individual_percents.csv contains the values in counts.csv divided by the count of unique peptides in taxon A.
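The relationship between the three tables can be sketched in a few lines of Python. This is an illustration only, under two assumptions that the page does not confirm: a "redundant" peptide is one that occurs in the digestions of both taxa in a pair, and the percent tables are the ratio multiplied by 100. The peptide sets are made up, and `redundancy` is a hypothetical helper name.

```python
def redundancy(peptides_a, peptides_b):
    """Compare the unique peptides of taxon A against those of taxon B."""
    a, b = set(peptides_a), set(peptides_b)
    shared = len(a & b)  # peptides present in both digestions
    union = len(a | b)   # unique peptides across the pair
    return {
        "count": shared,
        "union_percent": 100.0 * shared / union if union else 0.0,
        "individual_percent": 100.0 * shared / len(a) if a else 0.0,
    }


# Made-up peptide sets keyed by taxon ID (not real database content).
digests = {
    "syn8102": {"MGFPCNR", "LSHQAIAEAIGSTR", "AAAAAAK"},
    "syn7502": {"MGFPCNR", "TTTTTTK"},
}

print(redundancy(digests["syn8102"], digests["syn7502"]))
print(redundancy(digests["syn7502"], digests["syn8102"]))
```

Note that the count and the union-based value are the same in both directions, while the individual value depends on which taxon is treated as taxon A (here one shared peptide out of three versus one out of two).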
If you wish to delete data for a given set of taxa in the database, run a command like this:
- For the Genome data category, remove taxa:
bin/clear_taxon_data.sh --taxon-ids taxa_name taxa2_name
- For the Specialized Assembly data category, remove taxa:
bin/clear_specialized_assembly_data.sh --taxon-ids taxa_name taxa2_name
- For the Meta-omic Assemblies data category, remove taxa:
bin/clear_metaomic_data.sh --taxon-ids taxa_name taxa2_name
- First, you need to pull taxonomic lineage information with consistent lineages.
- You can enter your own lineage information following the specified .csv file/header format found here for Genomes and Specialized Assemblies, or here for Meta-omic Assemblies.
- Or, the Python script
bin/NCBI_lineage.py
will pull that information for you using NCBI taxon IDs. First, format your input data according to the format required for each data category. For Genomes and Specialized Assemblies, use the format found here, and for Meta-omic Assemblies use the format found here.
- If you would like to simply rename your taxa with names other than those of the filenames uploaded, you can create a mapping file to do so with the format found here.
- To pull the NCBI lineage information, run the Python script bin/NCBI_lineage.py.
  - You can get help information by running:
    bin/NCBI_lineage.py --help
  - Pass the formatted taxa info file with the flag -f so that the lineage info can be pulled via the NCBI taxon IDs.
  - In addition to the formatted input file, an e-mail address must be provided with the flag -e in order to access the NCBI database.
  - You can optionally designate the output file name with the flag -o.
  - For example:
    bin/NCBI_lineage.py -f genome_lookup_taxa.csv -e <your-email-address> -o genome_taxa_lineages.csv
- Add taxonomic lineage information to the database:
- For Genomes, run
bin/update_taxons.sh --filepath <lineages-file.csv>
- You can either upload the lineage output file from
NCBI_lineage.py
or your own lineage information following the template (maintaining same header names as the template) found here.
- For the Specialized Assembly data category, run
bin/update_specialized_assembly_taxons.sh --filepath <lineages-file.csv>
- You can either upload the lineage output file from
NCBI_lineage.py
or your own lineage information following the template (maintaining the same header names as the template) found here.
- For the Meta-omic data category, you need to upload two different files.
- An annotation file that lists the metagenome files, the ORFs within each metagenome, and the taxa assigned to those ORFs.
- For meta-omic assemblies, taxonomic assignments may exist at two levels: an ORF-level assignment and a contig-level assignment. For any given ORF, METATRYP preferentially uses the contig-level assignment if there is one; otherwise, it uses the ORF-level assignment.
- Run
bin/update_metaomic_annotations.sh --filepath <annotations-file.csv>
to update the annotations. A very basic example of an annotations file can be found here. Additional annotations may require additional custom scripts.
- A lineage file is also required; it contains the taxonomic lineage information for the taxa included within the meta-omic file. You can either input the lineages generated from
NCBI_lineage.py
or upload your own lineages using the template (maintaining the same header names as the template) found here.
- Run
bin/update_metaomic_taxons.sh --filepath <lineages-file.csv>
to update the lineages.
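The contig-over-ORF preference described for meta-omic annotations amounts to a simple fallback. The Python sketch below shows that rule on made-up rows; the field names `orf_id`, `contig_taxon`, and `orf_taxon` are hypothetical and do not come from the annotation template, and the taxon values are placeholders.

```python
def assigned_taxon(orf):
    """Return the contig-level taxon when present, else the ORF-level one.

    Missing or empty values are treated as "no assignment".
    """
    return orf.get("contig_taxon") or orf.get("orf_taxon") or None


# Made-up annotation rows for illustration.
orfs = [
    {"orf_id": "orf_1", "contig_taxon": "taxon_A", "orf_taxon": "taxon_B"},
    {"orf_id": "orf_2", "contig_taxon": "", "orf_taxon": "taxon_B"},
    {"orf_id": "orf_3", "contig_taxon": "", "orf_taxon": ""},
]

for orf in orfs:
    print(orf["orf_id"], assigned_taxon(orf))
```

In this sketch orf_1 resolves to its contig-level taxon, orf_2 falls back to its ORF-level taxon, and orf_3 has no assignment at all.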
- Call the Least Common Ancestor Calculation
- Once the taxonomic lineages are added to the METATRYP database, you can call the LCA function on peptide queries using the flag
--lca
. See the peptide query section for more information on this function.
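As a rough picture of what an LCA calculation does, the Python sketch below finds the deepest rank shared by every lineage that a peptide matches. It is a generic illustration, not METATRYP's code: it assumes each lineage is an ordered list from the broadest rank to the most specific one, it stops at the first rank that differs or is empty, and the lineage names are placeholders rather than real database entries.

```python
def least_common_ancestor(lineages):
    """Return the most specific name shared by all lineages, or None."""
    lca = None
    # zip(*lineages) walks the ranks in parallel, broadest rank first.
    for names in zip(*lineages):
        if len(set(names)) != 1 or not names[0]:
            break  # lineages diverge (or the rank is missing) here
        lca = names[0]
    return lca


# Placeholder lineages for three taxa that share one peptide.
matches = [
    ["Bacteria", "phylum_X", "order_Y", "genus_P"],
    ["Bacteria", "phylum_X", "order_Y", "genus_Q"],
    ["Bacteria", "phylum_X", "order_Z", "genus_R"],
]

print(least_common_ancestor(matches))
print(least_common_ancestor(matches[:2]))
```

With all three placeholder lineages the shared ancestor is phylum_X; restricting the comparison to the first two moves it down to order_Y.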