This guide provides an overview of how to download and organise most of the data sources used in the pipeline.
The folder tree used by the pipeline, as defined in the config file, is the following:
data/
├── figures
├── gem
├── genomes
│   ├── aci
│   ├── bme
│   ├── dmi
│   ├── pse
│   ├── rhi1
│   ├── rhi2
│   ├── rho
│   ├── shw
│   ├── tel
│   └── tez
├── metanetx
├── micom
│   └── amils2023
│       ├── equal_abundances
│       └── paper_abundances
├── modelseed
├── papers
│   └── amils2023
├── retropath
│   ├── interesting_metabolites
│   │   └── sources
│   └── amils2023
│       └── sources
├── retropath_classes
└── retrorules
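If you want to set up the empty layout in one go before downloading anything, the tree above can be created with a short script. This is a minimal sketch, not part of the pipeline: the function name `create_data_tree` and the default root `data` are hypothetical, and the paths are hardcoded from the tree rather than read from the config file. The folders still have to be filled manually as described below.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical helper: folder names are copied from the tree above,
# not read from the pipeline's config file.
GENOMES = ["aci", "bme", "dmi", "pse", "rhi1", "rhi2", "rho", "shw", "tel", "tez"]

SUBFOLDERS = (
    ["figures", "gem", "metanetx", "modelseed", "retropath_classes", "retrorules"]
    + [f"genomes/{code}" for code in GENOMES]
    + [
        "micom/amils2023/equal_abundances",
        "micom/amils2023/paper_abundances",
        "papers/amils2023",
        "retropath/interesting_metabolites/sources",
        "retropath/amils2023/sources",
    ]
)


def create_data_tree(root: str | Path = "data") -> list[Path]:
    """Create the data/ folder tree and return the folders listed in SUBFOLDERS."""
    root = Path(root)
    created = []
    for sub in SUBFOLDERS:
        path = root / sub
        # parents=True also creates intermediate folders (e.g. micom/amils2023);
        # exist_ok=True makes re-running the function harmless.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_data_tree()` from the repository root would create `data/` next to the notebooks. Because `mkdir` is called with `exist_ok=True`, running it on an already populated tree leaves existing files untouched.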
figures: Contains all generated figures, as well as some images used in the repository READMEs.
gem: Contains all GEMs generated by the GEM notebook.
genomes: Contains all annotated genomes, downloaded manually from their respective sources. See the table below for more information.
metanetx: Contains compound (chem_prop.tsv) and reaction (reac_prop.tsv) data downloaded from MetaNetX.
micom: Contains the results of all MICOM analyses (see the MICOM notebook). Ideally, one folder should be created per study (amils2023 in our case). Subfolders separate the different experiments performed with data from the same study.
modelseed: Contains compound (compounds.tsv) and reaction (reactions.tsv) data downloaded from ModelSEED.
papers: Contains the data for the different studies being analysed, one subfolder per study. In our case, we used information from Amils et al. 2023.
retropath: Contains the results of all RetroPath analyses (see the RetroPath notebook). Ideally, one folder should be created per study (amils2023 and interesting_metabolites in our case).
retropath_classes: Contains metadata for exploring compound classes in RetroPath results. These data are not yet public and are therefore not provided.
retrorules: Contains the rules downloaded from RetroRules. The version used in this work is the initial release (rr01).
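After downloading, a quick way to check that the MetaNetX and ModelSEED tables are in place and readable is to load them. The sketch below assumes two layouts that you should verify against your downloaded files: MetaNetX tables such as chem_prop.tsv start with comment lines beginning with `#`, the last of which holds the tab-separated column names (e.g. `#ID`, `name`, ...), while ModelSEED's compounds.tsv and reactions.tsv are plain tab-separated files with a header row. The function names `read_mnx_tsv` and `read_modelseed_tsv` are hypothetical and not part of the pipeline; only the standard library is used.

```python
import csv
from pathlib import Path


def read_mnx_tsv(path):
    """Read a MetaNetX table (e.g. data/metanetx/chem_prop.tsv) into a list of dicts.

    Assumption: leading comment lines start with '#', and the last of them
    holds the column names (e.g. '#ID<TAB>name<TAB>...').
    """
    header = None
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("#"):
                # Keep overwriting: the final comment line is the column header.
                header = line.lstrip("#").split("\t")
                continue
            if not line or header is None:
                continue
            rows.append(dict(zip(header, line.split("\t"))))
    return rows


def read_modelseed_tsv(path):
    """Read a ModelSEED table (e.g. data/modelseed/compounds.tsv) into a list of dicts.

    Assumption: the first line is a regular tab-separated header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quote characters in names/formulas as plain text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def check_tables(root="data"):
    """Report how many rows each expected table contains (hypothetical helper)."""
    root = Path(root)
    return {
        "chem_prop.tsv": len(read_mnx_tsv(root / "metanetx" / "chem_prop.tsv")),
        "reac_prop.tsv": len(read_mnx_tsv(root / "metanetx" / "reac_prop.tsv")),
        "compounds.tsv": len(read_modelseed_tsv(root / "modelseed" / "compounds.tsv")),
        "reactions.tsv": len(read_modelseed_tsv(root / "modelseed" / "reactions.tsv")),
    }
```

The standard library keeps the check dependency-free. If the notebooks already use pandas, reading the same files into DataFrames would be the more convenient choice for actual analysis.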