Prediction of 3D structures:
Molecular docking:
Nanobody mutational space
MultiQC HTML report from existing results
N E X T F L O W ~ version 22.10.6
Launching `main.nf` [curious_perlman] DSL2 - revision: 986ad6e9f0
------------------------------------------------------------------------
_ _ _ _ _ _ _ _ _ _ _
/ \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \
( P | r | o | t | e | i | n ) ( F | o | l | d )
\_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/
------------------------------------------------------------------------
Usage:
The typical command for running the pipeline is as follows:
nextflow run main.nf -params-file params.json -profile STRING
MANDATORY ARGUMENTS, NEXTFLOW:
-profile STRING [test, singularity, cluster] Configuration profile to use. Can use multiple (comma separated).
OTHER OPTIONS, NEXTFLOW:
-params-file PATH Set the parameters of the pipeline using a JSON configuration file (e.g. 'params.json'). All parameters defined as JSON
type must be set this way. For example, the JSON can contain: "alphaFoldOptions": "--max_template=2024-01-01 --multimer". WARNING:
passing the option '--alphaFoldOptions' on the command line will throw an error when the option contains '-' or '--' characters, which
are not accepted by nextflow.
OTHER OPTIONS:
--afMassiveDatabase PATH Path to the database required by AFMassive.
--afMassiveHelp Display all the options available to run AFMassive. Use this option in combination with -profile singularity.
afMassiveOptions JSON Specific options for AFMassive. As AFMassive is an AlphaFold-like tool, standard AlphaFold options are passed
using the --alphaFoldOptions option.
--alphaFillHelp Display all the options available to run AlphaFill. Use this option in combination with -profile singularity.
--alphaFold3Database PATH Path to the database required by AlphaFold3.
--alphaFold3Help Display all the options available to run AlphaFold3. Use this option in combination with -profile singularity.
alphaFold3Options JSON Prediction model options passed to AlphaFold3.
--alphaFoldDatabase PATH Path to the database required by AlphaFold.
--alphaFoldHelp Display all the options available to run AlphaFold. Use this option in combination with -profile singularity.
alphaFoldOptions JSON Prediction model options passed to AlphaFold or AFMassive.
--colabFoldDatabase PATH Path to the database required by ColabFold.
--colabFoldHelp Display all the options available to run ColabFold. Use this option in combination with -profile singularity.
colabFoldOptions JSON Prediction model options passed to ColabFold.
--diffDockArgsYamlFile YAML Path to the YAML file with the DiffDock options.
--diffDockDatabase PATH Path to the database required by DiffDock.
--dynamicBindDatabase PATH Path to the database required by DynamicBind.
--dynamicBindHelp Display all the options available to run DynamicBind. Use this option in combination with -profile
singularity.
dynamicBindOptions JSON Prediction model options passed to DynamicBind.
--fastaPath PATH Path to the input directory which contains the fasta files.
--fromMsas PATH Path to existing multiple sequence alignments (msas) to use for the 3D protein structure prediction.
Typically the path could be the results of the pipeline launched with the --onlyMsas option.
--launchAfMassive Launch AFMassive
--launchAlphaFill Launch AlphaFill.
--launchAlphaFold Launch AlphaFold.
--launchAlphaFold3 Launch AlphaFold3.
--launchColabFold Launch ColabFold.
--launchDiffDock Launch DiffDock.
--launchDynamicBind Launch DynamicBind.
--multimerVersions INT AlphaFold multimer model versions (v1, v2, v3) which will be evaluated by AFMassive. This parameter is taken
into account when --launchAfMassive is true. The list of the versions to be evaluated must be provided with a
comma separated string, e.g. 'v1,v2', Default is 'v1,v2,v3'.
--numberOfModels INT Number of models that will be evaluated by AFMassive. This parameter is taken into account when
--launchAfMassive is true.
--onlyMsas When true, the pipeline will only generate the multiple sequence alignments (msas).
--outDir PATH The output directory where the results will be saved
--predictionsPerModel INT Number of predictions per model which will be evaluated by AFMassive. This parameter is taken into account
when --launchAfMassive is true.
--proteinLigandFile PATH Path to the input file for molecular docking. The file must be in CSV format, without space. One column named
'protein' contains the path to the 'pdb' file and one column named 'ligand' must contain the path to the
'sdf' file.
--useGpu Run the prediction model on GPU. AlphaFold and AFMassive can run either on CPU or GPU. ColabFold and
DynamicBind require GPU only.
REFERENCES:
--genomeAnnotationPath PATH Path to genome/proteome annotations folder used to predict the protein 3D structure.
=======================================================
Available Profiles
-profile test Run the test dataset
-profile singularity Use the Singularity images for each process. Use `--singularityPath` to define the installation path
-profile cluster Run the workflow on the cluster, instead of locally
------------------------------------------------------------------------
Visit the AlphaFill GitHub repository for more details about the prediction model.
AlphaFill is launched whenever the option --launchAlphaFill is set. It can predict the binding of missing compounds from the best predicted 3D protein structure, provided as a PDB or CIF file. The file used for the prediction must always be named ranked_0.pdb or ranked_0.cif.
nextflow run main.nf -profile singularity --fromPredictions test/data/afmassive/monomer2/ --launchAlphaFill --alphaFillDatabase $PWD/test/data/alphafill/database/ --fastaPath test/data/fasta/monomer2
The option --fromPredictions takes as input a directory in which there is one folder per protein, each folder containing the ranked_0.pdb or ranked_0.cif file, for example:
├── MISFA
│   └── ranked_0.pdb
└── MRLN
    └── ranked_0.pdb
or
├── MISFA
│   └── ranked_0.cif
└── MRLN
    └── ranked_0.cif
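The layout expected by --fromPredictions can be checked before launching the pipeline. The Python sketch below is only an illustration: the function name find_missing_structures is hypothetical and not part of the pipeline, and it assumes that every sub-folder of the given directory represents one protein.

```python
from pathlib import Path

# File names that AlphaFill expects for the best-ranked structure.
EXPECTED_NAMES = ("ranked_0.pdb", "ranked_0.cif")


def find_missing_structures(predictions_dir):
    """Return the names of protein folders lacking ranked_0.pdb / ranked_0.cif."""
    root = Path(predictions_dir)
    missing = []
    # Assumption: each direct sub-folder corresponds to one protein.
    for protein_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if not any((protein_dir / name).is_file() for name in EXPECTED_NAMES):
            missing.append(protein_dir.name)
    return missing
```

An empty list returned by find_missing_structures("test/data/afmassive/monomer2/") would mean that every protein folder contains a usable structure file.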
AlphaFill can also be launched in combination with the following options:
--launchAlphaFold
--launchAfMassive
In that case, the best 3D predicted protein structure obtained by either AlphaFold or AFMassive will be used.
Visit the AFMassive GitHub repository for more details about the prediction model.
List of AFMassive options:
nextflow run main.nf --afMassiveHelp -profile singularity
Launch the Nextflow pipeline on a monomer using GPU:
nextflow run main.nf -params-file test/params-file/afmassive-monomer.json -profile singularity --useGpu
Define the options in a JSON file, for example:
{
"launchAfMassive": "true",
"afMassiveOptions": "--dropout --dropout_structure_module",
"alphaFoldOptions": "--max_template_date=2024-01-01 --db_preset=full_dbs --random_seed=123456",
"fastaPath": "test/data/fasta/monomer2"
}
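Because tool options that contain '-' or '--' cannot be passed on the Nextflow command line, it can be convenient to generate the JSON file from a script. The following Python sketch shows one possible way to do so; the helper name write_params_file and the output file name used in the usage note are hypothetical, while the parameter values are copied from the monomer example above.

```python
import json
from pathlib import Path


def write_params_file(path, params):
    """Serialize pipeline parameters to a JSON file usable with -params-file."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(params, indent=2) + "\n", encoding="utf-8")
    return out


# Each tool receives its options as one string, so the '--' prefixes
# never appear on the Nextflow command line itself.
afmassive_params = {
    "launchAfMassive": "true",
    "afMassiveOptions": "--dropout --dropout_structure_module",
    "alphaFoldOptions": "--max_template_date=2024-01-01 --db_preset=full_dbs --random_seed=123456",
    "fastaPath": "test/data/fasta/monomer2",
}
```

Calling write_params_file("my-afmassive-monomer.json", afmassive_params) writes a file that can then be passed to the pipeline with -params-file.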
Launch the Nextflow pipeline on a multimer using GPU:
nextflow run main.nf -params-file test/params-file/afmassive-multimer.json -profile singularity --useGpu
Define the options in a JSON file, for example:
{
"launchAfMassive": "true",
"afMassiveOptions": "--dropout --dropout_structure_module",
"alphaFoldOptions": "--max_template_date=2024-01-01 --db_preset=full_dbs --random_seed=123456 --model_preset=multimer",
"fastaPath": "test/data/fasta/multimer/alphafold"
}
Visit the AlphaFold GitHub repository for more details about the prediction model.
List of AlphaFold options:
nextflow run main.nf --alphaFoldHelp -profile singularity
Launch the Nextflow pipeline on a monomer using GPU:
nextflow run main.nf -params-file test/params-file/alphafold-monomer.json -profile singularity --useGpu
Define the options in a JSON file, for example:
{
"launchAlphaFold": "true",
"alphaFoldOptions": "--max_template_date=2024-01-01 --db_preset=full_dbs --random_seed=123456",
"fastaPath": "test/data/fasta/monomer2"
}
Launch the Nextflow pipeline on a multimer using GPU:
nextflow run main.nf -params-file test/params-file/alphafold-multimer.json -profile singularity --useGpu
Define the options in a JSON file, for example:
{
"launchAlphaFold": "true",
"alphaFoldOptions": "--max_template_date=2024-01-01 --db_preset=full_dbs --random_seed=123456 --model_preset=multimer",
"fastaPath": "test/data/fasta/multimer/alphafold"
}
Visit the AlphaFold3 GitHub repository for more details about the prediction model.
List of AlphaFold3 options:
nextflow run main.nf --alphaFold3Help -profile singularity
Launch the Nextflow pipeline on a monomer using GPU:
nextflow run main.nf -params-file test/params-file/alphafold3-monomer.json -profile singularity --useGpu
Define the options in a JSON file, for example:
{
"launchAlphaFold3": "true",
"alphaFold3Options": "--model_dir=/path/to/alphafold3/params",
"fastaPath": "test/data/fasta/monomer2"
}
Note that you have to apply to obtain the AlphaFold3 model parameters, as described on google-deepmind/alphafold3. Therefore, the alphaFold3Options option in the JSON file must provide the path to the model parameters AF3.bin using --model_dir.
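Since --model_dir is mandatory, a params file can be checked for it before a GPU job is submitted. The Python sketch below is a minimal illustration under stated assumptions: the function name extract_model_dir is hypothetical, and the space-separated form (--model_dir PATH) is handled only on the assumption that it may be used in place of the --model_dir=PATH form shown above.

```python
import shlex


def extract_model_dir(alphafold3_options):
    """Return the value given to --model_dir in an options string, or None."""
    tokens = shlex.split(alphafold3_options)
    for index, token in enumerate(tokens):
        # Form used in the example above: --model_dir=/some/path
        if token.startswith("--model_dir="):
            return token.split("=", 1)[1]
        # Assumed alternative form: --model_dir /some/path
        if token == "--model_dir" and index + 1 < len(tokens):
            return tokens[index + 1]
    return None
```

A return value of None indicates that the alphaFold3Options string does not set --model_dir and needs to be completed.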
Visit the ColabFold GitHub repository for more details about the prediction model.
List of ColabFold options:
nextflow run main.nf --colabFoldHelp -profile singularity
Launch the Nextflow pipeline on a monomer using GPU:
nextflow run main.nf -params-file test/params-file/colabfold-monomer.json -profile singularity --useGpu
Define the options in a JSON file, for example:
{
"launchColabFold": "true",
"colabFoldOptions": "--random-seed 654321 --model-type=alphafold2 --amber --use-gpu-relax",
"fastaPath": "test/data/fasta/monomer2"
}
Launch the Nextflow pipeline on a multimer using GPU:
nextflow run main.nf -params-file test/params-file/colabfold-multimer.json -profile singularity --useGpu
Define the options in a JSON file, for example:
{
"launchColabFold": "true",
"colabFoldOptions": "--random-seed 654321 --model-type=alphafold2_multimer_v3 --amber --use-gpu-relax",
"fastaPath": "test/data/fasta/multimer/colabfold"
}
Note that ColabFold expects a particular format for the input fasta file. See
If you only want to generate the msas, launch the pipeline with the option --onlyMsas.
If you want to use existing msas, launch the pipeline with the option --fromMsas. For example, if you have to predict the structure for two fasta files, protein1.fasta and protein2.fasta, you must have a folder tree such as:
msas/
  protein1/
  protein2/
Then, provide the option --fromMsas, for example:
nextflow run main.nf -params-file test/params-file/alphafold-multimer-frommsas.json -profile singularity --useGpu
Define the options in a JSON file, for example:
{
"launchAlphaFold": "true",
"alphaFoldOptions": "--max_template_date=2024-01-01 --db_preset=full_dbs --random_seed=123456 --model_preset=multimer",
"fastaPath": "test/data/fasta/multimer/alphafold",
"fromMsas": "test/data/msas/multimer/alphafold"
}
The option --fromMsas can be used with:
- AFMassive
- AlphaFold
- ColabFold
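Before using --fromMsas, it may help to verify that every fasta file has its own folder of msas. The Python sketch below illustrates such a check; the function name missing_msa_folders is hypothetical, and it assumes that input files carry the .fasta extension and that each msas folder is named after the fasta file without its extension, as in the protein1.fasta / protein1/ example.

```python
from pathlib import Path


def missing_msa_folders(fasta_dir, msas_dir):
    """List fasta base names that have no matching sub-folder in msas_dir."""
    # Assumption: input files end in '.fasta'.
    fasta_names = sorted(p.stem for p in Path(fasta_dir).glob("*.fasta"))
    return [name for name in fasta_names if not (Path(msas_dir) / name).is_dir()]
```

An empty list means that each fasta file found in the fastaPath directory has a corresponding folder in the directory given to --fromMsas.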
Visit the DiffDock GitHub repository for more details about the prediction model.
Launch the nextflow pipeline using GPU:
nextflow run main.nf -profile singularity --useGpu --proteinLigandFile test/data/diffdock/protein-ligand.csv
The protein-ligand.csv is a CSV file which must contain at least the two following columns:
- protein: provides the path to the pdb 3D structure file
- ligand: provides the path to the sdf file
Therefore, each row in this file corresponds to a protein/ligand pair. This file must not contain any space.
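A file with this layout can be produced with the Python csv module. The sketch below is one possible approach, not part of the pipeline: the function name write_protein_ligand_csv and the paths in the usage note are hypothetical. It refuses paths that contain a space, since the file must not contain any.

```python
import csv


def write_protein_ligand_csv(csv_path, pairs):
    """Write (protein_pdb, ligand_sdf) path pairs as a space-free CSV file."""
    for protein, ligand in pairs:
        if " " in protein or " " in ligand:
            raise ValueError(f"paths must not contain spaces: {protein!r}, {ligand!r}")
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        # Plain '\n' line endings keep the file free of carriage returns.
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["protein", "ligand"])
        writer.writerows(pairs)
```

For example, write_protein_ligand_csv("protein-ligand.csv", [("data/prot1.pdb", "data/lig1.sdf")]) writes a header line followed by one protein/ligand pair.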
The options to launch the pipeline can also be defined in a JSON file, for example:
{
"diffDockArgsYamlFile": "assets/diffdock_default_inference_args.yaml",
"launchDiffDock": "true",
"proteinLigandFile": "test/data/diffdock/protein-ligand.csv"
}
Launch the pipeline with the -params-file option to take into account the JSON file:
nextflow run main.nf -params-file test/params-file/diffdock.json -profile singularity --useGpu
The assets/diffdock_default_inference_args.yaml file makes it possible to tune the DiffDock options.
Visit the DynamicBind GitHub repository for more details about the prediction model.
List of DynamicBind options:
nextflow run main.nf --dynamicBindHelp -profile singularity
Launch the nextflow pipeline using GPU:
nextflow run main.nf -profile singularity --useGpu --proteinLigandFile test/data/dynamicbind/protein-ligand.csv
The protein-ligand.csv is a CSV file which must contain at least the two following columns:
- protein: provides the path to the pdb 3D structure file
- ligand: provides the path to the sdf file
Therefore, each row in this file corresponds to a protein/ligand pair. This file must not contain any space.
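An existing protein-ligand.csv can be checked against these rules before launching a docking run. The Python sketch below is only an illustration: the function name check_protein_ligand_csv is hypothetical, and the test on the lowercase .pdb and .sdf file extensions is an assumption made here rather than a documented requirement of the pipeline.

```python
import csv


def check_protein_ligand_csv(csv_path):
    """Return a list of problems found in the CSV; an empty list means it passed."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        text = handle.read()
    problems = []
    if " " in text:
        problems.append("file contains a space character")
    reader = csv.DictReader(text.splitlines())
    columns = reader.fieldnames or []
    missing = [name for name in ("protein", "ligand") if name not in columns]
    if missing:
        return problems + [f"missing column: {name}" for name in missing]
    for row in reader:
        # reader.line_num is the number of the line that was just read.
        if not (row["protein"] or "").endswith(".pdb"):
            problems.append(f"line {reader.line_num}: protein is not a .pdb path")
        if not (row["ligand"] or "").endswith(".sdf"):
            problems.append(f"line {reader.line_num}: ligand is not a .sdf path")
    return problems
```

The function reports all problems at once, so a faulty file can be corrected in a single pass.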
Two options are available to generate the HTML report from existing predictions:
nextflow run main.nf -profile singularity --fromPredictions test/data/afmassive/multimer --htmlProteinStruct --fastaPath test/data/fasta/multimer/alphafold
nextflow run main.nf -profile singularity --fromPredictions test/data/afmassive/multimer --htmlMetricsMultimer --fastaPath test/data/fasta/multimer/alphafold