Merge pull request #30 from ypriverol/dev

docs
bigbio · Nov 25, 2021 · e54e41f · e54e41f
2 parents 8f65b2a + c8765ab
commit e54e41f

Showing 2 changed files with 31 additions and 9 deletions.
diff --git a/docs/identification.rst b/docs/identification.rst
@@ -20,33 +20,50 @@ However, most of the computational proteomics tools are designed as single-tiere
 - false positive control
 - creation of reports
 
-quantms identification workflow
----------------------
-
 .. image:: images/id-dda-pipeline.png
    :width: 350
 
 Mass spectra processing: Raw conversion
-~~~~~~~~~~~~~~~~~~~~~~
+---------------------------------------
 
 The RAW data (files from the instrument) can be provided to the quantms pipeline in two different formats: (i) RAW files - instrument files; (ii) mzML files (HUPO-PSI standard file format). quantms uses the `thermorawfileparser <https://github.com/compomics/ThermoRawFileParser>`_ to convert the input RAW files to mzML, and all the following steps are built on top of the standard mzML.
 
 .. important:: Automatic RAW file conversion is only supported from Thermo Scientific.
 
-Additionally to file conversion, the Raw conversion step allows the users to perform an extra peak-picking step ```openmspeakpicker true``` for those datasets/projects where peaks can be extracted using the Thermo RAW API. Read more about the OpenMS peak picker algorithm `here <https://abibuilder.informatik.uni-tuebingen.de/archive/openms/Documentation/nightly/html/TOPP_PeakPickerWavelet.html>`_ .
+In addition to file conversion, the Raw conversion step allows users to perform an extra peak-picking step ``openmspeakpicker true`` for those datasets/projects where peaks can be extracted using the Thermo RAW API. Read more about the OpenMS peak picker algorithm `here <https://abibuilder.informatik.uni-tuebingen.de/archive/openms/Documentation/nightly/html/TOPP_PeakPickerWavelet.html>`_.
 
 Target/Decoy database generation
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+----------------------------------------
+
+Target/Decoy is the most common approach to control the number of false positive peptides and proteins identified by the corresponding workflow [ref 3]. The user can provide the protein FASTA database with the decoys already attached or generate the database within the pipeline by using the following option: ``add_decoys``.
+
+.. hint:: Additionally, the user can define the prefix for the decoy proteins (e.g. DECOY_) by using the parameter ``decoy_string``. We STRONGLY recommend using the DECOY_ prefix for all the decoy proteins for better compatibility with existing tools such as :doc:`pquant` or :doc:`pmultiqc`.
+
+Peptide Identification
+------------------------------------
+
+The peptide identification step in the quantms pipeline can be performed with two different open-source tools, used **independently** or **combined**: `Comet <http://comet-ms.sourceforge.net/>`_ and `MS-GF+ <https://github.com/MSGFPlus/msgfplus>`_. The parameters for the search engine (Comet or MS-GF+) are read from the SDRF input parameters, including the post-translational modifications (annotated with UNIMOD accessions), precursor and fragment ion mass tolerances, etc. The only parameter that MUST be provided on the command line to the quantms workflow is the PSM and peptide FDR threshold ``psm_pep_fdr_cutoff`` (default value ``0.01``).
 
-Target/Decoy is the most common approach to control the number of false positive peptides and proteins identified by the corresponding workflow [ref 3]. The user can provide the protein FSATA database with the decoys already attached or generate the database within the pipeline by using the following option: ```add_decoys```.
+.. note:: Combining multiple database search engines has been shown to identify around **15% more peptides** than using only one search engine. However, be aware that adding another search engine increases the CPU computing time. See :doc:`identification-benchmarks`.
 
-.. hint:: Additionally, the user can define the prefix for the decoy proteins  (e.g. DECOY_) by using the parameter ```decoy_string```. We STRONGLY recommend to use DECOY_ prefix for all the decoy proteins for better compatibility with exiting tools such as :doc:`pquant` or :doc:`pmultiqc`
+Percolator: Boosting peptide identifications
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+`Percolator <https://github.com/percolator/percolator>`_ uses semi-supervised machine learning to discriminate correct from incorrect peptide-spectrum matches. Percolator uses different properties of the peptide identifications, such as retention time, number of missed cleavages, and peptide identification score, to train an SVM model that separates the true positive identifications from false positives more accurately.
+
+FDR filtering and ConsensusID
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+FDR filtering at the peptide spectrum match (PSM) level can be applied to each set of peptide results. To filter the peptides, the tool first computes the posterior error probability (PEP) and then filters using the provided threshold. The PEP score is the probability that a given PSM is incorrect. In other words, the lower the PEP, the more confidence you can have that the given peptide identification is correct.
+
+When multiple search engines are used (``search_engines msgf,comet``), the results for each RAW file are combined into one single identification file that includes the results of both search engines. The `ConsensusID tool <https://abibuilder.informatik.uni-tuebingen.de/archive/openms/Documentation/nightly/html/TOPP_ConsensusID.html>`_ is used to combine the results from different search engines.
 
 References
 ---------------------
 
 [1] Perez-Riverol Y, Wang R, Hermjakob H, Müller M, Vesada V, Vizcaíno JA. Open source libraries and frameworks for mass spectrometry based proteomics: a developer's perspective. Biochim Biophys Acta. 2014 Jan;1844(1 Pt A):63-76. doi: 10.1016/j.bbapap.2013.02.032. Epub 2013 Mar 1. PMID: 23467006; PMCID: PMC3898926.
+
 [2] Perez-Riverol Y, Moreno P. Scalable Data Analysis in Proteomics and Metabolomics Using BioContainers and Workflows Engines. Proteomics. 2020 May;20(9):e1900147. doi: 10.1002/pmic.201900147. Epub 2019 Dec 18. PMID: 31657527.
+
 [3] Elias JE, Gygi SP. Target-decoy search strategy for mass spectrometry-based proteomics. Methods Mol Biol. 2010;604:55-71. doi: 10.1007/978-1-60761-444-9_5. PMID: 20013364; PMCID: PMC2922680.
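
Note on the "FDR filtering and ConsensusID" paragraph added to docs/identification.rst above: the PEP-based filter it describes can be illustrated with a minimal Python sketch. This is not the quantms or OpenMS implementation. The ``PSM`` record and the ``filter_by_pep`` helper are hypothetical names introduced only for illustration, and treating the cutoff as inclusive is an assumption. The default of ``0.01`` mirrors the documented default of ``psm_pep_fdr_cutoff``.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PSM:
    """Hypothetical record for one peptide-spectrum match (illustration only)."""
    spectrum_id: str
    peptide: str
    pep: float  # posterior error probability: chance that this match is wrong


def filter_by_pep(psms: List[PSM], cutoff: float = 0.01) -> List[PSM]:
    """Keep PSMs whose PEP does not exceed the cutoff (lower PEP = more confident).

    Assumption: the comparison is inclusive; the real tools may differ.
    """
    return [psm for psm in psms if psm.pep <= cutoff]


# Made-up example values, not real search results.
psms = [
    PSM("scan=101", "PEPTIDEK", 0.001),
    PSM("scan=102", "ELVISLIVESK", 0.20),
    PSM("scan=103", "SAMPLER", 0.01),
]
kept = filter_by_pep(psms)
print([psm.peptide for psm in kept])
```

The sketch makes the direction of the score explicit: a PSM with a PEP of 0.20 has a 20% chance of being wrong and is discarded, while PSMs at or below the cutoff are retained.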
 
diff --git a/docs/index.rst b/docs/index.rst
@@ -15,8 +15,13 @@ Contents
 
    identification
    .. toctree::
-   :maxdepth: 2 
+   :maxdepth: 2
+
+Use the following links to get support and help from the quantms maintainers:
 
+[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23quantms-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/quantms)
+[![Report Issue](https://img.shields.io/github/issues/bigbio/quantms)](https://github.com/bigbio/quantms/issues)
+[![Get help on GitHub Forum](https://img.shields.io/badge/Github-Discussions-green)](https://github.com/bigbio/quantms/discussions)