Improve the docs for specified terms MT filter #378

Merged (1 commit, Dec 10, 2024)
2 changes: 1 addition & 1 deletion docs/tutorial.rst
@@ -309,7 +309,7 @@ For general use, we recommend using a combination
of a *phenotype MT filter* (:class:`~gpsea.analysis.mtc_filter.PhenotypeMtcFilter`) with a *multiple testing correction*.
The phenotype MT filter chooses the HPO terms to test according to several heuristics, which
reduce the multiple testing burden and focus the analysis
on the most interesting terms (see :ref:`HPO MT filter <hpo-mtc-filter-strategy>` for more info).
on the most interesting terms (see :ref:`HPO MT filter <hpo-mt-filter>` for more info).
Then the multiple testing correction, such as Bonferroni or Benjamini-Hochberg,
is used to control the family-wise error rate or the false discovery rate, respectively.
See :ref:`mtc` for more information.
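To make the difference between the two corrections concrete, here is a minimal pure-Python sketch of the Bonferroni and Benjamini-Hochberg adjustments. It is illustrative only and is not GPSEA's implementation; the function names ``bonferroni`` and ``benjamini_hochberg`` are hypothetical.

```python
def bonferroni(pvals):
    """Bonferroni adjustment (controls the family-wise error rate)."""
    m = len(pvals)
    # Multiply each p value by the number of tests, capping the result at 1.
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjustment (controls the false discovery rate)."""
    m = len(pvals)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down to the smallest, keeping a running
    # minimum so that the adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print([round(p, 4) for p in bonferroni(pvals)])          # [0.04, 0.16, 0.12, 0.02]
print([round(p, 4) for p in benjamini_hochberg(pvals)])  # [0.02, 0.04, 0.04, 0.02]
```

With the same four p values, Benjamini-Hochberg yields smaller adjusted values than Bonferroni, which is why it tends to retain more associations. For real analyses, a tested library routine such as ``statsmodels.stats.multitest.multipletests`` is preferable to a hand-written version.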
44 changes: 28 additions & 16 deletions docs/user-guide/analyses/mtc.rst
@@ -1,8 +1,8 @@
.. _mtc:

===========================
###########################
Multiple-testing correction
===========================
###########################

**********
Background
@@ -38,6 +38,7 @@ it is likely that we will obtain one or more false-positive results.
GPSEA offers two approaches to mitigate this problem: multiple-testing correction (MTC) procedures
and MT filters to choose the terms to be tested.


.. _mtc-correction-procedures:

Multiple-testing correction procedures
@@ -118,27 +119,38 @@ may "survive" the multiple-testing correction.

In the context of GPSEA, we represent the concept of phenotype filtering
by :class:`~gpsea.analysis.mtc_filter.PhenotypeMtcFilter`.
We provide three filtering strategies.
We provide three filtering strategies, each of which is a subclass
of :class:`~gpsea.analysis.mtc_filter.PhenotypeMtcFilter`
and can, therefore, be used
as a component of :class:`~gpsea.analysis.pcats.HpoTermAnalysis`,
as shown in :ref:`custom-hpo-analysis`.

There are three phenotype MT filters:

* Use all terms
* Specified terms
* HPO MT filter

.. _use-all-terms-strategy:

Test all terms
--------------
.. _use-all-terms-mt-filter:

The first MT filtering strategy is the simplest - do not apply any filtering at all.
This will result in testing all terms and we do not recommend this strategy,
but it can be used to disable MT filtering.
Use all terms
-------------

The first MT filtering strategy is the simplest: it does not apply any filtering,
so all terms are tested.
We do not recommend this strategy, but it can be used to disable MT filtering.

The strategy is implemented in :class:`~gpsea.analysis.mtc_filter.UseAllTermsMtcFilter`.

>>> from gpsea.analysis.mtc_filter import UseAllTermsMtcFilter
>>> use_all = UseAllTermsMtcFilter()

.. _specify-terms-strategy:

Specify terms strategy
----------------------
.. _specified-terms-mt-filter:

Specified terms
---------------

If you have a specific hypothesis as to which terms may differ between groups,
you can specify these terms in :class:`~gpsea.analysis.mtc_filter.SpecifiedTermsMtcFilter`.
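The idea behind the specified terms filter can be sketched in plain Python: every candidate term that is not on the user's list is skipped, so only the listed terms count toward the multiple testing burden. This is a minimal sketch, not the GPSEA class; the function ``split_by_specified_terms`` and the HPO term IDs are hypothetical, chosen only for illustration.

```python
def split_by_specified_terms(candidate_terms, specified_terms):
    """Hypothetical helper: keep only the specified terms, report the rest as skipped."""
    specified = set(specified_terms)
    tested = []
    skipped = {}
    for term_id in candidate_terms:
        if term_id in specified:
            tested.append(term_id)
        else:
            # Same reason text as SpecifiedTermsMtcFilter.NON_SPECIFIED_TERM.
            skipped[term_id] = "Non-specified term"
    return tested, skipped


# Illustrative HPO term IDs, not taken from the GPSEA docs.
candidates = ["HP:0001250", "HP:0001263", "HP:0000252", "HP:0001631"]
tested, skipped = split_by_specified_terms(candidates, ("HP:0001250", "HP:0001631"))
print(tested)        # ['HP:0001250', 'HP:0001631']
print(len(skipped))  # 2
```

Using a ``set`` for the specified terms keeps each membership check constant-time, which matters little for two terms but helps if the list is long.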
@@ -159,12 +171,12 @@ we pass an iterable (e.g. a tuple) with these two terms as an argument:
2


.. _hpo-mtc-filter-strategy:
.. _hpo-mt-filter:

HPO MT filter strategy
-----------------------
HPO MT filter
-------------

The HPO MT strategy involves making several domain judgments and takes advantage of the HPO structure.
The HPO MT filter involves making several domain judgments and takes advantage of the HPO structure.
The strategy needs access to HPO:

>>> import hpotk
31 changes: 26 additions & 5 deletions docs/user-guide/analyses/phenotype-groups.rst
@@ -179,6 +179,9 @@ The function finds 369 HPO terms that annotate at least one individual,
including the *indirect* annotations whose presence is implied by the :ref:`true-path-rule`.


.. _phenotype-groups-statistical-analysis:


Statistical analysis
--------------------

@@ -201,6 +204,7 @@ The available MTC procedures are listed in the :ref:`mtc-correction-procedures`

We must pick one of these to perform genotype-phenotype analysis.

.. _default-hpo-analysis:

Default analysis
^^^^^^^^^^^^^^^^
@@ -212,19 +216,26 @@ The default analysis can be configured with :func:`~gpsea.analysis.pcats.configu
>>> from gpsea.analysis.pcats import configure_hpo_term_analysis
>>> analysis = configure_hpo_term_analysis(hpo)

At this point, the ``analysis`` is configured to test
a cohort for G/P associations.


.. _custom-hpo-analysis:

Custom analysis
^^^^^^^^^^^^^^^

If the defaults do not work, we can configure the analysis manually.
If the default choice of the phenotype MT filter and the multiple testing correction does not suit our needs,
we can configure the analysis manually.

First, we choose a phenotype MT filter (e.g. :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`):

>>> from gpsea.analysis.mtc_filter import HpoMtcFilter
>>> mtc_filter = HpoMtcFilter.default_filter(hpo, term_frequency_threshold=.2)

.. note::

See the :ref:`mtc-filters` section for more info on the available MT filters.
See the :ref:`mtc-filters` section for info regarding other phenotype MT filters.

Then, we choose a statistical test (e.g. Fisher's exact test):

@@ -242,6 +253,10 @@ and we finalize the setup by choosing a MTC procedure
>>> mtc_correction = 'fdr_bh'
>>> mtc_alpha = 0.05

.. note::

See the :ref:`mtc-correction-procedures` section for a list of available MTC procedure codes.

The final :class:`~gpsea.analysis.pcats.HpoTermAnalysis` is created as:

>>> from gpsea.analysis.pcats import HpoTermAnalysis
@@ -252,6 +267,8 @@ The final :class:`~gpsea.analysis.pcats.HpoTermAnalysis` is created as:
... mtc_alpha=0.05,
... )

The ``analysis`` is identical to the one configured in the :ref:`default-hpo-analysis` section.


Analysis
========
@@ -269,8 +286,10 @@ We can now test associations between the genotype groups and the HPO terms:
24


We tested the ``cohort`` for association between the genotype groups (``gt_predicate``)
and HPO terms (``pheno_predicates``).
Thanks to the phenotype MT filter, we only tested 24 out of 369 terms.
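A back-of-the-envelope calculation shows why testing 24 rather than 369 terms matters. The analysis above uses Benjamini-Hochberg, but the effect is easiest to see with the Bonferroni per-test threshold, so this sketch assumes Bonferroni purely for illustration; ``per_test_threshold`` is a hypothetical helper, not part of GPSEA.

```python
def per_test_threshold(alpha, n_tests):
    """Bonferroni per-test significance threshold: alpha divided by the number of tests."""
    return alpha / n_tests


alpha = 0.05
all_terms = per_test_threshold(alpha, 369)  # test every annotated term
filtered = per_test_threshold(alpha, 24)    # test only the terms kept by the MT filter
print(f"{all_terms:.2e} vs {filtered:.2e}")  # 1.36e-04 vs 2.08e-03
print(round(filtered / all_terms, 3))        # 15.375
```

Under this assumption, each remaining test may reach significance with a p value about 15 times larger than it could without filtering.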
We can learn more by showing the MT filter report:
The MT filter report shows the filtering details:

>>> from gpsea.view import MtcStatsViewer
>>> mtc_viewer = MtcStatsViewer()
@@ -289,8 +308,10 @@ We can learn more by showing the MT filter report:
Genotype phenotype associations
===============================

Last, let's explore the associations. The results include a table with all tested HPO terms
ordered by the corrected p value (Benjamini-Hochberg FDR):
Last, let's explore the associations.

GPSEA displays the associations between genotypes and HPO terms in a table,
one HPO term per row. The rows are sorted by the corrected p value, with ties broken by the nominal p value.

>>> from gpsea.view import summarize_hpo_analysis
>>> summary_df = summarize_hpo_analysis(hpo, result)
6 changes: 3 additions & 3 deletions src/gpsea/analysis/mtc_filter/_impl.py
@@ -157,7 +157,7 @@ class UseAllTermsMtcFilter(PhenotypeMtcFilter[typing.Any]):
"""
`UseAllTermsMtcFilter` filters out *no* phenotype terms.

See :ref:`use-all-terms-strategy` section for more info.
See :ref:`use-all-terms-mt-filter` section for more info.
"""

def filter(
@@ -186,7 +186,7 @@ class SpecifiedTermsMtcFilter(PhenotypeMtcFilter[hpotk.TermId]):
terms to the constructor of this class, thereby preventing other terms from
being tested and reducing the multiple testing burden.

See :ref:`specify-terms-strategy` section for more info.
See :ref:`specified-terms-mt-filter` section for more info.
"""

NON_SPECIFIED_TERM = PhenotypeMtcResult.fail(code="ST1", reason="Non-specified term")
@@ -247,7 +247,7 @@ class HpoMtcFilter(PhenotypeMtcFilter[hpotk.TermId]):
`HpoMtcFilter` decides which phenotypes should be tested and which phenotypes are not worth testing.

The class leverages a number of heuristics and domain decisions.
See :ref:`hpo-mtc-filter-strategy` section for more info.
See :ref:`hpo-mt-filter` section for more info.

We recommend creating an instance using the :func:`default_filter` static factory method.
"""