[ENH] support for survival/time-to-event prediction, statsmodels Cox PH model #157

Merged (49 commits) on Jan 30, 2024

@fkiraly (Collaborator) commented Dec 27, 2023

This PR implements framework support for survival (also known as time-to-event or failure time) prediction, adds tests, and adds an interface to the statsmodels Cox proportional hazards model as a test case.

Depends on #155 and #159 which should be merged first.

Design

Survival prediction models use the current BaseProbaRegressor base class, whose fit is extended to take a third argument, C: a dataframe-like censoring indicator.
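
For illustration, here is a minimal sketch of such a censoring indicator in plain pandas/NumPy. It assumes the convention implied by the ConditionUncensored description further down this thread (0 = uncensored), so that 1 marks a right-censored observation. statsmodels' PHReg expects the opposite encoding in its status argument (1 = event observed), so an adapter would presumably have to flip the indicator. That is an assumption about the approach, not a reading of the CoxPH source, and the helper censoring_to_status is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy survival data: y holds observed times, C the censoring indicator.
# Assumed convention: C == 1 means right-censored, C == 0 means the event was observed.
y = pd.DataFrame({"time": [5.0, 8.0, 3.0, 12.0]})
C = pd.DataFrame({"censored": [0, 1, 0, 1]})


def censoring_to_status(C):
    """Hypothetical helper: convert a censoring indicator into an event indicator.

    statsmodels' PHReg takes ``status`` with 1 = event observed and
    0 = right-censored, i.e. the reverse of C.
    """
    c = np.asarray(C).ravel().astype(int)
    return 1 - c


status = censoring_to_status(C)
print(status)  # [1 0 1 0]
```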

Regressors capable of making use of the third argument C are identified by the capability:survival tag being True. Regressors without this tag also accept C but ignore it, which corresponds to the "ignore censoring" reduction strategy.

This way, all existing regressors can be used for survival prediction and vice versa.
The interface is also fully backward compatible: for users, because C defaults to None; and for extenders, because estimators without the tag need not expect C in fit, since in that case only X_inner and y_inner are passed to _fit.
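
The dispatch described above can be sketched as follows. This is not the skpro implementation: ToyBaseProbaRegressor, its get_tag method, PlainRegressor and SurvivalRegressor are hypothetical stand-ins that only show how fit could forward C to _fit when capability:survival is True and drop it otherwise.

```python
import pandas as pd


class ToyBaseProbaRegressor:
    """Hypothetical stand-in for a probabilistic regressor base class.

    Only the C-handling logic described in the PR is sketched here.
    """

    _tags = {"capability:survival": False}

    def get_tag(self, name):
        return self._tags.get(name, False)

    def fit(self, X, y, C=None):
        # C defaults to None, so existing calls fit(X, y) keep working.
        if self.get_tag("capability:survival"):
            # survival-capable estimators receive the censoring information
            self._fit(X, y, C=C)
        else:
            # all other regressors never see C: the "ignore censoring" reduction
            self._fit(X, y)
        return self


class PlainRegressor(ToyBaseProbaRegressor):
    def _fit(self, X, y):
        self.n_obs_ = len(y)


class SurvivalRegressor(ToyBaseProbaRegressor):
    _tags = {"capability:survival": True}

    def _fit(self, X, y, C=None):
        # count censored rows, assuming C == 1 marks censoring
        self.n_censored_ = 0 if C is None else int(C.to_numpy().sum())


X = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
y = pd.DataFrame({"t": [4.0, 5.0, 6.0]})
C = pd.DataFrame({"t": [0, 1, 1]})

plain = PlainRegressor().fit(X, y, C)      # C accepted, then ignored
surv = SurvivalRegressor().fit(X, y, C)    # C forwarded to _fit
surv_no_c = SurvivalRegressor().fit(X, y)  # C defaults to None
```

Because the decision sits in the public fit, extenders of non-survival models never have to add a C parameter to their _fit.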

As the predict and predict_proba interfaces remain unchanged, metrics need no adaptation and work directly.

To avoid cluttering the docs for users who are primarily interested in probabilistic regression without censoring, models with the capability:survival tag have a more detailed fit docstring. The difference is mediated by a base class, BaseSurvReg, which is identical to BaseProbaRegressor apart from docstring overrides.
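
A minimal sketch of a docstring-only override, with hypothetical class names: the subclass redefines fit solely to attach a longer docstring and delegates straight to the parent, so behaviour is unchanged. The docstring text, including the 0/1 meaning of C, is illustrative rather than copied from skpro.

```python
class BaseProbaRegressorSketch:
    """Hypothetical stand-in for the probabilistic regression base class."""

    def fit(self, X, y, C=None):
        """Fit regressor to training data.

        Parameters
        ----------
        X : pandas DataFrame
        y : pandas DataFrame
        """
        # record the arguments so the delegation can be checked
        self._fit_args = (X, y, C)
        return self


class BaseSurvRegSketch(BaseProbaRegressorSketch):
    """Same behaviour as the parent class; only the fit docstring differs."""

    def fit(self, X, y, C=None):
        """Fit survival regressor to training data.

        Parameters
        ----------
        X : pandas DataFrame
        y : pandas DataFrame
        C : pandas DataFrame, optional (default=None)
            censoring indicator, 0 = uncensored (observed), 1 = censored
        """
        return super().fit(X=X, y=y, C=C)
```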

Testing

As time-to-event models inherit from BaseProbaRegressor, the existing TestAllRegressors suite runs on all survival prediction models.

A scenario with a non-trivial C is added.

Since regressors and time-to-event models have an interchangeable interface (see above), both are tested with a non-trivial C, with C=None, and with no C passed at all.
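
The three call patterns above could be exercised with a small scenario table, sketched here with a toy regressor that ignores C. FIT_SCENARIOS, MeanRegressor and run_fit_scenarios are hypothetical names; they do not reproduce skpro's actual test scenarios.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"x": np.arange(6, dtype=float)})
y = pd.DataFrame({"t": np.arange(1, 7, dtype=float)})
C = pd.DataFrame({"t": [0, 1, 0, 0, 1, 0]})

# Hypothetical scenarios mirroring the three call patterns described above.
FIT_SCENARIOS = {
    "nontrivial_C": {"X": X, "y": y, "C": C},
    "C_is_None": {"X": X, "y": y, "C": None},
    "no_C_passed": {"X": X, "y": y},
}


class MeanRegressor:
    """Toy regressor that ignores C, like an estimator without the survival tag."""

    def fit(self, X, y, C=None):
        self.mean_ = float(y.to_numpy().mean())
        return self

    def predict(self, X):
        return pd.DataFrame({"t": [self.mean_] * len(X)}, index=X.index)


def run_fit_scenarios(estimator):
    """Fit and predict under every scenario; return the prediction shapes."""
    shapes = {}
    for name, kwargs in FIT_SCENARIOS.items():
        fitted = estimator.fit(**kwargs)
        shapes[name] = fitted.predict(kwargs["X"]).shape
    return shapes


shapes = run_fit_scenarios(MeanRegressor())
```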

Further contents

  • an interface to statsmodels proportional hazards models, skpro.survival.coxph.CoxPH, to showcase and test the interface
  • Pipeline is updated to accommodate survival models; for this, the tag needs to be carried over and C passed through in _fit
  • update to the API reference - new page for survival prediction
  • survival prediction extension template
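
A rough sketch of the pipeline change, assuming a simplified pipeline of X-transformers followed by one regressor. PipelineSketch, StandardScalerSketch and CensoringCounter are hypothetical; the point is that the pipeline copies the capability:survival tag from its final regressor and passes C through in _fit only when that tag is True.

```python
import pandas as pd


class StandardScalerSketch:
    """Tiny transformer: centres and scales each column of X."""

    def fit(self, X):
        self.mean_ = X.mean()
        self.std_ = X.std(ddof=0).replace(0, 1.0)
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_


class CensoringCounter:
    """Toy survival-capable regressor that records how many rows are censored."""

    tags = {"capability:survival": True}

    def fit(self, X, y, C=None):
        self.n_censored_ = 0 if C is None else int(C.to_numpy().sum())
        return self


class PipelineSketch:
    """Hypothetical pipeline: transformers on X, then a final regressor."""

    def __init__(self, steps, regressor):
        self.steps = steps
        self.regressor = regressor
        # carry the survival capability tag over from the final regressor
        self.tags = {
            "capability:survival": getattr(regressor, "tags", {}).get(
                "capability:survival", False
            )
        }

    def fit(self, X, y, C=None):
        return self._fit(X, y, C=C)

    def _fit(self, X, y, C=None):
        Xt = X
        for step in self.steps:
            Xt = step.fit(Xt).transform(Xt)
        # pass C through only if the final regressor can use it
        if self.tags["capability:survival"]:
            self.regressor.fit(Xt, y, C=C)
        else:
            self.regressor.fit(Xt, y)
        return self


X = pd.DataFrame({"age": [50.0, 60.0, 70.0, 80.0]})
y = pd.DataFrame({"t": [10.0, 7.0, 5.0, 2.0]})
C = pd.DataFrame({"t": [0, 1, 0, 0]})

pipe = PipelineSketch([StandardScalerSketch()], CensoringCounter()).fit(X, y, C)
```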

@fkiraly added the enhancement, implementing framework, and module:survival&time-to-event labels on Dec 27, 2023

codecov bot commented Dec 27, 2023

Codecov Report

Attention: 73 lines in your changes are missing coverage. Please review.

Comparison is base (86cc53b) 64.82% compared to head (285eff3) 64.24%.

❗ Current head 285eff3 differs from pull request most recent head 590370a. Consider uploading reports for the commit 590370a to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| skpro/survival/coxph.py | 20.40% | 39 Missing ⚠️ |
| skpro/regression/base/_base.py | 48.97% | 17 Missing and 8 partials ⚠️ |
| skpro/regression/compose/_pipeline.py | 27.27% | 8 Missing ⚠️ |
| skpro/regression/base/_delegate.py | 50.00% | 1 Missing ⚠️ |

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #157      +/-   ##
==========================================
- Coverage   64.82%   64.24%   -0.58%
==========================================
  Files         110      111       +1
  Lines        5705     5800      +95
  Branches     1069     1084      +15
==========================================
+ Hits         3698     3726      +28
- Misses       1722     1782      +60
- Partials      285      292       +7
```


@fkiraly merged commit 59e80ea into main on Jan 30, 2024 (36 checks passed).
fkiraly added a commit that referenced this pull request Jan 31, 2024
…luate` and tuners, extend evaluate and tuners to survival predictors (#160)

This PR makes the following changes:

* introduces the `sktime` abstract parallelization backend to `skpro`. In the future, this should be moved to `scikit-base`.
* refactors `evaluate` to use the parallelization backend
* refactors tuners to use the parallelization backend
* extends `evaluate` to be compatible with survival predictors
* extends tuners to be compatible with survival predictors

Depends on #157 for the survival prediction functionality.

Credit to @hazrulakmal, since significant parts are copied from `sktime`'s `evaluate`, code written or improved by @hazrulakmal.
fkiraly added a commit that referenced this pull request Jan 31, 2024
…tic regression (#161)

This PR adds two survival prediction compositors, which take a probabilistic supervised regressor (possibly a survival-capable one) and turn it into a survival predictor, i.e., reducers from survival prediction to probabilistic supervised regression.

The two compositors added are common simple baselines in survival regression:
* `FitUncensored` - subsets to the uncensored data and fits on that subsample
* `ConditionUncensored` - adds `C` as a column in `fit`, and fills that column with 0 (uncensored) in `predict`-like methods
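
The two baselines can be sketched as thin wrappers around any regressor with fit(X, y) and predict(X). These classes are hypothetical simplifications, not the skpro FitUncensored and ConditionUncensored code: for brevity they return point predictions rather than distributions, use a toy least-squares regressor, and name the added feature column _C. C == 1 is taken to mean censored, consistent with 0 = uncensored above.

```python
import numpy as np
import pandas as pd


class LeastSquares:
    """Toy regressor: ordinary least squares with an intercept."""

    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
        b = y.to_numpy(dtype=float).ravel()
        self.coef_, *_ = np.linalg.lstsq(A, b, rcond=None)
        return self

    def predict(self, X):
        A = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
        return A @ self.coef_


class FitUncensoredSketch:
    """Drop censored rows (C == 1), then fit the wrapped regressor."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, C=None):
        if C is None:
            keep = np.ones(len(X), dtype=bool)
        else:
            keep = C.to_numpy().ravel() == 0
        self.estimator.fit(X[keep], y[keep])
        return self

    def predict(self, X):
        return self.estimator.predict(X)


class ConditionUncensoredSketch:
    """Add C as a feature in fit; set it to 0 (uncensored) when predicting."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, C=None):
        c = np.zeros(len(X)) if C is None else C.to_numpy().ravel()
        self.estimator.fit(X.assign(_C=c), y)
        return self

    def predict(self, X):
        return self.estimator.predict(X.assign(_C=0))


X = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
y = pd.DataFrame({"t": [2.0, 4.0, 6.0, 3.0, 10.0]})
C = pd.DataFrame({"t": [0, 0, 0, 1, 0]})
X_new = pd.DataFrame({"x": [4.0]})

pred_fu = FitUncensoredSketch(LeastSquares()).fit(X, y, C).predict(X_new)
pred_cu = ConditionUncensoredSketch(LeastSquares()).fit(X, y, C).predict(X_new)
```

On this toy data the uncensored rows lie exactly on t = 2x, so both wrappers predict 8.0 at x = 4; ConditionUncensored gets there by giving the censoring column its own coefficient instead of discarding the censored row.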

Depends on #157