[ENH] design - dealing with incomplete distributions such as predictive survival function estimates #249

fkiraly · 2024-04-17T12:33:54Z

Design and discussion issue on how to deal with the following:

Some algorithms and packages produce distributional predictions that are incomplete, in the sense that they specify the predictive distribution almost, but not entirely.

This is in tension with the predict_proba interface which states that it returns a full distribution (full as in, fully specified).

Examples of such returns are Kaplan-Meier or conditional survival function (= one minus cdf) estimates, where function evaluations are available only at some points of the prediction range, rather than over the entire range.

A concrete example output - given by both the scikit-survival and lifelines packages - is a 2D numpy array, with one index corresponding to instances in the test/inference set, and the other index corresponding to time points at which the survival function is evaluated. Entries are the predicted survival probabilities for the given instance.
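For illustration, a minimal sketch of the shape of such an output. The numbers are synthetic and not produced by either package, and the names `times` and `surv` are hypothetical.

```python
import numpy as np

# Hypothetical time grid, e.g. event times seen in the training data (assumption).
times = np.array([1.0, 2.0, 5.0, 8.0])

# Synthetic survival predictions: rows = test instances, columns = time points.
surv = np.array([
    [0.9, 0.7, 0.4, 0.1],  # ordinary case
    [1.0, 1.0, 1.0, 1.0],  # predicted to survive past the last time point
    [0.0, 0.0, 0.0, 0.0],  # predicted to die before the first time point
])

# One column per time point; each row is non-increasing in time.
n_instances, n_times = surv.shape
is_non_increasing = bool(np.all(np.diff(surv, axis=1) <= 0))
```

Nothing in the array says what happens before `times[0]` or after `times[-1]`, which is where the incompleteness comes from.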

Even if we make the approximating assumption that the predicted distribution is supported only at the time points observed in the training data (i.e., it is a weighted sum of delta distributions), there are boundary effects which prevent a bijective mapping onto fully specified probability distributions.

For instance, consider predictions where survival is estimated as constant zero, or constant one - here, the survival model makes the reasonable prediction that the instance dies before the first, or survives until after the last, time point in the training data.
Similar boundary effects occur when attempting to map onto an empirical distribution.

These effects are not severe if the first and last survival probabilities are close to one and zero, respectively, but they become more impactful the further the predictions depart from this.
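The boundary effect can be made concrete with a small sketch. It assumes the survival values are S(t) = P(T > t) on an increasing time grid and splits the total probability into mass between grid points and mass that the array does not locate. The function name `split_mass` and the input values are hypothetical; this is not the API of any package.

```python
import numpy as np


def split_mass(surv):
    """Split survival values into grid masses and boundary masses.

    surv : 2D array, rows = instances, columns = increasing time points,
        entries S(t_j) = P(T > t_j).

    Returns (left, inner, right), where
    left[i]     = 1 - S(t_1): mass at or before the first grid point,
    inner[i, j] = S(t_j) - S(t_{j+1}): mass between consecutive grid points,
    right[i]    = S(t_k): mass strictly after the last grid point.
    """
    surv = np.asarray(surv, dtype=float)
    left = 1.0 - surv[:, 0]
    inner = surv[:, :-1] - surv[:, 1:]
    right = surv[:, -1]
    return left, inner, right


# Synthetic predictions (assumption): one ordinary row, two constant rows.
surv = np.array([
    [0.9, 0.7, 0.4, 0.1],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
])

left, inner, right = split_mass(surv)

# Mass whose location is not determined by the array alone.
unplaced = left + right
```

For the two constant rows, all of the probability mass ends up in `unplaced`, so any mapping onto a distribution supported on the grid has to make an arbitrary choice about where to put it.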

There are multiple questions in this:

if we return Empirical distributions, what is the best choice?
or are there better choices of returned distributions?
taking even more steps back, should there be a separate interface point, or even a separate object type, for incomplete distributions? Or for improper distributions?


fkiraly · 2024-04-17T12:36:02Z

@VascoSch92, this may be of interest to you because of:

relation to defining mathematical objects and mappings between them (full distribution to almost fully specified distribution, not bijective)
relation to the set/domain specification, in either design step
general questions related to mapping mathematical objects onto classes and interfaces, which we are also encountering in sequentium

fkiraly added the labels module:probability&simulation (probability distributions and simulators), module:regression (probabilistic regression module), module:survival&time-to-event (module for time-to-event prediction aka survival prediction), API design (API design & software architecture) on Apr 17, 2024

fkiraly mentioned this issue Apr 19, 2024

[BUG] AalenAdditive regressor predicts improper survival function CamDavidsonPilon/lifelines#1606

Open

fkiraly mentioned this issue May 6, 2024

[ENH] ngboost survival prediction model #290

Closed
