[ENH] design - dealing with incomplete distributions such as predictive survival function estimates #249

Open
fkiraly opened this issue Apr 17, 2024 · 1 comment
Labels
API design API design & software architecture module:probability&simulation probability distributions and simulators module:regression probabilistic regression module module:survival&time-to-event module for time-to-event prediction aka survival prediction

Comments

@fkiraly
Collaborator

fkiraly commented Apr 17, 2024

Design and discussion issue on how to deal with the following:

Some algorithms and packages produce distributional predictions that are incomplete, in the sense that they specify the predictive distribution almost, but not entirely.

This is in tension with the predict_proba interface which states that it returns a full distribution (full as in, fully specified).

Examples of such returns are Kaplan-Meier or conditional survival function (= one minus cdf) estimates, where function evaluations are available only at some points of the prediction range, rather than over the entire range.

A concrete example output, given by both the scikit-survival and lifelines packages, is a 2D numpy array, with one index corresponding to instances in the test/inference set, and the other index corresponding to time points at which the survival function is evaluated. Entries are the predicted survival probabilities for the given instance.
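To make the shape of such an output concrete, here is a minimal sketch with made-up numbers. The names `times` and `surv` and all values are hypothetical and not taken from either package; the array only mimics the layout described above (rows are instances, columns are time points).

```python
import numpy as np

# Hypothetical time grid and predicted survival probabilities (made-up numbers).
times = np.array([1.0, 2.0, 5.0, 8.0])
surv = np.array([
    [0.9, 0.7, 0.4, 0.1],  # instance 0: an "ordinary" prediction
    [1.0, 1.0, 1.0, 1.0],  # instance 1: predicted to survive past the last time point
    [0.0, 0.0, 0.0, 0.0],  # instance 2: predicted to die before the first time point
])

n_instances, n_times = surv.shape

# Each row samples a survival function S(t) = 1 - cdf(t) on `times`,
# so it must be non-increasing along the time axis.
is_monotone = bool(np.all(np.diff(surv, axis=1) <= 0))
```

Nothing in the array says what the survival function does before `times[0]`, after `times[-1]`, or between grid points; that is the missing information this issue is about.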

Even if we make the approximating assumption that the predicted distribution is supported only at the time points observed in the training data (i.e., it is a weighted sum of delta distributions), there are boundary effects which prevent a bijective mapping onto fully specified probability distributions.

For instance, consider predictions where survival is estimated as constant zero, or constant one. Here, the survival model makes the reasonable prediction that the instance dies before the first time point in the training data, or survives until after the last one, respectively.
Similar boundary effects occur when attempting to map onto an empirical distribution.

These effects are not severe if the first and last survival probabilities are close to one and zero, respectively, but they become more impactful the further the predictions are from this.
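The boundary effect can be illustrated with a small sketch. It assumes the weighted-delta reading above, with S(t) = P(T > t), and splits each row of survival values into the probability mass that falls between consecutive grid points and the mass that falls outside the grid. The function name `survival_to_masses` is hypothetical and not part of any package.

```python
import numpy as np

def survival_to_masses(surv):
    """Split survival values on a time grid into boundary and interior masses.

    surv: 2D array, rows = instances, columns = increasing time points.
    Returns (left, interior, right), where
      left[i]        = 1 - S(t_first): mass at or before the first grid point,
      interior[i, j] = S(t_j) - S(t_{j+1}): mass between consecutive grid points,
      right[i]       = S(t_last): mass strictly after the last grid point.
    """
    surv = np.asarray(surv, dtype=float)
    left = 1.0 - surv[:, 0]
    interior = surv[:, :-1] - surv[:, 1:]
    right = surv[:, -1]
    return left, interior, right

# Made-up predictions: one ordinary row, one constant-one row, one constant-zero row.
surv = np.array([
    [0.9, 0.7, 0.4, 0.1],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
])
left, interior, right = survival_to_masses(surv)
total = left + interior.sum(axis=1) + right  # accounts for all probability mass
```

For the constant-one row, all mass ends up in `right`, and for the constant-zero row all mass ends up in `left`. In both cases the grid says nothing about where that mass is located, so any mapping onto an empirical distribution has to place it somewhere by convention.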

There are multiple questions in this:

  • if we return Empirical distributions, what is the best choice?
  • or are there better choices of returned distributions?
  • taking even more steps back, should there be a separate interface point, or even a separate object type, for incomplete distributions? Or for improper distributions?
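As one possible reading of the last option, a separate object type could store only what the estimator actually returned and refuse to answer outside that range. The class below is a purely hypothetical sketch for discussion: `IncompleteSurvival` is not an existing skpro, scikit-survival, or lifelines class, and the step-function convention and the use of NaN for "unspecified" are assumptions.

```python
import numpy as np

class IncompleteSurvival:
    """Hypothetical container for one survival function known only on a time grid."""

    def __init__(self, times, surv):
        self.times = np.asarray(times, dtype=float)  # increasing time points
        self.surv = np.asarray(surv, dtype=float)    # S(t) at those time points

    def cdf(self, t):
        """Return 1 - S(t) as a right-continuous step function on the grid.

        Values of t before the first or after the last grid point return NaN,
        signalling that the distribution is not specified there.
        """
        t = np.atleast_1d(np.asarray(t, dtype=float))
        # Index of the last grid point that is <= t; -1 if t is before the grid.
        idx = np.searchsorted(self.times, t, side="right") - 1
        inside = (idx >= 0) & (t <= self.times[-1])
        out = np.full(t.shape, np.nan)
        out[inside] = 1.0 - self.surv[idx[inside]]
        return out

d = IncompleteSurvival([1.0, 2.0, 5.0, 8.0], [0.9, 0.7, 0.4, 0.1])
c = d.cdf([0.5, 1.0, 3.0, 8.0, 9.0])
```

The design choice here is to make the incompleteness explicit rather than to extrapolate; a mapping to an Empirical distribution would instead have to fix a convention for the mass outside the grid.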
@fkiraly fkiraly added module:probability&simulation probability distributions and simulators module:regression probabilistic regression module module:survival&time-to-event module for time-to-event prediction aka survival prediction API design API design & software architecture labels Apr 17, 2024
@fkiraly
Collaborator Author

fkiraly commented Apr 17, 2024

@VascoSch92, this may be of interest to you because of:

  • relation to defining mathematical objects and mappings between them (full distribution to almost fully specified distribution, not bijective)
  • relation to the set/domain specification, in either design step
  • general questions related to mapping mathematical objects onto classes and interfaces, which we are also encountering in sequentium
