Formula documentation for predict_partial_hazard function with categorical features
#1645
Comments
Could anyone please help with the above question?
Reading the code, categorical inputs are transformed into one-hot columns, the mean of each column from the training set is subtracted, and then the betas are applied.
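To make those three steps concrete, here is a minimal pure-Python sketch. It is not the lifelines source code: the helper names (`one_hot`, `column_means`, `partial_hazard`), the coefficients, and the choice to drop the first level as the reference category are all assumptions made for illustration.

```python
import math

def one_hot(value, levels):
    # Assumption: the first level is dropped and acts as the reference category.
    return [1.0 if value == lvl else 0.0 for lvl in levels[1:]]

def column_means(rows):
    # Mean of each one-hot column over the training rows.
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def partial_hazard(row, means, betas):
    # exp( sum_j beta_j * (x_j - mean_j) )
    return math.exp(sum(b * (x - m) for x, m, b in zip(row, means, betas)))

levels = ["Mon", "Tue", "Wed"]             # hypothetical categorical feature
train_days = ["Mon", "Tue", "Tue", "Wed"]  # hypothetical training column
train_X = [one_hot(d, levels) for d in train_days]

means = column_means(train_X)  # one mean per one-hot column: [0.5, 0.25]
betas = [0.4, -0.2]            # hypothetical fitted coefficients for Tue, Wed

ph_tue = partial_hazard(one_hot("Tue", levels), means, betas)
print(ph_tue)
```

A category with three levels therefore contributes two columns, two training means, and two betas to the linear predictor.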
@CamDavidsonPilon Thank you very much for your help with my question; I really appreciate it. From your answer, I have two quick clarification questions:
@CamDavidsonPilon Thank you very much for your quick reply. Let me rephrase question 2 with a concrete example: let's say we have (https://web.archive.org/web/20070630025831/https://www.stat.nus.edu.sg/%7Estachenz/ST3242Notes3.pdf) --- From page 2 of these slides, without de-meaning, we won't have this sum:
We do, yes. Authors may choose to include demeaning in their formulas or not, but implementations must choose. Demeaning typically leads to better numerical stability, so lifelines demeans. Demeaning isn't that important, either: the output of predict_partial_hazard is a meaningless number on its own, only good for ranking / ratios, and whether the prediction is demeaned or not doesn't affect this.
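The claim that demeaning leaves ratios unchanged can be checked numerically. The sketch below uses made-up coefficients and training means (assumptions, not values from any fitted model): subtracting the means multiplies every partial hazard by the same constant, which cancels in any ratio between two subjects.

```python
import math

betas = [0.4, -0.2]   # hypothetical coefficients
means = [0.5, 0.25]   # hypothetical training means of the one-hot columns
x_a = [1.0, 0.0]      # subject A (one-hot encoded)
x_b = [0.0, 1.0]      # subject B (one-hot encoded)

def linear(x, shift):
    # Linear predictor with an optional shift subtracted from each column.
    return sum(b * (xi - s) for b, xi, s in zip(betas, x, shift))

zeros = [0.0, 0.0]
raw_ratio = math.exp(linear(x_a, zeros)) / math.exp(linear(x_b, zeros))
demeaned_ratio = math.exp(linear(x_a, means)) / math.exp(linear(x_b, means))

# Both ratios equal exp(beta . (x_a - x_b)); the mean term cancels.
print(raw_ratio, demeaned_ratio)
```

Because the exponential is monotone and the shift is the same for every subject, the ranking of subjects by partial hazard is preserved as well.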
Does anyone happen to know the formula that is used in the predict_partial_hazard function of the class CoxPHFitter when the features include categorical variables, each of which might have at least 3 values (e.g. IDs, day of week)?