-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] GLM
with multiple distributions
and link
function support
#384
Conversation
I have thought of 2 possible ways to pass the Design 1Idea behind this is that I initialize the object of the class with whether
Design 2Idea here is that I pass the
My ConclusionBoth the designs above are not a foolproof way but they both give correct answers. Since we cannot add a test setting for both these I am not sure I can think of a better way to do it where it can be added to the test setting too. |
A strong design principle in As Option 2 violates that principle (offset and exposure are part of the data), as your cons imply, I would have a very strong preference towards option 1. I would vary the idea a little, by adding parameters Re testing, this will require a separate test added to a glm speific test module. |
I am little unsure on what this means. Do you mean to add these 2 parameters Could you please elaborate the idea a little as to what exactly the |
I was suggesting to replace these with two more informative variables, concretely: Replace
The type would be a single
I think it is fine to pass data schema references to the specification, that is different from the data itself (i.e., the entries of the data frame). |
skpro/regression/linear/_glm.py
Outdated
def __init__( | ||
self, | ||
family="Normal", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would have been great to have these as the first params from the get-go, but right now we cannot add these at the start of the parameter list due to the deprecation policy - it would make user code break.
We have to add the new params at the end, and I would suggest following deprecation policy to move them to the start eventually, see https://www.sktime.net/en/latest/developer_guide/deprecation.html
Yea this makes more sense and I have completed implementing it that way and also re ordered the new params to the end. Next we will have to work on adding test setting for it. |
great! btw, if you want to move the position of the parameters later, we should follow the "move parameter position" recipe - we can make the change right away. |
skpro/regression/linear/_glm.py
Outdated
@@ -201,6 +332,88 @@ def __init__( | |||
self.max_start_irls = max_start_irls | |||
self.add_constant = add_constant | |||
|
|||
if family == "0": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks excessive - have you looked at the deprecation handling? It has examples at the end.
I think we should avoid setting new defaults for unaffected parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Great extension.
Small blocking change requests relate to:
- the docstring should be clearer on how the new parameters are handled
- the change pattern is a bit excessive, so there is risk of error. Have you checked that these are indeed the right defaults?
- there is a lot of repetition - if this were "final" code I would say this should be made more DRY
I've made the changes based on the review please let me know if anything else needs modification. |
skpro/regression/linear/_glm.py
Outdated
@@ -182,11 +314,21 @@ def __init__( | |||
disp=False, | |||
max_start_irls=3, | |||
add_constant=False, | |||
family="0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is actually no condition that triggers the warning. The correct condition would be, "user passes a positional argument to __init__
", but that is sth we cannot easily detect.
Unless you have an idea how to do that, I would suggest we use the final defaults already and not "0"
etc, and we always raise the warning.
Reference Issues/PRs
fixes #383 and closes #230
What does this implement/fix? Explain your changes.
This creates an adapter converting the statsmodels GLM families to skpro equivalent giving
GLM
s a broader capability of distributions and link functions.Does your contribution introduce a new dependency? If yes, which one?
No
Did you add any tests for the change?
Yes
PR checklist
For all contributions
How to: add yourself to the all-contributors file in the
skpro
root directory (not theCONTRIBUTORS.md
). Common badges:code
- fixing a bug, or adding code logic.doc
- writing or improving documentation or docstrings.bug
- reporting or diagnosing a bug (get this pluscode
if you also fixed the bug in the PR).maintenance
- CI, test framework, release.See here for full badge reference
For new estimators
docs/source/api_reference/taskname.rst
, follow the pattern.Examples
section.python_dependencies
tag and ensureddependency isolation, see the estimator dependencies guide.