-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
auto apply Grouping()
to grouping variables
#652
Conversation
Codecov ReportPatch coverage is
📢 Thoughts on this report? Let us know!. |
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dunno about this, it's hacky and feels potentially pretty fragile to do this kind of introspection BEFORE the normal apply_schema bits have happened. it's also techncally breaking (albeit for a very small subset of models that we're likely to encounter in practice). it's also fragile since it doesn't work with interaction terms or nesting (IIUC) and so folks will still need to remember to specify Grouping() for grouping factors when fitting models with large N grouping vars. and on top of that this takes away teh guardrail of catching OOMErrors and giving a helpful hint, so it makes it even harder to track down the issue when it does come up...
src/grouping.jl
Outdated
# this arises when there's an interaction as a grouping variable without a corresponding | ||
# non-interaction grouping, e.g.urban&dist in the contra dataset | ||
# adapted from https://github.com/JuliaStats/StatsModels.jl/blob/463eb0acb49bc5428374d749c4da90ea2a6c74c4/src/schema.jl#L355-L372 | ||
function StatsModels.apply_schema( | ||
t::CategoricalTerm{Grouping}, | ||
schema::FullRank, | ||
Mod::Type{<:MixedModel}, | ||
context::AbstractTerm, | ||
) | ||
aliased = drop_term(context, t) | ||
@debug "$t in context of $context: aliases $aliased\n seen already: $(schema.already)" | ||
for seen in schema.already | ||
if StatsModels.symequal(aliased, seen) | ||
@debug " aliased term already present: $seen" | ||
return t | ||
end | ||
end | ||
# aliased term not seen already: | ||
# add aliased term to already seen: | ||
push!(schema.already, aliased) | ||
# repair: | ||
new_contrasts = StatsModels.ContrastsMatrix(Grouping(), t.contrasts.levels) | ||
t = CategoricalTerm(t.sym, new_contrasts) | ||
@debug " aliased term absent, repairing: $t" | ||
return t | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is necessary because otherwise you'd get a full rank dummy right? I'm not sure I understand why this is now necessary when it wasn't before... actually, now that I think about it, the full rank logic shoudl probably not be applied to the RHS of ranef terms right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote that so long ago, I don't remember why. It also looks like something I adapted from something you wrote -- it's that weird mix of our styles. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember now (and it's in the comment above the function):
this arises when there's an interaction as a grouping variable without a corresponding non-interaction grouping, e.g.
urban&dist
in the contra dataset.
Without the FullRank
restriction, we get a method ambiguity.
@kleinschmidt I should update the comment, but it does work with interaction and nesting terms as grouping variables. For the cases where you want to override it (i.e., a variable appearing in a grouping term and in the fixed effects), the OOM is going to happen anyway when you compute the contrasts for the fixed effects. So I think that bit is okay. I also think it feels wonky and brittle though. If I had a less brittle way to do it, I would it! I was hoping you had some insight. 😄 |
This reverts commit 386ede3.
…nto pa/autogrouping
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for following up on this. I think it will be a big deal for teaching if we don't have to remind people to use Grouping()
contrasts.
I didn't do a line-by-line review as
-
I am not as up-to-speed on the schema, etc. logic as I should be
-
I trust you know what you are doing
docs/NEWS-update.jl
to update the cross-references.