
posterior_traj example errors #497

Closed
jgabry opened this issue Dec 19, 2020 · 12 comments · Fixed by #499

@jgabry
Member

jgabry commented Dec 19, 2020

@sambrilleman I'm working on #496 and while running R CMD check I noticed that this part of the posterior_traj() example

rstanarm/R/posterior_traj.R

Lines 248 to 249 in aab5e59

#' nd <- data.frame(id = rep("new1", 11), year = (0:10 / 2))
#' pt7 <- posterior_traj(example_jm, newdataLong = nd, dynamic = FALSE)

fails with this error:

Error in eval(predvars, data, env) : object 'logBili' not found

Same error happens with the next part of the example:

rstanarm/R/posterior_traj.R

Lines 270 to 271 in aab5e59

#' pt8 <- posterior_traj(example_jm, newdataLong = nd, dynamic = FALSE,
#' re.form = NA)

I think this is because logBili also needs to be in newdata. @sambrilleman I'm happy to fix this but I'm not sure what values logBili should have for these posterior_traj() examples. Should it just be set to its mean from the data? Or multiple different values?
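
For concreteness, the first of those options would look something like this (untested sketch, assuming example_jm is fit to a subset of the pbcLong data):

nd <- data.frame(id      = rep("new1", 11),
                 year    = (0:10 / 2),
                 logBili = mean(pbcLong$logBili))
pt7 <- posterior_traj(example_jm, newdataLong = nd, dynamic = FALSE)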

@sambrilleman
Collaborator

Hi @jgabry -- sorry for the slow reply. Just going to look at this now. I can't reproduce the bug using my installed version of rstanarm, but I think I am running the feature/survival branch, so I will install master and try again...

@sambrilleman
Collaborator

I can't reproduce the bug using the CRAN version either.

So, looking back at recent commits, I am thinking that it might be this model.frame() call that got introduced to generate the offset: 91e63ed#diff-cbb1390f94fffef576647383d8a202057c69438b1a61bdbaff1d61d279e84d70R401

The user shouldn't need to provide logBili in the new data, because they are predicting the outcome and therefore only need to supply covariate values. So my guess is that we want to drop the outcome variable from the terms so that the model.frame() call I linked above is generated using the covariate data only. I haven't verified this yet, but it is my guess. I can't remember how we've done this elsewhere. I will have a look now.

@sambrilleman
Collaborator

@pamelanluna -- any chance you are able to assist? What is the intended behaviour here:

newOffset <- model.offset(model.frame(terms(glmod), newX))

The posterior_traj() function is essentially just calling posterior_predict() with a covariate matrix that has interpolation and extrapolation for the time variable (but otherwise uses the covariates provided by the user). So what should be the offset used in these predictions? Is it just a rep of the values for the offset for a single individual (e.g. is the offset constant within an individual, and the offset value is repeated along the interpolation/extrapolation time sequence)?

In any case, there doesn't appear to have been an offset argument added to posterior_traj(), just to posterior_predict(), which may or may not be intentional?

I guess one quick fix option, as unattractive as it is, would be to force the user to call posterior_predict() directly, instead of the convenience function posterior_traj() when the estimated model had an offset. This of course wouldn't solve the issue, but just avoid us having to deal with it.

@jgabry
Member Author

jgabry commented Dec 20, 2020

Thanks for looking into this. One reason we didn't notice this right away is that Travis CI wasn't working properly for rstanarm. With #496 I'm adding R CMD check via GitHub Actions, which will help us detect things like this immediately.

> sorry for the slow reply. Just going to look at this now.

No worries at all. I opened the issue on a Friday night so I wasn't expecting an immediate reply!

@jgabry
Member Author

jgabry commented Dec 20, 2020

And I think you're right that this has to do with the model.offset() stuff. Here's what R CMD check says:

  Error in eval(predvars, data, env) : object 'logBili' not found
  Calls: posterior_traj ... model.offset -> model.frame -> model.frame.default -> eval -> eval
  Execution halted

@pamelanluna
Contributor

> @pamelanluna -- any chance you are able to assist? What is the intended behaviour here:
>
> newOffset <- model.offset(model.frame(terms(glmod), newX))
>
> The posterior_traj() function is essentially just calling posterior_predict() with a covariate matrix that has interpolation and extrapolation for the time variable (but otherwise uses the covariates provided by the user). So what should be the offset used in these predictions? Is it just a rep of the values for the offset for a single individual (e.g. is the offset constant within an individual, and the offset value is repeated along the interpolation/extrapolation time sequence)?

The offset is defined in the model formula (e.g., y ~ x + offset(log(z))) and therefore depends on the covariates provided by the user. The values are sample-specific and are interpolated/extrapolated using the same code as the other covariates. The main purpose of this line of code is to extract the offset variable from the newX object. The issue stems from the fact that the new data does not include the response variable (which it obviously shouldn't need to include), but the call to terms() retrieves all of the variables from the formula, including the response. There's probably another way to do this that doesn't require having the response variable.
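
As a toy illustration of the mechanics (hypothetical formula and variable names, run in a fresh R session -- not the actual rstanarm code):

# terms() keeps the response, so model.frame() tries to evaluate it in the new data
f  <- y ~ x + offset(log(z))
nd <- data.frame(x = 1:3, z = c(10, 20, 30))  # covariates only, no response column
model.offset(model.frame(terms(f), nd))
#> Error in eval(predvars, data, env) : object 'y' not found

# with a placeholder response column present, the offset is extracted fine
nd$y <- 0
model.offset(model.frame(terms(f), nd))
#> [1] 2.302585 2.995732 3.401197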

> In any case, there doesn't appear to have been an offset argument added to posterior_traj(), just to posterior_predict(), which may or may not be intentional?

This was done on purpose since the offset is defined within the model formula. The offset shouldn't be redefined for posterior predictions, but the relevant covariate is still needed from the data to calculate it. It's been a while, so I don't remember whether I included the offset argument in posterior_predict() to keep it consistent with posterior_predict.stanreg() or whether it was necessary for the code to work properly.

> I guess one quick fix option, as unattractive as it is, would be to force the user to call posterior_predict() directly, instead of the convenience function posterior_traj() when the estimated model had an offset. This of course wouldn't solve the issue, but just avoid us having to deal with it.

As mentioned above, the main issue comes from the exclusion of the response variable from the new data, so this approach wouldn't fix the case where there is an offset but no response variable. An equivalent quick fix would be the following:

newOffset <- if (glmod$hasOffset) model.offset(model.frame(terms(glmod), newX)) else NULL

Again, this would not fix the case of no response variable when there is an offset, but it would get the tests to pass for now. I probably won't have time to figure out the changes for that case until next week.

@sambrilleman
Collaborator

Thanks @pamelanluna! This helps clear up my thinking a bit -- yeah, that makes sense that the offset is just joined on in a similar fashion to the other covariates. Thanks for clarifying.

@jgabry -- is there an option in lme4::terms.merMod() to drop the LHS of the terms? Or I think we have some internal utility functions in rstanarm for that kinda thing (something like RHS()?). Essentially we are just wanting to drop the response when building the model frame, and then extract the offset from that model frame. So a terms object without the response would solve our issue I think.

@jgabry
Member Author

jgabry commented Dec 22, 2020

I don't think we have that for the terms objects unfortunately. What about just creating a fake response variable that is only used temporarily to avoid this error? For example,

# need to get the name of the response variable (is there an easier way?)
response_name <- as.character(formula(object)$Long1)[2]

# create a temporary data frame with a column of just 0s (or whatever) as a placeholder for response
newX_temp <- cbind(0, newX)
colnames(newX_temp) <- c(response_name, colnames(newX))

# doesn't error anymore if using newX_temp
newOffset <- model.offset(model.frame(terms(glmod), newX_temp))

@sambrilleman
Collaborator

Thanks @jgabry! -- yeah I think this would work. The only bit that might need changing is that we'd need to extract the formula for submodel m, so formula(object)[[m]] instead of formula(object)$Long1.

Also, if we wanted to follow a principle of only doing this when necessary, then you could nest it under an if condition like @pamelanluna suggested, something like:

if (glmod$hasOffset) {
    # need to get the name of the response variable (is there an easier way?)
    response_name <- as.character(formula(object)[[m]])[2]

    # create a temporary data frame with a column of just 0s (or whatever) as a placeholder for response
    newX_temp <- cbind(0, newX)
    colnames(newX_temp) <- c(response_name, colnames(newX))

    # doesn't error anymore if using newX_temp
    newOffset <- model.offset(model.frame(terms(glmod), newX_temp))
} else {
    newOffset <- NULL
}

P.S. this is untested at my end -- in fact I'm typing this from a phone!

jgabry added a commit that referenced this issue Dec 24, 2020
@jgabry
Member Author

jgabry commented Dec 24, 2020

Ok I think this should work. About to open a PR fixing this.

@jgabry
Member Author

jgabry commented May 8, 2021

@sambrilleman @pamelanluna Finally coming back to this because I'm hoping we can release an rstanarm update after we get the next release of rstan out. Unfortunately, my fix in PR #499 handles the case where there's no offset (and the examples in the doc don't fail anymore), but I can't figure out how to fix the case where there is an offset. More details in #499 (comment)

@jgabry
Member Author

jgabry commented May 8, 2021

Actually I think I figured it out. More details in #499 (comment)
