Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SparseMatrixCSC support for fixed effects #309

Merged
merged 9 commits into from
Sep 30, 2020

Conversation

behinger
Copy link
Contributor

The only thing not working with sparse designmatrices is the rank-check of the fixed effects matrix. So I used multiple dispatch to circumvent it.

It might be worthwhile to use rank(X) in addition for sparse matrices, but I don't really understand the underlying motiv to use QR decomposition for the rank check (because I don't know what pivoting does ;)).

this is probably something @palday wants to look at.

PS: Doug asked me at some point to push to MixedModels.jl what I changed for unfold.jl to get it running with sparse designs. I think this is the only thing

@palday
Copy link
Member

palday commented Apr 28, 2020

The Cholesky decomposition + pivoting is what allows us to work with rank-deficient fixef matrices -- basically the extra columns are moved out of the way for all the numerics and assigned a set coefficient estimate of zero and standard error of NaN.

@palday
Copy link
Member

palday commented Apr 28, 2020

Can you write some tests? There are a few tweaks I'll do to the method (using parametrics instead of eltype and adding in the pivoting), but having tests to check things would be nice. :)

@behinger
Copy link
Contributor Author

sure, I will have a look at the fixef tests and write some for sparse matrices. Not sure yet when I will find time for it though!

@dmbates
Copy link
Collaborator

dmbates commented Apr 29, 2020

Thanks for the PR @behinger . It is probably not necessary to worry about the rank deficient case for the time being.

Are you intending that the products of the form Z'X and X'X will be dense? If so, it will straightforward to do the adjustments for rank-deficient sparse X. Even a sparse Z'X is okay if we need it. The tricky bit is working with a sparse X'X.

In any case it will help to have an example to work on.

BTW, I liked your 5-part tweet on why you are enjoying using Julia.

@behinger
Copy link
Contributor Author

behinger commented May 1, 2020

I started coding up a (comparatively simple) example for a single subject sparse designmatrix as used in unfold.jl.

I didn't finish it for a full MixedModels, partially due to time, partially because I did not know where to best put it in MixedModels.jl (keep in the unittests?). Or do you prefer a pregenerated csv-example?

Regarding your questions: in my brief tests X'X was sometimes sparse (not in this example though). I have to check Z'X - but julia CSV stopped working for now so I will comment later

@behinger
Copy link
Contributor Author

behinger commented May 1, 2020

Z'X seems to be dense in the cases I tested (currently only possible to have one random grouping variable due to limitations in mixedmodels.jl)

test/matrixterm.jl Outdated Show resolved Hide resolved
test/matrixterm.jl Outdated Show resolved Hide resolved
Copy link
Member

@palday palday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed things to use StableRNGs because there the MersenneTwister stream can (and does!) change between Julia versions.

The test looks like something straight from unfold. :) I suspect we could make it more minimal for actually testing functionality here. I would do something like:

  1. construct a standard model from the test datasets
  2. swap out its model matrix with a sparse version of it
  3. fit it to make sure everything works.

I'll investigate how well this actually works later. 😄

test/matrixterm.jl Outdated Show resolved Hide resolved
@palday
Copy link
Member

palday commented Sep 28, 2020

@dmbates How do you feel about this? I think the missing rank-deficiency checks are fine for the sparse FeMat because if you're constructing your own design matrix instead of using the formula interface, then you should be able to make sure its full column rank.

@dmbates dmbates merged commit b8defc7 into JuliaStats:master Sep 30, 2020
@palday palday deleted the patch-1 branch September 30, 2020 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants