-
-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Allocate Stan data on a local stack in the model #848
[WIP] Allocate Stan data on a local stack in the model #848
Conversation
Shoot I meant to mark this as a draft but I don't see a button to change it to a draft |
It's hidden under the reviewers list before the Assignees section in little font. |
Weird it's not there for me? |
Thanks! My eyes must be going goofy |
…stexpr bool jacobian__ to write_array_impl, change name of read_cholesky_corr to read_cholesky_factor_corr
Alright so I got this almost up and running to the point that I can use this branch of stan to run a toy model with a few fixes the the cpp I'll discuss below. I'm using the diamonds example from posteriordb but modified to use Y ~ normal(Intercept + multiply(Xc, b), sigma); I also took the data Running cmdstan with current develop I get
and running with this PR and the stan branch I get
So pretty nice more than a 2x speedup!!! And these results should be even sillier once we add parallel chains to cmdstan since all the data will be shared across all of the threads without copies. Though there's still a few issues to tackle. I'm not doing the initializer list for transformed data correctly. I totally forgot that transformed data's integers and sizes can depend on the data (lol) so since they weren't being set till the body's constructor happened I wasn't allocating the right amount of memory for transformed data in the initializer list. I think I need to ditch the initializer because of the below issue. Right now I'm just initializing the local allocator with the default size of 64KB (i.e. calling the default constructor data {
int huge_N;
int huge_M;
int small_N;
matrix[huge_N, huge_M] X;
vector[small_N] other_X;
} Say |
Nice nice, this is cool!
If I remember from the easier issue, we're using special allocators here so that we can use type-logic to avoid copies in Math (to_arena knows to not copy these). If those special allocators just forward to malloc/free, then memory usage should be efficient. I don't think we need to stack-allocate these since these allocations shouldn't be performance-limiting themselves. I guess this also means the end of I'd like to get threading finished up before formally reviewing and pulling this stuff in though since they are both changes to the memory model and I am scared if I'm not focused on them each individually I'll miss stuff. |
I'm not sure what you mean here, the allocator we are using is just a
It actually should be fine. At least for now these can be made into
Yeah I agree. I'm going to make a pass at that today and try to flesh out the cmdstan side more |
I'm going to close this as I figured out a nicer way to do this and want to open it up in a separate PR |
Summary
This is a first hack at getting the data in a Stan model to sit a local stack allocator. To do this we need to
stan::math::stack_alloc
inside of the private members of the model classstan::io::initialize_data<{datas_type}>("data_name", context__, model_arena_, {data dimensions})`
That constructs the data from the context into the memory allocated by the
model_arena_
The code gen is not really all there yet, still fighting the fmt library a bit. There's two other big things this is missing though.
initialize_data
to take in a enum as a tag for which constraint to check for such asI'm fine with whatever folks think would be easier or better
*_opencl__
assignments (or how to do them in the initializer list). I think we could do a second pass of the data and if any are opencl tagged then we can also add those to the initializer. Though we will still need to keep opencl data derived from transformed data in the constructor body.It will probably be best if (1) waits on @rybern 's current refactor of the constraints and (2) might need to wait for @rok-cesnovar 's MIR stuff for OpenCL
Release notes
Allows the Stan model to place data and transformed data in a local stack.
Copyright and Licensing
By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)