-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add to_icechunk
for xarray
#357
Conversation
e2cc4d3
to
ea401fe
Compare
This adds two APIs. One is "easy" and does everything in one step, the other allows extra control over initializing the store, and subsequent writes.
from icechunk.xarray import XarrayDatasetWriter
writer = XarrayDatasetWriter(ds, store=store)
writer.write_metadata(group="new2", mode="w") # write metadata
writer.write_eager() # write in-memory arrays
writer.write_lazy() # eagerly write dask arrays The |
bbed44f
to
42c963a
Compare
|
||
Write metadata for arrays in one step. This "initializes" the store but has not written an real values yet. | ||
```python | ||
writer.write_metadata(group="new2", mode="w") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think about appends & region writes here.
Naive question: if Also should we call cc @mpiannucci |
It is not. |
Can you expand on what this means in practice? If I have a big distributed dask array, I call to |
c85c3ba
to
322857c
Compare
The store object you are holding has no record of the distributed writes because those changes were never communicated back. So you'll receive all fill values when trying to read: #383 |
322857c
to
70fcf94
Compare
70fcf94
to
6970a86
Compare
433f017
to
ee14c1e
Compare
* main: Linting with ruff (#394)
latest_layer = f"{aggprefix}-{depth}-{token}" | ||
|
||
layers[latest_layer] = partial_reduce( | ||
aggregate, keys, layer_name=latest_layer, split_every=split_every |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥵
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a masterpiece Deepak!
Reviewed the code and tests and didn't spot any red flags. I think I get the gist of how it works, but I'm not confident I'd be able to identify any bugs there anyway. The tests inspire great confidence.
I left a bunch of suggestions on the docs.
``` | ||
Note that the chunks in the store are a divisor of the dask chunks. This means each individual | ||
write task is independent, and will not conflict. It is your responsibility to ensure that such | ||
conflicts are avoided. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've done so much work in Xarray around safe chunks, region writes, etc. It's a pity that we can't reuse any of that here.
Co-authored-by: Ryan Abernathey <[email protected]>
Co-authored-by: Ryan Abernathey <[email protected]>
Adds
icechunk.xarray.to_icechunk
.TODO:
load_stored
kwarg todask.array.store
. dask/dask#11465updateexamples/dask_write.py
I chose to not update
dask_write.py
because it does show a different pattern that uses dask as an orchestrator and not dask.array.