Shortening edge chunks #233

Open
jakirkham opened this issue Jan 26, 2018 · 3 comments
Labels
enhancement New features or improvements

Comments

@jakirkham
Member

In the process of investigating an analogous format ( https://github.com/zarr-developers/zarr/issues/231 ), one of the points raised was that another implementation shortens its edge chunks. As a trivial example, if an array has a shape of (5,) and a chunk size of (2,), the last chunk holds only one element and so is smaller than the others. Currently we write this chunk out at the same size as a full chunk (even though we effectively ignore the extra bytes).

However, we could opt to write this chunk out as a smaller file, since the extra bytes are unneeded. If this were implemented in a consistent manner, it should be possible to compute the truncated shape of these edge chunks using only the array shape, the number of chunks, and the current chunk index. It would also be easy to check whether we are handling a truncated chunk by comparing its size to that of a typical chunk before reshaping. That way, files that don't use truncation would still be handled in a compatible way.
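A minimal sketch of the arithmetic described above, not Zarr's actual implementation. The helper names `edge_chunk_shape` and `reshape_chunk` are hypothetical, the chunk size is used in place of the number of chunks (either one determines the other given the array shape), and the buffer is assumed to be already decompressed and in C order.

```python
import math

import numpy as np


def edge_chunk_shape(shape, chunks, chunk_index):
    """Hypothetical helper: shape of the chunk at `chunk_index`,
    truncated where the chunk would run past the array edge."""
    return tuple(
        min(c, s - i * c) for s, c, i in zip(shape, chunks, chunk_index)
    )


def reshape_chunk(buf, shape, chunks, chunk_index, dtype):
    """Hypothetical helper: reshape decoded bytes, accepting either a
    full-size (padded) chunk or a truncated edge chunk."""
    flat = np.frombuffer(buf, dtype=dtype)
    if flat.size == math.prod(chunks):
        # Full-size chunk: any padding past the array edge is left in
        # place for the caller to ignore, as happens today.
        return flat.reshape(chunks)
    truncated = edge_chunk_shape(shape, chunks, chunk_index)
    if flat.size == math.prod(truncated):
        # Shortened edge chunk.
        return flat.reshape(truncated)
    raise ValueError("buffer size matches neither full nor truncated chunk")


# Array of shape (5,) with chunks of (2,): the last chunk has index (2,).
last = edge_chunk_shape((5,), (2,), (2,))
```

With the example from above, the chunks at indices `(0,)` and `(1,)` keep the full shape `(2,)`, and only the chunk at index `(2,)` is shortened. Comparing the element count before reshaping is what lets both padded and truncated files be read by the same code path.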

Though if we would like to be more explicit, we could also add a Zarr Array option (disabled by default?) for this behavior and record it in .zarray.

@jakirkham
Member Author

Any thoughts on this @alimanfoo?

@alimanfoo
Member

alimanfoo commented Feb 15, 2018 via email

@jakirkham
Member Author

If we relaxed the uniform chunking requirement more generally ( https://github.com/zarr-developers/zarr/issues/245 ), it could solve this issue and still allow for fast appends, amongst other nice benefits. Admittedly, there would be some overhead involved in tracking non-trivial chunk sizes, so it would need some thought/evaluation.
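To give a rough sense of what tracking non-uniform chunk sizes might involve along one dimension, here is a sketch under stated assumptions. It is not a proposal from either linked issue; `chunk_offsets` and `locate` are hypothetical names, and the per-chunk sizes are assumed to be stored somewhere in the array metadata.

```python
from bisect import bisect_right
from itertools import accumulate


def chunk_offsets(chunk_sizes):
    """Hypothetical helper: start offset of every chunk along one
    dimension, plus the total length as the final entry."""
    return [0, *accumulate(chunk_sizes)]


def locate(offsets, index):
    """Hypothetical helper: map an element index to
    (chunk number, position within that chunk)."""
    if not 0 <= index < offsets[-1]:
        raise IndexError(index)
    chunk = bisect_right(offsets, index) - 1
    return chunk, index - offsets[chunk]


# A length-5 dimension stored as chunks of 2, 2, and 1 elements.
offsets = chunk_offsets([2, 2, 1])
```

With uniform chunks the same lookup is a single integer division, so the extra cost here is storing one offset per chunk and doing a binary search per lookup.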
