Partial chunk reads #109

Closed
guigrpa opened this issue Oct 13, 2021 · 1 comment
Comments

guigrpa commented Oct 13, 2021

Are partial chunk reads supported, as is the case in zarr-python for datasets using Blosc compression? (see this issue and this merged PR).

We're interested in accessing extremely large public datasets (tens to hundreds of TB), with chunks as large as 100 MB, from a web application. Given their size, it's unlikely that we can create new copies with a more web-manageable chunk size (say, 1-2 MB). Any ideas?

cc @manzt

gzuidhof (Owner) commented

Hi @guigrpa,

Currently there is no special support for these queries. I presume it is technically possible, using an HTTP Range request header to specify which part you want to read. I had a look at the merged PR; from what I understand, it actually "reads" the entire file and then decompresses only part of it. "Reading" of course has a different meaning there (one can "open" a local file and then only actually access part of it), whereas on the web we have to do this through range requests.
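To make the range-request idea concrete, here is a minimal sketch of fetching only a byte range of a chunk file. The names `rangeHeader`, `fetchByteRange` and `FetchLike` are hypothetical helpers written for this comment, not part of zarr.js, and the sketch assumes you already know which byte offset and length you need (which, for a compressed chunk, depends on the codec's internal layout).

```typescript
// Hypothetical helpers for illustration only; none of this is zarr.js API.

// Minimal shape of the fetch function we need, so it can be swapped out in tests.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> }
) => Promise<{ status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

/** Build an HTTP Range header value for `length` bytes starting at `offset`. */
function rangeHeader(offset: number, length: number): string {
  if (!Number.isInteger(offset) || !Number.isInteger(length) || offset < 0 || length <= 0) {
    throw new RangeError("offset must be an integer >= 0 and length an integer > 0");
  }
  // The end position of an HTTP byte range is inclusive.
  return `bytes=${offset}-${offset + length - 1}`;
}

/** Fetch `length` bytes starting at `offset` from `url` using a range request. */
async function fetchByteRange(
  url: string,
  offset: number,
  length: number,
  fetchImpl: FetchLike
): Promise<Uint8Array> {
  const response = await fetchImpl(url, { headers: { Range: rangeHeader(offset, length) } });
  if (response.status === 206) {
    // 206 Partial Content: the server honoured the Range header.
    return new Uint8Array(await response.arrayBuffer());
  }
  if (response.status === 200) {
    // The server ignored the Range header and sent the whole body; slice locally.
    const whole = new Uint8Array(await response.arrayBuffer());
    return whole.subarray(offset, offset + length);
  }
  throw new Error(`Unexpected HTTP status ${response.status}`);
}
```

In a browser you would pass the global `fetch` as `fetchImpl`. Note that the 200 fallback still downloads the full chunk, so this only saves bandwidth when the server (and its CORS configuration) supports range requests.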

I'm of course happy to accept a PR for this behavior. Otherwise, perhaps the best way to solve this is an intermediate service on a server that accepts requests for a smaller chunk size, translates them to the larger chunks, and serves the relevant parts of those chunks (it should probably also cache the parts that you access often). I hope that makes sense!
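As a rough sketch of what such an intermediate service would do internally, the code below maps a request for a small "virtual" chunk onto the large stored chunk that contains it and keeps recently used large chunks in a small LRU cache. Everything here is an assumption made for illustration: the names (`locateSmallChunk`, `LruCache`, `readSmallChunk`), a 1-D array, a `Float32Array` dtype, and a small chunk size that evenly divides the large one. The `loadLarge` callback stands in for whatever fetches and decompresses a stored chunk.

```typescript
// Hypothetical sketch; names and the 1-D / Float32Array assumptions are illustrative.

interface ChunkLocation {
  largeIndex: number; // index of the stored (large) chunk
  start: number;      // first element inside the decoded large chunk
  end: number;        // one past the last element
}

/** Map a small virtual chunk index to a slice of a large stored chunk (sizes in elements). */
function locateSmallChunk(smallIndex: number, smallSize: number, largeSize: number): ChunkLocation {
  if (largeSize % smallSize !== 0) {
    throw new RangeError("largeSize must be a multiple of smallSize");
  }
  const perLarge = largeSize / smallSize;
  const largeIndex = Math.floor(smallIndex / perLarge);
  const start = (smallIndex % perLarge) * smallSize;
  return { largeIndex, start, end: start + smallSize };
}

/** Tiny LRU cache keyed by chunk index; relies on Map preserving insertion order. */
class LruCache<V> {
  private readonly entries = new Map<number, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: number): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: number, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

/** Serve one small chunk, loading and caching the enclosing large chunk on demand. */
async function readSmallChunk(
  smallIndex: number,
  smallSize: number,
  largeSize: number,
  cache: LruCache<Float32Array>,
  loadLarge: (largeIndex: number) => Promise<Float32Array>
): Promise<Float32Array> {
  const loc = locateSmallChunk(smallIndex, smallSize, largeSize);
  let large = cache.get(loc.largeIndex);
  if (large === undefined) {
    large = await loadLarge(loc.largeIndex);
    cache.set(loc.largeIndex, large);
  }
  // subarray returns a view, so no copy is made here.
  return large.subarray(loc.start, loc.end);
}
```

The point of doing this server-side is that the 100 MB chunk is downloaded and decompressed close to the data, and only the 1-2 MB slice travels to the browser. A real service would need the N-dimensional version of the index mapping and a cache bounded by bytes rather than entry count.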

guigrpa closed this as completed Mar 4, 2022