Partial chunk reads #109

guigrpa · 2021-10-13T20:02:22Z

Are partial chunk reads supported, as is the case in zarr-python for datasets using Blosc compression? (see this issue and this merged PR).

We're interested in accessing extremely large public datasets (tens to hundreds of TB) with chunks as large as 100 MB from a web application. Given their size, it's unlikely that we can create new copies with a more web-manageable chunk size (say, 1-2 MB). Any ideas?

cc @manzt

gzuidhof · 2021-10-14T13:11:48Z

Hi @guigrpa,

Currently there is no special support for these queries. I presume it is technically possible, using an HTTP Range request header to specify which part of the chunk you want to read. I had a look at the merged PR, and from what I understand it actually "reads" the entire file and then decompresses only part of it. "Reading" has a different meaning there, of course: one can "open" a local file and then only access part of it, whereas on the web we have to do this through range requests.
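For illustration, here is a minimal sketch of what such a range request could look like. The names `rangeHeader`, `fetchByteRange` and `FetchLike` are made up for this example and are not part of this library. It also assumes you already know which bytes of the stored chunk you need; working that out for a compressed chunk depends on the compressor's internal layout and is not covered here.

```typescript
// Hypothetical helpers, not an existing API of this library.

/** Value of an HTTP Range header for `length` bytes starting at `offset`. */
function rangeHeader(offset: number, length: number): string {
  if (!Number.isInteger(offset) || !Number.isInteger(length) || offset < 0 || length <= 0) {
    throw new RangeError("offset must be an integer >= 0 and length an integer > 0");
  }
  // Byte ranges are inclusive on both ends.
  return `bytes=${offset}-${offset + length - 1}`;
}

/** Minimal shape of a fetch-like function, so the global `fetch` can be passed in. */
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

/** Request only part of a remote object instead of downloading all of it. */
async function fetchByteRange(
  fetchFn: FetchLike,
  url: string,
  offset: number,
  length: number,
): Promise<Uint8Array> {
  const response = await fetchFn(url, { headers: { Range: rangeHeader(offset, length) } });
  // 206 Partial Content means the server honored the range.
  // A plain 200 would mean it ignored the header and sent the whole object.
  if (response.status !== 206) {
    throw new Error(`expected 206 Partial Content, got ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

In a browser you would call something like `fetchByteRange(fetch, chunkUrl, 0, 1024)`, where `chunkUrl` is a placeholder for the URL of one stored chunk. The store has to support range requests (and expose them via CORS) for this to work.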

I'm of course happy to accept a PR for this behavior. Otherwise, perhaps the best way to solve this is an intermediate service on a server that accepts requests for a smaller chunk size, translates them to the larger chunks, and serves those partially (it should probably also cache the parts that you access often). I hope that makes sense!
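To make the intermediate-service idea a bit more concrete, here is a rough sketch of its core logic, simplified to one dimension. Everything in it is hypothetical (`locateSubChunk`, `LruCache`, `readSubChunk`, and the `loadLargeChunk` callback that would fetch and decompress one stored chunk), and it assumes the large chunk size is an exact multiple of the small one so that a small chunk never straddles two large ones.

```typescript
// Hypothetical sketch of the proxy's core logic, not an existing API.

interface SubChunkLocation {
  largeChunkIndex: number; // which stored (large) chunk holds the data
  elementOffset: number;   // first element inside the decoded large chunk
  elementCount: number;    // number of elements to serve
}

/** Map a small virtual chunk index onto the large chunk that contains it (1-D). */
function locateSubChunk(smallIndex: number, smallSize: number, largeSize: number): SubChunkLocation {
  if (smallSize <= 0 || largeSize % smallSize !== 0) {
    throw new RangeError("largeSize must be a positive multiple of smallSize");
  }
  const start = smallIndex * smallSize;
  return {
    largeChunkIndex: Math.floor(start / largeSize),
    elementOffset: start % largeSize,
    elementCount: smallSize,
  };
}

/** Tiny least-recently-used cache; relies on Map preserving insertion order. */
class LruCache<V> {
  private entries = new Map<number, V>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: number): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: number, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value);
    }
  }
}

/** Serve one small chunk, decoding the large chunk only on a cache miss. */
async function readSubChunk(
  smallIndex: number,
  smallSize: number,
  largeSize: number,
  cache: LruCache<Float32Array>,
  loadLargeChunk: (index: number) => Promise<Float32Array>, // assumed: fetch + decompress
): Promise<Float32Array> {
  const loc = locateSubChunk(smallIndex, smallSize, largeSize);
  let large = cache.get(loc.largeChunkIndex);
  if (large === undefined) {
    large = await loadLargeChunk(loc.largeChunkIndex);
    cache.set(loc.largeChunkIndex, large);
  }
  // subarray returns a view, so no copy is made here.
  return large.subarray(loc.elementOffset, loc.elementOffset + loc.elementCount);
}
```

A real service would need to handle N-dimensional chunk grids and bound the cache by bytes rather than entry count, since each entry could be around 100 MB once decoded.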

guigrpa closed this as completed Mar 4, 2022
