Use ROS3 HDF5 driver or fsspec with local sparse cache for more efficient access. #307
Comments
Hi, I think this could be a nice idea, although I have two concerns:
If I understand this correctly, local files and streamed files are read in the same fashion, but streamed files are only accessed in slices. If that's the case, in terms of usability, there might be added waiting time between visualization of the slices, or even the possibility of a connection error? More critically, will all the information still be presented in NWBE (acquisition/sweep series, as displayed in the example screenshot below), given that it wouldn't exist locally unless accessed?
FWIW, the streaming tutorial is being updated in NeurodataWithoutBorders/pynwb#1526 to reflect a possible setup with fsspec and sparse caching. I think a setup like the one demonstrated there would be ideal for nwb-explorer: it caches the accessed parts of the files locally (and eventually expires them), giving fast performance for frequently accessed files without needing to download them in full (unless they are accessed in full or are simply smaller than the default fsspec block size).
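To make the sparse-caching idea concrete, here is a minimal toy sketch in plain Python. It is not fsspec's implementation: `SparseBlockCache`, `fake_remote_fetch`, the in-memory `REMOTE` stand-in, and the 8-byte block size are all hypothetical names and values chosen for illustration. The point is only that a byte-range read pulls in the blocks it touches, and later reads of the same region are served from local disk.

```python
import os
import tempfile
from typing import Callable


class SparseBlockCache:
    """Toy read-through cache: fetch fixed-size blocks on demand, keep them on disk."""

    def __init__(self, fetch_range: Callable[[int, int], bytes], size: int,
                 cache_dir: str, block_size: int = 8):
        self.fetch_range = fetch_range  # hypothetical callable returning bytes [start, end)
        self.size = size                # total size of the remote file in bytes
        self.cache_dir = cache_dir
        self.block_size = block_size
        self.remote_fetches = 0         # counts blocks actually requested remotely
        os.makedirs(cache_dir, exist_ok=True)

    def _block(self, index: int) -> bytes:
        path = os.path.join(self.cache_dir, f"block-{index}.bin")
        if os.path.exists(path):        # cache hit: no remote access
            with open(path, "rb") as fh:
                return fh.read()
        start = index * self.block_size
        end = min(start + self.block_size, self.size)
        data = self.fetch_range(start, end)
        self.remote_fetches += 1
        with open(path, "wb") as fh:    # persist the block for later reads
            fh.write(data)
        return data

    def read(self, start: int, end: int) -> bytes:
        """Return bytes [start, end), fetching only the blocks that overlap the range."""
        end = min(end, self.size)
        if start >= end:
            return b""
        first = start // self.block_size
        last = (end - 1) // self.block_size
        chunk = b"".join(self._block(i) for i in range(first, last + 1))
        offset = start - first * self.block_size
        return chunk[offset:offset + (end - start)]


# Stand-in for a remote file; a real setup would issue HTTP range requests instead.
REMOTE = bytes(range(32))


def fake_remote_fetch(start: int, end: int) -> bytes:
    return REMOTE[start:end]


cache = SparseBlockCache(fake_remote_fetch, len(REMOTE), tempfile.mkdtemp(), block_size=8)
first_read = cache.read(10, 14)   # touches a single block, which is fetched remotely
second_read = cache.read(8, 16)   # same block again, served from the local cache
```

A file smaller than one block would be fetched whole on the first read, which matches the caveat about files smaller than the block size. Cache expiry is left out of this sketch.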
Since
Correct! But there is the possibility of a significant (x10, x100, ...?) speed-up in initial waiting time, while avoiding a lengthy or prohibitively large initial download. After all, it could just be an option as well -- either download in full, or provide cached or ROS3 access to the NWB file.
Best to ask @bendichter, but I guess the would come as requested.
pynwb reads the entire structure of the HDF5 file and all attributes on the
NWB Explorer is not really tested with all the streaming cases in mind yet, but it follows the pynwb idea of separating listing from data fetching. So we need to instruct NWB Explorer to use the proper API (
io.read on a remote resource -- note that external URLs are currently downloaded locally by default
Relates to #264, which could possibly be avoided entirely by not fetching an .nwb file in full twice. This might also be of interest in the scope of https://github.com/OpenSourceBrain/DANDIArchiveShowcase, which @anhknguyen96 is working on.
https://pynwb.readthedocs.io/en/stable/tutorials/advanced_io/streaming.html gives an example of how to use the ros3 HDF5 driver to access a remote file on an S3 bucket (e.g. dandiarchive) without downloading it in full.
Another approach is HDF5-agnostic, using fsspec, but it would require pynwb to be able to open from an existing file handle, which I am not sure is possible -- filed NeurodataWithoutBorders/pynwb#1525. (Well -- an alternative is a FUSE file system like the one provided by https://github.com/datalad/datalad-fuse/ for that file -- but that might be too ad hoc/heavy, although it is quite possible via FUSE'ing an entire bucket whenever a request comes in, and using a local cache with some garbage-collection routines to prune it down once in a while.)
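For reference, a minimal sketch of the ros3 route, loosely following the linked pynwb streaming tutorial. The helper names `ros3_available` and `read_nwb_via_ros3` are hypothetical, and the sketch assumes an h5py/HDF5 build with ros3 support plus a pynwb version whose `NWBHDF5IO` accepts the `driver` argument. Imports are done inside the functions so that the module still loads where those packages are missing.

```python
def ros3_available() -> bool:
    """Return True if the installed h5py build registers the ros3 driver."""
    try:
        import h5py
    except ImportError:
        return False
    return "ros3" in h5py.registered_drivers()


def read_nwb_via_ros3(s3_url: str):
    """Open a remote NWB file through the ros3 driver without downloading it in full.

    Returns the open IO object together with the NWBFile. The IO object must stay
    open while datasets are sliced, because data are only fetched on access.
    The caller is responsible for calling io.close().
    """
    from pynwb import NWBHDF5IO  # imported lazily; requires pynwb

    io = NWBHDF5IO(s3_url, mode="r", driver="ros3")
    nwbfile = io.read()  # reads the file structure; dataset contents stay remote
    return io, nwbfile


# Hypothetical usage (the URL below is a placeholder, not a real asset):
#   if ros3_available():
#       io, nwbfile = read_nwb_via_ros3("https://<bucket>.s3.amazonaws.com/<path>.nwb")
#       ...slice only the datasets that are needed...
#       io.close()
```

Checking `ros3_available()` first gives nwb-explorer a natural place to fall back to a full download when the driver is not compiled in.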