Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update streaming tutorial #1526
update streaming tutorial #1526
Changes from all commits
dd167a9
72df1ae
23ba659
39eab7c
fb8ea47
939a309
2d1e9b2
b4698f4
6a0248d
ee172cc
6410715
cfb7203
809cc32
7148823
ac2e8fc
7014fe7
3ce0a89
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the
s3fs
builds onfsspec
and since thes3fs
code is simpler than using the fullfsspec
I think it would be nice to add a subsection to theStreaming Method 2: fsspec
section that shows this code example fors3fs
as well. In this way you can still keep the tutorial to the two main options but still mention thes3fs
option, which may be sufficient and simpler for many users.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So s3fs is a specific implementation of the fsspec protocol. I changed over to using the more generic fsspec library with http for three reasons:
Do you really think this workflow is more complex than the s3fs syntax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Showing the
fsspec
example as the main example makes sense. If you think that showing the fulls3fs
example in addition is too confusing, then maybe we can just add a note along the lines of "The s3fs package also builds on fsspec and can be used to access S3 in a similar manner. s3fs provides additional convenience methods for interacting with S3 more specifically while fsspec provides greater flexibility with regard to supported remote file systems. Note, when using s3fs requires the use ofs3
URIs instead ofhttp
. "There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best would be if someone checks for what s3fs might be providing "on top" or in addition to pure straight fsspec. E.g. it could be some better choices of blocksize etc. I would have at least compared performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it true that s3fs provides additional convenience methods for interacting with S3 more specifically?
s3fs implements in
S3FileSystem
which inherits from fsspec'sAsyncFileSystem
abstract class. My impression is that it just implements the method of afsspec.FileSystem
without much else besides maybe credential controls that we don't use. You can switch to an s3 backend by simply changingfs=fsspec.filesystem("http")
tofs=fsspec.filesystem("s3")
, but then you'll have to supply the s3 path, not the http path.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to tell from the documentation what the specific differences are, but my impression is that s3fs adds support for
async
, improved logging, and S3 specific implementation of the file system functions. Purely to read an NWB file, this may not be big difference. Pending further investigation, maybe what we could do is to add a note to indicate that there are a growing number of libraries build on ffspec that provide functionality specific for different file systems (e.g., S3, Google Cloud, or Azure) and link to https://filesystem-spec.readthedocs.io/en/latest/api.html?highlight=S3#other-known-implementations. We could then post alternate examples, such as the code using S3FS, as GitHub Gists online and link to them from the same note. This would keep the docs clean and at the same time provide examples for folks that may want to use the more specific libraries build on fsspec. Given that we don't know how all the various options perform, we could also add a "disclaimer" along those lines.