-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tweak file opening #213
Tweak file opening #213
Changes from all commits
059b171
86118f0
db6fad8
5513ae4
98e5090
c001e71
8ed5af9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
""" | ||
Functions related to creating fsspec references. | ||
""" | ||
|
||
from typing import Dict, Tuple, Union | ||
|
||
from fsspec_reference_maker.hdf import SingleHdf5ToZarr | ||
|
||
|
||
def create_hdf5_reference( | ||
fp, fname: str, url: str, inline_threshold: int = 300, **netcdf_storage_options | ||
) -> Dict: | ||
h5chunks = SingleHdf5ToZarr(fp, url, inline_threshold=inline_threshold) | ||
reference_data = h5chunks.translate() | ||
return reference_data | ||
|
||
|
||
def unstrip_protocol(name: str, protocol: Union[str, Tuple[str, ...]]) -> str: | ||
# should be upstreamed into fsspec and maybe also | ||
# be a method on an OpenFile | ||
if isinstance(protocol, str): | ||
if name.startswith(protocol): | ||
return name | ||
return protocol + "://" + name | ||
else: | ||
if name.startswith(tuple(protocol)): | ||
return name | ||
return protocol[0] + "://" + name |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,10 @@ | |
|
||
# fsspec doesn't provide type hints, so I'm not sure what the write type is for open files | ||
OpenFileType = Any | ||
# https://github.com/pangeo-forge/pangeo-forge-recipes/pull/213#discussion_r717801623 | ||
# There is no fool-proof method to tell whether the output of the context was created by fsspec. | ||
# You could check for the few concrete classes that we expect | ||
# like AbstractBufferedFile, LocalFileOpener. | ||
|
||
|
||
def _get_url_size(fname, secrets, **open_kwargs): | ||
|
@@ -116,14 +120,15 @@ def size(self, path: str) -> int: | |
return self.fs.size(self._full_path(path)) | ||
|
||
@contextmanager | ||
def open(self, path: str, **kwargs) -> Iterator[None]: | ||
def open(self, path: str, **kwargs) -> Iterator[OpenFileType]: | ||
"""Open file with a context manager.""" | ||
full_path = self._full_path(path) | ||
logger.debug(f"entering fs.open context manager for {full_path}") | ||
with self.fs.open(full_path, **kwargs) as f: | ||
logger.debug(f"FSSpecTarget.open yielding {f}") | ||
yield f | ||
logger.debug("FSSpecTarget.open yielded") | ||
of = self.fs.open(full_path, **kwargs) | ||
logger.debug(f"FSSpecTarget.open yielding {of}") | ||
yield of | ||
logger.debug("FSSpecTarget.open yielded") | ||
of.close() | ||
Comment on lines
+127
to
+131
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a subtle but important change. We used to do with self.fs.open(full_path, **kwargs) as f:
yield f # -> _io.BufferedReader Now we basically do yield self.fs.open(full_path, **kwargs) # -> fsspec.core.OpenFile This passes all our tests, but this is a sensitive area of the code. @martindurant, can you think of anything that could go wrong here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Correction: self.fs.open returns a file-like object (io.IOBase subclass, AbstractBufferedFile for s3, gcs, etc.) For an AbstractBufferedFile, entering the context is a no-op yielding the same object, and exiting calls For fsspec.open, you get OpenFile objects, which can contain more than one file-like layer, for compression and text mode. You must use this with a context, to get the outermost file-like object back. Leaving the context calls close on the file-like objects in reverse order. If you, instead, call |
||
|
||
def __post_init__(self): | ||
if not self.fs.isdir(self.root_path): | ||
|
@@ -189,7 +194,7 @@ def file_opener( | |
bypass_open: bool = False, | ||
secrets: Optional[dict] = None, | ||
**open_kwargs, | ||
) -> Iterator[Union[OpenFileType, str]]: | ||
) -> Iterator[Union[fsspec.core.OpenFile, str]]: | ||
""" | ||
Context manager for opening files. | ||
|
||
|
@@ -201,6 +206,7 @@ def file_opener( | |
before opening. In this case, function yields a path name rather than an open file. | ||
:param bypass_open: If True, skip trying to open the file at all and just | ||
return the filename back directly. (A fancy way of doing nothing!) | ||
:param secrets: Dictionary of secrets to encode into the query string. | ||
""" | ||
|
||
if bypass_open: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martindurant - could you explain this
else
block? Under what circumstances wouldprotocol
not be a string?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be a tuple of strings, like
("gcs", "gs")