-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: basic fsspec writing #1016
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we know uproot will rewrite many parts of the file as it progresses, I wonder if it makes sense to universally append a simplecache::
to any protocol that isn't already a local one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It all looks good! I see that it's tested for local files, in-memory files, and ssh, which are three cases that make sense (depending on how remote the ssh server is).
I'll need to know more about what a "chain of filesystems" would do. Is it something we'd even be able to use, given our API?
I think it's ready to merge. I don't know why the one case below isn't raising an error, and it's an odd way to make a note-to-self about what to do next.
I guess it has some niche uses, for example, opening a root file inside a zip file available via http. It should also work for writing. But in general I think only the local caching for writing will be used "frequently". |
👋 I just wanted to raise that this PR breaks def test_export_root_histogram(mocker, tmp_path):
"""
Test that pyhf.writexml._export_root_histogram writes out a histogram
in the manner that uproot is expecting and verifies this by reading
the serialized file
"""
mocker.patch("pyhf.writexml._ROOT_DATA_FILE", {})
pyhf.writexml._export_root_histogram("hist", [0, 1, 2, 3, 4, 5, 6, 7, 8])
with uproot.recreate(tmp_path.joinpath("test_export_root_histogram.root")) as file:
file["hist"] = pyhf.writexml._ROOT_DATA_FILE["hist"] which then fails with a Collapsed for space:========================================================================================== FAILURES ==========================================================================================
_________________________________________________________________________________ test_export_root_histogram _________________________________________________________________________________
mocker = <pytest_mock.plugin.MockerFixture object at 0x7ff50e8d4cd0>, tmp_path = PosixPath('/tmp/pytest-of-feickert/pytest-11/test_export_root_histogram0')
def test_export_root_histogram(mocker, tmp_path):
"""
Test that pyhf.writexml._export_root_histogram writes out a histogram
in the manner that uproot is expecting and verifies this by reading
the serialized file
"""
mocker.patch("pyhf.writexml._ROOT_DATA_FILE", {})
pyhf.writexml._export_root_histogram("hist", [0, 1, 2, 3, 4, 5, 6, 7, 8])
> with uproot.recreate(tmp_path.joinpath("test_export_root_histogram.root")) as file:
mocker = <pytest_mock.plugin.MockerFixture object at 0x7ff50e8d4cd0>
tmp_path = PosixPath('/tmp/pytest-of-feickert/pytest-11/test_export_root_histogram0')
tests/test_export.py:420:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../.pyenv/versions/pyhf-dev-cpu/lib/python3.11/site-packages/uproot/writing/writable.py:148: in recreate
sink = _sink_from_path(file_path, **storage_options)
file_path = PosixPath('/tmp/pytest-of-feickert/pytest-11/test_export_root_histogram0/test_export_root_histogram.root')
options = {}
storage_options = {}
../../../.pyenv/versions/pyhf-dev-cpu/lib/python3.11/site-packages/uproot/writing/writable.py:80: in _sink_from_path
return uproot.sink.file.FileSink.from_object(file_path_or_object)
file_path_or_object = PosixPath('/tmp/pytest-of-feickert/pytest-11/test_export_root_histogram0/test_export_root_histogram.root')
storage_options = {}
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cls = <class 'uproot.sink.file.FileSink'>, obj = PosixPath('/tmp/pytest-of-feickert/pytest-11/test_export_root_histogram0/test_export_root_histogram.root')
@classmethod
def from_object(cls, obj) -> FileSink:
"""
Args:
obj (file-like object): An object with ``read``, ``write``, ``seek``,
``tell``, and ``flush`` methods.
Creates a :doc:`uproot.sink.file.FileSink` from a file-like object, such
as ``io.BytesIO``. The object must be readable, writable, and seekable
with ``"r+b"`` mode semantics.
"""
if (
callable(getattr(obj, "read", None))
and callable(getattr(obj, "write", None))
and callable(getattr(obj, "seek", None))
and callable(getattr(obj, "tell", None))
and callable(getattr(obj, "flush", None))
and (not hasattr(obj, "readable") or obj.readable())
and (not hasattr(obj, "writable") or obj.writable())
and (not hasattr(obj, "seekable") or obj.seekable())
):
self = cls(None)
self._file = obj
else:
> raise TypeError(
"""writable file can only be created from a file path or an object
* that has 'read', 'write', 'seek', and 'tell' methods
* is 'readable() and writable() and seekable()'"""
)
E TypeError: writable file can only be created from a file path or an object
E
E * that has 'read', 'write', 'seek', and 'tell' methods
E * is 'readable() and writable() and seekable()'
cls = <class 'uproot.sink.file.FileSink'>
obj = PosixPath('/tmp/pytest-of-feickert/pytest-11/test_export_root_histogram0/test_export_root_histogram.root')
../../../.pyenv/versions/pyhf-dev-cpu/lib/python3.11/site-packages/uproot/sink/file.py:55: TypeError
================================================================================== short test summary info ===================================================================================
FAILED tests/test_export.py::test_export_root_histogram - TypeError: writable file can only be created from a file path or an object I can raise an Issue if that's helpful, but I thought I'd start here in case this is relevant to ongoing discussion. |
Hi, I ran into the same thing in import pathlib
import uproot
uproot.recreate(pathlib.Path("f.root")) I will open an issue for reference. -> #1029 |
Looking at the stack trace, it must be because pyhf was relying on passing a file-like object callable(getattr(obj, "read", None))
and callable(getattr(obj, "write", None))
and callable(getattr(obj, "seek", None))
and callable(getattr(obj, "tell", None))
and callable(getattr(obj, "flush", None))
and (not hasattr(obj, "readable") or obj.readable())
and (not hasattr(obj, "writable") or obj.writable())
and (not hasattr(obj, "seekable") or obj.seekable()) and this has somehow changed. (It wasn't supposed to.) @lobis? I just tried a test and I don't see what's wrong with it. import os
import numpy as np
import uproot
class FileLikeObject:
def __init__(self):
self.data = np.zeros(0, dtype=np.uint8)
self.pos = 0
def tobytes(self) -> bytes:
return self.data.tobytes()
def ensure(self, size: int) -> None:
if size > len(self.data):
data = np.zeros(size, dtype=np.uint8)
data[:len(self.data)] = self.data
self.data = data
def read(self, num_bytes: int) -> bytes:
self.pos += num_bytes
self.ensure(self.pos)
return self.data[self.pos - num_bytes : self.pos].tobytes()
def write(self, data: bytes) -> None:
self.pos += len(data)
self.ensure(self.pos)
self.data[self.pos - len(data) : self.pos] = np.frombuffer(data, np.uint8)
def seek(self, offset: int, whence: int = os.SEEK_SET) -> None:
assert whence in (os.SEEK_SET, os.SEEK_CUR, os.SEEK_END)
if whence == os.SEEK_SET:
self.pos = offset
elif whence == os.SEEK_CUR:
self.pos += offset
elif whence == os.SEEK_END:
self.pos = len(self.data) + offset
self.ensure(self.pos)
def tell(self) -> int:
return self.pos
def flush(self) -> None:
pass
# def readable(self) -> bool:
# return True
# def writable(self) -> bool:
# return True
# def seekable(self) -> bool:
# return True
file = FileLikeObject()
writable_file = uproot.recreate(file)
writable_file["hist"] = np.histogram(np.random.normal(0, 1, 1000), bins=100, range=(-5, 5))
file.tobytes() # lots of bytes
readable_file = uproot.open(file)
readable_file["hist"].to_hist() It doesn't matter whether the |
I think this happened via the |
I need to read all of my email before responding to anything... |
(sorry I didn't realise there was activity here after closing the issue). There is a check to check if the argument is a path or an object. It was doing the check poorly (checking if it's a string) and failed when a path object was being passed. Now it tests for object (with a new helper function), otherwise assumes path-like object (and transforms to string). #1031 is ready to be merged with the fix. |
This PR adds writing of root files via fsspec which allows writing files to some interesting backends such as ssh, memory, s3, etc. Some are not yet supported such as http(s). In principle all the fsspec backends that support "wb" mode should work fine.
If the backend requires some special option, it can be passed in the
uproot.recreate
call such asuproot.recreate("ssh:///tmp/file.root", host="localhost", user="user")
.This is a basic implementation. File is writen on close.
The implementation is as follows:
uproot.recreate
method. The sink will hold a reference to the object and is responsible for closing it.Minor features:
As mentioned this is a basic writer using fsspec. When PR becomes a requirement I will refactor the sink class and the surrounding code to make this cleaner.