Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reroute Video cattr to Video.from_filename #1327

Closed
wants to merge 1 commit into from

Conversation

roomrys
Copy link
Collaborator

@roomrys roomrys commented May 25, 2023

Description

Lately, we have been experiencing an intermittent failure revolving around restructuring Videos. Namely, we have seen that SingleImageVideos will occasionally be routed to the NumpyVideo backend when restructuring via cattr. This PR reroutes the cattr Video restructuring to Video.from_filename (which is the default method when adding Videos to the GUI).


Update

Using Video.from_filename when restructuring leads to some strange behavior in Labels.copy. Namely, what if we want to load in a labels object even if the video filename cannot be found - just to access other attributes in the Labels object? Unless we handle this loading using a Labels.make_video_search first, then we cannot load the Labels without being able to locate the video file. So, back to the drawing board: see #1330.


Past attempted fix: #1267

Test failure
# Test reset does not break deserialization of older slp
        labels: Labels = Labels.load_file(
>           TEST_SLP_SIV_ROBOT, video_search="tests/data/videos/"
        )

tests/io/test_video.py:552: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
sleap/io/dataset.py:1966: in load_file
    filename, for_object="labels", video_search=video_search, *args, **kwargs
sleap/io/format/main.py:113: in read
    return disp.read(filename, *args, **kwargs)
sleap/io/format/dispatch.py:56: in read
    return adaptor.read(file, *args, **kwargs)
sleap/io/format/hdf5.py:139: in read
    labels = cls.read_headers(file, video_search, match_to)
sleap/io/format/hdf5.py:124: in read_headers
    labels = labels_json.LabelsJsonAdaptor.from_json_data(dicts, match_to=match_to)
sleap/io/format/labels_json.py:386: in from_json_data
    videos = Video.cattr().structure(dicts["videos"], List[Video])
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:201: in structure
    return self._structure_func.dispatch(cl)(obj, cl)
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:336: in _structure_list
    for e in obj
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:336: in <listcomp>
    for e in obj
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:[323](https://github.com/talmolab/sleap/actions/runs/4659970232/jobs/8247475372#step:8:324): in structure_attrs_fromdict
    dispatch(type_)(val, type_) if type_ is not None else val
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:[407](https://github.com/talmolab/sleap/actions/runs/4659970232/jobs/8247475372#step:8:408): in _structure_union
    return self._structure_func.dispatch(cl)(obj, cl)
sleap/io/video.py:1541: in fixup_video
    return Video.make_specific_backend(cl, x)
sleap/io/video.py:1523: in make_specific_backend
    return backend_class(**attribute_kwargs)
<attrs generated init sleap.io.video.NumpyVideo>:3: in __init__
    self.__attrs_post_init__()
sleap/io/video.py:538: in __attrs_post_init__
    self.__data = np.load(self.filename)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

file = 'tests/data/videos/robot0.jpg', mmap_mode = None, allow_pickle = False
fix_imports = True, encoding = 'ASCII'

    @set_module('numpy')
    def load(file, mmap_mode=None, allow_pickle=False, fix_imports=True,
             encoding='ASCII'):
        """
        Load arrays or pickled objects from ``.npy``, ``.npz`` or pickled files.
    
        .. warning:: Loading files that contain object arrays uses the ``pickle``
                     module, which is not secure against erroneous or maliciously
                     constructed data. Consider passing ``allow_pickle=False`` to
                     load data that is known not to contain object arrays for the
                     safer handling of untrusted sources.
    
        Parameters
        ----------
        file : file-like object, string, or pathlib.Path
            The file to read. File-like objects must support the
            ``seek()`` and ``read()`` methods. Pickled files require that the
            file-like object support the ``readline()`` method as well.
        mmap_mode : {None, 'r+', 'r', 'w+', 'c'}, optional
            If not None, then memory-map the file, using the given mode (see
            `numpy.memmap` for a detailed description of the modes).  A
            memory-mapped array is kept on disk. However, it can be accessed
            and sliced like any ndarray.  Memory mapping is especially useful
            for accessing small fragments of large files without reading the
            entire file into memory.
        allow_pickle : bool, optional
            Allow loading pickled object arrays stored in npy files. Reasons for
            disallowing pickles include security, as loading pickled data can
            execute arbitrary code. If pickles are disallowed, loading object
            arrays will fail. Default: False
    
            .. versionchanged:: 1.16.3
                Made default False in response to CVE-2019-6[446](https://github.com/talmolab/sleap/actions/runs/4659970232/jobs/8247475372#step:8:447).
    
        fix_imports : bool, optional
            Only useful when loading Python 2 generated pickled files on Python 3,
            which includes npy/npz files containing object arrays. If `fix_imports`
            is True, pickle will try to map the old Python 2 names to the new names
            used in Python 3.
        encoding : str, optional
            What encoding to use when reading Python 2 strings. Only useful when
            loading Python 2 generated pickled files in Python 3, which includes
            npy/npz files containing object arrays. Values other than 'latin1',
            'ASCII', and 'bytes' are not allowed, as they can corrupt numerical
            data. Default: 'ASCII'
    
        Returns
        -------
        result : array, tuple, dict, etc.
            Data stored in the file. For ``.npz`` files, the returned instance
            of NpzFile class must be closed to avoid leaking file descriptors.
    
        Raises
        ------
        IOError
            If the input file does not exist or cannot be read.
        ValueError
            The file contains an object array, but allow_pickle=False given.
    
        See Also
        --------
        save, savez, savez_compressed, loadtxt
        memmap : Create a memory-map to an array stored in a file on disk.
        lib.format.open_memmap : Create or load a memory-mapped ``.npy`` file.
    
        Notes
        -----
        - If the file contains pickle data, then whatever object is stored
          in the pickle is returned.
        - If the file is a ``.npy`` file, then a single array is returned.
        - If the file is a ``.npz`` file, then a dictionary-like object is
          returned, containing ``{filename: array}`` key-value pairs, one for
          each file in the archive.
        - If the file is a ``.npz`` file, the returned value supports the
          context manager protocol in a similar fashion to the open function::
    
            with load('foo.npz') as data:
                a = data['a']
    
          The underlying file descriptor is closed when exiting the 'with'
          block.
    
        Examples
        --------
        Store data to disk, and load it again:
    
        >>> np.save('/tmp/123', np.array([[1, 2, 3], [4, 5, 6]]))
        >>> np.load('/tmp/123.npy')
        array([[1, 2, 3],
               [4, 5, 6]])
    
        Store compressed data to disk, and load it again:
    
        >>> a=np.array([[1, 2, 3], [4, 5, 6]])
        >>> b=np.array([1, 2])
        >>> np.savez('/tmp/123.npz', a=a, b=b)
        >>> data = np.load('/tmp/123.npz')
        >>> data['a']
        array([[1, 2, 3],
               [4, 5, 6]])
        >>> data['b']
        array([1, 2])
        >>> data.close()
    
        Mem-map the stored array, and then access the second row
        directly from disk:
    
        >>> X = np.load('/tmp/123.npy', mmap_mode='r')
        >>> X[1, :]
        memmap([4, 5, 6])
    
        """
        if encoding not in ('ASCII', 'latin1', 'bytes'):
            # The 'encoding' value for pickle also affects what encoding
            # the serialized binary data of NumPy arrays is loaded
            # in. Pickle does not pass on the encoding information to
            # NumPy. The unpickling code in numpy.core.multiarray is
            # written to assume that unicode data appearing where binary
            # should be is in 'latin1'. 'bytes' is also safe, as is 'ASCII'.
            #
            # Other encoding values can corrupt binary data, and we
            # purposefully disallow them. For the same reason, the errors=
            # argument is not exposed, as values other than 'strict'
            # result can similarly silently corrupt numerical data.
            raise ValueError("encoding must be 'ASCII', 'latin1', or 'bytes'")
    
        pickle_kwargs = dict(encoding=encoding, fix_imports=fix_imports)
    
        with contextlib.ExitStack() as stack:
            if hasattr(file, 'read'):
                fid = file
                own_fid = False
            else:
                fid = stack.enter_context(open(os_fspath(file), "rb"))
                own_fid = True
    
            # Code to distinguish from NumPy binary files and pickles.
            _ZIP_PREFIX = b'PK\x03\x04'
            _ZIP_SUFFIX = b'PK\x05\x06' # empty zip files start with this
            N = len(format.MAGIC_PREFIX)
            magic = fid.read(N)
            # If the file size is less than N, we need to make sure not
            # to seek past the beginning of the file
            fid.seek(-min(N, len(magic)), 1)  # back-up
            if magic.startswith(_ZIP_PREFIX) or magic.startswith(_ZIP_SUFFIX):
                # zip-file (assume .npz)
                # Potentially transfer file ownership to NpzFile
                stack.pop_all()
                ret = NpzFile(fid, own_fid=own_fid, allow_pickle=allow_pickle,
                              pickle_kwargs=pickle_kwargs)
                return ret
            elif magic == format.MAGIC_PREFIX:
                # .npy file
                if mmap_mode:
                    return format.open_memmap(file, mode=mmap_mode)
                else:
                    return format.read_array(fid, allow_pickle=allow_pickle,
                                             pickle_kwargs=pickle_kwargs)
            else:
                # Try a pickle
                if not allow_pickle:
>                   raise ValueError("Cannot load file containing pickled data "
                                     "when allow_pickle=False")
E                   ValueError: Cannot load file containing pickled data when allow_pickle=False

Types of changes

  • Bugfix
  • New feature
  • Refactor / Code style update (no logical changes)
  • Build / CI changes
  • Documentation Update
  • Other (explain)

Does this address any currently open issues?

Outside contributors checklist

  • Review the guidelines for contributing to this repository
  • Read and sign the CLA and add yourself to the authors list
  • Make sure you are making a pull request against the develop branch (not main). Also you should start your branch off develop
  • Add tests that prove your fix is effective or that your feature works
  • Add necessary documentation (if appropriate)

Thank you for contributing to SLEAP!

❤️

Comment on lines +1538 to +1548
filename_key = "could not find filename key"
backend = x["backend"]
if "filename" in backend:
backend["filename"] = Video.fixup_path(backend["filename"])
filename_key = "filename"
if "file" in backend:
backend["file"] = Video.fixup_path(backend["file"])
filename_key = "file"

filename = backend.pop(filename_key)
return Video.from_filename(filename, **backend)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we expect any other filename keys?

@roomrys roomrys mentioned this pull request May 27, 2023
11 tasks
@roomrys roomrys closed this May 27, 2023
@roomrys roomrys deleted the liezl/reroute-video-cattr-to-from_filename branch June 2, 2023 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant