Reroute `Video` cattr to `Video.from_filename` #1327

roomrys · 2023-05-25T17:48:12Z

Description

Lately, we have been experiencing an intermittent failure revolving around restructuring Videos. Namely, we have seen that SingleImageVideos will occasionally be routed to the NumpyVideo backend when restructuring via cattr. This PR reroutes the cattr Video restructuring to Video.from_filename (which is the default method when adding Videos to the GUI).

Update

Using Video.from_filename when restructuring leads to some strange behavior in Labels.copy. Namely, what if we want to load in a labels object even if the video filename cannot be found - just to access other attributes in the Labels object? Unless we handle this loading using a Labels.make_video_search first, then we cannot load the Labels without being able to locate the video file. So, back to the drawing board: see #1330.

Past attempted fix: #1267

Test failure

# Test reset does not break deserialization of older slp
        labels: Labels = Labels.load_file(
>           TEST_SLP_SIV_ROBOT, video_search="tests/data/videos/"
        )

tests/io/test_video.py:552: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
sleap/io/dataset.py:1966: in load_file
    filename, for_object="labels", video_search=video_search, *args, **kwargs
sleap/io/format/main.py:113: in read
    return disp.read(filename, *args, **kwargs)
sleap/io/format/dispatch.py:56: in read
    return adaptor.read(file, *args, **kwargs)
sleap/io/format/hdf5.py:139: in read
    labels = cls.read_headers(file, video_search, match_to)
sleap/io/format/hdf5.py:124: in read_headers
    labels = labels_json.LabelsJsonAdaptor.from_json_data(dicts, match_to=match_to)
sleap/io/format/labels_json.py:386: in from_json_data
    videos = Video.cattr().structure(dicts["videos"], List[Video])
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:201: in structure
    return self._structure_func.dispatch(cl)(obj, cl)
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:336: in _structure_list
    for e in obj
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:336: in <listcomp>
    for e in obj
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:[323](https://github.com/talmolab/sleap/actions/runs/4659970232/jobs/8247475372#step:8:324): in structure_attrs_fromdict
    dispatch(type_)(val, type_) if type_ is not None else val
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:[407](https://github.com/talmolab/sleap/actions/runs/4659970232/jobs/8247475372#step:8:408): in _structure_union
    return self._structure_func.dispatch(cl)(obj, cl)
sleap/io/video.py:1541: in fixup_video
    return Video.make_specific_backend(cl, x)
sleap/io/video.py:1523: in make_specific_backend
    return backend_class(**attribute_kwargs)
<attrs generated init sleap.io.video.NumpyVideo>:3: in __init__
    self.__attrs_post_init__()
sleap/io/video.py:538: in __attrs_post_init__
    self.__data = np.load(self.filename)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

file = 'tests/data/videos/robot0.jpg', mmap_mode = None, allow_pickle = False
fix_imports = True, encoding = 'ASCII'

    @set_module('numpy')
    def load(file, mmap_mode=None, allow_pickle=False, fix_imports=True,
             encoding='ASCII'):
        """
        Load arrays or pickled objects from ``.npy``, ``.npz`` or pickled files.
    
        .. warning:: Loading files that contain object arrays uses the ``pickle``
                     module, which is not secure against erroneous or maliciously
                     constructed data. Consider passing ``allow_pickle=False`` to
                     load data that is known not to contain object arrays for the
                     safer handling of untrusted sources.
    
        Parameters
        ----------
        file : file-like object, string, or pathlib.Path
            The file to read. File-like objects must support the
            ``seek()`` and ``read()`` methods. Pickled files require that the
            file-like object support the ``readline()`` method as well.
        mmap_mode : {None, 'r+', 'r', 'w+', 'c'}, optional
            If not None, then memory-map the file, using the given mode (see
            `numpy.memmap` for a detailed description of the modes).  A
            memory-mapped array is kept on disk. However, it can be accessed
            and sliced like any ndarray.  Memory mapping is especially useful
            for accessing small fragments of large files without reading the
            entire file into memory.
        allow_pickle : bool, optional
            Allow loading pickled object arrays stored in npy files. Reasons for
            disallowing pickles include security, as loading pickled data can
            execute arbitrary code. If pickles are disallowed, loading object
            arrays will fail. Default: False
    
            .. versionchanged:: 1.16.3
                Made default False in response to CVE-2019-6[446](https://github.com/talmolab/sleap/actions/runs/4659970232/jobs/8247475372#step:8:447).
    
        fix_imports : bool, optional
            Only useful when loading Python 2 generated pickled files on Python 3,
            which includes npy/npz files containing object arrays. If `fix_imports`
            is True, pickle will try to map the old Python 2 names to the new names
            used in Python 3.
        encoding : str, optional
            What encoding to use when reading Python 2 strings. Only useful when
            loading Python 2 generated pickled files in Python 3, which includes
            npy/npz files containing object arrays. Values other than 'latin1',
            'ASCII', and 'bytes' are not allowed, as they can corrupt numerical
            data. Default: 'ASCII'
    
        Returns
        -------
        result : array, tuple, dict, etc.
            Data stored in the file. For ``.npz`` files, the returned instance
            of NpzFile class must be closed to avoid leaking file descriptors.
    
        Raises
        ------
        IOError
            If the input file does not exist or cannot be read.
        ValueError
            The file contains an object array, but allow_pickle=False given.
    
        See Also
        --------
        save, savez, savez_compressed, loadtxt
        memmap : Create a memory-map to an array stored in a file on disk.
        lib.format.open_memmap : Create or load a memory-mapped ``.npy`` file.
    
        Notes
        -----
        - If the file contains pickle data, then whatever object is stored
          in the pickle is returned.
        - If the file is a ``.npy`` file, then a single array is returned.
        - If the file is a ``.npz`` file, then a dictionary-like object is
          returned, containing ``{filename: array}`` key-value pairs, one for
          each file in the archive.
        - If the file is a ``.npz`` file, the returned value supports the
          context manager protocol in a similar fashion to the open function::
    
            with load('foo.npz') as data:
                a = data['a']
    
          The underlying file descriptor is closed when exiting the 'with'
          block.
    
        Examples
        --------
        Store data to disk, and load it again:
    
        >>> np.save('/tmp/123', np.array([[1, 2, 3], [4, 5, 6]]))
        >>> np.load('/tmp/123.npy')
        array([[1, 2, 3],
               [4, 5, 6]])
    
        Store compressed data to disk, and load it again:
    
        >>> a=np.array([[1, 2, 3], [4, 5, 6]])
        >>> b=np.array([1, 2])
        >>> np.savez('/tmp/123.npz', a=a, b=b)
        >>> data = np.load('/tmp/123.npz')
        >>> data['a']
        array([[1, 2, 3],
               [4, 5, 6]])
        >>> data['b']
        array([1, 2])
        >>> data.close()
    
        Mem-map the stored array, and then access the second row
        directly from disk:
    
        >>> X = np.load('/tmp/123.npy', mmap_mode='r')
        >>> X[1, :]
        memmap([4, 5, 6])
    
        """
        if encoding not in ('ASCII', 'latin1', 'bytes'):
            # The 'encoding' value for pickle also affects what encoding
            # the serialized binary data of NumPy arrays is loaded
            # in. Pickle does not pass on the encoding information to
            # NumPy. The unpickling code in numpy.core.multiarray is
            # written to assume that unicode data appearing where binary
            # should be is in 'latin1'. 'bytes' is also safe, as is 'ASCII'.
            #
            # Other encoding values can corrupt binary data, and we
            # purposefully disallow them. For the same reason, the errors=
            # argument is not exposed, as values other than 'strict'
            # result can similarly silently corrupt numerical data.
            raise ValueError("encoding must be 'ASCII', 'latin1', or 'bytes'")
    
        pickle_kwargs = dict(encoding=encoding, fix_imports=fix_imports)
    
        with contextlib.ExitStack() as stack:
            if hasattr(file, 'read'):
                fid = file
                own_fid = False
            else:
                fid = stack.enter_context(open(os_fspath(file), "rb"))
                own_fid = True
    
            # Code to distinguish from NumPy binary files and pickles.
            _ZIP_PREFIX = b'PK\x03\x04'
            _ZIP_SUFFIX = b'PK\x05\x06' # empty zip files start with this
            N = len(format.MAGIC_PREFIX)
            magic = fid.read(N)
            # If the file size is less than N, we need to make sure not
            # to seek past the beginning of the file
            fid.seek(-min(N, len(magic)), 1)  # back-up
            if magic.startswith(_ZIP_PREFIX) or magic.startswith(_ZIP_SUFFIX):
                # zip-file (assume .npz)
                # Potentially transfer file ownership to NpzFile
                stack.pop_all()
                ret = NpzFile(fid, own_fid=own_fid, allow_pickle=allow_pickle,
                              pickle_kwargs=pickle_kwargs)
                return ret
            elif magic == format.MAGIC_PREFIX:
                # .npy file
                if mmap_mode:
                    return format.open_memmap(file, mode=mmap_mode)
                else:
                    return format.read_array(fid, allow_pickle=allow_pickle,
                                             pickle_kwargs=pickle_kwargs)
            else:
                # Try a pickle
                if not allow_pickle:
>                   raise ValueError("Cannot load file containing pickled data "
                                     "when allow_pickle=False")
E                   ValueError: Cannot load file containing pickled data when allow_pickle=False

Types of changes

Does this address any currently open issues?

metrics.npz file to be generated after training a SLEAP model #1286

Outside contributors checklist

Review the guidelines for contributing to this repository
Read and sign the CLA and add yourself to the authors list
Make sure you are making a pull request against the develop branch (not main). Also you should start your branch off develop
Add tests that prove your fix is effective or that your feature works
Add necessary documentation (if appropriate)

Thank you for contributing to SLEAP!

❤️

roomrys · 2023-05-25T17:48:43Z

sleap/io/video.py

+            filename_key = "could not find filename key"
+            backend = x["backend"]
+            if "filename" in backend:
+                backend["filename"] = Video.fixup_path(backend["filename"])
+                filename_key = "filename"
+            if "file" in backend:
+                backend["file"] = Video.fixup_path(backend["file"])
+                filename_key = "file"
+
+            filename = backend.pop(filename_key)
+            return Video.from_filename(filename, **backend)


Would we expect any other filename keys?

Reroute Video cattr to Video.from_filename

6a4c65b

roomrys commented May 25, 2023

View reviewed changes

roomrys mentioned this pull request May 27, 2023

Fix SingleImageVideo caching #1330

Merged

11 tasks

roomrys closed this May 27, 2023

roomrys deleted the liezl/reroute-video-cattr-to-from_filename branch June 2, 2023 18:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reroute `Video` cattr to `Video.from_filename` #1327

Reroute `Video` cattr to `Video.from_filename` #1327

roomrys commented May 25, 2023 •

edited

Loading

roomrys May 25, 2023

Reroute Video cattr to Video.from_filename #1327

Reroute Video cattr to Video.from_filename #1327

Conversation

roomrys commented May 25, 2023 • edited Loading

Description

Update

Types of changes

Does this address any currently open issues?

Outside contributors checklist

Thank you for contributing to SLEAP!

roomrys May 25, 2023

Choose a reason for hiding this comment

Reroute `Video` cattr to `Video.from_filename` #1327

Reroute `Video` cattr to `Video.from_filename` #1327

roomrys commented May 25, 2023 •

edited

Loading