Fix intermittent bug (file-corruption) that has snuck into the tests #1267

Merged
roomrys merged 3 commits into develop from liezl/replace-corrupted-test-file on Apr 11, 2023

Conversation

roomrys (Collaborator) commented Apr 10, 2023

Description

This PR fixes an intermittent bug that has snuck into our tests. It seems that TEST_SLP_SIV_ROBOT = "tests\data\siv_format_v1\robot_siv.slp" has become corrupted (per this post). This was my fault, due to a strange test set-up I had on my laptop with conflicting paths. I grabbed an older, intact version of the robot_siv.slp file and replaced the corrupted file with it. As the traceback below shows, the corruption surfaces when the videos entry in the .slp is deserialized: a NumpyVideo backend ends up pointing at tests/data/videos/robot0.jpg, which np.load refuses to read.
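
As a sanity check (not part of this PR's diff), the replacement file can be verified by loading it through the same entry point the failing test uses. A minimal sketch, assuming a working SLEAP dev environment and the repo-relative paths from the test suite:

    from sleap.io.dataset import Labels

    # Same constant and entry point as the failing test below.
    TEST_SLP_SIV_ROBOT = "tests/data/siv_format_v1/robot_siv.slp"

    labels = Labels.load_file(TEST_SLP_SIV_ROBOT, video_search="tests/data/videos/")
    # If the file is corrupted as before, this raises ValueError during load.
    print(f"Loaded {len(labels)} labeled frames")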

Test failure
# Test reset does not break deserialization of older slp
        labels: Labels = Labels.load_file(
>           TEST_SLP_SIV_ROBOT, video_search="tests/data/videos/"
        )

tests/io/test_video.py:552: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
sleap/io/dataset.py:1966: in load_file
    filename, for_object="labels", video_search=video_search, *args, **kwargs
sleap/io/format/main.py:113: in read
    return disp.read(filename, *args, **kwargs)
sleap/io/format/dispatch.py:56: in read
    return adaptor.read(file, *args, **kwargs)
sleap/io/format/hdf5.py:139: in read
    labels = cls.read_headers(file, video_search, match_to)
sleap/io/format/hdf5.py:124: in read_headers
    labels = labels_json.LabelsJsonAdaptor.from_json_data(dicts, match_to=match_to)
sleap/io/format/labels_json.py:386: in from_json_data
    videos = Video.cattr().structure(dicts["videos"], List[Video])
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:201: in structure
    return self._structure_func.dispatch(cl)(obj, cl)
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:336: in _structure_list
    for e in obj
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:336: in <listcomp>
    for e in obj
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:323: in structure_attrs_fromdict
    dispatch(type_)(val, type_) if type_ is not None else val
/usr/share/miniconda/envs/sleap_ci/lib/python3.7/site-packages/cattr/converters.py:407: in _structure_union
    return self._structure_func.dispatch(cl)(obj, cl)
sleap/io/video.py:1541: in fixup_video
    return Video.make_specific_backend(cl, x)
sleap/io/video.py:1523: in make_specific_backend
    return backend_class(**attribute_kwargs)
<attrs generated init sleap.io.video.NumpyVideo>:3: in __init__
    self.__attrs_post_init__()
sleap/io/video.py:538: in __attrs_post_init__
    self.__data = np.load(self.filename)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

file = 'tests/data/videos/robot0.jpg', mmap_mode = None, allow_pickle = False
fix_imports = True, encoding = 'ASCII'

    @set_module('numpy')
    def load(file, mmap_mode=None, allow_pickle=False, fix_imports=True,
             encoding='ASCII'):
        """
        Load arrays or pickled objects from ``.npy``, ``.npz`` or pickled files.
    
        .. warning:: Loading files that contain object arrays uses the ``pickle``
                     module, which is not secure against erroneous or maliciously
                     constructed data. Consider passing ``allow_pickle=False`` to
                     load data that is known not to contain object arrays for the
                     safer handling of untrusted sources.
    
        Parameters
        ----------
        file : file-like object, string, or pathlib.Path
            The file to read. File-like objects must support the
            ``seek()`` and ``read()`` methods. Pickled files require that the
            file-like object support the ``readline()`` method as well.
        mmap_mode : {None, 'r+', 'r', 'w+', 'c'}, optional
            If not None, then memory-map the file, using the given mode (see
            `numpy.memmap` for a detailed description of the modes).  A
            memory-mapped array is kept on disk. However, it can be accessed
            and sliced like any ndarray.  Memory mapping is especially useful
            for accessing small fragments of large files without reading the
            entire file into memory.
        allow_pickle : bool, optional
            Allow loading pickled object arrays stored in npy files. Reasons for
            disallowing pickles include security, as loading pickled data can
            execute arbitrary code. If pickles are disallowed, loading object
            arrays will fail. Default: False
    
            .. versionchanged:: 1.16.3
                Made default False in response to CVE-2019-6446.
    
        fix_imports : bool, optional
            Only useful when loading Python 2 generated pickled files on Python 3,
            which includes npy/npz files containing object arrays. If `fix_imports`
            is True, pickle will try to map the old Python 2 names to the new names
            used in Python 3.
        encoding : str, optional
            What encoding to use when reading Python 2 strings. Only useful when
            loading Python 2 generated pickled files in Python 3, which includes
            npy/npz files containing object arrays. Values other than 'latin1',
            'ASCII', and 'bytes' are not allowed, as they can corrupt numerical
            data. Default: 'ASCII'
    
        Returns
        -------
        result : array, tuple, dict, etc.
            Data stored in the file. For ``.npz`` files, the returned instance
            of NpzFile class must be closed to avoid leaking file descriptors.
    
        Raises
        ------
        IOError
            If the input file does not exist or cannot be read.
        ValueError
            The file contains an object array, but allow_pickle=False given.
    
        See Also
        --------
        save, savez, savez_compressed, loadtxt
        memmap : Create a memory-map to an array stored in a file on disk.
        lib.format.open_memmap : Create or load a memory-mapped ``.npy`` file.
    
        Notes
        -----
        - If the file contains pickle data, then whatever object is stored
          in the pickle is returned.
        - If the file is a ``.npy`` file, then a single array is returned.
        - If the file is a ``.npz`` file, then a dictionary-like object is
          returned, containing ``{filename: array}`` key-value pairs, one for
          each file in the archive.
        - If the file is a ``.npz`` file, the returned value supports the
          context manager protocol in a similar fashion to the open function::
    
            with load('foo.npz') as data:
                a = data['a']
    
          The underlying file descriptor is closed when exiting the 'with'
          block.
    
        Examples
        --------
        Store data to disk, and load it again:
    
        >>> np.save('/tmp/123', np.array([[1, 2, 3], [4, 5, 6]]))
        >>> np.load('/tmp/123.npy')
        array([[1, 2, 3],
               [4, 5, 6]])
    
        Store compressed data to disk, and load it again:
    
        >>> a=np.array([[1, 2, 3], [4, 5, 6]])
        >>> b=np.array([1, 2])
        >>> np.savez('/tmp/123.npz', a=a, b=b)
        >>> data = np.load('/tmp/123.npz')
        >>> data['a']
        array([[1, 2, 3],
               [4, 5, 6]])
        >>> data['b']
        array([1, 2])
        >>> data.close()
    
        Mem-map the stored array, and then access the second row
        directly from disk:
    
        >>> X = np.load('/tmp/123.npy', mmap_mode='r')
        >>> X[1, :]
        memmap([4, 5, 6])
    
        """
        if encoding not in ('ASCII', 'latin1', 'bytes'):
            # The 'encoding' value for pickle also affects what encoding
            # the serialized binary data of NumPy arrays is loaded
            # in. Pickle does not pass on the encoding information to
            # NumPy. The unpickling code in numpy.core.multiarray is
            # written to assume that unicode data appearing where binary
            # should be is in 'latin1'. 'bytes' is also safe, as is 'ASCII'.
            #
            # Other encoding values can corrupt binary data, and we
            # purposefully disallow them. For the same reason, the errors=
            # argument is not exposed, as values other than 'strict'
            # result can similarly silently corrupt numerical data.
            raise ValueError("encoding must be 'ASCII', 'latin1', or 'bytes'")
    
        pickle_kwargs = dict(encoding=encoding, fix_imports=fix_imports)
    
        with contextlib.ExitStack() as stack:
            if hasattr(file, 'read'):
                fid = file
                own_fid = False
            else:
                fid = stack.enter_context(open(os_fspath(file), "rb"))
                own_fid = True
    
            # Code to distinguish from NumPy binary files and pickles.
            _ZIP_PREFIX = b'PK\x03\x04'
            _ZIP_SUFFIX = b'PK\x05\x06' # empty zip files start with this
            N = len(format.MAGIC_PREFIX)
            magic = fid.read(N)
            # If the file size is less than N, we need to make sure not
            # to seek past the beginning of the file
            fid.seek(-min(N, len(magic)), 1)  # back-up
            if magic.startswith(_ZIP_PREFIX) or magic.startswith(_ZIP_SUFFIX):
                # zip-file (assume .npz)
                # Potentially transfer file ownership to NpzFile
                stack.pop_all()
                ret = NpzFile(fid, own_fid=own_fid, allow_pickle=allow_pickle,
                              pickle_kwargs=pickle_kwargs)
                return ret
            elif magic == format.MAGIC_PREFIX:
                # .npy file
                if mmap_mode:
                    return format.open_memmap(file, mode=mmap_mode)
                else:
                    return format.read_array(fid, allow_pickle=allow_pickle,
                                             pickle_kwargs=pickle_kwargs)
            else:
                # Try a pickle
                if not allow_pickle:
>                   raise ValueError("Cannot load file containing pickled data "
                                     "when allow_pickle=False")
E                   ValueError: Cannot load file containing pickled data when allow_pickle=False
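
For context, a minimal reproduction sketch (mine, not from the PR): np.load recognizes only the .npy/.npz magic bytes, and any other content falls through to the pickle branch, which is rejected when allow_pickle=False. Since the corrupted .slp pointed a NumpyVideo backend at the JPEG tests/data/videos/robot0.jpg, np.load hit exactly this branch. The file path below is hypothetical:

    import numpy as np

    # Write JPEG magic bytes, not NumPy's b"\x93NUMPY" prefix (hypothetical path).
    with open("/tmp/not_an_array.npy", "wb") as f:
        f.write(b"\xff\xd8\xff\xe0")

    try:
        np.load("/tmp/not_an_array.npy")  # allow_pickle=False is the default
    except ValueError as err:
        print(err)  # Cannot load file containing pickled data when allow_pickle=False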

Types of changes
  • Bugfix
  • New feature
  • Refactor / Code style update (no logical changes)
  • Build / CI changes
  • Documentation Update
  • Other (explain)

Does this address any currently open issues?

[list open issues here]

Outside contributors checklist

  • Review the guidelines for contributing to this repository
  • Read and sign the CLA and add yourself to the authors list
  • Make sure you are making a pull request against the develop branch (not main). Also, you should start your branch off of develop
  • Add tests that prove your fix is effective or that your feature works
  • Add necessary documentation (if appropriate)

Thank you for contributing to SLEAP!

❤️

codecov bot commented Apr 10, 2023

Codecov Report

Merging #1267 (4c46527) into develop (4cd8c0d) will increase coverage by 0.15%.
The diff coverage is 94.38%.

@@             Coverage Diff             @@
##           develop    #1267      +/-   ##
===========================================
+ Coverage    72.28%   72.44%   +0.15%     
===========================================
  Files          132      133       +1     
  Lines        23558    23662     +104     
===========================================
+ Hits         17030    17143     +113     
+ Misses        6528     6519       -9     
Impacted Files Coverage Δ
sleap/gui/commands.py 60.63% <0.00%> (ø)
sleap/gui/widgets/docks.py 94.31% <94.31%> (ø)
sleap/gui/app.py 74.80% <100.00%> (-4.15%) ⬇️

... and 3 files with indirect coverage changes

roomrys changed the base branch from develop to liezl/add-camera-group on April 10, 2023 22:58
roomrys changed the base branch from liezl/add-camera-group to develop on April 11, 2023 19:02
roomrys marked this pull request as ready for review on April 11, 2023 19:03
roomrys merged commit b5aab0e into develop on Apr 11, 2023
roomrys deleted the liezl/replace-corrupted-test-file branch on April 12, 2023 19:50
roomrys added a commit that referenced this pull request Jun 30, 2023
* Disable data caching by default for SingleImageVideos (#1243)

* Disable data caching by default for SingleImageVideos

* Remove a couple things that shouldn't be in this PR :)

* Centralize video extensions (#1244)

* Add video extension support lists to io.video module

* Use centralized extension definitions

* Remove some unused code

* Add some coverage for indirect coverage reduction

* Fix single frame GUI increment (#1254)

* Fix frame increment for single frame videos

* Add test and lint

* Add video search to "robustify" (excessively) fragile test (#1262)

Add video search to fix fragile test

* Organize docks (#1265)

* Add `DockWidget` class and subclasses

* Create docks in `MainWindow` using `DockWidget` classes

* Remove unused imports

* Fix references in existing tests

* Fix intermittent bug (file-corruption) that has snuck into the tests (#1267)

* Remove corrupted test file

* Add non-corrupted copy of test file

* Rename the file so tests run

* Layout docks in a tab configuration (instead of stacked) (#1289)

* Add tab with to docks (and use `self.model` in table)

* Use `self.model_type` in `create_models`

* Increase GUI crop size range from 512 to 832 (#1295)

* Fix conversion to numpy array when last frame(s) do not have labels (#1307)

* Ensure frames to predict list is unique (#1293)

* Ensure frames to predict list is unique

* Ensure frames to predict on are ordered correctly

* Better frame sorting

* Fix GUI resume training (#1314)

* Do not choose `top_k` instances if `max_instances` < num centroids (#1313)

* Use max_instances as a max without any minimum requirement

* Create test (that fails)

* Fix test by re-init inference model each call

* Do not compute top k if max is greater

* Add `--max_instances` to `sleap-track` and GUI (#1305)

* Expose --max_instances to sleap-track and in GUI

* Lint

* Change shorthand to `-n` for both `sleap-export` and `sleap-track`

* Add test for creating predictor from cli

* Add max instances support to bottom up model (#1306)

* Add max instances support to bottom up model

* Remove unnecessary attribute setter in CLI parser

* Edge case

* Expose max_instances for BU to train/infer GUI

---------

Co-authored-by: roomrys <[email protected]>

* Update docs for `--max_instances` command

* Add test for BU `--max_instances`

---------

Co-authored-by: Talmo Pereira <[email protected]>

* Remove `--labels` and redundant `data_path` (#1326)

* Create copy of config info to modify (gui) (#1325)

* Fixes GPU memory polling using environment variable filtering (#1272)

* add cuda visible gpus to nvidia smi CLI

* tests masked none, single and multiple GPUs

* deal with comma separated lists and test 2 gpus

* clean up system script and better test that actually works

* add a case for cuda_visible_devices = []

* add test for gpu order and length in nvidia and tf

* test nvidia smi indices and checking gpu memory

* fix spaces

* fix linting with Black

* put skips in tests for git not having nvidia-smi

* add doc strings to get_gpu_memory

* set CUDA device order to PCI BUS ID

* test with no nvidia smi + invalid visible devices

* fixed linting again

* remove pci env variable, move to other issue

---------

Co-authored-by: Eric Leonardis <[email protected]>

* Set `split_by_inds`, `test_labels`, and `validation_labels` to default (GUI) (#1331)

* Allow returning PAF graph during low level inference (#1329)

* allow returning the paf graph during low level inference

* reformatted with black

* Fix `SingleImageVideo` caching (#1330)

* Set `SingleImageVideo.caching` as a class attribute

* Modify tests for `SingleImageVideo.caching`

* Add caching as a preference and menu checkbox

* Test `SingleImageVideo.toggle_caching`

* Remove GUI elements for `SingleImageVideo.CACHING`

* Remove remaining prefs for `SingleImageVideo.CACHING`

* Add depreciated comment

* Clean-up

* Update comments

* Bump to 1.3.1 (#1335)

* Bump to 1.3.1

* Test build to dev label

* Update environment creation (#1366)

* First 'working' env for 1.3.1, no GPU...

* First working 1.3.1 w/gpu

* Sorry, this is the first gpu working env

* Fix `imgstore` version-dependent bug

* Build and entry points work, but pip packages not installed

* Add default channels in condarc.yaml

* Rename environment.yaml to .yml

* Working build no gpu (documented)

* Env working w/gpu using tensorflow from pypi

* Working build with GPU! And pip dep

* Attempt to fix cannot find wheel ~= 0.35 issue

* Run constrain everything

* Use minimal conda req for install_requires, inc build number to get it working

* Ubuntu build and environments working

* Env creation working on M2

* Build and env working on windows (again)

* Apple silicon (M2) build working

* Pip package working (M2)

* Exclude tests and docs from pypi wheel

* Add comments to requirements

* Get ready for manual build

* Update os images, trigger dev build

* Retry manual build win and mac

* Retry mac manual build (to upload)

* Require latest pynwb, remove setuptools

* Remove old mac environment

* Add twine as a build dependency

* Add comments to manual build

* Update build.yml

* Rename "apple_silicon" to "mac"

* Update installation docs (#1365)

* Update installation docs

* Trigger website build

* Update quick install and build docs

* Add Mambaforge links

* Remove comment about experimental apple silicon support

Co-authored-by: Talmo Pereira <[email protected]>

* Fix minor typo

* Change installation docs to reference universal mac env

---------

Co-authored-by: Talmo Pereira <[email protected]>

* More flexible with python version ~=3.7

* Pin py3.7.12 for host and run (good on win)

* Add comments for why python ~=3.7

* Update ci workflow (#1371)

* Update CI workflow to include mac

* Trigger CI workflow

* Remove verbose? python version

* Migrate to micromamba

* Explicitly state environment name

* Remove environment chaching (for now)

* Use different environment file for mac

* Use correct syntax

* Add `env_hash` and `bash -l {0}`

* Remove env_file

* Try different nested variable call via format

* Use correct syntax

* Update env file reference AS -> mac

* Different path to environment file

* Checkout repo

* Use default shells

* Remove unused comments

* Fix caching attempt by changing caching key

* Remove environment caching

* Increase python in no_cuda to 3.7.17

* Less restrictive with python ~3.7 version

* More direct installation docs 🤞

* Increase build numbers for develop branch build

---------

Co-authored-by: Talmo Pereira <[email protected]>

---------

Co-authored-by: Talmo Pereira <[email protected]>
Co-authored-by: Eric Leonardis <[email protected]>
Co-authored-by: Eric Leonardis <[email protected]>
Co-authored-by: Caleb Weinreb <[email protected]>