Add pcodec #501

rabernat · 2024-01-24T15:17:20Z

This PR adds an exciting new codec to numcodecs: pcodec.

From the point of view of the V3 codec spec, pcodec is an array -> bytes codec. It takes a numpy array of dtype i4, i8, u4, u8, f4, or f8 and turns it into encoded bytes (and reverses this for decoding). numcodecs doesn't seem to have formalized this way of categorizing codecs yet (e.g. there is no numcodecs.abc.ArrayToBytesCodec). I've done my best to apply appropriate testing.
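
To make the array -> bytes contract concrete, here is a minimal sketch. It does not use pcodec or numcodecs: `zlib` stands in for the compressor, a plain Python list stands in for the numpy array, and the names `SUPPORTED`, `encode_array`, and `decode_array` are hypothetical. It only illustrates restricting input to the six dtypes listed above and round-tripping through bytes.

```python
import struct
import zlib

# Map the six dtype codes named above to struct format characters.
# The "<" prefix used below forces little-endian, fixed-size packing.
SUPPORTED = {"i4": "i", "i8": "q", "u4": "I", "u8": "Q", "f4": "f", "f8": "d"}


def encode_array(values, dtype):
    """Array -> bytes: pack the values, then compress (zlib is a stand-in)."""
    if dtype not in SUPPORTED:
        raise TypeError(
            f"unsupported dtype {dtype!r}; expected one of {sorted(SUPPORTED)}"
        )
    raw = struct.pack(f"<{len(values)}{SUPPORTED[dtype]}", *values)
    return zlib.compress(raw)


def decode_array(data, dtype):
    """Bytes -> array: decompress, then unpack using the same dtype."""
    fmt = SUPPORTED[dtype]
    raw = zlib.decompress(data)
    count = len(raw) // struct.calcsize("<" + fmt)
    return list(struct.unpack(f"<{count}{fmt}", raw))


encoded = encode_array([1, 2, 3, 2**40], "i8")
assert isinstance(encoded, bytes)
assert decode_array(encoded, "i8") == [1, 2, 3, 2**40]
```

Note that the decoder needs the dtype to interpret the bytes, which is the main practical difference from a bytes -> bytes codec.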

TODO:

Unit tests and/or doctests in docstrings
Tests pass locally
Docstrings and API docs for any new/modified user-facing classes and functions
Changes documented in docs/release.rst
Docs build locally
GitHub Actions CI passes
Test coverage to 100% (Codecov passes)

pep8speaks · 2024-01-24T15:17:28Z

Hello @rabernat! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file numcodecs/__init__.py:

Line 119:1: E402 module level import not at top of file
Line 120:23: W292 no newline at end of file

Comment last updated at 2024-02-23 21:25:17 UTC

codecov · 2024-01-24T15:23:07Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.91%. Comparing base (0878717) to head (d637773).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #501   +/-   ##
=======================================
  Coverage   99.91%   99.91%           
=======================================
  Files          57       59    +2     
  Lines        2265     2323   +58     
=======================================
+ Hits         2263     2321   +58     
  Misses          2        2

Files	Coverage Δ
numcodecs/__init__.py	`100.00% <100.00%> (ø)`
numcodecs/pcodec.py	`100.00% <100.00%> (ø)`
numcodecs/tests/common.py	`100.00% <100.00%> (ø)`
numcodecs/tests/test_pcodec.py	`100.00% <100.00%> (ø)`

martindurant

That's a lot of test files!

numcodecs/pcodec.py

rabernat · 2024-01-24T15:34:15Z

> That's a lot of test files!

The test files were auto generated. Do you think this is a problem?

martindurant · 2024-01-24T15:36:19Z

If you can generate them reliably at CI/test time even better; but they are not big, so I am not worried.

rabernat · 2024-01-24T15:38:18Z

> If you can generate them reliably at CI/test time even better

My understanding is that the reason we store files in the repo is to ensure backwards compatibility of the codecs, i.e. to guard against changes to the implementation that would render existing data unreadable. That wouldn't work if you regenerate them each time.

martindurant · 2024-01-24T15:39:41Z

> That wouldn't work if you regenerate them each time.

Right, it would need to be a totally independent and repeatable way to make them. If there is no such way, you store them.
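
The point about stored fixtures can be sketched as follows. This is an illustration under assumptions, not the numcodecs test helper: `zlib` stands in for the codec, the function names and file name are hypothetical, and the fixture is written to a temporary directory only so the example is self-contained (in a real repository it would be committed once and never regenerated).

```python
import tempfile
import zlib
from pathlib import Path

ORIGINAL = b"\x00\x01\x02\x03" * 64  # the data the fixture was made from


def write_fixture_once(path):
    """Create the encoded fixture only if it is missing (never overwrite)."""
    if not path.exists():
        path.write_bytes(zlib.compress(ORIGINAL))


def check_backwards_compatible(path):
    """Decode bytes produced by an earlier run and compare with the original."""
    return zlib.decompress(path.read_bytes()) == ORIGINAL


with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "encoded.00.dat"
    write_fixture_once(fixture)  # stands in for the file committed to the repo
    first_bytes = fixture.read_bytes()
    write_fixture_once(fixture)  # a later test run must not regenerate it
    unchanged = fixture.read_bytes() == first_bytes
    compatible = check_backwards_compatible(fixture)

assert unchanged and compatible
```

If the fixture were re-encoded on every run, the check would only show that the current encoder and decoder agree with each other, not that data written by an older version is still readable.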

rabernat · 2024-01-29T14:26:59Z

Would anyone care to review this PR?

martindurant · 2024-01-29T15:21:24Z

numcodecs/__init__.py

@@ -115,3 +115,7 @@

 from numcodecs.fletcher32 import Fletcher32
 register_codec(Fletcher32)
+
+with suppress(ImportError):


Isn't it better to make sure this import succeeds (import pcodec inside encode/decode, below), but presenting a helpful message whenever someone tries to use Pcodec without the necessary deps? Otherwise, the user would just see some "codec not found" message without any idea of what to do about it.

rabernat · 2024-02-21

I agree with this comment. The approach used here was copied from other implementations, but I'm going to change it to what you suggested.
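
The two import styles discussed in this thread can be compared in a short sketch. The names `REGISTRY`, `LazyBackendCodec`, and `backend_that_is_not_installed` are hypothetical, and this is not the code that was merged; it only shows why deferring the import failure to use time gives the user an actionable message.

```python
import importlib
from contextlib import suppress

REGISTRY = {}  # hypothetical stand-in for a codec registry

# Pattern 1: register only if the optional dependency imports.
# If it is missing, the codec silently never appears in the registry.
with suppress(ImportError):
    importlib.import_module("backend_that_is_not_installed")
    REGISTRY["quiet"] = object()  # never reached when the import fails


# Pattern 2: always register, and fail with guidance at use time.
class LazyBackendCodec:
    codec_id = "lazy"

    def __init__(self, backend_name):
        self.backend_name = backend_name

    def _backend(self):
        try:
            return importlib.import_module(self.backend_name)
        except ImportError as err:
            raise ImportError(
                f"{self.backend_name!r} must be installed to use the "
                f"{self.codec_id!r} codec (try `pip install {self.backend_name}`)"
            ) from err

    def encode(self, buf):
        # Assumes the backend module exposes a compress() function.
        return self._backend().compress(buf)


REGISTRY["lazy"] = LazyBackendCodec
```

With pattern 1 a lookup of `"quiet"` fails with a generic missing-key error; with pattern 2 the codec is found and `encode` raises an `ImportError` that names the missing package.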

martindurant · 2024-01-29T15:22:49Z

numcodecs/pcodec.py

+        self.float_mult_spec = float_mult_spec
+        self.max_page_n = max_page_n
+
+    def encode(self, buf):


Suggested change

-    def encode(self, buf):
+    def encode(self, buf) -> bytes:

?
(This may be a good way to label array-bytes codecs; maybe also type for buf should be ndarray)

rabernat · 2024-02-21

I'm not convinced that there is much value in adding these sorts of type hints if we are not actually running type checking on the library.

martindurant · 2024-02-21

Is it not for other users of the library, and their IDEs?

rabernat · 2024-02-21

But incorrect type hints are worse than none at all! For example, is ndarray really the correct type for buf? Maybe, but who knows? I could add it, but without running mypy, we'll never know for sure.

martindurant · 2024-02-21

True that correctness is important, of course; but this is like the "light" version of array->array vs. array->bytes vs. bytes->bytes. Still useful. You can always get around mypy too, if you want.
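
As a rough illustration of using annotations as a "light" label for codec categories, consider the sketch below. The protocol and class names are hypothetical, `Sequence[float]` stands in for an ndarray, and none of this is numcodecs API. It also shows the limitation raised above: the annotations can be read at runtime, but nothing verifies them unless a type checker is actually run.

```python
import struct
import zlib
from typing import Protocol, Sequence, get_type_hints


class ArrayToBytesCodec(Protocol):
    def encode(self, buf: Sequence[float]) -> bytes: ...


class BytesToBytesCodec(Protocol):
    def encode(self, buf: bytes) -> bytes: ...


class PackF8:
    """Array -> bytes: packs floats as little-endian f8."""

    def encode(self, buf: Sequence[float]) -> bytes:
        return struct.pack(f"<{len(buf)}d", *buf)


class Deflate:
    """Bytes -> bytes: zlib compression."""

    def encode(self, buf: bytes) -> bytes:
        return zlib.compress(buf)


# A static checker such as mypy would validate these assignments;
# at runtime they are plain assignments and check nothing.
array_codec: ArrayToBytesCodec = PackF8()
bytes_codec: BytesToBytesCodec = Deflate()


def category(codec) -> str:
    """Read the annotations at runtime; this does not prove they are correct."""
    hints = get_type_hints(type(codec).encode)
    source = "bytes" if hints["buf"] is bytes else "array"
    target = "bytes" if hints["return"] is bytes else "array"
    return f"{source}->{target}"


assert category(array_codec) == "array->bytes"
assert category(bytes_codec) == "bytes->bytes"
```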

rabernat · 2024-02-21T22:06:57Z

I am sad that the CI is now failing. I don't understand why. (Everything is good locally).

Going to try reverting the change to how we import.

* added PCodec
* fix line length and print statements
* docs
* mock pcodec on rtd
* fix typo
* add dtype details
* changed import style for pcodec
* fix flake8
* revert import changes
* fix errors due to changes in pcodec API
* change import style
* skip coverage of failed import path
* skip pcodec tests if not installed

added PCodec

2bbfdaa

martindurant reviewed Jan 24, 2024

fix line length and print statements

6d2b662

rabernat added 4 commits January 24, 2024 20:42

docs

3eb20d1

mock pcodec on rtd

efb1227

fix typo

a1c8d5c

add dtype details

c9bfa6c

martindurant reviewed Jan 29, 2024

rabernat added 3 commits February 21, 2024 12:03

changed import style for pcodec

f999831

Merge remote-tracking branch 'upstream/main' into pcodec

1c44cf2

fix flake8

2650be8

rabernat added 5 commits February 21, 2024 17:09

revert import changes

e81004d

fix errors due to changes in pcodec API

eaab355

change import style

78a665e

skip coverage of failed import path

6bfd88f

skip pcodec tests if not installed

d637773
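
One way to skip tests when an optional dependency is missing is sketched below using only the standard library. This is an assumption about the general technique, not the code in the commit above; `BackendTests` and `run` are hypothetical names.

```python
import importlib
import importlib.util
import unittest

BACKEND = "pcodec"  # the optional dependency; any module name works here
HAVE_BACKEND = importlib.util.find_spec(BACKEND) is not None


@unittest.skipUnless(HAVE_BACKEND, f"{BACKEND} is not installed")
class BackendTests(unittest.TestCase):
    def test_import(self):
        # Runs only when the dependency is importable.
        module = importlib.import_module(BACKEND)
        self.assertEqual(module.__name__, BACKEND)


def run():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BackendTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

When the dependency is absent the test is reported as skipped rather than failed, so the suite still passes on machines without it.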

rabernat merged commit 4abe4be into zarr-developers:main Feb 24, 2024
27 checks passed

sanketverma1704 mentioned this pull request Jun 25, 2024

Add redirects for numcodecs and its codecs zarr-developers/zarr-developers.github.io#114

Merged

dstansby mentioned this pull request Aug 11, 2024

Update c-blosc to v1.26.1 #560

Merged
