-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pcodec #501
Add pcodec #501
Conversation
Hello @rabernat! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-02-23 21:25:17 UTC |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #501 +/- ##
=======================================
Coverage 99.91% 99.91%
=======================================
Files 57 59 +2
Lines 2265 2323 +58
=======================================
+ Hits 2263 2321 +58
Misses 2 2
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a lot of test files!
The test files were auto generated. Do you think this is a problem? |
If you can generate them reliably at CI/test time even better; but they are not big, so I am not worried. |
My understanding is that the reason we store files in the repo is to ensure backwards compatibility of the codecs, i.e. to guard against changes to the implementation that would render existing data unreadable. That wouldn't work if you regenerate them each time. |
Right, it would need to be a totally independent and repeatable way to make them. If there is no such way, you store them. |
Would anyone care to review this PR? |
numcodecs/__init__.py
Outdated
@@ -115,3 +115,7 @@ | |||
|
|||
from numcodecs.fletcher32 import Fletcher32 | |||
register_codec(Fletcher32) | |||
|
|||
with suppress(ImportError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it better to make sure this import succeeds (import pcodec
inside encode
/decode
, below), but presenting a helpful message whenever someone tries to use Pcodec without the necessary deps? Otherwise, the user would just see some "codec not found" message without any idea of what to do about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this comment. The approach used here was copied from other implementations, but I'm going to change it to what you suggested.
self.float_mult_spec = float_mult_spec | ||
self.max_page_n = max_page_n | ||
|
||
def encode(self, buf): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def encode(self, buf): | |
def encode(self, buf) -> bytes: |
?
(This may be a good way to label array-bytes codecs; maybe also type for buf
should be ndarray
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that there is much value in adding these sorts of type hints if we are not actually running type checking on the library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not for other users of the library, and their IDEs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But incorrect type hints are worse than none at all! For example, is ndarray
really the correct type for buf? Maybe, but who knows? I could add it, but without running mypy, we'll never know for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True that correctness is important, of course; but this is like the "light" version of array->array V array->bytes V bytes->bytes. Still useful. You can always get around mypy too, if you want.
I am sad that the CI is now failing. I don't understand why. (Everything is good locally). Going to try reverting the change to how we import. |
* added PCodec * fix line length and print statements * docs * mock pcodec on rtd * fix typo * add dtype details * changed import style for pcodec * fix flake8 * revert import changes * fix errors due to changes in pcodec API * change import style * skip coverage of failed import path * skip pcodec tests if not installed
This PR adds an exciting new codec to numcodecs: pcodec.
From the point of view of the V3 codec spec pcodec is an
array -> bytes
. It takes a numpy array of dtype i4, i8, u4, u8, f4, f8 and turns it into encoded bytes (and reverses this for decoding). It seems like numcodecs hasn't quite formalized this way of categorizing codecs (e.g. there is nonumcodecs.abc.ArrayToBytesCodec
). I've done my best to apply appropriate testing.TODO: