-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for byte shuffle as a codec so that it can be used as a Zarr filter #273
Add support for byte shuffle as a codec so that it can be used as a Zarr filter #273
Conversation
Hello @pbranson! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-03-18 13:41:16 UTC |
@pbranson : The wheel GH actions don't make use of requirements.txt. I think you'll need to register |
Thanks for the suggestion @joshmoore. The wheels now build for all except Python 3.9 |
Interesting:
Looks like NumPy is missing somewhere, but I haven't tracked down where yet. |
At least for Windows/3.9, it looks like it's failing on trying to build the numba wheel. If there's no wheel available, perhaps installing numba via conda would help? But this raises the issue of whether end users will face the same issue. |
Looks like the build problems with 3.9 as numba does not yet have a release with 3.9 support (see numba/numba#6345). This is coming soon in numba 0.53.0 However, it would probably be good to get some consensus on whether including numba as a dependency is something acceptable to for numcodecs: @jakirkham, @alimanfoo do you have any guidance on that front? |
I would suggest, to push this forward, that numba not be a dependency, but only needed when instantiating this filter, i.e., using any other filters works fine without numba installed. However, numba should be in the tests, and installed from conda. There is a good reason for the existence of conda: you can include complex binary dependencies like LLVM outside of the python tree in a way that is really hard to achieve with pip. How hard is it to write shuffle cython, like the other filters? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank for looking into this Paul! 😄
Unfortunately have a few concerns with this implementation noted below. There are probably a few different ways to address this. I think the easiest will likely be adding a new optional dependency for this functionality. Though maybe other solution like using NumPy directly may be viable (though I haven't dug deeply into this)
I made a cython version in pbranson#1 that eliminated the dep problem. It's not quite done, but the tests pass. Almost identical code. For licensing, presumably we need to just find a publicly free definition of shuffle - I'm sure HDF didn't invent it. Actually I'm not sure what their license is anyway. |
Thanks @martindurant and @jakirkham for taking a look at this! The license for HSDS is apache 2.0 https://github.com/HDFGroup/hsds/blob/master/LICENSE which from my naive reading seems to suggest that inclusion of the copyright notice is all that is needed... @martindurant Thanks a lot for the cython code! I have had a look at your PR - will have a go at merging it into this branch to resolve the numba dep |
* shuffle with cython * type j * recert c-files
Yay, all ticks! |
Heh, I went into the logs for the Windows CI and for all of the codecs that have that ImportError handling the tests are skipped... So on Windows only 108 out of a possible 593 tests are actually executed.... So I dont know if it works on those platforms! Doesnt seem satisfactory |
cc ^ @zarr-developers/core-devs |
I assume related to #270 |
cc @JacksonMaxfield - it seems we get |
I know some of this is already said above but condensing into a single comment as I get caught up: See here any of the Windows CI builds from my PR.
Did 400 tests get added to this repo in that past month? Did I miss something in my PR? No that's not the case because even on that same PR the ubuntu and mac OS builds have the expected number of tests detected and ran.
Well then whats the difference between the windows build from the others? Taking the step definition of the ci-macos and the ci-windows workflows to a text compare shows they are exactly the same.... So no differences in the CI workflow definition, and crucially it uses the same testing command as the others, so still confused as to what is going on. There is this comment from my PR: #270 (comment) So, if my memory is correct and understanding the context of that comment is correct, there was no Windows testing prior to my PR. A bit of digging on "pytest fails to pick up Cython tests on Windows" got me to this stackoverflow Which has a final update of:
Looks like this has been an issue of pytest + cython + windows for a while? |
One more update. Looked at that stackover again and saw there was a single upvoted answer down below that linked to this answer: https://stackoverflow.com/a/49068163 Which says to build the cython extension manually instead of using |
The question is: can we be certain that these cythonized modules will work on windows? We know from the linux tests that the algorithms themselves are fine. |
I'm not sure why I was summoned by Github, but I took a look at the workfow YAMLs, and this is my guess for the different amount of tests on Win + MacOS on one side, and Linux on the other: Linux: project is installed properly via Second, Linux uses Hope this helps! |
Hey @hoefling! Sorry for pinging you. You and the person who helped answer the question on StackOverflow have the same name (or you are the same person!) and when I copy-pasted the answer in to GitHub I forgot to remove the Made a PR to fix this: #276 |
Thats awesome, thanks everyone! For this PR should I change the codec registration to not handle the import error and we accept that the windows CI tests will fail until #276 is merged? |
Ah, I get it now, funny thing :-) Absolutely no worries, happy to help! |
I have merged #276 , so you can merge in, and then we can merge this too (note the conflict in release.rst). |
OK I think this is ready! |
Is it right that all the fixture .dat and .npy files show up as empty? |
Yeah that is weird about the fixtures - they arent empty on my system - if you click to view file it shows as ~7kb raw. Actually now they are all showing correctly?! |
Yeah I think one has to manually add them. They shouldn't be empty, but we do have them in |
@jakirkham - the files really are there, they just didn't show in the diff for some reason |
Oh interesting did not realize that. Thanks for correcting me Martin 😄 |
Yay, thanks everyone for your help and time reviewing. I'm pretty new to this so has been a good learning experience! |
PR for a byte shuffle codec as discussed in #260
Became too difficult (for me atleast) to see a way forward making use of the optimised blosc-c shuffle.
This add a dependency on numba which may not be acceptable, in which case perhaps a cython implementation of the shuffle functions will be required. Would appreciate advice if that is the case
TODO:
tox -e py39
passes locallytox -e docs
passes locally