Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy in BitRound #608

Merged
merged 3 commits into from
Oct 17, 2024
Merged

Copy in BitRound #608

merged 3 commits into from
Oct 17, 2024

Conversation

slevang
Copy link
Contributor

@slevang slevang commented Oct 16, 2024

I was benchmarking different compression options and realized BitRound was overwriting my array in place.

import shutil
import dask.array as da
from numcodecs import BitRound

def write_to_store(array, **kwargs):
    shutil.rmtree("test.zarr", ignore_errors=True)
    array.to_zarr("test.zarr", **kwargs)
    print(array.mean().compute())

array = da.random.default_rng(42).random((1,)).persist()

write_to_store(array)
write_to_store(array, filters=[BitRound(keepbits=5)])
write_to_store(array)
0.9167441575549085
0.921875
0.921875

Looks like there was discussion on this in the original PR but nothing came of it. Making an explicit copy seems to work fine as well, not sure which is preferred?

cc @rabernat

TODO:

  • Unit tests and/or doctests in docstrings
  • Tests pass locally
  • Docstrings and API docs for any new/modified user-facing classes and functions
  • Changes documented in docs/release.rst
  • Docs build locally
  • GitHub Actions CI passes
  • Test coverage to 100% (Codecov passes)

Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Sam! 🙏

This is a good point to follow up

Would actually recommend just copying to start and then doing everything in-place on the copy. Otherwise we are generating a lot of temporary arrays

Mention this as we are doing compression memory usage can be an issue. So we want to reduce the amount of temporary arrays we create if possible

@jakirkham jakirkham mentioned this pull request Oct 16, 2024
7 tasks
Copy link

codecov bot commented Oct 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.91%. Comparing base (85eb7aa) to head (0ea5164).
Report is 56 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #608   +/-   ##
=======================================
  Coverage   99.91%   99.91%           
=======================================
  Files          59       59           
  Lines        2327     2328    +1     
=======================================
+ Hits         2325     2326    +1     
  Misses          2        2           
Files with missing lines Coverage Δ
numcodecs/bitround.py 100.00% <100.00%> (ø)

@slevang slevang changed the title Remove in place ops from BitRound Copy in BitRound Oct 17, 2024
@jakirkham
Copy link
Member

Thanks Sam! 🙏

@jakirkham jakirkham requested a review from rabernat October 17, 2024 02:05
@jakirkham jakirkham merged commit cabecae into zarr-developers:main Oct 17, 2024
30 checks passed
@jakirkham
Copy link
Member

Release note is in PR: #609

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants