Incorrect roundtrip for [blosc2.Filter.BITSHUFFLE, blosc2.Filter.BYTEDELTA] #528

Closed
froody opened this issue Jul 1, 2023 · 2 comments

froody commented Jul 1, 2023

Describe the bug
When invoked from Python on a random uint16 array with values below 118, the roundtrip for BITSHUFFLE + BYTEDELTA is incorrect, even though each filter roundtrips correctly on its own.
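
For context, the roundtrip property in question can be illustrated with a toy filter pipeline. The sketch below is not Blosc2 code: `shuffle`, `delta`, `encode`, and `decode` are simplified, hypothetical stand-ins written in pure Python. It only shows that forward filters run in order, inverses run in reverse order, and decoding should return the input exactly.

```python
# Toy stand-ins for illustration only; these are NOT the Blosc2 filters.

def shuffle(data: bytes, typesize: int) -> bytes:
    """Group byte 0 of every item, then byte 1, and so on."""
    n = len(data) // typesize * typesize  # bytes covered by whole items
    out = bytearray()
    for k in range(typesize):
        out += data[k:n:typesize]
    out += data[n:]  # leftover bytes are copied unchanged
    return bytes(out)


def unshuffle(data: bytes, typesize: int) -> bytes:
    """Inverse of shuffle()."""
    n = len(data) // typesize * typesize
    nitems = n // typesize
    out = bytearray(len(data))
    for k in range(typesize):
        out[k:n:typesize] = data[k * nitems:(k + 1) * nitems]
    out[n:] = data[n:]
    return bytes(out)


def delta(data: bytes, typesize: int) -> bytes:
    """Store each byte as the difference from the previous one (mod 256)."""
    out = bytearray(len(data))
    prev = 0
    for i, b in enumerate(data):
        out[i] = (b - prev) & 0xFF
        prev = b
    return bytes(out)


def undelta(data: bytes, typesize: int) -> bytes:
    """Inverse of delta(): running sum mod 256."""
    out = bytearray(len(data))
    acc = 0
    for i, d in enumerate(data):
        acc = (acc + d) & 0xFF
        out[i] = acc
    return bytes(out)


# Each entry maps a filter name to its (forward, inverse) pair.
FILTERS = {"shuffle": (shuffle, unshuffle), "delta": (delta, undelta)}


def encode(data: bytes, names, typesize: int) -> bytes:
    for name in names:  # forward filters, in order
        data = FILTERS[name][0](data, typesize)
    return data


def decode(data: bytes, names, typesize: int) -> bytes:
    for name in reversed(names):  # inverse filters, in reverse order
        data = FILTERS[name][1](data, typesize)
    return data


sample = bytes(range(200)) * 3 + b"\x07"  # odd length exercises the leftover path
for names in (["shuffle"], ["delta"], ["shuffle", "delta"]):
    assert decode(encode(sample, names, 2), names, 2) == sample
```

The bug reported here is a violation of exactly this property for one specific combination of real filters.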

To Reproduce

import numpy as np
import blosc2
np.random.seed(0)

a = np.random.randint(118, size=13058819, dtype=np.uint16)

def test_filters(a, filters):
    c = blosc2.pack_array2(a, cparams={"filters":filters})
    u = blosc2.unpack_array2(c)
    if not np.array_equal(a, u):
        print("Error: arrays are not equal with", filters)

test_filters(a, [blosc2.Filter.BITSHUFFLE, blosc2.Filter.BYTEDELTA])
test_filters(a, [blosc2.Filter.BITSHUFFLE])
test_filters(a, [blosc2.Filter.BYTEDELTA])
test_filters(a, [blosc2.Filter.SHUFFLE, blosc2.Filter.BYTEDELTA])

Expected behavior
(nothing printed)

Logs
Error: arrays are not equal with [<Filter.BITSHUFFLE: 2>, <Filter.BYTEDELTA: 34>]
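
To narrow down a failure like the one logged above, it can help to know where the first difference occurs. The following is a minimal sketch, assuming both inputs expose the buffer protocol as contiguous memory (NumPy arrays do); `first_mismatch` is a hypothetical helper, not part of blosc2 or NumPy.

```python
def first_mismatch(original, restored, itemsize=1, block=1 << 16):
    """Return the index of the first differing item, or None if the buffers match.

    `itemsize` is the item width in bytes (2 for uint16); `block` is the
    number of bytes compared at once before scanning byte by byte.
    """
    a = memoryview(original).cast("B")
    b = memoryview(restored).cast("B")
    if len(a) != len(b):
        raise ValueError("buffers differ in length")
    for start in range(0, len(a), block):
        stop = min(start + block, len(a))
        if a[start:stop] != b[start:stop]:
            # Only the first differing block is scanned byte by byte.
            for i in range(start, stop):
                if a[i] != b[i]:
                    return i // itemsize
    return None
```

In the reproducer, this could be called as `first_mismatch(a, u, itemsize=a.itemsize)` inside `test_filters` to report the position of the first corrupted element.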

System information:

  • OS: Ubuntu 22.04
  • Compiler: n/a
  • Version: 2.2.4


froody commented Jul 1, 2023

Repro in C:

  • Run with no args to exhibit the failure
  • Run with any arg to show a successful roundtrip with SHUFFLE + BYTEDELTA

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <blosc2.h>
#include "blosc2/filters-registry.h"

#define NCHUNKS 1
#define NTHREADS 2
#define TYPESIZE 2
#define LEN 13058819
#define CHUNKSIZE (TYPESIZE * LEN)

int main(int argc, char **argv) {
  blosc2_init();
  uint16_t *ref_data = (uint16_t *)malloc(CHUNKSIZE);
  /* Fill the buffer with small pseudo-random values (at most 118) */
  srand(0);
  for (int i = 0; i < LEN; i++) {
    ref_data[i] = rand() / (RAND_MAX / 118);
  }
  uint16_t *data_dest = (uint16_t *)malloc(CHUNKSIZE);
  int32_t isize = CHUNKSIZE;
  blosc2_cparams cparams = BLOSC2_CPARAMS_DEFAULTS;
  cparams.compcode = BLOSC_ZSTD;
  if (argc > 1) {
    cparams.filters[BLOSC2_MAX_FILTERS - 2] = BLOSC_SHUFFLE;
  } else {
    cparams.filters[BLOSC2_MAX_FILTERS - 2] = BLOSC_BITSHUFFLE;
  }
  cparams.filters[BLOSC2_MAX_FILTERS - 1] = BLOSC_FILTER_BYTEDELTA;
  cparams.filters_meta[BLOSC2_MAX_FILTERS - 1] = 7;

  blosc2_dparams dparams = BLOSC2_DPARAMS_DEFAULTS;
  blosc2_schunk* schunk;

  /* Create a super-chunk container */
  cparams.typesize = TYPESIZE;
  cparams.nthreads = NTHREADS;
  dparams.nthreads = NTHREADS;
  char *urlpath = (char *)"/tmp/dir1.b2frame";
  blosc2_storage storage = {.contiguous=false, .urlpath=urlpath, .cparams=&cparams, .dparams=&dparams};

  if (true) {
    /* Remove the directory and write new data */
    blosc2_remove_dir(storage.urlpath);
    schunk = blosc2_schunk_new(&storage);
    blosc2_schunk_append_buffer(schunk, ref_data, isize);
  }

  /* Read the chunks that were written before */
  schunk = blosc2_schunk_open(urlpath);
  blosc2_schunk_decompress_chunk(schunk, 0, data_dest, isize);
  for (int i = 0; i < LEN; i++) {
    if (data_dest[i] != ref_data[i]) {
      /* Report the index of the first mismatch and both values */
      printf("Decompressed data differs from original %d, %d, %d!\n",
             i, ref_data[i], data_dest[i]);
      return -1;
    }
  }

  printf("Successful roundtrip data <-> schunk !\n");

  /* Remove directory */
  blosc2_remove_dir(storage.urlpath);
  /* Free resources */
  /* Destroy the super-chunk */
  blosc2_schunk_free(schunk);
  blosc2_destroy();
  return 0;
}

@FrancescAlted
Member

Fixed in #530. Thanks @froody !
