Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LZ4 compression for Apache Arrow #2339

Merged
merged 3 commits into from
Aug 14, 2023
Merged

LZ4 compression for Apache Arrow #2339

merged 3 commits into from
Aug 14, 2023

Conversation

texodus
Copy link
Member

@texodus texodus commented Aug 14, 2023

Adds LZ4 compression to Perspective's Apache Arrow read/write support, which also makes Perspective compatible with e.g. pyarrow.feather.write_feather() IPC methods which default to this compression codec. This PR addresses issues from #2337, adding:

  • Upgrades Apache Arrow to v12.0.0
  • A new option/kwarg for to_arrow(), compression, which can be "lz4" or null/None (for JavaScript/Python, respectively). This option defaults to "lz4".
  • Tests.
  • Benchmarks. There is a slight impact to LZ4 even with the relatively small datasets we benchmark with.

This PR is a breaking change, as the compression option causes default .to_arrow() output to be incompatible with older Perspective versions. To backport, set this option to null/None:

JavaScript:

const arrow = view.to_arrow({compression: null});

Python:

arrow = view.to_arrow(compression=None)

full build

@@ -3,7 +3,7 @@ project(psp)
include(CheckCCompilerFlag)

set(CMAKE_BUILD_TYPE "Release")
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD 17)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should update/remove gnu flag to match:

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -std=c++1y")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this tested?

@@ -509,7 +509,7 @@ set(PYTHON_SOURCE_FILES ${SOURCE_FILES}
)

set(WASM_SOURCE_FILES ${SOURCE_FILES}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should remove whole block if not used anymore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@texodus texodus merged commit 54a6d05 into master Aug 14, 2023
@texodus texodus deleted the feather branch August 14, 2023 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking enhancement Feature requests or improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants