Implement LZ4 compression of blobs #254

Closed
vmarkovtsev opened this issue Jun 26, 2017 · 6 comments · Fixed by #258

Comments

@vmarkovtsev
Contributor

Hi! Thanks for this project, it is really cool and we use it to serialize our machine learning models (e.g., https://github.com/src-d/ast2vec). Brilliant idea to combine human-readable YAML and space-efficient blobs.

As I understand from the code and the format specification, ASDF supports zlib and bz2 binary compression. I've recently learned about an exciting compression codec, lz4. As you can see from their benchmarks, LZ4 HC is damn slow at compression and yields a compression ratio similar to zlib's, but it is an order of magnitude faster than zlib at decompression. Our use case is indifferent to the time it takes to generate an ASDF file, but is sensitive to loading lag. I believe we are not the only ones.
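For readers who want to check the trade-off on their own data, here is a minimal timing sketch. It is not taken from the ASDF code base: `time_codec` and `run_benchmark` are hypothetical helper names, and the LZ4 branch assumes the third-party `lz4` Python package is installed (it is skipped otherwise). Results depend heavily on the payload, so treat any numbers as indicative only.

```python
import bz2
import time
import zlib


def time_codec(compress, decompress, payload, repeats=5):
    """Return (ratio, mean compress seconds, mean decompress seconds)."""
    start = time.perf_counter()
    for _ in range(repeats):
        packed = compress(payload)
    compress_time = (time.perf_counter() - start) / repeats

    start = time.perf_counter()
    for _ in range(repeats):
        restored = decompress(packed)
    decompress_time = (time.perf_counter() - start) / repeats

    assert restored == payload  # the round trip must be lossless
    return len(payload) / len(packed), compress_time, decompress_time


def run_benchmark(payload):
    """Time every codec that is importable on this system."""
    codecs = {
        "zlib": (zlib.compress, zlib.decompress),
        "bz2": (bz2.compress, bz2.decompress),
    }
    try:
        # Assumption: the optional third-party "lz4" package is available.
        import lz4.block

        codecs["lz4hc"] = (
            lambda data: lz4.block.compress(data, mode="high_compression"),
            lz4.block.decompress,
        )
    except ImportError:
        pass  # LZ4 is simply left out of the comparison
    return {
        name: time_codec(compress, decompress, payload)
        for name, (compress, decompress) in codecs.items()
    }


if __name__ == "__main__":
    sample = b"machine learning model weights " * 50000
    for name, (ratio, c_time, d_time) in run_benchmark(sample).items():
        print(f"{name}: ratio={ratio:.1f} compress={c_time:.4f}s decompress={d_time:.4f}s")
```

The decompression column is the one that matters for the loading-lag use case described above.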

I propose implementing lz4 as the third compression option and including it in a future version of the ASDF specification. I am ready to add it myself, provided you give me your blessing.

vmarkovtsev added a commit to vmarkovtsev/asdf that referenced this issue Jun 29, 2017
@drdavella
Contributor

Hi @vmarkovtsev, it's great to hear that this is a useful format for you, and thanks for the contributions. In principle I don't think there's any issue with supporting additional compression algorithms. However, I wonder if it would make more sense to provide a plug-and-play infrastructure for compression implementations rather than hard-coding additional support into the ASDF library itself. Maybe in the short term we can integrate your changes but in the long term we can provide a more flexible infrastructure.

Also, in the short term, rather than introducing a hard dependency on LZ4, maybe LZ4 support in ASDF should only be available on systems where it is already installed.

@vmarkovtsev
Contributor Author

Ah, I see what you mean, passing a custom class which implements compression/decompression to all_array_compression. I will be more than happy to add this feature in the future!
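A rough sketch of what such a pluggable interface could look like follows. Everything here is an assumption made for illustration: `resolve_compressor`, `LzmaCompressor`, and the `_BUILTIN` table are hypothetical names, and nothing below reflects how ASDF actually handles the `all_array_compression` argument.

```python
import bz2
import lzma
import zlib

# Hypothetical table of built-in codecs, keyed by a short name.
# The stdlib modules already expose compress() and decompress().
_BUILTIN = {"zlib": zlib, "bzp2": bz2}


class LzmaCompressor:
    """Example user-supplied codec: any object with compress/decompress."""

    def __init__(self, preset=6):
        self.preset = preset

    def compress(self, data):
        return lzma.compress(data, preset=self.preset)

    def decompress(self, data):
        return lzma.decompress(data)


def resolve_compressor(compression):
    """Accept either a built-in codec name or a custom codec object."""
    if isinstance(compression, str):
        try:
            return _BUILTIN[compression]
        except KeyError:
            raise ValueError(f"unknown compression: {compression!r}") from None
    has_compress = callable(getattr(compression, "compress", None))
    has_decompress = callable(getattr(compression, "decompress", None))
    if has_compress and has_decompress:
        return compression
    raise TypeError("compression must be a name or provide compress/decompress")


if __name__ == "__main__":
    codec = resolve_compressor(LzmaCompressor(preset=9))
    blob = codec.compress(b"array bytes " * 1000)
    assert codec.decompress(blob) == b"array bytes " * 1000
```

Duck typing keeps the library free of any dependency on the codec itself; a real design would also need a way to record which codec wrote a block so that readers can find it again.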

Regarding lz4, it is already plug and play: the import happens lazily, just like with zlib/bzip2. Do you mean something specific?
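For context, the lazy-import pattern referred to here can be sketched as below. This is an illustration under assumptions, not the code from the referenced commit: `load_codec` and the `_CODEC_MODULES` mapping are hypothetical names, and `lz4.block` refers to the third-party `lz4` package, which is only imported if that codec is requested.

```python
import importlib

# Hypothetical mapping from a codec name to the module that implements it.
_CODEC_MODULES = {
    "zlib": "zlib",
    "bzp2": "bz2",
    "lz4": "lz4.block",  # optional third-party dependency
}


def load_codec(name):
    """Import the module for a codec only when it is first needed."""
    try:
        module_name = _CODEC_MODULES[name]
    except KeyError:
        raise ValueError(f"unknown compression: {name!r}") from None
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # The dependency stays soft: only users of this codec see the error.
        raise ImportError(
            f"{name} compression requires the {module_name!r} module"
        ) from exc


if __name__ == "__main__":
    codec = load_codec("zlib")
    print(codec.decompress(codec.compress(b"hello")))
```

With this arrangement, a system without LZ4 installed can still read and write zlib or bzip2 data, and fails only when an LZ4 block is actually requested.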

@drdavella
Contributor

Ah I'm sorry, I didn't look at the diffs closely enough. There are a few other tests that are failing as a result of some recent changes to astropy. I would like to fix these before integrating your changes, which may take a little while longer. Do you mind if I hold off on the merge until this is done?

@vmarkovtsev
Contributor Author

Sure, no problem! I will ping here every week 😉

@vmarkovtsev
Contributor Author

As promised, the first ping 😄

@drdavella
Contributor

Hey, things are going a little slower than I hoped but I think I should be able to pull this in soon. Maybe today, maybe tomorrow.

vmarkovtsev added a commit to vmarkovtsev/asdf that referenced this issue Jul 8, 2017
vmarkovtsev added a commit to vmarkovtsev/asdf that referenced this issue Jul 8, 2017
vmarkovtsev added a commit to vmarkovtsev/asdf that referenced this issue Jul 8, 2017
vmarkovtsev added a commit to vmarkovtsev/asdf that referenced this issue Jul 8, 2017