feat(ingest): implement compression for CheckpointState #6007

Merged

alexey-kravtsov
Contributor

Stateful ingestion: add bz2 compression and base85 encoding for CheckpointState, enabled by default.
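
As a rough illustration of the idea described above (not the code from this PR), the sketch below compresses a JSON-serialized state with bz2 and then base85-encodes it so the result is plain ASCII text. The function names `encode_state` and `decode_state` and the sample state are hypothetical; only the standard-library `bz2`, `base64`, and `json` calls are real.

```python
import base64
import bz2
import json


def encode_state(state: dict) -> str:
    """Serialize to JSON, compress with bz2, then base85-encode to ASCII text."""
    raw = json.dumps(state).encode("utf-8")
    compressed = bz2.compress(raw)
    return base64.b85encode(compressed).decode("ascii")


def decode_state(payload: str) -> dict:
    """Reverse of encode_state: base85-decode, decompress, parse JSON."""
    compressed = base64.b85decode(payload.encode("ascii"))
    return json.loads(bz2.decompress(compressed).decode("utf-8"))


# Hypothetical checkpoint state: a long list of similar-looking URNs,
# which is the kind of repetitive payload that compresses well.
sample_state = {
    "urns": [
        f"urn:li:dataset:(urn:li:dataPlatform:hive,db.table_{i},PROD)"
        for i in range(1000)
    ]
}

encoded = encode_state(sample_state)
plain_size = len(json.dumps(sample_state))
encoded_size = len(encoded)
```

Base85 adds roughly 25% on top of the compressed bytes, so the approach only pays off when the state is repetitive enough for bz2 to shrink it by more than that, which is plausible for long lists of URNs.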

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Sep 21, 2022
@github-actions

Unit Test Results (build & test)

562 tests  ±0   562 ✔️ ±0   13m 29s ⏱️ +30s
139 suites ±0       0 💤 ±0 
139 files   ±0       0 ±0 

Results for commit 34288d6. ± Comparison against base commit b638bcf.

@maggiehays maggiehays added the community-contribution PR or Issue raised by member(s) of DataHub Community label Sep 22, 2022
@shirshanka shirshanka requested a review from rslanka September 23, 2022 05:16
@shirshanka
Contributor

How will this impact existing pipelines that are already storing state in the old format? When they upgrade to this code, will they still be able to deserialize their old state?

@alexey-kravtsov
Contributor Author

alexey-kravtsov commented Sep 23, 2022

@shirshanka Yes, it is backward compatible. The serde field of the checkpoint aspect is not itself compressed, and the decoder is chosen based on its value. Before this change, the aspect was written with the utf-8 serde, and that case is handled here: https://github.com/datahub-project/datahub/pull/6007/files#diff-1ed61874f5f1ff01308783dc5cd66bf8bc58b08a4ef2ecc8ee89442a10d378caR120 (this is exactly the same decoder as before).
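
A minimal sketch of the dispatch described in this comment, assuming the serde value is stored uncompressed next to the payload. The serde name `"base85-bz2-json"`, the `DECODERS` table, and `decode_checkpoint` are illustrative names, not taken from the PR; only `"utf-8"` is named in the discussion.

```python
import base64
import bz2
import json
from typing import Callable, Dict


def _decode_utf8(payload: bytes) -> dict:
    # Legacy path: the payload is plain UTF-8 JSON, as written before the change.
    return json.loads(payload.decode("utf-8"))


def _decode_b85_bz2(payload: bytes) -> dict:
    # New path: base85 text wrapping bz2-compressed UTF-8 JSON.
    return json.loads(bz2.decompress(base64.b85decode(payload)).decode("utf-8"))


# Hypothetical serde names mapped to their decoders.
DECODERS: Dict[str, Callable[[bytes], dict]] = {
    "utf-8": _decode_utf8,
    "base85-bz2-json": _decode_b85_bz2,
}


def decode_checkpoint(serde: str, payload: bytes) -> dict:
    """Pick the decoder from the (uncompressed) serde field, then decode."""
    try:
        decoder = DECODERS[serde]
    except KeyError:
        raise ValueError(f"Unknown checkpoint serde: {serde!r}") from None
    return decoder(payload)


state = {"urns": ["urn:li:dataset:a", "urn:li:dataset:b"]}

# State written by an older pipeline version.
old_payload = json.dumps(state).encode("utf-8")
# State written in the compressed format.
new_payload = base64.b85encode(bz2.compress(json.dumps(state).encode("utf-8")))

old_result = decode_checkpoint("utf-8", old_payload)
new_result = decode_checkpoint("base85-bz2-json", new_payload)
```

Because the serde value travels alongside the payload, a reader never has to guess the format, and previously stored utf-8 state keeps going through the unchanged legacy path.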

Contributor

@rslanka rslanka left a comment

LGTM!

@shirshanka shirshanka merged commit 3c3ab64 into datahub-project:master Sep 26, 2022
@alexey-kravtsov alexey-kravtsov deleted the checkpoint-state-compression branch October 17, 2022 09:04
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata
4 participants