-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: klip-31: Metastore Backups #5741
Conversation
|
||
This is a new feature and does not affect any other feature of KSQL or CP components. | ||
|
||
## Security Implications |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apurvam @derekjn There are some security implications of having a Metastore backup (see below). I was thinking we should not have these backups enabled by default to avoid users leaking data accidentally. But, if the default will be disabled, then users can still at risk of deleting the command_topic and not able to recover it. And if they need a backup option, won't they configure a better backup service in Kafka instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can toggle these backups with config. We can default to off too. But I think the basic premise is that the access to the state store disks also need to be secured: they contain some representation of data in the underlying kafka topics which created those state stores (in a stream table join, the materialized table is basically a copy of the source topic). And those stores don't respect the same ACL rules as the underlying topics anyway: if you have access to them, you can essentially read underlying topic contents.
So storing the metastore backups along with the state stores is not making the model any worse IMO. Still giving users the option to disable the backup, and also choose the location of the backup makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In simple text, the file grows to 1.7M when using 500 commands. It's not big. I will add gzip compression in the next release among other improvements on formatting and a restore command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Well thought out :) I had some questions but then you answered them later in the KLIP. About whether or not to enable it by default, let's wait for product guidance as you called out in your comment.
### Performance | ||
|
||
The KSQL metastore is not expect to grow into a big file. Appending each new command to a file is faster. KSQL command messages are not large to cause slowness during backup file writing. | ||
It will certainly add extra time during KSQL restarts, but considering the average I/O latency of 10ms per write, then 500 commands would take around 5s to create the initial backup. Also, using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be even faster given the whole file is probably just one bulk I/O load
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider storing a compressed version of the file on disk, to save space. Given that it is pure text, even simple gzip compression should result in a good compression ratio.
|
||
If a backup file with the same KSQL service.id exists, then KSQL will append only the new command_topic messages to that file. | ||
|
||
KSQL will compare each consumed command_topic message against the next line from the backup file. If both messages are different, then it considers the command_topic as a new restored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the comparison function? String equality? Compare the deserialized json objects on both sides? Something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will compare the deserialized objects. If we use Strigs, then we might get false equality if the command properties are in a different order.
8439d6f
to
d4341fe
Compare
Description
Request for comment of KLIP-31.
Testing done
Describe the testing strategy. Unit and integration tests are expected for any behavior changes.
Reviewer checklist