fix: circumvent KAFKA-10179 by forcing changelog topics for tables #5781
Description
Partial fix for #5673. This PR introduces an extra map phase for any new Kafka Streams application that reads from a table. This prevents the Streams optimization that uses the source topic as the table's changelog topic and, on recovery, loads data directly from that topic into RocksDB. For more discussion and context, see #5673 and https://issues.apache.org/jira/browse/KAFKA-10179
This is only a partial fix: new queries running this code will not hit the problem, but existing queries are not fixed. To fix those, we will follow up this commit with one that preemptively registers the schema of the original topic under the subject of the changelog topic.
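To make the effect of the extra map phase concrete, here is a toy model in plain Java. It is a sketch under stated assumptions, not the code in this PR and not the Kafka Streams API: the class `ChangelogTopicSketch`, the method `changelogTopic`, and the topic, store, and application names are all hypothetical. The only detail taken from Kafka Streams itself is the `<application.id>-<store name>-changelog` naming convention for internal changelog topics.

```java
// Toy model only: mirrors the decision described above, not Kafka Streams internals.
final class ChangelogTopicSketch {

    /**
     * Returns the topic a table's state store would be restored from.
     *
     * @param applicationId   the Streams application id (hypothetical value in this sketch)
     * @param sourceTopic     the topic the table reads from
     * @param storeName       the name of the table's state store
     * @param hasExtraMapStep true if a map phase sits between the source topic and the store
     * @param optimizeSource  true if the source-topic-as-changelog optimization is enabled
     */
    static String changelogTopic(
            String applicationId,
            String sourceTopic,
            String storeName,
            boolean hasExtraMapStep,
            boolean optimizeSource) {
        // The optimization can only reuse the source topic when the store is
        // materialized directly from it, with nothing in between.
        if (optimizeSource && !hasExtraMapStep) {
            return sourceTopic;
        }
        // Otherwise the store gets its own internal changelog topic.
        return applicationId + "-" + storeName + "-changelog";
    }

    public static void main(String[] args) {
        // Without the map phase: recovery would read the source topic directly.
        System.out.println(changelogTopic("my-app", "orders", "orders-store", false, true));
        // With the map phase: recovery reads a dedicated changelog topic.
        System.out.println(changelogTopic("my-app", "orders", "orders-store", true, true));
    }
}
```

In this model the extra map step is what forces a dedicated changelog topic even when the optimization is enabled, which is why the follow-up needs to register a schema under that changelog topic's subject.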
Review Guide
NOTE: This PR is split into two commits because it requires rewriting almost all historical plans that used tables. Also, could someone take a look at joins.json and let me know whether this fixes a bug or introduces one? I feel like the behavior after this PR is the right one.
Testing done
Unit testing, QTT, and manual testing; see tables.json
Reviewer checklist