fix: reserve `WINDOWSTART` and `WINDOWEND` as system column names #4388
Merged: big-andy-coates merged 2 commits into confluentinc:master from big-andy-coates:reserve_window_bounds on Jan 27, 2020
BREAKING CHANGE: `WINDOWSTART` and `WINDOWEND` are now reserved system column names. Any query that previously used those names will need to be changed: for example, alias the columns to a different name. These column names are being reserved for use as system columns when dealing with streams and tables that have a windowed key.
Have we decided against using functions for this kind of pseudo-field? (cf. #3734)
These aren't pseudo-fields; they are actual fields in the key, right?
agavra
approved these changes
Jan 27, 2020
LGTM
big-andy-coates added a commit to big-andy-coates/ksql that referenced this pull request on Jan 29, 2020
fixes: confluentinc#3871

Is needed to fix:
- confluentinc#3633
- confluentinc#4015

Before this change the version of `ROWKEY` copied to the value schema during processing of data in the Streams topology was always of type `STRING` regardless of the actual key type. This is because windowed keys had a `ROWKEY` in the format `<actual key> : Window{start=<windowStart>, end=<windowEnd>}`.

While `ROWKEY` in the value schema was a `STRING`, `ROWKEY` in the key schema was the actual type, e.g. `INT`. This is confusing and will lead to bugs. Also, the formatted string isn't very friendly for users.

This change looks to introduce the `WINDOWSTART` and `WINDOWEND` columns that were reserved in confluentinc#4388.

The obvious approach would be to add `WINDOWSTART` and `WINDOWEND` as columns in the key schema. Unfortunately, this would be a much bigger change as many parts of the code currently rely on there being only a single key column. The planned structured key work will resolve this.

For now, we only add the window bounds columns when we call `LogicalSchema.withMetaAndKeyColsInValue(true)`. This is a bit of a temporary hack, but gets us where we need to be. This will be cleaned up as part of the structured key work.

With this change `ROWKEY` for windowed sources no longer has the format `<actual key> : Window{start=<windowStart>, end=<windowEnd>}`: `ROWKEY` is now only the _actual_ key and the window bounds can be accessed by `WINDOWSTART` and `WINDOWEND`. These two window bounds columns are included in a pull `SELECT *` query. Likewise a join will include the window bounds columns from both sides in the join result if the join is `SELECT *`.

## Examples:

### Push queries

* A `SELECT *` on a windowed source will now include `WINDOWSTART` and `WINDOWEND`. `ROWKEY` will be the actual key, not a formatted string.

```
ksql> SELECT * FROM windowedSource emit changes

-- old output
+---------------+------------------------------------------------------+--------+---------+------+
| ROWTIME       | ROWKEY                                               | USERID | PAGEID  | TOTAL|
+---------------+------------------------------------------------------+--------+---------+------+
| 1557183929488 | User_9|+|Page_39 : Window{start=1557183900000 end=-} | User_9 | Page_39 | 1    |
| 1557183930211 | User_1|+|Page_79 : Window{start=1557183900000 end=-} | User_1 | Page_79 | 1    |

-- new output
+---------------+---------------+---------------+------------------+--------+---------+------+
| ROWTIME       | WINDOWSTART   | WINDOWEND     | ROWKEY           | USERID | PAGEID  | TOTAL|
+---------------+---------------+---------------+------------------+--------+---------+------+
| 1557183919786 | 1557183900000 | 1557183960000 | User_5|+|Page_12 | User_5 | Page_12 | 1    |
| 1557183929488 | 1557183900000 | 1557183960000 | User_9|+|Page_39 | User_9 | Page_39 | 1    |
```

* `WINDOWSTART` and `WINDOWEND` are available in the SELECT, GROUP BY, WHERE, HAVING clauses etc.

For example:

```sql
SELECT TIMESTAMPTOSTRING(WINDOWSTART,'yyyy-MM-dd HH:mm:ss Z') FROM windowedSource emit changes;
```

However, don't get too excited just yet as there is a known limitation that drastically reduces the availability of this syntax:

**KNOWN LIMITATION**
Where a query builds a windowed source from a non-windowed source the window bounds columns are not available. For example:

```
-- won't yet work:
SELECT WINDOWSTART FROM someSource WINDOW TUMBLING (SIZE 1 SECOND) group by ROWKEY;
```

This issue is tracked by: confluentinc#4397

* Joins of windowed sources include the `WINDOWSTART` and `WINDOWEND` columns from both sides.

### Pull queries

**KNOWN LIMITATION**
Pull queries have not been updated yet. This will be done in a follow-up PR confluentinc#3633. This is mainly to keep this PR manageable.

### Persistent queries

Persistent C*AS queries work similarly to push queries and have the same known limitation.

BREAKING CHANGE: Any query of a windowed source that uses `ROWKEY` in the SELECT projection will see the contents of `ROWKEY` change from a formatted `STRING` containing the underlying key and the window bounds, to just the underlying key. Queries can access the window bounds using `WINDOWSTART` and `WINDOWEND`.

BREAKING CHANGE: Joins on windowed sources now include `WINDOWSTART` and `WINDOWEND` columns from both sides on a `SELECT *`.
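Downstream consumers that used to pull the window bounds out of the formatted `ROWKEY` string may need a small compatibility shim while they migrate to `WINDOWSTART` and `WINDOWEND`. The following is only a sketch in Python: the helper name `split_legacy_rowkey`, the `WindowedKey` type and the regular expression are illustrative assumptions, not part of ksqlDB. The pattern is inferred from the format string and the sample output in the commit message, which differ on whether a comma follows the start value, so both forms are accepted.

```python
import re
from typing import NamedTuple, Optional

# Legacy formatted key, e.g.
#   "User_9|+|Page_39 : Window{start=1557183900000 end=-}"
# The comma after the start value is optional because the documented format
# and the sample output disagree on it (assumption based on the text above).
_LEGACY_ROWKEY = re.compile(
    r"^(?P<key>.*) : Window\{start=(?P<start>\d+),? end=(?P<end>\d+|-)\}$"
)


class WindowedKey(NamedTuple):
    key: str
    window_start: Optional[int]
    window_end: Optional[int]


def split_legacy_rowkey(rowkey: str) -> WindowedKey:
    """Split an old-style ROWKEY string into the key and its window bounds.

    A value that does not match the legacy pattern is treated as a new-style
    ROWKEY, i.e. the underlying key with no bounds attached.
    """
    match = _LEGACY_ROWKEY.match(rowkey)
    if match is None:
        return WindowedKey(rowkey, None, None)
    end = match.group("end")
    return WindowedKey(
        key=match.group("key"),
        window_start=int(match.group("start")),
        # The old output printed "-" when no end bound was available.
        window_end=None if end == "-" else int(end),
    )
```

With the new behaviour the bounds arrive as separate columns, so a shim like this should only be needed for data captured before the change.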
big-andy-coates added a commit to big-andy-coates/ksql that referenced this pull request on Jan 29, 2020
fixes: confluentinc#3633

This change sees pull queries share more functionality and code around the window bounds columns `WINDOWSTART` and `WINDOWEND` introduced in confluentinc#4388 and confluentinc#4401.

* pull queries on time windowed sources, i.e. `TUMBLING` and `HOPPING`, now have a `WINDOWEND` in their schema, just like `SESSION` and the new push query functionality.
* window bound columns are now accessible within the projection of a pull query, e.g. `SELECT WINDOWSTART, WINDOWEND FROM FOO WHERE ROWKEY=1;`
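For fixed-size time windows the two bound columns are related by the window size, which can be illustrated with a few lines of Python. This is a sketch under stated assumptions rather than ksqlDB's implementation: it assumes epoch-aligned tumbling windows whose end is the start plus the size, and `tumbling_window_bounds` is a hypothetical helper. The 60 second size used in the usage note is inferred from the sample output earlier in this thread; it is not stated there.

```python
from typing import Tuple


def tumbling_window_bounds(timestamp_ms: int, size_ms: int) -> Tuple[int, int]:
    """Return (window_start, window_end) for the tumbling window that
    contains timestamp_ms, assuming windows are aligned to the epoch."""
    if size_ms <= 0:
        raise ValueError("size_ms must be positive")
    # Round the timestamp down to the nearest multiple of the window size.
    start = timestamp_ms - (timestamp_ms % size_ms)
    return start, start + size_ms
```

Under those assumptions, `tumbling_window_bounds(1557183919786, 60_000)` gives `(1557183900000, 1557183960000)`, which matches the `WINDOWSTART` and `WINDOWEND` values shown for that `ROWTIME` in the sample output.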
big-andy-coates added a commit that referenced this pull request on Jan 29, 2020
chore: support window bounds columns in persistent and pull queries (#4401)
big-andy-coates added a commit that referenced this pull request on Jan 30, 2020
chore: add full window bounds support to pull queries
Description
BREAKING CHANGE: `WINDOWSTART` and `WINDOWEND` are now reserved system column names. Any query that previously used those names will need to be changed: for example, alias the columns to a different name.

These column names are being reserved for use as system columns when dealing with streams and tables that have a windowed key.
In the fullness of time we may allow custom names for these columns. However, in the short term we're going to be using hardcoded names.
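Because the names are hardcoded for now, one way to prepare for the upgrade is to scan existing column lists for clashes. The Python sketch below is illustrative only: `find_reserved_columns` is a hypothetical helper, and the case-insensitive comparison is an assumption based on unquoted identifiers being treated as upper case.

```python
from typing import Iterable, List

# The two names reserved by this change.
RESERVED_WINDOW_COLUMNS = frozenset({"WINDOWSTART", "WINDOWEND"})


def find_reserved_columns(column_names: Iterable[str]) -> List[str]:
    """Return the column names that clash with the reserved window bounds
    names, compared case-insensitively (an assumption, see above)."""
    return [
        name for name in column_names
        if name.upper() in RESERVED_WINDOW_COLUMNS
    ]
```

Any name the helper reports would need to be renamed or aliased before the statement is run against a version that includes this change.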
Testing done
Suitable unit tests / QTT tests added.
Reviewer checklist