Persistent queries on tables should require key columns #5303
This was referenced May 7, 2020
This will require a (breaking) change to the data sent back from the REST API. Likely something like:

```json
{
  "name": "table tombstones",
  "statements": [
    "CREATE STREAM INPUT (K BIGINT KEY, V0 INT) WITH (kafka_topic='test_topic', value_format='JSON');",
    "CREATE TABLE T AS SELECT K, SUM(V0) AS SUM FROM INPUT GROUP BY K HAVING SUM(V0) > 0;",
    "SELECT * FROM T EMIT CHANGES LIMIT 3;"
  ],
  "inputs": [
    {"topic": "test_topic", "key": 11, "value": {"v0": 1}},
    {"topic": "test_topic", "key": 11, "value": {"v0": -2}},
    {"topic": "test_topic", "key": 11, "value": {"v0": 10}}
  ],
  "responses": [
    {"admin": {"@type": "currentStatus"}},
    {"query": [
      {"header": {"schema": "`K` BIGINT KEY, `SUM` BIGINT"}},
      {"row": {"keys": [11], "values": [1]}},
      {"row": {"keys": [11], "tombstone": true}},
      {"row": {"keys": [11], "values": [9]}},
      {"finalMessage": "Limit Reached"}
    ]}
  ]
}
```

Note the difference in the new format:

```json
{"row": {"keys": [11], "values": [1]}},
{"row": {"keys": [11], "tombstone": true}}
```

vs the current:

```json
{"row": {"values": [11, 1]}}
```

i.e. we'll need to split the key column value(s) out of the value column value(s). Then either explicitly set a …
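As a rough illustration of what the split format means for clients, here is a minimal sketch (not the real ksqlDB client API; `merge_row` is a hypothetical helper) of merging the proposed `keys`/`values` row shape back into a single flat column list, with tombstones mapped to `None` values:

```python
# Sketch only: merges the proposed split "keys"/"values" row format
# back into one flat column list, as the current API returns it.
# A tombstone row carries no value columns, so they become None.

def merge_row(row, num_value_columns):
    """Combine key and value columns of one streamed query row.

    row: dict like {"keys": [11], "values": [1]}
         or {"keys": [11], "tombstone": True}.
    """
    keys = row.get("keys", [])
    if row.get("tombstone"):
        # Deleted row: only the key survives; pad the values with None.
        return keys + [None] * num_value_columns
    return keys + row["values"]

rows = [
    {"keys": [11], "values": [1]},
    {"keys": [11], "tombstone": True},
    {"keys": [11], "values": [9]},
]
merged = [merge_row(r, 1) for r in rows]
# merged == [[11, 1], [11, None], [11, 9]]
```

The point of the split is that a client can no longer recover the key from the value columns alone; it must read the `keys` field, and must be prepared for `values` to be absent on tombstones.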
big-andy-coates added a commit to big-andy-coates/ksql that referenced this issue on Jun 3, 2020:
implements: [KLIP-29](confluentinc#5530)
fixes: confluentinc#5303
fixes: confluentinc#4678

This change sees ksqlDB no longer adding an implicit `ROWKEY STRING` key column to created streams, or an implicit primary key column to created tables, when no key column is explicitly provided in the `CREATE` statement.

BREAKING CHANGE: `CREATE TABLE` statements will now fail if no `PRIMARY KEY` column is provided. For example, a statement such as:

```sql
CREATE TABLE FOO (name STRING) WITH (kafka_topic='foo', value_format='json');
```

will need to be updated to include the definition of the `PRIMARY KEY`, e.g.

```sql
CREATE TABLE FOO (ID STRING PRIMARY KEY, name STRING) WITH (kafka_topic='foo', value_format='json');
```

If using schema inference, i.e. loading the value columns of the topic from the Schema Registry, the primary key can be provided as a partial schema, e.g.

```sql
-- FOO will have value columns loaded from the Schema Registry
CREATE TABLE FOO (ID INT PRIMARY KEY) WITH (kafka_topic='foo', value_format='avro');
```

`CREATE STREAM` statements that do not define a `KEY` column will no longer have an implicit `ROWKEY` key column. For example:

```sql
CREATE STREAM BAR (NAME STRING) WITH (...);
```

The above statement would previously have resulted in a stream with two columns: `ROWKEY STRING KEY` and `NAME STRING`. With this change it will result in a stream with only the `NAME STRING` column. Streams with no `KEY` column will be serialized to Kafka topics with a `null` key.
big-andy-coates added a commit that referenced this issue on Jun 3, 2020:
* feat: explicit keys — same commit message as above (implements KLIP-29 #5530, fixes #5303, fixes #4678). Co-authored-by: Andy Coates <[email protected]>
stevenpyzhang pushed a commit that referenced this issue on Jun 5, 2020:
* feat: explicit keys — same commit message as above. Co-authored-by: Andy Coates <[email protected]>
As per discussion here:
#5115 (comment)
We should require transient queries on tables to include the primary key column(s), so that any transient push query can be converted to a persistent query.
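As a sketch of why the key is needed, using the table `T` from the test case above (primary key `K`, value column `SUM`); the statements and the `T2` name here are illustrative, not from the issue:

```sql
-- A transient push query that includes the primary key:
SELECT K, SUM FROM T EMIT CHANGES;

-- ...can be converted directly into a persistent query, because the
-- result still has a well-defined key:
CREATE TABLE T2 AS SELECT K, SUM FROM T;

-- Whereas a query that drops the key, e.g.
--   SELECT SUM FROM T EMIT CHANGES;
-- produces rows with no key column, so there is no table it could be
-- persisted as.
```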