Persistent queries on tables should require key columns #5303
This was referenced May 7, 2020
This will require a (breaking) change to the data sent back from the REST API. Likely something like:

```json
{
  "name": "table tombstones",
  "statements": [
    "CREATE STREAM INPUT (K BIGINT KEY, V0 INT) WITH (kafka_topic='test_topic', value_format='JSON');",
    "CREATE TABLE T AS SELECT K, SUM(V0) AS SUM FROM INPUT GROUP BY K HAVING SUM(V0) > 0;",
    "SELECT * FROM T EMIT CHANGES LIMIT 3;"
  ],
  "inputs": [
    {"topic": "test_topic", "key": 11, "value": {"v0": 1}},
    {"topic": "test_topic", "key": 11, "value": {"v0": -2}},
    {"topic": "test_topic", "key": 11, "value": {"v0": 10}}
  ],
  "responses": [
    {"admin": {"@type": "currentStatus"}},
    {"query": [
      {"header": {"schema": "`K` BIGINT KEY, `SUM` BIGINT"}},
      {"row": {"keys": [11], "values": [1]}},
      {"row": {"keys": [11], "tombstone": true}},
      {"row": {"keys": [11], "values": [9]}},
      {"finalMessage": "Limit Reached"}
    ]}
  ]
}
```

Note the difference in the new format:

```json
{"row": {"keys": [11], "values": [1]}},
{"row": {"keys": [11], "tombstone": true}}
```

vs the current:

```json
{"row": {"values": [11, 1]}}
```

i.e. we'll need to split the key column value(s) out of the value column value(s). Then either explicitly set a …
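As a rough illustration of what the split format means for clients, here is a minimal sketch (not the real ksqlDB client API; `merge_row` is a hypothetical helper) of merging the proposed `keys`/`values` row shape back into a single flat column list, with tombstones mapped to `None` values:

```python
# Sketch only: merges the proposed split "keys"/"values" row format
# back into one flat column list, as the current API returns it.
# A tombstone row carries no value columns, so they become None.

def merge_row(row, num_value_columns):
    """Combine key and value columns of one streamed query row.

    row: dict like {"keys": [11], "values": [1]}
         or {"keys": [11], "tombstone": True}.
    """
    keys = row.get("keys", [])
    if row.get("tombstone"):
        # Deleted row: only the key survives; pad the values with None.
        return keys + [None] * num_value_columns
    return keys + row["values"]

rows = [
    {"keys": [11], "values": [1]},
    {"keys": [11], "tombstone": True},
    {"keys": [11], "values": [9]},
]
merged = [merge_row(r, 1) for r in rows]
# merged == [[11, 1], [11, None], [11, 9]]
```

The point of the split is that a client can no longer recover the key from the value columns alone; it must read the `keys` field, and must be prepared for `values` to be absent on tombstones.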
big-andy-coates added a commit to big-andy-coates/ksql that referenced this issue on Jun 3, 2020:
implements: [KLIP-29](confluentinc#5530)
fixes: confluentinc#5303
fixes: confluentinc#4678

This change sees ksqlDB no longer adding an implicit `ROWKEY STRING` key column to created streams, or an implicit primary key column to created tables, when no key column is explicitly provided in the `CREATE` statement.

BREAKING CHANGE: `CREATE TABLE` statements will now fail if no `PRIMARY KEY` column is provided. For example, a statement such as:

```sql
CREATE TABLE FOO (name STRING) WITH (kafka_topic='foo', value_format='json');
```

will need to be updated to include the definition of the `PRIMARY KEY`, e.g.

```sql
CREATE TABLE FOO (ID STRING PRIMARY KEY, name STRING) WITH (kafka_topic='foo', value_format='json');
```

If using schema inference, i.e. loading the value columns of the topic from the Schema Registry, the primary key can be provided as a partial schema, e.g.

```sql
-- FOO will have value columns loaded from the Schema Registry
CREATE TABLE FOO (ID INT PRIMARY KEY) WITH (kafka_topic='foo', value_format='avro');
```

`CREATE STREAM` statements that do not define a `KEY` column will no longer have an implicit `ROWKEY` key column. For example:

```sql
CREATE STREAM BAR (NAME STRING) WITH (...);
```

The above statement would previously have resulted in a stream with two columns: `ROWKEY STRING KEY` and `NAME STRING`. With this change it will result in a stream with only the `NAME STRING` column. Streams with no `KEY` column will be serialized to Kafka topics with a `null` key.
big-andy-coates added a commit that referenced this issue on Jun 3, 2020:
* feat: explicit keys — same commit message as above (implements KLIP-29 #5530, fixes #5303, fixes #4678). Co-authored-by: Andy Coates <[email protected]>
stevenpyzhang pushed a commit that referenced this issue on Jun 5, 2020:
* feat: explicit keys — same commit message as above. Co-authored-by: Andy Coates <[email protected]>
As per discussion here:
#5115 (comment)
We should require transient queries on tables to include the primary key column(s), so that any transient push query can be converted to a persistent query.
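As a sketch of why the key is needed, using the table `T` from the test case above (primary key `K`, value column `SUM`); the statements and the `T2` name here are illustrative, not from the issue:

```sql
-- A transient push query that includes the primary key:
SELECT K, SUM FROM T EMIT CHANGES;

-- ...can be converted directly into a persistent query, because the
-- result still has a well-defined key:
CREATE TABLE T2 AS SELECT K, SUM FROM T;

-- Whereas a query that drops the key, e.g.
--   SELECT SUM FROM T EMIT CHANGES;
-- produces rows with no key column, so there is no table it could be
-- persisted as.
```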