Primary keys #4986

big-andy-coates · 2020-04-03T11:36:47Z

Description

KLIP for this (which needs approving first): #5008

The commit introduces PRIMARY KEY columns into the ksqlDB syntax for tables. Streams will continue to have KEY columns. For example,

```sql
CREATE TABLE ORDERS (ID BIGINT PRIMARY KEY, USER_ID BIGINT, ...
--vs
CREATE STREAM ORDER_UPDATES (ID BIGINT KEY, USER_ID BIGINT, ...
```

Note: this change only introduces the PRIMARY KEY syntax. It does not change how data is processed by ksqlDB.

This change in syntax differentiates the key handling semantics for tables vs streams:

A ksqlDB TABLE works much like tables in other SQL systems: each row is identified by its PRIMARY KEY. PRIMARY KEY column(s) cannot be NULL.
A message in the underlying Kafka topic with the same key as an existing row replaces the earlier row in the table,
or deletes the row if the message's value is NULL, as long as the earlier row does not have a later timestamp / ROWTIME.

A ksqlDB STREAM is a stream of facts. Each fact is immutable and is unique. A stream can store its data in either KEY or VALUE columns. Columns stored in the key of the Kafka message are not PRIMARY KEY columns. They are just non-primary-key columns that happen to be stored in the key of the Kafka message.
Both KEY and VALUE columns can be NULL. No special processing is done if two rows have the same key.
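The table semantics described above can be sketched as a small in-memory model. This is only an illustration of the stated rules, not ksqlDB's implementation: `Table` and `apply_to_table` are hypothetical names, and the model forgets a row's timestamp once the row is deleted.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory model of a table: primary key -> (value, timestamp).
Table = Dict[int, Tuple[dict, int]]


def apply_to_table(table: Table, key: Optional[int],
                   value: Optional[dict], ts: int) -> None:
    """Apply one Kafka-style message to the table using the rules above."""
    if key is None:
        return  # PRIMARY KEY cannot be NULL, so the message is ignored
    existing = table.get(key)
    if existing is not None and existing[1] > ts:
        return  # the existing row has a later timestamp, so it is kept
    if value is None:
        table.pop(key, None)  # NULL value: delete any row with this key
    else:
        table[key] = (value, ts)  # insert the row or replace the earlier one


orders: Table = {}
apply_to_table(orders, 1, {"USER_ID": 10}, ts=100)
apply_to_table(orders, 1, {"USER_ID": 11}, ts=200)     # replaces the row
apply_to_table(orders, 1, {"USER_ID": 12}, ts=150)     # older message: no effect
apply_to_table(orders, 2, {"USER_ID": 20}, ts=300)
apply_to_table(orders, 2, None, ts=400)                # NULL value deletes row 2
apply_to_table(orders, None, {"USER_ID": 30}, ts=500)  # NULL key: ignored
print(orders)  # {1: ({'USER_ID': 11}, 200)}
```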

The table below contrasts key handling for streams and tables:

|                         | STREAM | TABLE |
| ----------------------- | ------ | ----- |
| Key column type         | `KEY`  | `PRIMARY KEY` |
| NON NULL key constraint | No | Yes <br> Messages in the Kafka topic with a NULL `PRIMARY KEY` are ignored |
| Unique key constraint   | No <br> Messages with the same key as another have no special meaning | Yes <br> Later messages with the same key replace earlier |
| Tombstones              | No <br> Messages with NULL values are ignored | Yes <br> NULL message values are treated as a tombstone <br> Any existing row with a matching key is deleted |
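For contrast, the STREAM column of this comparison can be sketched the same way. Again this is a hedged illustration rather than ksqlDB code, and `append_to_stream` is a hypothetical helper: every non-NULL value becomes a new row, regardless of its key.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory model of a stream: an append-only list of (key, value).
Row = Tuple[Optional[int], dict]


def append_to_stream(rows: List[Row], key: Optional[int],
                     value: Optional[dict]) -> None:
    """Append one Kafka-style message to the stream using the rules above."""
    if value is None:
        return  # streams have no tombstones: NULL values are ignored
    rows.append((key, value))  # NULL keys and repeated keys are both kept


updates: List[Row] = []
append_to_stream(updates, 1, {"USER_ID": 10})
append_to_stream(updates, 1, {"USER_ID": 11})     # same key: a second row
append_to_stream(updates, None, {"USER_ID": 12})  # NULL key is allowed
append_to_stream(updates, 1, None)                # NULL value: ignored
print(len(updates))  # 3
```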

Testing done

usual

Reviewing notes:

Commits broken down into:

First commit: Prod code changes and associated tests.
Second commit: doc updates
Third commit: Updates to other tests, i.e. adding the PRIMARY to other tests that use tables.
Fourth commit: New historical query plans (the change is just the addition of the PRIMARY keyword in the SQL).
Fifth commit: doc update to fix table layout. Turns out GitHub doesn't support multi-line tables :(

Reviewer checklist

Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
Ensure relevant issues are linked (description should include text like "Fixes #")

fixes: confluentinc#3681

docs-md/developer-guide/ksqldb-reference/create-stream.md

docs-md/developer-guide/ksqldb-reference/create-table.md

JimGalasyn

LGTM, with some suggestions. Thanks for the great doc updates!

rodesai

The code change LGTM. Is this discussed in a KLIP somewhere?

rodesai · 2020-04-06T07:02:24Z

docs-md/developer-guide/ksqldb-reference/create-stream.md

+|                          |  STREAM                                                       | TABLE                                                             |
+| ------------------------ | --------------------------------------------------------------| ----------------------------------------------------------------- |
+| Key column type          | `KEY`                                                         | `PRIMARY KEY`                                                     |
+| NON NULL key constraint  | No                                                            | Yes <br> Messages in the Kafka topic with a NULL `PRIMARY KEY` are ignored |


This is true for now (since we only support unwrapped primitive keys), but it won't be true in the future if we support single-element wrapped keys, right?

In a strict SQL sense this should remain true for wrapped and multi-column primary keys. Primary key columns in SQL must be NON NULL.

The main reason for this is that an SQL NULL is not comparable to another SQL NULL, i.e. NULL = NULL does not evaluate to true (it yields NULL). So a NULL value in a primary key could never be matched.
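The NULL comparison behaviour is easy to check in any SQL engine. The snippet below uses SQLite through Python's standard `sqlite3` module purely as a stand-in. It says nothing about how ksqlDB itself evaluates the expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# An equality comparison against NULL yields NULL (unknown), never true.
eq, is_null = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()
print(eq)       # None
print(is_null)  # 1

# A lookup by equality therefore never matches a NULL key.
matches = conn.execute(
    "SELECT COUNT(*) FROM (SELECT NULL AS id) WHERE id = NULL"
).fetchone()[0]
print(matches)  # 0

conn.close()
```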

big-andy-coates · 2020-04-06T13:22:42Z

@rodesai

> Is this discussed in a KLIP somewhere?

I'm not aware of it being covered explicitly in a KLIP. However, I have discussed this change with Product.

cc @MichaelDrogalis @derekjn

UPDATE: KLIP available here: #5008

agavra

LGTM, pending #5008

ksqldb-parser/src/main/java/io/confluent/ksql/parser/tree/CreateStream.java

ksqldb-parser/src/main/java/io/confluent/ksql/parser/tree/CreateTable.java

big-andy-coates added 4 commits April 3, 2020 12:29

docs: doc updates for PRIMITIVE KEYS

81d7b49

test: test updates for primitive keys

a9f9abb

chore: updated historical query plans

c740468

big-andy-coates requested review from JimGalasyn and a team as code owners April 3, 2020 11:36

docs: fix table layout

ce5e8f0

big-andy-coates changed the title ~~Primitive keys~~ Primary keys Apr 3, 2020