Primary keys #4986
Conversation
fixes: confluentinc#3681

The commit introduces `PRIMARY KEY` columns into the ksqlDB syntax for tables. Streams will continue to have `KEY` columns. For example,

```sql
CREATE TABLE ORDERS (ID BIGINT PRIMARY KEY, USER_ID BIGINT, ...
--vs
CREATE STREAM ORDER_UPDATES (ID BIGINT KEY, USER_ID BIGINT, ...
```

This change in syntax differentiates the key handling semantics for tables vs streams:

A ksqlDB TABLE works much like tables in other SQL systems. Each row is identified by its `PRIMARY KEY`. `PRIMARY KEY` values can not be NULL. A message in the underlying Kafka topic with the same key as an existing row will _replace_ the earlier row in the table, or _delete_ the row if the message's value is NULL, as long as the earlier row does not have a later timestamp / `ROWTIME`.

A ksqlDB STREAM is a stream of _facts_. Each _fact_ is immutable and is unique. A stream can store its data in either `KEY` or `VALUE` columns. Both `KEY` and `VALUE` columns can be NULL. No special processing is done if two rows have the same key.

The table below contrasts key handling for streams and tables:

| | STREAM | TABLE |
| ------------------------ | -------------------------------------------------------------- | ----------------------------------------------------------------- |
| Key column type | `KEY` | `PRIMARY KEY` |
| NON NULL key constraint | No | Yes: messages in the Kafka topic with a NULL `PRIMARY KEY` are ignored |
| Unique key constraint | No: messages with the same key as another have no special meaning | Yes: later messages with the same key _replace_ earlier |
| Tombstones | No: messages with NULL values are ignored | Yes: NULL message values are treated as a _tombstone_; any existing row with a matching key is deleted |
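The table semantics described above (upsert on matching key, NULL value as tombstone, NULL key ignored, older messages never overwriting newer rows) can be sketched in Python. This is a hypothetical illustration of the described behaviour, not ksqlDB code; the function name `materialize_table` and the `(key, value, rowtime)` tuple shape are assumptions for the sketch.

```python
def materialize_table(messages):
    """Fold a sequence of (key, value, rowtime) Kafka messages into a table.

    Semantics per the PR description:
      * messages with a NULL (None) PRIMARY KEY are ignored,
      * a NULL (None) value is a tombstone that deletes the matching row,
      * later messages with the same key replace earlier rows,
        unless the existing row carries a later ROWTIME.
    """
    table = {}
    for key, value, rowtime in messages:
        if key is None:
            continue  # NULL PRIMARY KEY: message is ignored
        existing = table.get(key)
        if existing is not None and existing[1] > rowtime:
            continue  # existing row has a later ROWTIME: keep it
        if value is None:
            table.pop(key, None)  # tombstone: delete any matching row
        else:
            table[key] = (value, rowtime)  # insert or replace
    return {k: v for k, (v, _) in table.items()}


rows = materialize_table([
    (1, "a", 10),
    (2, "b", 11),
    (1, "c", 12),    # same key: replaces the earlier row for key 1
    (2, None, 13),   # tombstone: deletes the row for key 2
    (None, "x", 14), # NULL key: ignored
])
# rows == {1: "c"}
```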
LGTM, with some suggestions. Thanks for the great doc updates!
The code change LGTM. Is this discussed in a KLIP somewhere?
| | STREAM | TABLE |
| ------------------------ | --------------------------------------------------------------| ----------------------------------------------------------------- |
| Key column type | `KEY` | `PRIMARY KEY` |
| NON NULL key constraint | No | Yes <br> Messages in the Kafka topic with a NULL `PRIMARY KEY` are ignored |
This is true for now (since we only support unwrapped primitive keys), but it won't be true in the future if we support single-element wrapped keys right?
In a strict SQL sense this should remain true for wrapped and multi-column primary keys. Primary key columns in SQL must be NON NULL. The main reason for this is that an SQL NULL is not comparable to another SQL NULL, i.e. `NULL = NULL` does not evaluate to true, so a NULL value in a primary key could never be matched.
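The point about NULL comparability can be sketched with a small Python model of SQL's three-valued logic. This is a hypothetical illustration (the helpers `sql_eq` and `key_matches` are invented for the sketch, and `None` stands in for SQL NULL): any comparison involving NULL yields unknown rather than true, so a NULL primary-key value can never be located by a key match.

```python
NULL = None  # stand-in for SQL NULL

def sql_eq(a, b):
    """SQL three-valued equality: any comparison involving NULL is unknown."""
    if a is NULL or b is NULL:
        return NULL  # unknown: never treated as a definite match
    return a == b

def key_matches(row_key, lookup_key):
    """A row matches a key lookup only when the comparison is definitely TRUE."""
    return sql_eq(row_key, lookup_key) is True

assert key_matches(1, 1)            # ordinary keys compare normally
assert not key_matches(1, 2)
assert not key_matches(NULL, NULL)  # a NULL key can never be matched
```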
I'm not aware it's been covered explicitly in a KLIP. However, I have discussed this change with Product. UPDATE: KLIP available here: #5008
LGTM, pending #5008
Description
KLIP for this (which needs approving first): #5008
fixes: #3681
The commit introduces `PRIMARY KEY` columns into the ksqlDB syntax for tables. Streams will continue to have `KEY` columns (see the example above).

Note: this change only introduces the `PRIMARY KEY` syntax. It does not change how data is processed by ksqlDB.

This change in syntax differentiates the key handling semantics for tables vs streams:

A ksqlDB TABLE works much like tables in other SQL systems: each row is identified by its `PRIMARY KEY`. `PRIMARY KEY` column(s) can not be NULL. A message in the underlying Kafka topic with the same key as an existing row _replaces_ the earlier row in the table, or _deletes_ the row if the message's value is NULL, as long as the earlier row does not have a later timestamp / `ROWTIME`.

A ksqlDB STREAM is a stream of _facts_. Each _fact_ is immutable and is unique. A stream can store its data in either `KEY` or `VALUE` columns. Columns stored in the key of the Kafka message are not `PRIMARY KEY` columns; they are just non-primary-key columns that happen to be stored in the key of the Kafka message. Both `KEY` and `VALUE` columns can be NULL. No special processing is done if two rows have the same key.

The table below contrasts key handling for streams and tables:
| | STREAM | TABLE |
| ------------------------ | -------------------------------------------------------------- | ----------------------------------------------------------------- |
| Key column type | `KEY` | `PRIMARY KEY` |
| NON NULL key constraint | No | Yes: messages in the Kafka topic with a NULL `PRIMARY KEY` are ignored |
| Unique key constraint | No: messages with the same key as another have no special meaning | Yes: later messages with the same key _replace_ earlier |
| Tombstones | No: messages with NULL values are ignored | Yes: NULL message values are treated as a _tombstone_; any existing row with a matching key is deleted |
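By contrast, the STREAM column of the table above can be sketched the same way (hypothetical Python, not ksqlDB code; `append_to_stream` is an invented helper): a stream simply keeps every message as an immutable fact, with no NON NULL or unique key constraints and no tombstone handling.

```python
def append_to_stream(stream, key, value, rowtime):
    """Streams are sequences of immutable facts: every message is kept.
    NULL keys and NULL values are allowed, duplicate keys carry no
    special meaning, and a NULL value is NOT a tombstone."""
    stream.append((key, value, rowtime))
    return stream

events = []
append_to_stream(events, 1, "a", 10)
append_to_stream(events, 1, "b", 11)     # same key: just another fact
append_to_stream(events, None, "c", 12)  # NULL key is allowed in a stream
append_to_stream(events, 2, None, 13)    # NULL value is kept, not a delete
# len(events) == 4: all four facts are retained
```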
Testing done
usual
Reviewing notes:
Commits broken down into:
- adding `PRIMARY` to other tests that use tables.
- (`PRIMARY` key word in the SQL).

Reviewer checklist