Explicit keys #5533

Merged 7 commits on Jun 3, 2020

big-andy-coates
Contributor

Description

implements: KLIP-29

fixes: #5303
fixes: #4678

With this change, ksqlDB no longer adds an implicit ROWKEY STRING key column to created streams, or an implicit primary key column to created tables, when the CREATE statement does not explicitly define a key column.

BREAKING CHANGE

CREATE TABLE statements will now fail if no PRIMARY KEY column is provided.

For example, a statement such as:

CREATE TABLE FOO (name STRING) WITH (kafka_topic='foo', value_format='json');

Will need to be updated to include the definition of the PRIMARY KEY, e.g.

CREATE TABLE FOO (ID STRING PRIMARY KEY, name STRING) WITH (kafka_topic='foo', value_format='json');

If using schema inference, i.e. loading the value columns of the topic from the Schema Registry, the primary key can be provided as a partial schema, e.g.

-- FOO will have value columns loaded from the Schema Registry
CREATE TABLE FOO (ID INT PRIMARY KEY) WITH (kafka_topic='foo', value_format='avro');

CREATE STREAM statements that do not define a KEY column will no longer have an implicit ROWKEY key column.

For example:

CREATE STREAM BAR (NAME STRING) WITH (...);

The above statement would previously have resulted in a stream with two columns: ROWKEY STRING KEY and NAME STRING.
With this change the above statement will result in a stream with only the NAME STRING column.

Streams with no KEY column will be serialized to Kafka topics with a null key.
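
To keep a key on the stream, the key column has to be declared explicitly. The following is a minimal sketch of the two cases side by side; the stream names, the `ID` column, and the `bar` topic are hypothetical and are not taken from this PR, whose stream example leaves the `WITH` clause elided.

```sql
-- Hypothetical names throughout: BAR_KEYED, BAR_KEYLESS, ID and the 'bar' topic are illustrative only.

-- Explicit key column: ID is declared with the KEY keyword, so the stream has two columns.
CREATE STREAM BAR_KEYED (ID STRING KEY, NAME STRING)
  WITH (kafka_topic='bar', value_format='json');

-- No key column: the stream's schema contains only NAME,
-- and rows written to Kafka by this stream carry a null key.
CREATE STREAM BAR_KEYLESS (NAME STRING)
  WITH (kafka_topic='bar', value_format='json');
```

Running `DESCRIBE` on each stream should be a quick way to confirm which columns ended up in the schema.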

Testing done

usual

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@big-andy-coates big-andy-coates requested review from JimGalasyn and a team as code owners June 3, 2020 00:42
Contributor

@agavra agavra left a comment

Overall LGTM, I have some minor testing comments - though I'm a little nervous putting this in last minute to a release without extensive testing, I have confidence in our QTTs 😅

@@ -1966,6 +1966,26 @@
"outputs": [
{"topic": "OUTPUT", "key": "user_0", "value": {"IMPRESSION_ID": 24, "URL": "urlA"}, "timestamp": 12}
]
},
{
"name": "streams with no key columns",
Contributor

Don't want to be too picky, but let's also add a test for a no-key stream --> table join.

Contributor Author

done
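
For context, a join between a stream with no key column and a table might look roughly like the statements below. This is a sketch only: the source names, columns, and topics are hypothetical and are not taken from the test case that was added to the PR.

```sql
-- Hypothetical sources; names are illustrative and do not come from the PR's test files.

-- Stream with no key column.
CREATE STREAM S (ID INT, NAME STRING)
  WITH (kafka_topic='s', value_format='json');

-- Table with the now-mandatory PRIMARY KEY column.
CREATE TABLE T (ID INT PRIMARY KEY, VAL STRING)
  WITH (kafka_topic='t', value_format='json');

-- Join the keyless stream to the table using a value column of the stream.
CREATE STREAM OUTPUT AS
  SELECT S.ID, S.NAME, T.VAL
  FROM S JOIN T ON S.ID = T.ID;
```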

@big-andy-coates big-andy-coates merged commit d0db0cf into confluentinc:master Jun 3, 2020
@big-andy-coates big-andy-coates deleted the explicit_keys branch June 3, 2020 09:43
stevenpyzhang pushed a commit that referenced this pull request Jun 5, 2020
* feat: explicit keys

Co-authored-by: Andy Coates <[email protected]>
Development

Successfully merging this pull request may close these issues.

  • Persistent queries on tables should require key columns
  • Support STREAMs without key column defined