Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: drop legacy key field functionality #3764

Merged

Conversation

big-andy-coates
Copy link
Contributor

@big-andy-coates big-andy-coates commented Nov 5, 2019

Description

BREAKING CHANGE

Fixes ##3350

This drops functionality that maintained a legacy version of the key field.
This legacy version was often wrong and resulted in unnecessary repartition steps being performed.
The functionality was previously required to ensure existing legacy queries continued to have the unnecessary repartitions to ensure no data loss.

With a major version bump this can now be dropped.

Reviewing notes:

  1. Start with the changes to KeyField as this is the crux of the change. The legacy field is no more! Most of the rest of the code changes is removing the legacy field.
  2. JoinNode has received a little bit of love to tidy things up. The correct keyField is now constructed in the constructor and used for all joins later on.
  3. KsqlStructuredDataOutputNode has been simplified with the removal of SchemaKStream.withKeyField.

Testing done

Usual.

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

BREAKING CHANGE

This drops functionality that maintained a legacy version of the key field.
This legacy version was often wrong and resulted in unnecessary repartition steps being performed.
The functionality was previously required to ensure existing legacy queries continued to have the unnecessary repartitions to ensure no data loss.

With a major version bump this can now be dropped.
@big-andy-coates big-andy-coates requested a review from a team as a code owner November 5, 2019 23:05
@@ -281,12 +283,6 @@ Joiner getJoiner(final DataSourceType leftType,
final ColumnRef joinFieldName,
final Stacker contextStacker
) {
final LogicalSchema schema = stream.getSchema();

schema.findValueColumn(joinFieldName)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed this test as its duplicates one inside selectKey.

* @param leftKeyField the key field of the left source.
* @return the key field that should be used by the resultant joined stream.
*/
static KeyField getJoinedKeyField(final SourceName leftAlias, final KeyField leftKeyField) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer needed as the correct keyfield is built in the constructor.

* @param leftKeyField the key field of the left source.
* @return the key field that should be used by the resultant joined stream.
*/
static KeyField getOuterJoinedKeyField(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

getKeyField().legacy()
);

final SchemaKStream result = schemaKStream.withKeyField(resultKeyField);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withKeyField has been removed as its no longer needed. The only reason it was needed was to set the legacy (incorrect) keyField. With this PR this all goes away.

return doFindKeyColumn(selectExpressions, keyColumn).map(Column::ref);
}

private Optional<LegacyField> findLegacyKeyField(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has been removed and everything else inlined into findKeyField

@@ -119,7 +119,6 @@
private static final int MAP_INDEX1 = 11;
private static final int MAP_INDEX2 = 12;
private static final int STRUCT_INDEX = 15;
private static final int DECIMAL_INDEX = 16;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed as unused

@@ -433,150 +433,6 @@ public void shouldHandleRightTableJoiningOnRowKey() {
// Then: did not throw.
}

@Test
public void shouldRepartitionLeftStreamIfNotCorrectKey_Legacy() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all testing the legacy code path - so removed.

@@ -142,46 +127,29 @@
Collections.emptyList()
);

private final QueryId queryId = new QueryId("source-test");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these were all unused, so removed them

@@ -412,25 +378,6 @@ public void shouldBuildTableWithCorrectContext() {
equalTo(ImmutableList.of("0", "reduce")));
}


@SuppressWarnings("unchecked")
private DataSourceNode nodeWithMockTableSource() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

@@ -132,4 +136,32 @@ private static ConnectSchema makeTopLevelStructNoneOptional(final Schema schema)
schema.fields().forEach(field -> builder.field(field.name(), field.schema()));
return (ConnectSchema) builder.build();
}

public static class Deserializer extends JsonDeserializer<SourceNode> {
Copy link
Contributor Author

@big-andy-coates big-andy-coates Nov 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the job of this deserializer is basically to distinguish between a source with no key field criteria specified, meaning the key field can be anything vs one where the key field is explicitly set to null, meaning the source should have no key field set.

I'm not aware of a better way to do this with Jackson...

* It is deliberately excluded from the test suite
*/
@Ignore
public final class ManualTopologyFileGeneratorTest {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rolled this into TopologyFileGenerator ...

@@ -51,20 +50,11 @@ Topologies:
--> KSTREAM-MAPVALUES-0000000002
<-- KSTREAM-SOURCE-0000000000
Processor: KSTREAM-MAPVALUES-0000000002 (stores: [])
--> KSTREAM-FILTER-0000000003
--> KSTREAM-MAPVALUES-0000000003
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this topology has changed because the test was explicitly setting ksql.query.fields.key.legacy to true. With this setting now gone the unnecessary repartition is avoided.

"valueSchema": "STRUCT<FOO INT, KSQL_COL_1 BIGINT>"
}
]
}
},
{
"name": "stream | initially null | group by (-) | key in value | no aliasing | legacy downstream query",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed these legacy downstream queries as the property they use no longer exists

big-andy-coates added a commit to big-andy-coates/ksql that referenced this pull request Nov 5, 2019
Copy link
Contributor

@rodesai rodesai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left one item inline.

@big-andy-coates big-andy-coates merged commit 5369dc2 into confluentinc:master Nov 6, 2019
@big-andy-coates big-andy-coates deleted the legacy_key_field branch November 6, 2019 08:44
big-andy-coates added a commit to big-andy-coates/ksql that referenced this pull request Nov 6, 2019
big-andy-coates added a commit to big-andy-coates/ksql that referenced this pull request Nov 6, 2019
With legacy key fields now gone, (confluentinc#3764), we can now drop the final use of `Column.matches`.
big-andy-coates added a commit that referenced this pull request Nov 6, 2019
With legacy key fields now gone, (#3764), we can now drop the final use of `Column.matches`.
big-andy-coates added a commit that referenced this pull request Nov 11, 2019
big-andy-coates added a commit that referenced this pull request Nov 11, 2019
* feat: drop legacy key field functionality

BREAKING CHANGE

This drops functionality that maintained a legacy version of the key field.
This legacy version was often wrong and resulted in unnecessary repartition steps being performed.
The functionality was previously required to ensure existing legacy queries continued to have the unnecessary repartitions to ensure no data loss.

With a major version bump this can now be dropped.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants