Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up query plan schema #4002

Merged
merged 3 commits into from
Dec 5, 2019

Conversation

rodesai
Copy link
Contributor

@rodesai rodesai commented Nov 30, 2019

Description

The goal here is to clean up the query plan schema as part of the work to
get to the plan schema proposed in #3969

Firstly, we flatten PhysicalPlan into QueryPlan in the plan schema. QueryPlan now
stores the properties it needs from PhysicalPlan, rather than holding a reference to
PhysicalPlan. This gives us a flatter, simpler, plan schema

This patch also cleans up the summary form the serialized plan by building it from
the execution plan, rather than by SchemaKStream. This is better because it gives
us a more accurate summary (previously we would skip some steps), simplifies
SchemaKStream, and simplifies the schema of the serialized plan.

The summary is built by a new class called PlanSummary, which walks the execution
plan and builds up the summary string. It uses a new interface called StepSchemaResolver
to determine the schema computed by a given step given its input schemas.

@rodesai rodesai requested a review from a team as a code owner November 30, 2019 22:16
@rodesai rodesai force-pushed the clean-up-query-plan-pojo branch from f2ffc90 to 9d23d52 Compare November 30, 2019 22:23
@agavra agavra self-assigned this Dec 3, 2019
Copy link
Contributor

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code LGTM. The first part (about flattening) makes total sense to me.

This patch also cleans up the summary form the serialized plan by building it from
the execution plan, rather than by SchemaKStream. This is better because it gives
us a more accurate summary (previously we would skip some steps), simplifies
SchemaKStream, and simplifies the schema of the serialized plan.

This part introduces some coupling that seems a little fragile - I think it's an improvement (so giving the +1) but it would be nice to have some way of enforcing tighter coupling. See my inline #4002 (comment).

import java.util.Objects;
import java.util.stream.Collectors;

public class PlanSummary {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add javadocs to these new classes? specifically, it would also be nice to document any compatibility requirements (is it okay to change a summary? is it used by code other than tests?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's returned in the API as a string. We should change this to a structured object (I guess now is a good time), but as such we don't make any guarantees about its contents.

Comment on lines 88 to 89
Objects.requireNonNull(config, "config"),
Objects.requireNonNull(metaStore, "metaStore")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary null checks (the StepSchemaResolver does this in the constructor)

import java.util.Optional;

/**
* An interface for computing the schema produced by an execution step, given the schema(s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of date comment - no longer an interface?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant an api not a literal interface. I'll rephrase.

*/
public final class StepSchemaResolver {
private static final HandlerMaps.ClassHandlerMapR2
<ExecutionStep, StepSchemaResolver, LogicalSchema, LogicalSchema> HANDLERS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the original code, this mapping was tightly coupled with the SchemaKStream and now it's defined by this. The approach in this PR means that if we change how any of these steps generate the schema, we need to remember to change this class to (i.e. for structured keys, maybe we want a StreamGroupBy to change the key schema of the output - we'd need to update this class too).

Is there any way to make the coupling tighter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agavra I think this is fair. This class is just glue between the step types and code that computes the schema. We can make it tighter by having the actual schema computation code in the step builders. The supposed advantage of having it looser is that the schema computation is decoupled from the underlying implementation of the steps (e.g. kstreams). Maybe not worth it though. I'll probably change it in a follow up. To be clear, I think it still makes sense to have this class - just that it would work by proxying calls to the step builders.

@agavra agavra requested a review from a team December 3, 2019 22:39
@rodesai rodesai force-pushed the clean-up-query-plan-pojo branch from 9d23d52 to 48a6322 Compare December 4, 2019 07:46
This patch flattens PhysicalPlan into QueryPlan in the plan schema.
QueryPlan now stores the properties it needs from PhysicalPlan, rather
than holding a reference to PhysicalPlan. This gives us a flatter,
simpler, plan schema
This patch changes up how ksql builds the plan summary string that gets
returned by our API endpoint. The summary is now built from the execution
plan, rather than by SchemaKStream. This is better because it gives us a
more accurate plan (previously we would skip some steps), simplified
SchemaKStream, and simplifies the schema of the serialized plan.

This patch adds PlanSummary, which walks the execution plan and builds up
the summary string. It uses a new interface called StepSchemaResolver to
determine the schema computed by a given step given its input schemas.
@rodesai rodesai force-pushed the clean-up-query-plan-pojo branch from 48a6322 to 91cf868 Compare December 5, 2019 00:48
@rodesai rodesai merged commit 3e49bab into confluentinc:master Dec 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants