-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up query plan schema #4002
Conversation
f2ffc90
to
9d23d52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code LGTM. The first part (about flattening) makes total sense to me.
This patch also cleans up the summary form the serialized plan by building it from
the execution plan, rather than by SchemaKStream. This is better because it gives
us a more accurate summary (previously we would skip some steps), simplifies
SchemaKStream, and simplifies the schema of the serialized plan.
This part introduces some coupling that seems a little fragile - I think it's an improvement (so giving the +1) but it would be nice to have some way of enforcing tighter coupling. See my inline #4002 (comment).
import java.util.Objects; | ||
import java.util.stream.Collectors; | ||
|
||
public class PlanSummary { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add javadocs to these new classes? specifically, it would also be nice to document any compatibility requirements (is it okay to change a summary? is it used by code other than tests?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's returned in the API as a string. We should change this to a structured object (I guess now is a good time), but as such we don't make any guarantees about its contents.
Objects.requireNonNull(config, "config"), | ||
Objects.requireNonNull(metaStore, "metaStore") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unnecessary null checks (the StepSchemaResolver
does this in the constructor)
import java.util.Optional; | ||
|
||
/** | ||
* An interface for computing the schema produced by an execution step, given the schema(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of date comment - no longer an interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant an api not a literal interface. I'll rephrase.
*/ | ||
public final class StepSchemaResolver { | ||
private static final HandlerMaps.ClassHandlerMapR2 | ||
<ExecutionStep, StepSchemaResolver, LogicalSchema, LogicalSchema> HANDLERS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the original code, this mapping was tightly coupled with the SchemaKStream
and now it's defined by this. The approach in this PR means that if we change how any of these steps generate the schema, we need to remember to change this class to (i.e. for structured keys, maybe we want a StreamGroupBy
to change the key schema of the output - we'd need to update this class too).
Is there any way to make the coupling tighter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agavra I think this is fair. This class is just glue between the step types and code that computes the schema. We can make it tighter by having the actual schema computation code in the step builders. The supposed advantage of having it looser is that the schema computation is decoupled from the underlying implementation of the steps (e.g. kstreams). Maybe not worth it though. I'll probably change it in a follow up. To be clear, I think it still makes sense to have this class - just that it would work by proxying calls to the step builders.
9d23d52
to
48a6322
Compare
This patch flattens PhysicalPlan into QueryPlan in the plan schema. QueryPlan now stores the properties it needs from PhysicalPlan, rather than holding a reference to PhysicalPlan. This gives us a flatter, simpler, plan schema
This patch changes up how ksql builds the plan summary string that gets returned by our API endpoint. The summary is now built from the execution plan, rather than by SchemaKStream. This is better because it gives us a more accurate plan (previously we would skip some steps), simplified SchemaKStream, and simplifies the schema of the serialized plan. This patch adds PlanSummary, which walks the execution plan and builds up the summary string. It uses a new interface called StepSchemaResolver to determine the schema computed by a given step given its input schemas.
48a6322
to
91cf868
Compare
Description
The goal here is to clean up the query plan schema as part of the work to
get to the plan schema proposed in #3969
Firstly, we flatten PhysicalPlan into QueryPlan in the plan schema. QueryPlan now
stores the properties it needs from PhysicalPlan, rather than holding a reference to
PhysicalPlan. This gives us a flatter, simpler, plan schema
This patch also cleans up the summary form the serialized plan by building it from
the execution plan, rather than by SchemaKStream. This is better because it gives
us a more accurate summary (previously we would skip some steps), simplifies
SchemaKStream, and simplifies the schema of the serialized plan.
The summary is built by a new class called PlanSummary, which walks the execution
plan and builds up the summary string. It uses a new interface called StepSchemaResolver
to determine the schema computed by a given step given its input schemas.