-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposed execution plan schema #3969
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rodesai
This is looking good IMHO. I've flagged a lot of stuff in the review. With some of the things I've flagged we may choose that fixing them in our schema is not worth the work. However, I think we should at least discussion/consider each.
I also think some of these points may highlight the benefits of splitting our serialized from from our implementation. Splitting is a pain up front, but:
- gives us a more clearly defined delimitation of the model / impl
- makes us more flexible in how we map between the two
- allows the model to be more exact representation of what we support, while allowing the impl to be more flexible. A good example would be the current
JoinType
enum is the super set. The schema for the stream-table joins step should be a subset if we're being accurate.
Of course, this comes at a cost, and that's what we need to think-about/discuss. Maybe its unnecessary. I'd like to hear others thoughts on this.
Have you thought about splitting this schema file up into smaller, per-type, files? I think this will help with merging / management / reviewing down the line. (Ref: https://stackoverflow.com/questions/18376215/jsonschema-split-one-big-schema-file-into-multiple-logical-smaller-files). It might be nice to have a file structure such as:
resources/json/schema
|---- Ksql.json
|---- KsqlPlanV1.json
|---- /query
| |-----QueryPlanV1.json
| |-----/steps
| |----- StreamGroupByV1.json
| |----- ... other step types
|
|---- /cmd
|---- CreateTableV1.json
|---- ... other command types
Also, can you link to some gists with example plans again please?
We've some related types in the schema, e.g. StreamSource
, WindowedStreamSource
, TableSource
, etc. It might be useful to make use of a base Source
type in the schema and have these types extend that one type. (https://json-schema.org/understanding-json-schema/structuring.html#extending)
ksql-rest-app/src/test/resources/ksql-plan-schema/schema-final.json
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/test/resources/ksql-plan-schema/schema-final.json
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/test/resources/ksql-plan-schema/schema-final.json
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/test/resources/ksql-plan-schema/schema-final.json
Outdated
Show resolved
Hide resolved
ksql-rest-app/src/test/resources/ksql-plan-schema/schema-final.json
Outdated
Show resolved
Hide resolved
I think what I am missing from this discussion is a clear and unambiguous explanation of why we want to version this stuff. I think the assumption seems to be that we must do this, and all the discussion is about the implementation of a solution. Now, it may well be that there is no alternative but to do this, but I am not convinced yet, and maybe that's just because I wasn't around when the original discussions about this happened. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so familiar with the individual steps that I can give feedback on whether or not every single field should/needs to be there, but this seems sound to me (I think it's also easier to review than looking at Java code to define our spec).
My only concern is with trying to use JSON for this (see #3969 (comment)), I think it isn't really well equipped to handle schema evolution.
"title" : "streamAggregateV1", | ||
"required" : [ "@type", "properties", "source", "formats", "nonFuncColumnCount", "aggregations" ] | ||
}, | ||
"DefaultExecutionStepProperties" : { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we have finalized the schemas, it'd be nice to add docs to all of the objects to understand what they are used for (and setup good practice for people extending the schema). For example, what does this one represent (why is it "default" - that seems like an implementation thing, not a schema thing)? (And what is "queryContext"?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Is there a way to get such comments into the schema - that would be awesome.
Hey @agavra I've read that before and the only example there is the case of changing a key from a concatenated string to a struct. It seems to me that using concatenated string for a key is a design smell and we should change this to a struct anyway (I think that is already planned?). If the key is a generic struct it's unlikely we'd ever want to change it to anything else in the future? |
I agree that it makes sense to keep the serialized model logically decoupled from KSQL's implementation (even if it was originally derived from the implementation =P). What's proposed here is intended to be that. You pointed out a bunch of problems inline, but those things will be cleaned up. I also think having a decoupled model is orthogonal to whether or not we serialize from jackson-annotated types. You can have a decoupled or tightly-coupled model that uses jackson-annotated types or generated avro or protobuf. |
I'm going to say my final words on the approach taken here, as I don't want to hold up the work. I continue to have serious concerns about the general approach taken here, as discussed in other places, we're effectively defining a very broad protocol that is strongly coupled to our internal implementation, and I worry that is going to reduce our ability to evolve the project effectively going ahead. I suspect this will trip up us in the future, but maybe you will prove me wrong. We'll see ;) Even though I don't agree with the approach taken, I appreciate that others want to press ahead with it, so I will respect that, and I won't block anything related to this work. |
Hey @purplefox, what do you propose as an alternative? |
I'm closing this out. We should be able to enable the feature with the current schema. There's a few issues in review that have not been addressed yet, but can be fixed even after plans are turned on. I've filed github issues for those and linked to them inline. |
Description
This patch isn't meant to be merged. It's a place to review/finalize the proposed schema for
the KSQL physical plan.
Join example: https://gist.github.com/rodesai/19ff1591a5c0d17cccd2936da4d7704d
Agg example: https://gist.github.com/rodesai/3a032b7ec8388eac230de4fcbb6ecde2
Project example: https://gist.github.com/rodesai/806613157f115f90ac567d31cc11f162