proposed execution plan schema #3969

rodesai · 2019-11-25T04:42:07Z

Description

This patch isn't meant to be merged. It's a place to review/finalize the proposed schema for
the KSQL physical plan.

Join example: https://gist.github.com/rodesai/19ff1591a5c0d17cccd2936da4d7704d
Agg example: https://gist.github.com/rodesai/3a032b7ec8388eac230de4fcbb6ecde2
Project example: https://gist.github.com/rodesai/806613157f115f90ac567d31cc11f162

ksql-rest-app/src/test/resources/ksql-plan-schema/schema-final.json

big-andy-coates

Thanks @rodesai

This is looking good IMHO. I've flagged a lot of stuff in the review. With some of the things I've flagged we may choose that fixing them in our schema is not worth the work. However, I think we should at least discussion/consider each.

I also think some of these points may highlight the benefits of splitting our serialized from from our implementation. Splitting is a pain up front, but:

gives us a more clearly defined delimitation of the model / impl
makes us more flexible in how we map between the two
allows the model to be more exact representation of what we support, while allowing the impl to be more flexible. A good example would be the current JoinType enum is the super set. The schema for the stream-table joins step should be a subset if we're being accurate.

Of course, this comes at a cost, and that's what we need to think-about/discuss. Maybe its unnecessary. I'd like to hear others thoughts on this.

Have you thought about splitting this schema file up into smaller, per-type, files? I think this will help with merging / management / reviewing down the line. (Ref: https://stackoverflow.com/questions/18376215/jsonschema-split-one-big-schema-file-into-multiple-logical-smaller-files). It might be nice to have a file structure such as:

 resources/json/schema
         |---- Ksql.json
         |---- KsqlPlanV1.json
         |---- /query
         |            |-----QueryPlanV1.json
         |            |-----/steps
         |                       |----- StreamGroupByV1.json
         |                       |----- ... other step types
         |
         |---- /cmd
                      |---- CreateTableV1.json
                      |---- ... other command types

Also, can you link to some gists with example plans again please?

We've some related types in the schema, e.g. StreamSource, WindowedStreamSource, TableSource, etc. It might be useful to make use of a base Source type in the schema and have these types extend that one type. (https://json-schema.org/understanding-json-schema/structuring.html#extending)

ksql-rest-app/src/test/resources/ksql-plan-schema/schema-final.json

purplefox · 2019-11-26T12:01:26Z

I think what I am missing from this discussion is a clear and unambiguous explanation of why we want to version this stuff. I think the assumption seems to be that we must do this, and all the discussion is about the implementation of a solution. Now, it may well be that there is no alternative but to do this, but I am not convinced yet, and maybe that's just because I wasn't around when the original discussions about this happened.
@rodesai would it possible to put together a short doc, maybe just a few paragraphs, not looking for anything formal like a one pager, just explaining the motivation for this, and including some concrete examples? (E.g. involving topic partitioning and whatnot). I think this would also help others coming to this anew.

agavra · 2019-11-26T16:18:06Z

@purplefox https://github.com/confluentinc/ksql/blob/master/design-proposals/klip-6-execution-plans.md may help

agavra

I'm not so familiar with the individual steps that I can give feedback on whether or not every single field should/needs to be there, but this seems sound to me (I think it's also easier to review than looking at Java code to define our spec).

My only concern is with trying to use JSON for this (see #3969 (comment)), I think it isn't really well equipped to handle schema evolution.

ksql-rest-app/src/test/resources/ksql-plan-schema/schema-final.json

agavra · 2019-11-26T16:46:32Z

ksql-rest-app/src/test/resources/ksql-plan-schema/schema-final.json

+      "title" : "streamAggregateV1",
+      "required" : [ "@type", "properties", "source", "formats", "nonFuncColumnCount", "aggregations" ]
+    },
+    "DefaultExecutionStepProperties" : {


when we have finalized the schemas, it'd be nice to add docs to all of the objects to understand what they are used for (and setup good practice for people extending the schema). For example, what does this one represent (why is it "default" - that seems like an implementation thing, not a schema thing)? (And what is "queryContext"?)

+1

Is there a way to get such comments into the schema - that would be awesome.

purplefox · 2019-11-26T19:44:46Z

@purplefox https://github.com/confluentinc/ksql/blob/master/design-proposals/klip-6-execution-plans.md may help

Hey @agavra I've read that before and the only example there is the case of changing a key from a concatenated string to a struct. It seems to me that using concatenated string for a key is a design smell and we should change this to a struct anyway (I think that is already planned?). If the key is a generic struct it's unlikely we'd ever want to change it to anything else in the future?

rodesai · 2019-11-27T04:37:54Z

I also think some of these points may highlight the benefits of splitting our serialized from from our implementation. Splitting is a pain up front, but:

I agree that it makes sense to keep the serialized model logically decoupled from KSQL's implementation (even if it was originally derived from the implementation =P). What's proposed here is intended to be that. You pointed out a bunch of problems inline, but those things will be cleaned up.

I also think having a decoupled model is orthogonal to whether or not we serialize from jackson-annotated types. You can have a decoupled or tightly-coupled model that uses jackson-annotated types or generated avro or protobuf.

purplefox · 2019-12-06T14:31:56Z

I'm going to say my final words on the approach taken here, as I don't want to hold up the work.

I continue to have serious concerns about the general approach taken here, as discussed in other places, we're effectively defining a very broad protocol that is strongly coupled to our internal implementation, and I worry that is going to reduce our ability to evolve the project effectively going ahead. I suspect this will trip up us in the future, but maybe you will prove me wrong. We'll see ;)

Even though I don't agree with the approach taken, I appreciate that others want to press ahead with it, so I will respect that, and I won't block anything related to this work.

big-andy-coates · 2019-12-16T14:31:09Z

I'm going to say my final words on the approach taken here, as I don't want to hold up the work.

I continue to have serious concerns about the general approach taken here, as discussed in other places, we're effectively defining a very broad protocol that is strongly coupled to our internal implementation, and I worry that is going to reduce our ability to evolve the project effectively going ahead. I suspect this will trip up us in the future, but maybe you will prove me wrong. We'll see ;)

Even though I don't agree with the approach taken, I appreciate that others want to press ahead with it, so I will respect that, and I won't block anything related to this work.

Hey @purplefox, what do you propose as an alternative?

rodesai · 2020-01-06T00:22:56Z

I'm closing this out. We should be able to enable the feature with the current schema. There's a few issues in review that have not been addressed yet, but can be fixed even after plans are turned on. I've filed github issues for those and linked to them inline.

feat: proposed final schema

04ddec5

rodesai requested a review from a team as a code owner November 25, 2019 04:42

rodesai commented Nov 25, 2019

View reviewed changes

ksql-rest-app/src/test/resources/ksql-plan-schema/schema-final.json Show resolved Hide resolved

This was referenced Nov 25, 2019

refactor: clean up format pojos in serialized model #3970

Merged

refactor: clean up timestamp extraction policy from physical plan #3936

Merged

big-andy-coates reviewed Nov 26, 2019

View reviewed changes

big-andy-coates requested review from purplefox and agavra November 26, 2019 11:30

agavra reviewed Nov 26, 2019

View reviewed changes

latest version

50d9942

rodesai closed this Jan 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposed execution plan schema #3969

proposed execution plan schema #3969

rodesai commented Nov 25, 2019 •

edited

Loading

big-andy-coates left a comment

purplefox commented Nov 26, 2019

agavra commented Nov 26, 2019

agavra left a comment •

edited

Loading

agavra Nov 26, 2019

big-andy-coates Nov 28, 2019

purplefox commented Nov 26, 2019

rodesai commented Nov 27, 2019 •

edited

Loading

purplefox commented Dec 6, 2019

big-andy-coates commented Dec 16, 2019

rodesai commented Jan 6, 2020

proposed execution plan schema #3969

proposed execution plan schema #3969

Conversation

rodesai commented Nov 25, 2019 • edited Loading

Description

big-andy-coates left a comment

Choose a reason for hiding this comment

purplefox commented Nov 26, 2019

agavra commented Nov 26, 2019

agavra left a comment • edited Loading

Choose a reason for hiding this comment

agavra Nov 26, 2019

Choose a reason for hiding this comment

big-andy-coates Nov 28, 2019

Choose a reason for hiding this comment

purplefox commented Nov 26, 2019

rodesai commented Nov 27, 2019 • edited Loading

purplefox commented Dec 6, 2019

big-andy-coates commented Dec 16, 2019

rodesai commented Jan 6, 2020

rodesai commented Nov 25, 2019 •

edited

Loading

agavra left a comment •

edited

Loading

rodesai commented Nov 27, 2019 •

edited

Loading