-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
execbuilder: plan directly to distsql specs #47473
Comments
@yuzefovich as we discussed during the meeting, it would be good if as part of this milestone you could also think about how to make it easier for other engineers/external contributors to add spec creation so that we can maybe distribute and parallelize this work. Maybe this needs to wait until the creation of a couple of the more difficult specs is done. |
I reran
This comes out to 1.14% improvement. |
50252: opt: change GeoLookupJoin to InvertedLookupJoin r=rytaft a=rytaft This commit converts the `GeoLookupJoin` operator in the optimizer into a more general operator, `InvertedLookupJoin`. This new operator maps directly to the `invertedJoiner` DistSQL processor. In the future, the `InvertedLookupJoin` operator can also be used for lookup joins on inverted JSON and array indexes in addition to geospatial indexes. This commit also adds a new structure, `geoDatumToInvertedExpr`, which implements the `DatumToInvertedExpr` interface for geospatial data types. This will enable the optimized plan to be easily converted to a DistSQL plan for execution by the `invertedJoiner`. Release note: None 50450: sql: add support for hash and merge joins in the new factory r=yuzefovich a=yuzefovich **sql: minor cleanup of joiner planning** `joinNode.mergeJoinOrdering` is now set to non-zero length by the optimizer only when we can use a merge join (meaning that number of equality columns is non-zero and equals the length of the ordering we have). This allows us to slightly simplify the setup up of the merge joiners. Additionally, this commit switching to using `[]exec.NodeColumnOrdinal` instead of `int` for equality columns in `joinPredicate` which allows us to remove one conversion step when planning hash joiners. Also we introduce a small helper that will be reused by the follow-up work. Release note: None **sql: add support for hash and merge joins in the new factory** This commit adds implementation of `ConstructHashJoin` and `ConstructMergeJoin` in the new factory by mostly refactoring and reusing already existing code in the physical planner. Notably, interleaved joins are not supported yet. Fixes: #50291. Addresses: #47473. Release note: None 50646: build: update instructions for updating dependencies r=jbowens a=otan Release note: None Co-authored-by: Rebecca Taft <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Oliver Tan <[email protected]>
50457: geo: coerce invalid geography coordinates into geometry r=otan a=Arun4rangan This commit resolves: #50219 Previously geometry did not convert out of range lat-lngs into correct range. This commit mimics the behavior found in PostGIS. It converts out of range lat-lngs into correct range. Release note (sql change): Geometry now coerce invalid geography coordinates into correct geometry. 50560: sql: add support for aggregations and values in the new factory r=yuzefovich a=yuzefovich **sql: remove some unnecessary things around aggregations** Release note: None **sql: add support for aggregations in the new factory** This commit implements `ConstructGroupBy` and `ConstructScalarGroupBy` methods in the new factory by populating the specs upfront and reusing DistSQLPlanner to do the actual planning. Fixes: #50290. Release note: None **sql: minor cleanup of planNodeToRowSource** This commit cleans up `planNodeToRowSource` to be properly closed (similar to other processors). Release note: None **sql: implement ConstructValues in the new factory** This commit adds implementation of `ConstructValues` in the new factory. In some cases, a valuesNode is the only way to "construct values" - when the node must be wrapped (see createPhysPlanForValuesNode for more details). In other cases, a valuesNode is used to construct the values processor spec. The latter usage is refactored to avoid valuesNode creation. This decision also prompts us to add a separate "side" list of `planNode`s that are part of the physical plan and need to be closed when the whole plan is closed. This is done by introducing a utility wrapper around `PhysicalPlan`. This approach was chosen after considering an alternative in which `planNodeToRowSource` adapter would be the one closing the `planNode` it is wrapping because `planNode.Close` contract currently prohibits the execution from closing any of the `planNode`s, and changined that would be quite invasive at this point. We might reevaluate this decision later, once we've made more progress on the distsql spec work. Addresses: #47473. Release note: None **logictest: add spec-planning configs to the default ones** We have now implemented noticeable chunk of methods in the new factory, so I think it makes to run all logic tests with the spec-planning configs by default. The only logic test that is currently skipped is `interleaved_join` because the new factory doesn't plan interleaved joins yet. Release note: None **sql: enhance unsupported error message in the new factory** Release note: None **sql: support simple projection on top of planNode in the new factory** The new factory still constructs some of the `planNode`s (for example, explain variants), and we should be able to handle simple projections on top of those. This is handled by reusing the logic from the old factory. The issue was hidden by the fallback to the old factory since EXPLAIN statements don't have "SELECT" statement tag. Release note: None **sql: implement ConstructLimit in the new factory** This commit additionally removes the requirement that `limitExpr` and `offsetExpr` must be distributable in order for the whole plan to be distributable in the old path because those expressions are evaluated during the physical planning, locally. Release note: None Co-authored-by: Arun Ranganathan <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]>
50909: sql: add sort and virtual scan support in the new factory r=yuzefovich a=yuzefovich **sql: implement ConstructSort in the new factory** Release note: None **sql: add support of virtual scans in the new factory** The support is added by sharing the same code with the old factory that construct `delayedNode` with the rest of necessary methods already being implemented in the new factory. That delayedNode is wrapped into the physical plan via a callback. Note that with the new factory it is possible to have a distributed plan that contains an execinfra.LocalProcessor. This required to make a change that distsql.LocalState, regardless of the plan distribution, has always LocalProcs set when setting up a flow on the gateway. Addresses: #47473. Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]>
51305: geo,geoindex: use bounding box for geography coverings that are bad r=sumeerbhola a=sumeerbhola This change uses a covererInterface implementation for geography that notices when a covering is using top-level cells of all faces and in that case uses the bounding box to compute the covering. Also changed the bounding box calculation for geography shapes to use only the points and not first construct s2.Regions. The latter causes marginally bad shapes to continue to have bad coverings since the bounding box also covers the whole earth. Release note: None 51882: roachpb: panic when comparing a Lease to a non-lease r=andreimatei a=andreimatei Release note: None 52146: sql: remove local execution of projectSetNode and implement ConstructProjectSet in the new factory r=yuzefovich a=yuzefovich Depends on #52108. **sql: remove local execution of projectSetNode** We have project set processor which is always planned for `projectSetNode`, so this commit removes the dead code of its local execution. Additionally, it removes some unused fields and cleans up cancellation check of the processor. Release note: None **sql: implement ConstructProjectSet in the new factory** Addresses: #47473. Release note: None 52320: kvserver: enable merges in kvnemesis r=aayushshah15 a=aayushshah15 We had merges disabled because of the bugs tracked in #44878, but those have since been fixed by #46085 and #50265. Release note: None Co-authored-by: sumeerbhola <[email protected]> Co-authored-by: Andrei Matei <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Aayush Shah <[email protected]>
48842: sql: fix portals after exhausting rows r=yuzefovich a=yuzefovich Previously, we would erroneously restart the execution from the very beginning of empty, unclosed portals after they have been fully exhausted when we should be returning no rows or an error in such scenarios. This is now fixed by tracking whether a portal is exhausted or not and intercepting the calls to `execStmt` when the conn executor state machine is in an open state. Note that the current solution has known deviations from Postgres: - when attempting to execute portals of statements that don't return row sets, on the second and consequent attempt PG returns an error while we are silently doing nothing (meaning we do not run the statement at all and return 0 rows) - we incorrectly populate "command tag" field of pgwire messages of some rows-returning statements after the portal suspension (for example, a suspended UPDATE RETURNING in PG will return the total count of "rows affected" while we will return the count since the last suspension). These deviations are deemed acceptable since this commit fixes a much worse problem - re-executing an exhausted portal (which could be a mutation meaning, previously, we could have executed a mutation multiple times). The reasons for why this commit does not address these deviations are: - Postgres has a concept of "portal strategy" (see https://github.com/postgres/postgres/blob/2f9661311b83dc481fc19f6e3bda015392010a40/src/include/utils/portal.h#L89). - Postgres has a concept of "command" type (these are things like SELECTs, UPDATEs, INSERTs, etc, see https://github.com/postgres/postgres/blob/1aac32df89eb19949050f6f27c268122833ad036/src/include/nodes/nodes.h#L672). CRDB doesn't have these concepts, and without them any attempt to simulate Postgres results in a very error-prone and brittle code. Fixes: #48448. Release note (bug fix): Previously, CockroachDB would erroneously restart the execution of empty, unclosed portals after they have been fully exhausted, and this has been fixed. 52098: sql: better distribute distinct processors r=yuzefovich a=yuzefovich **sql: better distribute distinct processors** The distinct processors are planned in two stages - first, a "local" stage is planned on the same nodes as the previous stage, then, a "global" stage is added. Previously, the global stage would be planned on the gateway, and this commit changes that to make it distributed - by planning "global" distinct processors on the same nodes as the "local" ones and connecting them via a hash router hashing the distinct columns. Release note: None **sql: implement ConstructDistinct in the new factory** Addresses: #47473. Release note: None 52358: engine: set the MVCC timestamp on reads due to historical intents r=ajwerner a=ajwerner Before this commit, we'd return a zero-value MVCC timestamp when reading an intent from the intent history. This was problematic because elsewhere in the code we assume that we will always get a non-zero MVCC timestamp when a read returns a value. This is especially bizarre given that a read of the latest intent will return its write timestamp. The semantics here are such that we'll return the timestamp of the MVCC metadata for the row. I think this is the most reasonable thing to do as that timestamp ought to reflect the timestamp we return when we return the current intent and furthermore is the only timestamp we really have around. We could return the transactions current read or write timestamp but those both seem like worse choices. It's also worth noting that in the case where we need to read a non-zero value, we don't really care what that value is and the fact that we are reading this intent itself is somewhat suspect. That being said, I still think we should make this change in addition to any change made to prevent the code referenced in #49266 from needing it. Fixes #49266. Informs #50102. Release note: None 52384: sql: properly reset extraTxnState in COPY r=ajwerner a=ajwerner Apparently we support some sort of COPY protocol that I know nothing about. We allow operations in that protocol in the open state and in the noTxn state in the connExecutor. In the noTxn state we let the `copyMachine` handle its transaction lifecycle all on its own. In that case, we also hand have the `connExecutor` in a fresh state when calling `execCopyIn()`. During the execution of `COPY`, it seems like sometime we can pick up table descriptor leases. In the noTxn case we'd like to drop those leases and generally leave the executor in the fresh state in which it was handed to us. To deal with that, we call `resetExtraTxnState` before returning in the happy noTxn case. Fixes #52065. Release note (bug fix): Fixed a bug when using the COPY protocol which could prevent schema changes for up to 5 minutes. Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Andrew Werner <[email protected]>
52163: sql: implement ConstructOpaque in the new factory r=yuzefovich a=yuzefovich Addresses: #47473. Release note: None 52408: sql/catalog/lease: attempt to fix a flakey test r=lucy-zhang a=ajwerner I think this test was going to fail. Fixes #52385. Release note: None 52486: sqlbase,catalogkv: move more kv interactions to catalogkv r=ajwerner a=ajwerner At this point the only remaining kv interactions in sqlbase are during type and table interactions though there are also usages through the protoGetter. This is a minor change in the work to pick apart sqlbase. Release note: None Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Andrew Werner <[email protected]>
Informs cockroachdb#47473 Release note: None
Informs cockroachdb#47473 Release note: None
Informs cockroachdb#47473 Release note: None
Informs cockroachdb#47473 Release note: None
Informs cockroachdb#47473 Release note: None
72653: sql: implement ConstructZigzagJoin in distsql_spec_exec_factory r=cucaroach a=cucaroach Informs #47473 Release note: None 72825: sql: handle no cluster_inflight_traces table on tenants r=adityamaru a=aliher1911 Previously query for traces under tenant will fail with NPE. This was happening because trace collector is not available and virtual table was not aware of the case. This patch adds handling of the table on tenant sql pods. Release note: None Fixes #72564 72871: backupccl: don't include tenants in non-cluster backups r=dt a=dt Release note (bug fix): System tenant backups of individual tables and databases no longer include tenants as well. 72887: roachprod: remove ClusterImpl interface r=RaduBerinde a=RaduBerinde The ClusterImpl interface was useful when roachprod had some (very limited) support for Cassandra, but that has been removed. We're left with an unnecessary indirection that lacks documentation. The code doesn't follow any layering, with `Cockroach` working directly with `SyncedCluster` internals and plenty of cockroach-specific code in `SyncedCluster`. This commit removes this interface and improves the documentation of the methods that are simplified. Release note: None 72894: kv: use version-less key in rditer.KeyRange r=nvanbenschoten a=nvanbenschoten This commit replaces the use of `storage.MVCCKey` bounds with `roachpb.Key` bounds in `rditer.KeyRange`. It addresses an existing TODO and helps clarify things in #72121 slightly, as that PR is going to add a new field to `MVCCKey`. 72895: storage: remove decodeMVCCKey r=nvanbenschoten a=nvanbenschoten The function was identical to `DecodeMVCCKey`, which is defined a few lines up in the file. 72902: storage: delete deprecated PutProto function r=nvanbenschoten a=nvanbenschoten This function has been deprecated since 5e5eaa5. It was only used in one test, so this commit removes it. 72907: build/builder: install awscli v2 r=rail a=erikgrinaker `roachprod` requires awscli v2, but the `builder` image included v1. Release note: None Co-authored-by: Tommy Reilly <[email protected]> Co-authored-by: Oleg Afanasyev <[email protected]> Co-authored-by: David Taylor <[email protected]> Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Erik Grinaker <[email protected]>
Remaining big items/questions
planNode
s that are wrapped into the physical plans?sql: add support for aggregations and values in the new factory #50560 (review) goes into some details with the difficulties, but the TL;DR is that some
planNode
s do not have equivalent processor specs, so we need to have a way to "embed" them into the physical plans (this is done viawrapPlan
method which probably doesn't need any modifications) and clean up after them once the query is done (this is the part needs figuring out). With the old factory, that cleanup is done byplanTop.close
method which closes all trees in the query. However, with the new factory we don't have full trees - we might have a collection of randomplanNode
s that might not be connected between each other.planNode
s in the physical plan? sql: figure out how to handle subqueries by distSQLSpecExecFactory #51095We probably will rely mostly on existing logic tests but also we need to add a differential testing harness before we're comfortable with removing the old factory. This is tracked in sql: implement differential testing of physical plans produced by the old and new factories #50610.
Here is the list of all unimplemented methods that was added in #49348 that introduced the new factory (the ones that are checked either have been already merged or have a working work-in-progress commits). All methods are divided into two categories - one is methods in which we can construct the spec directly because there is a corresponding processor core, and another are
planNode
s that don't have an equivalent processor and might need to be wrapped (it is possible that the second category might be actually reduced to just a handful of items that require figuring out the necessary plumbing for handling the wrappedplanNode
s and all of the methods below will "just work" 🤞 ).Can construct processor specs directly:
Need to have wrapped
planNode
s:Miscellaneous items:
Background info
Currently, the last stage in the optimizer is to use the
execbuilder.Builder
to construct aplanNode
tree from amemo
expression tree. ThisplanNode
tree is subsequently converted to a collection ofProcessorSpecs
in the distsql physical planner:cockroach/pkg/sql/distsql_physical_planner.go
Line 2293 in 1e9f570
We need to get rid of this redundant conversion and create
ProcessorSpecs
directly.For 20.2, our focus should be on adding an off-by-default option to remove this redundant planning phase for the
kv --read-percent=100
workload. I envision this as creating a new implementation of theexec.Factory
implementation that is swapped in when a cluster setting is set.Supporting
kv --read-percent=100
is mostly a question of implementingConstructScan
and nothing else. However, this will probably teach us a lot of what we need to do to move over and how to do it. My hope is that once we get theTableReaderSpec
s created directly, we'll have a concrete gameplan on moving over the rest of the processor spec creation.Epic: CRDB-79
Jira issue: CRDB-4407
The text was updated successfully, but these errors were encountered: