-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: query decorrelation error under recursive CTE in asyncpg #71908
Comments
Hello, I am Blathers. I am here to help you get the issue triaged. Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here. I have CC'd a few people who may be able to assist you:
If we have not gotten back to your issue within a few business days, you can try the following:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
This comment was marked as off-topic.
This comment was marked as off-topic.
the "real" issue is that asyncpg is trying to execute
Where |
I still get it without that line. |
cc @ZhouXing19 i'm surprised this wasn't caught in the tests here: cockroach/pkg/cmd/roachtest/tests/asyncpg.go Line 102 in 041aaff
are we sure we're hitting cockroach with those queries? we should also fix
|
Fixes cockroachdb#80169 Fixes cockroachdb#71908 Previously, some queries which follow the pattern: `SELECT CASE WHEN <cond> THEN <subquery> END FROM <table>` would error out with a `could not decorrelate subquery` error. The reason is that the subquery should only run in cases where `<cond>` is true or when it is leak-proof (when running the subquery when `<cond>` is false couldn't have side-effects, e.g. erroring out due to overflow during a CAST). When the subquery is not leak-proof, it cannot be pulled above the CASE expression into an apply join (which is a necessary first step in executing a subquery expression). To address this, this patch introduces a new normalization rule which attempts to push the WHEN clause condition into the THEN clause subquery and remove the CASE expression entirely (replace the CASE with the subquery). The preconditions for this normalization are: 1. There is a single WHEN clause. 2. There is an ELSE NULL clause (either explicitly specified or implicit). 3. The WHEN clause condition is not volatile (for example, the result is the same no matter how many times it is evaluated). 4. The WHEN clause condition does not cause any side-effects, like writing rows to a table. 5. The relational expressions in the THEN clause are only of the following types: (Select, Project, Limit, Offset, RecursiveCTE, InnerJoin, Scan, WithScan, ScalarGroupBy, Window). 6. There are no aggregate functions which produce a non-null value when the input is empty, such as COUNT. 7. There are no projected expressions above an aggregation in the subquery operation tree. If these conditions are met, a new Select is placed above each Scan or WithScan operation using the WHEN condition as a filter and the CASE expression is replaced by the subquery. Release note (sql change): This patch is adding support for some queries which asyncpg generates internally, which previously would error out with the message, "could not decorrelate subquery".
Any updates on this? @knz Currently a blocker for ARRAY support in Piccolo ORM. Example query where we get SELECT "band"."name",
ARRAY(
SELECT
"inner_genre"."name"
FROM
"genre_to_band"
JOIN "band" "inner_band" ON (
"genre_to_band"."band" = "inner_band"."id"
)
JOIN "genre" "inner_genre" ON (
"genre_to_band"."genre" = "inner_genre"."id"
)
WHERE "genre_to_band"."band" = "band"."id"
) AS "genres"
FROM band |
I think the title of this issue (71908) should be modified to refer to the specific kind of correlation mentioned by the OP. @gnat your specific query is using a different structure, and should be filed as a different issue. Thanks |
We'll be working on subquery decorrelation (including this issue specifically) in the coming months, but it likely won't be in any release until v23.1 or later |
That's great. We'll move forward with a Cockroach release for Piccolo ORM relatively soon- if we cannot figure something out ourselves, we'll just flag the affected features as in-progress and keep an eye out here. |
* Cockroach config for tests. * Added feedback from #607 * New CockroachDB engine. * No more extension=[] ! * Running tests as postgres. * Initial build for CockroachDB * Running tests. * Run workflow. * Naming consistency. Making roughly equivalent to Postgres. * Timestamp bug. * DDL for Cockroach. for_engines decorator. * Progress. * Added database prep for cluster settings. * Tests passing. * Passing migration type checks. * Cockroach ALTER COLUMN TYPE moved outside of transaction until #49351 is solved. * New test helpers. * Fixtures. * Cockroach specific stuff. * Array tests. * Column tests. * JSON tests. * M2M tables. as_list is effected by cockroachdb/cockroach#71908 * Skip test for numeric. * Pool test. * Transaction tests. * Related readable tests. * Save tests. * To dict tests. * JOIN tests. * Output tests. * Raw tests. * Repr tests. * Select tests. * "As string" tests. * Cockroach having trouble with one specific operator. * Skipping until CRDB issue is resolved. Should work though! * Minor updates. * Test dump load. Other CRDB things. * Migration manager tests. * ALTER tests. * SQL Generator tests. * Serialization tests. * Test helpers. * Cockroach Specific stuff. * Numerics all the same for CRDB. * Cockroach specific stuff. * Cleanup. * Skipping timestamp tests. * DDL resolution stuff for Cockroach. * WIP. * Cohesion and special test tables for Cockroach. * WIP * Special table for Cockroach. * Removed hack. * Should pass now. * Removed some debug stuff. * LGTM stuff. * Feedback. * Feedback. * Feedback. * More readable "postgres" or "cockroach. * Added Cockroach overrides for COLUMN_TYPE_MAP, COLUMN_DEFAULT_PARSER * Restore SmallInt functionality for Cockroach. * Restored type() comparison for reflection / generators. * Updated to latest Cockroach release. * Cleanup. * Cleanup. * Build stuff. * Cockroach Serial now uses unique_rowid() as default value always instead of DEFAULT. * Removed first_id() test helper. * Time Travel queries using AS OF SYSTEM TIME * Added documentation. * Small docs fix. * Test update. * Refactored out unnecessary code because we use unique_rowid() as default for Cockroach Serial and BigSerial. * BIGINT instead of BIGSERIAL for Cockroach. * LGTM stuff * LGTM... * Made compatible with older Python versions. * Get Cockroach passing without Python 3.9 stuff. * Removed unused variable. * Fixed test for SQLite. * Re-add Python 3.7 for CRDB + new CRDB version. * Consolidated cockroach_skip() into engines_skip() everywhere. * Re-add SQLite DDL. * Moved back to original engine DDL selector. * Reverted mistake in test suite. * Remove migration that should not be available. * Moving back to postgres_only for specific test. Probably cannot detect engine type at this stage. * postgres_only() to engine_only() * Set sane default for JSONB because we override column_type in JSON for some engines. * Ran isort. * Black being obnoxious, part 1. * Black being obnoxious Part 2. * Flake8 * Black * Flake8 * Flake8 * Flake8 * isort * Flake8 * Flake8 * Added alternate test names for duplicates to remove F811 * Added alternative name. * mypy * mypy * mypy * mypy * mypy * mypy * mypy * mypy * mypy * Cockroach: Now testing latest stable and upcoming release. * Cockroach: Testing only v22.2 * Cockroach version documentation. * Engine detection now based on type because CockroachEngine inherits PostgresEngine. * postgres_only is now engines_only("postgres", "cockroach") because CockroachEngine was always tested as a child class. * Updated to Cockroach 22.2 beta. * isort. * assertAlmostEqual for Decimal test. Docs update. * Linter * Cockroach 22.2 Beta 2. * Added notes about special Column types for Cockroach * version pin xpresso, so integration tests pass * add small section about running cockroach in contributing docs * fix typo Co-authored-by: Daniel Townsend <[email protected]>
Previously, the optimizer would error in rare cases when it was unable to hoist correlated subqueries into apply-joins. Now, scalar, correlated subqueries that aren't hoisted are executed successfully. There is remaining work to apply the same method in this commit to `EXISTS` and `<op> ANY` subqueries. Hoisting correlated subqueries is not possible when a conditional expression, like a `CASE`, wraps a subquery that is not leak-proof. One of the effects of hoisting a subquery is that the subquery will be unconditionally evaluated. For leak-proof subqueries, the worst case is that unnecessary computation is performed. For non-leak-proof subqueries, errors could originate from the subquery when it should have never been evaluated because the corresponding conditional expression was never true. So, in order to support these cases, we must be able to execute a correlated subquery. A correlated subquery can be thought of as a relational expression with parameters that need to be filled in with constant value arguments for each invocation. It is essentially a user-defined function with a single statement in the function body. So, the `tree.RoutineExpr` machinery that powers UDFs is easily repurposed to facilitate evaluation of correlated subqueries. Fixes cockroachdb#71908 Fixes cockroachdb#73573 Fixes cockroachdb#80169 Release note (sql change): Some queries which previously resulted in the error "could not decorrelate subquery" now succeed.
Previously, the optimizer would error in rare cases when it was unable to hoist correlated subqueries into apply-joins. Now, scalar, correlated subqueries that aren't hoisted are executed successfully. There is remaining work to apply the same method in this commit to `EXISTS` and `<op> ANY` subqueries. Hoisting correlated subqueries is not possible when a conditional expression, like a `CASE`, wraps a subquery that is not leak-proof. One of the effects of hoisting a subquery is that the subquery will be unconditionally evaluated. For leak-proof subqueries, the worst case is that unnecessary computation is performed. For non-leak-proof subqueries, errors could originate from the subquery when it should have never been evaluated because the corresponding conditional expression was never true. So, in order to support these cases, we must be able to execute a correlated subquery. A correlated subquery can be thought of as a relational expression with parameters that need to be filled in with constant value arguments for each invocation. It is essentially a user-defined function with a single statement in the function body. So, the `tree.RoutineExpr` machinery that powers UDFs is easily repurposed to facilitate evaluation of correlated subqueries. Fixes cockroachdb#71908 Fixes cockroachdb#73573 Fixes cockroachdb#80169 Release note (sql change): Some queries which previously resulted in the error "could not decorrelate subquery" now succeed.
94670: metrics: add tenant name to _status/vars output r=knz a=dhartunian This commit adds a `tenant` prometheus label to each metrics output via the `_status/vars` HTTP endpoint. The label is populated with either "system" or the name of the tenant generating the metric. When queried from the system tenant's HTTP port or handler, the result will now also contain metrics from all in-process tenants on the same node. When initializing a tenant, an optional "parent" metrics recorder is passed down allowing the tenant's recorder to be registered with the parent. When we query the parent it iterates over all child recorders (much like we already do for stores) and outputs all of their metrics as well. Example: ``` sql_txn_commit_count{tenant="system"} 0 sql_txn_commit_count{tenant="demo-tenant"} 0 ``` Resolves: #94663 Epic: CRDB-18798 Release note (ops change): Metrics output via `_status/vars` now contain `tenant` labels allowing the user to distinguish between metrics emitted by the system tenant vs other app tenants identified by their IDs. Co-authored-by: Alex Barganier <[email protected]> Co-authored-by: Aaditya Sondhi <[email protected]> 95234: sql: evaluate correlated subqueries as routines r=mgartner a=mgartner #### opt: create tree.Routine planning closure in helper function The logic for creating a planning closure for a `tree.Routine` has been moved to a helper function so that it can be reused in future commits. Release note: None #### sql: evaluate correlated subqueries as routines Previously, the optimizer would error in rare cases when it was unable to hoist correlated subqueries into apply-joins. Now, scalar, correlated subqueries that aren't hoisted are executed successfully. There is remaining work to apply the same method in this commit to `EXISTS` and `<op> ANY` subqueries. Hoisting correlated subqueries is not possible when a conditional expression, like a `CASE`, wraps a subquery that is not leak-proof. One of the effects of hoisting a subquery is that the subquery will be unconditionally evaluated. For leak-proof subqueries, the worst case is that unnecessary computation is performed. For non-leak-proof subqueries, errors could originate from the subquery when it should have never been evaluated because the corresponding conditional expression was never true. So, in order to support these cases, we must be able to execute a correlated subquery. A correlated subquery can be thought of as a relational expression with parameters that need to be filled in with constant value arguments for each invocation. It is essentially a user-defined function with a single statement in the function body. So, the `tree.RoutineExpr` machinery that powers UDFs is easily repurposed to facilitate evaluation of correlated subqueries. Fixes #71908 Fixes #73573 Fixes #80169 Release note (sql change): Some queries which previously resulted in the error "could not decorrelate subquery" now succeed. 95432: kvserver,kvstorage: move InitEngine r=pavelkalinnikov a=tbg I'm working on a datadriven test for `kvstorage` and I need to be able to initialize the engine there. There's more that should move, but I'm taking it one step at a time. As we build up more and more sophisticated datadriven tests in `kvstorage` we can move whatever we need when we need it. PS: I looked at the unit test being modified by the move and it can't yet be moved to kvstorage - it also bootstraps the initial ranges, etc, requiring a lot more code movement until we can move it. 100% mechanical movement via Goland. Touches #93310. Epic: CRDB-220 Release note: None Co-authored-by: David Hartunian <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Tobias Grieger <[email protected]>
tl;dr: #71908 (comment)
Describe the problem
I've seen a number of issues regarding "could not decorrelate subquery" (e.g., #68086 , #67951 ) but this one might be a little different in that it seems to be related more specifically to asyncpg.
Reflection for PostgreSQL in SQLAlchemy uses queries that include
ANY()
. For example, to retrieve primary key columns it emits a query similar to this:If I create a table in an empty CockroachDB database
then this code (using psycopg2) works fine:
However, if I connect to CockroachDB using asyncpg I get the "could not decorrelate subquery" error
However, if I use asyncpg to connect to PostgreSQL then the same query does not fail:
It also seems to be directly related to
ANY()
because if I remove theAND a.attnum = ANY(cons.conkey)
then the error does not occur for cockroachdb+asyncpg.Environment:
Additional context
One possible workaround might be to have the sqlalchemy-cockroachdb dialect override SQLAlchemy's code for PostgreSQL reflection with code that does not emit queries that use
ANY()
.Epic CRDB-18886
Jira issue: CRDB-10828
The text was updated successfully, but these errors were encountered: