
sql: "could not decorrelate subquery" on insert to enum with asyncpg #80169

Closed
gordthompson opened this issue Apr 19, 2022 · 18 comments · Fixed by #95234
Assignees
Labels
A-tools-asyncpg C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community X-blathers-triaged blathers was able to find an owner

Comments

gordthompson commented Apr 19, 2022

Describe the problem

As originally reported in

sqlalchemy/sqlalchemy#7945

attempting to insert into a table with an enum column results in "could not decorrelate subquery".

To Reproduce

psycopg2 (2.9.3) - No issue

This code runs fine against both PostgreSQL (pg) and CockroachDB (cr):

import psycopg2

cr = {"host": "localhost", "port": 26257, "user": "root", "dbname": "defaultdb"}
pg = {"host": "192.168.0.199", "port": 5432, "user": "scott", "password": "tiger", "dbname": "test"}
conn = psycopg2.connect(**cr)
cur = conn.cursor()
cur.execute("DROP TABLE IF EXISTS test_enums")
cur.execute("DROP TYPE IF EXISTS myenum")
cur.execute("CREATE TYPE myenum AS ENUM ('foo', 'bar')")
cur.execute(
    """\
CREATE TABLE test_enums (
    id int NOT NULL, 
    test myenum
)
"""
)
cur.execute(
    "INSERT INTO test_enums (id, test) VALUES (%(id)s, %(test)s)",
    {"id": 1, "test": "foo"},
)
conn.commit()

asyncpg with PostgreSQL - No issue

import asyncio

import asyncpg

# uses ENUM and TABLE created by previous (sync) tests


async def run():
    pg = {
        "host": "192.168.0.199",
        "port": 5432,
        "user": "scott",
        "password": "tiger",
        "database": "test",
    }
    conn = await asyncpg.connect(**pg)
    await conn.execute(
        "INSERT INTO test_enums (id, test) VALUES ($1, $2)",
        1,
        "foo",
    )
    await conn.close()


loop = asyncio.get_event_loop()
loop.run_until_complete(run())

asyncpg with CockroachDB - "could not decorrelate subquery"

import asyncio

import asyncpg

# uses ENUM and TABLE created by previous (sync) tests


async def run():
    cr = {"host": "localhost", "port": 26257, "user": "root", "database": "defaultdb"}
    conn = await asyncpg.connect(**cr)
    await conn.execute(
        "INSERT INTO test_enums (id, test) VALUES ($1, $2)",
        1,
        "foo",
    )
    await conn.close()


loop = asyncio.get_event_loop()
loop.run_until_complete(run())

"""
Traceback (most recent call last):
  File "/home/gord/git/sqlalchemy-cockroachdb/gord_test/async_issue_7945_mcve.py", line 28, in <module>
    loop.run_until_complete(run())
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/gord/git/sqlalchemy-cockroachdb/gord_test/async_issue_7945_mcve.py", line 19, in run
    await conn.execute(
  File "/home/gord/git/sqlalchemy-cockroachdb/venv/lib/python3.8/site-packages/asyncpg/connection.py", line 320, in execute
    _, status, _ = await self._execute(
  File "/home/gord/git/sqlalchemy-cockroachdb/venv/lib/python3.8/site-packages/asyncpg/connection.py", line 1659, in _execute
    result, _ = await self.__execute(
  File "/home/gord/git/sqlalchemy-cockroachdb/venv/lib/python3.8/site-packages/asyncpg/connection.py", line 1684, in __execute
    return await self._do_execute(
  File "/home/gord/git/sqlalchemy-cockroachdb/venv/lib/python3.8/site-packages/asyncpg/connection.py", line 1711, in _do_execute
    stmt = await self._get_statement(
  File "/home/gord/git/sqlalchemy-cockroachdb/venv/lib/python3.8/site-packages/asyncpg/connection.py", line 413, in _get_statement
    types, intro_stmt = await self._introspect_types(
  File "/home/gord/git/sqlalchemy-cockroachdb/venv/lib/python3.8/site-packages/asyncpg/connection.py", line 459, in _introspect_types
    return await self.__execute(
  File "/home/gord/git/sqlalchemy-cockroachdb/venv/lib/python3.8/site-packages/asyncpg/connection.py", line 1684, in __execute
    return await self._do_execute(
  File "/home/gord/git/sqlalchemy-cockroachdb/venv/lib/python3.8/site-packages/asyncpg/connection.py", line 1731, in _do_execute
    result = await executor(stmt, None)
  File "asyncpg/protocol/protocol.pyx", line 201, in bind_execute
asyncpg.exceptions._base.UnknownPostgresError: could not decorrelate subquery
"""

Environment:

  • CockroachDB version: v22.1.0-beta.3
  • Server OS: Ubuntu 20.04
  • Client app: asyncpg 0.25.0

Epic CRDB-18886
Jira issue: CRDB-15824

@gordthompson gordthompson added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Apr 19, 2022

blathers-crl bot commented Apr 19, 2022

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

  • @cockroachdb/sql-schema (found keywords: DROP TABLE)
  • @cockroachdb/bulk-io (found keywords: import)
  • @cockroachdb/sql-experience (found keywords: sqlalchemy,psycopg2)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try to find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added A-disaster-recovery O-community Originated from the community X-blathers-triaged blathers was able to find an owner T-disaster-recovery labels Apr 19, 2022

blathers-crl bot commented Apr 19, 2022

cc @cockroachdb/bulk-io

@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Apr 19, 2022

rafiss commented Apr 19, 2022

Thanks for filing! I ran this and saw that the INSERT query itself runs fine, but then right after asyncpg sends this query, which is the one that's failing:

WITH RECURSIVE typeinfo_tree(
    oid, ns, name, kind, basetype, elemtype, elemdelim,
    range_subtype, attrtypoids, attrnames, depth)
AS (
    SELECT
        ti.oid, ti.ns, ti.name, ti.kind, ti.basetype,
        ti.elemtype, ti.elemdelim, ti.range_subtype,
        ti.attrtypoids, ti.attrnames, 0
    FROM
            (
        SELECT
            t.oid                           AS oid,
            ns.nspname                      AS ns,
            t.typname                       AS name,
            t.typtype                       AS kind,
            (CASE WHEN t.typtype = 'd' THEN
                (WITH RECURSIVE typebases(oid, depth) AS (
                    SELECT
                        t2.typbasetype      AS oid,
                        0                   AS depth
                    FROM
                        pg_type t2
                    WHERE
                        t2.oid = t.oid

                    UNION ALL

                    SELECT
                        t2.typbasetype      AS oid,
                        tb.depth + 1        AS depth
                    FROM
                        pg_type t2,
                        typebases tb
                    WHERE
                       tb.oid = t2.oid
                       AND t2.typbasetype != 0
               ) SELECT oid FROM typebases ORDER BY depth DESC LIMIT 1)

               ELSE NULL
            END)                            AS basetype,
            t.typelem                       AS elemtype,
            elem_t.typdelim                 AS elemdelim,
            range_t.rngsubtype              AS range_subtype,
            (CASE WHEN t.typtype = 'c' THEN
                (SELECT
                    array_agg(ia.atttypid ORDER BY ia.attnum)
                FROM
                    pg_attribute ia
                    INNER JOIN pg_class c
                        ON (ia.attrelid = c.oid)
                WHERE
                    ia.attnum > 0 AND NOT ia.attisdropped
                    AND c.reltype = t.oid)

                ELSE NULL
            END)                            AS attrtypoids,
            (CASE WHEN t.typtype = 'c' THEN
                (SELECT
                    array_agg(ia.attname::text ORDER BY ia.attnum)
                FROM
                    pg_attribute ia
                    INNER JOIN pg_class c
                        ON (ia.attrelid = c.oid)
                WHERE
                    ia.attnum > 0 AND NOT ia.attisdropped
                    AND c.reltype = t.oid)

                ELSE NULL
            END)                            AS attrnames
        FROM
            pg_catalog.pg_type AS t
            INNER JOIN pg_catalog.pg_namespace ns ON (
                ns.oid = t.typnamespace)
            LEFT JOIN pg_type elem_t ON (
                t.typlen = -1 AND
                t.typelem != 0 AND
                t.typelem = elem_t.oid
            )
            LEFT JOIN pg_range range_t ON (
                t.oid = range_t.rngtypid
            )
    )
 AS ti
    WHERE
        ti.oid = any($1::oid[])

    UNION ALL

    SELECT
        ti.oid, ti.ns, ti.name, ti.kind, ti.basetype,
        ti.elemtype, ti.elemdelim, ti.range_subtype,
        ti.attrtypoids, ti.attrnames, tt.depth + 1
    FROM
            (
        SELECT
            t.oid                           AS oid,
            ns.nspname                      AS ns,
            t.typname                       AS name,
            t.typtype                       AS kind,
            (CASE WHEN t.typtype = 'd' THEN
                (WITH RECURSIVE typebases(oid, depth) AS (
                    SELECT
                        t2.typbasetype      AS oid,
                        0                   AS depth
                    FROM
                        pg_type t2
                    WHERE
                        t2.oid = t.oid

                    UNION ALL

                    SELECT
                        t2.typbasetype      AS oid,
                        tb.depth + 1        AS depth
                    FROM
                        pg_type t2,
                        typebases tb
                    WHERE
                       tb.oid = t2.oid
                       AND t2.typbasetype != 0
               ) SELECT oid FROM typebases ORDER BY depth DESC LIMIT 1)

               ELSE NULL
            END)                            AS basetype,
            t.typelem                       AS elemtype,
            elem_t.typdelim                 AS elemdelim,
            range_t.rngsubtype              AS range_subtype,
            (CASE WHEN t.typtype = 'c' THEN
                (SELECT
                    array_agg(ia.atttypid ORDER BY ia.attnum)
                FROM
                    pg_attribute ia
                    INNER JOIN pg_class c
                        ON (ia.attrelid = c.oid)
                WHERE
                    ia.attnum > 0 AND NOT ia.attisdropped
                    AND c.reltype = t.oid)

                ELSE NULL
            END)                            AS attrtypoids,
            (CASE WHEN t.typtype = 'c' THEN
                (SELECT
                    array_agg(ia.attname::text ORDER BY ia.attnum)
                FROM
                    pg_attribute ia
                    INNER JOIN pg_class c
                        ON (ia.attrelid = c.oid)
                WHERE
                    ia.attnum > 0 AND NOT ia.attisdropped
                    AND c.reltype = t.oid)

                ELSE NULL
            END)                            AS attrnames
        FROM
            pg_catalog.pg_type AS t
            INNER JOIN pg_catalog.pg_namespace ns ON (
                ns.oid = t.typnamespace)
            LEFT JOIN pg_type elem_t ON (
                t.typlen = -1 AND
                t.typelem != 0 AND
                t.typelem = elem_t.oid
            )
            LEFT JOIN pg_range range_t ON (
                t.oid = range_t.rngtypid
            )
    )
 ti,
        typeinfo_tree tt
    WHERE
        (tt.elemtype IS NOT NULL AND ti.oid = tt.elemtype)
        OR (tt.attrtypoids IS NOT NULL AND ti.oid = any(tt.attrtypoids))
        OR (tt.range_subtype IS NOT NULL AND ti.oid = tt.range_subtype)
)

SELECT DISTINCT
    *,
    basetype::regtype::text AS basetype_name,
    elemtype::regtype::text AS elemtype_name,
    range_subtype::regtype::text AS range_subtype_name
FROM
    typeinfo_tree
ORDER BY
    depth DESC

@rafiss rafiss changed the title "could not decorrelate subquery" on insert to enum with asyncpg sql: "could not decorrelate subquery" on insert to enum with asyncpg Apr 19, 2022
gordthompson (Author) commented

@rafiss - Thanks for the additional information. Looks like this might be similar to #71908

mgartner (Collaborator) commented

Could also be similar to #73573.


michae2 commented Apr 26, 2022

Tagging @vy-ton


vy-ton commented Apr 26, 2022

@kernfeld-cockroach FYI: there's a group of "could not decorrelate subquery" issues that Queries will need to investigate whenever DX prioritizes asyncpg support.


andy-kimball commented May 13, 2022

Looking at this query, there are at least 4 different things preventing decorrelation:

  1. The ia.attname::text cast is treated as Immutable rather than LeakProof. Since there's no way that converting from a name type to a text type can generate an error, we could easily fix this.

  2. The tb.depth + 1 expressions prevent hoisting subqueries out of CASE statements, since they may generate an error in the extreme case that the addition triggers 64-bit int overflow. This is much tougher to address. We could implement "apply with probe" as described in the "Conditional scalar execution" section of this paper, which would yield the highest performance. Or we could upgrade execution support for evaluating correlated subqueries in every case, which would be quite slow in general (but would be fine for this case). Finally, we could adopt an approach like Materialize's, described in more detail in a later comment.

  3. We don't yet support decorrelating window functions (e.g. array_agg(ia.attname::text ORDER BY ia.attnum)) or projection operators in concert with left-join-apply. We do have inner-join-apply variants, though. We'd have to prove that those rules could be extended to left-join-apply. I expect there'd be some complications, but also that it's possible.

  4. We don't support decorrelating recursive CTE expressions. I'm not sure how difficult that would be to do, or even if it's possible in the general case.

@msirek msirek self-assigned this May 13, 2022

andy-kimball commented May 14, 2022

Well, the problem is that we'd likely need to use a With operator to cache the input to D, since it's also used to do the post-join. That caching is probably worse than the group-by, especially if there's enough data to trigger spilling to disk. Also remember that computing D requires doing a DISTINCT operation, which may be comparable to the group-by perf-wise. So it'd be caching + distinct vs. group-by.

I suspect that the best solution will be to do group-by decorrelation for most common cases, and then to fall back on generalized decorrelation for other cases that we can't handle well otherwise. But we'd need to do some performance testing to be sure.

Also, note that generalized decorrelation does not solve the other problem with this query, which is that we're not able to create an apply-join to begin with for subqueries in CASE statements (that are not LeakProof). If we decide to devote the time to addressing this limitation, I think we should do something similar to what Materialize does, which is to split the THEN and ELSE clauses into separate branches of a UNION clause, one of which filters on the "if" condition and the other of which filters on a corresponding "if not" condition. Of course, that assumes that we can then decorrelate UNION, which is also not yet implemented.
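The UNION split sketched above can be illustrated on a toy example. This is a hedged illustration using SQLite from Python purely to show the relational equivalence; the table `r` and its values are made up, and this is not CockroachDB's actual rewrite:

```python
import sqlite3

# Toy relation r(id, c, a, b); c plays the role of the CASE condition.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE r (id INTEGER, c INTEGER, a INTEGER, b INTEGER);
INSERT INTO r VALUES (1, 1, 10, 100), (2, 0, 20, 200), (3, 1, 30, 300);
""")

# CASE form: pick a when c holds, else b.
case_rows = con.execute("""
    SELECT id, CASE WHEN c THEN a ELSE b END FROM r ORDER BY id
""").fetchall()

# Materialize-style split: one UNION branch filters on c, the other on NOT c.
# (A real rewrite must also handle a NULL condition: CASE takes the ELSE
# branch for NULL, while WHERE NOT c would drop the row entirely.)
union_rows = con.execute("""
    SELECT id, a FROM r WHERE c
    UNION ALL
    SELECT id, b FROM r WHERE NOT c
    ORDER BY id
""").fetchall()

assert case_rows == union_rows == [(1, 10), (2, 200), (3, 30)]
```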


msirek commented May 14, 2022

Well, the problem is that we'd likely need to use a With operator to cache the input to D, since it's also used to do the post-join.

I'm not sure I understand this point. Are you worried that rows will be inserted into the source relation of D in between 2 separate scans? Wouldn't our transaction logic handle this by either blocking the writes or making our transaction recompute results if we scanned one of the written rows?

Edit: Oh, maybe if the input to D is itself a sequence of joins, we don't want to redo that entire set of joins.

andy-kimball (Contributor) commented

Right, the input to D could be very expensive to recompute. It's better to cache it locally, since we know we're going to need it twice. If the execution engine was able to handle graphs like Materialize, this wouldn't be necessary.


msirek commented May 16, 2022

Leakproof, short definition:

LEAKPROOF indicates that the function has no side effects. It reveals no information about its arguments other than by its return value. For example, a function which throws an error message for some argument values but not others, or which includes the argument values in any error message, is not leakproof. 
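This matters for decorrelation because hoisting a subquery out of a CASE evaluates it unconditionally, and a non-leak-proof expression may then raise for rows the condition would have skipped. A loose Python analogy (not a database API, just the shape of the idea):

```python
def not_leakproof(x):
    # Raises for some inputs: evaluating this on rows the CASE condition
    # would have filtered out can surface an error, leaking facts about x.
    return 10 // x

def leakproof(x):
    # Total function: never raises, reveals nothing beyond its result.
    return x == 0

assert leakproof(0) is True
try:
    not_leakproof(0)
except ZeroDivisionError:
    # The error itself reveals that x was 0.
    pass
```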

msirek pushed a commit to msirek/cockroach that referenced this issue May 24, 2022
Fixes cockroachdb#80169
Fixes cockroachdb#71908

Previously, some queries which follow the pattern
`SELECT CASE WHEN <cond> THEN <subquery> END FROM <table>` would error
out with a `could not decorrelate subquery` error. The reason is that
the subquery should only run when `<cond>` is true, or when the subquery
is leak-proof (that is, running it when `<cond>` is false could not have
side effects, e.g. erroring out due to overflow during a CAST).
When the subquery is not leak-proof, it cannot be pulled above the
CASE expression into an apply join (which is a necessary first step
in executing a subquery expression).

To address this, this patch introduces a new normalization rule which
attempts to push the WHEN clause condition into the THEN clause subquery
and remove the CASE expression entirely (replace the CASE with the
subquery). The preconditions for this normalization are:
  1. There is a single WHEN clause.
  2. There is an ELSE NULL clause (either explicitly specified or
     implicit).
  3. The WHEN clause condition is not volatile (i.e., the result is the
     same no matter how many times it is evaluated).
  4. The WHEN clause condition does not cause any side-effects, like
     writing rows to a table.
  5. The relational expressions in the THEN clause are only of the
     following types: (Select, Project, Limit, Offset, RecursiveCTE,
     InnerJoin, Scan, WithScan, ScalarGroupBy, Window).
  6. There are no aggregate functions which produce a non-null value
     when the input is empty, such as COUNT.
  7. There are no projected expressions above an aggregation in the
     subquery operation tree.

If these conditions are met, a new Select is placed above each Scan or
WithScan operation using the WHEN condition as a filter and the CASE
expression is replaced by the subquery.

Release note (sql change): This patch is adding support for some queries
which asyncpg generates internally, which previously would error out
with the message, "could not decorrelate subquery".
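The rewrite can be sketched on a toy schema. This is a hedged illustration using SQLite from Python; the tables `t` and `u` are made up, and this is not the optimizer's actual output, only the relational equivalence the rule relies on:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (id INTEGER, x INTEGER);
CREATE TABLE u (id INTEGER, y INTEGER);
INSERT INTO t VALUES (1, 5), (2, -3), (3, 7);
INSERT INTO u VALUES (1, 10), (1, 20), (3, 30);
""")

# Original shape: correlated subquery wrapped in a CASE, implicit ELSE NULL.
before = con.execute("""
    SELECT CASE WHEN t.x > 0
                THEN (SELECT max(u.y) FROM u WHERE u.id = t.id)
           END
    FROM t ORDER BY t.id
""").fetchall()

# Rewritten shape: the WHEN condition is pushed into the subquery as a
# filter and the CASE disappears. max() over an empty input is NULL,
# matching the implicit ELSE NULL; COUNT would break this equivalence
# (it returns 0 on empty input, hence precondition 6 above).
after = con.execute("""
    SELECT (SELECT max(u.y) FROM u WHERE u.id = t.id AND t.x > 0)
    FROM t ORDER BY t.id
""").fetchall()

assert before == after == [(20,), (None,), (30,)]
```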
jlrobins pushed a commit to noteable-io/asyncpg-crdb-noteable that referenced this issue Jun 24, 2022
Avoids 'cannot decorrelate query', until at least:
  * cockroachdb/cockroach#81706
  * cockroachdb/cockroach#80169
are solved.

Is good enough to introspect into enums at least.
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 13, 2023
Previously, the optimizer would error in rare cases when it was unable
to hoist correlated subqueries into apply-joins. Now, scalar, correlated
subqueries that aren't hoisted are executed successfully. There is
remaining work to apply the same method in this commit to `EXISTS` and
`<op> ANY` subqueries.

Hoisting correlated subqueries is not possible when a conditional
expression, like a `CASE`, wraps a subquery that is not leak-proof. One
of the effects of hoisting a subquery is that the subquery will be
unconditionally evaluated. For leak-proof subqueries, the worst case is
that unnecessary computation is performed. For non-leak-proof
subqueries, errors could originate from the subquery when it should have
never been evaluated because the corresponding conditional expression
was never true. So, in order to support these cases, we must be able to
execute a correlated subquery.

A correlated subquery can be thought of as a relational expression with
parameters that need to be filled in with constant value arguments for
each invocation. It is essentially a user-defined function with a single
statement in the function body. So, the `tree.RoutineExpr` machinery
that powers UDFs is easily repurposed to facilitate evaluation of
correlated subqueries.

Fixes cockroachdb#71908
Fixes cockroachdb#73573
Fixes cockroachdb#80169

Release note (sql change): Some queries which previously resulted in the
error "could not decorrelate subquery" now succeed.
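The routine analogy in the commit message can be sketched in plain Python (hypothetical data and names, only to illustrate the per-invocation shape):

```python
# Models: SELECT (SELECT max(u.y) FROM u WHERE u.id = t.id) FROM t
rows_u = [(1, 10), (1, 20), (3, 30)]  # table u as (id, y) pairs

def subquery(outer_id):
    # The outer row's column value is the routine's constant argument.
    ys = [y for (uid, y) in rows_u if uid == outer_id]
    return max(ys) if ys else None  # scalar result; None stands in for NULL

# One invocation per outer row, like a single-statement UDF.
results = [subquery(tid) for tid in (1, 2, 3)]
assert results == [20, None, 30]
```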
craig bot pushed a commit that referenced this issue Jan 18, 2023

95234: sql: evaluate correlated subqueries as routines r=mgartner a=mgartner

#### opt: create tree.Routine planning closure in helper function

The logic for creating a planning closure for a `tree.Routine` has been
moved to a helper function so that it can be reused in future commits.

Release note: None

#### sql: evaluate correlated subqueries as routines


Fixes #71908
Fixes #73573
Fixes #80169

Release note (sql change): Some queries which previously resulted in the
error "could not decorrelate subquery" now succeed.



Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
@craig craig bot closed this as completed in 61233f0 Jan 18, 2023
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023

glstas commented Sep 7, 2024

Hi folks, I just wanted to get a random enum value to insert into dummy data, and this all started...

At first, I had an issue with random() not actually being random in a subquery. I worked around it with some magic: pushing an unnecessary parameter into the subquery so that it is not optimized away and remains random.
Second, I wanted a condition for my CASE and got [XXUUU] ERROR: could not decorrelate subquery

Prepared a simplified example:

select random()                                                                     r1, -- random
       (select random())                                                            r2, -- NOT random
       a.rnd                                                                        r3, -- random
       (select a.rnd)                                                               r4, -- random
       (select * from (values (1), (2), (3) order by random() limit 1))             r5, -- NOT random
       (select * from (values (1), (2), (3) order by a.rnd limit 1))                r6, -- NOT random
       (select * from (values (1), (2), (3) order by random() + a.rnd * 0 limit 1)) r7  -- random
--        ,   -- uncomment this to see the error
--        CASE
--            WHEN a.rnd is null
--                THEN (select * from (values (1), (2), (3) order by random() + a.rnd * 0 limit 1))
--            END                                                                      r8
FROM (select random() as rnd from generate_series(1, 5)) a;


mgartner commented Sep 9, 2024

What version of CockroachDB are you running? The example works for me on 24.1.0:

defaultdb> SELECT version();
                                                    version
----------------------------------------------------------------------------------------------------------------
  CockroachDB CCL v24.1.0 (aarch64-apple-darwin21.2, built 2024/05/15 21:28:46, go1.22.2 X:nocoverageredesign)
(1 row)

defaultdb> select random()                                                                     r1, -- random
                             ->        (select random())                                                            r2, -- NOT
                             -> random
                             ->        a.rnd                                                                        r3, -- random
                             ->        (select a.rnd)                                                               r4, -- random
                             ->        (select * from (values (1), (2), (3) order by random() limit 1))             r5, -- NOT
                             -> random
                             ->        (select * from (values (1), (2), (3) order by a.rnd limit 1))                r6, -- NOT
                             -> random
                             ->        (select * from (values (1), (2), (3) order by random() + a.rnd * 0 limit 1)) r7  -- random
                             ->         ,   -- uncomment this to see the error
                             ->         CASE
                             ->             WHEN a.rnd is null
                             ->                 THEN (select * from (values (1), (2), (3) order by random() + a.rnd * 0 limit 1))
                             ->             END                                                                      r8
                             -> FROM (select random() as rnd from generate_series(1, 5)) a;
                             ->
          r1          |         r2         |         r3          |         r4          | r5 | r6 | r7 |  r8
----------------------+--------------------+---------------------+---------------------+----+----+----+-------
   0.5150354836336497 | 0.6108113127505165 |   0.847858279431529 |  0.7073162442168096 |  2 |  1 |  1 | NULL
    0.524494171843365 | 0.6108113127505165 |  0.9187092713522866 |  0.8292892608737888 |  2 |  1 |  1 | NULL
  0.09254128438983356 | 0.6108113127505165 |  0.8123665890914908 |  0.5199786715575637 |  2 |  1 |  2 | NULL
   0.4632872261624881 | 0.6108113127505165 | 0.43223473246089356 | 0.28477151392088645 |  2 |  1 |  1 | NULL
   0.7341229727042871 | 0.6108113127505165 |  0.8332920639350992 | 0.12258354800515972 |  2 |  1 |  2 | NULL
(5 rows)


glstas commented Sep 10, 2024

Yes, you are right: I was on v22.2.1. After updating to v24.1 there is no error.
