Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: InlineWith should not inline mutations into subqueries and apply-joins #95360

Closed
mgartner opened this issue Jan 17, 2023 · 1 comment · Fixed by #95454
Closed

opt: InlineWith should not inline mutations into subqueries and apply-joins #95360

mgartner opened this issue Jan 17, 2023 · 1 comment · Fixed by #95454
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team

Comments

@mgartner
Copy link
Collaborator

mgartner commented Jan 17, 2023

The InlineWith normalization rule can inline a mutation into subqueries or apply-joins, causing the mutations to be executed more than once. For example, notice that the INSERT is executed five times:

defaultdb> SELECT version();
                                          version
-------------------------------------------------------------------------------------------
  CockroachDB CCL v22.2.2 (aarch64-apple-darwin21.2, built 2023/01/04 17:38:41, go1.19.1)
(1 row)

defaultdb> CREATE TABLE corr ( k INT PRIMARY KEY, i INT);
CREATE TABLE

defaultdb> CREATE TABLE corr2 (k INT, i INT);
CREATE TABLE

defaultdb> INSERT INTO corr VALUES (1, 10), (2, 22), (3, 30), (4, 40), (5, 50);
INSERT 0 5

defaultdb> WITH insVals AS NOT MATERIALIZED (INSERT INTO corr2 VALUES (1, 10) RETURNING k)
SELECT * FROM corr
WHERE EXISTS (SELECT * FROM insVals WHERE k = corr.k UNION ALL SELECT k FROM corr c3 WHERE k = corr.k);
  k | i
----+-----
  1 | 10
  2 | 22
  3 | 30
  4 | 40
  5 | 50
(5 rows)

defaultdb> SELECT * FROM corr2;
  k | i
----+-----
  1 | 10
  1 | 10
  1 | 10
  1 | 10
  1 | 10
(5 rows)

This is not consistent with Postgres which guarantees that:

Data-modifying statements in WITH are executed exactly once, and always to completion, independently of whether the primary query reads all (or indeed any) of their output.

^ From the Postgres Docs.

We should prevent InlineWith from inlining into correlated subqueries (and maybe uncorrelated, unless they are guaranteed to execute onle once - even lazy ones?), and apply-joins.

Jira issue: CRDB-23481

@mgartner mgartner added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 17, 2023
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jan 17, 2023
@mgartner
Copy link
Collaborator Author

The PG docs don't mention specifically what they behavior is for data-modifying statements in WITHs that have NOT MATERIALIZED, but the answer is in the source code:

https://github.com/postgres/postgres/blob/15eb1228800a35042ef318f941173aa8b89002d1/src/backend/optimizer/plan/subselect.c#L952-L957

NOT MATERIALIZED (or CTEMaterializeNever) is ignored if the statement inside the WITH is a DML statement, like INSERT.

@mgartner mgartner self-assigned this Jan 18, 2023
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 18, 2023
Previously, any `NOT MATERIALIZED` CTE was inlined, even if it had
side-effects (e.g., a volatile expression or mutation). This did not
match Postgres's behavior. The Postgres documentation mentions that
"data-modifying statements in WITH are executed exactly once"[^1], but
omits any specifics about data-modifying statements in
`NOT MATERIALIZED` CTEs. A comment in the Postgres source code makes the
intended behavior clear[^2].

Fixes cockroachdb#95360

Release note (bug fix): A bug has been fixed that could cause
expressions with side-effects (e.g., volatile expressions or
data-modifying statements like `INSERT`s) in `NOT MATERIALIZED` CTEs to
be executed more than once. This bug was present since
`NOT MATERIALIZED` was first supported in version 20.2.0.

[^1]: https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-MODIFYING
[^2]: https://github.com/postgres/postgres/blob/15eb1228800a35042ef318f941173aa8b89002d1/src/backend/optimizer/plan/subselect.c#L921-L951
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 18, 2023
Previously, any `NOT MATERIALIZED` CTE was inlined, even if it had
side-effects (e.g., a volatile expression or mutation). This did not
match Postgres's behavior. The Postgres documentation mentions that
"data-modifying statements in WITH are executed exactly once"[^1], but
omits any specifics about data-modifying statements in
`NOT MATERIALIZED` CTEs. A comment in the Postgres source code makes the
intended behavior clear[^2].

Fixes cockroachdb#95360

Release note (bug fix): A bug has been fixed that could cause
expressions with side-effects (e.g., volatile expressions or
data-modifying statements like `INSERT`s) in `NOT MATERIALIZED` CTEs to
be executed more than once. This bug was present since
`NOT MATERIALIZED` was first supported in version 20.2.0. Our CTE docs
should be updated with the following changes:
  1. Under "Data-modifying statements", we should mention that they are
     guaranteed to be executed exactly once, even if the
     `NOT MATERIALIZED` is specified.
  2. In the description of `MATERIALIZED` / `NOT MATERIALIZED`, we
     should mention: `MATERIALIZED` forces the CTE to be materialized so
     that it is only executed once, and `NOT MATERIALIZED` is a hint to
     the optimizer to inline the CTE as long as it doesn't affect other
     objects in the database, even if it is used in the query multiple
     times.

[^1]: https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-MODIFYING
[^2]: https://github.com/postgres/postgres/blob/15eb1228800a35042ef318f941173aa8b89002d1/src/backend/optimizer/plan/subselect.c#L921-L951
craig bot pushed a commit that referenced this issue Jan 19, 2023
95357: roachtest: use unified mode in c2c roach tests r=knz a=stevendanna

The first two commits modify roachprod and roactest to make it 
a bit easier to use CRDB's new ability to serve multiple tenants from
a single process:

- Adds a --tenant-name flag to the sql and pgurl commands
- Adds the ability to specify a tenant name when using roachprods command templating
- Adds the first 3 secondary tenants (2, 3, 4) to the default certificate setup

Together this allows us to fairly easily rewrite the c2c roachtests to avoid starting a separate 
SQL pod.

Epic: None

Release note: None

95454: opt: do not inline NOT MATERIALIZED CTEs with side-effects r=mgartner a=mgartner

#### sql: use enum for CTE materialize clause

The `tree.MaterializeClause` struct has been replaced with a 3-value
enum to represent `MATERIALIZED`, `NOT MATERIALIZED`, and an empty
materialization clause (the default). These three clauses represent
three distinct modes of behavior for CTEs. Previously, the
`tree.MaterializeClause` struct had both a `Set` boolean and a
`Materialize` boolean which allowed four possible combinations, and
confusingly implied that `NOT MATERIALIZED` (`Set: true, Materialize:
false`) was equivalent to an empty clause (`Set: false, Materialize:
false`), with the only difference being the explicitness of the clause
in the source SQL. In reality, there are behavioral differences between
a CTE with an empty materialize clause and `NOT MATERIALIZED`. Using an
enum to represent these clauses is more natural and less error-prone
than a two-boolean struct.

Release note: None

#### opt: do not inline NOT MATERIALIZED CTEs with side-effects

Previously, any `NOT MATERIALIZED` CTE was inlined, even if it had
side-effects (e.g., a volatile expression or mutation). This did not
match Postgres's behavior. The Postgres documentation mentions that
"data-modifying statements in WITH are executed exactly once"[^1], but
omits any specifics about data-modifying statements in
`NOT MATERIALIZED` CTEs. A comment in the Postgres source code makes the
intended behavior clear[^2].

Fixes #95360

Release note (bug fix): A bug has been fixed that could cause
expressions with side-effects (e.g., volatile expressions or
data-modifying statements like `INSERT`s) in `NOT MATERIALIZED` CTEs to
be executed more than once. This bug was present since
`NOT MATERIALIZED` was first supported in version 20.2.0. Our CTE docs
should be updated with the following changes:
  1. Under "Data-modifying statements", we should mention that they are
     guaranteed to be executed exactly once, even if the
     `NOT MATERIALIZED` is specified.
  2. In the description of `MATERIALIZED` / `NOT MATERIALIZED`, we
     should mention: `MATERIALIZED` forces the CTE to be materialized so
     that it is only executed once, and `NOT MATERIALIZED` is a hint to
     the optimizer to inline the CTE as long as it doesn't affect other
     objects in the database, even if it is used in the query multiple
     times.

[^1]: https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-MODIFYING
[^2]: https://github.com/postgres/postgres/blob/15eb1228800a35042ef318f941173aa8b89002d1/src/backend/optimizer/plan/subselect.c#L921-L951

#### opt: refactor InlineAnyWithScan rule

The `InlineAnyWithScan` rules has be renamed `InlineAnyWithScanOfValues`
and the code and comments have been refactored to make it clear that
this rule is only valid for WithScan expressions that reference Values
expressions.

Release note: None

95485: release: do not use custom field for SRE issues r=celiala a=rail

Previously, we submitted Jira issues for the SRE team. Recently they changed the project settings and there is no more "Has SLA" field.

This PR removes the custom field from the submissions. NExt step will be adding the "Due date" field.

Epic: none
Part of: RE-361
Release note: None

Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Rail Aliiev <[email protected]>
@craig craig bot closed this as completed in 49c9a7b Jan 19, 2023
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant