Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: incorrect query plans for trigram similarity filters when pg_trgm.similarity_threshold=0 #122443

Closed
mgartner opened this issue Apr 16, 2024 · 3 comments · Fixed by #139265
Closed
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. T-sql-queries SQL Queries Team

Comments

@mgartner
Copy link
Collaborator

mgartner commented Apr 16, 2024

The optimizer produces incorrect query plans for queries with trigram similarity filters, e.g., col % 'foo', when pg_trgm.similarity_threshold is set to 0. These incorrect query plans produce incorrect results. For example:

CREATE TABLE t (
  k INT PRIMARY KEY,
  s STRING,
  INVERTED INDEX i (s gin_trgm_ops)
);

INSERT INTO t VALUES (1, 'foo'), (2, 'bar');

SET pg_trgm.similarity_threshold =  0;

SELECT * FROM t@primary WHERE s % 'foo';
--   k |  s
-- ----+------
--   1 | foo
--   2 | bar
-- (2 rows)

SELECT * FROM t@i WHERE s % 'foo';
--   k |  s
-- ----+------
--   1 | foo
-- (1 row)

When pg_trgm.similarity_threshold is set to 0, all strings are similar:

SET pg_trgm.similarity_threshold =  0;

SELECT 'foo' % 'bar';
--   ?column?
-- ------------
--      t
-- (1 row)

SELECT 'foo' % '';
--   ?column?
-- ------------
--      t
-- (1 row)

SELECT '' % '';
--   ?column?
-- ------------
--      t
-- (1 row)

Therefore, there is no subset of trigrams in the inverted index we can scan in order to find all rows matching the filter. We'd need to scan the entire inverted index. But that would be less efficient than scanning the entire primary index, so we simply shouldn't plan an inverted index scan at all.

Jira issue: CRDB-37893

@mgartner mgartner added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. labels Apr 16, 2024
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Apr 16, 2024
@mgartner
Copy link
Collaborator Author

This was discovered by @michae2 here: #121973 (review)

@mgartner
Copy link
Collaborator Author

This affects versions v22.2.0+. It has existed since trigram inverted indexes were introduced.

@yuzefovich yuzefovich added the T-sql-queries SQL Queries Team label May 2, 2024
@yuzefovich yuzefovich moved this from Triage to Bugs to Fix in SQL Queries May 7, 2024
@normanchenn normanchenn self-assigned this Jan 16, 2025
normanchenn added a commit to normanchenn/cockroach that referenced this issue Jan 16, 2025
Previously, the optimizer would produce incorrect query plans for
queries with trigram similarity filters when
`pg_trgm.similarity_threshold == 0`, producing incorrect results.

To address this, this patch adds a check to return early if
`pg_trgm.similarity_threshold == 0` in trigram similarity queries on
inverted indices.

Fixes: cockroachdb#122443

Release note (bug fix): The optimizer would produce incorrect query
plans for some queries with trigram similarity filters when
`pg_trgm.similarity_threshold` is set to 0. This has now been fixed.
normanchenn added a commit to normanchenn/cockroach that referenced this issue Jan 17, 2025
Previously, the optimizer would produce incorrect query plans for
queries with trigram similarity filters when
`pg_trgm.similarity_threshold == 0`, producing incorrect results.

To address this, this patch adds a check to return early if
`pg_trgm.similarity_threshold == 0` in trigram similarity queries on
inverted indices.

Fixes: cockroachdb#122443

Release note (bug fix): The optimizer could produce incorrect query
plans for queries using trigram similarity filters when
`pg_trgm.similarity_threshold` is set to 0. This bug was introduced in
v22.2.0. It does not affect v24.2.0+ releases, provided the
`optimizer_use_trigram_similarity_optimization` session variable is set
to its default value `true`.
craig bot pushed a commit that referenced this issue Jan 28, 2025
139265: opt: remove incorrect query plans for trigram similarity filters r=normanchenn a=normanchenn

Previously, the optimizer would produce incorrect query plans for queries with trigram similarity filters when `pg_trgm.similarity_threshold == 0`, producing incorrect results.

To address this, this patch adds a check to return early if `pg_trgm.similarity_threshold == 0` in trigram similarity queries on inverted indices.

Fixes: #122443

Release note (bug fix): The optimizer could produce incorrect query plans for queries using trigram similarity filters (e.g. `col % 'val'`) when `pg_trgm.similarity_threshold` was set to 0. This bug was introduced in v22.2.0 and is now fixed. Note that this issue does not affect v24.2.0+ releases when the `optimizer_use_trigram_similarity_optimization` session variable (introduced in v24.2.0) is set to its default value `true`, as it would skip this behaviour.

139914: cdctest: use legacy schema changer when sql smith is enabled r=aerfrei a=wenyihu6

This patch disables the declarative schema changer when TestChangefeedNemeses
uses SQLSmith. Enabling it causes a decoder error, so it is temporarily disabled
as a workaround. This ensures nemesis testing can run in CI with SQLSmith
without being skipped.

Informs: #137125
Release note: None

Co-authored-by: Norman Chen <[email protected]>
Co-authored-by: Wenyi Hu <[email protected]>
craig bot pushed a commit that referenced this issue Jan 28, 2025
139265: opt: remove incorrect query plans for trigram similarity filters r=normanchenn a=normanchenn

Previously, the optimizer would produce incorrect query plans for queries with trigram similarity filters when `pg_trgm.similarity_threshold == 0`, producing incorrect results.

To address this, this patch adds a check to return early if `pg_trgm.similarity_threshold == 0` in trigram similarity queries on inverted indices.

Fixes: #122443

Release note (bug fix): The optimizer could produce incorrect query plans for queries using trigram similarity filters (e.g. `col % 'val'`) when `pg_trgm.similarity_threshold` was set to 0. This bug was introduced in v22.2.0 and is now fixed. Note that this issue does not affect v24.2.0+ releases when the `optimizer_use_trigram_similarity_optimization` session variable (introduced in v24.2.0) is set to its default value `true`, as it would skip this behaviour.

Co-authored-by: Norman Chen <[email protected]>
@craig craig bot closed this as completed in 496f35f Jan 28, 2025
@github-project-automation github-project-automation bot moved this from Bugs to Fix to Done in SQL Queries Jan 28, 2025
Copy link

blathers-crl bot commented Jan 28, 2025

Hi @normanchenn, please add branch-* labels to identify which branch(es) this C-bug affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. T-sql-queries SQL Queries Team
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants