Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support indexed key-contains queries in JSON inverted indexes #23905

Closed
jordanlewis opened this issue Mar 15, 2018 · 4 comments
Closed

sql: support indexed key-contains queries in JSON inverted indexes #23905

jordanlewis opened this issue Mar 15, 2018 · 4 comments
Labels
A-sql-json JSON handling in SQL. A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@jordanlewis
Copy link
Member

jordanlewis commented Mar 15, 2018

The ? operator currently requires a full table scan when used as a filter on an indexed JSON column. Our inverted index representation permits efficient access for this operator, so we should teach index selection to generate efficient spans for ?.

The ? operator is the object key containment operator: it asks whether the second operand is contained within the first as an object key. Our inverted index key representation encodes full paths through every JSON value which means that ? can be implemented as a scan over a subset of those paths.

For example, suppose the JSON value '{"a": {"b": "c"}}was added to an index on thejcolumn of some tablet`. The index key for this looks roughly like:

<json prefix> a <sep> b <term> <string prefix> c

Then, suppose we wanted to execute the query SELECT * FROM t WHERE j ? 'a'. Index selection could generate the span <json prefix> a <term> to the prefix end of <json prefix> a <sep>. A scan for that span will return all top level objects that contain the string key a, as expected.

A slightly more complicated example is SELECT * FROM t WHERE j->'a' ? 'b'. This will require a little more sophistication in index selection to work. I believe j->'a' = 'foo' currently gets normalized to j @> '{"a": "foo"}'. We will need to make a similar normalization for an arrow followed by a ?, so that index selection can properly construct a span for that case. (it should be <json prefix> a <sep> b <term> to the prefix end of <json prefix> a <sep> b <sep>)

Jira issue: CRDB-5800

@jordanlewis jordanlewis added the A-sql-json JSON handling in SQL. label Mar 15, 2018
@jordanlewis jordanlewis added this to the 2.1 milestone Mar 15, 2018
@jordanlewis jordanlewis added A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Apr 27, 2018
@knz knz added A-sql-optimizer SQL logical planning and optimizations. and removed A-sql-optimizer SQL logical planning and optimizations. labels Apr 28, 2018
@justinj
Copy link
Contributor

justinj commented Sep 25, 2018

I originally thought this was pretty easy and would only require generating appropriate spans, but I think it's actually slightly more involved: a given JSON object might appear multiple times in such spans (consider {"a": [1, 2]}), so we also need to insert an (ordered) DISTINCT operation on top of a query like this. Gonna ponder it more but I think it's nontrivial.

@RaduBerinde
Copy link
Member

CC @mgartner

@mgartner
Copy link
Collaborator

@RaduBerinde Ack. I'll take a crack at this.

@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@jordanlewis
Copy link
Member Author

Closing this as a duplicate of #81253.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-json JSON handling in SQL. A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
None yet
Development

No branches or pull requests

7 participants