sql: support indexed key-contains queries in JSON inverted indexes #23905
Labels
A-sql-json
JSON handling in SQL.
A-sql-optimizer
SQL logical planning and optimizations.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries
SQL Queries Team
The
?
operator currently requires a full table scan when used as a filter on an indexed JSON column. Our inverted index representation permits efficient access for this operator, so we should teach index selection to generate efficient spans for?
.The
?
operator is the object key containment operator: it asks whether the second operand is contained within the first as an object key. Our inverted index key representation encodes full paths through every JSON value which means that?
can be implemented as a scan over a subset of those paths.For example, suppose the JSON value '{"a": {"b": "c"}}
was added to an index on the
jcolumn of some table
t`. The index key for this looks roughly like:<json prefix> a <sep> b <term> <string prefix> c
Then, suppose we wanted to execute the query
SELECT * FROM t WHERE j ? 'a'
. Index selection could generate the span<json prefix> a <term>
to the prefix end of<json prefix> a <sep>
. A scan for that span will return all top level objects that contain the string keya
, as expected.A slightly more complicated example is
SELECT * FROM t WHERE j->'a' ? 'b'
. This will require a little more sophistication in index selection to work. I believej->'a' = 'foo'
currently gets normalized toj @> '{"a": "foo"}'
. We will need to make a similar normalization for an arrow followed by a?
, so that index selection can properly construct a span for that case. (it should be<json prefix> a <sep> b <term>
to the prefix end of<json prefix> a <sep> b <sep>
)Jira issue: CRDB-5800
The text was updated successfully, but these errors were encountered: