-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql, geo: add support for writing geospatial inverted indexes #47424
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first commit is from #47097
The code in the second commit is mechanical integration code, but I am likely to have overlooked stuff.
I need help with some TODOs that I have pointed out in review comments.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)
pkg/sql/opt/exec/execbuilder/testdata/geospatial, line 34 at r2 (raw file):
# TODO: why is this /Table/54/2, /Table/54/3, /Table/54/4, /Table/54/5? Shouldn't # there be 2 inverted index "table" numbers instead of 4?
this TODO
pkg/sql/sem/builtins/builtins.go, line 3697 at r2 (raw file):
} // TODO: change this call. We need the index configuration to know // how many entries there will be. How do we get that?
this TODO
pkg/sql/sqlbase/index_encoding.go, line 882 at r2 (raw file):
return encodeArrayInvertedIndexTableKeys(val.(*tree.DArray), inKey) // TODO: Temporary hack. Remove the following: // - iiuc, the call from span_builder.go should not be using
this bullet in this TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't look at the guts of the geo stuff.
pkg/sql/sqlbase/index_encoding.go
Outdated
|
||
return EncodeInvertedIndexTableKeys(val, keyPrefix) | ||
if !geoindex.IsEmptyConfig(&index.GeoConfig) { | ||
return encodeGeoInvertedIndexTableKeys(val, keyPrefix, index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer this switch to go inside of EncodeInvertedIndexTableKeys
, because that function already has a switch between JSON and Array - so it feels more natural there. Won't you be able to add a case to switch val.ResolvedType().Family()
rather than checking the index type as well, or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, it's that span_builder is a different semantic use. Let's talk more in the meeting, I don't get this still.
pkg/sql/sem/builtins/builtins.go
Outdated
} | ||
// TODO: change this call. We need the index configuration to know | ||
// how many entries there will be. How do we get that? | ||
arr := tree.MustBeDGeography(arg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think rather than try to figure that out, you should make a new builtin and change the caller in validateInvertedIndexes
. That thing will have all of the context, whereas plumbing it into the builtins will be extremely awkward.
pkg/sql/sem/tree/datum.go
Outdated
func MustBeDGeography(e Expr) *DGeography { | ||
i, ok := AsDGeography(e) | ||
if !ok { | ||
panic(errors.AssertionFailedf("expected *DArray, found %T", e)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/DArray/DGeography/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @otan, @rytaft, and @sumeerbhola)
pkg/sql/opt/exec/execbuilder/testdata/geospatial, line 34 at r2 (raw file):
Previously, sumeerbhola wrote…
this TODO
It's because you made 4 indexes. the INVERTED INDEX declaration in CREATE TABLE is equivalent to making a new index.
❌ The GitHub CI (Cockroach) build has failed on 89465d0c. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, @rytaft, and @sumeerbhola)
pkg/sql/sem/builtins/builtins.go, line 3698 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
I think rather than try to figure that out, you should make a new builtin and change the caller in
validateInvertedIndexes
. That thing will have all of the context, whereas plumbing it into the builtins will be extremely awkward.
I didn't quite understand -- I can make a new builtin, but how do I get access to the IndexDescriptor
in the builtin execution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @otan, @rytaft, and @sumeerbhola)
pkg/sql/sem/builtins/builtins.go, line 3698 at r2 (raw file):
Previously, sumeerbhola wrote…
I didn't quite understand -- I can make a new builtin, but how do I get access to the
IndexDescriptor
in the builtin execution?
I figured you should pass whatever context you need from the IndexDescriptor
in to the new builtin. Or is that not feasible for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, @rytaft, and @sumeerbhola)
pkg/sql/sqlbase/index_encoding.go, line 858 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
I see, it's that span_builder is a different semantic use. Let's talk more in the meeting, I don't get this still.
EncodeInvertedIndexTableKeys
has the following callers:
num_inverted_index_entries
: which only cares about the length of the encoded keysspan_builder
: I am not sure what this is for, but I was guessing queries that read indexesEncodeInvertedIndexKeys
(this method): which is called for two reasons
a. for zig-zag join: we don't plan to use zig-zag joins for geospatial queries.
b. index_encoding.go
I think we only care about 3b and 1 for geospatial indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @otan, @rytaft, and @sumeerbhola)
pkg/sql/sqlbase/index_encoding.go, line 858 at r2 (raw file):
Previously, sumeerbhola wrote…
EncodeInvertedIndexTableKeys
has the following callers:
num_inverted_index_entries
: which only cares about the length of the encoded keysspan_builder
: I am not sure what this is for, but I was guessing queries that read indexesEncodeInvertedIndexKeys
(this method): which is called for two reasons
a. for zig-zag join: we don't plan to use zig-zag joins for geospatial queries.
b. index_encoding.goI think we only care about 3b and 1 for geospatial indexes.
span_builder is for statically encoding index spans for queries like SELECT * FROM t WHERE indexed_column = some_constant
for example, or like WHERE json_column @> '{"foo": "bar"}'
for a json inverted index. Is there no way to specify a constant constraint against a geo index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really familiar enough with this code to sign off on it, but everything seems reasonable to me.
Reviewed 11 of 11 files at r1, 16 of 16 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @otan and @sumeerbhola)
89465d0
to
66f0434
Compare
❌ The GitHub CI (Cockroach) build has failed on 66f04343. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)
pkg/sql/opt/exec/execbuilder/testdata/geospatial, line 34 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
It's because you made 4 indexes. the INVERTED INDEX declaration in CREATE TABLE is equivalent to making a new index.
ah, typo -- I thought I had created on both tables. Thanks for spotting it.
pkg/sql/sem/builtins/builtins.go, line 3698 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
I figured you should pass whatever context you need from the
IndexDescriptor
in to the new builtin. Or is that not feasible for some reason?
I must be missing something obvious:
validateInvertedIndexes
uses a SQL statement of the formSELECT coalesce(sum_int(crdb_internal.num_inverted_index_entries
- I can make a new builtin, mechanically following the pattern in
builtins.go
but I don't know how to provide theIndexDescriptor
to this builtin -- maybe it is buried somewhere in thetree.EvalContext
parameter toOverload.Fn
? - I see
crdb_internal.encode_key
has code to get theIndexDescriptor
but it is being explicitly passed atable_id
andindex_id
parameter. Ah, maybe you meant do the same sincevalidateInvertedIndexes
has theTableDescriptor
andIndexDescriptor
so can grab theID
. Is that correct?
pkg/sql/sem/tree/datum.go, line 2667 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
s/DArray/DGeography/
Done.
pkg/sql/sqlbase/index_encoding.go, line 858 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
span_builder is for statically encoding index spans for queries like
SELECT * FROM t WHERE indexed_column = some_constant
for example, or likeWHERE json_column @> '{"foo": "bar"}'
for a json inverted index. Is there no way to specify a constant constraint against a geo index?
There can be a constant in one of the operators, like WHERE ST_Equals(ST_GeomFromText('POINT(0 0)'), geom_column)
but that will also translate into calls into the one of the functions in the {Geography,Geometry}Index interface https://github.com/cockroachdb/cockroach/blob/master/pkg/geo/geoindex/index.go#L61
Currently, ST_Equals
will also use the Intersects
function in the index interface as described in the list here https://docs.google.com/document/d/1OhJdGb3JviCuhyWgsF0VdUgeoW5XfOLQeLECsJqE58E/edit#heading=h.63bn2tg7docd
Eventually that could improve, but it will always be a call to one of the functions there, that will return an expression and the spans will be derived from the expression. So I don't think it will be this path -- somewhere in the caller we will need to do different things for geospatial. Does that sound reasonable?
66f0434
to
7cd4c81
Compare
❌ The GitHub CI (Cockroach) build has failed on 7cd4c815. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
7cd4c81
to
c98eab0
Compare
❌ The GitHub CI (Cockroach) build has failed on c98eab03. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
c98eab0
to
acd5dc6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)
pkg/sql/sem/builtins/builtins.go, line 3698 at r2 (raw file):
Previously, sumeerbhola wrote…
I must be missing something obvious:
validateInvertedIndexes
uses a SQL statement of the formSELECT coalesce(sum_int(crdb_internal.num_inverted_index_entries
- I can make a new builtin, mechanically following the pattern in
builtins.go
but I don't know how to provide theIndexDescriptor
to this builtin -- maybe it is buried somewhere in thetree.EvalContext
parameter toOverload.Fn
?- I see
crdb_internal.encode_key
has code to get theIndexDescriptor
but it is being explicitly passed atable_id
andindex_id
parameter. Ah, maybe you meant do the same sincevalidateInvertedIndexes
has theTableDescriptor
andIndexDescriptor
so can grab theID
. Is that correct?
I've added crdb_internal.num_geo_inverted_index_entries
and changed validateInvertedIndexes
to use it.
pkg/sql/sqlbase/index_encoding.go, line 882 at r2 (raw file):
Previously, sumeerbhola wrote…
this bullet in this TODO
Fixed, by removing this change -- there is an EncodeGeoInvertedIndexTableKeys
function for geospatial indexes, and it takes an additional IndexDescriptor
parameter. Which is why the switch to decide which to use is made in EncodeInvertedIndexKeys
. The EncodeGeoInvertedIndexTableKeys
are also used directly by the implementation of the crdb_internal.num_geo_inverted_index_entries
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)
pkg/sql/create_table.go, line 1437 at r3 (raw file):
return desc, err } if d.Inverted {
MakeIndexDescriptor()
does not get used when the index is created as part of CREATE TABLE
, so similar code is repeated here.
❌ The GitHub CI (Cockroach) build has failed on acd5dc64. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
acd5dc6
to
66866f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting to wonder whether this should be CREATE SPATIAL INDEX
, because we are special casing the special casing for inverted indexes in here. Happy for this go ahead for now though.
Few small comments.
Reviewed 6 of 20 files at r3, 3 of 4 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, @rytaft, and @sumeerbhola)
pkg/geo/geoindex/index.go, line 544 at r4 (raw file):
MaxLevel: 30, LevelMod: 1, MaxCells: 1,
thought we were doing 4?
pkg/geo/geoindex/s2_geometry_index.go, line 59 at r4 (raw file):
// Arbitrary bounding box. // TODO(sumeer): replace with parameters specified by CREATE INDEX. MinX: 0,
hmm, i'd probably do -10000 for mins :)
but that's a big nit hehe
pkg/sql/create_index.go, line 152 at r4 (raw file):
} indexDesc.Type = sqlbase.IndexDescriptor_INVERTED columnDesc, _, err := tableDesc.FindColumnByName(n.Columns[0].Column)
we should probably error if there are more than one columns for geospatial indexes
pkg/sql/create_index.go, line 162 at r4 (raw file):
indexDesc.GeoConfig = *geoindex.DefaultGeographyIndexConfig() } telemetry.Inc(sqltelemetry.InvertedIndexCounter)
mind adding a Geometry/GeographyInvertedIndex counter to the telemetry here?
pkg/sql/sem/tree/datum.go, line 2667 at r2 (raw file):
Previously, sumeerbhola wrote…
Done.
maybe MustAsDGeography
, to match the function name?
pkg/sql/sem/tree/datum.go, line 2772 at r4 (raw file):
// MustBeDGeometry attempts to retrieve a *DGeometry from an Expr, panicking // if the assertion fails. func MustBeDGeometry(e Expr) *DGeometry {
ditto above.
pkg/sql/sqlbase/index_encoding.go, line 921 at r4 (raw file):
case types.GeographyFamily: index := geoindex.NewS2GeographyIndex(*index.GeoConfig.S2Geography) intKeys, err := index.InvertedIndexKeys(context.TODO(), val.(*tree.DGeography).Geography)
are there any plans to use Context? I don't think we need to pass it in considering it doesn't do network calls. Feel free to delete it from everything in this PR (or later)
Postgres had already special syntax to choose the index type (USING). I don't think there's need to invent something new.
Also be mindful to report the index type consistently in pg_catalog.
--
Verstuurd vanaf mijn Android apparaat met K-9 Mail. Excuseer mijn beknoptheid.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also be mindful to report the index type consistently in pg_catalog.
how do I report in pg_catalog?
Also Jordan had said something about adding a version gate for writing such indexes -- @otan can you point me to where I can do that?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)
pkg/geo/geoindex/index.go, line 544 at r4 (raw file):
Previously, otan (Oliver Tan) wrote…
thought we were doing 4?
This was initially a temporary hack and forgot to update. Done
pkg/geo/geoindex/s2_geometry_index.go, line 59 at r4 (raw file):
Previously, otan (Oliver Tan) wrote…
hmm, i'd probably do -10000 for mins :)
but that's a big nit hehe
min+max=0 has a nice symmetry. Done.
pkg/sql/create_index.go, line 152 at r4 (raw file):
Previously, otan (Oliver Tan) wrote…
we should probably error if there are more than one columns for geospatial indexes
there will be more columns, corresponding to the primary key of the table.
pkg/sql/create_index.go, line 162 at r4 (raw file):
Previously, otan (Oliver Tan) wrote…
mind adding a Geometry/GeographyInvertedIndex counter to the telemetry here?
Done.
pkg/sql/sem/tree/datum.go, line 2667 at r2 (raw file):
Previously, otan (Oliver Tan) wrote…
maybe
MustAsDGeography
, to match the function name?
I didn't understand what should be MustAsDGeography
-- this follows the pattern of other functions in this file.
pkg/sql/sem/tree/datum.go, line 2772 at r4 (raw file):
Previously, otan (Oliver Tan) wrote…
ditto above.
Same question
pkg/sql/sqlbase/index_encoding.go, line 921 at r4 (raw file):
Previously, otan (Oliver Tan) wrote…
are there any plans to use Context? I don't think we need to pass it in considering it doesn't do network calls. Feel free to delete it from everything in this PR (or later)
This was in Dan's initial interface so I copied it assuming it was for logging/tracing. Is it ok if we revisit once we have something working end-to-end? I'd like to delay such fine-tuning of interfaces after we have something end-to-end working that uses this code path.
66866f7
to
32d1e55
Compare
❌ The GitHub CI (Cockroach) build has failed on 32d1e550. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
32d1e55
to
9953ffa
Compare
Version gating example: #47800 PG Catalog: cockroach/pkg/sql/pg_catalog.go Line 1545 in 998abbe
make testbaselogic FILES='pg_catalog' TESTFLAGS='-rewrite' )
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, @rytaft, and @sumeerbhola)
pkg/sql/create_index.go, line 162 at r4 (raw file):
Previously, sumeerbhola wrote…
Done.
might be nice to separate into "Geometry" and "Geography".
pkg/sql/sem/tree/datum.go, line 2667 at r2 (raw file):
Previously, sumeerbhola wrote…
I didn't understand what should be
MustAsDGeography
-- this follows the pattern of other functions in this file.
ah, i usally do Must
in front of functions like this, so AsDGeography
-> MustAsDGeography
.
But if it's already the style, /shrug.
pkg/sql/sqlbase/index_encoding.go, line 921 at r4 (raw file):
Previously, sumeerbhola wrote…
This was in Dan's initial interface so I copied it assuming it was for logging/tracing. Is it ok if we revisit once we have something working end-to-end? I'd like to delay such fine-tuning of interfaces after we have something end-to-end working that uses this code path.
ok
❌ The GitHub CI (Cockroach) build has failed on 9953ffa7. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
01e1a78
to
4f6e59b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks -- I added a test to the pg_catalog
file -- the indexes were already being populated.
And in light of #47800 which gates the geospatial types, there is no additional feature gate here.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)
pkg/sql/create_index.go, line 162 at r4 (raw file):
Previously, otan (Oliver Tan) wrote…
might be nice to separate into "Geometry" and "Geography".
Given that there are not many counters defined in sqltelemetry/schema.go I was concerned about polluting that space by adding 2 counters instead of 1. Where are the values in these counters stored and how do we use them? You have a broader understanding of these counters so let me know if you still prefer two counters, and I'll make the change.
pkg/sql/create_index.go, line 162 at r4 (raw file): Previously, sumeerbhola wrote…
i think the more specific we can be the better. they're used by PMs on looker for analytics work, I think this will also be a useful metrics for us later if we can differentiate which indexing is used more often. |
4f6e59b
to
fda6951
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)
pkg/sql/create_index.go, line 162 at r4 (raw file):
Previously, otan (Oliver Tan) wrote…
i think the more specific we can be the better.
they're used by PMs on looker for analytics work, I think this will also be a useful metrics for us later if we can differentiate which indexing is used more often.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)
fda6951
to
6e2f009
Compare
❌ The GitHub CI (Cockroach) build has failed on 6e2f009f. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
6e2f009
to
88854aa
Compare
Release note: None
88854aa
to
63549e5
Compare
bors r+ |
Build succeeded |
Release note: None