Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql, geo: add support for writing geospatial inverted indexes #47424

Merged
merged 1 commit into from
Apr 26, 2020

Conversation

sumeerbhola
Copy link
Collaborator

Release note: None

@sumeerbhola sumeerbhola requested review from jordanlewis, rytaft, otan and a team April 13, 2020 15:14
@sumeerbhola sumeerbhola requested review from a team as code owners April 13, 2020 15:14
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first commit is from #47097
The code in the second commit is mechanical integration code, but I am likely to have overlooked stuff.

I need help with some TODOs that I have pointed out in review comments.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)


pkg/sql/opt/exec/execbuilder/testdata/geospatial, line 34 at r2 (raw file):


# TODO: why is this /Table/54/2, /Table/54/3, /Table/54/4, /Table/54/5? Shouldn't
# there be 2 inverted index "table" numbers instead of 4?

this TODO


pkg/sql/sem/builtins/builtins.go, line 3697 at r2 (raw file):

				}
				// TODO: change this call. We need the index configuration to know
				// how many entries there will be. How do we get that?

this TODO


pkg/sql/sqlbase/index_encoding.go, line 882 at r2 (raw file):

		return encodeArrayInvertedIndexTableKeys(val.(*tree.DArray), inKey)
		// TODO: Temporary hack. Remove the following:
		// - iiuc, the call from span_builder.go should not be using

this bullet in this TODO

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't look at the guts of the geo stuff.


return EncodeInvertedIndexTableKeys(val, keyPrefix)
if !geoindex.IsEmptyConfig(&index.GeoConfig) {
return encodeGeoInvertedIndexTableKeys(val, keyPrefix, index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer this switch to go inside of EncodeInvertedIndexTableKeys, because that function already has a switch between JSON and Array - so it feels more natural there. Won't you be able to add a case to switch val.ResolvedType().Family() rather than checking the index type as well, or am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it's that span_builder is a different semantic use. Let's talk more in the meeting, I don't get this still.

}
// TODO: change this call. We need the index configuration to know
// how many entries there will be. How do we get that?
arr := tree.MustBeDGeography(arg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think rather than try to figure that out, you should make a new builtin and change the caller in validateInvertedIndexes. That thing will have all of the context, whereas plumbing it into the builtins will be extremely awkward.

func MustBeDGeography(e Expr) *DGeography {
i, ok := AsDGeography(e)
if !ok {
panic(errors.AssertionFailedf("expected *DArray, found %T", e))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/DArray/DGeography/

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan, @rytaft, and @sumeerbhola)


pkg/sql/opt/exec/execbuilder/testdata/geospatial, line 34 at r2 (raw file):

Previously, sumeerbhola wrote…

this TODO

It's because you made 4 indexes. the INVERTED INDEX declaration in CREATE TABLE is equivalent to making a new index.

@blathers-crl
Copy link

blathers-crl bot commented Apr 13, 2020

❌ The GitHub CI (Cockroach) build has failed on 89465d0c.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, @rytaft, and @sumeerbhola)


pkg/sql/sem/builtins/builtins.go, line 3698 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I think rather than try to figure that out, you should make a new builtin and change the caller in validateInvertedIndexes. That thing will have all of the context, whereas plumbing it into the builtins will be extremely awkward.

I didn't quite understand -- I can make a new builtin, but how do I get access to the IndexDescriptor in the builtin execution?

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan, @rytaft, and @sumeerbhola)


pkg/sql/sem/builtins/builtins.go, line 3698 at r2 (raw file):

Previously, sumeerbhola wrote…

I didn't quite understand -- I can make a new builtin, but how do I get access to the IndexDescriptor in the builtin execution?

I figured you should pass whatever context you need from the IndexDescriptor in to the new builtin. Or is that not feasible for some reason?

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, @rytaft, and @sumeerbhola)


pkg/sql/sqlbase/index_encoding.go, line 858 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I see, it's that span_builder is a different semantic use. Let's talk more in the meeting, I don't get this still.

EncodeInvertedIndexTableKeys has the following callers:

  1. num_inverted_index_entries: which only cares about the length of the encoded keys
  2. span_builder: I am not sure what this is for, but I was guessing queries that read indexes
  3. EncodeInvertedIndexKeys (this method): which is called for two reasons
    a. for zig-zag join: we don't plan to use zig-zag joins for geospatial queries.
    b. index_encoding.go

I think we only care about 3b and 1 for geospatial indexes.

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan, @rytaft, and @sumeerbhola)


pkg/sql/sqlbase/index_encoding.go, line 858 at r2 (raw file):

Previously, sumeerbhola wrote…

EncodeInvertedIndexTableKeys has the following callers:

  1. num_inverted_index_entries: which only cares about the length of the encoded keys
  2. span_builder: I am not sure what this is for, but I was guessing queries that read indexes
  3. EncodeInvertedIndexKeys (this method): which is called for two reasons
    a. for zig-zag join: we don't plan to use zig-zag joins for geospatial queries.
    b. index_encoding.go

I think we only care about 3b and 1 for geospatial indexes.

span_builder is for statically encoding index spans for queries like SELECT * FROM t WHERE indexed_column = some_constant for example, or like WHERE json_column @> '{"foo": "bar"}' for a json inverted index. Is there no way to specify a constant constraint against a geo index?

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really familiar enough with this code to sign off on it, but everything seems reasonable to me.

Reviewed 11 of 11 files at r1, 16 of 16 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan and @sumeerbhola)

@blathers-crl
Copy link

blathers-crl bot commented Apr 16, 2020

❌ The GitHub CI (Cockroach) build has failed on 66f04343.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)


pkg/sql/opt/exec/execbuilder/testdata/geospatial, line 34 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

It's because you made 4 indexes. the INVERTED INDEX declaration in CREATE TABLE is equivalent to making a new index.

ah, typo -- I thought I had created on both tables. Thanks for spotting it.


pkg/sql/sem/builtins/builtins.go, line 3698 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I figured you should pass whatever context you need from the IndexDescriptor in to the new builtin. Or is that not feasible for some reason?

I must be missing something obvious:

  • validateInvertedIndexes uses a SQL statement of the form SELECT coalesce(sum_int(crdb_internal.num_inverted_index_entries
  • I can make a new builtin, mechanically following the pattern in builtins.go but I don't know how to provide the IndexDescriptor to this builtin -- maybe it is buried somewhere in the tree.EvalContext parameter to Overload.Fn?
  • I see crdb_internal.encode_key has code to get the IndexDescriptor but it is being explicitly passed a table_id and index_id parameter. Ah, maybe you meant do the same since validateInvertedIndexes has the TableDescriptor and IndexDescriptor so can grab the ID. Is that correct?

pkg/sql/sem/tree/datum.go, line 2667 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

s/DArray/DGeography/

Done.


pkg/sql/sqlbase/index_encoding.go, line 858 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

span_builder is for statically encoding index spans for queries like SELECT * FROM t WHERE indexed_column = some_constant for example, or like WHERE json_column @> '{"foo": "bar"}' for a json inverted index. Is there no way to specify a constant constraint against a geo index?

There can be a constant in one of the operators, like WHERE ST_Equals(ST_GeomFromText('POINT(0 0)'), geom_column) but that will also translate into calls into the one of the functions in the {Geography,Geometry}Index interface https://github.com/cockroachdb/cockroach/blob/master/pkg/geo/geoindex/index.go#L61
Currently, ST_Equals will also use the Intersects function in the index interface as described in the list here https://docs.google.com/document/d/1OhJdGb3JviCuhyWgsF0VdUgeoW5XfOLQeLECsJqE58E/edit#heading=h.63bn2tg7docd
Eventually that could improve, but it will always be a call to one of the functions there, that will return an expression and the spans will be derived from the expression. So I don't think it will be this path -- somewhere in the caller we will need to do different things for geospatial. Does that sound reasonable?

@blathers-crl
Copy link

blathers-crl bot commented Apr 17, 2020

❌ The GitHub CI (Cockroach) build has failed on 7cd4c815.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link

blathers-crl bot commented Apr 17, 2020

❌ The GitHub CI (Cockroach) build has failed on c98eab03.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)


pkg/sql/sem/builtins/builtins.go, line 3698 at r2 (raw file):

Previously, sumeerbhola wrote…

I must be missing something obvious:

  • validateInvertedIndexes uses a SQL statement of the form SELECT coalesce(sum_int(crdb_internal.num_inverted_index_entries
  • I can make a new builtin, mechanically following the pattern in builtins.go but I don't know how to provide the IndexDescriptor to this builtin -- maybe it is buried somewhere in the tree.EvalContext parameter to Overload.Fn?
  • I see crdb_internal.encode_key has code to get the IndexDescriptor but it is being explicitly passed a table_id and index_id parameter. Ah, maybe you meant do the same since validateInvertedIndexes has the TableDescriptor and IndexDescriptor so can grab the ID. Is that correct?

I've added crdb_internal.num_geo_inverted_index_entries and changed validateInvertedIndexes to use it.


pkg/sql/sqlbase/index_encoding.go, line 882 at r2 (raw file):

Previously, sumeerbhola wrote…

this bullet in this TODO

Fixed, by removing this change -- there is an EncodeGeoInvertedIndexTableKeys function for geospatial indexes, and it takes an additional IndexDescriptor parameter. Which is why the switch to decide which to use is made in EncodeInvertedIndexKeys. The EncodeGeoInvertedIndexTableKeys are also used directly by the implementation of the crdb_internal.num_geo_inverted_index_entries.

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)


pkg/sql/create_table.go, line 1437 at r3 (raw file):

				return desc, err
			}
			if d.Inverted {

MakeIndexDescriptor() does not get used when the index is created as part of CREATE TABLE, so similar code is repeated here.

@blathers-crl
Copy link

blathers-crl bot commented Apr 18, 2020

❌ The GitHub CI (Cockroach) build has failed on acd5dc64.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting to wonder whether this should be CREATE SPATIAL INDEX, because we are special casing the special casing for inverted indexes in here. Happy for this go ahead for now though.

Few small comments.

Reviewed 6 of 20 files at r3, 3 of 4 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, @rytaft, and @sumeerbhola)


pkg/geo/geoindex/index.go, line 544 at r4 (raw file):

		MaxLevel: 30,
		LevelMod: 1,
		MaxCells: 1,

thought we were doing 4?


pkg/geo/geoindex/s2_geometry_index.go, line 59 at r4 (raw file):

			// Arbitrary bounding box.
			// TODO(sumeer): replace with parameters specified by CREATE INDEX.
			MinX:     0,

hmm, i'd probably do -10000 for mins :)
but that's a big nit hehe


pkg/sql/create_index.go, line 152 at r4 (raw file):

		}
		indexDesc.Type = sqlbase.IndexDescriptor_INVERTED
		columnDesc, _, err := tableDesc.FindColumnByName(n.Columns[0].Column)

we should probably error if there are more than one columns for geospatial indexes


pkg/sql/create_index.go, line 162 at r4 (raw file):

			indexDesc.GeoConfig = *geoindex.DefaultGeographyIndexConfig()
		}
		telemetry.Inc(sqltelemetry.InvertedIndexCounter)

mind adding a Geometry/GeographyInvertedIndex counter to the telemetry here?


pkg/sql/sem/tree/datum.go, line 2667 at r2 (raw file):

Previously, sumeerbhola wrote…

Done.

maybe MustAsDGeography, to match the function name?


pkg/sql/sem/tree/datum.go, line 2772 at r4 (raw file):

// MustBeDGeometry attempts to retrieve a *DGeometry from an Expr, panicking
// if the assertion fails.
func MustBeDGeometry(e Expr) *DGeometry {

ditto above.


pkg/sql/sqlbase/index_encoding.go, line 921 at r4 (raw file):

	case types.GeographyFamily:
		index := geoindex.NewS2GeographyIndex(*index.GeoConfig.S2Geography)
		intKeys, err := index.InvertedIndexKeys(context.TODO(), val.(*tree.DGeography).Geography)

are there any plans to use Context? I don't think we need to pass it in considering it doesn't do network calls. Feel free to delete it from everything in this PR (or later)

@knz
Copy link
Contributor

knz commented Apr 21, 2020 via email

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also be mindful to report the index type consistently in pg_catalog.

how do I report in pg_catalog?

Also Jordan had said something about adding a version gate for writing such indexes -- @otan can you point me to where I can do that?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)


pkg/geo/geoindex/index.go, line 544 at r4 (raw file):

Previously, otan (Oliver Tan) wrote…

thought we were doing 4?

This was initially a temporary hack and forgot to update. Done


pkg/geo/geoindex/s2_geometry_index.go, line 59 at r4 (raw file):

Previously, otan (Oliver Tan) wrote…

hmm, i'd probably do -10000 for mins :)
but that's a big nit hehe

min+max=0 has a nice symmetry. Done.


pkg/sql/create_index.go, line 152 at r4 (raw file):

Previously, otan (Oliver Tan) wrote…

we should probably error if there are more than one columns for geospatial indexes

there will be more columns, corresponding to the primary key of the table.


pkg/sql/create_index.go, line 162 at r4 (raw file):

Previously, otan (Oliver Tan) wrote…

mind adding a Geometry/GeographyInvertedIndex counter to the telemetry here?

Done.


pkg/sql/sem/tree/datum.go, line 2667 at r2 (raw file):

Previously, otan (Oliver Tan) wrote…

maybe MustAsDGeography, to match the function name?

I didn't understand what should be MustAsDGeography -- this follows the pattern of other functions in this file.


pkg/sql/sem/tree/datum.go, line 2772 at r4 (raw file):

Previously, otan (Oliver Tan) wrote…

ditto above.

Same question


pkg/sql/sqlbase/index_encoding.go, line 921 at r4 (raw file):

Previously, otan (Oliver Tan) wrote…

are there any plans to use Context? I don't think we need to pass it in considering it doesn't do network calls. Feel free to delete it from everything in this PR (or later)

This was in Dan's initial interface so I copied it assuming it was for logging/tracing. Is it ok if we revisit once we have something working end-to-end? I'd like to delay such fine-tuning of interfaces after we have something end-to-end working that uses this code path.

@blathers-crl
Copy link

blathers-crl bot commented Apr 21, 2020

❌ The GitHub CI (Cockroach) build has failed on 32d1e550.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@otan
Copy link
Contributor

otan commented Apr 21, 2020

Version gating example: #47800

PG Catalog:

CREATE TABLE pg_catalog.pg_indexes (
(tests for it: , tests make testbaselogic FILES='pg_catalog' TESTFLAGS='-rewrite')

Copy link
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, @rytaft, and @sumeerbhola)


pkg/sql/create_index.go, line 162 at r4 (raw file):

Previously, sumeerbhola wrote…

Done.

might be nice to separate into "Geometry" and "Geography".


pkg/sql/sem/tree/datum.go, line 2667 at r2 (raw file):

Previously, sumeerbhola wrote…

I didn't understand what should be MustAsDGeography -- this follows the pattern of other functions in this file.

ah, i usally do Must in front of functions like this, so AsDGeography -> MustAsDGeography.

But if it's already the style, /shrug.


pkg/sql/sqlbase/index_encoding.go, line 921 at r4 (raw file):

Previously, sumeerbhola wrote…

This was in Dan's initial interface so I copied it assuming it was for logging/tracing. Is it ok if we revisit once we have something working end-to-end? I'd like to delay such fine-tuning of interfaces after we have something end-to-end working that uses this code path.

ok

@blathers-crl
Copy link

blathers-crl bot commented Apr 22, 2020

❌ The GitHub CI (Cockroach) build has failed on 9953ffa7.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@sumeerbhola sumeerbhola force-pushed the index_write branch 2 times, most recently from 01e1a78 to 4f6e59b Compare April 22, 2020 16:24
Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks -- I added a test to the pg_catalog file -- the indexes were already being populated.
And in light of #47800 which gates the geospatial types, there is no additional feature gate here.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)


pkg/sql/create_index.go, line 162 at r4 (raw file):

Previously, otan (Oliver Tan) wrote…

might be nice to separate into "Geometry" and "Geography".

Given that there are not many counters defined in sqltelemetry/schema.go I was concerned about polluting that space by adding 2 counters instead of 1. Where are the values in these counters stored and how do we use them? You have a broader understanding of these counters so let me know if you still prefer two counters, and I'll make the change.

@otan
Copy link
Contributor

otan commented Apr 22, 2020


pkg/sql/create_index.go, line 162 at r4 (raw file):

Previously, sumeerbhola wrote…

Given that there are not many counters defined in sqltelemetry/schema.go I was concerned about polluting that space by adding 2 counters instead of 1. Where are the values in these counters stored and how do we use them? You have a broader understanding of these counters so let me know if you still prefer two counters, and I'll make the change.

i think the more specific we can be the better.

they're used by PMs on looker for analytics work, I think this will also be a useful metrics for us later if we can differentiate which indexing is used more often.

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)


pkg/sql/create_index.go, line 162 at r4 (raw file):

Previously, otan (Oliver Tan) wrote…

i think the more specific we can be the better.

they're used by PMs on looker for analytics work, I think this will also be a useful metrics for us later if we can differentiate which indexing is used more often.

Done

Copy link
Contributor

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @otan, and @rytaft)

@blathers-crl
Copy link

blathers-crl bot commented Apr 26, 2020

❌ The GitHub CI (Cockroach) build has failed on 6e2f009f.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@sumeerbhola
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 26, 2020

Build succeeded

@craig craig bot merged commit a20a881 into cockroachdb:master Apr 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants