execinfrapb: add InvertedJoinerSpec for joins between #48453

sumeerbhola · 2020-05-05T19:27:22Z

two tables using an inverted index on one table

This is extracted from #48019 which has more context, including
the implementation.

Release note: None

two tables using an inverted index on one table This is extracted from cockroachdb#48019 which has more context, including the implementation. Release note: None

cockroach-teamcity · 2020-05-05T19:27:29Z

This change is

asubiotto · 2020-05-06T09:34:57Z

Can we reuse the JoinReaderSpec instead of creating a new spec? I think the only difference is the inverted expression, which would signal to execution that an inverted joiner should be used.

sumeerbhola

Maybe I am missing something about the CockroachDB context, so I am very open to being convinced otherwise, but currently I'm not seeing the benefit:

there is a long comment before InvertedJoinSpec that is important for explaining what this is about. That will need to be mashed together with the JoinReaderSpec comment which I already find a bit hard to understand.
Both have Table, IndexIdx, but note that IndexIdx needs a different comment.
InvertedJoinerSpec has a single LookupColumn -- that can be stuffed into the LookupColumns slice, but again a different comment
Remaining fields are InvertedExpr, OnExpr and Type: First is unique to invertedJoiner. All 3 need comments very specific to inverted joins.

A mashup of these two will need to have a long comment explaining the different cases and for each field also state the different cases. That would hinder readability. There is a history in proto land of mashing together different things into a single proto and then being left with a bad headache later (there are infamous examples at Google preceding extensions support in proto2), but those were all for transport purposes (and eventual demultiplexing, which requires the same proto type). That would not be an issue here. Also, these similarities between specs are shared among some of the other specs like HashJoinerSpec and MergeJoinerSpec, so I am not sure why we are trying to reuse the same proto here.

Are you concerned that this will ensure that we will never merge invertedJoiner and joinReader? If yes, my current thinking is

any merging will need to process the specs and set up the data-structures somewhat differently, so a different spec is not a blocker
currently I don't see any benefit from merging the join execution code -- I think it will obscure the code logic, as I mentioned in a detailed comment in [DNM] rowexec, rowcontainer: add an invertedJoiner, for #48019 (which I've copy-pasted below [1] since I think it got lost in the large variety of comments in that PR) and in a comment in rowexec: make lookup joins with no required ordering more efficient #48439 . And if we get serious about inverted indexes, especially the on-disk representation, the logic will further diverge. Perhaps the issue is that the "inverted join" was initially incorrectly characterized as a "lookup join" -- it isn't really a lookup join.

Thoughts?

[1]
Stylistically it follows a similar pattern. Next() is very similar though that seems hardly to justify refactoring since it is a trivial for loop. The differences are unfortunately buried in each method:

readInput() needs to do a different thing with each input row. And then when the batch is read it needs to do a different thing for preparing the spans to read.
performLookup() does a different thing with the read row: both the transformation to the PK row and when the scan is done the call to batchedInvertedExprEvaluator.
emitRow starts looking similar but unlike joinReader we can't short-circuit the processing for semi-join and anti-join in the PerformLookup() stage -- it needs to happen in emitRow. And emitRow is iterating over a different data-structure (joinedRowIdx) and doing a different transformation on what it retrieves from the row container.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @jordanlewis, and @rytaft)

asubiotto · 2020-05-06T16:09:04Z

Are you concerned that this will ensure that we will never merge invertedJoiner and joinReader?

No, I think this is different.

The issues with the field comments can be resolved by being more general (e.g. index_idx is the id of the index we're looking up, regardless of the type of join). I agree that the general comment is more problematic so maybe it's a good argument that we're overloading a spec definition.

My biggest concern is that whenever we add fields that apply to all joins, we'll have to make sure to add these to each spec. For example, should this spec also add Visibility, LockingStrength or LockingWaitPolicy since it uses a row fetcher? It looks like we already add the same field to multiple specs though so maybe that's fine 🤷 This could also be an orthogonal problem, since you could argue that these fields should probably be encapsulated in a separate protobuf referenced by all specs that need them.

I think I agree with you that it's bad to overload spec definitions. I wish there was a better way to share common functionality though.

sumeerbhola

For example, should this spec also add Visibility, LockingStrength or LockingWaitPolicy since it uses a row fetcher? It looks like we already add the same field to multiple specs though so maybe that's fine

Perhaps, but I can't come up with a real-world example where we would use the inverted index to narrow the rows to mutate, so we should not add the ability to lock the inverted index rows or add ScanVisibility. We could revisit this if there is a compelling example from a real user, but I suspect that will be far in the future or never.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @jordanlewis, and @rytaft)

asubiotto

Reviewed 2 of 2 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @rytaft)

rytaft

Reviewed 2 of 2 files at r1.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @jordanlewis)

sumeerbhola · 2020-05-08T18:40:36Z

TFTR!

sumeerbhola · 2020-05-08T18:41:15Z

bors r+

craig · 2020-05-08T20:26:28Z

Build succeeded

GitHub CI (Cockroach)

execinfrapb: add InvertedJoinerSpec for joins between

a3cb06d

two tables using an inverted index on one table This is extracted from cockroachdb#48019 which has more context, including the implementation. Release note: None

sumeerbhola requested review from jordanlewis, rytaft, asubiotto and a team May 5, 2020 19:27

sumeerbhola commented May 6, 2020

View reviewed changes

asubiotto approved these changes May 7, 2020

View reviewed changes

rytaft approved these changes May 7, 2020

View reviewed changes

craig bot merged commit e35af53 into cockroachdb:master May 8, 2020

sumeerbhola deleted the inverted_joiner_spec branch May 8, 2020 21:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

execinfrapb: add InvertedJoinerSpec for joins between #48453

execinfrapb: add InvertedJoinerSpec for joins between #48453

sumeerbhola commented May 5, 2020

cockroach-teamcity commented May 5, 2020

asubiotto commented May 6, 2020

sumeerbhola left a comment

asubiotto commented May 6, 2020

sumeerbhola left a comment

asubiotto left a comment

rytaft left a comment

sumeerbhola commented May 8, 2020

sumeerbhola commented May 8, 2020

craig bot commented May 8, 2020

execinfrapb: add InvertedJoinerSpec for joins between #48453

execinfrapb: add InvertedJoinerSpec for joins between #48453

Conversation

sumeerbhola commented May 5, 2020

cockroach-teamcity commented May 5, 2020

asubiotto commented May 6, 2020

sumeerbhola left a comment

Choose a reason for hiding this comment

asubiotto commented May 6, 2020

sumeerbhola left a comment

Choose a reason for hiding this comment

asubiotto left a comment

Choose a reason for hiding this comment

rytaft left a comment

Choose a reason for hiding this comment

sumeerbhola commented May 8, 2020

sumeerbhola commented May 8, 2020

craig bot commented May 8, 2020

Build succeeded