-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
colexec: fix premature calls to IdempotentClose in hash router outputs #49333
Conversation
ddcdd92
to
5423555
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed, outboxes only close the closers after a zero-batch is received. If a hashrouter is the input, this will only happen once all streams have received all batches, so I don't think this patch fixes the linked issue.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)
I added some print statements, and it appears that outbox
I'll continue on looking. (Note this is the behavior on master, without this patch with some extra printf statements.) |
Ok, I think I understand what's going on, and I believe my initial patch correctly fixes the issue (a side note: making We have this plan, and
And this is where the problem is - the output of |
Previously, we were passing in `toClose` slice of idempotent closers to every outbox that is the output of a hash router. However, it is possible that one stream will be closed before others, and this would prompt the corresponding outbox to close all of the closers prematurely. Other streams might still be active and ready to consume more data, but that single outbox would close everything. This is now fixed by making hash router responsible for closing the whole `toClose` slice, and it will do so right before exiting `Run` method. Release note (bug fix): Previously, CockroachDB could return an internal error or incorrect results on queries when they were run via the vectorized execution engine and had a hash router in the DistSQL plan. This could only occur with `vectorize=on`.
5423555
to
294a8a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for digging deeper into this! It makes sense.
Reviewed 1 of 4 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/sql/colflow/vectorized_flow.go, line 671 at r1 (raw file):
// router is responsible for closing all of the idempotent closers. if _, err := s.setupRemoteOutputStream( ctx, flowCtx, op, outputTyps, stream, metadataSourcesQueue, nil /* toClose */, factory,
Hmm, were we previously passing in toClose
to multiple outboxes? I guess it doesn't matter since they are idempotent closers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/sql/colflow/vectorized_flow.go, line 671 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Hmm, were we previously passing in
toClose
to multiple outboxes? I guess it doesn't matter since they are idempotent closers.
Exactly, we were passing full toClose
to all outbox
es, so the first one to shutdown would initiate the closure of toClose
things and others would initiate a noop.
Build failed (retrying...) |
Build succeeded |
Previously, we were passing in
toClose
slice of idempotent closers toevery outbox that is the output of a hash router. However, it is
possible that one stream will be closed before others, and this would
prompt the corresponding outbox to close all of the closers prematurely.
Other streams might still be active and ready to consume more data, but
that single outbox would close everything. This is now fixed by making
hash router responsible for closing the whole
toClose
slice, and itwill do so right before exiting
Run
method.Fixes: #49315.
Release note (bug fix): Previously, CockroachDB could return an internal
error or incorrect results on queries when they were run via the
vectorized execution engine and had a hash router in the DistSQL plan.
This could only occur with
vectorize=on
.