-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
coldata: expose a modifier for batchSize variable #42845
Conversation
Not requesting a review for now since I want to see what the fallout is first. We probably will want to limit the config to two batch sizes - the default (1024) and a random (I think it must be greater than 1). |
72b2eff
to
423f5fa
Compare
04d2f15
to
f4d9c58
Compare
This didn't discover any new bugs, but #42816 would have been caught by this change when Also, I'm not sure whether we want to randomize the second batch size or just pick 3 (smallest I've tried). |
f4d9c58
to
8820247
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/col/coldata/batch.go, line 63 at r1 (raw file):
const ( // MinBatchSize is the minimum acceptable size of batches. MinBatchSize = 3
What happens if we make this 1?
pkg/col/coldata/batch.go, line 65 at r1 (raw file):
MinBatchSize = 3 // MaxBatchSize is the maximum acceptable size of batches. MaxBatchSize = 1024
Do we want to try having a max batch size > 1024?
pkg/col/coldata/batch.go, line 78 at r1 (raw file):
// SetBatchSizeForTests modifies batchSize variable. It should only be used in // tests. func SetBatchSizeForTests(newBatchSize uint16) {
I wonder if there's a way to only make this method available during tests. Does putting it in a _test.go
file work?
pkg/sql/colexec/main_test.go, line 57 at r1 (raw file):
fmt.Printf("coldata.BatchSize() is set to %d\n", batchSize) coldata.SetBatchSizeForTests(batchSize) if code := m.Run(); code != 0 {
Do we want to run these tests twice? How does the runtime increase? I think we should just run them with a random batch size that we print out in the same way that the random seed is printed out using randutil
.
pkg/sql/colexec/routers_test.go, line 477 at r1 (raw file):
if _, ok := expectedNumVals[batchSize]; !ok { // We do not have a precomputed expectedNumVals for this batch size, so we
But then we have a very low probability of actually running this test. We might want to override the batch size here by getting the batch size, calling SetBatchSize
ourselves with a constant value, then resetting it with a defer.
8820247
to
fb6aa65
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)
pkg/col/coldata/batch.go, line 63 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
What happens if we make this 1?
There are multiple unit tests that have baked-in assumption of a "big" batch size, those would need to be changed. Also, I feel like some operators might have that assumption, and they might get stuck in an infinite loop or something (or maybe not). I'm not sure how much of a change that would be, but I think we will benefit from getting this PR in with 3 as the minimum and then during stabilization period possibly pushing it all the way down to 1.
pkg/col/coldata/batch.go, line 65 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Do we want to try having a max batch size > 1024?
I tried with 2048, 4096, and 8192, and the tests passed (I needed to do a minor tweak to one aggregator random test because there was uint16 overflow). I do feel that unless we decide to use bigger batch sizes in production, there is not much benefit from testing with larger sizes - most of our unit tests have very little data, so if we increase the batch size, the likelihood of uncovering a bug doesn't seem to increase (nor to decrease). Running randomized unit tests might get slightly better coverage at the expense of longer runtime. I think that pushing the minimum as low as possible is beneficial, not sure about the maximum though.
pkg/col/coldata/batch.go, line 78 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I wonder if there's a way to only make this method available during tests. Does putting it in a
_test.go
file work?
Unfortunately, that doesn't work. If we move the setter method in _test
file, it is only visible within coldata
package (not even coldata_test
); if we move it to coldata_test
package, then it is visible only that package.
pkg/sql/colexec/main_test.go, line 57 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Do we want to run these tests twice? How does the runtime increase? I think we should just run them with a random batch size that we print out in the same way that the random seed is printed out using
randutil
.
The runtime increases more or less linearly (there are randomized unit tests which depend on the batch size, so with the smaller sizes, those run faster). The whole test suite of colexec
package takes 10 seconds or so on my laptop, I think we could run it with the default batch size and a random one, but I'm not sure whether that would give us any benefits. Switched to only random batch size.
pkg/sql/colexec/routers_test.go, line 477 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
But then we have a very low probability of actually running this test. We might want to override the batch size here by getting the batch size, calling
SetBatchSize
ourselves with a constant value, then resetting it with a defer.
True, I was imagining this being useful if we always run the tests with the default config. Changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained
fb6aa65
to
6bf26cb
Compare
This commit exposes a setter for batchSize private variable to be modified in tests and runs all tests in sql/colexec with a random batch size in [3, 4096] range. The tests in several places needed to be adjusted because their assumptions might no longer hold when batch size is modified. Release note: None
6bf26cb
to
3d83574
Compare
TFTR! bors r+ |
Build failed |
if you know anything about the failure, say on #43949 |
An unrelated flake. bors r+ |
42845: coldata: expose a modifier for batchSize variable r=yuzefovich a=yuzefovich This commit exposes a setter for batchSize private variable to be modified in tests and runs all tests in sql/colexec with a random batch size in [3, 4096] range. The tests in several places needed to be adjusted because their assumptions might no longer hold when batch size is modified. Fixes: #40791. Release note: None 43409: execinfrapb: print input types one per line in EXPLAIN (DISTSQL, TYPES) r=yuzefovich a=yuzefovich Previously, input types were printed in a single line within the input synchronizer box on the flow diagram. However, when we have a processor with two inputs and when each input has several columns, the boxes would collide and make the types unreadable. This commit changes it to print each type one per line, so there is no collision between different boxes of input synchronizer. Separately, cockroachdb.github.io has been adjusted so that the boxes for the input synchronizers could be displayed well in such scenario. Release note: None 43887: pkg/sql/opt: validate correct use of FOR UPDATE locking clauses r=nvanbenschoten a=nvanbenschoten The second commit updates our SQL grammar to be identical to Postgres regarding select statement locking clauses. Concretely, this results in the following changes in behavior: - We can now parse multiple locking clause items - We can now parse the `FOR READ ONLY` syntax, which is a no-op - We can now parse the `SELECT ... FOR UPDATE ... LIMIT` syntax Ref: https://github.com/postgres/postgres/blob/1a4a0329650b0545a54afb3c317aa289fd817f8a/src/backend/parser/gram.y#L11834 The third commit introduces a number of validation checks into the optbuilder to ensure that FOR UPDATE locking clauses are only being used in places where they're allowed. Most of this was based off of Postgres, which disallows the same set of operations to be used with `FOR [KEY] UPDATE/SHARE` locking. Ref: https://github.com/postgres/postgres/blob/1a4a0329650b0545a54afb3c317aa289fd817f8a/src/backend/parser/analyze.c#L2691 Release note (sql change): Invalid usages of FOR UPDATE locking clauses are now rejected by the SQL optimizer. 43980: storage: rename SSTSnapshotStorage{,Scratch,File} vars r=ajwerner a=irfansharif Around snapshot receiving code, we have variables for SSTSnapshotStorage, SSTSnapshotStorageScratch, and SSTSnapshotStorageFile types all abbreviated to some variation of sss{,s,f}. We also have "ssts" to talk about SSTs. This is all terribly confusing, we could do with some much needed clarity here. Release note: None. Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: irfan sharif <[email protected]>
Build succeeded |
This commit exposes a setter for batchSize private variable to be
modified in tests and runs all tests in sql/colexec with a random
batch size in [3, 4096] range.
The tests in several places needed to be adjusted because their
assumptions might no longer hold when batch size is modified.
Fixes: #40791.
Release note: None