New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

coldata: expose a modifier for batchSize variable #42845

Merged

craig merged 1 commit into cockroachdb:master from yuzefovich:vary-batchsize

Jan 14, 2020

Member

yuzefovich commented Nov 27, 2019 •

edited

Loading

This commit exposes a setter for batchSize private variable to be
modified in tests and runs all tests in sql/colexec with a random
batch size in [3, 4096] range.

The tests in several places needed to be adjusted because their
assumptions might no longer hold when batch size is modified.

Fixes: #40791.

Release note: None

Member

cockroach-teamcity commented Nov 27, 2019

This change is

Member Author

yuzefovich commented Nov 27, 2019

Not requesting a review for now since I want to see what the fallout is first. We probably will want to limit the config to two batch sizes - the default (1024) and a random (I think it must be greater than 1).

yuzefovich force-pushed the vary-batchsize branch 2 times, most recently from 72b2eff to 423f5fa Compare

December 6, 2019 00:20

yuzefovich force-pushed the vary-batchsize branch 5 times, most recently from 04d2f15 to f4d9c58 Compare

December 20, 2019 02:09

yuzefovich requested review from asubiotto and a team

December 20, 2019 02:11

Member Author

yuzefovich commented Dec 20, 2019

This didn't discover any new bugs, but #42816 would have been caught by this change when batchSize is set to 3.

Also, I'm not sure whether we want to randomize the second batch size or just pick 3 (smallest I've tried).

yuzefovich force-pushed the vary-batchsize branch from f4d9c58 to 8820247 Compare

January 6, 2020 22:30

asubiotto suggested changes

View reviewed changes

Contributor

asubiotto left a comment

Reviewed 9 of 9 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)

pkg/col/coldata/batch.go, line 63 at r1 (raw file):

const (
	// MinBatchSize is the minimum acceptable size of batches.
	MinBatchSize = 3

What happens if we make this 1?

pkg/col/coldata/batch.go, line 65 at r1 (raw file):

	MinBatchSize = 3
	// MaxBatchSize is the maximum acceptable size of batches.
	MaxBatchSize = 1024

Do we want to try having a max batch size > 1024?

pkg/col/coldata/batch.go, line 78 at r1 (raw file):

// SetBatchSizeForTests modifies batchSize variable. It should only be used in
// tests.
func SetBatchSizeForTests(newBatchSize uint16) {

I wonder if there's a way to only make this method available during tests. Does putting it in a _test.go file work?

pkg/sql/colexec/main_test.go, line 57 at r1 (raw file):

			fmt.Printf("coldata.BatchSize() is set to %d\n", batchSize)
			coldata.SetBatchSizeForTests(batchSize)
			if code := m.Run(); code != 0 {

Do we want to run these tests twice? How does the runtime increase? I think we should just run them with a random batch size that we print out in the same way that the random seed is printed out using randutil.

pkg/sql/colexec/routers_test.go, line 477 at r1 (raw file):

	if _, ok := expectedNumVals[batchSize]; !ok {
		// We do not have a precomputed expectedNumVals for this batch size, so we

But then we have a very low probability of actually running this test. We might want to override the batch size here by getting the batch size, calling SetBatchSize ourselves with a constant value, then resetting it with a defer.

yuzefovich force-pushed the vary-batchsize branch from 8820247 to fb6aa65 Compare

January 13, 2020 23:10

yuzefovich commented

View reviewed changes

Member Author

yuzefovich left a comment

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)

pkg/col/coldata/batch.go, line 63 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

What happens if we make this 1?

There are multiple unit tests that have baked-in assumption of a "big" batch size, those would need to be changed. Also, I feel like some operators might have that assumption, and they might get stuck in an infinite loop or something (or maybe not). I'm not sure how much of a change that would be, but I think we will benefit from getting this PR in with 3 as the minimum and then during stabilization period possibly pushing it all the way down to 1.

pkg/col/coldata/batch.go, line 65 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Do we want to try having a max batch size > 1024?

I tried with 2048, 4096, and 8192, and the tests passed (I needed to do a minor tweak to one aggregator random test because there was uint16 overflow). I do feel that unless we decide to use bigger batch sizes in production, there is not much benefit from testing with larger sizes - most of our unit tests have very little data, so if we increase the batch size, the likelihood of uncovering a bug doesn't seem to increase (nor to decrease). Running randomized unit tests might get slightly better coverage at the expense of longer runtime. I think that pushing the minimum as low as possible is beneficial, not sure about the maximum though.

pkg/col/coldata/batch.go, line 78 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I wonder if there's a way to only make this method available during tests. Does putting it in a _test.go file work?

Unfortunately, that doesn't work. If we move the setter method in _test file, it is only visible within coldata package (not even coldata_test); if we move it to coldata_test package, then it is visible only that package.

pkg/sql/colexec/main_test.go, line 57 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Do we want to run these tests twice? How does the runtime increase? I think we should just run them with a random batch size that we print out in the same way that the random seed is printed out using randutil.

The runtime increases more or less linearly (there are randomized unit tests which depend on the batch size, so with the smaller sizes, those run faster). The whole test suite of colexec package takes 10 seconds or so on my laptop, I think we could run it with the default batch size and a random one, but I'm not sure whether that would give us any benefits. Switched to only random batch size.

pkg/sql/colexec/routers_test.go, line 477 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

But then we have a very low probability of actually running this test. We might want to override the batch size here by getting the batch size, calling SetBatchSize ourselves with a constant value, then resetting it with a defer.

True, I was imagining this being useful if we always run the tests with the default config. Changed.

asubiotto approved these changes

View reviewed changes

Contributor

asubiotto left a comment

Reviewed 5 of 5 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained

yuzefovich force-pushed the vary-batchsize branch from fb6aa65 to 6bf26cb Compare

January 14, 2020 17:46


          coldata: vary coldata.BatchSize in tests

3d83574

This commit exposes a setter for batchSize private variable to be
modified in tests and runs all tests in sql/colexec with a random
batch size in [3, 4096] range.

The tests in several places needed to be adjusted because their
assumptions might no longer hold when batch size is modified.

Release note: None

yuzefovich force-pushed the vary-batchsize branch from 6bf26cb to 3d83574 Compare

January 14, 2020 17:55

Member Author

yuzefovich commented Jan 14, 2020

TFTR!

bors r+

Contributor

craig bot commented Jan 14, 2020

Build failed

GitHub CI (Cockroach)

Contributor

andreimatei commented Jan 14, 2020

if you know anything about the failure, say on #43949

yuzefovich mentioned this pull request

server: TestListenerFileCreation failed #43949

Closed

Member Author

yuzefovich commented Jan 14, 2020

An unrelated flake.

bors r+

craig bot pushed a commit that referenced this pull request


          Merge #42845 #43409 #43887 #43980

31871ae

42845: coldata: expose a modifier for batchSize variable r=yuzefovich a=yuzefovich

This commit exposes a setter for batchSize private variable to be
modified in tests and runs all tests in sql/colexec with a random
batch size in [3, 4096] range.

The tests in several places needed to be adjusted because their
assumptions might no longer hold when batch size is modified.

Fixes: #40791.

Release note: None

43409: execinfrapb: print input types one per line in EXPLAIN (DISTSQL, TYPES) r=yuzefovich a=yuzefovich

Previously, input types were printed in a single line within the input
synchronizer box on the flow diagram. However, when we have a processor
with two inputs and when each input has several columns, the boxes would
collide and make the types unreadable. This commit changes it to print
each type one per line, so there is no collision between different boxes
of input synchronizer. Separately, cockroachdb.github.io has been
adjusted so that the boxes for the input synchronizers could be
displayed well in such scenario.

Release note: None

43887: pkg/sql/opt: validate correct use of FOR UPDATE locking clauses r=nvanbenschoten a=nvanbenschoten

The second commit updates our SQL grammar to be identical to Postgres regarding select statement locking clauses. Concretely, this results in the following changes in behavior:
- We can now parse multiple locking clause items
- We can now parse the `FOR READ ONLY` syntax, which is a no-op
- We can now parse the `SELECT ... FOR UPDATE ... LIMIT` syntax

Ref: https://github.com/postgres/postgres/blob/1a4a0329650b0545a54afb3c317aa289fd817f8a/src/backend/parser/gram.y#L11834

The third commit introduces a number of validation checks into the optbuilder to ensure that FOR UPDATE locking clauses are only being used in places where they're allowed. Most of this was based off of Postgres, which disallows the same set of operations to be used with `FOR [KEY] UPDATE/SHARE` locking.

Ref: https://github.com/postgres/postgres/blob/1a4a0329650b0545a54afb3c317aa289fd817f8a/src/backend/parser/analyze.c#L2691

Release note (sql change): Invalid usages of FOR UPDATE locking clauses are now rejected by the SQL optimizer.

43980: storage: rename SSTSnapshotStorage{,Scratch,File} vars r=ajwerner a=irfansharif

Around snapshot receiving code, we have variables for SSTSnapshotStorage, SSTSnapshotStorageScratch, and SSTSnapshotStorageFile types
all abbreviated to some variation of sss{,s,f}. We also have "ssts" to
talk about SSTs. This is all terribly confusing, we could do with some
much needed clarity here.

Release note: None.

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: irfan sharif <[email protected]>

Contributor

craig bot commented Jan 14, 2020

Build succeeded

GitHub CI (Cockroach)

craig bot merged commit 3d83574 into cockroachdb:master

yuzefovich deleted the vary-batchsize branch

January 14, 2020 22:36

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet