Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coldata: expose a modifier for batchSize variable #42845

Merged
merged 1 commit into from
Jan 14, 2020

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Nov 27, 2019

This commit exposes a setter for batchSize private variable to be
modified in tests and runs all tests in sql/colexec with a random
batch size in [3, 4096] range.

The tests in several places needed to be adjusted because their
assumptions might no longer hold when batch size is modified.

Fixes: #40791.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich
Copy link
Member Author

Not requesting a review for now since I want to see what the fallout is first. We probably will want to limit the config to two batch sizes - the default (1024) and a random (I think it must be greater than 1).

@yuzefovich yuzefovich force-pushed the vary-batchsize branch 2 times, most recently from 72b2eff to 423f5fa Compare December 6, 2019 00:20
@yuzefovich yuzefovich force-pushed the vary-batchsize branch 5 times, most recently from 04d2f15 to f4d9c58 Compare December 20, 2019 02:09
@yuzefovich yuzefovich requested review from asubiotto and a team December 20, 2019 02:11
@yuzefovich
Copy link
Member Author

This didn't discover any new bugs, but #42816 would have been caught by this change when batchSize is set to 3.

Also, I'm not sure whether we want to randomize the second batch size or just pick 3 (smallest I've tried).

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/col/coldata/batch.go, line 63 at r1 (raw file):

const (
	// MinBatchSize is the minimum acceptable size of batches.
	MinBatchSize = 3

What happens if we make this 1?


pkg/col/coldata/batch.go, line 65 at r1 (raw file):

	MinBatchSize = 3
	// MaxBatchSize is the maximum acceptable size of batches.
	MaxBatchSize = 1024

Do we want to try having a max batch size > 1024?


pkg/col/coldata/batch.go, line 78 at r1 (raw file):

// SetBatchSizeForTests modifies batchSize variable. It should only be used in
// tests.
func SetBatchSizeForTests(newBatchSize uint16) {

I wonder if there's a way to only make this method available during tests. Does putting it in a _test.go file work?


pkg/sql/colexec/main_test.go, line 57 at r1 (raw file):

			fmt.Printf("coldata.BatchSize() is set to %d\n", batchSize)
			coldata.SetBatchSizeForTests(batchSize)
			if code := m.Run(); code != 0 {

Do we want to run these tests twice? How does the runtime increase? I think we should just run them with a random batch size that we print out in the same way that the random seed is printed out using randutil.


pkg/sql/colexec/routers_test.go, line 477 at r1 (raw file):

	if _, ok := expectedNumVals[batchSize]; !ok {
		// We do not have a precomputed expectedNumVals for this batch size, so we

But then we have a very low probability of actually running this test. We might want to override the batch size here by getting the batch size, calling SetBatchSize ourselves with a constant value, then resetting it with a defer.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/col/coldata/batch.go, line 63 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

What happens if we make this 1?

There are multiple unit tests that have baked-in assumption of a "big" batch size, those would need to be changed. Also, I feel like some operators might have that assumption, and they might get stuck in an infinite loop or something (or maybe not). I'm not sure how much of a change that would be, but I think we will benefit from getting this PR in with 3 as the minimum and then during stabilization period possibly pushing it all the way down to 1.


pkg/col/coldata/batch.go, line 65 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Do we want to try having a max batch size > 1024?

I tried with 2048, 4096, and 8192, and the tests passed (I needed to do a minor tweak to one aggregator random test because there was uint16 overflow). I do feel that unless we decide to use bigger batch sizes in production, there is not much benefit from testing with larger sizes - most of our unit tests have very little data, so if we increase the batch size, the likelihood of uncovering a bug doesn't seem to increase (nor to decrease). Running randomized unit tests might get slightly better coverage at the expense of longer runtime. I think that pushing the minimum as low as possible is beneficial, not sure about the maximum though.


pkg/col/coldata/batch.go, line 78 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I wonder if there's a way to only make this method available during tests. Does putting it in a _test.go file work?

Unfortunately, that doesn't work. If we move the setter method in _test file, it is only visible within coldata package (not even coldata_test); if we move it to coldata_test package, then it is visible only that package.


pkg/sql/colexec/main_test.go, line 57 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Do we want to run these tests twice? How does the runtime increase? I think we should just run them with a random batch size that we print out in the same way that the random seed is printed out using randutil.

The runtime increases more or less linearly (there are randomized unit tests which depend on the batch size, so with the smaller sizes, those run faster). The whole test suite of colexec package takes 10 seconds or so on my laptop, I think we could run it with the default batch size and a random one, but I'm not sure whether that would give us any benefits. Switched to only random batch size.


pkg/sql/colexec/routers_test.go, line 477 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

But then we have a very low probability of actually running this test. We might want to override the batch size here by getting the batch size, calling SetBatchSize ourselves with a constant value, then resetting it with a defer.

True, I was imagining this being useful if we always run the tests with the default config. Changed.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 5 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

This commit exposes a setter for batchSize private variable to be
modified in tests and runs all tests in sql/colexec with a random
batch size in [3, 4096] range.

The tests in several places needed to be adjusted because their
assumptions might no longer hold when batch size is modified.

Release note: None
@yuzefovich
Copy link
Member Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 14, 2020

Build failed

@andreimatei
Copy link
Contributor

if you know anything about the failure, say on #43949

@yuzefovich
Copy link
Member Author

An unrelated flake.

bors r+

craig bot pushed a commit that referenced this pull request Jan 14, 2020
42845: coldata: expose a modifier for batchSize variable r=yuzefovich a=yuzefovich

This commit exposes a setter for batchSize private variable to be
modified in tests and runs all tests in sql/colexec with a random
batch size in [3, 4096] range.

The tests in several places needed to be adjusted because their
assumptions might no longer hold when batch size is modified.

Fixes: #40791.

Release note: None

43409: execinfrapb: print input types one per line in EXPLAIN (DISTSQL, TYPES) r=yuzefovich a=yuzefovich

Previously, input types were printed in a single line within the input
synchronizer box on the flow diagram. However, when we have a processor
with two inputs and when each input has several columns, the boxes would
collide and make the types unreadable. This commit changes it to print
each type one per line, so there is no collision between different boxes
of input synchronizer. Separately, cockroachdb.github.io has been
adjusted so that the boxes for the input synchronizers could be
displayed well in such scenario.

Release note: None

43887: pkg/sql/opt: validate correct use of FOR UPDATE locking clauses r=nvanbenschoten a=nvanbenschoten

The second commit updates our SQL grammar to be identical to Postgres regarding select statement locking clauses. Concretely, this results in the following changes in behavior:
- We can now parse multiple locking clause items
- We can now parse the `FOR READ ONLY` syntax, which is a no-op
- We can now parse the `SELECT ... FOR UPDATE ... LIMIT` syntax

Ref: https://github.com/postgres/postgres/blob/1a4a0329650b0545a54afb3c317aa289fd817f8a/src/backend/parser/gram.y#L11834

The third commit introduces a number of validation checks into the optbuilder to ensure that FOR UPDATE locking clauses are only being used in places where they're allowed. Most of this was based off of Postgres, which disallows the same set of operations to be used with `FOR [KEY] UPDATE/SHARE` locking.

Ref: https://github.com/postgres/postgres/blob/1a4a0329650b0545a54afb3c317aa289fd817f8a/src/backend/parser/analyze.c#L2691

Release note (sql change): Invalid usages of FOR UPDATE locking clauses are now rejected by the SQL optimizer.

43980: storage: rename SSTSnapshotStorage{,Scratch,File} vars r=ajwerner a=irfansharif

Around snapshot receiving code, we have variables for SSTSnapshotStorage, SSTSnapshotStorageScratch, and SSTSnapshotStorageFile types
all abbreviated to some variation of sss{,s,f}. We also have "ssts" to
talk about SSTs. This is all terribly confusing, we could do with some
much needed clarity here.

Release note: None.

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jan 14, 2020

Build succeeded

@craig craig bot merged commit 3d83574 into cockroachdb:master Jan 14, 2020
@yuzefovich yuzefovich deleted the vary-batchsize branch January 14, 2020 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

coldata: vary coldata.BatchSize for tests
4 participants