New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

workload: add zipfian distribution option to kv #31902

Merged

craig merged 1 commit into cockroachdb:master from ridwanmsharif:zipfian-kv

Oct 28, 2018

ridwanmsharif commented Oct 25, 2018

Adds a workload distribution that spreads load according
to a Zipfian distribution. This is part of the experimentation
with load based splitting.

Release note: None

ridwanmsharif requested review from nvanbenschoten, petermattis and m-schneider

October 25, 2018 22:13

Member

cockroach-teamcity commented Oct 25, 2018

This change is

ridwanmsharif force-pushed the zipfian-kv branch from c3529db to 2859220 Compare

October 25, 2018 23:47

ridwanmsharif mentioned this pull request

storage: Load Based Splitting Experiments #31819

Closed

nvanbenschoten approved these changes

View reviewed changes

Member

nvanbenschoten left a comment

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/workload/kv/kv.go, line 135 at r1 (raw file):

			if w.sequential && w.splits > 0 {
				return errors.New("'sequential' and 'splits' cannot both be enabled")
			}

Validate against --sequential and --zipfian.

pkg/workload/kv/zipfian.go, line 17 at r1 (raw file):

)

// A Zipf generates Zipf distributed variates.

Add a comment explaining where this came from and why it was forked.

pkg/workload/kv/zipfian.go, line 18 at r1 (raw file):

// A Zipf generates Zipf distributed variates.
type Zipf struct {

No need to export this type.

pkg/workload/kv/zipfian.go, line 41 at r1 (raw file):

// such that P(k) is proportional to (v + k) ** (-s).
// Requirements: s > 1 and v >= 1.
func NewZipf(s float64, v float64, imax uint64) *Zipf {

No need to export this either.

pkg/workload/kv/zipfian.go, line 66 at r1 (raw file):

	for {
		r := random.Float64() // r on [0,1]

s/r on [0,1]/r in [0.0,1.0)/

pkg/workload/kv/zipfian.go, line 66 at r1 (raw file):

	for {
		r := random.Float64() // r on [0,1]

Like we talked about yesterday, this input doesn't need to be random, just deterministic based on the seed and uniformly distributed. We could just as easily hash the sequence number, map the value to a range of [0.0,1.0), and make that the input to this function.

The effect of this would be mapping the zipf function over the output of hashGenerator, so we could/should use that directly in zipfGenerator. In fact, hashGenerator could use sequentialGenerator directly as well.

That would save the cost of a new rand.Source and rand.Rand on every key. This might not sounds particularly expensive, but when we're pushing 200k qps, I suspect it will add up.


          workload: add zipfian distribution option to kv

77618f3

Adds a workload distribution that spreads load according
to a Zipfian distribution. This is part of the experimentation
with load based splitting.

Release note: None

ridwanmsharif force-pushed the zipfian-kv branch from 2859220 to 77618f3 Compare

October 28, 2018 01:42

ridwanmsharif commented

View reviewed changes

Author

ridwanmsharif left a comment

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/workload/kv/kv.go, line 135 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Validate against --sequential and --zipfian.

Done.

pkg/workload/kv/zipfian.go, line 17 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Add a comment explaining where this came from and why it was forked.

Done.

pkg/workload/kv/zipfian.go, line 18 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

No need to export this type.

Done.

pkg/workload/kv/zipfian.go, line 41 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

No need to export this either.

Done.

pkg/workload/kv/zipfian.go, line 66 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/r on [0,1]/r in [0.0,1.0)/

Done.

pkg/workload/kv/zipfian.go, line 66 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Like we talked about yesterday, this input doesn't need to be random, just deterministic based on the seed and uniformly distributed. We could just as easily hash the sequence number, map the value to a range of [0.0,1.0), and make that the input to this function.

The effect of this would be mapping the zipf function over the output of hashGenerator, so we could/should use that directly in zipfGenerator. In fact, hashGenerator could use sequentialGenerator directly as well.

That would save the cost of a new rand.Source and rand.Rand on every key. This might not sounds particularly expensive, but when we're pushing 200k qps, I suspect it will add up.

As discussed offline, we're going to keep using rand as is for some of the reasons below instead of prematurely optimizing as there's no issue with this as is.

Looking a little closer into the zipfian calculation logic, I notice that the rand object calls Float64() in a huge loop. It relies on the ability to create random numbers indefinitely after initiating it with a seed. To keep this property (at least logically) I think we should keep the random number generator instead of a hash function. This is because its trivial to generate the first “random” number for r using the hash but every subsequent one would need to re-hash using the old value and continue that over and over again. Now that may very well be worth it, I’m just afraid of some skew that will cause w.r.t uniformity. But more than that; although a perf benefit may be seen in this specific workload, does it warrant replacing rand with our own seeding randomness generator that tries to do the same thing (but uses hash instead of explicitly using rand)? It makes more sense to me to leave this as is because the zipfian distribution works well with this calculation, instead of the extra hashing/rehashing logic that would need to be added here.

Also personal bias: this usage to me just screams like rand is the more semantically accurate and simpler answer.

Author

ridwanmsharif commented Oct 28, 2018

bors r+

craig bot pushed a commit that referenced this pull request


          Merge #31902

e9e3ff2

31902: workload: add zipfian distribution option to kv r=ridwanmsharif a=ridwanmsharif

Adds a workload distribution that spreads load according
to a Zipfian distribution. This is part of the experimentation
with load based splitting.

Release note: None

Co-authored-by: Ridwan Sharif <[email protected]>

Contributor

craig bot commented Oct 28, 2018

Build succeeded

GitHub CI (Cockroach)

craig bot merged commit 77618f3 into cockroachdb:master

ridwanmsharif deleted the zipfian-kv branch

October 28, 2018 03:34

Contributor

m-schneider commented Oct 29, 2018

pkg/workload/kv/kv.go, line 101 at r2 (raw file):

			`Percent (0-100) of operations that are reads of existing keys.`)
		g.flags.Int64Var(&g.seed, `seed`, 1, `Key hash seed.`)
		g.flags.BoolVar(&g.zipfian, `zipfian`, false,

Maybe add a general distribution flag in case we want to add different ones later?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet