Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workload: add zipfian distribution option to kv #31902

Merged
merged 1 commit into from
Oct 28, 2018

Conversation

ridwanmsharif
Copy link

Adds a workload distribution that spreads load according
to a Zipfian distribution. This is part of the experimentation
with load based splitting.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/workload/kv/kv.go, line 135 at r1 (raw file):

			if w.sequential && w.splits > 0 {
				return errors.New("'sequential' and 'splits' cannot both be enabled")
			}

Validate against --sequential and --zipfian.


pkg/workload/kv/zipfian.go, line 17 at r1 (raw file):

)

// A Zipf generates Zipf distributed variates.

Add a comment explaining where this came from and why it was forked.


pkg/workload/kv/zipfian.go, line 18 at r1 (raw file):

// A Zipf generates Zipf distributed variates.
type Zipf struct {

No need to export this type.


pkg/workload/kv/zipfian.go, line 41 at r1 (raw file):

// such that P(k) is proportional to (v + k) ** (-s).
// Requirements: s > 1 and v >= 1.
func NewZipf(s float64, v float64, imax uint64) *Zipf {

No need to export this either.


pkg/workload/kv/zipfian.go, line 66 at r1 (raw file):

	for {
		r := random.Float64() // r on [0,1]

s/r on [0,1]/r in [0.0,1.0)/


pkg/workload/kv/zipfian.go, line 66 at r1 (raw file):

	for {
		r := random.Float64() // r on [0,1]

Like we talked about yesterday, this input doesn't need to be random, just deterministic based on the seed and uniformly distributed. We could just as easily hash the sequence number, map the value to a range of [0.0,1.0), and make that the input to this function.

The effect of this would be mapping the zipf function over the output of hashGenerator, so we could/should use that directly in zipfGenerator. In fact, hashGenerator could use sequentialGenerator directly as well.

That would save the cost of a new rand.Source and rand.Rand on every key. This might not sounds particularly expensive, but when we're pushing 200k qps, I suspect it will add up.

Adds a workload distribution that spreads load according
to a Zipfian distribution. This is part of the experimentation
with load based splitting.

Release note: None
Copy link
Author

@ridwanmsharif ridwanmsharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/workload/kv/kv.go, line 135 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Validate against --sequential and --zipfian.

Done.


pkg/workload/kv/zipfian.go, line 17 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Add a comment explaining where this came from and why it was forked.

Done.


pkg/workload/kv/zipfian.go, line 18 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

No need to export this type.

Done.


pkg/workload/kv/zipfian.go, line 41 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

No need to export this either.

Done.


pkg/workload/kv/zipfian.go, line 66 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/r on [0,1]/r in [0.0,1.0)/

Done.


pkg/workload/kv/zipfian.go, line 66 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Like we talked about yesterday, this input doesn't need to be random, just deterministic based on the seed and uniformly distributed. We could just as easily hash the sequence number, map the value to a range of [0.0,1.0), and make that the input to this function.

The effect of this would be mapping the zipf function over the output of hashGenerator, so we could/should use that directly in zipfGenerator. In fact, hashGenerator could use sequentialGenerator directly as well.

That would save the cost of a new rand.Source and rand.Rand on every key. This might not sounds particularly expensive, but when we're pushing 200k qps, I suspect it will add up.

As discussed offline, we're going to keep using rand as is for some of the reasons below instead of prematurely optimizing as there's no issue with this as is.

Looking a little closer into the zipfian calculation logic, I notice that the rand object calls Float64() in a huge loop. It relies on the ability to create random numbers indefinitely after initiating it with a seed. To keep this property (at least logically) I think we should keep the random number generator instead of a hash function. This is because its trivial to generate the first “random” number for r using the hash but every subsequent one would need to re-hash using the old value and continue that over and over again. Now that may very well be worth it, I’m just afraid of some skew that will cause w.r.t uniformity. But more than that; although a perf benefit may be seen in this specific workload, does it warrant replacing rand with our own seeding randomness generator that tries to do the same thing (but uses hash instead of explicitly using rand)? It makes more sense to me to leave this as is because the zipfian distribution works well with this calculation, instead of the extra hashing/rehashing logic that would need to be added here.

Also personal bias: this usage to me just screams like rand is the more semantically accurate and simpler answer.

@ridwanmsharif
Copy link
Author

bors r+

craig bot pushed a commit that referenced this pull request Oct 28, 2018
31902: workload: add zipfian distribution option to kv r=ridwanmsharif a=ridwanmsharif

Adds a workload distribution that spreads load according
to a Zipfian distribution. This is part of the experimentation
with load based splitting.

Release note: None

Co-authored-by: Ridwan Sharif <[email protected]>
@craig
Copy link
Contributor

craig bot commented Oct 28, 2018

Build succeeded

@craig craig bot merged commit 77618f3 into cockroachdb:master Oct 28, 2018
@ridwanmsharif ridwanmsharif deleted the zipfian-kv branch October 28, 2018 03:34
@m-schneider
Copy link
Contributor


pkg/workload/kv/kv.go, line 101 at r2 (raw file):

			`Percent (0-100) of operations that are reads of existing keys.`)
		g.flags.Int64Var(&g.seed, `seed`, 1, `Key hash seed.`)
		g.flags.BoolVar(&g.zipfian, `zipfian`, false,

Maybe add a general distribution flag in case we want to add different ones later?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants