-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
workload: add zipfian distribution option to kv #31902
Conversation
c3529db
to
2859220
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/workload/kv/kv.go, line 135 at r1 (raw file):
if w.sequential && w.splits > 0 { return errors.New("'sequential' and 'splits' cannot both be enabled") }
Validate against --sequential
and --zipfian
.
pkg/workload/kv/zipfian.go, line 17 at r1 (raw file):
) // A Zipf generates Zipf distributed variates.
Add a comment explaining where this came from and why it was forked.
pkg/workload/kv/zipfian.go, line 18 at r1 (raw file):
// A Zipf generates Zipf distributed variates. type Zipf struct {
No need to export this type.
pkg/workload/kv/zipfian.go, line 41 at r1 (raw file):
// such that P(k) is proportional to (v + k) ** (-s). // Requirements: s > 1 and v >= 1. func NewZipf(s float64, v float64, imax uint64) *Zipf {
No need to export this either.
pkg/workload/kv/zipfian.go, line 66 at r1 (raw file):
for { r := random.Float64() // r on [0,1]
s/r on [0,1]/r in [0.0,1.0)/
pkg/workload/kv/zipfian.go, line 66 at r1 (raw file):
for { r := random.Float64() // r on [0,1]
Like we talked about yesterday, this input doesn't need to be random, just deterministic based on the seed and uniformly distributed. We could just as easily hash the sequence number, map the value to a range of [0.0,1.0)
, and make that the input to this function.
The effect of this would be mapping the zipf function over the output of hashGenerator
, so we could/should use that directly in zipfGenerator
. In fact, hashGenerator
could use sequentialGenerator
directly as well.
That would save the cost of a new rand.Source
and rand.Rand
on every key. This might not sounds particularly expensive, but when we're pushing 200k qps, I suspect it will add up.
Adds a workload distribution that spreads load according to a Zipfian distribution. This is part of the experimentation with load based splitting. Release note: None
2859220
to
77618f3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/workload/kv/kv.go, line 135 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Validate against
--sequential
and--zipfian
.
Done.
pkg/workload/kv/zipfian.go, line 17 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Add a comment explaining where this came from and why it was forked.
Done.
pkg/workload/kv/zipfian.go, line 18 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
No need to export this type.
Done.
pkg/workload/kv/zipfian.go, line 41 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
No need to export this either.
Done.
pkg/workload/kv/zipfian.go, line 66 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/r on [0,1]/r in [0.0,1.0)/
Done.
pkg/workload/kv/zipfian.go, line 66 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Like we talked about yesterday, this input doesn't need to be random, just deterministic based on the seed and uniformly distributed. We could just as easily hash the sequence number, map the value to a range of
[0.0,1.0)
, and make that the input to this function.The effect of this would be mapping the zipf function over the output of
hashGenerator
, so we could/should use that directly inzipfGenerator
. In fact,hashGenerator
could usesequentialGenerator
directly as well.That would save the cost of a new
rand.Source
andrand.Rand
on every key. This might not sounds particularly expensive, but when we're pushing 200k qps, I suspect it will add up.
As discussed offline, we're going to keep using rand
as is for some of the reasons below instead of prematurely optimizing as there's no issue with this as is.
Looking a little closer into the zipfian calculation logic, I notice that the rand
object calls Float64()
in a huge loop. It relies on the ability to create random numbers indefinitely after initiating it with a seed. To keep this property (at least logically) I think we should keep the random number generator instead of a hash function. This is because its trivial to generate the first “random” number for r using the hash but every subsequent one would need to re-hash using the old value and continue that over and over again. Now that may very well be worth it, I’m just afraid of some skew that will cause w.r.t uniformity. But more than that; although a perf benefit may be seen in this specific workload, does it warrant replacing rand
with our own seeding randomness generator that tries to do the same thing (but uses hash instead of explicitly using rand)? It makes more sense to me to leave this as is because the zipfian distribution works well with this calculation, instead of the extra hashing/rehashing logic that would need to be added here.
Also personal bias: this usage to me just screams like rand
is the more semantically accurate and simpler answer.
bors r+ |
31902: workload: add zipfian distribution option to kv r=ridwanmsharif a=ridwanmsharif Adds a workload distribution that spreads load according to a Zipfian distribution. This is part of the experimentation with load based splitting. Release note: None Co-authored-by: Ridwan Sharif <[email protected]>
Build succeeded |
pkg/workload/kv/kv.go, line 101 at r2 (raw file):
Maybe add a general distribution flag in case we want to add different ones later? |
Adds a workload distribution that spreads load according
to a Zipfian distribution. This is part of the experimentation
with load based splitting.
Release note: None