Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up HLL presentation by 100x #491

Merged
merged 3 commits into from
Sep 10, 2015
Merged

Conversation

avibryant
Copy link
Contributor

This changes the use of math.pow(2.0,-mj) when computing the estimates in SparseHLL and DenseHLL to 1.0 / (1 << mj).

To benchmark the difference this makes, I first had to change the HLLPresentBenchmark to clone the HLL instances before asking for their approximateSize, otherwise the estimate gets cached as a lazy val and the benchmarks are meaningless.

Having done that, you can see a dramatic difference. Because I'm impatient I have so far only run this with a single set of parameters, where it led to a 40x speedup.

Before:

[info] Benchmark                            (bits)  (max)  (numHLL)   Mode  Cnt    Score   Error  Units
[info] HLLPresentBenchmark.timeBatchCreate      12  10000        10  thrpt  200  141.569 ± 5.010  ops/s

After:

[info] Benchmark                            (bits)  (max)  (numHLL)   Mode  Cnt     Score    Error  Units
[info] HLLPresentBenchmark.timeBatchCreate      12  10000        10  thrpt  200  5984.351 ± 60.770  ops/s

@avibryant
Copy link
Contributor Author

This is failing on the HLL downsize test (which passes on my machine). Is this a known-flaky test, or is this maybe pointing to some actual issue?

@ianoc
Copy link
Collaborator

ianoc commented Sep 10, 2015

The two builds actually failed on two different tests, one of them was that, the min hasher one too. Both are alas flaky tests. Restarted both

@ianoc
Copy link
Collaborator

ianoc commented Sep 10, 2015

LGTM though, merge when green

@jvns
Copy link

jvns commented Sep 10, 2015

I just tested with @kamalmarhubi using a lookup table as well as (1/ (1 << mj)), and the lookup table is twice as fast, which seems worthwhile to me. (val lut: Array[Double] = (0 until 256).map(x => 1.0 / (1 << x)).toArray). Shouldn't block this though!

@jnievelt
Copy link
Contributor

This breaks when mj > 30. If we're not concerned about that, can we at least add an assertion so we fail instead of giving the wrong result?

scala> 1.0 / (1 << 31)
res4: Double = -4.6566128730773926E-10

scala> 1.0 / (1 << 32)
res5: Double = 1.0

@avibryant
Copy link
Contributor Author

@jnievelt in principle it can be up to 124, though that's exceedingly unlikely. Still, @jvns's suggestion of a lookup table would take care of this and is faster anyway, so I'll revise to take that approach.

@avibryant
Copy link
Contributor Author

Ok, yeah, with a lookup table (pushed) it's over 100x as fast:

[info] Benchmark                            (bits)  (max)  (numHLL)   Mode  Cnt      Score     Error  Units
[info] HLLPresentBenchmark.timeBatchCreate      12  10000        10  thrpt  200  14693.743 ± 206.868  ops/s

@avibryant avibryant changed the title Speed up HLL presentation by 40x Speed up HLL presentation by 100x Sep 10, 2015
@@ -43,6 +43,8 @@ object HyperLogLog {
/* Size of the hash in bits */
val hashSize = 128

val powersOfNegativeTwo: Array[Double] = 0.until(hashSize).map{ i => math.pow(2.0, -i) }.toArray
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to instead of until?

I think it's the rhoW that's taken here? On inspection the code seems a little not right (I could be missing something), but it seems like this is the count of the number of leading 0s, so it should include hashSize for the case that the hash is all 0s.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're always setting aside at least 4 bits for the bucket, so I think the max is actually 124 - this should be slightly more than needed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry looks like we raced on actions there. Can you have hashSize itself rather than hashSize -1? the tests are green, so it seems unlikely this is wrong per say right? or do we have a test missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. For the record the parts that look not right are in rhoW:

  • terminal condition is that contains returns false, but contains doesn't do any bounds checking; is it possible to overrun the array, however unlikely?
  • loop starts at 1 instead of 0. with even a small aggregation I imagine this doesn't matter, but isn't it technically incorrect?

Again I could be missing something, but if you think they might be worth addressing maybe lets file issues.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be negativePowersOfTwo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I'm confused as to what's being asked here.

@ianoc I'm quite certain we don't need more than hashSize-1
@jnievelt IIRC the definition of rhoW is "index of first non-zero", not "number of zeros" so that might explain starting at 1?
@kamalmarhubi argh, good point

ianoc added a commit that referenced this pull request Sep 10, 2015
Speed up HLL presentation by 100x
@ianoc ianoc merged commit 1387e6c into twitter:develop Sep 10, 2015
@johnynek
Copy link
Collaborator

If/when we clean this up to fix the name, can will also make the table private[this]?

@@ -447,7 +449,7 @@ case class DenseHLL(bits: Int, v: Bytes) extends HLL {
count += 1
res += 1.0
} else {
res += java.lang.Math.pow(2.0, -mj)
res += HyperLogLog.powersOfNegativeTwo(mj)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be negativePowersOfTwo, not powersOfNegativeTwo? the current name implies we are doing (-2)^(mj) which is not the same as (2)^(-mj)

@zg
Copy link

zg commented Sep 11, 2015

Given the performance boost could you not replace all instances of Math.pow(2, X) with X << 1 across the entire code base?

@avibryant
Copy link
Contributor Author

@johnynek it's used in the HLL subclasses but defined in the HyperLogLog object, so private[this] won't work. We could do private[algebird]?

@ianoc
Copy link
Collaborator

ianoc commented Sep 11, 2015

yeah, private[algebird] is good, once its classed as an implementation detail outside the project so we can change it later

@jnievelt
Copy link
Contributor

@zg not, for two reasons:

  1. math.pow allows non-integral exponents (which we sometimes use), which we can't use with 1 << X
  2. per my comment above it would only work if we're sure that X <= 30. And per the results here, we'd probably be better off using a lookup in that case anyway

@eigenvariable
Copy link
Contributor

Do we have an ETA on when a version of algebird with this optimization will be released?

edit: I'm told this was just released yesterday.

@ianoc
Copy link
Collaborator

ianoc commented Feb 4, 2016

@eigenvariable Latest algebird was released only 2 days go. Check out the releases section for more info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants