-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-18200][GRAPHX] Support zero as an initial capacity in OpenHashSet #15741
Conversation
@@ -42,10 +42,12 @@ import org.apache.spark.annotation.Private | |||
*/ | |||
@Private | |||
class OpenHashSet[@specialized(Long, Int) T: ClassTag]( | |||
initialCapacity: Int, | |||
var initialCapacity: Int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really hacky and makes it a public variable. Why don't we just change the require to >= 0, and then change nextPowerOf2 to pick a value that's greater than 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I'll change like that.
Test build #68026 has finished for PR 15741 at commit
|
@@ -271,7 +271,7 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag]( | |||
private def hashcode(h: Int): Int = Hashing.murmur3_32().hashInt(h).asInt() | |||
|
|||
private def nextPowerOf2(n: Int): Int = { | |||
val highBit = Integer.highestOneBit(n) | |||
val highBit = Integer.highestOneBit(if (n == 0) 1 else n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just do ...
if (n == 0) {
2
} else {
// original code
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
BTW, for the test failure, I'll look at that soon.
Test build #68030 has finished for PR 15741 at commit
|
Test build #68040 has finished for PR 15741 at commit
|
The only failure seems to be irrelevant.
|
Retest this please. |
LGTM pending Jenkins. |
Test build #68046 has finished for PR 15741 at commit
|
## What changes were proposed in this pull request? [SPARK-18200](https://issues.apache.org/jira/browse/SPARK-18200) reports Apache Spark 2.x raises `java.lang.IllegalArgumentException: requirement failed: Invalid initial capacity` while running `triangleCount`. The root cause is that `VertexSet`, a type alias of `OpenHashSet`, does not allow zero as a initial size. This PR loosens the restriction to allow zero. ## How was this patch tested? Pass the Jenkins test with a new test case in `OpenHashSetSuite`. Author: Dongjoon Hyun <[email protected]> Closes #15741 from dongjoon-hyun/SPARK-18200. (cherry picked from commit d24e736) Signed-off-by: Reynold Xin <[email protected]>
## What changes were proposed in this pull request? [SPARK-18200](https://issues.apache.org/jira/browse/SPARK-18200) reports Apache Spark 2.x raises `java.lang.IllegalArgumentException: requirement failed: Invalid initial capacity` while running `triangleCount`. The root cause is that `VertexSet`, a type alias of `OpenHashSet`, does not allow zero as a initial size. This PR loosens the restriction to allow zero. ## How was this patch tested? Pass the Jenkins test with a new test case in `OpenHashSetSuite`. Author: Dongjoon Hyun <[email protected]> Closes #15741 from dongjoon-hyun/SPARK-18200. (cherry picked from commit d24e736) Signed-off-by: Reynold Xin <[email protected]>
Thank you for review and merging, @rxin . |
val highBit = Integer.highestOneBit(n) | ||
if (highBit == n) n else highBit << 1 | ||
if (n == 0) { | ||
2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have been 1 perhaps ?
nextPowerOf2(1) < nextPowerOf2(0) now ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True or just make the condition <= 1. It won't really matter much but is a little inconsistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0 was designed to pretend to be 1.
nextPowerOf2(0) = nextPowerOf2(1) = 2
But, 1 could be better. Should I change this as a follow-up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I think the issue was that right now nextPowerOf2(1) is 1 and that seems inconsistent if 0 gives 2. Both should result in 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry. You're right. nextPowerOf2(1)
is indeed 1. Then, both should return 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, should I fix this like the following?
- 2
+ 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…in OpenHashSet ## What changes were proposed in this pull request? This is a follow-up PR of apache#15741 in order to keep `nextPowerOf2` consistent. **Before** ``` nextPowerOf2(0) => 2 nextPowerOf2(1) => 1 nextPowerOf2(2) => 2 nextPowerOf2(3) => 4 nextPowerOf2(4) => 4 nextPowerOf2(5) => 8 ``` **After** ``` nextPowerOf2(0) => 1 nextPowerOf2(1) => 1 nextPowerOf2(2) => 2 nextPowerOf2(3) => 4 nextPowerOf2(4) => 4 nextPowerOf2(5) => 8 ``` ## How was this patch tested? N/A Author: Dongjoon Hyun <[email protected]> Closes apache#15754 from dongjoon-hyun/SPARK-18200-2.
…in OpenHashSet ## What changes were proposed in this pull request? This is a follow-up PR of #15741 in order to keep `nextPowerOf2` consistent. **Before** ``` nextPowerOf2(0) => 2 nextPowerOf2(1) => 1 nextPowerOf2(2) => 2 nextPowerOf2(3) => 4 nextPowerOf2(4) => 4 nextPowerOf2(5) => 8 ``` **After** ``` nextPowerOf2(0) => 1 nextPowerOf2(1) => 1 nextPowerOf2(2) => 2 nextPowerOf2(3) => 4 nextPowerOf2(4) => 4 nextPowerOf2(5) => 8 ``` ## How was this patch tested? N/A Author: Dongjoon Hyun <[email protected]> Closes #15754 from dongjoon-hyun/SPARK-18200-2. (cherry picked from commit 27602c3) Signed-off-by: Reynold Xin <[email protected]>
…in OpenHashSet ## What changes were proposed in this pull request? This is a follow-up PR of #15741 in order to keep `nextPowerOf2` consistent. **Before** ``` nextPowerOf2(0) => 2 nextPowerOf2(1) => 1 nextPowerOf2(2) => 2 nextPowerOf2(3) => 4 nextPowerOf2(4) => 4 nextPowerOf2(5) => 8 ``` **After** ``` nextPowerOf2(0) => 1 nextPowerOf2(1) => 1 nextPowerOf2(2) => 2 nextPowerOf2(3) => 4 nextPowerOf2(4) => 4 nextPowerOf2(5) => 8 ``` ## How was this patch tested? N/A Author: Dongjoon Hyun <[email protected]> Closes #15754 from dongjoon-hyun/SPARK-18200-2. (cherry picked from commit 27602c3) Signed-off-by: Reynold Xin <[email protected]>
## What changes were proposed in this pull request? [SPARK-18200](https://issues.apache.org/jira/browse/SPARK-18200) reports Apache Spark 2.x raises `java.lang.IllegalArgumentException: requirement failed: Invalid initial capacity` while running `triangleCount`. The root cause is that `VertexSet`, a type alias of `OpenHashSet`, does not allow zero as a initial size. This PR loosens the restriction to allow zero. ## How was this patch tested? Pass the Jenkins test with a new test case in `OpenHashSetSuite`. Author: Dongjoon Hyun <[email protected]> Closes apache#15741 from dongjoon-hyun/SPARK-18200.
…in OpenHashSet ## What changes were proposed in this pull request? This is a follow-up PR of apache#15741 in order to keep `nextPowerOf2` consistent. **Before** ``` nextPowerOf2(0) => 2 nextPowerOf2(1) => 1 nextPowerOf2(2) => 2 nextPowerOf2(3) => 4 nextPowerOf2(4) => 4 nextPowerOf2(5) => 8 ``` **After** ``` nextPowerOf2(0) => 1 nextPowerOf2(1) => 1 nextPowerOf2(2) => 2 nextPowerOf2(3) => 4 nextPowerOf2(4) => 4 nextPowerOf2(5) => 8 ``` ## How was this patch tested? N/A Author: Dongjoon Hyun <[email protected]> Closes apache#15754 from dongjoon-hyun/SPARK-18200-2.
What changes were proposed in this pull request?
SPARK-18200 reports Apache Spark 2.x raises
java.lang.IllegalArgumentException: requirement failed: Invalid initial capacity
while runningtriangleCount
. The root cause is thatVertexSet
, a type alias ofOpenHashSet
, does not allow zero as a initial size. This PR loosens the restriction to allow zero.How was this patch tested?
Pass the Jenkins test with a new test case in
OpenHashSetSuite
.