Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18200][GRAPHX] Support zero as an initial capacity in OpenHashSet #15741

Closed
wants to merge 4 commits into from
Closed

[SPARK-18200][GRAPHX] Support zero as an initial capacity in OpenHashSet #15741

wants to merge 4 commits into from

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Nov 2, 2016

What changes were proposed in this pull request?

SPARK-18200 reports Apache Spark 2.x raises java.lang.IllegalArgumentException: requirement failed: Invalid initial capacity while running triangleCount. The root cause is that VertexSet, a type alias of OpenHashSet, does not allow zero as a initial size. This PR loosens the restriction to allow zero.

How was this patch tested?

Pass the Jenkins test with a new test case in OpenHashSetSuite.

@@ -42,10 +42,12 @@ import org.apache.spark.annotation.Private
*/
@Private
class OpenHashSet[@specialized(Long, Int) T: ClassTag](
initialCapacity: Int,
var initialCapacity: Int,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really hacky and makes it a public variable. Why don't we just change the require to >= 0, and then change nextPowerOf2 to pick a value that's greater than 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I'll change like that.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-18200][GRAPHX] Support zero-size initial capacity in OpenHashSet [SPARK-18200][GRAPHX] Support zero as an initial capacity in OpenHashSet Nov 2, 2016
@SparkQA
Copy link

SparkQA commented Nov 3, 2016

Test build #68026 has finished for PR 15741 at commit d157ea1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -271,7 +271,7 @@ class OpenHashSet[@specialized(Long, Int) T: ClassTag](
private def hashcode(h: Int): Int = Hashing.murmur3_32().hashInt(h).asInt()

private def nextPowerOf2(n: Int): Int = {
val highBit = Integer.highestOneBit(n)
val highBit = Integer.highestOneBit(if (n == 0) 1 else n)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just do ...

if (n == 0) {
  2
} else {
  // original code
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.
BTW, for the test failure, I'll look at that soon.

@SparkQA
Copy link

SparkQA commented Nov 3, 2016

Test build #68030 has finished for PR 15741 at commit d7bd9a9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 3, 2016

Test build #68040 has finished for PR 15741 at commit b4dd73d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

The only failure seems to be irrelevant.

[info] BasicSchedulerIntegrationSuite:
[info] - job with fetch failure *** FAILED *** (1 second, 889 milliseconds)
[info]   java.util.concurrent.TimeoutException: Futures timed out after [1 second]

@dongjoon-hyun
Copy link
Member Author

Retest this please.

@rxin
Copy link
Contributor

rxin commented Nov 3, 2016

LGTM pending Jenkins.

@SparkQA
Copy link

SparkQA commented Nov 3, 2016

Test build #68046 has finished for PR 15741 at commit b4dd73d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Nov 3, 2016
## What changes were proposed in this pull request?

[SPARK-18200](https://issues.apache.org/jira/browse/SPARK-18200) reports Apache Spark 2.x raises `java.lang.IllegalArgumentException: requirement failed: Invalid initial capacity` while running `triangleCount`. The root cause is that `VertexSet`, a type alias of `OpenHashSet`, does not allow zero as a initial size. This PR loosens the restriction to allow zero.

## How was this patch tested?

Pass the Jenkins test with a new test case in `OpenHashSetSuite`.

Author: Dongjoon Hyun <[email protected]>

Closes #15741 from dongjoon-hyun/SPARK-18200.

(cherry picked from commit d24e736)
Signed-off-by: Reynold Xin <[email protected]>
asfgit pushed a commit that referenced this pull request Nov 3, 2016
## What changes were proposed in this pull request?

[SPARK-18200](https://issues.apache.org/jira/browse/SPARK-18200) reports Apache Spark 2.x raises `java.lang.IllegalArgumentException: requirement failed: Invalid initial capacity` while running `triangleCount`. The root cause is that `VertexSet`, a type alias of `OpenHashSet`, does not allow zero as a initial size. This PR loosens the restriction to allow zero.

## How was this patch tested?

Pass the Jenkins test with a new test case in `OpenHashSetSuite`.

Author: Dongjoon Hyun <[email protected]>

Closes #15741 from dongjoon-hyun/SPARK-18200.

(cherry picked from commit d24e736)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in d24e736 Nov 3, 2016
@dongjoon-hyun
Copy link
Member Author

Thank you for review and merging, @rxin .

val highBit = Integer.highestOneBit(n)
if (highBit == n) n else highBit << 1
if (n == 0) {
2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been 1 perhaps ?
nextPowerOf2(1) < nextPowerOf2(0) now ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True or just make the condition <= 1. It won't really matter much but is a little inconsistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 was designed to pretend to be 1.

nextPowerOf2(0) = nextPowerOf2(1) = 2

But, 1 could be better. Should I change this as a follow-up?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I think the issue was that right now nextPowerOf2(1) is 1 and that seems inconsistent if 0 gives 2. Both should result in 2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry. You're right. nextPowerOf2(1) is indeed 1. Then, both should return 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, should I fix this like the following?

-      2
+      1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for review, @mridulm and @srowen .
I'll create a PR as a follow-up.

ghost pushed a commit to dbtsai/spark that referenced this pull request Nov 4, 2016
…in OpenHashSet

## What changes were proposed in this pull request?

This is a follow-up PR of apache#15741 in order to keep `nextPowerOf2` consistent.

**Before**
```
nextPowerOf2(0) => 2
nextPowerOf2(1) => 1
nextPowerOf2(2) => 2
nextPowerOf2(3) => 4
nextPowerOf2(4) => 4
nextPowerOf2(5) => 8
```

**After**
```
nextPowerOf2(0) => 1
nextPowerOf2(1) => 1
nextPowerOf2(2) => 2
nextPowerOf2(3) => 4
nextPowerOf2(4) => 4
nextPowerOf2(5) => 8
```

## How was this patch tested?

N/A

Author: Dongjoon Hyun <[email protected]>

Closes apache#15754 from dongjoon-hyun/SPARK-18200-2.
asfgit pushed a commit that referenced this pull request Nov 4, 2016
…in OpenHashSet

## What changes were proposed in this pull request?

This is a follow-up PR of #15741 in order to keep `nextPowerOf2` consistent.

**Before**
```
nextPowerOf2(0) => 2
nextPowerOf2(1) => 1
nextPowerOf2(2) => 2
nextPowerOf2(3) => 4
nextPowerOf2(4) => 4
nextPowerOf2(5) => 8
```

**After**
```
nextPowerOf2(0) => 1
nextPowerOf2(1) => 1
nextPowerOf2(2) => 2
nextPowerOf2(3) => 4
nextPowerOf2(4) => 4
nextPowerOf2(5) => 8
```

## How was this patch tested?

N/A

Author: Dongjoon Hyun <[email protected]>

Closes #15754 from dongjoon-hyun/SPARK-18200-2.

(cherry picked from commit 27602c3)
Signed-off-by: Reynold Xin <[email protected]>
asfgit pushed a commit that referenced this pull request Nov 4, 2016
…in OpenHashSet

## What changes were proposed in this pull request?

This is a follow-up PR of #15741 in order to keep `nextPowerOf2` consistent.

**Before**
```
nextPowerOf2(0) => 2
nextPowerOf2(1) => 1
nextPowerOf2(2) => 2
nextPowerOf2(3) => 4
nextPowerOf2(4) => 4
nextPowerOf2(5) => 8
```

**After**
```
nextPowerOf2(0) => 1
nextPowerOf2(1) => 1
nextPowerOf2(2) => 2
nextPowerOf2(3) => 4
nextPowerOf2(4) => 4
nextPowerOf2(5) => 8
```

## How was this patch tested?

N/A

Author: Dongjoon Hyun <[email protected]>

Closes #15754 from dongjoon-hyun/SPARK-18200-2.

(cherry picked from commit 27602c3)
Signed-off-by: Reynold Xin <[email protected]>
@dongjoon-hyun dongjoon-hyun deleted the SPARK-18200 branch November 7, 2016 00:48
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

[SPARK-18200](https://issues.apache.org/jira/browse/SPARK-18200) reports Apache Spark 2.x raises `java.lang.IllegalArgumentException: requirement failed: Invalid initial capacity` while running `triangleCount`. The root cause is that `VertexSet`, a type alias of `OpenHashSet`, does not allow zero as a initial size. This PR loosens the restriction to allow zero.

## How was this patch tested?

Pass the Jenkins test with a new test case in `OpenHashSetSuite`.

Author: Dongjoon Hyun <[email protected]>

Closes apache#15741 from dongjoon-hyun/SPARK-18200.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…in OpenHashSet

## What changes were proposed in this pull request?

This is a follow-up PR of apache#15741 in order to keep `nextPowerOf2` consistent.

**Before**
```
nextPowerOf2(0) => 2
nextPowerOf2(1) => 1
nextPowerOf2(2) => 2
nextPowerOf2(3) => 4
nextPowerOf2(4) => 4
nextPowerOf2(5) => 8
```

**After**
```
nextPowerOf2(0) => 1
nextPowerOf2(1) => 1
nextPowerOf2(2) => 2
nextPowerOf2(3) => 4
nextPowerOf2(4) => 4
nextPowerOf2(5) => 8
```

## How was this patch tested?

N/A

Author: Dongjoon Hyun <[email protected]>

Closes apache#15754 from dongjoon-hyun/SPARK-18200-2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants