[SPARK-35357][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities #32485

Closed

ebonnal
Contributor

@ebonnal ebonnal commented May 9, 2021

What changes were proposed in this pull request?

Overload the methods PageRank.runWithOptions and PageRank.runWithOptionsWithPreviousPageRank with a normalized parameter that describes "whether or not to normalize the rank sum". Adding overloads, rather than changing the existing methods, avoids breaking any user-facing signature.

Why are the changes needed?

https://issues.apache.org/jira/browse/SPARK-35357

When a graph contains a non-negligible proportion of sinks, algorithms based on incremental updates of ranks can get a precision gain for free if they are allowed to manipulate non-normalized ranks.
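
To illustrate why sinks matter here, the following is a minimal sketch in plain Python, not GraphX code. It assumes a simplified static PageRank update (reset probability plus damped incoming rank, every vertex starting at 1.0) and a normalization that rescales ranks so they sum to the number of vertices. The helper names `step`, `normalize`, and `run` are hypothetical and chosen only for this example.

```python
def step(ranks, edges, reset_prob=0.15):
    """One simplified static PageRank update over a dict of vertex -> rank."""
    out_degree = {v: 0 for v in ranks}
    for src, _ in edges:
        out_degree[src] += 1
    incoming = {v: 0.0 for v in ranks}
    for src, dst in edges:
        # Each vertex splits its rank evenly among its outgoing edges.
        incoming[dst] += ranks[src] / out_degree[src]
    return {v: reset_prob + (1.0 - reset_prob) * incoming[v] for v in ranks}


def normalize(ranks):
    """Rescale ranks so that they sum to the number of vertices."""
    factor = len(ranks) / sum(ranks.values())
    return {v: r * factor for v, r in ranks.items()}


def run(ranks, edges, num_iter):
    """Apply num_iter updates without any normalization."""
    for _ in range(num_iter):
        ranks = step(ranks, edges)
    return ranks


start = {1: 1.0, 2: 1.0, 3: 1.0}
cycle = [(1, 2), (2, 3), (3, 1)]   # no sink: every vertex has an outgoing edge
chain = [(1, 2), (2, 3)]           # vertex 3 is a sink: its rank is never passed on

sum_no_sink = sum(run(start, cycle, 2).values())
sum_with_sink = sum(run(start, chain, 2).values())

print(sum_no_sink)    # stays at 3 up to rounding, so normalizing changes nothing
print(sum_with_sink)  # below 3: the sink's rank leaked out of the sum
print(sum(normalize(run(start, chain, 2)).values()))  # rescaled back to 3
```

In this model, normalization only changes the ranks when sinks are present. Because the update adds a constant reset term, rescaled ranks are no longer the iterates an uninterrupted run would have produced, which is one way to read the precision argument above.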

Does this PR introduce any user-facing change?

No

How was this patch tested?

By adding a unit test that verifies that, even on a graph containing a sink, both of the following scenarios produce the same result:

a)

  • Run 6 iterations of PageRank in a row using PageRank.runWithOptions with normalization enabled

b)

  • Run 2 iterations using PageRank.runWithOptions with normalization disabled
  • Resume from the resulting graph (preRankGraph1) and run 2 more iterations using PageRank.runWithOptionsWithPreviousPageRank with normalization disabled
  • Finally, resume from that result (preRankGraph2) and run 2 more iterations using PageRank.runWithOptionsWithPreviousPageRank with normalization enabled
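
The two scenarios can be mimicked outside Spark with a small model. The sketch below is plain Python, not the actual unit test or the GraphX API. It assumes a simplified update rule, an initial rank of 1.0 per vertex, and a normalization that rescales the ranks to sum to the vertex count after the last iteration of a run. The function names and the example graph are hypothetical.

```python
def step(ranks, edges, reset_prob=0.15):
    """One simplified static PageRank update over a dict of vertex -> rank."""
    out_degree = {v: 0 for v in ranks}
    for src, _ in edges:
        out_degree[src] += 1
    incoming = {v: 0.0 for v in ranks}
    for src, dst in edges:
        incoming[dst] += ranks[src] / out_degree[src]
    return {v: reset_prob + (1.0 - reset_prob) * incoming[v] for v in ranks}


def run(ranks, edges, num_iter, normalized):
    """Run num_iter updates, then optionally rescale so ranks sum to len(ranks)."""
    for _ in range(num_iter):
        ranks = step(ranks, edges)
    if normalized:
        factor = len(ranks) / sum(ranks.values())
        ranks = {v: r * factor for v, r in ranks.items()}
    return ranks


def max_diff(x, y):
    """Largest absolute per-vertex difference between two rank dicts."""
    return max(abs(x[v] - y[v]) for v in x)


edges = [(1, 2), (2, 1), (1, 3), (2, 3), (3, 4)]  # vertex 4 is a sink
start = {v: 1.0 for v in (1, 2, 3, 4)}

# Scenario a): 6 iterations in a row, normalization enabled.
result_a = run(start, edges, 6, normalized=True)

# Scenario b): 2 + 2 + 2 iterations, normalizing only in the last stage.
stage1 = run(start, edges, 2, normalized=False)
stage2 = run(stage1, edges, 2, normalized=False)
result_b = run(stage2, edges, 2, normalized=True)

# Contrast: the same three stages, but normalizing after every stage.
c1 = run(start, edges, 2, normalized=True)
c2 = run(c1, edges, 2, normalized=True)
result_c = run(c2, edges, 2, normalized=True)

print(max_diff(result_a, result_b))  # scenarios a) and b) agree
print(max_diff(result_a, result_c))  # normalizing at every stage drifts away
```

In this model, scenarios a) and b) match because the intermediate stages hand over the raw iterates. Normalizing at every stage resumes from rescaled ranks and ends up at a visibly different result on a graph with a sink.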

…nk with a 'normalized' parameter to trigger or not the normalization
@ebonnal ebonnal changed the title [WIP][GRAPHX] Allow to turn off the normalization applied in the end of static PageRank utilities [WIP][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities May 9, 2021
@ebonnal ebonnal marked this pull request as ready for review May 9, 2021 17:48
@ebonnal ebonnal changed the title [WIP][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities [SPARK-35357][GRAPHX] Allow to turn off the normalization applied by static PageRank utilities May 9, 2021
@HyukjinKwon
Member

ok to test

@SparkQA

SparkQA commented May 10, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42856/

@SparkQA

SparkQA commented May 10, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42856/

@SparkQA

SparkQA commented May 10, 2021

Test build #138334 has finished for PR 32485 at commit 60482b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

I think it's fine. cc @srowen FYI

Member

@srowen srowen left a comment

Looks OK, only one tiny comment about 'since'

@SparkQA

SparkQA commented May 11, 2021

Test build #138375 has finished for PR 32485 at commit 5a52408.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ebonnal
Contributor Author

ebonnal commented May 11, 2021

Thank you @Ayushsunny @HyukjinKwon @srowen for the review 🙏.
I have applied the requested changes.

@SparkQA

SparkQA commented May 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42898/

@SparkQA

SparkQA commented May 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42898/

@ebonnal ebonnal requested a review from srowen May 11, 2021 14:18
@srowen srowen closed this in 402375b May 12, 2021
@srowen
Member

srowen commented May 12, 2021

Merged to master

@ebonnal ebonnal deleted the make-pagerank-normalization-optional branch May 12, 2021 14:41
@ebonnal ebonnal restored the make-pagerank-normalization-optional branch May 12, 2021 14:41
@ebonnal ebonnal deleted the make-pagerank-normalization-optional branch May 12, 2021 14:41
@ebonnal ebonnal restored the make-pagerank-normalization-optional branch May 12, 2021 14:44
5 participants