-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-12363][Mllib] Remove setRun and fix PowerIterationClustering failed test #10539
Conversation
Test build #48543 has finished for PR 10539 at commit
|
@yanboliang Because that test doesn't fail so this commit doesn't set |
@viirya I'm not very familiar with GraphX and I just found they are do the same thing for different interface. Maybe we can hear others' opinions. |
@ankurdave Would you mind checking out this issue? It's really weird to me. Here's what's happening:
Is it possible that GraphX is not shipping the correct data between partitions? Thanks a lot in advance for your help! |
@ankurdave Maybe I should ask you: What does |
@jkbradley I'm not completely sure of the cause, but one thing I noticed was that PowerIterationClustering calls Each VertexRDD contains routing tables corresponding to a particular EdgeRDD, and I fixed this in PowerIterationClustering by replacing calls to |
@ankurdave Thanks a lot for taking a look! @viirya Would you mind updating this PR according to @ankurdave 's suggestion? After that, we can look into what is causing the failure and create a fix for the test. Thanks! |
@jkbradley Sure. I will go to update this. |
@ankurdave @jkbradley I tried to replace calls to I did some experiments and it is interesting to me that the following has different results:
If I return So I am interested in why |
@viirya About your above comments:
I don't think that's true because I'd recommend using Graph.apply, and then exploring what PIC is doing in those 2 tests (e.g., by printing out the vertex and edge attributes on each iteration, and seeing how values are propagated around the graph). |
use Graph instead of GraphImpl and update tests/example based on PIC paper
@mengxr The update looks good, I've merged it. Thanks. As you modified the test cases in |
Test build #51186 has finished for PR 10539 at commit
|
Test build #51189 has finished for PR 10539 at commit
|
The implementation was wrong: 1) As @ankurdave mentioned, Sorry for Scala style issues and failed Python tests! Could you help fix them? Thanks! |
Test build #51237 has finished for PR 10539 at commit
|
retest this please. |
Test build #51240 has finished for PR 10539 at commit
|
…ailed test JIRA: https://issues.apache.org/jira/browse/SPARK-12363 This issue is pointed by yanboliang. When `setRuns` is removed from PowerIterationClustering, one of the tests will be failed. I found that some `dstAttr`s of the normalized graph are not correct values but 0.0. By setting `TripletFields.All` in `mapTriplets` it can work. Author: Liang-Chi Hsieh <[email protected]> Author: Xiangrui Meng <[email protected]> Closes #10539 from viirya/fix-poweriter. (cherry picked from commit e3441e3) Signed-off-by: Xiangrui Meng <[email protected]>
LGTM. Merged into master, branch-1.6, and branch-1.5. Thanks! |
…ailed test JIRA: https://issues.apache.org/jira/browse/SPARK-12363 This issue is pointed by yanboliang. When `setRuns` is removed from PowerIterationClustering, one of the tests will be failed. I found that some `dstAttr`s of the normalized graph are not correct values but 0.0. By setting `TripletFields.All` in `mapTriplets` it can work. Author: Liang-Chi Hsieh <[email protected]> Author: Xiangrui Meng <[email protected]> Closes #10539 from viirya/fix-poweriter. (cherry picked from commit e3441e3) Signed-off-by: Xiangrui Meng <[email protected]>
@viirya If it doesn't take you much time, could you prepare backports for branch-1.4 and branch-1.3? I guess we only need to remove the Python changes. This is not high-priority since we don't have release plan for 1.4.x and 1.3.x. |
@mengxr ok. I will do the backports. |
JIRA: https://issues.apache.org/jira/browse/SPARK-12363
This issue is pointed by @yanboliang. When
setRuns
is removed from PowerIterationClustering, one of the tests will be failed. I found that somedstAttr
s of the normalized graph are not correct values but 0.0. By settingTripletFields.All
inmapTriplets
it can work.