[SPARK-12685] [MLlib] [Backport to 1.4]word2vec trainWordsCount gets overflow #10721
Test build #49223 has finished for PR 10721 at commit
LGTM
Hi @srowen, thanks for taking a look. I'm not sure if I should send separate PRs for releases 1.5 and 1.6; can you please advise? Thanks.
@hhbyyh do you know if it cherry-picks cleanly into other branches? @jkbradley indicated it didn't. Back-porting to 1.6 makes sense; 1.5 maybe; 1.4 seems pretty old, as it's very unlikely to see another release.
@hhbyyh I think this PR will cherry-pick cleanly to 1.4, 1.5, and 1.6. I think it's a change in the last line in master (after 1.6) which messed up the original PR. The main benefit of a separate PR would be getting Jenkins to run tests, but I think this PR is pretty safe. LGTM. I'll try merging now. @srowen I agree about 1.4, but I might as well if it cherry-picks cleanly.
…verflow jira: https://issues.apache.org/jira/browse/SPARK-12685 master PR: #10627 the log of word2vec reports trainWordsCount = -785727483 during computation over a large dataset. Update the priority as it will affect the computation process. alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1)) Author: Yuhao Yang <[email protected]> Closes #10721 from hhbyyh/branch-1.4.
…verflow jira: https://issues.apache.org/jira/browse/SPARK-12685 master PR: #10627 the log of word2vec reports trainWordsCount = -785727483 during computation over a large dataset. Update the priority as it will affect the computation process. alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1)) Author: Yuhao Yang <[email protected]> Closes #10721 from hhbyyh/branch-1.4. (cherry picked from commit 7bd2564) Signed-off-by: Joseph K. Bradley <[email protected]>
The cherry-pick worked, so this is in 1.4, 1.5, 1.6.
@hhbyyh, now that this has been merged, would you mind closing this pull request? GitHub can't auto-close PRs which were opened against maintenance branches.
Thanks. Closing the PR now.
jira: https://issues.apache.org/jira/browse/SPARK-12685
master PR: #10627
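The word2vec log reports
trainWordsCount = -785727483
during computation over a large dataset; the negative value indicates that the counter has overflowed.
Update the priority, as the overflow affects the computation: trainWordsCount feeds directly into the learning rate.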
alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))
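
To illustrate why a negative trainWordsCount matters, here is a minimal standalone sketch, not Spark's actual code. The object name, helper names, and word counts are hypothetical; only the alpha formula is taken from the description above. It assumes the total is accumulated in a 32-bit Int and compares that with a 64-bit Long accumulator.

```scala
// Illustrative only: names and data are hypothetical, not taken from Spark.
object TrainWordsCountOverflow {
  // Sum counts into a 32-bit Int, which wraps silently past Int.MaxValue.
  def sumAsInt(counts: Seq[Int]): Int = counts.foldLeft(0)(_ + _)

  // Sum the same counts into a 64-bit Long, which has ample headroom here.
  def sumAsLong(counts: Seq[Int]): Long = counts.foldLeft(0L)(_ + _)

  // The learning-rate formula quoted in the PR description.
  def alpha(learningRate: Double, numPartitions: Int,
            wordCount: Long, trainWordsCount: Long): Double =
    learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))

  def main(args: Array[String]): Unit = {
    // Three hypothetical words with one billion occurrences each.
    val counts = Seq.fill(3)(1000000000)
    val asInt = sumAsInt(counts)   // wraps around to a negative value
    val asLong = sumAsLong(counts) // the true total

    println(s"Int total:  $asInt")
    println(s"Long total: $asLong")
    // With a negative total, alpha rises above learningRate instead of decaying.
    println(s"alpha (Int total):  ${alpha(0.025, 1, 1000000L, asInt)}")
    println(s"alpha (Long total): ${alpha(0.025, 1, 1000000L, asLong)}")
  }
}
```

With these inputs the Int total wraps to -1294967296 while the Long total is 3000000000. Under the formula, the wrapped total makes the ratio negative, so alpha ends up larger than learningRate rather than decreasing as training progresses.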