[SPARK-12685] [MLlib] [Backport to 1.4] word2vec trainWordsCount gets overflow #10721

hhbyyh · 2016-01-12T07:11:16Z

jira: https://issues.apache.org/jira/browse/SPARK-12685

master PR: #10627

The word2vec log reports
trainWordsCount = -785727483
during computation over a large dataset, which means the counter has overflowed.

I updated the priority because the overflow affects the computation itself:
alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))
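To make the effect on alpha concrete, here is a minimal, self-contained sketch. It is not the MLlib code: the object name and the helpers `alpha`, `totalAsInt`, and `totalAsLong` are hypothetical, and the sample values for learningRate, numPartitions, and wordCount are made up for illustration. It assumes the count is accumulated in an Int, which is what a negative logged value suggests, and compares that with accumulating in a Long.

```scala
object TrainWordsCountOverflow {
  // Hypothetical helper: the formula quoted above, with the count passed in
  // as a Long so both the wrapped and the correct total can be tried.
  def alpha(learningRate: Double, numPartitions: Int,
            wordCount: Long, trainWordsCount: Long): Double =
    learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))

  // Accumulating in an Int wraps silently once the total passes Int.MaxValue.
  def totalAsInt(counts: Seq[Int]): Int = counts.foldLeft(0)(_ + _)

  // Accumulating in a Long keeps the true total for the same inputs.
  def totalAsLong(counts: Seq[Int]): Long = counts.foldLeft(0L)(_ + _)

  def main(args: Array[String]): Unit = {
    val counts = Seq(Int.MaxValue, 1)
    println(s"Int total:  ${totalAsInt(counts)}")   // wrapped, negative
    println(s"Long total: ${totalAsLong(counts)}")  // true total

    // Illustrative values only; -785727483 is the count from the log above.
    val lr = 0.025
    println(s"alpha with wrapped count: ${alpha(lr, 1, 10000L, -785727483L)}")
    println(s"alpha with Long count:    ${alpha(lr, 1, 10000L, totalAsLong(counts))}")
  }
}
```

With a negative trainWordsCount the fraction turns negative, so alpha comes out larger than learningRate and grows as wordCount grows instead of decaying toward zero. Widening the counter, as in the Long variant here, is one way to avoid that.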

SparkQA · 2016-01-12T09:39:44Z

Test build #49223 has finished for PR 10721 at commit 27ba586.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-01-12T11:16:32Z

LGTM

hhbyyh · 2016-01-12T11:19:54Z

Hi @srowen, thanks for taking a look. I'm not sure whether I should send separate PRs for releases 1.5 and 1.6. Could you please advise? Thanks.

srowen · 2016-01-13T10:14:09Z

@hhbyyh do you know if it cherry-picks cleanly into other branches? @jkbradley indicated it didn't. Back-porting to 1.6 makes sense; 1.5 maybe; 1.4 seems pretty old as it's very unlikely to see another release.

jkbradley · 2016-01-13T19:52:32Z

@hhbyyh I think this PR will cherry-pick cleanly to 1.4, 1.5, and 1.6. I think it's a change in the last line in master (after 1.6) which messed up the original PR. The main benefit of a separate PR would be getting Jenkins to run tests, but I think this PR is pretty safe.

LGTM

I'll try merging now. @srowen I agree about 1.4, but I might as well if it cherry-picks cleanly.

…verflow

jira: https://issues.apache.org/jira/browse/SPARK-12685

master PR: #10627

the log of word2vec reports
trainWordsCount = -785727483
during computation over a large dataset.

Update the priority as it will affect the computation process.
alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))

Author: Yuhao Yang <[email protected]>

Closes #10721 from hhbyyh/branch-1.4.

(cherry picked from commit 7bd2564)
Signed-off-by: Joseph K. Bradley <[email protected]>

jkbradley · 2016-01-13T19:55:25Z

The cherry-pick worked, so this is in 1.4, 1.5, 1.6.

JoshRosen · 2016-01-13T21:34:53Z

@hhbyyh, now that this has been merged would you mind closing this pull request? GitHub can't auto-close PRs which were opened against maintenance branches.

hhbyyh · 2016-01-14T01:37:32Z

Thanks. Closing the PR now.

fix overflow in w2v

27ba586

hhbyyh closed this Jan 14, 2016
