
[SPARK-12685] [MLlib] [Backport to 1.4] word2vec trainWordsCount gets overflow #10721

Closed
wants to merge 1 commit into from

Conversation

hhbyyh
Contributor

@hhbyyh hhbyyh commented Jan 12, 2016

jira: https://issues.apache.org/jira/browse/SPARK-12685

master PR: #10627

The word2vec log reports
trainWordsCount = -785727483
during computation over a large dataset, which indicates that the counter has overflowed.

I updated the priority, since the negative count affects the computation itself:
alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))
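
To illustrate why a negative trainWordsCount matters, here is a minimal Scala sketch. It is not the Spark MLlib source: the object name, helper names, and sample counts are hypothetical, and widening the total to Long is an assumed remedy shown only for comparison. The sketch sums word counts once as Int and once as Long, then feeds both totals into the alpha formula quoted above.

```scala
// Illustrative sketch only; names and values are hypothetical, not Spark's code.
object TrainWordsCountOverflow {

  // Summing into an Int wraps around silently once the total passes Int.MaxValue.
  def totalAsInt(counts: Seq[Int]): Int = counts.sum

  // Widening each count to Long before summing avoids the wrap-around.
  def totalAsLong(counts: Seq[Int]): Long = counts.map(_.toLong).sum

  // The learning-rate formula quoted in the PR description.
  def alpha(learningRate: Double, numPartitions: Int,
            wordCount: Long, trainWordsCount: Long): Double =
    learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))

  def main(args: Array[String]): Unit = {
    // Hypothetical vocabulary: three words, each seen 1,000,000,000 times.
    val counts = Seq(1000000000, 1000000000, 1000000000)

    val asInt = totalAsInt(counts)   // wraps to -1294967296
    val asLong = totalAsLong(counts) // 3000000000

    println(s"Int total  = $asInt")
    println(s"Long total = $asLong")

    // With the wrapped total the ratio turns negative, so alpha rises above
    // the base learning rate instead of decaying below it.
    println(s"alpha (Int total)  = ${alpha(0.025, 1, 1000000L, asInt.toLong)}")
    println(s"alpha (Long total) = ${alpha(0.025, 1, 1000000L, asLong)}")
  }
}
```

With the wrapped total, the denominator is negative, so the learning rate grows as more words are processed rather than shrinking. That is why the overflow changes the training behaviour and not just the log output.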

@SparkQA

SparkQA commented Jan 12, 2016

Test build #49223 has finished for PR 10721 at commit 27ba586.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jan 12, 2016

LGTM

@hhbyyh
Contributor Author

hhbyyh commented Jan 12, 2016

Hi @srowen, thanks for taking a look. I'm not sure whether I should send separate PRs for releases 1.5 and 1.6. Can you please advise? Thanks.

@srowen
Member

srowen commented Jan 13, 2016

@hhbyyh do you know if it cherry-picks cleanly into other branches? @jkbradley indicated it didn't. Back-porting to 1.6 makes sense; 1.5 maybe; 1.4 seems pretty old as it's very unlikely to see another release.

@jkbradley
Member

@hhbyyh I think this PR will cherry-pick cleanly to 1.4, 1.5, and 1.6. I think it's a change in the last line in master (after 1.6) which messed up the original PR. The main benefit of a separate PR would be getting Jenkins to run tests, but I think this PR is pretty safe.

LGTM

I'll try merging now. @srowen I agree about 1.4, but I might as well if it cherry-picks cleanly.

asfgit pushed a commit that referenced this pull request Jan 13, 2016
…verflow

jira: https://issues.apache.org/jira/browse/SPARK-12685

master PR: #10627

the log of word2vec reports
trainWordsCount = -785727483
during computation over a large dataset.

Update the priority as it will affect the computation process.
alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))

Author: Yuhao Yang

Closes #10721 from hhbyyh/branch-1.4.
asfgit pushed a commit that referenced this pull request Jan 13, 2016

(cherry picked from commit 7bd2564)
Signed-off-by: Joseph K. Bradley
@jkbradley
Member

The cherry-pick worked, so this is in 1.4, 1.5, 1.6.

asfgit pushed a commit that referenced this pull request Jan 13, 2016

(cherry picked from commit 7bd2564)
Signed-off-by: Joseph K. Bradley
@JoshRosen
Contributor

@hhbyyh, now that this has been merged, would you mind closing this pull request? GitHub can't auto-close PRs that were opened against maintenance branches.

@hhbyyh
Contributor Author

hhbyyh commented Jan 14, 2016

Thanks. Closing the PR now.

@hhbyyh hhbyyh closed this Jan 14, 2016