-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-3130][MLLIB] detect negative values in naive Bayes #2038
Conversation
QA tests have started for PR 2038 at commit
|
QA tests have finished for PR 2038 at commit
|
case dv: DenseVector => | ||
dv.values | ||
} | ||
if (!values.forall(x => x >= 0.0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could shorten:
if (!values.forall(_ >= 0.0)) {
LGTM. The function requireNonnegativeValues() actually also identifies NaN values, so you could add a test in the Suite for that. (Same as the test("detect negative values") but with -1.0 replaced with 0.0/0.0) |
@@ -17,7 +17,8 @@ Bayes](http://en.wikipedia.org/wiki/Naive_Bayes_classifier#Multinomial_naive_Bay | |||
which is typically used for [document | |||
classification](http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html). | |||
Within that context, each observation is a document and each | |||
feature represents a term whose value is the frequency of the term. | |||
feature represents a term whose value is the frequency of the term. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is really an unnormalized frequency? Should that be clarified by saying "unnormalized frequency" or "frequency or count" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be normalized, then user needs to adjust alpha
.
QA tests have started for PR 2038 at commit
|
QA tests have finished for PR 2038 at commit
|
Jenkins, retest this please. |
1 similar comment
Jenkins, retest this please. |
LGTM |
Jenkins, retest this please. |
QA tests have started for PR 2038 at commit
|
QA tests have finished for PR 2038 at commit
|
Jenkins, retest this please. |
QA tests have started for PR 2038 at commit
|
QA tests have finished for PR 2038 at commit
|
because NB treats feature values as term frequencies. jkbradley Author: Xiangrui Meng <[email protected]> Closes #2038 from mengxr/nb-neg and squashes the following commits: 52c37c3 [Xiangrui Meng] address comments 65f892d [Xiangrui Meng] detect negative values in nb (cherry picked from commit 068b6fe) Signed-off-by: Xiangrui Meng <[email protected]>
Merged into master and branch-1.1. |
because NB treats feature values as term frequencies. jkbradley Author: Xiangrui Meng <[email protected]> Closes apache#2038 from mengxr/nb-neg and squashes the following commits: 52c37c3 [Xiangrui Meng] address comments 65f892d [Xiangrui Meng] detect negative values in nb
because NB treats feature values as term frequencies. @jkbradley