[MLLIB] SPARK-2329 Add multi-label evaluation metrics #1270
Conversation
…eraged by docs, micro and per-class precision and recall averaged by class
… macro measures, bunch of tests
Can one of the admins verify this patch?
```scala
 * @return Accuracy.
 */
lazy val accuracy = predictionAndLabels.map { case (predictions, labels) =>
  labels.intersect(predictions).size.toDouble / labels.union(predictions).size}.
```
As the intersection is computed multiple times in different metrics, how about taking it out so it is only calculated once?
Do you suggest extracting "labels.intersect(predictions).size" as a lazy val? Will it then be calculated only once? The operation is performed on a Scala Set (not on an RDD). Another option might be to store all intermediate calculations (including the intersection) used in the six different measures in an RDD. In that case, I would need to fold over a six-element tuple, which will look kind of scary, but it would be the most efficient way to compute everything at once.
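A minimal sketch of the idea under discussion (names are illustrative, and plain Scala collections stand in for the RDD records): compute the intersection size once per record, then derive the measures from the cached counts instead of repeating the set operation.

```scala
// Illustrative stand-in for the RDD records: (predictions, labels) pairs.
val records = Seq(
  (Set(0.0, 1.0), Set(0.0, 2.0)),
  (Set(1.0), Set(1.0))
)

// Compute the intersection size once per record; every measure that needs it
// reuses the count instead of recomputing the set operation.
val counts = records.map { case (predictions, labels) =>
  val tp = labels.intersect(predictions).size
  (tp, predictions.size, labels.size)
}

// Accuracy as in the snippet above: |intersection| / |union| averaged over
// documents, using |union| = |predictions| + |labels| - |intersection|.
val accuracy = counts.map { case (tp, p, l) =>
  tp.toDouble / (p + l - tp)
}.sum / records.size
```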
@avulanov Cool Alexander. Are you working on a multi-label classifier? We are expecting a multi-class multi-label classifier. I'm planning to implement MultiBoost.MH on Spark, not sure if you've already started working on it.

@BaiGang Thanks! I'm implementing the decomposition of multiclass and multilabel problems into binary classification problems that can be solved with built-in MLlib classifiers. I use one-vs-one and one-vs-all approaches. As far as I understand, MultiBoost.MH is a C++ implementation of AdaBoost.MH, and the latter uses another kind of problem decomposition in addition to boosting. So our efforts are complementary. Let's stay in touch. Btw, I would be glad to benchmark your classifier on the classification tasks that I'm solving.

@avulanov Thanks Alexander! I just started to implement the base learner. The algorithms described in the MultiBoost document and the paper are straightforward, but it will take some effort to implement them optimally on Apache Spark. I will keep you notified when I get the skeletons up.
Jenkins, test this please.

QA tests have started for PR 1270. This patch merges cleanly.

QA results for PR 1270:

@mengxr I've fixed the Scala style
```scala
 * Evaluator for multilabel classification.
 * @param predictionAndLabels an RDD of (predictions, labels) pairs, both are non-null sets.
 */
class MultilabelMetrics(predictionAndLabels: RDD[(Set[Double], Set[Double])]) {
```
Another feasible representation of predictions/labels is mllib.linalg.Vector. It's basically a Vector of +1s and -1s, either dense or sparse. So it would be great if we added another function to do the transformation.

It's up to you. Transforming the data outside this evaluation module is also OK. : )
RDD[(Set[Double], Set[Double])] may be hard for Java users. We can ask users to input RDD[(Array[Double], Array[Double])], requiring that the labels are ordered. It is easier for Java users and faster to compute intersection and other set operations.
@mengxr We need to ensure that they don't contain repeating elements as well. It should be an optional constructor, I think.
Both Set and Double are Scala types. It is very hard for Java users to construct such RDDs. Also, the input labels and output predictions are usually stored as Array[Double]. Shall we change the input to RDD[(Array[Double], Array[Double])], internally convert it to RDD[(Set[Double], Set[Double])], and cache? We can put a contract that both labels and predictions are unique and ordered within a single instance. We don't need it if we use Set internally, but later we can switch to an Array[Double]-based solution for speed, because those are very small arrays.
@mengxr Can we have RDD[(java.util.HashSet[Double], java.util.HashSet[Double])] as an optional constructor? Internally, we will use scala.collection.JavaConversions.asScalaSet.
@avulanov Let's think about what is most natural for the input data to a multi-label classifier and the output from the model it produces. They should match the input type here, so we can chain them easily. If we use either a Java or a Scala Set, we are going to have compatibility issues on the other side. Also, a set stores small boxed objects, which increases GC pressure. These are the reasons I recommend using Array[Double].
this is ok to test

test this please

QA tests have started for PR 1270 at commit
@avulanov Sorry for getting back late! For the implementation, shall we define an aggregator and then compute all the necessary information in a single pass, instead of triggering a job for each metric? For the metric names, I think our reference is Mining Multi-label Data, and we should follow the naming there.
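The single-pass aggregator could look roughly like the following sketch (names are illustrative; foldLeft over a Seq stands in for RDD.aggregate on the real data):

```scala
// Per-document counts folded into one accumulator, so the micro-averaged
// metrics come out of a single pass instead of one job per metric.
case class Counts(docs: Long, tp: Long, fp: Long, fn: Long)

val data = Seq(
  (Set(0.0, 1.0), Set(0.0, 2.0)), // (predictions, labels)
  (Set(1.0), Set(1.0))
)

val totals = data.foldLeft(Counts(0, 0, 0, 0)) { case (c, (predictions, labels)) =>
  val tp = labels.intersect(predictions).size
  Counts(c.docs + 1, c.tp + tp,
    c.fp + predictions.size - tp, c.fn + labels.size - tp)
}

// Micro-averaged precision and recall fall out of the same accumulator.
val microPrecision = totals.tp.toDouble / (totals.tp + totals.fp)
val microRecall = totals.tp.toDouble / (totals.tp + totals.fn)
```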
QA tests have finished for PR 1270 at commit

@mengxr Thank you for the comments!

…rns the list of labels

QA tests have started for PR 1270 at commit

QA tests have finished for PR 1270 at commit
```scala
 * Returns Hamming-loss
 */
lazy val hammingLoss: Double = (predictionAndLabels.map { case (predictions, labels) =>
  labels.diff(predictions).size + predictions.diff(labels).size}.
```
This may be faster: labels.size + predictions.size - 2 * labels.intersect(predictions).size
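The suggested identity can be checked on a small example (assuming duplicate-free label sets): |labels \ predictions| + |predictions \ labels| = |labels| + |predictions| - 2 * |labels ∩ predictions|.

```scala
val labels = Set(0.0, 1.0, 3.0)
val predictions = Set(1.0, 2.0)

// Original formulation: labels missed plus labels wrongly predicted.
val viaDiffs = labels.diff(predictions).size + predictions.diff(labels).size

// Suggested formulation: one intersection instead of two set differences.
val viaIntersection =
  labels.size + predictions.size - 2 * labels.intersect(predictions).size
```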
```scala
private lazy val numDocs: Long = predictionAndLabels.count

private lazy val numLabels: Long = predictionAndLabels.flatMap { case (_, labels) =>
  labels}.distinct.count
```
predictionAndLabels.values.flatMap(l => l).distinct().count()
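A plain-Scala model of what the suggested one-liner computes (a Seq of pairs stands in for the pair RDD, where `values` selects the second element of each pair; names are illustrative): take the label side of each record, flatten, and count distinct labels.

```scala
val predictionAndLabels = Seq(
  (Set(0.0), Set(0.0, 1.0)), // (predictions, labels)
  (Set(2.0), Set(1.0, 2.0))
)

// Equivalent of predictionAndLabels.values.flatMap(l => l).distinct().count()
// on a pair RDD: only the labels contribute to the distinct-label count.
val numLabels = predictionAndLabels.map(_._2).flatMap(l => l).distinct.size
```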
@mengxr Could you elaborate on this?
QA tests have started for PR 1270 at commit

QA tests have finished for PR 1270 at commit
@avulanov Just want to check with you on the input type. I still feel …
@avulanov We are close to the feature freeze deadline. Do you plan to update the PR? If you are busy, do you mind me taking it over? Thanks!
Welcome to the bay area! The deadline (soft for MLlib) is this Saturday. But since the only thing that needs to change is the input type, it should be trivial to update. Using …
@mengxr Thanks! I've replaced Set with Array, fixed two functions that didn't pass tests (due to union working differently on Arrays), and added a note that the Arrays must have unique elements.
Test build #22624 has started for PR 1270 at commit

Test build #22624 has finished for PR 1270 at commit

Test FAILed.

test this please

Test build #22659 has started for PR 1270 at commit

Test build #22659 has finished for PR 1270 at commit

Test PASSed.

LGTM. Merged into master. Thanks!

@mengxr Thank you!
Implementation of various multi-label classification measures, including: Hamming-loss, strict and default accuracy, macro-averaged precision, recall, and F1-measure based on documents and labels, and micro-averaged measures: https://issues.apache.org/jira/browse/SPARK-2329

Multi-class measures are currently in the following pull request: #1155