Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Spark-7879][MLlib] KMeans API for spark.ml Pipelines #6756

Closed
wants to merge 59 commits into from

Conversation

yu-iskw
Copy link
Contributor

@yu-iskw yu-iskw commented Jun 11, 2015

I Implemented the KMeans API for spark.ml Pipelines. But it doesn't include clustering abstractions for spark.ml (SPARK-7610). It would fit for another issues. And I'll try it later, since we are trying to add the hierarchical clustering algorithms in another issue. Thanks.

[SPARK-7879] KMeans API for spark.ml Pipelines - ASF JIRA https://issues.apache.org/jira/browse/SPARK-7879

@yu-iskw yu-iskw changed the title [Spark 7879][MLlib] KMeans API for spark.ml Pipelines [Spark-7879][MLlib] KMeans API for spark.ml Pipelines Jun 11, 2015
@SparkQA
Copy link

SparkQA commented Jun 11, 2015

Test build #34662 has finished for PR 6756 at commit a6325fc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeans(override val uid: String) extends Estimator[KMeansModel] with KMeansParams
    • class KMeansModel(JavaModel):
    • class KMeans(JavaEstimator, HasFeaturesCol, HasMaxIter, HasSeed):
    • case class Log2(child: Expression)
    • case class StringLength(child: Expression) extends UnaryExpression with ExpectsInputTypes

@yu-iskw
Copy link
Contributor Author

yu-iskw commented Jun 11, 2015

This failure was caused by org.apache.spark.streaming.StreamingListenerSuite.It seems that there is no failure about this issue. How do I deal with this failure?

@yu-iskw
Copy link
Contributor Author

yu-iskw commented Jun 11, 2015

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Jun 11, 2015

Test build #34696 has finished for PR 6756 at commit a6325fc.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeans(override val uid: String) extends Estimator[KMeansModel] with KMeansParams
    • class KMeansModel(JavaModel):
    • class KMeans(JavaEstimator, HasFeaturesCol, HasMaxIter, HasSeed):

@SparkQA
Copy link

SparkQA commented Jun 12, 2015

Test build #34780 has finished for PR 6756 at commit fea6b07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeans(override val uid: String) extends Estimator[KMeansModel] with KMeansParams
    • class KMeansModel(JavaModel):
    • class KMeans(JavaEstimator, HasFeaturesCol, HasMaxIter, HasSeed):

@yu-iskw
Copy link
Contributor Author

yu-iskw commented Jun 12, 2015

@jkbradley Could you review this code at your earliest convenience? Thanks!

@SparkQA
Copy link

SparkQA commented Jun 23, 2015

Test build #35563 has finished for PR 6756 at commit 12bdb04.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeans(override val uid: String) extends Estimator[KMeansModel] with KMeansParams
    • class KMeansModel(JavaModel):
    • class KMeans(JavaEstimator, HasFeaturesCol, HasMaxIter, HasSeed):

@yu-iskw
Copy link
Contributor Author

yu-iskw commented Jun 23, 2015

Sorry, I did rebase my PR. Beause I must also support the new copy methods for KMeans.

@jkbradley
Copy link
Member

reviewing now...

import org.apache.spark.ml.param.{Param, ParamMap, Params}
import org.apache.spark.ml.util.{Identifiable, SchemaUtils}
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.mllib
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@SparkQA
Copy link

SparkQA commented Jul 16, 2015

Test build #37475 has finished for PR 6756 at commit 19a9d63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeans(override val uid: String) extends Estimator[KMeansModel] with KMeansParams
    • class KMeansModel(JavaModel):
    • class KMeans(JavaEstimator, HasFeaturesCol, HasMaxIter, HasSeed):

@SparkQA
Copy link

SparkQA commented Jul 16, 2015

Test build #37479 has finished for PR 6756 at commit c8dc6e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeans(override val uid: String) extends Estimator[KMeansModel] with KMeansParams
    • class KMeansModel(JavaModel):
    • class KMeans(JavaEstimator, HasFeaturesCol, HasMaxIter, HasSeed):

/**
* Param the number of runs of the algorithm to execute in parallel. We initialize the algorithm
* this many times with random starting conditions (configured by the initialization mode), then
* return the best clustering found over any run. Default: 1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed: Here and elsewhere, can you please state in the Param Scala doc the constraints (in this case "Must be >= 1")?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

@jkbradley
Copy link
Member

Thanks for the updates! A few more comments, but only small items

@SparkQA
Copy link

SparkQA commented Jul 17, 2015

Test build #37587 has finished for PR 6756 at commit 4c61693.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeans(override val uid: String) extends Estimator[KMeansModel] with KMeansParams
    • class KMeansModel(JavaModel):
    • class KMeans(JavaEstimator, HasFeaturesCol, HasMaxIter, HasSeed):

@SparkQA
Copy link

SparkQA commented Jul 17, 2015

Test build #37589 has finished for PR 6756 at commit a14939b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeans(override val uid: String) extends Estimator[KMeansModel] with KMeansParams
    • class KMeansModel(JavaModel):
    • class KMeans(JavaEstimator, HasFeaturesCol, HasMaxIter, HasSeed):

val transformed = model.transform(dataset)
val expectedColumns = Array("features", predictionColName)
expectedColumns.foreach { column =>
transformed.columns.contains(column)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to assert

@jkbradley
Copy link
Member

The changes look good, save for those 2 tiny items. That should be all!

@yu-iskw
Copy link
Contributor Author

yu-iskw commented Jul 17, 2015

Oh, I'm sorry for the easy mistakes...

@SparkQA
Copy link

SparkQA commented Jul 18, 2015

Test build #37674 has finished for PR 6756 at commit be752de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeans(override val uid: String) extends Estimator[KMeansModel] with KMeansParams
    • class KMeansModel(JavaModel):
    • class KMeans(JavaEstimator, HasFeaturesCol, HasMaxIter, HasSeed):

@jkbradley
Copy link
Member

LGTM thanks for contributing this big feature!
Merging with master

@asfgit asfgit closed this in 34a889d Jul 18, 2015
@yu-iskw
Copy link
Contributor Author

yu-iskw commented Jul 18, 2015

Thank you for merging it and your continuous support!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants