
[SPARK-29656][ML][PYSPARK] ML algs expose aggregationDepth #26322

Closed
wants to merge 8 commits

Conversation

@zhengruifeng (Contributor) commented Oct 30, 2019

What changes were proposed in this pull request?

Expose the expert param aggregationDepth in GMM and GLR.

Why are the changes needed?

SVC/LoR/LiR/AFT already expose the expert param aggregationDepth to end users, so it would be nice to expose it in similar algorithms as well.

Does this PR introduce any user-facing change?

Yes, a new param is exposed.

How was this patch tested?

Added Python tests.
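
For context, here is a minimal sketch of what the exposed param looks like to an end user, assuming GaussianMixture follows the same expert-param pattern as SVC/LoR/LiR/AFT (illustration only, not code from this PR):

import org.apache.spark.ml.clustering.GaussianMixture

// aggregationDepth sets the depth of the treeAggregate used internally;
// the shared param's default is 2. A larger depth can help when the
// training data has a very large number of partitions.
val gmm = new GaussianMixture()
  .setK(3)
  .setAggregationDepth(3)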

@SparkQA commented Oct 30, 2019

Test build #112920 has finished for PR 26322 at commit c43dc30.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class _RobustScalerParams(HasInputCol, HasOutputCol, HasAggregationDepth):
  • class _VectorIndexerParams(HasInputCol, HasOutputCol, HasHandleInvalid, HasAggregationDepth):
  • class _ALSParams(_ALSModelParams, HasMaxIter, HasRegParam, HasCheckpointInterval, HasSeed,

@@ -34,7 +34,8 @@ import org.apache.spark.sql.types.{StructField, StructType}
 /**
  * Params for [[RobustScaler]] and [[RobustScalerModel]].
  */
-private[feature] trait RobustScalerParams extends Params with HasInputCol with HasOutputCol {
+private[feature] trait RobustScalerParams extends Params with HasInputCol with HasOutputCol
+  with HasAggregationDepth {
Contributor
I guess you forgot to put setAggregationDepth in class RobustScaler
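
For reference, the setter being asked for follows the standard Spark ML expert-param pattern, roughly (a sketch, not the exact code added in this PR):

// Setter on the RobustScaler estimator itself; declaring the param in
// the trait alone does not let users set it.
def setAggregationDepth(value: Int): this.type = set(aggregationDepth, value)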

Contributor Author
Good catch! thanks!

@SparkQA commented Oct 31, 2019

Test build #112988 has finished for PR 26322 at commit 484effd.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 31, 2019

Test build #112991 has finished for PR 26322 at commit 8fc6517.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 31, 2019

Test build #112996 has finished for PR 26322 at commit 8480a1e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 31, 2019

Test build #113017 has finished for PR 26322 at commit caeb609.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng (Contributor Author)

retest this please

@SparkQA commented Oct 31, 2019

Test build #113025 has finished for PR 26322 at commit caeb609.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen (Member) commented Oct 31, 2019

Hm, I am really not sure users can meaningfully tune this, so I'm not sure about adding so much complexity for it. In some cases I think we use reduce vs treeReduce on purpose as it can be faster; it really depends on the usage.
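
To illustrate the trade-off (an editorial sketch, not code from the PR): plain reduce sends every partition's partial result straight to the driver in one round, while treeReduce combines them through intermediate levels, whose count is what aggregationDepth controls. The tree pays off with many partitions but adds overhead with few.

import org.apache.spark.sql.SparkSession

val sc = SparkSession.builder().master("local[*]").getOrCreate().sparkContext
val data = sc.parallelize(1L to 1000000L, numSlices = 200)

// One round: all 200 partial sums are combined on the driver.
val flat = data.reduce(_ + _)

// Multi-level: partial sums are combined through intermediate
// aggregation levels (depth) before reaching the driver.
val tree = data.treeReduce(_ + _, depth = 3)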

@zhengruifeng (Contributor Author)

@srowen Thanks for reviewing!
This PR is for consistency; I will review it again and try to avoid adding too much complexity.

@SparkQA commented Nov 1, 2019

Test build #113072 has finished for PR 26322 at commit 11dbab4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class _RobustScalerParams(HasInputCol, HasOutputCol, HasRelativeError):
  • class _VectorIndexerParams(HasInputCol, HasOutputCol, HasHandleInvalid):
  • class _ALSParams(_ALSModelParams, HasMaxIter, HasRegParam, HasCheckpointInterval, HasSeed):

@zhengruifeng (Contributor Author)

retest this please

@SparkQA commented Nov 5, 2019

Test build #113264 has finished for PR 26322 at commit 11dbab4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class _RobustScalerParams(HasInputCol, HasOutputCol, HasRelativeError):
  • class _VectorIndexerParams(HasInputCol, HasOutputCol, HasHandleInvalid):
  • class _ALSParams(_ALSModelParams, HasMaxIter, HasRegParam, HasCheckpointInterval, HasSeed):

@zhengruifeng (Contributor Author)

Merged to master, thanks all.

@zhengruifeng deleted the agg_opt branch November 6, 2019 02:36
@srowen (Member) commented Nov 6, 2019

Retroactively: OK, I understand the argument was consistency. I might have pinged for confirmation first, though, as I wasn't in favor of this at first look.

@zhengruifeng (Contributor Author)

@srowen Sorry for my careless merge. Should I revert it first?
I have reverted the changes in ALS/Features, so there are now three main changes:
1. GMM adds aggregationDepth;
2. GLR adds aggregationDepth; to do this, I needed to add the new param in IterativelyReweightedLeastSquares and WeightedLeastSquares;
3. The reduce => treeReduce change in VectorIndexer was left in by mistake; I guess I need to revert it in some way.

@srowen (Member) commented Nov 6, 2019

I think it's OK if it's for consistency.
