[SPARK-29645][ML][PYSPARK] ML add param RelativeError #26305
Test build #112882 has finished for PR 26305 at commit
Test build #112890 has finished for PR 26305 at commit
retest this please
Test build #112911 has finished for PR 26305 at commit
* Relative error (see documentation for
* `org.apache.spark.sql.DataFrameStatFunctions.approxQuantile` for description)
* Must be in the range [0, 1].
* Note that in multiple columns case, relative error is applied to all columns.
Nit: It seems the above line got removed in the new documentation. Maybe put it somewhere else in the doc, for example at the end of line 97?
Since 2.3.0, `QuantileDiscretizer` can map multiple columns at once by setting the `inputCols` parameter. If both of the `inputCol` and `inputCols` parameters are set, an Exception will be thrown. To specify the number of buckets for each column, the `numBucketsArray` parameter can be set, or if the number of buckets should be the same across columns, `numBuckets` can be set as a convenience. Note that in multiple columns case, relative error is applied to all columns.
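For readers unfamiliar with the relative error discussed here: the `approxQuantile` documentation describes it as a bound on the rank of the returned value, namely `floor((p - err) * N) <= rank(x) <= ceil((p + err) * N)` for `N` rows and target probability `p`. The following is a minimal pure-Python sketch of that bound, not Spark code; the helper name `allowed_rank_range`, the clamping to `[0, N]`, and the sample values are assumptions made for illustration.

```python
import math


def allowed_rank_range(n, p, err):
    """Return the (lowest, highest) rank an approximate quantile may have.

    Hypothetical helper: it only restates the bound
    floor((p - err) * n) <= rank(x) <= ceil((p + err) * n).
    """
    if not 0.0 <= err <= 1.0:
        raise ValueError("relative error must be in the range [0, 1]")
    # Clamping to [0, n] is an assumption of this sketch so that the
    # result is always a valid rank.
    lowest = max(0, math.floor((p - err) * n))
    highest = min(n, math.ceil((p + err) * n))
    return lowest, highest


# Values chosen so that every product is exact in binary floating point.
approx_bounds = allowed_rank_range(1024, 0.5, 0.015625)
exact_bounds = allowed_rank_range(1024, 0.5, 0.0)
```

With an error of 0, the range collapses to a single rank, which corresponds to an exact quantile.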
LGTM
Test build #112987 has finished for PR 26305 at commit
Merged to master, thanks @huaxingao for reviewing!
What changes were proposed in this pull request?
1, add shared param `relativeError`
2, `Imputer`/`RobustScaler`/`QuantileDiscretizer` extend `HasRelativeError`
Why are the changes needed?
It makes sense to expose RelativeError to end users, since it controls both the precision and memory overhead.
`QuantileDiscretizer` already had this param, while the other algorithms did not yet.
Does this PR introduce any user-facing change?
Yes, a new param is added in `Imputer`/`RobustScaler`.
How was this patch tested?
Existing test suites.
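The motivation above says the relative error controls both precision and memory overhead. A rough way to see why is the toy summary below: it keeps every k-th element of the sorted data, with k derived from the error, so a larger error means fewer stored values and a looser rank guarantee. This is a uniform-stride sample written for illustration only, not the algorithm Spark uses; `build_summary` and `approx_quantile` are hypothetical names.

```python
import math


def build_summary(values, err):
    """Keep every k-th element of the sorted data as (rank, value) pairs.

    Illustrative only: a uniform-stride sample, not Spark's algorithm.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("values must not be empty")
    # Kept ranks are at most `step` apart, so the nearest kept rank is
    # within step / 2 of any target rank, which is at most err * n
    # whenever err * n >= 0.5.
    step = max(1, math.floor(2 * err * n))
    ranks = list(range(0, n, step))
    if ranks[-1] != n - 1:
        ranks.append(n - 1)  # always keep the maximum
    return [(r, data[r]) for r in ranks]


def approx_quantile(summary, n, p):
    """Return the kept (rank, value) pair closest to the target rank."""
    target = p * (n - 1)
    return min(summary, key=lambda rv: abs(rv[0] - target))


data = list(range(10_000))
# Summary size for a few illustrative error settings.
sizes = {err: len(build_summary(data, err)) for err in (0.001, 0.01, 0.1)}
```

In this sketch the number of stored pairs shrinks roughly in proportion to `1 / err`, which is the trade-off a user tunes when setting the param.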