-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-14163][CORE] SumEvaluator and countApprox cannot reliably handle RDDs of size 1 #11982
Conversation
Work with me in New York? https://www.handy.com/careers/73115?gh_jid=73115&gh_src=o5qcxn
Can one of the admins verify this patch? |
val degreesOfFreedom = (counter.count - 1).toInt | ||
new TDistribution(degreesOfFreedom).inverseCumulativeProbability(1 - (1 - confidence) / 2) | ||
} else { | ||
// this may not be statistically meaningful | ||
confidence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't valid; confidence is really a probability and confFactor is a number of standard deviations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, Double.PositiveInfinity it is then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
Do you guys use this feature? I'm actually thinking we should deprecate this feature over time. |
@rxin these are used to implement |
Yea fixing them is fine. I'm asking because this is the 1st time I've seen people using this api in public and want to get some feedback here. |
I hit this with |
…e NaN. Add test + equality methods for BoundedDouble Work with me in NYC: https://www.handy.com/careers/73115?gh_jid=73115&gh_src=o5qcxn
c9318ae
to
66e8e37
Compare
@mtustin-handy you can just push new commits to the branch. |
…le RDDs of size 1 ## What changes were proposed in this pull request? This special cases 0 and 1 counts to avoid passing 0 degrees of freedom. ## How was this patch tested? Tests run successfully. New test added. ## Note: This recreates apache#11982 which was closed to due to non-updated diff. rxin srowen Commented there. This also adds tests, reworks the code to perform the special casing (based on srowen's comments), and adds equality machinery for BoundedDouble, as well as changing how it is transformed to string. Author: Marcin Tustin <[email protected]> Author: Marcin Tustin <[email protected]> Closes apache#12016 from mtustin-handy/SPARK-14163.
What changes were proposed in this pull request?
This special cases 0 and 1 counts to avoid passing 0 degrees of freedom.
How was this patch tested?
Tests run successfully.