Add Student's t-test aggregation support #54469

imotov · 2020-03-30T22:23:40Z

Adds a t_test metric aggregation that can perform paired and unpaired two-sample
t-tests. Filter support for unpaired tests is not included in this PR; it will
be added in a follow-up PR.

Relates to #53692

elasticmachine · 2020-03-30T22:24:11Z

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

polyfractal

Left a few comments, this looks great! <3

polyfractal · 2020-04-01T20:19:56Z

docs/reference/aggregations/metrics/t-test-aggregation.asciidoc

+
+A `t_test` metrics aggregation that performs a statistical hypothesis test in which the test statistic follows a Student's t-distribution
+under the null hypothesis on numeric values extracted from the aggregated documents or generated by provided scripts.
+


Should we perhaps add a layman's explanation afterwards? Maybe something like "In practice, this will tell you if the difference between two population means is statistically significant", or something like that?

polyfractal · 2020-04-01T20:20:55Z

docs/reference/aggregations/metrics/t-test-aggregation.asciidoc

+
+   "aggregations": {
+      "startup_time_ttest": {
+         "value":  0.1914368843365979


Should we state what "value" is? E.g. it's the p-value?

polyfractal · 2020-04-01T20:28:34Z

.../analytics/src/main/java/org/elasticsearch/xpack/analytics/ttest/TTestAggregatorFactory.java

+                return new UnpairedTTestAggregator(name, numericMultiVS, tails, false, format, searchContext, parent, pipelineAggregators,
+                    metadata);
+            default:
+                throw new UnsupportedOperationException("Unsupported t-test type " + testType);


Hmm, do we know if UnsupportedOperationException maps to a 4xx or a 5xx? If it's a 5xx, perhaps we should change this to an IllegalArgumentException?
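A minimal sketch of the suggested change, assuming (as the comment does) that an IllegalArgumentException is reported as a client error while other runtime exceptions surface as server errors. `SketchTTestType`, its `UNKNOWN` value, and `SketchTTestFactory` are hypothetical stand-ins, and the method returns a label rather than a real aggregator so the example stays self-contained.

```java
// Hypothetical stand-in for the test-type enum; UNKNOWN exists only to reach the default branch.
enum SketchTTestType { PAIRED, HOMOSCEDASTIC, HETEROSCEDASTIC, UNKNOWN }

final class SketchTTestFactory {
    // Returns a label instead of constructing an aggregator (assumption for the sketch).
    static String create(SketchTTestType testType) {
        switch (testType) {
            case PAIRED:
                return "paired";
            case HOMOSCEDASTIC:
                return "unpaired-equal-variance";
            case HETEROSCEDASTIC:
                return "unpaired-unequal-variance";
            default:
                // Reports a bad request parameter rather than an internal failure.
                throw new IllegalArgumentException("Unsupported t-test type " + testType);
        }
    }
}
```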

polyfractal · 2020-04-01T20:36:37Z

...plugin/analytics/src/main/java/org/elasticsearch/xpack/analytics/ttest/PairedTTestState.java

+
+public class PairedTTestState implements TTestState {
+
+    public static final String NAME = "P";


polyfractal · 2020-04-01T20:38:36Z

...plugin/analytics/src/main/java/org/elasticsearch/xpack/analytics/ttest/PairedTTestState.java

+        states.forEach(tTestState -> {
+            PairedTTestState state = (PairedTTestState) tTestState;
+            reducer.accept(state.stats);
+            assert state.tails == tails;


Is it possible for this to ever not match in practice (I mean, I know it shouldn't hence the assertion, but...)? I wonder if we should actually throw an exception rather than return a really incorrect result if we ever get this messed up?
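One way to read this suggestion is sketched below: the check runs unconditionally and throws, instead of relying on an `assert` that is skipped when assertions are disabled. `SketchPairedState`, its `sum` field, and the exception message are hypothetical; the real state carries richer statistics and reduces over a stream.

```java
// Hypothetical, simplified stand-in for a paired t-test state.
final class SketchPairedState {
    final int tails;   // number of tails configured for the test
    final double sum;  // stand-in for the accumulated statistics

    SketchPairedState(int tails, double sum) {
        this.tails = tails;
        this.sum = sum;
    }

    // Merges shard-level states and fails loudly if their settings disagree.
    SketchPairedState reduce(java.util.List<SketchPairedState> states) {
        double total = 0;
        for (SketchPairedState state : states) {
            if (state.tails != tails) {
                // Thrown even when JVM assertions are disabled.
                throw new IllegalStateException(
                    "Mismatched tails in reduce: expected " + tails + " but got " + state.tails);
            }
            total += state.sum;
        }
        return new SketchPairedState(tails, total);
    }
}
```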

polyfractal · 2020-04-01T20:50:47Z

...n/analytics/src/main/java/org/elasticsearch/xpack/analytics/ttest/PairedTTestAggregator.java

+        return new LeafBucketCollectorBase(sub, docAValues) {
+            @Override
+            public void collect(int doc, long bucket) throws IOException {
+                statsBuilder.grow(bigArrays, bucket + 1);


I think we can move this inside the conditionals below? Right now we'll end up growing the statsBuilder bigarray even if none of the documents end up satisfying the conditions (e.g. if we are unlucky and they all have only one of the two fields).

The size provided to grow() is the min size required, so it's ok to not call grow until we actually need a particular bucket ordinal (and then it will back-fill all the empty bucket ords essentially)
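The idea can be sketched with plain arrays standing in for the big arrays. `SketchStatsBuilder` and its fields are hypothetical, and a `null` argument models a document that lacks one of the two fields; the point is only that `grow` is reached solely when a document actually contributes to a bucket.

```java
// Hypothetical stand-in for the stats builder, using plain arrays instead of BigArrays.
final class SketchStatsBuilder {
    private long[] counts = new long[1];
    private double[] sums = new double[1];

    // Ensures capacity for at least minSize buckets; new slots are zero-filled.
    void grow(long minSize) {
        if (minSize > counts.length) {
            int newSize = (int) minSize;
            counts = java.util.Arrays.copyOf(counts, newSize);
            sums = java.util.Arrays.copyOf(sums, newSize);
        }
    }

    // Collects one document; null means the document has no value for that field.
    void collect(long bucket, Double a, Double b) {
        if (a != null && b != null) {
            grow(bucket + 1); // only reached when both values are present
            counts[(int) bucket]++;
            sums[(int) bucket] += a - b;
        }
    }

    long capacity() {
        return counts.length;
    }

    // Buckets beyond the current capacity have simply never been touched.
    long count(long bucket) {
        return bucket < counts.length ? counts[(int) bucket] : 0;
    }
}
```

With this shape, a document that has only one of the two fields leaves the arrays untouched, and the first complete document for a higher bucket ordinal back-fills the empty slots in between.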

polyfractal · 2020-04-01T20:56:13Z

...ck/plugin/analytics/src/main/java/org/elasticsearch/xpack/analytics/ttest/TStatsBuilder.java

+    }
+
+    public void grow(BigArrays bigArrays, long buckets) {
+        counts = bigArrays.grow(counts, buckets);


TBH, I'm not sure how expensive grow() is... I don't think it's very expensive, so it might not matter. But some aggs that have a lot of big arrays to manage will call the BigArrays#overSize() method directly and then resize each of their arrays, instead of grow'ing each.

StatsAggregator is a good example: https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/search/aggregations/metrics/StatsAggregator.java#L90-L95
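A rough sketch of that pattern, using plain arrays in place of big arrays: the oversized target is computed once and every array is resized to it. `SketchOversizeStats` and its `overSize` policy (about 12.5% headroom) are assumptions for illustration and do not reproduce the real BigArrays growth policy.

```java
// Hypothetical stand-in for an aggregator that manages several parallel arrays.
final class SketchOversizeStats {
    private long[] counts = new long[1];
    private double[] sums = new double[1];
    private double[] sumsOfSqrs = new double[1];

    // Assumed growth policy for the sketch only: minimum size plus one eighth.
    static int overSize(int minTargetSize) {
        return minTargetSize + (minTargetSize >>> 3);
    }

    // Computes the new size once, then resizes every array to that same size.
    private void ensureBucket(int bucket) {
        if (bucket >= counts.length) {
            int newSize = overSize(bucket + 1);
            counts = java.util.Arrays.copyOf(counts, newSize);
            sums = java.util.Arrays.copyOf(sums, newSize);
            sumsOfSqrs = java.util.Arrays.copyOf(sumsOfSqrs, newSize);
        }
    }

    void add(int bucket, double value) {
        ensureBucket(bucket);
        counts[bucket]++;
        sums[bucket] += value;
        sumsOfSqrs[bucket] += value * value;
    }

    int capacity() {
        return counts.length;
    }

    long count(int bucket) {
        return counts[bucket];
    }

    double sum(int bucket) {
        return sums[bucket];
    }
}
```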

imotov · 2020-04-02T14:23:44Z

@elasticmachine update branch

imotov · 2020-04-02T15:32:19Z

@elasticmachine run elasticsearch-ci/default-distro
@elasticmachine run elasticsearch-ci/docs

imotov · 2020-04-02T16:41:36Z

@elasticmachine run elasticsearch-ci/default-distro

imotov · 2020-04-02T17:15:32Z

@elasticmachine update branch

polyfractal

LGTM! <3

imotov · 2020-04-03T14:06:53Z

@elasticmachine update branch

Add Student's t-test aggregation support

70c36a2

imotov requested a review from polyfractal March 30, 2020 22:23

imotov added :Analytics/Aggregations Aggregations >feature v7.8.0 v8.0.0 labels Mar 30, 2020

imotov added 2 commits March 30, 2020 18:36

Fix docs

e68c1ad

More doc fixes

5c67b61

imotov mentioned this pull request Mar 31, 2020

Transform support for t_test #54503

Open

imotov and others added 2 commits March 31, 2020 10:10

Fix testAggregationsVsTransforms

c9b5d1a

Merge remote-tracking branch 'elastic/master' into issue-53692-t-test

e294c56

polyfractal reviewed Apr 1, 2020

Address review comments and rename TStatsBuilder to TTestStatsBuilder

cf4e239

elasticmachine and others added 2 commits April 2, 2020 10:23

Merge branch 'master' into issue-53692-t-test

ca94074

Address more review comments

d6f3820

Merge branch 'master' into issue-53692-t-test

9d48358

polyfractal approved these changes Apr 2, 2020

Remove pipelines parameters after master merge

33cfe91

Merge branch 'master' into issue-53692-t-test

5d52bfc

imotov merged commit 5fc9fc5 into elastic:master Apr 3, 2020

imotov added the backport pending label Apr 3, 2020

imotov removed the backport pending label Apr 6, 2020

imotov deleted the issue-53692-t-test branch May 1, 2020 22:22

russcam mentioned this pull request May 29, 2020

7.8.0 Meta ticket elastic/elasticsearch-net#4718

Closed

17 tasks

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
