Add Student's t-test aggregation support #54469

Merged on Apr 3, 2020 (11 commits)
35 changes: 35 additions & 0 deletions docs/build.gradle
Expand Up @@ -539,6 +539,41 @@ for (int i = 0; i < 100; i++) {
{"load_time": "$value"}"""
}

// Used by t_test aggregations
buildRestTests.setups['node_upgrade'] = '''
  - do:
      indices.create:
        index: node_upgrade
        body:
          settings:
            number_of_shards: 1
            number_of_replicas: 1
          mappings:
            properties:
              name:
                type: keyword
              startup_time_before:
                type: long
              startup_time_after:
                type: long
  - do:
      bulk:
        index: node_upgrade
        refresh: true
        body: |
          {"index":{}}
          {"name": "A", "startup_time_before": 102, "startup_time_after": 89}
          {"index":{}}
          {"name": "B", "startup_time_before": 99, "startup_time_after": 93}
          {"index":{}}
          {"name": "C", "startup_time_before": 111, "startup_time_after": 72}
          {"index":{}}
          {"name": "D", "startup_time_before": 97, "startup_time_after": 98}
          {"index":{}}
          {"name": "E", "startup_time_before": 101, "startup_time_after": 102}
          {"index":{}}
          {"name": "F", "startup_time_before": 99, "startup_time_after": 98}'''

// Used by iprange agg
buildRestTests.setups['iprange'] = '''
- do:
2 changes: 1 addition & 1 deletion docs/reference/aggregations/metrics.asciidoc
Expand Up @@ -49,7 +49,7 @@ include::metrics/median-absolute-deviation-aggregation.asciidoc[]

include::metrics/boxplot-aggregation.asciidoc[]


include::metrics/t-test-aggregation.asciidoc[]



114 changes: 114 additions & 0 deletions docs/reference/aggregations/metrics/t-test-aggregation.asciidoc
@@ -0,0 +1,114 @@
[role="xpack"]
[testenv="basic"]
[[search-aggregations-metrics-ttest-aggregation]]
=== TTest Aggregation

A `t_test` metrics aggregation performs a statistical hypothesis test in which the test statistic follows a Student's t-distribution
under the null hypothesis, using numeric values extracted from the aggregated documents or generated by provided scripts. In practice, this
tells you whether the difference between two population means is statistically significant or could plausibly have occurred by chance alone.

Review comment (Contributor):

Should perhaps add a layman explanation afterwards? Maybe something like "In practice, this will tell you if the difference between two population means are statistically significant", or something like that?

==== Syntax

A `t_test` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "t_test": {
    "a": "value_before",
    "b": "value_after",
    "type": "paired"
  }
}
--------------------------------------------------
// NOTCONSOLE

Assuming that we have a record of node startup times before and after an upgrade, let's use a t-test to see whether the upgrade affected
the node startup time in a meaningful way.

[source,console]
--------------------------------------------------
GET node_upgrade/_search
{
  "size": 0,
  "aggs" : {
    "startup_time_ttest" : {
      "t_test" : {
        "a" : {"field": "startup_time_before"}, <1>
        "b" : {"field": "startup_time_after"}, <2>
        "type": "paired" <3>
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:node_upgrade]
<1> The field `startup_time_before` must be a numeric field
<2> The field `startup_time_after` must be a numeric field
<3> Since we have data from the same nodes, we are using a paired t-test.

The response returns the p-value (probability value) for the test. It is the probability of obtaining results at least as extreme as
the result processed by the aggregation, assuming that the null hypothesis is correct (that is, there is no difference between the
population means). A smaller p-value is stronger evidence against the null hypothesis, suggesting that the population means do differ.

[source,console-result]
--------------------------------------------------
{
  ...

  "aggregations": {
    "startup_time_ttest": {
      "value": 0.1914368843365979 <1>
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
<1> The p-value.
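
As a cross-check on the p-value above, the paired test can be worked by hand on the six documents from the `node_upgrade` setup. The following is a minimal Python sketch, not the aggregation's implementation. The function names are hypothetical, and the p-value helper assumes exactly 5 degrees of freedom (six pairs), for which the Student's t CDF has a simple closed form.

```python
import math

# Sample data copied from the `node_upgrade` test setup.
before = [102, 99, 111, 97, 101, 99]
after = [89, 93, 72, 98, 102, 98]


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test on equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased sample variance of the per-pair differences.
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    return mean_diff / math.sqrt(variance / n), n - 1


def two_sided_p_value_df5(t):
    """Two-sided p-value for Student's t with exactly 5 degrees of freedom.

    Hypothetical helper: uses the closed form of the t CDF for odd degrees
    of freedom, specialised to df = 5. It is not valid for any other df.
    """
    theta = math.atan(abs(t) / math.sqrt(5))
    central = (2 / math.pi) * (
        theta + math.sin(theta) * (math.cos(theta) + (2 / 3) * math.cos(theta) ** 3)
    )
    return 1 - central


t, df = paired_t_statistic(before, after)
p = two_sided_p_value_df5(t)
print(f"t = {t:.4f}, df = {df}, p = {p:.4f}")
```

Under these assumptions the computed p-value should agree closely with the `value` in the response. For other sample sizes, a general t-distribution CDF (for example from a statistics library) would be needed instead of the df = 5 shortcut.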

==== T-Test Types

The `t_test` aggregation supports unpaired and paired two-sample t-tests. The type of the test can be specified using the `type` parameter:

`"type": "paired"`:: performs a paired t-test
`"type": "homoscedastic"`:: performs a two-sample equal variance test
`"type": "heteroscedastic"`:: performs a two-sample unequal variance test (this is the default)
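
To illustrate how the two unpaired variants differ, here is a short Python sketch of the textbook formulas: the pooled-variance (Student) statistic for the equal variance case and the Welch statistic with Welch-Satterthwaite degrees of freedom for the unequal variance case. It assumes these standard definitions rather than describing the aggregation's internals, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

# Sample data copied from the `node_upgrade` test setup.
before = [102, 99, 111, 97, 101, 99]
after = [89, 93, 72, 98, 102, 98]


def homoscedastic_t(a, b):
    """Equal variance two-sample test: returns (t, degrees_of_freedom)."""
    na, nb = len(a), len(b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def heteroscedastic_t(a, b):
    """Unequal variance (Welch) test: returns (t, approximate degrees_of_freedom)."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation; generally not an integer.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


t_equal, df_equal = homoscedastic_t(before, after)
t_welch, df_welch = heteroscedastic_t(before, after)
print(f"homoscedastic:   t = {t_equal:.4f}, df = {df_equal}")
print(f"heteroscedastic: t = {t_welch:.4f}, df = {df_welch:.4f}")
```

Because both samples here have the same size, the two statistics coincide and only the degrees of freedom differ, which in turn changes the p-value. With unequal sample sizes the statistics themselves would generally differ as well.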

==== Script

The `t_test` metric supports scripting. For example, if we need to adjust the startup times recorded before the upgrade, we could use
a script to recalculate them on the fly:

[source,console]
--------------------------------------------------
GET node_upgrade/_search
{
  "size": 0,
  "aggs" : {
    "startup_time_ttest" : {
      "t_test" : {
        "a": {
          "script" : {
            "lang": "painless",
            "source": "doc['startup_time_before'].value - params.adjustment", <1>
            "params" : {
              "adjustment" : 10 <2>
            }
          }
        },
        "b": {
          "field": "startup_time_after" <3>
        },
        "type": "paired"
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:node_upgrade]

<1> The `field` parameter is replaced with a `script` parameter, which uses the
script to generate the values on which the t-test is calculated
<2> Scripting supports parameterized input just like any other script
<3> We can mix scripts and fields
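
To see what the scripted adjustment does to the test, the sketch below applies the same subtraction in plain Python to the `node_upgrade` sample data and recomputes the paired t statistic. It only mirrors the arithmetic of the Painless script. The helper name `paired_t` is hypothetical, and no claim is made about the p-value the aggregation returns for this request.

```python
import math

# Sample data copied from the `node_upgrade` test setup.
before = [102, 99, 111, 97, 101, 99]
after = [89, 93, 72, 98, 102, 98]
adjustment = 10  # mirrors params.adjustment in the script above

# Same arithmetic as doc['startup_time_before'].value - params.adjustment.
adjusted_before = [value - adjustment for value in before]


def paired_t(a, b):
    """Paired t statistic for two equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    return mean_diff / math.sqrt(variance / n)


t_original = paired_t(before, after)
t_adjusted = paired_t(adjusted_before, after)
print(f"original t = {t_original:.4f}, adjusted t = {t_adjusted:.4f}")
```

Subtracting a constant shifts the mean of the paired differences but leaves their variance unchanged, so only the numerator of the statistic moves.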

2 changes: 2 additions & 0 deletions x-pack/plugin/analytics/build.gradle
Expand Up @@ -18,6 +18,8 @@ dependencies {

compileOnly project(path: xpackModule('core'), configuration: 'default')
testCompile project(path: xpackModule('core'), configuration: 'testArtifacts')

compile 'org.apache.commons:commons-math3:3.2'
}

integTest.enabled = false
@@ -0,0 +1 @@
ec2544ab27e110d2d431bdad7d538ed509b21e62