Add Student's t-test aggregation support #54469
Merged
Commits (11):
- 70c36a2 Add Student's t-test aggregation support (imotov)
- e68c1ad Fix docs (imotov)
- 5c67b61 More doc fixes (imotov)
- c9b5d1a Fix testAggregationsVsTransforms (imotov)
- e294c56 Merge remote-tracking branch 'elastic/master' into issue-53692-t-test (jasontedor)
- cf4e239 Address review comments and rename TStatsBuilder to TTestStatsBuilder (imotov)
- ca94074 Merge branch 'master' into issue-53692-t-test (elasticmachine)
- d6f3820 Address more review comments (imotov)
- 9d48358 Merge branch 'master' into issue-53692-t-test (elasticmachine)
- 33cfe91 Remove pipelines parameters after master merge (imotov)
- 5d52bfc Merge branch 'master' into issue-53692-t-test (elasticmachine)
docs/reference/aggregations/metrics/t-test-aggregation.asciidoc (111 additions, 0 deletions)
@@ -0,0 +1,111 @@
[role="xpack"]
[testenv="basic"]
[[search-aggregations-metrics-ttest-aggregation]]
=== T-Test Aggregation

A `t_test` metrics aggregation performs a statistical hypothesis test in which the test statistic follows a Student's t-distribution
under the null hypothesis. It runs on numeric values extracted from the aggregated documents or generated by provided scripts.

==== Syntax

A `t_test` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "t_test": {
    "a": "value_before",
    "b": "value_after",
    "type": "paired"
  }
}
--------------------------------------------------
// NOTCONSOLE

Assuming that we have a record of node startup times before
and after an upgrade, let's run a t-test to see whether the upgrade affected
the node startup time in a meaningful way.

[source,console]
--------------------------------------------------
GET node_upgrade/_search
{
  "size": 0,
  "aggs": {
    "startup_time_ttest": {
      "t_test": {
        "a": { "field": "startup_time_before" }, <1>
        "b": { "field": "startup_time_after" },  <2>
        "type": "paired"                          <3>
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:node_upgrade]
<1> The field `startup_time_before` must be a numeric field.
<2> The field `startup_time_after` must be a numeric field.
<3> Since we have data from the same nodes, we are using a paired t-test.

The response will look like this:

[source,console-result]
--------------------------------------------------
{
  ...

  "aggregations": {
    "startup_time_ttest": {
      "value": 0.1914368843365979
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]

Review comment on `"value"`: Should we state what "value" is? E.g. is it the p-value?
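A paired t-test reduces to a one-sample t-test on the per-pair differences `a - b`. The Python sketch below (standard library only) computes the t statistic, the degrees of freedom, and a two-tailed p-value from the Student's t-distribution, using the identity p = I_{ν/(ν+t²)}(ν/2, 1/2), where I is the regularized incomplete beta function. It is an illustration under our own assumptions, not the aggregation's implementation: the function names and sample startup times are hypothetical, and it assumes the returned `value` is a p-value, which this page does not state (nor whether it is one- or two-tailed).

```python
import math
import statistics

_TINY = 1e-300  # guards against division by zero in Lentz's method


def _beta_continued_fraction(a: float, b: float, x: float) -> float:
    """Continued-fraction part of I_x(a, b), evaluated with the modified Lentz method."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, 301):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        step = d * c
        h *= step
        if abs(step - 1.0) < 1e-15:
            break
    return h


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # Prefactor x^a (1 - x)^b / B(a, b), computed in log space for stability.
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    # The continued fraction converges fast below this point; use symmetry above it.
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_continued_fraction(a, b, x) / a
    return 1.0 - bt * _beta_continued_fraction(b, a, 1.0 - x) / b


def two_tailed_p_value(t: float, df: float) -> float:
    """P(|T| >= |t|) for a Student's t-distribution with df degrees of freedom."""
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def paired_t_test(before, after):
    """Hypothetical helper: returns (t, degrees of freedom, two-tailed p-value)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(before, after)]
    n = len(diffs)
    # Raises ZeroDivisionError if every difference is identical (zero spread).
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    df = n - 1
    return t, df, two_tailed_p_value(t, df)


# Hypothetical startup times for the same five nodes, before and after an upgrade.
before = [102.0, 98.5, 110.2, 95.0, 101.3]
after = [99.1, 97.0, 104.8, 96.2, 98.0]
t, df, p = paired_t_test(before, after)
print(f"t={t:.3f} df={df} p={p:.4f}")
```

The incomplete-beta route is used here because it handles the non-integer degrees of freedom that an unequal-variance test produces; with SciPy available, `scipy.stats.ttest_rel` would be the usual shortcut.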

==== T-Test Types

The `t_test` aggregation supports unpaired and paired two-sample t-tests. The type of the test can be specified using the `type` parameter:

`"type": "paired"`:: performs a paired t-test
`"type": "homoscedastic"`:: performs a two-sample equal-variance test
`"type": "heteroscedastic"`:: performs a two-sample unequal-variance test (this is the default)
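For reference, the three test types correspond to the standard textbook statistics below, where \bar{a}, s_a^2 and n_a are the sample mean, sample variance and count of `a` (likewise for `b`), d_i = a_i - b_i are the per-pair differences, and \nu is the degrees of freedom. These are the usual definitions, not formulas taken from this pull request; the page does not say exactly how the aggregation computes them.

```latex
% paired
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \nu = n - 1

% homoscedastic (pooled variance)
s_p^2 = \frac{(n_a - 1)\, s_a^2 + (n_b - 1)\, s_b^2}{n_a + n_b - 2}, \qquad
t = \frac{\bar{a} - \bar{b}}{s_p \sqrt{1/n_a + 1/n_b}}, \qquad
\nu = n_a + n_b - 2

% heteroscedastic (Welch, with Welch-Satterthwaite degrees of freedom)
t = \frac{\bar{a} - \bar{b}}{\sqrt{s_a^2/n_a + s_b^2/n_b}}, \qquad
\nu = \frac{\left(s_a^2/n_a + s_b^2/n_b\right)^2}
           {\frac{(s_a^2/n_a)^2}{n_a - 1} + \frac{(s_b^2/n_b)^2}{n_b - 1}}
```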

==== Script

The `t_test` metric supports scripting. For example, if we need to adjust the before values, we could use
a script to recalculate them on the fly:

[source,console]
--------------------------------------------------
GET node_upgrade/_search
{
  "size": 0,
  "aggs": {
    "startup_time_ttest": {
      "t_test": {
        "a": {
          "script": {
            "lang": "painless",
            "source": "doc['startup_time_before'].value - params.adjustment", <1>
            "params": {
              "adjustment": 10 <2>
            }
          }
        },
        "b": {
          "field": "startup_time_after" <3>
        },
        "type": "paired"
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:node_upgrade]

<1> The `field` parameter is replaced with a `script` parameter, which uses the
script to generate the values on which the t-test is calculated
<2> Scripting supports parameterized input just like any other script
<3> We can mix scripts and fields
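For a paired test, subtracting a constant from every `a` value lowers each per-pair difference by exactly that constant, so the mean difference (the numerator of t) moves while the standard deviation of the differences stays the same. The Python sketch below illustrates this with hypothetical data; `adjust` is a hypothetical stand-in for what the Painless script does per document, not an Elasticsearch API.

```python
import statistics

# Hypothetical startup times for the same five nodes; illustrative only.
before = [102.0, 98.5, 110.2, 95.0, 101.3]
after = [99.1, 97.0, 104.8, 96.2, 98.0]


def adjust(values, adjustment):
    """Per-document equivalent of doc['startup_time_before'].value - params.adjustment."""
    return [v - adjustment for v in values]


def paired_differences(a, b):
    """Per-pair differences a_i - b_i that a paired t-test is computed on."""
    return [x - y for x, y in zip(a, b)]


raw = paired_differences(before, after)
shifted = paired_differences(adjust(before, 10), after)

# The mean difference drops by the adjustment ...
print(statistics.mean(raw) - statistics.mean(shifted))
# ... while the spread of the differences is unchanged.
print(statistics.stdev(raw), statistics.stdev(shifted))
```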

@@ -0,0 +1 @@
ec2544ab27e110d2d431bdad7d538ed509b21e62
Should we perhaps add a layman's explanation afterwards? Maybe something like "In practice, this will tell you whether the difference between two population means is statistically significant", or something like that?