[DOCS] Add total feature importance
lcawl committed Sep 28, 2020
1 parent a5ffb01 commit b4efe00
Showing 3 changed files with 47 additions and 22 deletions.
69 changes: 47 additions & 22 deletions docs/en/stack/ml/df-analytics/ml-feature-importance.asciidoc
@@ -5,32 +5,57 @@
experimental::[]

{feat-imp-cap} values indicate which fields had the biggest impact on each
prediction that is generated by <<dfa-classification,{classification}>> or
<<dfa-regression,{regression}>> analysis. The features of the data points are
responsible for a particular prediction to varying degrees. {feat-imp-cap} shows
to what degree a given feature of a data point contributes to the prediction.
The {feat-imp} value can be either positive or negative depending on its effect
on the prediction. If the feature reduces the prediction value, the {feat-imp}
is negative, if it increases the prediction, then the {feat-imp} is positive.
The magnitude of {feat-imp} shows how significantly the feature affects the
prediction for a given data point.
prediction that is generated by {classification} or {regression} analysis. Each
{feat-imp} value has both a magnitude and a direction (positive or negative),
which indicate how each field (or _feature_ of a data point) affects a
particular prediction.

{feat-imp-cap} in the {stack} is calculated using the SHAP (SHapley Additive
exPlanations) method as described in
https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf[Lundberg, S. M., & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In NeurIPS 2017].
The purpose of {feat-imp} is to help you determine whether the predictions are
sensible. Is the relationship between the dependent variable and the important
features supported by your domain knowledge? The lessons you learn about the
importance of specific features might also affect your decision to include them
in future iterations of your trained model.

You can see the average magnitude of the {feat-imp} values for each field across
all the training data in {kib} or by using the
{ref}/get-inference.html[get trained model API]. For example:

[role="screenshot"]
image::images/flights-regression-total-importance.png["Total {feat-imp} values for a {regression} {dfanalytics-job} in {kib}"]
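
As a rough illustration of what "average magnitude" means here, the following sketch averages the absolute {feat-imp} values per field over a handful of documents. The field names and numbers are made up for illustration, and treating a feature that is missing from a document as contributing zero is an assumption of this sketch, not a statement about how the {stack} computes the totals.

```python
from collections import defaultdict

# Hypothetical per-document feature importance values (illustrative only).
docs = [
    {"FlightTimeMin": 12.0, "DistanceKilometers": -3.0},
    {"FlightTimeMin": -8.0, "DistanceKilometers": 5.0},
    {"FlightTimeMin": 10.0},
]


def mean_magnitude(per_doc_importance):
    """Average the absolute importance of each feature across all documents.

    Assumption: a feature absent from a document counts as zero for it.
    """
    totals = defaultdict(float)
    for importance in per_doc_importance:
        for feature, value in importance.items():
            totals[feature] += abs(value)  # magnitude only, sign is dropped
    count = len(per_doc_importance)
    return {feature: total / count for feature, total in totals.items()}


result = mean_magnitude(docs)
print(result)
```

Because the sign is discarded, a field that pushes some predictions up and others down still ranks as important overall.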

You can also examine the feature importance values for each individual
prediction. In {kib}, you can see these values in JSON objects or decision plots:

[role="screenshot"]
image::images/flights-regression-decision-plot.png["Feature importance values for a {regression} {dfanalytics-job} in {kib}"]

By default, {feat-imp} values are not calculated when you configure the job via
the API. To generate this information, when you create a {dfanalytics-job} you
must specify the `num_top_feature_importance_values` property. When you
configure the job in {kib}, {feat-imp} values are calculated automatically. The
{feat-imp} values are stored in the {ml} results field for each document in the
destination index.
For {reganalysis}, each decision plot starts at a shared baseline, which is
the average of the prediction values for all the data points in the training
data set. When you add all of the feature importance values for a particular
data point to that baseline, you arrive at the numeric prediction value. If a
{feat-imp} value is negative, it reduces the prediction value. If a {feat-imp}
value is positive, it increases the prediction value.
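
The additive relationship described above can be sketched in a few lines. The baseline, field names, and values below are hypothetical; the only thing the example takes from the text is that the baseline plus the sum of all {feat-imp} values for a data point gives its prediction.

```python
def reconstruct_prediction(baseline, feature_importance):
    """Add every feature importance value to the shared baseline."""
    return baseline + sum(feature_importance.values())


# Hypothetical numbers for a single data point (illustrative only).
baseline = 47.0  # average prediction over the training data set
feature_importance = {
    "FlightTimeMin": 12.5,       # positive: increases the prediction
    "DistanceKilometers": -4.0,  # negative: reduces the prediction
    "Cancelled": 0.5,
}

prediction = reconstruct_prediction(baseline, feature_importance)
print(prediction)
```

If only some of the {feat-imp} values are stored for a document, the sum covers only those values and might not match the stored prediction exactly.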

NOTE: The number of {feat-imp} values for each document might be less than the
`num_top_feature_importance_values` property value. For example, it returns only
features that had a positive or negative effect on the prediction.
//TBD: Add section about classification analysis.

By default, {feat-imp} values are not calculated. To generate this information,
when you create a {dfanalytics-job} you must specify the
`num_top_feature_importance_values` property. For example, see
<<flightdata-regression>>.
//and <<flightdata-classification>>.
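
As a minimal sketch, the request body for a {regression} job that asks for {feat-imp} values might be assembled as follows. The helper name, index names, and dependent variable are placeholders chosen for illustration; only the `num_top_feature_importance_values` property comes from the text above. The resulting JSON would be sent to the create {dfanalytics-job} API.

```python
import json


def build_regression_job_body(source_index, dest_index, dependent_variable, num_top_values):
    """Build a job body that requests feature importance values.

    Hypothetical helper for illustration; it only assembles a dictionary.
    """
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "analysis": {
            "regression": {
                "dependent_variable": dependent_variable,
                # Without this property, no feature importance is calculated.
                "num_top_feature_importance_values": num_top_values,
            }
        },
    }


# Placeholder index and field names.
body = build_regression_job_body(
    "kibana_sample_data_flights", "df-flight-delays", "FlightDelayMin", 5
)
print(json.dumps(body, indent=2))
```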

The {feat-imp} values are stored in the {ml} results field for each document in
the destination index. The number of {feat-imp} values for each document might
be less than the `num_top_feature_importance_values` property value. For example,
it returns only features that had a positive or negative effect on the
prediction.

[[ml-feature-importance-readings]]
== Further reading

https://www.elastic.co/blog/feature-importance-for-data-frame-analytics-with-elastic-machine-learning[{feat-imp-cap} for {dfanalytics} with Elastic {ml}]
{feat-imp-cap} in the {stack} is calculated using the SHAP (SHapley Additive
exPlanations) method as described in
https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf[Lundberg, S. M., & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In NeurIPS 2017].

See also
https://www.elastic.co/blog/feature-importance-for-data-frame-analytics-with-elastic-machine-learning[{feat-imp-cap} for {dfanalytics} with Elastic {ml}].
