[DOCS] Add total feature importance

elastic · Sep 28, 2020 · b4efe00
1 parent a5ffb01
commit b4efe00

Showing 3 changed files with 47 additions and 22 deletions.
diff --git a/docs/en/stack/ml/df-analytics/images/flights-regression-decision-plot.png b/docs/en/stack/ml/df-analytics/images/flights-regression-decision-plot.png
diff --git a/docs/en/stack/ml/df-analytics/images/flights-regression-total-importance.png b/docs/en/stack/ml/df-analytics/images/flights-regression-total-importance.png
diff --git a/docs/en/stack/ml/df-analytics/ml-feature-importance.asciidoc b/docs/en/stack/ml/df-analytics/ml-feature-importance.asciidoc
@@ -5,32 +5,57 @@
 experimental::[]
 
 {feat-imp-cap} values indicate which fields had the biggest impact on each 
-prediction that is generated by <<dfa-classification,{classification}>> or 
-<<dfa-regression,{regression}>> analysis. The features of the data points are 
-responsible for a particular prediction to varying degrees. {feat-imp-cap} shows 
-to what degree a given feature of a data point contributes to the prediction. 
-The {feat-imp} value can be either positive or negative depending on its effect 
-on the prediction. If the feature reduces the prediction value, the {feat-imp} 
-is negative, if it increases the prediction, then the {feat-imp} is positive. 
-The magnitude of {feat-imp} shows how significantly the feature affects the 
-prediction for a given data point.
+prediction that is generated by {classification} or {regression} analysis. Each
+{feat-imp} value has both a magnitude and a direction (positive or negative),
+which indicate how each field (or _feature_ of a data point) affects a
+particular prediction.
 
-{feat-imp-cap} in the {stack} is calculated using the SHAP (SHapley Additive 
-exPlanations) method as described in
-https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf[Lundberg, S. M., & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In NeurIPS 2017].
+The purpose of {feat-imp} is to help you determine whether the predictions are
+sensible. Is the relationship between the dependent variable and the important
+features supported by your domain knowledge? The lessons you learn about the
+importance of specific features might also affect your decision to include them
+in future iterations of your trained model.
+
+You can see the average magnitude of the {feat-imp} values for each field across
+all the training data in {kib} or by using the
+{ref}/get-inference.html[get trained model API]. For example:
+
+[role="screenshot"]
+image::images/flights-regression-total-importance.png["Total {feat-imp} values for a {regression} {dfanalytics-job} in {kib}"]
+
+You can also examine the {feat-imp} values for each individual prediction. In
+{kib}, you can see these values in JSON objects or decision plots:
+
+[role="screenshot"]
+image::images/flights-regression-decision-plot.png["Feature importance values for a {regression} {dfanalytics-job} in {kib}"]
 
-By default, {feat-imp} values are not calculated when you configure the job via 
-the API. To generate this information, when you create a {dfanalytics-job} you 
-must specify the `num_top_feature_importance_values` property. When you 
-configure the job in {kib}, {feat-imp} values are calculated automatically. The 
-{feat-imp} values are stored in the {ml} results field for each document in the 
-destination index.
+For {reganalysis}, each decision plot starts at a shared baseline, which is
+the average of the prediction values for all the data points in the training
+data set. When you add all of the feature importance values for a particular
+data point to that baseline, you arrive at the numeric prediction value. If a 
+{feat-imp} value is negative, it reduces the prediction value. If a {feat-imp}
+value is positive, it increases the prediction value.
 
-NOTE: The number of {feat-imp} values for each document might be less than the 
-`num_top_feature_importance_values` property value. For example, it returns only 
-features that had a positive or negative effect on the prediction.
+//TBD: Add section about classification analysis.
+
+By default, {feat-imp} values are not calculated. To generate them, specify the
+`num_top_feature_importance_values` property when you create a
+{dfanalytics-job}. For an example, see
+<<flightdata-regression>>.
+//and <<flightdata-classification>>.
+
+The {feat-imp} values are stored in the {ml} results field for each document in
+the destination index. The number of {feat-imp} values for each document might
+be less than the `num_top_feature_importance_values` property value. For example,
+only features that had a positive or negative effect on the prediction are
+returned.
 
 [[ml-feature-importance-readings]]
 == Further reading
 
-https://www.elastic.co/blog/feature-importance-for-data-frame-analytics-with-elastic-machine-learning[{feat-imp-cap} for {dfanalytics} with Elastic {ml}]
+{feat-imp-cap} in the {stack} is calculated using the SHAP (SHapley Additive 
+exPlanations) method as described in
+https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf[Lundberg, S. M., & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In NeurIPS 2017].
+
+See also
+https://www.elastic.co/blog/feature-importance-for-data-frame-analytics-with-elastic-machine-learning[{feat-imp-cap} for {dfanalytics} with Elastic {ml}].
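
The additive relationship described in the new {reganalysis} paragraph (baseline plus the per-feature values gives the prediction) and the "average magnitude" behind the total {feat-imp} chart can be illustrated with a small sketch. This is not code from the commit or from the {stack}: the function names, the `feature_name`/`importance` document shape, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def reconstruct_prediction(baseline, feature_importance):
    """Add every feature importance value for one data point to the baseline.

    `feature_importance` is assumed (hypothetically) to be a list of dicts
    with `feature_name` and `importance` keys.
    """
    return baseline + sum(item["importance"] for item in feature_importance)


def mean_importance_magnitude(documents):
    """Average the absolute importance of each feature across all documents.

    A feature that is absent from a document is treated as contributing 0,
    which is an assumption of this sketch.
    """
    totals = defaultdict(float)
    for importances in documents:
        for item in importances:
            totals[item["feature_name"]] += abs(item["importance"])
    count = len(documents)
    return {name: total / count for name, total in totals.items()}


# Hypothetical values: a baseline of 50.0 and two data points.
baseline = 50.0
documents = [
    [
        {"feature_name": "FlightTimeMin", "importance": 30.0},
        {"feature_name": "DistanceKilometers", "importance": -5.0},
    ],
    [
        {"feature_name": "FlightTimeMin", "importance": -10.0},
    ],
]

prediction = reconstruct_prediction(baseline, documents[0])
totals = mean_importance_magnitude(documents)
print(prediction)
print(totals)
```

Positive values push the result above the baseline and negative values pull it below, matching the description of the decision plot. If `num_top_feature_importance_values` is smaller than the number of features that affected a prediction, a sum over the returned values would presumably only approximate the prediction.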