Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Add feature importance summary charts #78238

Merged
merged 25 commits into from
Oct 1, 2020

Conversation

qn895
Copy link
Member

@qn895 qn895 commented Sep 23, 2020

Summary

Part of #76039. This PR adds the feature importance summary charts to the

Binary classification

Screen Shot 2020-09-22 at 6 49 33 PM

Multi-class classification

Screen Shot 2020-09-23 at 9 43 49 AM

Regression

Screen Shot 2020-09-22 at 7 00 37 PM

Checklist

Delete any items that are not applicable to this PR.

@qn895 qn895 added v8.0.0 Feature:Data Frame Analytics ML data frame analytics features v7.10.0 labels Sep 23, 2020
@qn895 qn895 self-assigned this Sep 23, 2020
@qn895 qn895 changed the title [ML] WIP - Add feature importance summary charts [ML] Add feature importance summary charts Sep 23, 2020
@qn895 qn895 marked this pull request as ready for review September 23, 2020 15:09
@qn895 qn895 requested a review from a team as a code owner September 23, 2020 15:09
@qn895 qn895 added the :ml label Sep 23, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/ml-ui (:ml)

…error handling with getToastNotificationService.
@qn895
Copy link
Member Author

qn895 commented Sep 23, 2020

@elasticmachine merge upstream

(model) => model.metadata?.analytics_config?.id === jobId
);
if (
Array.isArray(inferenceModel?.metadata?.total_feature_importance) === true &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit - I don't think you really need the === true after the isArray check as the isArray method is pretty clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reasons TS needs the explicit check here to make the second part of the if statement valid. Not sure why but I will leave it for now if that's okay.

@alvarezmelissa87
Copy link
Contributor

Great addition! Code LGTM overall - just want to give it a good local test. Will do first thing but feel free to merge if you get the other green checks before then.

Copy link
Contributor

@peteharverson peteharverson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just left a couple of extra comments.

<span>
<FormattedMessage
id="xpack.ml.dataframe.analytics.exploration.featureImportanceSummaryTitle"
defaultMessage="Feature importance summary"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change this heading to Overall feature importance

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated here 6a3c0d8

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's "total_feature_importance" in the API should it match as "Total feature importance" in the title here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lcawl Thanks I updated it here to "Total feature importance" ab8a750

…ance-summary

# Conflicts:
#	x-pack/plugins/ml/public/application/data_frame_analytics/pages/analytics_exploration/components/classification_exploration/classification_exploration.tsx
#	x-pack/plugins/ml/public/application/data_frame_analytics/pages/analytics_exploration/components/exploration_page_wrapper/exploration_page_wrapper.tsx
#	x-pack/plugins/ml/public/application/data_frame_analytics/pages/analytics_exploration/components/regression_exploration/regression_exploration.tsx
@walterra
Copy link
Contributor

Great work overall! I noticed one thing with the multi-class chart: I seems that one isn't sorted properly on the overall value of all classes, you can also see it in the screenshot of this PRs description.

summary: ClassificationTotalFeatureImportance | RegressionTotalFeatureImportance
): summary is RegressionTotalFeatureImportance {
return (summary as RegressionTotalFeatureImportance).importance !== undefined;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have feature_importance.ts in the same directory, can we move all additions to this file to feature_importances.ts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated here e18bc41

Comment on lines 70 to 71
analysisType === ANALYSIS_CONFIG_TYPE.CLASSIFICATION ||
analysisType === ANALYSIS_CONFIG_TYPE.REGRESSION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In plugins/ml/common/util/analytics_utils.ts we have isClassificationAnalysis and isRegressionAnalysis which we could use here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated it to using the util functions here e18bc41

</EuiFlexGroup>
</div>
<Chart
className="story-chart"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this className was copied from the examples and is not needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that. Updated here e18bc41

@qn895
Copy link
Member Author

qn895 commented Sep 29, 2020

Great work overall! I noticed one thing with the multi-class chart: I seems that one isn't sorted properly on the overall value of all classes, you can also see it in the screenshot of this PRs description.

You're right @walterra. I made some modifications here da48b1a to sort the class by the overall sum mean_magnitude. It should look like this now:

Screen Shot 2020-09-29 at 9 31 44 AM

I also noticed the bars aren't always stacked from largest to smallest from left to right. This seems to be a chart issue so I'm going to raise an issue with the team to look into that. Thanks again for catching that.

@walterra
Copy link
Contributor

Great update! I think the chart is fine now as is, the order of colors should always be the same for each stacked bar.

}

// sort from largest importance at top to smallest importance at bottom
sortedData = sortedData.sort((a, b) => b.meanImportance - a.meanImportance);
// sortedData = sortedData.sort((a, b) => b.meanImportance - a.meanImportance);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess this can be cleaned up, remove the commented code and move the text comment above where the sorting is now happening.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I removed it here b73325b.

Copy link
Contributor

@walterra walterra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest changes LGTM

Copy link
Contributor

@alvarezmelissa87 alvarezmelissa87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest changes LGTM ⚡

}

// only show legend if it's a multiclass
const _showLegend = classificationType === 'multiclass_classification';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For classification jobs where the numbers of classes is higher than the number of features, I am seeing some truncation of the legend labels. Can we fix this with styling on our side, or does it need a fix on the chart?

image

I also see this with a job on the mushroom data set with this config:

{
  "id": "mushroom_cap_color_classification2",
  "description": "10 top feature importance values",
  "source": {
    "index": [
      "mushroom"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "mushroom_cap_color_classification2",
    "results_field": "ml"
  },
  "analysis": {
    "classification": {
      "dependent_variable": "cap-color",
      "num_top_feature_importance_values": 10,
      "class_assignment_objective": "maximize_minimum_recall",
      "num_top_classes": 2,
      "prediction_field_name": "cap-color_prediction",
      "training_percent": 8,
      "randomize_seed": 2195820625972847400
    }
  },
  "analyzed_fields": {
    "includes": [
      "bruises",
      "cap-color",
      "cap-shape",
      "cap-surface",
      "edibility",
      "gill-attachment",
      "gill-color",
      "gill-size",
      "gill-spacing",
      "habitat",
      "odor",
      "population",
      "ring-number",
      "ring-type",
      "spore-print-color",
      "stalk-color-above-ring",
      "stalk-color-below-ring",
      "stalk-root",
      "stalk-shape",
      "stalk-surface-above-ring",
      "stalk-surface-below-ring",
      "veil-color",
      "veil-type"
    ],
    "excludes": []
  },
  "model_memory_limit": "41mb",
  "create_time": 1601285932889,
  "version": "8.0.0",
  "allow_lazy_start": false,
  "max_num_threads": 1
}

Copy link
Member Author

@qn895 qn895 Sep 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated it to give some buffer spacing here 17cfd1d

Unfortunately it's not dynamic so if the label texts are short, there will be some space on the right and vice versa it will still be truncated if the texts are long. Let me know if you're good with the change! Or if there's anything we can improve as far the legend goes, we can do it in a follow up PR.

Copy link
Contributor

@peteharverson peteharverson Oct 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good solution. Shame it isn't dynamic, but it fixes the issue I was seeing where it was truncating the labels after the first character. For long labels the majority of the text should be visible so the user can easily identify which color goes with each class.

image

…icer

[ML] Update legend for multi class total feature importance to look nicer
@qn895
Copy link
Member Author

qn895 commented Oct 1, 2020

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

@kbn/optimizer bundle module count

id value diff baseline
ml 1188 +2 1186

@kbn/ui-shared-deps asset size

id value diff baseline
css 643.8KB ⚠️ +643.8KB -
[email protected] 2.9MB ⚠️ +2.9MB -
kbn-ui-shared-deps.js 5.2MB ⚠️ +5.2MB -
total ⚠️ +8.7MB

async chunks size

id value diff baseline
ml 10.5MB +35.6KB 10.5MB

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@peteharverson peteharverson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested latest changes and LGTM

@qn895 qn895 merged commit 31efa1a into elastic:master Oct 1, 2020
@qn895 qn895 deleted the ml-feature-importance-summary branch October 1, 2020 14:39
qn895 added a commit to qn895/kibana that referenced this pull request Oct 1, 2020
qn895 added a commit that referenced this pull request Oct 1, 2020
Co-authored-by: Elastic Machine <[email protected]>

Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants