Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataGrid] Filtering performance: use Set() when applicable #10068

Closed
wants to merge 10 commits into from

Conversation

romgrk
Copy link
Contributor

@romgrk romgrk commented Aug 17, 2023

Last part of #9120

edit: Read the summary below, these results are outdated.

This PR improves performance by using Set where applicable.

Elapsed (ms)
Without Set 56.1 (+-2.5)
With Set 35.6 (+-1.8)

Test: 100,000 rows of the commodity dataset, filter one string column with the string "am". We observe gains of about 35% for this test.

Note that I was thinking we'd need to update the selectors to keep returning objects instead of Sets, but I haven't found any public selector that exposes them, they're all @ignore Do not document. If you see anything that would break semver guarantees, please point it out.

@romgrk romgrk added performance component: data grid This is the name of the generic UI component, not the React module! labels Aug 17, 2023
@mui-bot
Copy link

mui-bot commented Aug 17, 2023

Netlify deploy preview

Netlify deploy preview: https://deploy-preview-10068--material-ui-x.netlify.app/

Updated pages

No updates.

These are the results for the performance tests:

Test case Unit Min Max Median Mean σ
Filter 100k rows ms -163 127.5 -6.7 -7.86 119.49
Sort 100k rows ms 943.4 1,701.5 1,606.3 1,436.96 264.51
Select 100k rows ms 656.3 951.8 885.4 821.44 118.234
Deselect 100k rows ms 188.9 360.2 218.7 242.92 60.181

Generated by 🚫 dangerJS against 5d1442c

@romgrk
Copy link
Contributor Author

romgrk commented Oct 2, 2023

Summary of this PR:

  • I initally did the switch from {} lookup object to Set, and observed the improvements described above.
  • I noticed that our {} lookups work as such: false means hidden, true and undefined means not hidden. In other words, the default behavior was to show the row if it hadn't been added in the lookup object. This is the case for autogenerated rows, that not added to the lookup but we always want to show. To comply with those rules, I changed the meaning of sets so that e.g. visibleRows became hiddenRows, and we can use Set to encode things as before.
  • However, making that change lost nearly all of the performance improvements observed in the first step.

At this point, I'm not sure if I'm still interested in merging this due to the limited performance benefits that it brings. One change that I do like however is that it becomes clear what the rules are. It took me some time to figure out that tests were failing because both true and undefined meant the same thing.

cc @mui/xgrid

@@ -574,7 +575,6 @@
{ "name": "GridVisibilityOffIcon", "kind": "Variable" },
{ "name": "gridVisibleColumnDefinitionsSelector", "kind": "Variable" },
{ "name": "gridVisibleColumnFieldsSelector", "kind": "Variable" },
{ "name": "gridVisibleRowsLookupSelector", "kind": "Variable" },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels risky TBH. While gridVisibleRowsLookupSelector is not documented, it's exported and not marked as unstable/experimental.
Should we wait with this PR until v7 alpha?

@@ -17,7 +17,7 @@ import {
createRowTree,
updateRowTree,
RowTreeBuilderGroupingCriterion,
getVisibleRowsLookup,
getHiddenRowsLookup,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, making that change lost nearly all of the performance improvements observed in the first step.

I don't understand what caused the loss of the performance improvements.
First, you tried using Set to keep all the visible rows.
Then, you reversed it to only keep the hidden rows. Did this increase the number of records in the Set drastically?
Or is it something else that impacted performance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also didn't make sense to me, but I haven't had enough time to investigate further.

@michelengelen michelengelen changed the base branch from master to next November 6, 2023 14:24
@github-actions github-actions bot added the PR: out-of-date The pull request has merge conflicts and can't be merged label Jan 10, 2024
Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@romgrk
Copy link
Contributor Author

romgrk commented Mar 6, 2024

Although the performance gains are interesting I don't have the bandwidth to continue exploring this one so I'll close it for now. If we look into it again, I'm pretty sure we can expose a backwards compatible interface even if we change the internal model.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: data grid This is the name of the generic UI component, not the React module! performance PR: out-of-date The pull request has merge conflicts and can't be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants