Implement sketched percentile #1420

Merged Jul 1, 2024 (30 commits)

mrfh92 (Collaborator) commented Apr 3, 2024

This solves #1411 by implementing a sketched version of percentile / median and a corresponding option for the RobustScaler in the preprocessing module.

All new features integrate with the existing functionality without changing its behavior, because the current non-sketched version remains the default.

Performance comparison
Measured on 14 MPI processes (workstation, CPU) using the "asteroids data set" (1317275 data points, 9 features); times and deviations are measured over 10 runs.

| Method | Avg. time | Inaccuracy* |
|---|---|---|
| median (main) | ~12.3 s | |
| median sketched (10% of data) | 0.247 s | 0.30% |
| median sketched (1% of data) | 0.01212 s | 1.3% |
| median sketched (0.1% of data) | 0.00312 s | 2.7% |

*Relative deviation of the sketched median from the true median, computed per feature and averaged over 10 runs; only the value for the feature with the largest deviation is reported.
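
To make the inaccuracy metric concrete, here is a minimal pure-Python sketch of the idea, not the Heat implementation. It assumes that "sketched" means taking the median of a uniformly random subsample of the data. The helper names `sketched_median` and `max_relative_deviation`, the `sketch_size` fraction parameter, and the synthetic Gaussian data are all hypothetical stand-ins chosen for illustration; the asteroids data set is not used here.

```python
import random
import statistics


def sketched_median(values, sketch_size, rng):
    """Median of a uniformly random subsample of `values`.

    `sketch_size` is the sampled fraction in (0, 1]. Hypothetical helper,
    for illustration only.
    """
    if not 0 < sketch_size <= 1:
        raise ValueError("sketch_size must be in (0, 1]")
    k = max(1, int(len(values) * sketch_size))
    return statistics.median(rng.sample(values, k))


def max_relative_deviation(features, sketch_size, runs=10, seed=0):
    """Average |sketched - exact| / |exact| over `runs` for each feature,
    then return the value for the feature with the largest deviation."""
    rng = random.Random(seed)
    worst = 0.0
    for column in features:
        exact = statistics.median(column)
        deviations = [
            abs(sketched_median(column, sketch_size, rng) - exact) / abs(exact)
            for _ in range(runs)
        ]
        worst = max(worst, statistics.fmean(deviations))
    return worst


# Synthetic stand-in data: 3 features with 20,000 points each.
data_rng = random.Random(42)
features = [
    [data_rng.gauss(mu, 0.1 * mu) for _ in range(20_000)]
    for mu in (10.0, 50.0, 200.0)
]

for fraction in (0.1, 0.01, 0.001):
    deviation = max_relative_deviation(features, fraction)
    print(f"{fraction:.1%} of data: max relative deviation {deviation:.3%}")
```

The printed numbers depend entirely on the synthetic data and seeds, so they are not comparable to the figures in the table above; the sketch only shows how such a figure could be computed.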

github-actions bot commented Apr 3, 2024

Thank you for the PR!

codecov bot commented Apr 3, 2024

Codecov Report

Attention: Patch coverage is 91.66667% with 2 lines in your changes missing coverage. Please review.

Project coverage is 91.86%. Comparing base (e927907) to head (896f573).
Report is 270 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| heat/core/statistics.py | 90.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1420   +/-   ##
=======================================
  Coverage   91.86%   91.86%           
=======================================
  Files          80       80           
  Lines       11860    11878   +18     
=======================================
+ Hits        10895    10912   +17     
- Misses        965      966    +1     
| Flag | Coverage Δ |
|---|---|
| unit | 91.86% <91.66%> (+<0.01%) ⬆️ |


@mrfh92 mrfh92 requested review from ClaudiaComito and mtar April 4, 2024 08:45
@mrfh92 mrfh92 added the PR talk label Apr 4, 2024
@mrfh92 mrfh92 mentioned this pull request Apr 4, 2024
mtar (Collaborator) left a comment

Some minor comments on documentation

Review threads (all outdated and resolved):
heat/core/statistics.py
heat/core/statistics.py
heat/core/statistics.py
heat/core/tests/test_statistics.py
mrfh92 and others added 3 commits April 5, 2024 09:40
followed mtar's suggestion

Co-authored-by: Michael Tarnawa <[email protected]>
updated docstring according to review
mrfh92 and others added 4 commits April 5, 2024 09:54
updated docstring according to review
updated docstring according to review
@mrfh92 mrfh92 requested a review from ClaudiaComito June 19, 2024 08:38
ClaudiaComito (Contributor) left a comment

Just a couple tiny changes, otherwise I think we can merge this week. Thanks a lot @mrfh92 !

Review threads (all outdated and resolved):
heat/core/statistics.py
heat/core/statistics.py
mrfh92 and others added 2 commits June 24, 2024 17:47
Co-authored-by: Claudia Comito <[email protected]>
Co-authored-by: Claudia Comito <[email protected]>
mrfh92 (Collaborator, Author) commented Jun 24, 2024

I think showing the example accuracies does not make much sense, since they depend heavily on the data set (even though the median is quite robust to outliers). Usually, we also only show examples for small data sets that could be checked by hand.

@ClaudiaComito Apart from that, I have merged your two change requests.

@mrfh92 mrfh92 requested a review from ClaudiaComito June 25, 2024 05:56
ClaudiaComito previously approved these changes Jun 28, 2024

ClaudiaComito (Contributor) left a comment

Great, thanks @mrfh92 !

@ClaudiaComito ClaudiaComito changed the title from "Features/1411 implement sketched percentile" to "Implement sketched percentile" Jun 28, 2024
Hoppe added 2 commits July 1, 2024 16:29
mrfh92 (Collaborator, Author) commented Jul 1, 2024

@ClaudiaComito I have tried to fix the bug on torch 1.12 and 1.13

@mrfh92 mrfh92 merged commit 064f495 into main Jul 1, 2024
53 checks passed
@mrfh92 mrfh92 deleted the features/1411-Implement_sketched_percentile branch July 1, 2024 15:50
@ClaudiaComito ClaudiaComito added the enhancement New feature or request label Aug 22, 2024
@ClaudiaComito ClaudiaComito added this to the 1.5.0 milestone Aug 22, 2024
3 participants