-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Fix incorrect parquet statistics written for UInt64 values > Int64::MAX #16766
Conversation
Test failure due to pip |
CodSpeed Performance ReportMerging #16766 will improve performances by 17.4%Comparing Summary
Benchmarks breakdown
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #16766 +/- ##
==========================================
- Coverage 81.32% 81.31% -0.01%
==========================================
Files 1423 1424 +1
Lines 187177 187209 +32
Branches 2721 2726 +5
==========================================
+ Hits 152214 152228 +14
- Misses 34468 34484 +16
- Partials 495 497 +2 ☔ View full report in Codecov by Sentry. |
Not trying to sound like I'm criticizing but this doesn't really fix the issue even as it does avoid them in the future. The fix, I think, would be to add a check in the reader that if min_value is greater than max_value then ignore the stats and possibly issue a warning that the stats are invalid. |
This is what I had in mind from my last comment. #16776 |
That's a good point, I imagine there are parquet files in the wild that have incorrect statistics (perhaps by us 😝), that some basic validation could help with. |
Fixes #16683, #15323