Parse Decimal overflow #2974

Closed
gruuya opened this issue Nov 4, 2024 · 0 comments · Fixed by #2975
Labels
bug Something isn't working

Comments

@gruuya
Contributor

gruuya commented Nov 4, 2024

Environment

Delta-rs version: 0.20.1

Binding: Rust

Environment:

  • Cloud provider:
  • OS:
  • Other:

Bug

What happened: When converting a Parquet Statistics::FixedLenByteArray value for a Decimal(precision, scale) column to the internal f64-based representation, a rounding error can sometimes produce an output value whose integer part exceeds the allotted space (i.e. it has more digits than precision - scale).

In turn, this results in an error such as Parser error: parse decimal overflow (1e32) when trying to parse the stats from the logs.
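The rounding can be reproduced outside delta-rs with plain integer and float arithmetic. The sketch below is not the delta-rs conversion code; `decode_unscaled` and `naive_to_f64` are hypothetical helper names, and the only assumptions are that the 16 statistic bytes are a big-endian two's-complement integer (as Parquet specifies for decimals) and that the conversion divides by 10^scale in f64.

```rust
/// Hypothetical helper: interpret 16 FixedLenByteArray bytes as the unscaled
/// decimal value (big-endian two's complement).
fn decode_unscaled(bytes: [u8; 16]) -> i128 {
    i128::from_be_bytes(bytes)
}

/// Hypothetical helper: the naive conversion, i.e. cast the unscaled value to
/// f64 and divide by 10^scale. Assumes scale <= 38 so 10^scale fits in an i128.
fn naive_to_f64(unscaled: i128, scale: u32) -> f64 {
    // The cast already rounds: an f64 carries only 53 significant bits,
    // while a 38-digit decimal needs up to 127.
    unscaled as f64 / 10i128.pow(scale) as f64
}
```

For the first byte sequence reported in this issue, `decode_unscaled` yields 10^38 - 1, the largest Decimal(38, 6) unscaled value. `naive_to_f64` of that value with scale 6 compares equal to `1e32`, which has 33 integer digits, one more than the 32 that precision 38 and scale 6 allow.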

What you expected to happen: The conversion should respect the Decimal's precision/scale (even if it means the result is slightly less precise than with the overflow).
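One way to get that behaviour, shown here only as a sketch and not necessarily what the eventual fix does, is to step the converted value one representable f64 at a time toward zero until its integer part fits in precision - scale digits. The function name `to_f64_within_precision` is hypothetical, and the code assumes scale <= precision <= 38.

```rust
/// Hypothetical helper: convert an unscaled decimal to f64 while keeping the
/// integer part within `precision - scale` digits.
fn to_f64_within_precision(unscaled: i128, precision: u32, scale: u32) -> f64 {
    // Naive conversion; this is the step that can round up past the limit.
    let mut value = unscaled as f64 / 10i128.pow(scale) as f64;

    // Nearest f64 to 10^(precision - scale). Parsing a literal is correctly
    // rounded, so this is the first magnitude that would print with too many
    // integer digits.
    let limit: f64 = format!("1e{}", precision - scale)
        .parse()
        .expect("valid float literal");

    // For a finite non-zero f64, decrementing the bit pattern moves the value
    // one step toward zero, for positive and negative numbers alike.
    while value.abs() >= limit {
        value = f64::from_bits(value.to_bits() - 1);
    }
    value
}
```

With precision 38 and scale 6, the unscaled value 10^38 - 1 then converts to 9.999999999999999e31 rather than 1e32, and its negation converts to -9.999999999999999e31. Values that already fit are returned unchanged.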

How to reproduce it: The following cases (in test_stats_scalar_serialization) should pass

            (
                simple_parquet_stat!(
                    Statistics::FixedLenByteArray,
                    FixedLenByteArray::from(vec![
                        75, 59, 76, 168, 90, 134, 196, 122, 9, 138, 34, 63, 255, 255, 255, 255
                    ])
                ),
                Some(LogicalType::Decimal {
                    scale: 6,
                    precision: 38,
                }),
                Value::from(9.999999999999999e31),
            ),
            (
                simple_parquet_stat!(
                    Statistics::FixedLenByteArray,
                    FixedLenByteArray::from(vec![
                        180, 196, 179, 87, 165, 121, 59, 133, 246, 117, 221, 192, 0, 0, 0, 1
                    ])
                ),
                Some(LogicalType::Decimal {
                    scale: 6,
                    precision: 38,
                }),
                Value::from(-9.999999999999999e31),
            ),

as otherwise arrow would raise a parse decimal overflow error for 1e32/-1e32.

More details: Coincidentally, this also revealed a related issue: the commit effectively succeeds, meaning the new table version is successfully promoted, but the error is thrown afterwards, somewhere around running the post-commit hooks, since that is when the faulty stat gets parsed.

@gruuya gruuya added the bug Something isn't working label Nov 4, 2024