[VL] bug of parquet read #1025

Closed
FelixYBW opened this issue Feb 25, 2023 · 5 comments
Labels
bug (Something isn't working), stale, velox backend (works for Velox backend)

Comments

@FelixYBW
Contributor

Describe the bug
We saw this bug again from another customer. Let's track it here.
image

To Reproduce
Only appears in the customer environment.

@FelixYBW FelixYBW added the bug Something isn't working label Feb 25, 2023
@FelixYBW FelixYBW changed the title bug of parquet read [VL] bug of parquet read Feb 25, 2023
@weiting-chen weiting-chen moved this to 🆕 New in Gluten 0.5.0 Mar 1, 2023
@weiting-chen weiting-chen added the velox backend works for Velox backend label Mar 1, 2023
@PHILO-HE
Contributor

PHILO-HE commented Mar 2, 2023

This issue can be reproduced by reading a Parquet file whose string column has a long min/max string value. An extreme reproducible case is a column whose max value is a string of 1000 repeated 'z' characters.

The Velox community has opened a similar issue: facebookincubator/velox#3882
There is already a PR in the community: facebookincubator/velox#4108
Our test shows that the PR fixes this issue.
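
The reproduction described above could be sketched roughly as follows. This is a minimal illustration and not the script used in the thread: the helper names, the column name `s`, and the output file name are hypothetical, and it assumes `pyarrow` is installed. Whether the writer stores the full 1000-character value in the column statistics depends on the writer and its settings, so the resulting file may or may not trigger the bug.

```python
def build_repro_values(long_len=1000, n_short=10):
    """Return string values whose maximum is a run of `long_len` 'z' characters."""
    # Short values sort below any string starting with 'z'.
    short = [f"row-{i}" for i in range(n_short)]
    return short + ["z" * long_len]


def write_repro_file(path, values):
    """Write the values to a single-column Parquet file (hypothetical helper)."""
    # Imported lazily so build_repro_values() works without pyarrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table({"s": values})
    # Ask the writer to emit column statistics (min/max) for the string column.
    pq.write_table(table, path, write_statistics=True)


if __name__ == "__main__":
    values = build_repro_values()
    write_repro_file("long_minmax.parquet", values)
```

Reading the resulting file through Spark with the Velox backend enabled would then exercise the code path in question, assuming the statistics were written in full.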

@FelixYBW
Contributor Author

FelixYBW commented Mar 2, 2023

Do we have a PR in Gluten?

@PHILO-HE
Contributor

PHILO-HE commented Mar 3, 2023

The fix is only required on the Velox side. Although the Velox community patch resolves this issue, I guess it is still under development to avoid causing other problems. We will create a PR on our side once the community PR is in good shape.

@FelixYBW
Contributor Author

FelixYBW commented Mar 6, 2023

@PHILO-HE Can we have a temporary PR that picks up Velox's PR, so the customer can try it?

@PHILO-HE
Contributor

PHILO-HE commented Mar 7, 2023

Yes, I just filed a PR to port the upstream patches: oap-project/velox#145. On my side, a simple test shows the PR works well.
