Partition values that have been url encoded cannot be read when using deltalake #1446
Labels: bug (Something isn't working)
Comments
Was this table created in PySpark? Or with deltalake?
PySpark
It looks like we have a test case for this that we need to fix.
May be related to #1079
wjones127 added a commit that referenced this issue Sep 11, 2023
# Description

In the delta log, paths are percent-encoded. We decode them here: https://github.com/delta-io/delta-rs/blob/787c13a63efa9ada96d303c10c093424215aaa80/rust/src/action/mod.rs#L435-L437

That part is good, but we have then been re-encoding them with `Path::from`. This PR changes the code to use `Path::parse` where possible instead. Rather than propagating errors, we just fall back to `Path::from` for now. Read more here: https://docs.rs/object_store/0.7.0/object_store/path/struct.Path.html#encode

# Related Issue(s)

* closes #1533
* closes #1446
* closes #1079
* closes #1393
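To make the decode-then-re-encode problem in the commit description concrete, here is a minimal Python sketch. It is not delta-rs code: `path_from` and `path_parse` are hypothetical stand-ins that only imitate the "encode the input" versus "take the input as-is" behaviour described for `Path::from` and `Path::parse`, and the log entry and file name are assumed for illustration.

```python
from urllib.parse import quote, unquote

# Directory name as it exists on disk (from the issue), plus an assumed file name.
on_disk = "load_ts=2023-06-07 13%3A00%3A00/part-0.parquet"

# Assumed form of the same path inside the delta log: percent-encoded once more.
log_entry = "load_ts=2023-06-07%2013%253A00%253A00/part-0.parquet"


def path_from(decoded: str) -> str:
    """Hypothetical stand-in for an encoding constructor: escapes '%' in each segment."""
    return "/".join(quote(segment, safe=" =") for segment in decoded.split("/"))


def path_parse(decoded: str) -> str:
    """Hypothetical stand-in for a non-encoding constructor: keeps the input unchanged."""
    return decoded


decoded = unquote(log_entry)      # decoding the log entry yields the on-disk name
reencoded = path_from(decoded)    # '%3A' becomes '%253A', which no longer matches
as_is = path_parse(decoded)       # still matches the on-disk name

print(decoded == on_disk)    # True
print(reencoded == on_disk)  # False
print(as_is == on_disk)      # True
```

The point of the sketch is only that once the log path has been decoded, any further encoding step produces a name that does not exist in storage.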
Environment
pyarrow 11.0.0
deltalake 0.9.0
Binding: Python
Bug
What happened:
I receive an error when attempting to read partition values that contain a colon.
What you expected to happen:
I expect to be able to read these partitions, as I can with PySpark.
How to reproduce it:
Create a delta table partitioned by timestamp using PySpark. Example partition value: 2023-06-07 13:00:00. When data is loaded to the partition it creates a folder such as this:
'load_ts=2023-06-07 13%3A00%3A00'
Then, when running the code below:
I encounter the following error. Notice that in the error message the % has itself been URL-encoded, resulting in %25253A instead of %3A:
More details:
This happens with both local storage and AWS S3. I do not encounter this issue when reading these partitions from within PySpark.
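The `%25253A` in the error is consistent with the colon having been percent-encoded three times in total, rather than once. A small illustration using only the Python standard library (this shows the encoding arithmetic, not the actual code path inside deltalake):

```python
from urllib.parse import quote, unquote

value = "13:00:00"

once = quote(value, safe="")    # ':' becomes '%3A'
twice = quote(once, safe="")    # '%' becomes '%25', so '%3A' becomes '%253A'
thrice = quote(twice, safe="")  # the new '%' is escaped again: '%25253A'

print(once)    # 13%3A00%3A00
print(twice)   # 13%253A00%253A00
print(thrice)  # 13%25253A00%25253A00

# Undoing it takes the same number of decoding passes.
decoded = thrice
for _ in range(3):
    decoded = unquote(decoded)
print(decoded)  # 13:00:00
```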