Rust engine doesn't correctly serialize paths for partitions on timestamp on Windows #2382
Comments
In Python we use the FileSystemHandler from src/filesystem.rs, which always normalizes the path:
fn normalize_path(&self, path: String) -> PyResult<String> {
    // Remember a trailing slash so it can be re-appended after parsing.
    let suffix = if path.ends_with('/') { "/" } else { "" };
    let path = Path::parse(path).unwrap();
    Ok(format!("{path}{suffix}"))
}
@ion-elgreco I just tried out some different examples. It looks like
use object_store::path::Path;

fn main() {
    // Timestamp partition path (contains ':'), via Path::from
    let path: String = r"C:\table\time=2021-01-02 03:04:06.000003\file.parquet".to_string();
    // let path: Result<Path, object_store::path::Error> = Path::parse(path);
    let path_from = Path::from(path);
    println!("{:?}", path_from);

    // Same path, via Path::parse
    let path: String = r"C:\table\time=2021-01-02 03:04:06.000003\file.parquet".to_string();
    let path_parse: Result<Path, object_store::path::Error> = Path::parse(path);
    println!("{:?}", path_parse);

    // File name that additionally contains '<' and '|', via Path::from
    let path: String = r"C:\table\time=2021-01-02 03:04:06.000003\<file|.parquet".to_string();
    // let path: Result<Path, object_store::path::Error> = Path::parse(path);
    let path_from = Path::from(path);
    println!("{:?}", path_from);

    // Same path, via Path::parse
    let path: String = r"C:\table\time=2021-01-02 03:04:06.000003\<file|.parquet".to_string();
    let path_parse: Result<Path, object_store::path::Error> = Path::parse(path);
    println!("{:?}", path_parse);
}
Closed upstream in apache/arrow-rs#5830, so once a new object_store version is released and used here, this should be fixed ✌🏻
Just ran this on Windows using
import pandas as pd
from deltalake import write_deltalake
from datetime import datetime

dates = pd.date_range(datetime(2021, 1, 1, 3, 4, 6, 3), datetime(2021, 1, 3, 3, 4, 6))
df = pd.DataFrame({"time": dates, "a": list(range(len(dates)))})

# Write with different engines
write_deltalake("mytable", df, partition_by="time", mode="overwrite", engine="pyarrow")
write_deltalake("mytable", df, partition_by="time", mode="overwrite", engine="rust")
and the paths serialize correctly.
Environment
Delta-rs version: main
Binding: python
Environment:
Bug
What happened:
When using the rust engine, timestamps are serialized with a colon (:) in the file path. This does not work on Windows.
OSError: Generic LocalFileSystem error: Unable to open file C:\projects\delta-rs\mytable\time=2021-01-02 03:04:06.000003\part-00001-2be14fa0-e4f4-4fc0-bf61-6779b08cf550-c000.snappy.parquet#1: The filename, directory name, or volume label syntax is incorrect. (os error 123)
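To illustrate why the unencoded partition directory fails, here is a minimal Python sketch that checks a single path component against the characters Windows reserves in file names. The helper name `invalid_windows_chars` is hypothetical and not part of delta-rs or object_store.

```python
# Characters that Windows does not allow in file or directory names.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def invalid_windows_chars(component):
    """Return the reserved characters found in one path component (hypothetical helper)."""
    return sorted(set(component) & WINDOWS_RESERVED)


# Partition directory as written by the rust engine in this report.
raw = "time=2021-01-02 03:04:06.000003"
# The same directory with the value percent-encoded.
encoded = "time=2021-01-02%2003%3A04%3A06.000003"

print(invalid_windows_chars(raw))      # [':']
print(invalid_windows_chars(encoded))  # []
```

The colon is the only offending character in the raw name; once the value is percent-encoded, nothing in the component is reserved.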
What you expected to happen:
That the time partition value was serialized as in pyarrow:
time=2021-01-01%2003%3A04%3A06.000003
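The expected directory name above is what you get from percent-encoding the partition value. As a rough sketch using only the standard library (the `encode_partition` helper is hypothetical; this is not claimed to be how pyarrow or delta-rs implement it):

```python
from urllib.parse import quote, unquote


def encode_partition(column, value):
    """Build a Hive-style partition segment with a percent-encoded value (hypothetical helper)."""
    # safe="" makes quote() encode every reserved character, including ':' and ' '.
    return f"{column}={quote(value, safe='')}"


segment = encode_partition("time", "2021-01-01 03:04:06.000003")
print(segment)  # time=2021-01-01%2003%3A04%3A06.000003

# Decoding recovers the original partition value.
print(unquote(segment))  # time=2021-01-01 03:04:06.000003
```

The space becomes %20 and each colon becomes %3A, which leaves a directory name that Windows accepts.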
How to reproduce it:
Run the following on Windows
More details: