fix(rust): serialize MetricDetails from compaction runs to a string #2317
@liamphmurphy you could add a Python test to check that DESCRIBE HISTORY now works in PySpark on a Delta table compacted by delta-rs.
Good idea 👍, I just need to figure out why the Spark tests aren't working locally.
@liamphmurphy are you adding the markers in pytest?
It was a numpy error; I had to reinstall it manually through pip.
@ion-elgreco Pushed up some changes for the lint / cargo fmt errors. The pyspark integration tests pass locally after importing the DeltaTable, so hopefully all good there. The last big one seems to be the benchmark showing a regression; I will be slow to respond until the week.
Thanks @liamphmurphy!
In an ideal world delta-spark would just be able to parse any commit info value to a string, but this also works :D
I am by no means a Rust developer and haven't touched it in years, so please let me know if there's a better way to go about this. The Rust z_order and optimize.compact operations already serialize the metrics before they are passed back to Python, which then deserializes them. The Python behavior of expecting the metrics as a Dict has therefore not changed, which I think is what we want.
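To make the idea of writing the metrics as strings concrete, here is a minimal, dependency-free sketch. It is not the code from this PR: `MetricDetailsSketch`, its field names, `json_string`, and `operation_metrics_fragment` are all hypothetical, and the assumption that the string holds JSON text is mine. In a serde-based codebase the same effect would more likely come from a `#[serde(serialize_with = "...")]` attribute on the affected fields.

```rust
use std::fmt;

// Hypothetical stand-in for the PR's MetricDetails; the field names are assumed.
#[derive(Debug, Clone, PartialEq)]
struct MetricDetailsSketch {
    min: i64,
    max: i64,
    avg: f64,
    total_files: usize,
    total_size: i64,
}

// Display renders the struct as JSON-like text, so `to_string()` yields
// the value that will be stored as a single string in the commit log.
impl fmt::Display for MetricDetailsSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{{\"min\":{},\"max\":{},\"avg\":{},\"totalFiles\":{},\"totalSize\":{}}}",
            self.min, self.max, self.avg, self.total_files, self.total_size
        )
    }
}

// Wraps raw text in a JSON string literal. Only quotes and backslashes are
// escaped, which is enough for the text produced by the Display impl above.
fn json_string(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len() + 2);
    out.push('"');
    for c in raw.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            _ => out.push(c),
        }
    }
    out.push('"');
    out
}

// Builds a commit-log fragment in which filesAdded and filesRemoved are
// string values rather than nested objects.
fn operation_metrics_fragment(
    added: &MetricDetailsSketch,
    removed: &MetricDetailsSketch,
) -> String {
    format!(
        "{{\"filesAdded\":{},\"filesRemoved\":{}}}",
        json_string(&added.to_string()),
        json_string(&removed.to_string())
    )
}
```

Calling `operation_metrics_fragment` with two metric values yields an object whose `filesAdded` and `filesRemoved` values begin with a quote rather than a brace, which is the shape a reader that expects string-valued metrics can parse.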
Description
Adds a custom serializer and Display implementation for the `MetricDetails` fields, namely `filesAdded` and `filesRemoved`, so that those fields are written to the commit log as strings instead of structs. Query engines expect these fields to be strings on reads.

I had trouble getting the pyspark tests running locally, but here is an example optimize commit log that gets written with these changes:
Related Issue(s)
Documentation
N/A