load_cdf errors out on starting-timestamp only request #3097
Comments
@hntd187 can you look into this? This line should not throw an error on vacuumed tables because the logs it's looking for don't exist anymore:
@ion-elgreco yeah lemme see if I can sort this out
I remember what this was now: starting version is always required in delta-spark (https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/commands/cdc/CDCReader.scala#L285), so I think you shouldn't be able to just provide a timestamp and nothing else. Spoke too soon, it's starting version OR starting timestamp, lemme look further.
We had been waiting for Thanks for looking into this issue.
Yes and no @ww917352, out of range should in theory fix it, but I mistakenly always started checking whether versions were valid from 0, and a vacuumed or checkpointed table may not have version 0 anymore. So this is just a bug in my original implementation. I have a fix ready to go, I've just gotten sidetracked with other things for a few days. I should be getting a PR up later today.
@hntd187 Thank you so much for the confirmation. May I ask when the bug fix will be available? Thank you!
Environment
Delta-rs version:
python-0.23.1
Binding:
Environment:
Bug
In delta-rs, load_cdf called with only starting_timestamp still fails unless a starting version is also passed into the method.
What happened:
I'm running the following code:
This code errors out with the following message:
What you expected to happen:
The table should return the rows from given timestamp.
How to reproduce it:
More details:
My guess is that the starting version is still being evaluated even when it is not passed into the method. It defaults to 0, so if the table has been vacuumed (as in this example) and version 0 is missing, a load_cdf call with only starting_timestamp errors out.
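To make the guess above concrete, here is a minimal sketch of how a start version could be resolved without assuming version 0 exists. It is not delta-rs code: resolve_start_version and the commits mapping are hypothetical names for illustration, and the rule "earliest surviving commit at or after the timestamp" is an assumption about the intended semantics.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def resolve_start_version(
    commits: Mapping[int, datetime],
    starting_version: Optional[int] = None,
    starting_timestamp: Optional[datetime] = None,
) -> int:
    """Pick the first version to read change data from.

    `commits` maps each version whose log entry still exists to its commit
    time. Hypothetical helper for illustration, not part of delta-rs.
    """
    if not commits:
        raise ValueError("table has no readable commits")
    if starting_version is not None:
        # An explicitly requested version must still be present in the log.
        if starting_version not in commits:
            raise ValueError(f"version {starting_version} is not available")
        return starting_version
    if starting_timestamp is not None:
        # Assumed semantics: earliest surviving commit at or after the timestamp.
        candidates = [v for v, ts in commits.items() if ts >= starting_timestamp]
        if not candidates:
            raise ValueError("starting_timestamp is after the latest commit")
        return min(candidates)
    # Neither bound given: use the earliest surviving version instead of 0.
    return min(commits)


# Example: versions 0-4 are gone, so the log starts at version 5.
log = {
    5: datetime(2025, 1, 5, tzinfo=timezone.utc),
    6: datetime(2025, 1, 6, tzinfo=timezone.utc),
    7: datetime(2025, 1, 7, tzinfo=timezone.utc),
}
start = resolve_start_version(
    log, starting_timestamp=datetime(2025, 1, 5, 12, tzinfo=timezone.utc)
)
print(start)
```

The point of the sketch is that the timestamp-only path never touches version 0, so a table whose early log entries were removed can still be read from the requested timestamp.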