Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

load_cdf errors out on starting-timestamp only request #3097

Closed
sor-droneup opened this issue Jan 3, 2025 · 6 comments · Fixed by #3110
Closed

load_cdf errors out on starting-timestamp only request #3097

sor-droneup opened this issue Jan 3, 2025 · 6 comments · Fixed by #3110
Assignees
Labels
binding/rust Issues for the Rust crate bug Something isn't working

Comments

@sor-droneup
Copy link

Environment

Delta-rs version:
pyrhon-0.23.1

Binding:

Environment:

  • Cloud provider:
  • OS:
  • Other:

Bug

DeltaRS load_cdf starting_timestamp still requires a starting version values passed into the method.

What happened:

I'm running following code:

    timestamp = "2025-01-02T00:00:00Z"
    table = DeltaTable(path)
    cdf = table.load_cdf(starting_timestamp=timestamp).read_all()

This code errors out with following message:

self = DeltaTable(), starting_version = 0, ending_version = None
starting_timestamp = '2025-01-02T00:00:00Z', ending_timestamp = None
columns = None, allow_out_of_range = True

    def load_cdf(
        self,
        starting_version: int = 0,
        ending_version: Optional[int] = None,
        starting_timestamp: Optional[str] = None,
        ending_timestamp: Optional[str] = None,
        columns: Optional[List[str]] = None,
        allow_out_of_range: bool = False,
    ) -> pyarrow.RecordBatchReader:
>       return self._table.load_cdf(
            columns=columns,
            starting_version=starting_version,
            ending_version=ending_version,
            starting_timestamp=starting_timestamp,
            ending_timestamp=ending_timestamp,
            allow_out_of_range=allow_out_of_range,
        )
E       _internal.DeltaError: Invalid table version: 0

What you expected to happen:

The table should return the rows from given timestamp.

How to reproduce it:

More details:

My guess is that starting version is still being evaluated even if it was not passed into the method. It defaults to 0, so if the table being used is vaccumed (like in this example) and the 0 version is missing, the starting_timestamp only cdf errors out

@sor-droneup sor-droneup added the bug Something isn't working label Jan 3, 2025
@ion-elgreco
Copy link
Collaborator

@hntd187 can you look into this?

This line should not throw an error on vacuumed tables because the logs it's looking for don't exist anymore:

https://github.com/delta-io/delta-rs/blob/main/crates%2Fcore%2Fsrc%2Foperations%2Fload_cdf.rs#L188-L193

@hntd187
Copy link
Collaborator

hntd187 commented Jan 3, 2025

@ion-elgreco yeah lemme see if I can sort this out

@rtyler rtyler added the binding/rust Issues for the Rust crate label Jan 5, 2025
@hntd187 hntd187 self-assigned this Jan 5, 2025
@hntd187
Copy link
Collaborator

hntd187 commented Jan 5, 2025

I remember what this was now, starting version is always required in delta-spark. https://github.com/delta-io/delta/blob/master/spark/src/main/scala/org/apache/spark/sql/delta/commands/cdc/CDCReader.scala#L285 so I think you shouldn't be able to just provide a timestamp and nothing else.

Spoke too soon, it's starting version OR starting timestamp, lemme look further.

@ww917352
Copy link

ww917352 commented Jan 8, 2025

We had been waiting for allow_out_of_range in the new version. Now it is here, it still throws invalid table version error. Isn't this option added just to solve this issue of fetching change data feed from vacuumed data?

Thanks for looking into this issue.

@hntd187
Copy link
Collaborator

hntd187 commented Jan 8, 2025

Yes and no @ww917352, out of range should in theory fix it, but I mistakenly would always start processing whether versions were valid from 0, which a vacuumed or checkpointed table may not have version 0 anymore. So this is just a bug in my original implementation. I have a fix ready to go I've just gotten side tracked with other things for a few days, I should be getting a PR up later today.

@ww917352
Copy link

ww917352 commented Jan 9, 2025

@hntd187 Thank you so much for the confirmation. May I ask when the bug fix will be available? Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants