-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: update the commit function to work with transactions #1193
feat: update the commit function to work with transactions #1193
Conversation
Still a work in progress, only overwrite is exposed at the moment and I need to add more tests but this passes the existing tests and should give a sense of the overall idea. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far :)
0f1ecf3
to
2836ba4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. Have a few comments.
python/python/lance/dataset.py
Outdated
mode: str = "append", | ||
operation: LanceOperation.Overwrite, | ||
read_version: Optional[int] = None, | ||
options: Optional[Dict[str, Any]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to update the doc string below. We should describe which options this field takes in particular.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was copying what write_dataset did in the rs file but then didn't look to see what it did in the equivalent python file. I see the options dictionary isn't actually given to the user. In fact, it seems like the only piece of object store params that the user has control over is the commit lock. I went ahead and changed the API to just ask for the commit lock (similar to write_dataset) instead of asking for a dictionary.
I went ahead and updated the doc string for all parameters.
old_fragments: Iterable[FragmentMetadata] | ||
new_fragments: Iterable[FragmentMetadata] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI this is changing in #1095. If we merge this first, let's just remove this operation from public API for now.
python/src/dataset.rs
Outdated
#[pymethods] | ||
impl Operation { | ||
fn __repr__(&self) -> String { | ||
"".to_string() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps:
"".to_string() | |
format!("{}", self.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to use {:?}
since the operations don't implement display. This might be overly verbose for a python repr but this is all internal anyways so I don't expect it to be used for much beyond debugging.
2836ba4
to
2c6ee3d
Compare
2c6ee3d
to
59b96bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one more request and then I think this is ready to go :)
Co-authored-by: Will Jones <[email protected]>
Previously the commit function did not use the new transactions feature. This meant that a commit running concurrently with another operation could lead to bad behavior. The commit operation was also limited mostly to overwrites.
Now commits can apply any kind of transaction operation.