Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: update the commit function to work with transactions #1193

Merged

Conversation

westonpace
Copy link
Contributor

Previously the commit function did not use the new transactions feature. This meant that a commit running concurrently with another operation could lead to bad behavior. The commit operation was also limited mostly to overwrites.

Now commits can apply any kind of transaction operation.

@westonpace
Copy link
Contributor Author

Still a work in progress, only overwrite is exposed at the moment and I need to add more tests but this passes the existing tests and should give a sense of the overall idea.

rust/src/dataset.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far :)

python/python/lance/dataset.py Show resolved Hide resolved
@westonpace westonpace force-pushed the feature/GH-1091--transactions-in-commit-api branch 2 times, most recently from 0f1ecf3 to 2836ba4 Compare September 1, 2023 21:05
@westonpace westonpace marked this pull request as ready for review September 1, 2023 21:05
Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Have a few comments.

python/python/lance/dataset.py Outdated Show resolved Hide resolved
mode: str = "append",
operation: LanceOperation.Overwrite,
read_version: Optional[int] = None,
options: Optional[Dict[str, Any]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update the doc string below. We should describe which options this field takes in particular.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was copying what write_dataset did in the rs file but then didn't look to see what it did in the equivalent python file. I see the options dictionary isn't actually given to the user. In fact, it seems like the only piece of object store params that the user has control over is the commit lock. I went ahead and changed the API to just ask for the commit lock (similar to write_dataset) instead of asking for a dictionary.

I went ahead and updated the doc string for all parameters.

python/python/lance/dataset.py Outdated Show resolved Hide resolved
python/python/lance/dataset.py Outdated Show resolved Hide resolved
python/python/tests/test_dataset.py Outdated Show resolved Hide resolved
Comment on lines +784 to +804
old_fragments: Iterable[FragmentMetadata]
new_fragments: Iterable[FragmentMetadata]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this is changing in #1095. If we merge this first, let's just remove this operation from public API for now.

python/src/dataset.rs Outdated Show resolved Hide resolved
python/src/dataset.rs Outdated Show resolved Hide resolved
#[pymethods]
impl Operation {
fn __repr__(&self) -> String {
"".to_string()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps:

Suggested change
"".to_string()
format!("{}", self.0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to use {:?} since the operations don't implement display. This might be overly verbose for a python repr but this is all internal anyways so I don't expect it to be used for much beyond debugging.

@westonpace westonpace force-pushed the feature/GH-1091--transactions-in-commit-api branch from 2836ba4 to 2c6ee3d Compare September 8, 2023 00:03
@westonpace westonpace force-pushed the feature/GH-1091--transactions-in-commit-api branch from 2c6ee3d to 59b96bc Compare September 8, 2023 15:27
@westonpace westonpace requested a review from wjones127 September 8, 2023 15:28
Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one more request and then I think this is ready to go :)

python/python/lance/dataset.py Outdated Show resolved Hide resolved
@wjones127 wjones127 merged commit 8c46445 into lancedb:main Sep 8, 2023
@wjones127 wjones127 mentioned this pull request Sep 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants