Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow multiple Python threads to work with a single DeltaTable instance #3101

Merged
merged 1 commit into from
Jan 4, 2025

Conversation

rtyler
Copy link
Member

@rtyler rtyler commented Jan 4, 2025

This change introduces an internal Mutex inside of RawDeltaTable which allows the PyO3 bindings to share the Python object between threads at the Python layer.

PyO3 will raise a RuntimeError: Already borrowed for any function call which takes a mutable reference to self. Introducing the internal Mutex ensures that all function signatures can operate with just self-references safely.

The Rust-level Mutex is a simple passthrough for most operations which do not need to modify the underlying state. The critical sections which typically need to acquire and mutate with a lock are after I/O bound operations are completed as far as I can tell, so I don't anticipate deadlock or performance issues.

There is still some cleanup of errors that needs to happen to make the code here more ergonomic when blending DeltaError with PoisonError from the lock, as such right now there's a lot of ugly error mapping.

Fixes #2958

@rtyler rtyler added the binding/rust Issues for the Rust crate label Jan 4, 2025
@rtyler rtyler added this to the v0.23 milestone Jan 4, 2025
@github-actions github-actions bot added binding/python Issues for the Python package and removed binding/rust Issues for the Rust crate labels Jan 4, 2025
@rtyler rtyler added the enhancement New feature or request label Jan 4, 2025
Copy link

codecov bot commented Jan 4, 2025

Codecov Report

Attention: Patch coverage is 0% with 349 lines in your changes missing coverage. Please review.

Project coverage is 72.38%. Comparing base (6430151) to head (9848ffb).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
python/src/lib.rs 0.00% 339 Missing ⚠️
python/src/error.rs 0.00% 4 Missing ⚠️
python/src/query.rs 0.00% 4 Missing ⚠️
python/src/filesystem.rs 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3101      +/-   ##
==========================================
- Coverage   72.52%   72.38%   -0.15%     
==========================================
  Files         128      128              
  Lines       41201    41296      +95     
  Branches    41201    41296      +95     
==========================================
+ Hits        29882    29892      +10     
- Misses       9408     9499      +91     
+ Partials     1911     1905       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rtyler rtyler force-pushed the feature/multithreaded-writer-2958 branch from 3acb36f to 3804d51 Compare January 4, 2025 21:29
@rtyler rtyler marked this pull request as ready for review January 4, 2025 22:09
@rtyler rtyler enabled auto-merge January 4, 2025 22:46
Copy link
Collaborator

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good!

just some very minor cleanup that you may want to consider?

Err(e) => Err(e),
}
.expect("SHOOT!");
// TODO: XXX
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

artifact?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d'oh! good catch, I did miss some of this junk in my diff review before poooshing

.map_err(PythonError::from)
.map_err(PyErr::from)
})?;
//.map_err(|_| DeltaProtocolError::new_err("table does not yet have a schema"))?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this?

…instance

This change introduces an internal Mutex inside of RawDeltaTable which
allows the PyO3 bindings to share the Python object between threads at
the Python layer.

PyO3 will raise a `RuntimeError: Already borrowed` for any
function call which takes a mutable reference to `self`. Introducing the
internal Mutex ensures that all function signatures can operate with
just self-references safely.

The Rust-level Mutex is a simple passthrough for most operations which
do not need to modify the underlying state. The critical sections which
typically need to acquire and mutate with a lock are after I/O bound
operations are completed as far as I can tell, so I don't anticipate
deadlock or performance issues.

There is still some cleanup of errors that needs to happen to make the
code here more ergonomic when blending DeltaError with PoisonError from
the lock, as such right now there's a lot of ugly error mapping.

Fixes delta-io#2958

Signed-off-by: R. Tyler Croy <[email protected]>
Sponsored-by: Neuralink Corp.
@rtyler rtyler force-pushed the feature/multithreaded-writer-2958 branch from 3804d51 to 9848ffb Compare January 4, 2025 23:13
Copy link
Collaborator

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@rtyler rtyler added this pull request to the merge queue Jan 4, 2025
Merged via the queue into delta-io:main with commit a70559e Jan 4, 2025
24 checks passed
@rtyler rtyler deleted the feature/multithreaded-writer-2958 branch January 4, 2025 23:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python DeltaTable does not support writes in multiple threads
2 participants