-
-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partial thread-safety #82
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this is good. I'm only not convinced that ignoring KeyErrors caused by thread unsafe operations, is a way to make it thread-safe. If there exists a condition when we care about a KeyError
other than when caused by threads, then it's not clear we would know how to determine its cause.
I made a prototype PR using threading.RLock
and based on some basic benchmarking the added test here test_getitem_isthreasafe
it seems to be negligible in performance impact, if just a smidgen faster but ya, basically the same:
# Ignoring KeyErrors
Benchmark 1: python zict/tests/test_lru.py
Time (mean ± σ): 1.251 s ± 0.054 s [User: 1.142 s, System: 0.148 s]
Range (min … max): 1.168 s … 1.380 s 25 runs
# Re-entrant locking
Benchmark 1: python zict/tests/test_lru.py
Time (mean ± σ): 1.214 s ± 0.121 s [User: 1.105 s, System: 0.160 s]
Range (min … max): 1.080 s … 1.531 s 25 runs
@milesgranger that's because I modified the test: def test_getitem_is_threasafe():
lru = LRU(100, {})
lru["x"] = 1
def f(_):
barrier.wait()
for _ in range(5_000_000):
assert lru["x"] == 1
n = cpu_count()
barrier = Barrier(n)
with ThreadPoolExecutor(n) as ex:
for _ in ex.map(f, range(n)):
pass On my 12-core Linux box, this PR takes 9.9s to run, while your version takes 18.9s. |
Sorry, my comment wasn't about either implementation being faster or not, but that the alternative, given a test which was assumed to be representative of the use case would be unaffected in terms of time while improving the thread-safety aspect. If the original test was not representative to how it would be used in practice, and discerning if a Apologies if I've misunderstood anything here. |
The original test was representative of how it would be used, functionally. |
Make the library partially thread-safe.
Crucially, this makes
distributed.Worker.data.fast.__getitem__()
thread-safe, which means thatWorker.execute
andget_data
will only need to offload to a thread if any of the keys are actually spilled.Actual thread-offloadling mechanics will be in later PRs (both here and in distributed).