-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster union/intersection implementation for sets? #83
Comments
I'll work on a PR if you find this idea useful |
It would be great. |
So, #86 is the first version of the pull request. https://docs.google.com/spreadsheets/d/1NaMgasBhqyThazU-_TVWAK2fCcnztAQ9WCRxr9lNfUw/edit?usp=sharing These look promising, but there currently are a few pitfalls:
Feel free to check out and destroy the PR =) |
Some comments on the benchmarks: I tested two scenarios, typical and worst-case where typical is uniting two non-intersecting sets/maps and worst case is two completely identical, but different, sets. I didn't try to construct a benchmark for best-case scenario, (say, two sets with all hashes being different in their first bits) because it has constant complexity, but seems very unrealistic to me. |
This would be unfortunate, because there are use cases when a reference equality check is used after an operation to find out whether it has actually changed content (instead of boolean flag that is returned from mutable collection operations). |
Well, it can be done either by checking array element reference-equality manually after the actual logic, or by doing it on-the-fly using two additional masks. Either way we throw away the array we allocated, but there's no way avoiding it, I guess. |
Ok, so I implemented the array element checking, did some benchmarks and it does not seem to introduce much overhead (or any at all, for that matter). |
The asked optimisation was implemented by @belyaev-mikhail and merged to |
Well, not really) I am still investigating the intersection and other bulk ops and will make it a separate PR, so this is not the end) |
But maybe opening a separate issue for that will make more sense, too |
Current union (
addAll
) implementation for sets is O(logN * M) as far as I understand even when both sides have the same implementation.It is certainly possible to improve that to log or even sub-log implementation for HAMTs by merging one-bit arrays only.
It is also possible to improve intersection, although atm you don't even have a method for intersection in the API.
Not sure if it helps maps in any way, but I guess it does.
While intersection is questionable, union is a very important and widely used operation (especially so when
+
is overloaded as union) for persistent sets and implementing this doesn't even need changing the api.The text was updated successfully, but these errors were encountered: