Optimization and cleanup for bit arrays. #111
Merged
Before, if you called `bag.ensureCapacity(bag.capacity())`, it would resize and allocate a new backing `values` array every time, even though the current capacity was already sufficient. One test has been expanded so that it catches this behavior in the old code but passes with the PR's code.
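A minimal sketch of the fix, with hypothetical names (`Bag`, `values`); the actual fields and methods in `bitArray.kt` may differ:

```kotlin
// Hypothetical sketch of the guard; real bitArray.kt internals may differ.
class Bag(initialCapacity: Int = 4) {
    private var values = LongArray(initialCapacity)
    val capacity: Int get() = values.size

    fun ensureCapacity(requested: Int) {
        // Only reallocate when the request exceeds the current capacity.
        // Without this check, ensureCapacity(capacity) would allocate and
        // copy a new backing array on every call.
        if (requested > capacity) {
            values = values.copyOf(requested)
        }
    }
}

fun main() {
    val bag = Bag(4)
    bag.ensureCapacity(4)   // no-op: capacity is already sufficient
    check(bag.capacity == 4)
    bag.ensureCapacity(9)   // grows the backing array
    check(bag.capacity == 9)
}
```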
These range from things that might not matter at all (because the compiler uses them behind-the-scenes), to taking advantage of non-obvious but specified behavior of bitwise operators. The former includes using `ushr 6` instead of `/ 64`, which does have an advantage in that negative bit indices will be treated as unsigned int indices, which still fit inside a large-ish `Array` (it needs 4GB of RAM to store 2 to the 26 `Long` items, though). The latter includes removing manual modulus on shift amounts, since according to the Kotlin docs on shl and ushr, among others, shift amounts applied to `Long`s are implicitly masked to only use the bottom 6 bits (when applied to `Int`s, they use 5 bits).
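Both points can be verified directly in Kotlin; this is a small illustrative snippet, not code from the PR:

```kotlin
fun main() {
    val bitIdx = 130

    // Word index: `ushr 6` equals `/ 64` for non-negative indices, but a
    // negative index is reinterpreted as a large unsigned one instead of
    // producing a negative array index.
    check(bitIdx ushr 6 == bitIdx / 64)
    check(-1 ushr 6 == (1 shl 26) - 1)  // 2^26 - 1, still a valid Int index

    // Shift amounts applied to a Long use only their bottom 6 bits, so no
    // manual `% 64` (or `and 63`) is needed before shifting.
    check(1L shl 70 == 1L shl 6)
    check(1L shl bitIdx == 1L shl (bitIdx and 63))
}
```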
The `countLeadingZeroBits()` and `countOneBits()` functions each are marked as `@IntrinsicCandidate` on the JVM, because they compile down to a much faster-than-normal piece of native code (usually one instruction, with a name like CLZ or POPCNT). Using intrinsics wherever possible in bit-twiddling code can yield really serious performance gains if the code before used many instructions and it can be replaced by one. I believe `countOneBits()` had its performance improve on a very recent Java version, but before that some code needed to avoid it because it didn't play nicely with some loop optimizations (though in this case, if you wanted to avoid it, there are better options than looping over 64 bits to count the 1s).
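For illustration, here is what those stdlib calls compute; on the JVM each typically compiles down to a single instruction (CLZ/LZCNT and POPCNT respectively):

```kotlin
fun main() {
    val word = 0b1011L  // bits 0, 1 and 3 set

    check(word.countOneBits() == 3)          // population count
    check(word.countLeadingZeroBits() == 60) // 64 bits, highest set bit at 3

    // Index of the highest set bit, without scanning 64 positions in a loop:
    check(63 - word.countLeadingZeroBits() == 3)
}
```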
This resize code could be wrong for this application, but I'm used to seeing sets and arrays double in capacity when they run out, to avoid repeatedly allocating a new backing array that's only a hair larger.
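The usual doubling strategy looks roughly like this; `grownCapacity` is an illustrative helper, not the PR's actual resize code:

```kotlin
// Double the capacity until it fits, so repeated inserts cost amortized O(1)
// reallocations instead of allocating a slightly larger array each time.
fun grownCapacity(current: Int, needed: Int): Int {
    var size = maxOf(current, 1)
    while (size < needed) size *= 2
    return size
}

fun main() {
    check(grownCapacity(64, 64) == 64)    // already big enough
    check(grownCapacity(64, 65) == 128)   // one doubling
    check(grownCapacity(64, 300) == 512)  // 64 -> 128 -> 256 -> 512
}
```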
These are... pretty crazy pieces of code. They make heavy use of a pattern where they take an item from `values` into a local (just a `Long`, so this is by value), find its highest set bit index, do something with it, and then clear that highest bit from the local `Long`. This means that if only one bit is set in the `Long` we look at, we only iterate once through that `Long`; two bits set means two iterations, and so on.
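The pattern described above can be sketched like this (an illustrative helper; the PR's loops inline the same idea into concrete operations):

```kotlin
// Visit each set bit of `word` from highest to lowest index, doing one loop
// iteration per *set* bit rather than one per bit position.
inline fun forEachSetBitDescending(word: Long, action: (Int) -> Unit) {
    var bits = word  // local copy; a Long is passed by value
    while (bits != 0L) {
        val idx = 63 - bits.countLeadingZeroBits()  // highest set bit index
        action(idx)
        bits = bits and (1L shl idx).inv()          // clear that bit
    }
}

fun main() {
    val visited = mutableListOf<Int>()
    forEachSetBitDescending(0b1010_0001L) { visited += it }
    check(visited == listOf(7, 5, 0))  // three iterations for three set bits
}
```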
Quillraven reviewed on Aug 9, 2023 (src/commonMain/kotlin/com/github/quillraven/fleks/collection/bitArray.kt):
Very cool stuff, thank you! I thought I removed the redundant
Just FYI, I also get the heap memory issue in the Artemis benchmark from time to time. Normally a simple restart of the benchmark solves it.
The phrase "twice as many Long items as before" wasn't a reference to earlier code, but to the changing state of `bits` -- it's being resized here.
This seems to make the tests pass again, but we'll see.
This is organized logically by commit, and each commit has a (sometimes long-winded) comment. Long story short, addRemove() in the benchmarks goes from 1327 ops/second to 2161 ops/second. The numbers look large because I tested with 1000 entities (instead of 10,000) and 5 world updates (instead of 1000), so that I could get meaningful results with short-ish iteration times. That mattered because the benchmark seems to run out of heap if it spends too long on one iteration, though I don't think I encountered this with Fleks -- it was an issue with the Artemis benchmark.
There's lots of bitwise magic here; I'm more than happy to explain any part that doesn't make sense. Bitwise operations are usually quite fast compared to anything that deals with objects, so I'm always happy to see them put to productive use. The main thing I tried to do was strip out the places where indices 0-63 were looped over in each `Long`, since that takes more time than necessary in a sparse `Long` (one with few bits set). There's also a new `clearEnsuringCapacity()` method in `MutableEntityBag`; it is functionally like a `clear()` followed by an `ensureCapacity()`, but it avoids copying the array contents when there's no need to. Some of libGDX's collections have the same thing.
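A rough sketch of the idea behind such a method, with hypothetical internals (`SimpleBag`, an `Any?` array); the real `MutableEntityBag` in Fleks stores `Entity` values and differs in detail:

```kotlin
// Hypothetical sketch; the real MutableEntityBag internals differ.
class SimpleBag(capacity: Int = 64) {
    private var values = arrayOfNulls<Any>(capacity)
    var size = 0
        private set

    val capacity: Int get() = values.size

    // Like clear() followed by ensureCapacity(cap), but when the backing
    // array must grow, it allocates a fresh array instead of copying
    // contents that are about to be discarded anyway.
    fun clearEnsuringCapacity(cap: Int) {
        if (cap > values.size) {
            values = arrayOfNulls(cap)  // no copyOf: skip copying doomed data
        } else {
            values.fill(null, 0, size)  // just drop references in used slots
        }
        size = 0
    }
}

fun main() {
    val bag = SimpleBag(4)
    bag.clearEnsuringCapacity(16)
    check(bag.capacity == 16 && bag.size == 0)
}
```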
Let me know what I can do to help!