[GAIAPLAT-1752] INDEX: eliminate memory allocation when garbage collecting offsets #1124
I'm a bit confused: doesn't collected_offsets contain offsets of different types? Why would we pass the full array to an index of a specific type? Perhaps a better approach is to build a map of types to offset arrays/vectors, so that we pass only the relevant offsets in these calls. Building that map can be done with trivial changes to the code that fills the two arrays.
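The map-based grouping suggested above could look like this minimal sketch. The type aliases and function name are hypothetical stand-ins for the engine's real types, not the actual PR code:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical aliases standing in for the engine's real offset/type types.
using gaia_type_t = std::uint32_t;
using gaia_offset_t = std::uint64_t;

// Group collected offsets by type as they are gathered, so that each
// index receives only the offsets relevant to its own type.
using offsets_by_type_t = std::unordered_map<gaia_type_t, std::vector<gaia_offset_t>>;

void collect_offset(offsets_by_type_t& collected, gaia_type_t type, gaia_offset_t offset)
{
    collected[type].push_back(offset);
}
```

With this grouping, the call site passes `collected[index_type]` to each index instead of the full mixed-type array.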
Alternatively, if the offset array is small enough it would be reasonable to just scan the whole array (assuming it contains offset/type pairs) and filter out offsets with inapplicable types. This would be really concise with C++20 ranges but still wouldn't be much code in a for loop.
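The scan-and-filter alternative is indeed only a few lines. Here is a plain-loop sketch (the C++20 ranges version would replace the loop with `std::views::filter`); the aliases and the `offsets_for_type` name are illustrative assumptions:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using gaia_type_t = std::uint32_t;   // hypothetical alias
using gaia_offset_t = std::uint64_t; // hypothetical alias
using offset_entry_t = std::pair<gaia_offset_t, gaia_type_t>;

// Scan the full (offset, type) array and keep only the offsets whose
// type matches the index currently being garbage-collected.
std::vector<gaia_offset_t> offsets_for_type(
    const std::vector<offset_entry_t>& entries, gaia_type_t index_type)
{
    std::vector<gaia_offset_t> result;
    for (const auto& entry : entries)
    {
        if (entry.second == index_type)
        {
            result.push_back(entry.first);
        }
    }
    return result;
}
```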
The offset array is small: it is limited to 32 records. Passing the type does not provide any benefit over checking directly against the offset: if the type is correct, we still need to do another comparison; if it is wrong, we have already done one comparison. Comparing directly against the offset is one comparison regardless.
This seems like a fragile assumption. If you rely on the size being small, then that should be statically asserted somewhere, otherwise someone can easily edit the constant value and bump it to a much larger value than you expect to have to deal with in this code.
32 is a somewhat arbitrary size that I set. I have not done any benchmarking; I picked this value because it seems reasonable. I think it's fine if someone bumps it up (in fact, my guess is that performance at 128 or 256 elements is acceptable). The important thing is that the array is bounded at some size.
The dangerous case for GC still remains on 'hot' indexes with many, many entries, and I have not addressed this danger. That case would require a different strategy altogether.
We should add more comments to the constant declaration and, if possible, some static_asserts to document our expectations on the size of these arrays. Their length is declared in one place and the reliance on that length being small happens in a different place. It's very easy for someone to update the constant without being aware of the reliance on a small value.
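A sketch of the kind of documented static_assert being requested; the constant name and the 256 ceiling are illustrative assumptions, not values from the PR:

```cpp
#include <cstddef>

// Hypothetical constant name; the real declaration lives wherever the
// collected-offsets array length is defined.
constexpr std::size_t c_max_collected_offsets = 32;

// Document the reliance on a small batch: the GC path linearly scans
// this array once per index, so flag any attempt to grow it past the
// point where that scan stays cheap.
static_assert(
    c_max_collected_offsets <= 256,
    "Collected-offset batches are linearly scanned per index; "
    "revisit that code before raising this limit.");
```

The message string is the important part: it points the person editing the constant at the code that depends on it staying small.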
The fact that the offset array is small is rather irrelevant given that the number of arrays you can batch here is not really bounded.
Furthermore, any comparison that you avoid once when you build your array will add overhead to the find operation executed for all indexes of a different type. The penalty of including offsets of wrong type gets multiplied by the number of indexes of different type.
I.e.: Let's say you collect 2 values each for 8 different types. You avoided 16 type comparisons during this collection. But now for each index, you'll perform a linear search through a 16 element array instead of performing a search through a 2 element array (or no search at all for indexes of different types). Your performance is 8 times worse in this example!
This is why I'd rather see an implementation done with no regard to memory allocation, and then look to see whether it can be optimized further to remove that allocation. As it is now, you're probably doing more work performing linear array searches than you would allocating memory for some vectors. This is premature optimization.
I agree that the important worst-case bound here is not on the size of a single array, but on the work done by a thread in a single GC pass. So far I've avoided bounding per-thread GC work in favor of approximating fairness by GC handoff on contention. But if there's no opportunity for contention detection and backoff in this case, we may need to design fairness explicitly into the protocol, which implies batching work across GC threads. I think I need to understand the possible worst-case scenarios better here before commenting further.
I'm not terribly worried about optimizing comparisons, given that basically any computation done in an already-resident cache line is free. Indirections from a mapping structure could easily swamp the overhead of unnecessary comparisons. But of course we don't want to scan thousands or millions of entries for a single entry of a given type. We could possibly optimize this without introducing expensive mappings, e.g. by sorting the (offset, type) pairs by type and binary searching for the first occurrence of a type. Seems like maybe this would be better discussed in a meeting, though.
So to clarify, I'm not worried at all about linear searches of an array that fits into 1 or 2 cache lines. Those comparisons are effectively free (once you've taken the cache miss). Performance will definitely not be "8 times worse" for a 16-element array than a 2-element array. But I do think we could safely use much larger arrays if we sorted (offset, type) pairs by type and binary-searched for the first occurrence of a type.
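The sort-then-binary-search idea could be sketched like this, assuming hypothetical type aliases and putting the type first in the pair so the default ordering groups entries by type:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

using gaia_type_t = std::uint32_t;   // hypothetical alias
using gaia_offset_t = std::uint64_t; // hypothetical alias
// Type first, so the default pair ordering groups entries by type.
using offset_entry_t = std::pair<gaia_type_t, gaia_offset_t>;

// Sort once after collection; entries of the same type become contiguous.
void sort_by_type(std::vector<offset_entry_t>& entries)
{
    std::sort(entries.begin(), entries.end());
}

// Binary-search the sorted array for the contiguous run of one type,
// so each index visits only its own entries.
auto find_type_range(const std::vector<offset_entry_t>& entries, gaia_type_t index_type)
{
    return std::equal_range(
        entries.begin(), entries.end(), offset_entry_t{index_type, 0},
        [](const offset_entry_t& a, const offset_entry_t& b) { return a.first < b.first; });
}
```

One O(n log n) sort replaces a per-index linear scan with an O(log n) search, which is what would let the batch grow well beyond 32 entries safely.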
There shouldn't be any need to hardcode the array size if you just use std::vector.