Iteration over continuous intervals of components #462
This is possible with groups.

```cpp
entt::registry reg;
auto group = reg.group<position, velocity>();

const std::size_t size = group.size();
position *pos = group.raw<position>();
velocity *vel = group.raw<velocity>();
// `pos` and `vel` are parallel arrays with `size` elements
```

For more information about the inner workings of groups, see this article.
Thanks, I have already familiarized myself with groups. However, groups impose strict implicit requirements on what you can and can't do with them; for instance, a component can't belong to two non-nested groups at once. That is a fair price to pay in order to get perfect SoA. However, I do NOT need perfect SoA, so I don't want to pay for it; I want the next best thing: imperfect SoA, for free. In the best scenario it would return the entire raw arrays; in the worst scenario it would return chunks of size 1, but that's fine. This is also similar to the type of access flecs provides (although I don't necessarily like their implementation):

```cpp
flecs::system<Position, Velocity>(world)
    .action([](const flecs::rows& rows,
               flecs::column<Position> p,
               flecs::column<Velocity> v)
    {
        for (auto row : rows) {
            p[row].x += v[row].x;
            p[row].y += v[row].y;
            std::cout << "Moved " << rows.entity(row).name() << " to {" <<
                p[row].x << ", " << p[row].y << "}" << std::endl;
        }
    });
```
That's because of how archetypes work. You pay the cost of having everything grouped somehow, no matter what. The result is that... well, you have things grouped somehow (and possibly fragmented on the other side).

Can you provide more details on this? I'm a bit busy today but I'll do my best to look into it.
Well, imagine we don't create any groups at all (because we don't want to pay for them). We just iterate over a view:

```cpp
registry.view<position, velocity>().each([dt](auto &pos, auto &vel) {
    pos += vel * dt;
});
```

During the iteration there can be cases where two or more consecutive calls to this lambda actually touch `pos` and `vel` components that are adjacent to each other in memory. What I'm asking is to merge those contiguous calls into one whenever possible:

```cpp
[dt](auto *pos, auto *vel, size_t count)
```

where `(pos, vel, count)` is a SoA representation of a chunk of components that just happen to be contiguous in memory. In the worst case this chunk will have length 1, which is identical to what the current view API does, and the lambda will be called as many times as there are such pairs (say, N). But in the best case (if every entity that has a position also has a velocity, stored in the same order), the lambda is called just once, with `pos` and `vel` being arrays of size N holding all the needed components in perfect SoA fashion. For free! Yay!

This provides a low-effort way to get some of the benefits of perfect SoA without paying any cost or imposing any restrictions, just by offering an alternative way to iterate over groups/views that bundles contiguous calls together.
Oh... I must admit that your reasoning surprised me in a way. Actually, by construction, you could spot a lot of these chunks here and there for some types (mainly low-level ones; I don't think so for gameplay stuff). Since you seem to know what you're talking about, I take this opportunity to ask you a question.
The way I imagine it could be done, without any a priori knowledge of how entt works under the hood, is something like this (naive pseudocode, as I actually have zero knowledge of how entt works lol):

```cpp
template<typename T1, typename T2, typename Func>
void view::each(Func func)
{
    if(begin<T1, T2>() == end<T1, T2>())
        return;

    auto chunk_start = begin<T1, T2>();
    auto prev = begin<T1, T2>();
    size_t chunk_size = 1;

    for(auto it = begin<T1, T2>() + 1; it != end<T1, T2>(); ++it)
    {
        // check whether the previous element is contiguous in memory with the
        // current one, for every component type; this could be implemented way
        // better if the iterator itself kept track of whether it's contiguous
        // on each component
        if(&prev.get<0>() + 1 == &it.get<0>() && &prev.get<1>() + 1 == &it.get<1>())
            chunk_size++;
        else
        {
            func(&chunk_start.get<0>(), &chunk_start.get<1>(), chunk_size);
            chunk_size = 1;
            chunk_start = it;
        }
        prev = it;
    }

    // flush the last chunk
    func(&chunk_start.get<0>(), &chunk_start.get<1>(), chunk_size);
}
```

Yes, you do have to pay the price of iterating over a range twice (here and in the user function), as you said. But the benefit of manual SIMD optimization in the user function can more than offset this cost. You're certainly giving me way too much credit for no reason, as I have never designed a production ECS myself. I was just surprised to find out it doesn't have something as obvious as detecting contiguous chunks that are already there.
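The chunk-merging idea above can be modeled without EnTT at all. The sketch below is illustrative only (the names and the pointer-vector representation are assumptions, not library API): it walks per-entity component pointers in iteration order and merges adjacent-in-memory runs into a single callback invocation per chunk, including the final flush that the original pseudocode was missing.

```cpp
#include <cstddef>
#include <vector>

struct position { float x, y; };
struct velocity { float x, y; };

// Invoke `func(pos_ptr, vel_ptr, count)` once per contiguous run.
// `pos[i]` / `vel[i]` are the i-th matched entity's components, in
// iteration order; a run ends as soon as either component array breaks.
template<typename Func>
void each_chunked(const std::vector<position *> &pos,
                  const std::vector<velocity *> &vel,
                  Func func) {
    if(pos.empty()) return;
    std::size_t start = 0;
    for(std::size_t i = 1; i < pos.size(); ++i) {
        // contiguous iff both components sit right after the previous ones
        if(pos[i] != pos[i - 1] + 1 || vel[i] != vel[i - 1] + 1) {
            func(pos[start], vel[start], i - start);
            start = i;
        }
    }
    func(pos[start], vel[start], pos.size() - start); // flush the last chunk
}
```

With fully contiguous storage this degenerates to a single callback over the whole range; with fully fragmented storage it degenerates to one callback per entity, exactly as described above.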
If you care about performance (and since you're using SIMD, I'm sure you do), then you ought to know how EnTT works. For that, I suggest reading the ECS Back and Forth articles. In particular, this one talks about sparse sets and this one talks about groups. If you want to write fast code with EnTT, knowing the implementation details will help.
Yes, I have already read those after creating this thread. Honestly, I did not really like the black-box feeling that the official documentation left. The implementation details can be arbitrarily complex, but I think at least mentioning sparse sets and the way views and groups are iterated would remove a lot of confusion from the documentation, without overloading it with too much information.
Actually, there is a PR that I've still to merge, the aim of which is to provide an insight into the internals of the library.
After looking up how entt works internally, it's obvious that we can replace this:

```cpp
if(&prev.get<0>() + 1 == &it.get<0>() && &prev.get<1>() + 1 == &it.get<1>())
```

with a better implementation that doesn't have to do an indirection into component array memory. It could be done like this, for example:

```cpp
if(it.prev_entity<0>() == prev.entity() && it.prev_entity<1>() == prev.entity())
```

This just checks that the previous entity in all component arrays is the same as the last processed entity. If it is, the chunk is contiguous. This is done just by comparing entity indices stored in those dense arrays, without dereferencing a component pointer. It still means that look-ahead is necessary, but it can be done very efficiently by only accessing the dense entity arrays.
I've looked into this one. Unfortunately, we are missing something important here. That said, what you mention could be used to provide chunked access to partial owning groups. Any thoughts?
You can look ahead, but you do have to check all the components to make sure all of them are present and contiguous. Both of those checks can be accomplished with one read from the sparse array per component. In fact, you already have to read that array to make sure the component is present; you can use the same value to tell whether it's also contiguous. Once you find an entity that lacks one of its components, OR whose component is not adjacent to the previous one (you get all of this from the single value read from the sparse array), you call the user function and use get() to find where the chunk starts.

PS: this whole thing is NOT designed to make iteration over entities faster. Furthermore, calculating chunks does take some additional logic. That being said, the benefit comes from the user's lambda, which can process things in bulk more effectively than one by one, and the difference can be pretty significant with SIMD.
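The single-read check described above can be sketched with a minimal sparse-set model (the types and names here are illustrative stand-ins, not EnTT's actual internals): one lookup in the sparse array answers both "is the component present?" and "does it sit right after the previously processed one in the dense array?".

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal sparse-set model: sparse[entity] -> index into dense (or npos),
// dense[i] -> entity. Real pools store components parallel to `dense`.
struct sparse_set {
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);
    std::vector<std::size_t> sparse;   // entity -> dense index, npos if absent
    std::vector<std::uint32_t> dense;  // dense index -> entity
};

// One read from the sparse array tells us both that `entity` has the
// component and that it is stored immediately after dense index `prev_index`,
// i.e. the current chunk can be extended without touching component memory.
bool present_and_adjacent(const sparse_set &set,
                          std::uint32_t entity,
                          std::size_t prev_index) {
    return entity < set.sparse.size()
        && set.sparse[entity] != sparse_set::npos
        && set.sparse[entity] == prev_index + 1;
}
```

In a multi-component view this check would run once per component pool; the chunk is flushed as soon as any pool fails it.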
I made a test during the last days. The result is that iterating elements this way takes more or less three times the time of an…
Nvm. I found a way to make it faster than what I expected, and even better than what we discussed here in terms of functionality.
Since the chunked function has been removed, is there an alternative way to achieve that?
Mmm yes and no, I'd say that it's possible with a custom pool but I would discourage it honestly. |
Currently, I feel the multi-component view or group is slow when there are many entities, compared to perfect SoA solutions.
Well, technically speaking, full owning groups on fully contiguous storages are the only thing that strictly adheres to perfect SoA requirements/definition. Are you experiencing an actual problem at the moment? |
Yes, I don't know why, but I found that if I have 1000 entities with 5-6 components, and I frequently use 3 or 4 components in a system, the speed is much slower than an OOP design or perfect SoA.
It's definitely unlikely that 1000 entities and 5/6 components can cause problems. |
Looks like erasing the type of a component with poly prevents SIMD optimization. I think maybe it frequently accesses the virtual table. Anyway, thank you for your fast response.
Mmm... Not sure I get what you mean. |
I wrapped my component in a poly object, and I made the poly object own the original component, since I think it is not good to hold a reference to a component in a pool.
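A self-contained illustration of the point made above (the types here are assumed stand-ins, not the actual poly wrapper): iterating components behind a type-erased virtual interface forces a vtable call per element, which compilers generally cannot auto-vectorize, whereas the direct-struct loop is a plain stream of floats.

```cpp
#include <memory>
#include <vector>

struct position { float x = 0.f; };

// Type-erased interface, standing in for a poly-wrapped component.
struct erased_position {
    virtual ~erased_position() = default;
    virtual void advance(float dx) = 0;
    virtual float x() const = 0;
};

struct concrete_position : erased_position {
    float value = 0.f;
    void advance(float dx) override { value += dx; }
    float x() const override { return value; }
};

// One virtual call per element: opaque to the vectorizer.
void advance_erased(std::vector<std::unique_ptr<erased_position>> &v, float dx) {
    for(auto &p : v) p->advance(dx);
}

// Direct layout: a trivially auto-vectorizable loop over contiguous floats.
void advance_plain(std::vector<position> &v, float dx) {
    for(auto &p : v) p.x += dx;
}
```

Both loops compute the same result; only the second gives the optimizer a chance at SIMD.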
Consider a simple bit of code that does multi-component iteration:
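The code snippet that originally followed here was lost in extraction. Judging from the rest of the thread, it was plain view-based `each` iteration. A minimal runnable sketch, using an assumed stand-in type so it works without EnTT (the real call would be `registry.view<position, velocity>().each(...)`):

```cpp
#include <utility>
#include <vector>

struct position { float x = 0.f, y = 0.f; };
struct velocity { float x = 0.f, y = 0.f; };

// Stand-in for an EnTT view: invokes the callback once per matched entity
// with references to that entity's components.
struct view_stub {
    std::vector<std::pair<position, velocity>> storage;

    template<typename Func>
    void each(Func func) {
        for(auto &[pos, vel] : storage) func(pos, vel);
    }
};
```

Usage mirrors the snippet quoted later in the thread: `view.each([dt](auto &pos, auto &vel) { pos.x += vel.x * dt; pos.y += vel.y * dt; });`.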
Now I want to make it SIMD. It has to be done in chunks that are known to be continuous in memory. For example, this can be done via an API like this:
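The proposed API snippet was also lost here; elsewhere in the thread the callback shape is quoted as `[dt](auto *pos, auto *vel, size_t count)`. A hedged sketch of what the body of such a chunk-wise callback would do (`move_chunk` is an illustrative name, not an EnTT API):

```cpp
#include <cstddef>

struct position { float x, y; };
struct velocity { float x, y; };

// The callback receives raw pointers into one contiguous run plus the run's
// length, so the inner loop is a tight, SIMD-friendly pass over plain floats.
void move_chunk(position *pos, const velocity *vel, std::size_t count, float dt) {
    for(std::size_t i = 0; i < count; ++i) {
        pos[i].x += vel[i].x * dt;
        pos[i].y += vel[i].y * dt;
    }
}
```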
Is it possible to do something like this already? If not, how complicated would it be to implement?