Evaluate parallel design for filter processing #411
Comments
This is super interesting, and I'm also finding the idea of applying ECS here quite a fascinating use case. One thing I've never been 100% clear on:
What benefit are we aiming to get from parallel processing of Filters over a byte packet? Are we expecting higher throughput from this model? Something else? My original theory was that if you have multiple packets flowing through the proxy simultaneously, there's no net performance benefit to being able to Filter in parallel, as the system is going to be using every core to run packets in parallel (as opposed to Filters), so there was no real performance benefit to be had, at least in that aspect -- but I could well be missing something here, or the aim of this design is to optimise for something else (it does seem to me like it would reduce mutex locking?).
Well, we get parallelism mostly for free with all ECS frameworks. We'll of course have to measure for performance, but in the abstract, running in parallel should reduce work for larger filter chains by allowing earlier termination of the chain. Imagine a filter chain of a compress filter followed by a rate limit filter. With parallelism, the rate limit filter can flag the packet for deletion while the compress filter is still running. Obviously you wouldn't want to put those two filters in that order, but it's an example of different filters in the chain having different amounts of work. So while it might provide minimal to no performance benefit for an already optimised configuration, it provides a more intuitive user experience and better performance for naive configurations, because there will be no performance difference between chains that do different amounts of work in different orders, e.g. (compress, rate-limit) and (rate-limit, compress).
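To make the "flag for deletion" idea concrete, here's a minimal, framework-agnostic Rust sketch (not from the thread; the `Packet` struct, `Dropped` marker, and system names are all illustrative). In a real ECS the two systems would be scheduled by the framework, potentially in parallel; the point is only that a cheap filter can mark a packet so that the expensive one never touches it, whichever order the chain is written in.

```rust
use std::collections::HashSet;

/// Illustrative packet "entity": just an id and a payload.
struct Packet {
    id: usize,
    payload: Vec<u8>,
}

/// Stand-in for a marker component: the set of packet ids flagged for deletion.
/// In a real ECS this would be a zero-sized component attached to the entity.
type Dropped = HashSet<usize>;

/// Cheap filter: rate limiting. Flags every packet over the budget for deletion.
fn rate_limit_system(packets: &[Packet], dropped: &mut Dropped, budget: usize) {
    for packet in packets.iter().skip(budget) {
        dropped.insert(packet.id);
    }
}

/// Expensive filter: compression. Skips anything already flagged, so its cost
/// is never paid for packets that are going to be thrown away anyway.
fn compress_system(packets: &mut [Packet], dropped: &Dropped) {
    for packet in packets.iter_mut() {
        if dropped.contains(&packet.id) {
            continue;
        }
        packet.payload.dedup(); // stand-in for real compression work
    }
}

fn main() {
    let mut packets: Vec<Packet> = (0..4)
        .map(|id| Packet { id, payload: vec![1, 1, 2, 2, 3] })
        .collect();
    let mut dropped = Dropped::new();

    // Whichever order these run in, compression work is skipped for any
    // packet that has already been flagged.
    rate_limit_system(&packets, &mut dropped, 2);
    compress_system(&mut packets, &dropped);

    let kept = packets.iter().filter(|p| !dropped.contains(&p.id)).count();
    println!("kept {kept} of {} packets", packets.len());
}
```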
Aaah yes. That makes a lot of sense 👍🏻 Awesome, that clarifies things for me nicely. Thanks! We should probably write some of our own benchmarks (but from review, it does sound like Sparse Storage in Bevy ECS is a winner -- and having pluggable storage also seems like something we could tweak over time if we have specific needs of our own based on our use cases). Bevy's documentation isn't as nice as legion's, but the docs.rs pages aren't too bad, so we can work it out. One thing I'm trying to work out when digging into Bevy's ECS is how to manually run the series of systems over entities and components. Also, for my own edification, how would the flow work within Quilkin? A traditional ECS would execute on each frame tick, and then match that tick to systems depending on how often they should run -- but we don't have a frame tick, so it feels to me like each system should run as a packet is received (maybe???). I'm not quite sure how that would work. What are your thoughts there?
There's more documentation on the master branch (a bunch was merged in recently). I've been talking with some of the people involved in making each of the frameworks, and right now I think we should be looking at shipyard.
Using shipyard's API you would call
Also worth mentioning as some prior art is VPP, which @gilesheron brought up in another thread. Just from reading the documentation, it's essentially what we'd be doing here, AFAICT.
Since we also have people who aren't familiar with ECS: this is one of the article series from which I first learned about Entity Component Systems. Googling shows me there's a lot more content out there explaining the concept than back when I first started reading about the pattern 😄 This seems a bit more recent: https://ianjk.com/ecs-in-rust/, and is likely easier to follow.
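For anyone following the thread without reading those articles, here's a hand-rolled sketch of the pattern itself (purely illustrative; the component names are made up and no framework is involved): entities are just ids, components are plain data stored per entity, and systems are functions that run over every entity carrying the components they ask for.

```rust
use std::collections::HashMap;

// Entities are just opaque ids.
type Entity = u32;

// Components are plain data attached to an entity.
struct SourceAddr(String);
struct Contents(Vec<u8>);

// The "world" is component storage keyed by entity.
#[derive(Default)]
struct World {
    source_addrs: HashMap<Entity, SourceAddr>,
    contents: HashMap<Entity, Contents>,
}

// A "system" is a function that runs over every entity that has the
// components it cares about; here, every packet that has contents.
fn byte_count_system(world: &World) {
    for (entity, contents) in &world.contents {
        println!("entity {entity}: {} bytes", contents.0.len());
    }
}

fn main() {
    let mut world = World::default();

    // "Spawn" a packet entity by attaching components to a fresh id.
    world.source_addrs.insert(0, SourceAddr("127.0.0.1:9000".into()));
    world.contents.insert(0, Contents(vec![0xde, 0xad, 0xbe, 0xef]));

    byte_count_system(&world);
}
```

Real frameworks add efficient storage, change tracking, and a parallel scheduler on top of this shape, but the decomposition is the same.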
I saw that 👍🏻 Shipyard's documentation is really good 👍🏻 (I could never find the actual function on bevy-ecs to run the series of systems).
Sounds like something we should test and compare. But it sounds like a definite path forward. The other thing I'm thinking through: how do you see us splitting up Worlds? I don't think we can do one big World instance -- maybe we would do one (or several worker Worlds) for the local port, and then a World per Session? 🤔 This seems like something we might want to diagram up a bit before diving into code.
FWIW I'm still mostly using this diagram from #339 as my mental model.
Well, other than for logical complexity, I do think we could do one world, because with sparse storage, a mutable query on
That makes sense. I was also debating having one world, with a component that indicated if something was
This also leans me towards a tick-based model, so that Filter operations can be batched in a loop, and aren't an invocation for every packet. The only other thing I still can't quite work out is that a world can only have one 🤔 Does it make sense to have multiple worlds and distribute packets among them, to also run each one concurrently? Just thinking that if someone has fewer filters than the number of cores on a machine, it seems like we might be leaving some processing power behind? WDYT?
Are you sure about that? Either way though, I think the important step in having it be concurrent and parallel is separating the "processing" of the packets from packets entering and exiting. For example, we could have an enum like:

```rust
enum Status {
    Fresh,
    Processing,
    Drop,
    Pass,
}
```
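To illustrate that separation (again a framework-agnostic sketch with made-up names, not a proposed implementation): packets enter as `Fresh`, a tick promotes them to `Processing`, the filter pass settles each one as `Drop` or `Pass`, and only then do packets leave the world, so ingress and egress never block on filter work.

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum Status {
    Fresh,
    Processing,
    Drop,
    Pass,
}

struct Packet {
    contents: Vec<u8>,
    status: Status,
}

/// One "tick": promote freshly received packets, run the filter pass over the
/// processing batch, then drain everything that has settled.
fn tick(world: &mut Vec<Packet>) -> usize {
    // Ingress -> processing.
    for packet in world.iter_mut() {
        if packet.status == Status::Fresh {
            packet.status = Status::Processing;
        }
    }

    // The filter "systems" only ever look at Processing packets.
    for packet in world.iter_mut() {
        if packet.status == Status::Processing {
            packet.status = if packet.contents.is_empty() {
                Status::Drop // stand-in for a real filter decision
            } else {
                Status::Pass
            };
        }
    }

    // Egress: settled packets leave the world; Pass packets would be sent on here.
    let passed = world.iter().filter(|p| p.status == Status::Pass).count();
    world.retain(|p| matches!(p.status, Status::Fresh | Status::Processing));
    passed
}

fn main() {
    let mut world = vec![
        Packet { contents: vec![1, 2, 3], status: Status::Fresh },
        Packet { contents: vec![], status: Status::Fresh },
    ];
    let passed = tick(&mut world);
    println!("{passed} passed, {} still in flight", world.len());
}
```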
Ooh, that's a good point, and easy to test 👍🏻 SGTM.
Oh, to be clear, this looks good to me, and the plan for implementation looks good to me too. The only outstanding question is one of performance comparison, which we can test once we have the initial implementation in place. @iffyio I know you said you weren't as familiar with ECS, did you have any outstanding thoughts/questions/etc?
SGTM! Nothing to add atm.
We've been discussing over the past couple of months moving Quilkin's internal architecture from our current iterative model (one context is created per packet, and we run the context through the filter chain until we have the result) to a parallel model, where filters run through all available packets at once, with apparent order when components are shared (meaning if `&mut X` is needed by both the first and second filter in the chain, the system will give access to the first filter, then the second, and so on).

I initially proposed going with an actor model, however I now think that the best approach to making this parallel is to use an "Entity Component System" (ECS) to represent the processing pipeline. There are a number of benefits specific to ECS that I think make it worth considering as the main proposal.
Benefits

- No more `ReadContext` and `WriteContext` structs. Instead we'd have `Read` and `Write` "worlds" which contain the context as components associated with whichever entity is currently "in-flight".
- No more `Arc<RwLock<T>>`: each filter would declare a query of what component types it wants access to, with mutability rules, and the ECS framework will resolve the order for us (see the sketch after this list). This also resolves "Possible refactor: Make Packet Filtering + Content Routing separate traits" #104 without needing a new trait, as filters whose queries don't conflict will just run in parallel.
- It helps with the `SO_REUSEPORT` task #410, since the entire system can run in parallel with the listen distributor task removed.
- A better `dynamic_metadata` abstraction (closing "Replace `ReadContext.metadata` with `typemap`" #305), as filters could insert new types as components, and other filters could query for those filter-specific types, rather than needing to do a lookup and cast from `dyn Any`.
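As referenced in the list above, here's a sketch of how "declare what you access" replaces shared locking (illustrative only; this is not Quilkin's or any ECS framework's actual API). Each filter publishes the component types it reads and writes, and a scheduler only runs two filters concurrently when their declarations don't conflict, which is what produces the "apparent order" described earlier.

```rust
use std::any::TypeId;
use std::collections::HashSet;

/// What a filter declares up front, instead of taking locks at runtime.
struct Access {
    reads: HashSet<TypeId>,
    writes: HashSet<TypeId>,
}

impl Access {
    /// Two filters may run concurrently only if neither writes anything
    /// the other reads or writes.
    fn conflicts_with(&self, other: &Access) -> bool {
        self.writes
            .iter()
            .any(|t| other.reads.contains(t) || other.writes.contains(t))
            || other
                .writes
                .iter()
                .any(|t| self.reads.contains(t) || self.writes.contains(t))
    }
}

// Illustrative component types a filter chain might touch.
struct Contents;
struct RateLimitState;

fn main() {
    // A compress filter needs mutable access to the packet contents.
    let compress = Access {
        reads: HashSet::new(),
        writes: HashSet::from([TypeId::of::<Contents>()]),
    };
    // A rate limit filter reads the contents and mutates its own state.
    let rate_limit = Access {
        reads: HashSet::from([TypeId::of::<Contents>()]),
        writes: HashSet::from([TypeId::of::<RateLimitState>()]),
    };

    // A conflict means the scheduler serialises the two filters in chain
    // order, giving the "apparent order" described above.
    println!("must serialise: {}", compress.conflicts_with(&rate_limit));
}
```

This kind of access-conflict check is what lets ECS frameworks build their parallel schedules automatically.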
Drawbacks

- It requires changing the `Filter` trait, which will subsequently require updating the existing filter implementations, as well as some of the proxy server's logic.

Implementation Plan
If we go with this architecture, I believe there is an approach to implementing it that will help offset some of the drawbacks mentioned:

1. `quilkin test` in their own PR.
2. An `ecs` branch, with the initial changes that work for a single filter.
3. The remaining filters ported over on the `ecs` branch.
4. Once everything is working on the `ecs` branch, we'd merge that into `main`.

Unresolved Questions

Which ECS framework to use: `bevy-ecs`, `legion`, and `shipyard`. In my current research it seems like Bevy's ECS with sparse set storage is the fastest for the type of workload we want, where we want to create, process, and destroy entities as soon as possible (see "[Merged by Bors] - Bevy ECS V2" bevyengine/bevy#1525 for some benchmark comparisons).
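Since sparse set storage is the deciding factor mentioned above, here is a simplified sketch of the idea (real implementations differ in detail): component values sit packed in a dense array for fast iteration, while a sparse array maps entity ids to dense slots, so attaching a component and removing it again (via swap-remove) are both O(1). That's what makes a workload that creates, processes, and destroys an entity per packet cheap.

```rust
/// A minimal sparse set for one component type; entities are small integers.
/// (Assumes each entity is inserted at most once, to keep the sketch short.)
struct SparseSet<T> {
    sparse: Vec<Option<usize>>, // entity id -> index into `dense`
    dense_entities: Vec<usize>, // dense index -> entity id
    dense: Vec<T>,              // packed component values, fast to iterate
}

impl<T> SparseSet<T> {
    fn new() -> Self {
        Self { sparse: Vec::new(), dense_entities: Vec::new(), dense: Vec::new() }
    }

    /// O(1) insert: append to the dense arrays and record the slot.
    fn insert(&mut self, entity: usize, value: T) {
        if entity >= self.sparse.len() {
            self.sparse.resize(entity + 1, None);
        }
        self.sparse[entity] = Some(self.dense.len());
        self.dense_entities.push(entity);
        self.dense.push(value);
    }

    /// O(1) remove: swap-remove, then fix up the entity that filled the hole.
    fn remove(&mut self, entity: usize) -> Option<T> {
        let index = self.sparse.get(entity).copied().flatten()?;
        self.sparse[entity] = None;
        let value = self.dense.swap_remove(index);
        self.dense_entities.swap_remove(index);
        if let Some(&moved) = self.dense_entities.get(index) {
            self.sparse[moved] = Some(index);
        }
        Some(value)
    }
}

fn main() {
    let mut contents: SparseSet<Vec<u8>> = SparseSet::new();
    contents.insert(0, vec![1, 2, 3]); // packet 0 arrives
    contents.insert(1, vec![4, 5]);    // packet 1 arrives
    contents.remove(0);                // packet 0 processed and despawned
    assert_eq!(contents.dense.len(), 1);
    println!("{} component(s) still stored", contents.dense.len());
}
```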