-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dos and monitoring docs #160
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @MarcoPolo for writing this up!
content/reference/dos-mitigation.md
Outdated
|
||
# General tactics that work across libp2p implementations | ||
|
||
## Limit the number of concurrent streams your protocol needs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: I think a similar section on the number of connections would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this @MarcoPolo . It's great starting to get to see how this will be exposed to users.
I'm copying in some of the comments I had in https://www.notion.so/pl-strflt/Guide-for-how-to-respond-to-resource-exhaustion-attacks-b10f55cc9a3d4917ae80c9b914e05e8c
The information above is really helpful/good. Looking at this fresh, I think we need to paint a big picture. I view these a layering that someone should think about. I’m apt to order chronologically based on the lifecycle of a libp2p application from design, development, testing, and operating.
- What are the steps I can take to protect my application? Basically what are all the defenses I can have in place before my application is deployed?
- (Should be thought about early) Architect application well with limiting blast radius as described above
- Give clear signal of problems as highlighted above by using canonicallog
- Setup/tune resource management for each libp2p node. Here we can link to resource manager docs. We have to make sure this is user-friendly. There are followups in More documentation around limits go-libp2p-resource-manager#68 to handle.
- How do I know the health of my node? This is the monitoring doc.
- What are the steps I can take when being attacked? If the steps above were taken, an attack shouldn’t fully compromise my node but it may degrade performance. Followup actions may be needed.
- Ban offending IPs. It’s best to automate this as shown above using fail2ban.
- Maybe something about using the denylist?
Depending on your use case, it can help to limit the number of inbound | ||
connections. You can use go-libp2p's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we give guidance on when to use this mechanism vs. the resource manager? I see this is a good hook for custom logic, but it seems like what Prysm is doing could be covered by go-libp2p resource manager right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we give guidance on when to use this mechanism vs. the resource manager?
No, but I could add something here.
I see this is a good hook for custom logic, but it seems like what Prysm is doing could be covered by go-libp2p resource manager right?
Not really. If you're trying to avoid an adversary that can connect to you and give you a ton of work to do all at once the rcmgr doesn't protect at all. This attack can easily be mitigated by rate limiting though.
Not all applications will want this rate limiting, or they may want to rate limit certain things (e.g. something in the protocol rather than in the connections). For example, if I'm Google I wouldn't want to rate limit any new connection to me. I would rather rate limit work per connection.
Should the rcmgr do this? I don't think so. It's not directly related to limiting the resources being used, and if it can be handled by a smaller component that already exists the better.
I hope that makes sense, but happy to expand more as well.
Co-authored-by: Steve Loeppky <[email protected]>
Co-authored-by: Steve Loeppky <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @MarcoPolo for all the work here. I'm game to take one last look once comments are incorporated.
content/reference/dos-mitigation.md
Outdated
Using a stream for a short period of time and then closing it is fine. It's | ||
really the number of _concurrent_ streams that you need to be careful of. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know more than me here. Would it be useful to use Identify as an example?
content/reference/dos-mitigation.md
Outdated
own resource usage. So limiting connections can have a leveraged effect on your | ||
resource usage. | ||
|
||
In go-libp2p the number of active connections is managed by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From reading through this doc I don't think it's clear for a user on when to use the connmgr or the resource manager for go-libp2p.
content/reference/dos-mitigation.md
Outdated
`ConnManager` will trim connections when you hit the high watermark number of | ||
connections. You can protect certain connections with the `.Protect` method. | ||
|
||
In rust-libp2p handlers should implement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Rust, I think we should draw more from https://www.notion.so/pl-strflt/rust-libp2p-2a62a76b60c54bd69aea2aa3760d6efe . At the minimum we should get a link to https://docs.rs/libp2p/latest/libp2p/swarm/struct.ConnectionLimits.html .
content/reference/dos-mitigation.md
Outdated
Using a stream for a short period of time and then closing it is fine. It's | ||
really the number of _concurrent_ streams that you need to be careful of. | ||
|
||
## Limit the number of connections your application needs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given connections happen before streams in an application's lifecycle, maybe move this above the section above?
|
||
Here are some more specific recommendations | ||
|
||
## Limit the number of concurrent streams per connection your protocol needs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we give pointers on how to do this?
For go-libp2p this means using resource manager right?
For rust, this isn't at the connection level, but somewhere I think we should be linking to https://docs.rs/libp2p/latest/libp2p/swarm/struct.SwarmBuilder.html#method.max_negotiating_inbound_streams . Maybe we say, "rust-libp2p relies on each protocol to limit the number of streams per connection in XXX. A global upperbound on negotiating/transient inbound streams can be set using https://docs.rs/libp2p/latest/libp2p/swarm/struct.SwarmBuilder.html#method.max_negotiating_inbound_streams."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should be more clear.
Here I'm talking about limiting the number of concurrent streams you need by design of the protocol, as opposed to using an existing protocol and trying to limit the streams at the end. For example imagine a RPC style protocol whose procedures are async and often take a long time to return (say > 1min). Here are two ways you could implement it:
- Open a stream for each RPC call, and keep that stream open until the rpc call returns.
- Open a stream for the start of the call then close it. The remote side will open a new stream with the answer.
Assume you make a lot of concurrent calls, method 1 would result in a large number of concurrent and mostly inactive streams. Method 2 would result in a fewer number of concurrent streams, and thus lower memory footprint.
If you add a limit here of say 10 streams, then method 1 will mean you can only have 10 concurrent RPC calls, while method 2 would let you have a much larger number of concurrent RPC calls.
Does that make sense? I should rephrase this to focus on the fact this is about protocol design (the inception stage of a p2p application) not about the deployed stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it - makes sense.
Side: lets find/create a place to point to https://docs.rs/libp2p/latest/libp2p/swarm/struct.SwarmBuilder.html#method.max_negotiating_inbound_streams . Maybe there's a section about transient/negotiating connections and resources and that those should guarded against too. Go and Rust both have some protections here.
@BigLep Thanks for the review! I appreciate the help in getting us some good docs. I believe I addressed all the comments. Another review would be very much appreciated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HI @MarcoPolo - good stuff. A few last comments but feel free to ship once incorporated. Good times!
content/reference/dos-mitigation.md
Outdated
The `ConnManager` will trim connections when you hit the high watermark number of | ||
connections, and try to keep the number of connections above the low watermark. | ||
You can protect certain connections with the | ||
[`.Protect`](https://github.com/libp2p/rust-libp2p/blob/ea487aebfe6eb672b05d2bec2d9d79bbd92450ba/protocols/kad/src/handler.rs#L562) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a rust-libp2p link.
content/reference/dos-mitigation.md
Outdated
2. Open a stream for the start of the call then close it. The remote side will | ||
open a new stream with the response. | ||
|
||
Assume we make a lot of concurrent calls, method 1 would result in a large |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume we make a lot of concurrent calls, method 1 would result in a large | |
Assume we make a lot of concurrent calls. Method 1 would result in a large |
content/reference/dos-mitigation.md
Outdated
implements their (Connection | ||
Gater)[https://github.com/prysmaticlabs/prysm/blob/63a8690140c00ba6e3e4054cac3f38a5107b7fb2/beacon-chain/p2p/connection_gater.go#L43]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implements their (Connection | |
Gater)[https://github.com/prysmaticlabs/prysm/blob/63a8690140c00ba6e3e4054cac3f38a5107b7fb2/beacon-chain/p2p/connection_gater.go#L43]. | |
implements their (ConnectionGater)[https://github.com/prysmaticlabs/prysm/blob/63a8690140c00ba6e3e4054cac3f38a5107b7fb2/beacon-chain/p2p/connection_gater.go#L43]. |
Fixing the rendering issue here: https://bafybeid4zqncc4v5epc4urfvl5ajgnmqeksksk4xrgabdwaxxswpsigh6y.on.fleek.co/reference/dos-mitigation/#leverage-the-resource-manager-to-limit-resource-usage-go-libp2p-only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a misuse of (foo)[bar]
vs [foo](bar)
usage. So limiting connections can have a leveraged effect on your resource | ||
usage. | ||
|
||
In go-libp2p the number of active connections is managed by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nitpics here:
- We use connmgr and ConnManager
- When we're hyperlinking, I think it's good to remove the ticks so it's clear that it's a hyperlink. (see screenshot of rendering)
- We do "ConnManager" no space and "Resource Manager" with space.
- Maybe the first time we talk about Resource Manager here we make it a hyperlink?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do "ConnManager" no space and "Resource Manager" with space.
In go-libp2p-core they're called ConnManager
and ResourceManager
. Using Conn Manager
feels weird and so does ResourceManager
although happy to make one change or the other if you feel strongly.
When we're hyperlinking, I think it's good to remove the ticks so it's clear that it's a hyperlink. (see screenshot of rendering)
I think our template should support the fixed width + hyperlink. I'll see if I can fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great - lets ship!
Addresses libp2p/go-libp2p#1711.
Adds some documentation and links for libp2p monitoring and dos mitigation.