Improve resource manager UX #9001

Closed
3 tasks
lidel opened this issue May 31, 2022 · 2 comments · Fixed by #9338
Labels
kind/enhancement A net-new feature or improvement to an existing feature

Comments

@lidel
Member

lidel commented May 31, 2022

Swarm.ResourceMgr was added in #8680 as part of the #8761 epic, but there are some paper cuts around it that we need to address, ideally before shipping it enabled by default.

  • The user should get clear feedback when a resource manager limit is hit, along with guidance on how to adjust that limit
    • right now all resource manager logs are WARN or DEBUG, while the default log level in go-ipfs is ERROR
      • this means the user is not informed when the resource manager starts impacting performance, and may see either degraded performance or seemingly random errors without a clear cause
      • to see resource manager tracing, one needs to run the daemon with LIBP2P_DEBUG_RCMGR
    • proposed fix:
      • detect when resource manager limits are hit (either by aggregating log warnings, or by tracking the counters returned by ipfs swarm stats all) and print an ERROR message informing the user which specific limits were reached, and explain (inline or via a link) how to debug/adjust them.
      • For a starting point, see cb72776
  • The user should be able to use the CLI to quickly see which limits are hit or close to being hit (above a given % of utilization)
    • problem: ipfs swarm stats all works, but ipfs swarm limit all returns Error: invalid scope "all", making it hard to compare set limits against current stats; the user needs to check each scope manually (dozens of calls)
    • proposed solution: add an optional parameter to ipfs swarm stats to only show scopes above a certain utilization %.
      • we need this so the user can adjust all 'close calls', not just the one that failed at a specific time
      • example: ipfs swarm stats all --min-used-limit-perc 85 will only show scopes whose values are between 85 and 100 % of the set limit.
  • The user should be able to use the CLI to remove a custom limit and restore the default
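The `--min-used-limit-perc` filter proposed above boils down to comparing each scope's current usage with its limit and keeping only the "close calls". The Go sketch below illustrates that comparison under simplifying assumptions: each scope is reduced to a single counter, and `ScopeUsage`, `filterByUtilization`, and the sample numbers are hypothetical. They do not reflect Kubo's actual types or the output format of `ipfs swarm stats`.

```go
package main

import (
	"fmt"
	"sort"
)

// ScopeUsage is a hypothetical, simplified view of one resource manager
// scope: the current value of a single counter and its configured limit.
type ScopeUsage struct {
	Scope string // e.g. "system", "transient", "peer:<id>"
	Used  int64  // current usage of the counter
	Limit int64  // configured limit for the same counter
}

// utilization returns usage as a percentage of the limit.
func utilization(s ScopeUsage) float64 {
	return float64(s.Used) / float64(s.Limit) * 100
}

// filterByUtilization keeps scopes at or above minPerc percent of their
// limit, most utilized first. Scopes without a positive limit are skipped
// because a percentage is undefined for them.
func filterByUtilization(scopes []ScopeUsage, minPerc float64) []ScopeUsage {
	var out []ScopeUsage
	for _, s := range scopes {
		if s.Limit > 0 && utilization(s) >= minPerc {
			out = append(out, s)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		return utilization(out[i]) > utilization(out[j])
	})
	return out
}

func main() {
	// Hypothetical sample data.
	scopes := []ScopeUsage{
		{Scope: "system", Used: 120, Limit: 1000},
		{Scope: "transient", Used: 95, Limit: 100},
		{Scope: "peer:A", Used: 87, Limit: 100},
		{Scope: "peer:B", Used: 10, Limit: 100},
	}
	for _, s := range filterByUtilization(scopes, 85) {
		fmt.Printf("%s: %d/%d (%.0f%%)\n", s.Scope, s.Used, s.Limit, utilization(s))
	}
}
```

Sorting by utilization puts the scopes closest to their limits first, which matches the goal of adjusting every close call rather than only the one that happened to fail.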
lidel added the kind/enhancement label May 31, 2022
lidel moved this to 🥞 Todo in IPFS Shipyard Team May 31, 2022
BigLep added this to the go-ipfs 0.14 milestone May 31, 2022
BigLep modified the milestones: kubo 0.14, kubo 0.15 Jul 22, 2022
BigLep mentioned this issue Sep 30, 2022
ajnavarro moved this from 🥞 Todo to 🏃‍♀️ In Progress in IPFS Shipyard Team Oct 6, 2022
@guseggert
Contributor

We should be able to, e.g., block a peer by setting its limit to 0, but I don't think our config allows that, because 0 is treated as unspecified and so it gets the default?
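The ambiguity raised in this comment is the usual Go zero-value problem: when a config field is a plain `int`, an absent field and an explicit `0` decode to the same value, so "block this peer" becomes "use the default". A minimal sketch, assuming a JSON config with a hypothetical `Streams` field and a hypothetical default of 256 (not Kubo's real config structs), shows how a pointer field keeps the two cases apart:

```go
package main

import (
	"encoding/json"
	"fmt"
)

const defaultStreams = 256 // hypothetical default limit

// plainLimit (hypothetical): with a plain int, a missing field and an
// explicit 0 both decode to 0.
type plainLimit struct {
	Streams int
}

// optionalLimit (hypothetical): nil means "not specified", while a
// non-nil pointer to 0 means "explicitly zero", i.e. block.
type optionalLimit struct {
	Streams *int
}

func (l plainLimit) effective() int {
	if l.Streams == 0 {
		return defaultStreams // an explicit 0 is indistinguishable from unset
	}
	return l.Streams
}

func (l optionalLimit) effective() int {
	if l.Streams == nil {
		return defaultStreams // field absent: fall back to the default
	}
	return *l.Streams // explicit value, including 0, is kept
}

// mustDecode unmarshals JSON into v and panics on malformed input.
func mustDecode(data string, v interface{}) {
	if err := json.Unmarshal([]byte(data), v); err != nil {
		panic(err)
	}
}

func main() {
	var p plainLimit
	var unset, zero optionalLimit
	mustDecode(`{"Streams": 0}`, &p)
	mustDecode(`{}`, &unset)
	mustDecode(`{"Streams": 0}`, &zero)

	fmt.Println(p.effective())     // 256: the explicit 0 was lost
	fmt.Println(unset.effective()) // 256: the default applies
	fmt.Println(zero.effective())  // 0: the peer would be blocked
}
```

`encoding/json` leaves a pointer field `nil` when the key is absent and allocates it when the key is present, which is what makes the distinction possible without a separate "is set" flag.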

Repository owner moved this from 🏃‍♀️ In Progress to 🎉 Done in IPFS Shipyard Team Nov 10, 2022
ajnavarro added a commit that referenced this issue Nov 10, 2022
This PR adds several new features to make ResourceManager easier to use:

- Resource manager logs emitted when resources are exceeded are now at ERROR level instead of WARN.
- The "resources exceeded" error now shows which kind of limit was reached and in which scope.
- When limits stop being exceeded, a message is printed telling the user that limits are no longer exceeded.
- Added `swarm limit all` command to show all set limits in the same format as `swarm stats all`
- Added `min-used-limit-perc` option to `swarm stats all` to only show stats that are above a specific percentage
- Greatly simplified default values.
- **Enable ResourceManager by default.**

Output example:
```
2022-11-09T10:51:40.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:51:50.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 483095 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:51:50.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:00.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 455294 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:00.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:10.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 471384 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:10.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:20.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 8 times with error "peer:12D3KooWKqcaBtcmZKLKCCoDPBuA6AXGJMNrLQUPPMsA5Q6D1eG6: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:20.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 192 times with error "peer:12D3KooWPjetWPGQUih9LZTGHdyAM9fKaXtUxDyBhA93E3JAWCXj: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:20.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 469746 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:20.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:30.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 484137 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:30.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 29 times with error "peer:12D3KooWPjetWPGQUih9LZTGHdyAM9fKaXtUxDyBhA93E3JAWCXj: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:30.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:40.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 468843 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:40.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:52:50.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 366638 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:52:50.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:53:00.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 405526 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:53:00.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 107 times with error "peer:12D3KooWQZQCwevTDGhkE9iGYk5sBzWRDUSX68oyrcfM9tXyrs2Q: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:53:00.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:53:10.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 336923 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:53:10.566+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:53:20.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:55      Resource limits were exceeded 71 times with error "transient: cannot reserve inbound stream: resource limit exceeded".
2022-11-09T10:53:20.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:59      Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr
2022-11-09T10:53:30.565+0100    ERROR   resourcemanager libp2p/rcmgr_logging.go:64      Resrouce limits are no longer being exceeded.

```
## Validation tests

- The Accelerated DHT client runs without errors when ResourceManager is active. No problems were observed.
- Ran an attack with 200 connections and 1M streams using the yamux protocol. The node remained usable during the attack. With ResourceManager disabled, the node was killed by the OS because of the amount of memory it consumed.
	- Actions performed while the attack was active:
		- Add files
		- Force a reprovide
		- Use the gateway to resolve an IPNS address.

Closes #9001
Closes #9351
Closes #9322
@lidel
Member Author

lidel commented Nov 14, 2022

Follow-up work is tracked under the dedicated label topic/resource-manager (issues related to Swarm.ResourceMgr, the resource manager).
