Network Resource Manager Implementation #1

vyzo · 2021-12-22T09:41:38Z

Implementation of the network.ResourceManager interface.

The libp2p Network Resource Manager

This package contains the canonical implementation of the libp2p
Network Resource Manager interface.

The implementation is based on the concept of Resource Management
Scopes, whereby resource usage is constrained by a DAG of scopes,
accounting for multiple levels of resource constraints.

Design Considerations

The Resource Manager must account for basic resource usage at all
levels of the stack, from the internals to the application.
Basic resources include memory, streams, connections, and file
descriptors. These account for both space and time used by
the stack, as each resource has a direct effect on the system
availability and performance.
The design must support seamless integration for user applications,
which should reap the benefits of resource management without any
changes. That is, existing applications should be oblivious of the
resource manager and transparently obtain limits which protect them
from resource exhaustion and OOM conditions.
At the same time, the design must support opt-in resource usage
accounting for applications that want to explicitly utilize the
facilities of the system to inform about and constrain their own
resource usage.
The design must allow users to set their own limits, which can be
static (fixed) or dynamic.

Basic Resources

Memory

Perhaps the most fundamental resource is memory, and in particular
buffers used for network operations. The system must provide an
interface for components to reserve memory that accounts for buffers
(and possibly other live objects), which is scoped within the component.
Before a new buffer is allocated, the component should try a memory
reservation, which can fail if the resource limit is exceeded. It is
then up to the component to react to the error condition, depending on
the situation. For example, a muxer failing to grow a buffer in
response to a window change should simply retain the old buffer and
operate at perhaps degraded performance.
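
As an illustration of the reserve-then-allocate pattern described above, here is a minimal sketch. The `memScope` type, its `reserve`/`release` methods, `errLimitExceeded`, and `growBuffer` are hypothetical names invented for this example; they are not the package's API.

```go
package main

import (
	"errors"
	"fmt"
)

// errLimitExceeded is a hypothetical error value used only in this sketch.
var errLimitExceeded = errors.New("resource limit exceeded")

// memScope is a toy stand-in for a resource scope that tracks only memory.
type memScope struct {
	limit, used int
}

// reserve accounts for size bytes, or fails without changing state.
func (s *memScope) reserve(size int) error {
	if s.used+size > s.limit {
		return errLimitExceeded
	}
	s.used += size
	return nil
}

// release returns size bytes to the scope.
func (s *memScope) release(size int) {
	s.used -= size
}

// growBuffer tries to replace buf with a larger buffer. If the reservation
// fails, it keeps the old buffer, as in the muxer example in the text.
func growBuffer(s *memScope, buf []byte, newSize int) []byte {
	if newSize <= len(buf) {
		return buf
	}
	if err := s.reserve(newSize); err != nil {
		return buf // degraded, but still operational
	}
	grown := make([]byte, newSize)
	copy(grown, buf)
	s.release(len(buf)) // the old buffer is no longer accounted for
	return grown
}

func main() {
	s := &memScope{limit: 64 << 10}
	_ = s.reserve(16 << 10)
	buf := make([]byte, 16<<10)

	// Fits within the limit: the buffer is replaced.
	buf = growBuffer(s, buf, 32<<10)
	fmt.Println(len(buf), s.used)

	// Exceeds the limit: the old buffer is retained.
	buf = growBuffer(s, buf, 128<<10)
	fmt.Println(len(buf), s.used)
}
```

Note that the sketch reserves the new size before releasing the old one, so both buffers are accounted for while the copy is in flight.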

File Descriptors

File descriptors are an important resource that uses memory (and
computational time) at the system level. They are also a scarce
resource, as typically (unless the user explicitly intervenes) they
are constrained by the system. Exhaustion of file descriptors may
render the application incapable of operating (e.g. because it is
unable to open a file).

Connections

Connections are a higher level concept endemic to libp2p; in order to
communicate with another peer, a connection must first be
established. Connections are an important resource in libp2p, as they
consume memory, goroutines, and possibly file descriptors.

We distinguish between inbound and outbound connections, as the former
are initiated by remote peers and consume resources in response to
network events and thus need to be tightly controlled in order to
protect the application from overload or attack. Outbound
connections are typically initiated by the application's volition and
don't need to be controlled as tightly. However, outbound connections
still consume resources and may be initiated in response to network
events because of (potentially faulty) application logic, so they
still need to be constrained.

Streams

Streams are the fundamental object of interaction in libp2p; all
protocol interactions happen through a stream that goes over some
connection. Streams are a fundamental resource in libp2p, as they
consume memory and goroutines at all levels of the stack.

Streams always belong to a peer, specify a protocol, and may
belong to some service in the system. This suggests that, apart
from global limits, we can constrain stream usage at finer
granularity, at the protocol and service level.

Once again, we distinguish between inbound and outbound streams.
Inbound streams are initiated by remote peers and consume resources in
response to network events; controlling inbound stream usage is again
paramount for protecting the system from overload or attack.
Outbound streams are normally initiated by the application or some
service in the system in order to effect some protocol
interaction. However, they can also be initiated in response to
network events because of application or service logic, so we still
need to constrain them.

Resource Scopes

The Resource Manager is based on the concept of resource
scopes. Resource Scopes account for resource usage that is temporally
delimited for the span of the scope. Resource Scopes conceptually
form a DAG, providing us with a mechanism to enforce multiresolution
resource accounting. Downstream resource usage is aggregated at scopes
higher up the graph.

The following diagram depicts the canonical scope graph:

System
  +------------> Transient.............+................+
  |                                    .                .
  +------------>  Service------------- . ----------+    .
  |                                    .           |    .
  +------------->  Protocol----------- . ----------+    .
  |                                    .           |    .
  +-------------->* Peer               \/          |    .
                     +------------> Connection     |    .
                     |                             \/   \/
                     +--------------------------->  Stream

The System Scope

The system scope is the top level scope that accounts for global
resource usage at all levels of the system. This scope constrains all
other scopes and institutes global hard limits.

The Transient Scope

The transient scope accounts for resources that are in the process of
being fully established. For instance, a new connection prior to the
handshake does not belong to any peer, but it still needs to be
constrained as this opens an avenue for attacks in transient resource
usage. Similarly, a stream that has not negotiated a protocol yet is
constrained by the transient scope.

Service Scopes

The system is typically organized across services, which may be
ambient and provide basic functionality to the system (e.g. identify,
autonat, relay, etc). Alternatively, services may be explicitly
instantiated by the application, and provide core components of its
functionality (e.g. pubsub, the DHT, etc).

Services consume resources such as memory and may directly own streams
that implement their protocol flow. Services typically have some
stream handler, so they are subject to inbound stream creation and
resource usage in response to network events. As such, the system
explicitly models them allowing for isolated resource usage that can
be tuned by the user.

Protocol Scopes

Protocol Scopes account for resources at the protocol level. They are
an intermediate resource scope that can constrain streams which have
no associated service, or provide resource control within a
service.

For instance, a service that is not aware of the resource manager and
has not been ported to mark its streams may still gain limits
transparently without any programmer intervention. Furthermore, the
protocol scope can constrain resource usage for services that
implement multiple protocols for the sake of backwards
compatibility. A tighter limit in some older protocol can protect the
application from resource consumption caused by legacy clients or
potential attacks.

For a concrete example, consider pubsub with the gossipsub router: the
service also understands the floodsub protocol for backwards
compatibility and support for unsophisticated clients that are lagging
in the implementation effort. By specifying a lower limit for the
floodsub protocol, we can constrain the service level for legacy
clients using an inefficient protocol.
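
A per-protocol override of this kind could look roughly like the sketch below. The `protocolLimits` type and the numbers are hypothetical and chosen only for illustration; the protocol IDs are the usual gossipsub and floodsub identifiers.

```go
package main

import "fmt"

// protocolLimits is a hypothetical configuration shape: a default inbound
// stream limit plus per-protocol overrides.
type protocolLimits struct {
	defaultStreamsInbound int
	overrides             map[string]int
}

// streamsInbound returns the override for proto if one exists,
// and the default limit otherwise.
func (l protocolLimits) streamsInbound(proto string) int {
	if n, ok := l.overrides[proto]; ok {
		return n
	}
	return l.defaultStreamsInbound
}

func main() {
	limits := protocolLimits{
		defaultStreamsInbound: 512, // illustrative value
		overrides: map[string]int{
			"/floodsub/1.0.0": 64, // tighter budget for the legacy protocol
		},
	}
	for _, p := range []string{"/meshsub/1.1.0", "/floodsub/1.0.0"} {
		fmt.Printf("%s: up to %d inbound streams\n", p, limits.streamsInbound(p))
	}
}
```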

Peer Scopes

The peer scope accounts for resource usage by an individual peer. This
constrains connections and streams and limits the blast radius of
resource consumption by a single remote peer.

Connection Scopes

The connection scope is delimited to the duration of a connection and
constrains resource usage by a single connection. The scope is a leaf
in the DAG, with a span that begins when a connection is established
and ends when the connection is closed. Its resources are aggregated
to the resource usage of a peer.

Stream Scopes

The stream scope is delimited to the duration of a stream, and
constrains resource usage by a single stream. This scope is also a
leaf in the DAG, with a span that begins when a stream is created and
ends when the stream is closed. Its resources are aggregated to the
resource usage of a peer, and constrained by a service and protocol
scope.

User Transaction Scopes

User transaction scopes can be created as a child of any extant
resource scope, and provide the programmer with a delimited scope for
easy resource accounting. Transactions may form a tree that is rooted
to some canonical scope in the scope DAG.

For instance, a programmer may create a transaction scope within a
service that accounts for some control flow delimited resource
usage. Similarly, a programmer may create a transaction scope for some
interaction within a stream, e.g. a Request/Response interaction that
uses a buffer.

Limits

Each resource scope has an associated limit object, which designates
limits for all basic resources. The limit is checked every time some
resource is reserved and provides the system with an opportunity to
constrain resource usage.

There are separate limits for each class of scope, allowing for
multiresolution and aggregate resource accounting. As such, we have
limits for the system and transient scopes, default and specific
limits for services, protocols, and peers, and limits for connections
and streams.
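
As a rough picture of what a limit check involves, the sketch below defines a limit object with separate inbound and outbound connection budgets and a file descriptor budget. The `limit` and `usage` types, the `allowConn` function, and all numbers are assumptions for illustration, not the package's limit types.

```go
package main

import "fmt"

type direction int

const (
	inbound direction = iota
	outbound
)

// limit is a hypothetical limit object covering some basic resources.
type limit struct {
	memory        int64
	connsInbound  int
	connsOutbound int
	fd            int
}

// usage is the current consumption tracked by a scope.
type usage struct {
	connsInbound  int
	connsOutbound int
	fd            int
}

// allowConn reports whether one more connection in direction dir fits
// within l; useFD says whether the connection consumes a file descriptor.
func allowConn(l limit, u usage, dir direction, useFD bool) bool {
	if useFD && u.fd+1 > l.fd {
		return false
	}
	if dir == inbound {
		return u.connsInbound+1 <= l.connsInbound
	}
	return u.connsOutbound+1 <= l.connsOutbound
}

func main() {
	// Illustrative numbers: aggressive on inbound, looser on outbound.
	peerLimit := limit{memory: 16 << 20, connsInbound: 1, connsOutbound: 8, fd: 4}
	u := usage{connsInbound: 1, connsOutbound: 1, fd: 2}

	fmt.Println(allowConn(peerLimit, u, inbound, true))  // second inbound conn is refused
	fmt.Println(allowConn(peerLimit, u, outbound, true)) // outbound conn still fits
}
```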

Implementation Notes

The package only exports a constructor for the resource manager and
basic types for defining limits. Internals are not exposed.
Internally, there is a resources object that is embedded in every scope and
implements resource accounting.
There is a single implementation of a generic resource scope, that
provides all necessary interface methods.
There are concrete types for all canonical scopes, embedding a
pointer to a generic resource scope.
Peer and Protocol scopes, which may be created in response to
network events, are periodically garbage collected.
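
The garbage collection mentioned in the last note can be pictured with the sketch below, which drops peer scopes that no longer account for anything. The `peerScope` type and `gc` function are hypothetical; how the package decides that a scope is unused, and how it schedules the collection, is not shown here and may differ.

```go
package main

import "fmt"

// peerScope is a toy scope that counts connections and streams.
type peerScope struct {
	conns, streams int
}

func (s *peerScope) unused() bool { return s.conns == 0 && s.streams == 0 }

// gc drops peer scopes that no longer account for any resources and
// returns how many were removed. A periodic caller is assumed.
func gc(peers map[string]*peerScope) int {
	removed := 0
	for id, s := range peers {
		if s.unused() {
			delete(peers, id) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	peers := map[string]*peerScope{
		"peerA": {conns: 1, streams: 3},
		"peerB": {},
	}
	removed := gc(peers)
	fmt.Println(removed, len(peers))
}
```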

scope.go

marten-seemann

Not a thorough review yet, just what I noticed during a quick pass.

scope.go

vyzo · 2021-12-28T11:30:30Z

rebased on master to get the workflows.

go.mod

marten-seemann

Partial review, will continue tomorrow.

scope.go

marten-seemann

Yet another partial review (sorry for that). More to follow tomorrow.

errors.go

rcmgr.go

scope.go

marten-seemann

A bunch of suggestions to un-export types.
If I understand correctly, the only thing we need to export in this package is NewResourceManager. All other types can be private, as they're just implementations of the interfaces defined in -core.

I also added two suggestions how to disentangle the owner and constraints logic. Not 100% sure if they'll work, lmk what you think.

scope.go

rcmgr.go

scope.go

rcmgr.go

vyzo · 2021-12-30T07:26:06Z

Yeah, we only need to export the constructor and the limit types; everything else can be private.

vyzo · 2021-12-30T07:34:14Z

So agreed on unexporting, let me think if the constraints subtype and owner wrapper actually help.

vyzo · 2021-12-30T09:39:24Z

Unexported all implementation types.

BigLep

This is great - thanks for getting it going.

An idea for providing more clarity on a higher level...

Could we show some code, pseudo code, or config examples? Ideas coming to mind:

State some limits at different scopes. Then walk through what happens when a new connection comes through. I assume we evaluate the resource request at the different scopes and if any fail, an error comes back.

Side: what does that error object/message look like for the programmer?

Show what some expected configuration examples would look like (i.e., very low/aggressive limits on inbound connections).
I'm sure there are better examples, but my idea here is to make it even more concrete. I think the README does a good job talking in abstract and in dropping in some examples. I think leaning in on the example side even more will further increase the understandability.

Maybe also articulate what this system can't do. For example can it:

Have different limits for peers that match a certain regex (i.e., use it as a way to block specific peers).

README.md

vyzo · 2022-01-05T19:18:33Z

Sure @BigLep , these are great suggestions for improvement; will get to it.

vyzo · 2022-01-08T16:11:29Z

Note: I will squash the go mod related commits when it is time for merge.

vyzo · 2022-01-08T20:22:30Z

Added some ergonomic polish to the limit interfaces and added support for per service peer limits.

vyzo · 2022-01-09T08:33:55Z

removed gosigar dependency.

marten-seemann · 2022-01-09T08:36:05Z

limit_dynamic.go

+	var memstat runtime.MemStats
+	runtime.ReadMemStats(&memstat)
+
+	freemem += (memstat.HeapInuse - memstat.HeapAlloc) + (memstat.HeapIdle - memstat.HeapReleased)


These are all uint64s. Are we 100% sure we can't underflow?

vyzo · Jan 9, 2022

yes, the actual address space in all 64bit processors is 48bit.

vyzo · Jan 9, 2022

oh you mean HeapAlloc being more than HeapInuse etc? Hrm, that shouldn't happen unless the runtime goes crazy.

vyzo · Jan 9, 2022

These subtractions are actually described in the runtime package documentation.

https://pkg.go.dev/runtime#MemStats

raulk

This generally LGTM besides the inline comments. I do have more general production-readiness comments.

This component is basically opaque right now; it's going to be quite hard to monitor.
Consider adding audit trail style logging that traces all resource management operations.
Consider adding general logging at key sites, e.g. when limits are initialized, when requests are rejected, etc.
Consider adding convenience utils to dump the internal state, either on demand or through a timer-based poller, to some output.

README.md

raulk · 2022-01-13T14:38:14Z

README.md

+resource usage of a peer, and constrained by a service and protocol
+scope.
+
+### User Transaction Scopes


I'm not sure if user resource usage spans qualify as scopes. Also, I don't think transaction is quite the right concept here as there are no atomicity or isolation guarantees. I'd consider the term "user resource usage spans".

vyzo · Jan 13, 2022

agreed, I am leaning towards renaming to Spans.

vyzo · Jan 14, 2022

renamed to Spans, will update the text.

README.md

scope.go

vyzo · 2022-01-13T15:41:16Z

> This component is basically opaque right now; it's going to be quite hard to monitor.

Not really, all scopes define the Stat method so it is pretty easy to dump state.

> Consider adding audit trail style logging that traces all resource management operations.

Ok, will add some logs.

> Consider adding general logging at key sites, e.g. when limits are initialized, when requests are rejected, etc.

sure.

> Consider adding convenience utils to dump the internal state, either on demand or through a timer-based poller, to some output.

ok, we can do something about this.

vyzo · 2022-01-17T07:29:35Z

Polish per raul's review in #3

Co-authored-by: raulk <[email protected]>

…rotocol when setting the service

vyzo · 2022-01-17T10:33:35Z

squashed go mod related commits, this is ready for merge now.

vyzo requested review from Stebalien and marten-seemann December 22, 2021 09:41

marten-seemann reviewed Dec 22, 2021


marten-seemann reviewed Dec 23, 2021


vyzo force-pushed the implementation branch from e4ae64c to 77c4ce2 Compare December 28, 2021 11:30

marten-seemann reviewed Dec 28, 2021


marten-seemann reviewed Dec 28, 2021


vyzo changed the title ~~Implementation~~ Network Resource Manager Implementation Dec 28, 2021

marten-seemann reviewed Dec 29, 2021


marten-seemann reviewed Dec 30, 2021


marten-seemann mentioned this pull request Jan 1, 2022

go-libp2p v0.18.0 libp2p/go-libp2p#1267

Closed

69 tasks

vyzo force-pushed the implementation branch from 6bb8e50 to caf83a2 Compare January 5, 2022 15:00

BigLep reviewed Jan 5, 2022


vyzo marked this pull request as ready for review January 8, 2022 16:11

marten-seemann reviewed Jan 9, 2022


vyzo force-pushed the implementation branch from fc81fb4 to 0381e25 Compare January 13, 2022 13:50

raulk self-requested a review January 13, 2022 14:04

raulk reviewed Jan 13, 2022


vyzo force-pushed the implementation branch 2 times, most recently from a93df40 to da8e888 Compare January 17, 2022 07:19

vyzo and others added 25 commits January 17, 2022 12:33

RIP gosigar

d22a48d

adjust memory limit multipliers for the default limiters

927d2d7

adjust default limits

572b3eb

Update README.md

4f92b11

Co-authored-by: raulk <[email protected]>

rename txn to span

e1701c7

introduce per protocol peer limits, don't transfer resources out of p…

90d7e86

…rotocol when setting the service

fix tests

8ec4ed7

rename constraints to edges

5d609db

log, don't panic on resource release bugs

12fcd1d

add logging around blocked reservations/conns/streams

27688b8

add total stream and conn limit

f5556bf

extensions api

16726f1

add options to NewResourceManager constructor

0c16aa4

fix test

d9e855d

tracing instrumentation

98870b0

implement tracer

1dc0961

fix omit empty decl

83c1399

short circuit write/flush if there are no pending events

8a60c5d

emit start event to trace with the limiter

f51ceca

add limit json config parser

868c93b

add limit config parsing unit test

ddb3988

refactor limit defaults for easy access and user manipulation

575eade

update README

9d5e792

normalize limiter constructors

859d206

sort results in List* api methods

7523ae4

vyzo force-pushed the implementation branch from 9cd14c8 to 7523ae4 Compare January 17, 2022 10:33

marten-seemann approved these changes Jan 17, 2022


vyzo merged commit 617d17d into master Jan 17, 2022

vyzo deleted the implementation branch January 19, 2022 09:42
