Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LoadBalancer for bitswap (and later, more of libp2p) #786

Merged
merged 19 commits into from
Sep 22, 2022

Conversation

hannahhoward
Copy link
Collaborator

@hannahhoward hannahhoward commented Sep 12, 2022

Goals

Implement a simple libp2p load balancer for booster bitswap based on https://pl-strflt.notion.site/Simple-Libp2p-Load-Balancer-a1f1d8c182f74ea5bba4b48ef5a4992d

The design was simplified significantly based on #801

I've also made the job of configuring the load balancer much simpler -- when you start booster-bitswap for the first time, it automatically sets the load balancers configured peer id for bitswap

Implementation

  • add message types as IPLD serializable go types
  • add utility functions for writing various message types to streams
  • implement the load balancer based on the psuedo code in the load balancer design document. Load balancer:
    • takes a set peer config of expected service nodes and the protocols they will listen on
    • when service nodes connect, activates all protocols registered for service node
    • when service nodes disconnect, deactivates all protocols registered for service node
    • handles inbound and outbound forwarding requests
    • peer routing configuration can be updated as the cost of all service nodes disconnecting and reconnected
  • implement a service node based on the psuedo code in the load balancer design document. Service node handles:
    • handling inbound and outbound forwarding requests
  • config for bitswap peer id in boost
    • if this is set, go ahead and list in the available retrieval protocols
  • add a module to start the load balancer within boost
    • only accept routing requests currently from set bitswap peer id
  • add APIs to programmatically get/set the bitswap peer id
    • when set, updates the peer config in the load balancer
  • in booster bitswap -
    • on start up, load or generate a peer key (previous, generated a new one every time)
    • query the boost for the currently set bitswap peer id
    • if peer id is not set, set it to the newly generated peer id
    • if peer id is set, verify it matches booster bitswaps peer id
      • if no match, error out, unless a special override-peer-id CLI option is passed, which tells booster-bitswap to replace the bitswap peer ID in Boost config with the current booster-bitswap peer

I have verified the load balancer works round trip -- I was able to get a bitswap block from boost's main peer ID

@hannahhoward hannahhoward changed the base branch from main to feat/bitswap-server September 12, 2022 08:17
@dirkmc
Copy link
Contributor

dirkmc commented Sep 12, 2022

Thanks for putting this together Hannah.

I have some general questions about the load balancer implementation:

  1. Can we assume that the booster-bitswap node has the same peer ID as the load balancer? If so, then I think it simplifies things.
  2. Can the load balancer register routes as soon as it gets a connection from the booster-bitswap node, and remove the routes when the connection goes down? Then we wouldn't need a lot of the messages. For example we wouldn't need the renew / termination messages and logic.

The way I'd imagine this working:

In boostd config (maybe in a LoadBalancer section) we could have

  • the protocols that the routed host serves
  • the multiaddr of the routed host

When the load balancer gets a connection from the routed host, it starts listening on the protocols in config.
When the load balancer gets an inbound connection over these protocols

  • if there is no connection to the routed host, it does a stream reset
  • if there is a connection to the routed host it just passes data through to the routed host as-is

With respect to the forwarding protocol I think the general idea makes sense 👍
I wonder if we can use the forwarding protocol to just set up the connection, instead of sending messages for every stream? For example bitswap opens a new stream every time it sends a block.

Base automatically changed from feat/bitswap-server to release/lotus1.17.2 September 13, 2022 09:33
registeredPeer, ok := lb.routes[id]
if !ok || p != registeredPeer {
// error if this protocol not registered to this peer
return ErrNotRegistered{p, id}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a bit too harsh for a "remove them all please!"? should we instead terminate any that exist but also return an ErrNotRegistered if any of those didn't exist rather than bail if just one doesn't exist?

loadbalancer/loadbalancer.go Outdated Show resolved Hide resolved
Comment on lines 452 to 179
//ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
//defer cancel()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

were they taking too long or did you forget to uncomment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would appear so


type RoutingResponse struct {
Code ResponseCode (rename "c")
Message String (rename "m") # more info if rejected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe optional too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's probably a case for an inline union here with a success and failure subtype, keyed by code, something like:

type RoutingResponse union {
  | RoutingResponseOk "o"
  | RoutingResponseRejected "r"
} representation inline {
  discriminantKey "c"
}

type RoutingResponseOk struct {
  Message String (rename "m")
}

type RoutingResponseRejected struct {
  Expiry DateTime
}

... having written all of that; I don't think this is supported in bindnode, maybe it's not even in the current Go schema parser! I guess there's a TODO here because the union would give you good validation on the existence of the two different properties.

Copy link
Collaborator Author

@hannahhoward hannahhoward Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that Union types are supported by the parser (https://github.com/ipld/go-ipld-prime/blob/7548eb883bda4712355797547a0628a0ad1c00cb/node/bindnode/infer.go#L221) but the implementation for bindnode in go is pretty rough -- a wrapper struct with a set of members that are all nullable for each potential type in the union. Eek! Not sure what type safety you actually get out of that at least on the go-side, but seems like there are lots of ways to make said struct fail when actually serialized. It is enough that it makes me not want to use it.

type PubKey bytes

type ForwardingRequest struct {
Kind ForwardingKind (rename "k")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another case for a union here; we should look at doing this so we have protocol level enforcement of types

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. At the moment, the bindnode go union type implementation sucks enough that I don't want to deal with it. (I believe currently it's a struct with a bunch of nullable members?)


// open a routing request stream
s, err := sn.Host.NewStream(sn.ctx, sn.balancer, RegisterRoutingProtocolID)
defer s.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be after the err check?

loadbalancer/servicenode.go Outdated Show resolved Hide resolved
loadbalancer/servicenode.go Outdated Show resolved Hide resolved
@hannahhoward hannahhoward force-pushed the feat/libp2p-loadbalancer branch from c2a6247 to a7d2b67 Compare September 16, 2022 00:00
@hannahhoward hannahhoward marked this pull request as ready for review September 16, 2022 00:00
@hannahhoward hannahhoward force-pushed the feat/libp2p-loadbalancer branch 2 times, most recently from 75b75a1 to 395f14f Compare September 16, 2022 00:26
@hannahhoward hannahhoward requested a review from rvagg September 16, 2022 00:43
node/modules/retrieval.go Outdated Show resolved Hide resolved
node/modules/retrieval.go Outdated Show resolved Hide resolved
node/modules/retrieval.go Outdated Show resolved Hide resolved
node/modules/retrieval.go Outdated Show resolved Hide resolved
loadbalancer/errors.go Outdated Show resolved Hide resolved
loadbalancer/loadbalancer.go Outdated Show resolved Hide resolved
loadbalancer/protocols.go Outdated Show resolved Hide resolved
loadbalancer/servicenode.go Outdated Show resolved Hide resolved
loadbalancer/servicenode.go Outdated Show resolved Hide resolved
@@ -125,3 +138,18 @@ func getBoostAPI(ctx context.Context, ai string) (api.Boost, jsonrpc.ClientClose

return api, closer, nil
}

func dataDirPath(ctx *cli.Context) string {
dataDir := ctx.String("data-dir")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "data-dir" flag doesn't seem to be defined in the run command.
I'd suggest using cmd.FlagRepo for consistency with the boost client.

@dirkmc
Copy link
Contributor

dirkmc commented Sep 16, 2022

If I understand correctly, the load balancer

  • starts listening on bitswap protocols when the bitswap node connects to it
  • stops listening on bitswap protocols when the bitswap node disconnects from it

From a client's perspective, whenever they open a stream to the load balancer over bitswap they will get a "protocol not supported" error. To a client this looks like a more long-term state: "This server doesn't support bitswap".

Would it make more sense for the client to get a stream reset instead? To a client that would look like a short term error state.

@dirkmc
Copy link
Contributor

dirkmc commented Sep 16, 2022

With respect to the mechanism of booster-bitswap automatically registering its peer ID with the load balancer, it may cause issues to close and restart the load balancer after it's already been started.

I'd suggest instead that we add a booster-bitswap init command (similar to boost init) that creates a peer ID. Then someone who wants to add a booster-bitswap node on their SP would:

  1. Run booster-bitswap init to create a peer ID. If it has already been inited, it just outputs the already created peer ID.
  2. Add the bitswap peer ID to boostd's config.toml
  3. Restart boostd
  4. Start booster-bitswap

@hannahhoward
Copy link
Collaborator Author

re: booster-bitswap init

It seems like it would be nice for it to use the API call to copy the value directly. What's stored on disk is the private key, not exactly obvious how to direct copy it. But I agree that it's reasonable to expect a restart rather than hot reload.

@hannahhoward
Copy link
Collaborator Author

I've updated the PR so that:

  • configured routes where the service node is not connected return stream reset, instead of protocol not supported
  • the peer id can be configured through the booster-bitswap init command
  • no longer attempting to support hot reloading.

@hannahhoward hannahhoward force-pushed the feat/libp2p-loadbalancer branch from 955f824 to 4e30068 Compare September 20, 2022 01:01
Comment on lines 30 to 26
if cfgDir == "" {
return nil, fmt.Errorf("dataDir must be set")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that the repo flag has a default value, I don't think we need this check

log.Infow("get/set peer id of bitswap from boost", "local", selfPid.String(), "boost", existingPid.String(), "boost not set", peerIDNotSet, "override", overrideExistingPeerID)
// error if a peer id is set that is different and we aren't overriding
if !peerIDNotSet && !matchesPid && !overrideExistingPeerID {
return nil, errors.New("bitswap peer id does not match boost node configuration. use --override-peer-id to force a change")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the --override-peer-id flag is no longer defined

@dirkmc
Copy link
Contributor

dirkmc commented Sep 20, 2022

This is looking really good 👍

What's stored on disk is the private key, not exactly obvious how to direct copy it.

I don't think you need to directly copy the private key. I'd suggest that the booster-bitswap init command outputs the peer ID (which is derived from the public key). Then the user can manually set the peer ID on the boostd node.

My concern is that if there is a booster-bitswap process on a diffferent node that can silently update a config value on the boostd node that might lead to confusing situations for the user. I think it's clearer for them to explicitly update the config values manually. At the moment it's a little tricky to follow the logic in the configureRepo method. The suggested change should simplify all that.

I'd suggest that

  • booster-bitswap init creates a new peer ID, or prints out the existing peer ID if one was already created (take a look at boost init that does something similar)
  • booster-bitswap run fails if booster-bitswap init has not yet been run

@dirkmc dirkmc mentioned this pull request Sep 20, 2022
8 tasks
hannahhoward and others added 18 commits September 21, 2022 15:00
implementation of the load balancer node itself
implements code for running a service node
remove the routing protocol, instead relying on a set config. also remove forwarding response for
inbound requests
removed pub keys to minimize network traffic, added api's to configure and update bitswap peer id,
added auto config of bitswap peer id in booster-bitswap
renames, reconfigured architecture, etc
@hannahhoward hannahhoward force-pushed the feat/libp2p-loadbalancer branch from 4e30068 to 1ae395e Compare September 21, 2022 22:03
@hannahhoward
Copy link
Collaborator Author

@dirkmc changes made and hopefully ready to go!

@dirkmc dirkmc merged commit a58ea5a into release/lotus1.17.2 Sep 22, 2022
@dirkmc dirkmc deleted the feat/libp2p-loadbalancer branch September 22, 2022 07:47

func configureRepo(ctx context.Context, cfgDir string, createIfNotExist bool) (peer.ID, crypto.PrivKey, error) {
if cfgDir == "" {
return "", nil, fmt.Errorf("dataDir must be set")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest changing this error message to `FlagRepo.name + " is a required flag"

repoDir := cctx.String(FlagRepo.Name)
host, err := setupHost(ctx, repoDir, port)
if err != nil {
return fmt.Errorf("setting up libp2p host: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the peer key is not found, I think this will output an error like "setting up libp2p host: libp2p.key not found"
I'd suggest we show something more informative, eg

if os.IsNotExist(err) {
	return nil, fmt.Errorf("booster-bitswap has not been initialized. Run the booster-bitswap init command.")
}

@dirkmc
Copy link
Contributor

dirkmc commented Sep 22, 2022

Nice work! 🙌

I merged the PR, and left a couple of small UI things to address in a follow-up PR.

@@ -206,6 +206,8 @@ type DealmakingConfig struct {
// The time that can elapse before a download is considered stalled (and
// another concurrent download is allowed to start).
HttpTransferStallTimeout Duration

BitswapPeerID string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a detailed comment about what this config value is. The comment ends up being copied into the config file itself, so it should be easy for users to understand what the value is and know what to set it to.

LexLuthr pushed a commit that referenced this pull request Oct 4, 2022
* booster bitswap MVP executable (#707)

* feat(booster-bitswap): booster bitswap MVP untested

* refactor(booster-bitswap): use API for fetching blocks

* fix(deps): update deps to compile

* feat(booster-bitswap): makefile & fixes

add commands to build booster-bitswap, and very a round tripped successful fetch from
booster-bitswap

* refactor: clean up unused vars etc

* fix: booster-bitsawp - check error when creating libp2p key

* refactor(node): avoid FreeAndUnsealed method

Co-authored-by: Dirk McCormick <[email protected]>
Co-authored-by: Anton Evangelatov <[email protected]>

* booster-bitswap devnet and tracing (#796)

* return ipld ErrNotFound from remote blockstore interface (#798)

* fix: return ipld ErrNotFound from remote blockstore interface

* test: add more tests for ipld ErrNotFound

* test: comment out part of TestDummydealOnline that is flaky due to a bug in latest lotus (#802)

* fix normaliseError nil ptr dereference (#803)

* feat: shard selector (#807)

* LoadBalancer for bitswap (and later, more of libp2p) (#786)

* feat(loadbalancer): add message types

* feat(messages): add utility functions

* feat(loadbalancer): initial load balancer impl

implementation of the load balancer node itself

* feat(loadbalancer): add service node

implements code for running a service node

* feat(loadbalancer): integrate into boost and booster-bitswap

* Update loadbalancer/loadbalancer.go

Co-authored-by: Rod Vagg <[email protected]>

* Update loadbalancer/servicenode.go

Co-authored-by: Rod Vagg <[email protected]>

* Update loadbalancer/servicenode.go

Co-authored-by: Rod Vagg <[email protected]>

* Update loadbalancer/messages/messages.ipldsch

Co-authored-by: Rod Vagg <[email protected]>

* Update loadbalancer/messages/messages.ipldsch

Co-authored-by: Rod Vagg <[email protected]>

* refactor(loadbalancer): remove routing protocol

remove the routing protocol, instead relying on a set config. also remove forwarding response for
inbound requests

* fix(loadbalancer): update tests

* refactor(loadbalancer): integrate simplified load balancer

removed pub keys to minimize network traffic, added api's to configure and update bitswap peer id,
added auto config of bitswap peer id in booster-bitswap

* docs(gen): regenerate api docs

* chore(lint): fix lint errors

* fix(loadbalancer): minor bridgestream fix

* Update loadbalancer/servicenode.go

Co-authored-by: dirkmc <[email protected]>

* refactor(protocolproxy): address PR comments

renames, reconfigured architecture, etc

* refactor(make init print out peer id): remove apis and transparent peer id setting. have init print

Co-authored-by: Rod Vagg <[email protected]>
Co-authored-by: dirkmc <[email protected]>

* Add block filter via BadBits (#825)

* feat(booster-bitswap): add block filter via BadBits

* refactor(booster-bitswap): use bitswap blockfilter for filtering

* feat(blockfilter): only update when list is modified

* feat(blockFilter): add on disk caching

* Update cmd/booster-bitswap/blockfilter/blockfilter.go

Co-authored-by: dirkmc <[email protected]>

* fix(blockfilter): minor PR fixups

Co-authored-by: dirkmc <[email protected]>

* Libp2p 0.22 upgrade (#837)

* chore(deps): upgrade to Lotus RC & libp2p v0.22

* chore(deps): update go to 1.18

* ci(circle): update circle to go 1.18

* style(imports): fix imports

* fix(build): update ffi

* fix(lint): fix deprecated strings.Title method

* fix(mod): mod tidy

* Protocol Proxy cleanup (#836)

* refactor(booster-bitswap): minor UI fixes for booster-bitswap UI

* Update cmd/booster-bitswap/init.go

Co-authored-by: dirkmc <[email protected]>

Co-authored-by: dirkmc <[email protected]>

* feat: update to dagstore v0.5.5 (#849)

* add booster-bitswap to devnet (#866)

* bump lotus-test version

* add docker/booster-bitswap target in Makefile

Co-authored-by: Hannah Howard <[email protected]>
Co-authored-by: Dirk McCormick <[email protected]>
Co-authored-by: Rod Vagg <[email protected]>
dirkmc added a commit that referenced this pull request Oct 4, 2022
* booster bitswap MVP executable (#707)

* feat(booster-bitswap): booster bitswap MVP untested

* refactor(booster-bitswap): use API for fetching blocks

* fix(deps): update deps to compile

* feat(booster-bitswap): makefile & fixes

add commands to build booster-bitswap, and very a round tripped successful fetch from
booster-bitswap

* refactor: clean up unused vars etc

* fix: booster-bitsawp - check error when creating libp2p key

* refactor(node): avoid FreeAndUnsealed method

Co-authored-by: Dirk McCormick <[email protected]>
Co-authored-by: Anton Evangelatov <[email protected]>

* booster-bitswap devnet and tracing (#796)

* return ipld ErrNotFound from remote blockstore interface (#798)

* fix: return ipld ErrNotFound from remote blockstore interface

* test: add more tests for ipld ErrNotFound

* test: comment out part of TestDummydealOnline that is flaky due to a bug in latest lotus (#802)

* fix normaliseError nil ptr dereference (#803)

* feat: shard selector (#807)

* LoadBalancer for bitswap (and later, more of libp2p) (#786)

* feat(loadbalancer): add message types

* feat(messages): add utility functions

* feat(loadbalancer): initial load balancer impl

implementation of the load balancer node itself

* feat(loadbalancer): add service node

implements code for running a service node

* feat(loadbalancer): integrate into boost and booster-bitswap

* Update loadbalancer/loadbalancer.go

Co-authored-by: Rod Vagg <[email protected]>

* Update loadbalancer/servicenode.go

Co-authored-by: Rod Vagg <[email protected]>

* Update loadbalancer/servicenode.go

Co-authored-by: Rod Vagg <[email protected]>

* Update loadbalancer/messages/messages.ipldsch

Co-authored-by: Rod Vagg <[email protected]>

* Update loadbalancer/messages/messages.ipldsch

Co-authored-by: Rod Vagg <[email protected]>

* refactor(loadbalancer): remove routing protocol

remove the routing protocol, instead relying on a set config. also remove forwarding response for
inbound requests

* fix(loadbalancer): update tests

* refactor(loadbalancer): integrate simplified load balancer

removed pub keys to minimize network traffic, added api's to configure and update bitswap peer id,
added auto config of bitswap peer id in booster-bitswap

* docs(gen): regenerate api docs

* chore(lint): fix lint errors

* fix(loadbalancer): minor bridgestream fix

* Update loadbalancer/servicenode.go

Co-authored-by: dirkmc <[email protected]>

* refactor(protocolproxy): address PR comments

renames, reconfigured architecture, etc

* refactor(make init print out peer id): remove apis and transparent peer id setting. have init print

Co-authored-by: Rod Vagg <[email protected]>
Co-authored-by: dirkmc <[email protected]>

* Add block filter via BadBits (#825)

* feat(booster-bitswap): add block filter via BadBits

* refactor(booster-bitswap): use bitswap blockfilter for filtering

* feat(blockfilter): only update when list is modified

* feat(blockFilter): add on disk caching

* Update cmd/booster-bitswap/blockfilter/blockfilter.go

Co-authored-by: dirkmc <[email protected]>

* fix(blockfilter): minor PR fixups

Co-authored-by: dirkmc <[email protected]>

* Libp2p 0.22 upgrade (#837)

* chore(deps): upgrade to Lotus RC & libp2p v0.22

* chore(deps): update go to 1.18

* ci(circle): update circle to go 1.18

* style(imports): fix imports

* fix(build): update ffi

* fix(lint): fix deprecated strings.Title method

* fix(mod): mod tidy

* Protocol Proxy cleanup (#836)

* refactor(booster-bitswap): minor UI fixes for booster-bitswap UI

* Update cmd/booster-bitswap/init.go

Co-authored-by: dirkmc <[email protected]>

Co-authored-by: dirkmc <[email protected]>

* feat: update to dagstore v0.5.5 (#849)

* feat: bitswap client

* feat: bitswap client - output car file

* refactor: bitswap client - remove tracing

* feat: debug logs

* fix: write blocks to blockstore

* fix: duration output

* fix: duration output for block received

* feat: add pprof to bitswap client

* feat: protocol proxy logging

* feat: bitswap client - check host supports bitswap protocol

* feat: listen for bitswap requests locally as well as through forwarding protocol

Co-authored-by: Hannah Howard <[email protected]>
Co-authored-by: Anton Evangelatov <[email protected]>
Co-authored-by: Rod Vagg <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants