Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: easy setup fleets for lpt #3125

Merged
merged 21 commits into from
Oct 25, 2024

Conversation

NagyZoltanPeter
Copy link
Contributor

@NagyZoltanPeter NagyZoltanPeter commented Oct 17, 2024

Description

A greater upgrade for liteprotocoltester.

Changes

  • Added various metrics to follow test performance and failures.
  • Dashboard defined to follow test performance and failure counts.
  • Added ENR support for service and bootstrap peer configuration.
  • Added bootstrap capabilities using peer exchange.
  • If bootstrap node is defined, max 100 nodes will be gathered via PeerExchange.
  • PeerExchange nodes will be tested for connectivity and report created.
  • If further nodes are available (via bootstrap node and PX) in case of failure during test, the service node will be switched to another service node. (derived through internal thresholds).
  • jenkins job defined to build and create and deploy image of liteprotocoltester

Extension

A new repository https://github.com/waku-org/lpt-runner created to ease the usage and run against various waku networks and fleets.

Issue

#2999

@NagyZoltanPeter NagyZoltanPeter force-pushed the chore-easy-setup-fleets-for-lpt branch from b0a1a05 to 460f561 Compare October 17, 2024 10:44
@NagyZoltanPeter NagyZoltanPeter marked this pull request as ready for review October 17, 2024 10:45
Copy link

github-actions bot commented Oct 17, 2024

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:3125

Built from 3d8c20a

Copy link
Contributor

@gabrielmer gabrielmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woooow amazing PR! 🔥 🔥 🔥 Thanks so much!

Left a bunch of random comments but mostly nitpicks.

Really looking forward to deploy the LPT and use it on the fleets! 😍

apps/liteprotocoltester/tester_config.nim Outdated Show resolved Hide resolved

return ok(peerInfo)

randomize()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this call on a global scope or inside a proc? if it's globally, better to put it before every proc definition so it's easier to see

apps/liteprotocoltester/service_peer_management.nim Outdated Show resolved Hide resolved
Comment on lines +102 to +103
elif codec.contains("filter"):
capability = Capabilities.Filter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this elif redundant if capability is already initialized to Capabilities.Filter?

Same in selectRandomCapablePeer

var found = none(RemotePeerInfo)
var okPeers: seq[RemotePeerInfo] = @[]

while found.isNone() and supportivePeers.len > 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't even update found in this proc, it's very similar to selectRandomCapablePeer - not sure if it's better to avoid the code duplication by having all that same logic in a same place

Actually, I think we don't use selectRandomCapablePeer anywhere 😶

Comment on lines +83 to +88
if connOpt.value().isSome():
found = some(randomPeer)
debug "Dialing successful",
peer = constructMultiaddrStr(randomPeer), codec = codec
else:
debug "Dialing failed", peer = constructMultiaddrStr(randomPeer), codec = codec
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we do here

lpt_dialed_peers.inc()

and

lpt_dial_failures.inc()

like in tryCallAllPxPeers?

if conf.bootstrapNode.len > 0:
info "Bootstrapping with PeerExchange to gather random service node"
let futForServiceNode = pxLookupServiceNode(wakuApp.node, conf)
if not (waitFor futForServiceNode.withTimeout(20.minutes)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

waitFor with 20 mins timeout? 😱

Comment on lines 113 to 115
# so we will mount PeerExchange to gather possible service peers, if got some
# we will mount the client protocols afterward.
wakuConf.peerExchange = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand - according to the comment, shouldn't wakuConf.peerExchange be set to true?

Comment on lines +30 to +41
proc `$`*(cap: Capabilities): string =
case cap
of Capabilities.Relay:
return "Relay"
of Capabilities.Store:
return "Store"
of Capabilities.Filter:
return "Filter"
of Capabilities.Lightpush:
return "Lightpush"
of Capabilities.Sync:
return "Sync"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth having this proc in waku/waku_enr/capabilities.nim so it can be used in other places too :)

Copy link
Collaborator

@Ivansete-status Ivansete-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for it! 💯
I just added some nitpick comments that I hope you find useful ;P

### Using bootstrap nodes

There are multiple benefits of using bootstrap nodes. By using them liteprotocoltester will use Peer Exchange protocol to get possible peers from the network that are capable to serve as service peers for testing. Additionally it will test dial them to verify their connectivity - this will be reported in the logs and on dashboard metrics.
Also by using bootstrap node and peer exchange discovery, litprotocoltester will be able to simulate service peer switch in case of failures. There are built in tresholds for service peer failures during test and service peer can be switched during the test. Also these service peer failures are reported, thus extening network reliability measures.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also by using bootstrap node and peer exchange discovery, litprotocoltester will be able to simulate service peer switch in case of failures.

This is super interesting! Maybe in the future we could set a list of service-nodes and then validate that the peer switch works well from a fixed service-node set.


There are built in tresholds for service peer failures during test and service peer can be switched during the test. Also these service peer failures are reported, thus extening network reliability measures

Sorry, but I don't fully get the first part of the previous sentence. Aside, there's a tiny typo in entending

@@ -205,7 +194,18 @@ docker build -t liteprotocoltester:latest -f Dockerfile.liteprotocoltester ../..

# edit and adjust .env file to your needs and for the network configuration

docker run --env-file .env liteprotocoltester:latest RECEIVER <service-node-ip4-peer-address>
docker run --env-file .env liteprotocoltester:latest RECEIVER <service-node-peer-address>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd help to have an example on each command so that an not-experienced reader can learn about the expected address format (that tiny suggestion also applies to the following lines as well.)

apps/liteprotocoltester/filter_subscriber.nim Outdated Show resolved Hide resolved
apps/liteprotocoltester/filter_subscriber.nim Outdated Show resolved Hide resolved
apps/liteprotocoltester/service_peer_management.nim Outdated Show resolved Hide resolved
apps/liteprotocoltester/service_peer_management.nim Outdated Show resolved Hide resolved
waku/node/waku_node.nim Show resolved Hide resolved
@NagyZoltanPeter
Copy link
Contributor Author

Added more detailed examples section about how to use it in different ways and conditions.
For infra deployment metrics server port configurability added.
All review comments addressed.

@NagyZoltanPeter NagyZoltanPeter merged commit 268e7e6 into master Oct 25, 2024
10 of 11 checks passed
@NagyZoltanPeter NagyZoltanPeter deleted the chore-easy-setup-fleets-for-lpt branch October 25, 2024 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants