
Broadcast external IP from NAT in enode record #10274

Merged: 18 commits, Oct 10, 2024

Conversation

@cody-wang-cb (Contributor) commented Aug 12, 2024

In our setup, we run some op-reth nodes with discovery disabled, inside docker containers that don't use the host network, and we peer the nodes manually. By default, with discovery disabled, reth uses 127.0.0.1 as the enode IP, but we want the enode URL to use the host IP so that other internal nodes can still connect to it.

We tried --addr, but it didn't work. It seems that's because it makes the server listen on that IP directly, which we can't do since we're running in a non-host-network docker container; we'd prefer to just broadcast the host IP rather than bind to it.

In geth, --nat extip:<IP> has worked for us. We tried the equivalent --nat option in reth, which didn't work; looking at the code, the external IP only appears to be applied in discv4, and we have discovery turned off.

So this PR proposes to support the scenario by passing the --nat info into the NetworkHandle, so that when local_node_record() is called it still tries to fetch the external IP, if one is configured, even in non-discovery mode, rather than just using the local IP for the enode.

Also, based on review suggestions, NAT resolution is now disabled when --disable-discovery is set, and a --disable-nat flag was added.
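To illustrate the idea, here is a minimal sketch (not the exact diff in this PR) of the intended local_node_record() behavior on NetworkHandle; the external_ip() helper and the NodeRecord::new call are assumptions made for illustration:

// Sketch only: prefer a NAT-provided external IP over the wildcard/local address.
fn local_node_record(&self) -> NodeRecord {
    let id = *self.peer_id();
    let mut socket_addr = *self.inner.listener_address.lock();

    if socket_addr.ip().is_unspecified() {
        if let Some(external_ip) = self.inner.nat.and_then(|nat| nat.external_ip()) {
            // hypothetical helper: returns the statically configured external IP, if any
            socket_addr.set_ip(external_ip);
        } else {
            // previous behavior: fall back to localhost when nothing else is known
            socket_addr.set_ip(std::net::IpAddr::V4(std::net::Ipv4Addr::LOCALHOST));
        }
    }

    NodeRecord::new(socket_addr, id)
}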

@cody-wang-cb marked this pull request as draft August 12, 2024 20:08
@@ -213,7 +215,17 @@ impl PeersInfo for NetworkHandle {
        let id = *self.peer_id();
        let mut socket_addr = *self.inner.listener_address.lock();

        if socket_addr.ip().is_unspecified() {
            let external_ip: Option<IpAddr> = self.inner.nat.and_then(|nat| {
Review comment (Contributor):

This can just use self.inner.nat.external_addr() instead of only supporting the ExternalIp variant.

@@ -81,6 +81,8 @@ pub struct NetworkConfig<C> {
    pub tx_gossip_disabled: bool,
    /// How to instantiate transactions manager.
    pub transactions_manager_config: TransactionsManagerConfig,
    /// The NAT resolver for external IP
    pub nat: Option<NatResolver>
Review comment (Contributor):

It may be worth calling this external_ip_resolver instead of nat for clarity

@emhane (Member) left a comment:

I think this is a valid issue, adding NatResolver as a separate discovery service makes sense. Probably Discovery type is a more appropriate place for it than NetworkInner

/// An abstraction over the configured discovery protocol.
///
/// Listens for new discovered nodes and emits events for discovered nodes and their
/// address.
#[derive(Debug)]
pub struct Discovery {
/// All nodes discovered via discovery protocol.
///
/// These nodes can be ephemeral and are updated via the discovery protocol.
discovered_nodes: LruMap<PeerId, PeerAddr>,
/// Local ENR of the discovery v4 service (discv5 ENR has same [`PeerId`]).
local_enr: NodeRecord,
/// Handler to interact with the Discovery v4 service
discv4: Option<Discv4>,
/// All KAD table updates from the discv4 service.
discv4_updates: Option<ReceiverStream<DiscoveryUpdate>>,
/// The handle to the spawned discv4 service
_discv4_service: Option<JoinHandle<()>>,
/// Handler to interact with the Discovery v5 service
discv5: Option<Discv5>,
/// All KAD table updates from the discv5 service.
discv5_updates: Option<ReceiverStream<discv5::Event>>,
/// Handler to interact with the DNS discovery service
_dns_discovery: Option<DnsDiscoveryHandle>,
/// Updates from the DNS discovery service.
dns_discovery_updates: Option<ReceiverStream<DnsNodeRecordUpdate>>,
/// The handle to the spawned DNS discovery service
_dns_disc_service: Option<JoinHandle<()>>,
/// Events buffered until polled.
queued_events: VecDeque<DiscoveryEvent>,
/// List of listeners subscribed to discovery events.
discovery_listeners: Vec<mpsc::UnboundedSender<DiscoveryEvent>>,
}

nit: would have been beneficial to open this as an issue first, to get feedback on the solution, before implementing

@cody-wang-cb (Contributor Author):

> I think this is a valid issue, adding NatResolver as a separate discovery service makes sense. Probably Discovery type is a more appropriate place for it than NetworkInner
>
> [Discovery struct definition quoted above]
> nit: would have been beneficial to open this as an issue first, to get feedback on the solution, before implementing

@emhane Yeah, sorry, I felt the change was pretty small here, so I thought opening a PR would be ok too. I can still open an issue if that's preferred.

Let me make a change to add nat to pub struct Discovery and test it out.

@cody-wang-cb (Contributor Author):

@emhane I'm actually a bit confused about what the ideal implementation should be, so I wanted some more clarification.
Looks like the enode is only handled in local_node_record(), which is defined on NetworkHandle, so it seems I still need to pass the NAT resolver into NetworkHandle/NetworkInner regardless, just like how discv4 is still passed into NetworkHandle to get the enode record from it.

@mattsse (Collaborator) left a comment:

That's a valid point. Initially --natext was only intended for discovery, but I agree there should be an option to set this even if discovery is turned off.

I need to check how this flag usually behaves, but I assume we'd integrate this on the NetworkManager. I need to think about this a bit, because I think we can then also get rid of the optional discv4 handle that we use to access the external or provided IP.

@emhane (Member) commented Aug 13, 2024

Moving NatResolver into Discovery would require making a service for NatResolver. As you point out, it is fine to leave NatResolver in NetworkInner. What we should still do, though, is treat the standalone NatResolver as a discovery service: if the flag --disable-discovery is passed, the Option<NatResolver> in NetworkInner should also be None. Then, for your use case, you would pass the existing flags --disable-discv4-discovery and --disable-discv5-discovery, and possibly also --disable-dns-discovery.
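For illustration, a hypothetical invocation under that scheme (the three disable flags are the ones named above; the extip value syntax for --nat is an assumption here):

$ op-reth node \
    --disable-discv4-discovery --disable-discv5-discovery --disable-dns-discovery \
    --nat extip:10.0.0.5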

@cody-wang-cb (Contributor Author):

I see, so if I understand correctly, the ideal functionality would be:

  • --disable-discovery should disable all discovery services, including NatResolver
  • To enable NatResolver only, without the other kinds of discovery, use the other flags to disable individual discovery services instead of --disable-discovery
  • NatResolver can still be passed into NetworkInner like in the current implementation, to fetch the external IP
  • Perhaps clean up the code logic so that the external IP only comes from NatResolver rather than from discv4

Does my understanding make sense? @emhane @mattsse

One thing I'm still a bit confused about: what exactly does it mean to make a service for NatResolver? Or, as long as it's added to Discovery and is disabled when --disable-discovery is set, does that make it a discovery service?

@emhane (Member) commented Aug 15, 2024

> I see, so if I understand correctly, the ideal functionality would be:
>
>   • --disable-discovery should disable all discovery services, including NatResolver
>   • To enable NatResolver only, without the other kinds of discovery, use the other flags to disable individual discovery services instead of --disable-discovery
>   • NatResolver can still be passed into NetworkInner like in the current implementation, to fetch the external IP

Yes to all of the above.

>   • Perhaps clean up the code logic so that the external IP only comes from NatResolver rather than from discv4

Open an issue for this, but skip it in this PR.

> Does my understanding make sense? @emhane @mattsse
>
> One thing I'm still a bit confused about: what exactly does it mean to make a service for NatResolver? Or, as long as it's added to Discovery and is disabled when --disable-discovery is set, does that make it a discovery service?

Making a service for NatResolver means adding new types NatResolverCommand and NatResolverService. NatResolverService would listen for NatResolverCommands from the network manager over a channel and invoke NatResolver accordingly; this allows running the NatResolver asynchronously. Nonetheless, don't do this for this PR; keep NatResolver in NetworkInner as you mentioned.
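For concreteness, a rough sketch of what such a command/service pair could look like (explicitly not part of this PR; the type names come from the comment above, everything else is an assumption):

use std::net::IpAddr;
use tokio::sync::{mpsc, oneshot};

/// Commands the network manager could send to the resolver service.
enum NatResolverCommand {
    /// Ask for the current external address; the reply goes back over the oneshot channel.
    ExternalAddr(oneshot::Sender<Option<IpAddr>>),
}

/// Long-running task that owns the NatResolver and answers commands asynchronously.
struct NatResolverService {
    resolver: NatResolver,
    commands: mpsc::UnboundedReceiver<NatResolverCommand>,
}

impl NatResolverService {
    async fn run(mut self) {
        while let Some(cmd) = self.commands.recv().await {
            match cmd {
                NatResolverCommand::ExternalAddr(tx) => {
                    // Assumes an async external_addr() resolution step on NatResolver.
                    let addr = self.resolver.external_addr().await;
                    let _ = tx.send(addr);
                }
            }
        }
    }
}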

@emhane self-assigned this Aug 28, 2024
@emhane marked this pull request as ready for review August 31, 2024 16:39
@emhane requested review from onbjerg and gakonst as code owners August 31, 2024 16:39
@emhane requested a review from mattsse August 31, 2024 20:47
@emhane (Member) commented Aug 31, 2024

  • Separated the semantics of discovery and local IP address resolution after all. IP address resolution is done regardless of whether discovery is disabled. Usually discovery is disabled to isolate the node, i.e. not make outgoing connections. The way docker IP resolution is implemented now, it doesn't require any network traffic.

  • Adds a flag --docker-if <str> which takes the name of the docker interface to use, or defaults to eth0 if an empty string is passed. Did it this way so it's possible to run with a custom docker network setup from the CLI (see the sketch after this list).

  • Reverts changes to the network config and manager. The motivation is that it's ok if the program exits already in the CLI, for example if the default interface name isn't found.

  • Doesn't work for Windows: docker IP resolution is added to its own module which doesn't compile for Windows, and the docker flag won't be compiled for Windows either.
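A minimal sketch of that kind of interface-based IP resolution (using the if_addrs crate purely for illustration; the module added in this PR may do this differently, and as noted above it isn't compiled on Windows):

use std::net::IpAddr;

/// Return the first IP address assigned to the named interface (e.g. "eth0")
/// inside the container, if such an interface exists.
fn docker_interface_ip(if_name: &str) -> Option<IpAddr> {
    if_addrs::get_if_addrs()
        .ok()?
        .into_iter()
        .find(|iface| iface.name == if_name)
        .map(|iface| iface.ip())
}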

@mattsse (Collaborator) left a comment:

This is now very different from what was originally proposed in 27da136, and I don't see how this supports setting an external IP via --nat:extip now.

Can we please roll this back to the approach in 27da136 and use the NatResolver?

We could maybe add this docker resolver as an additional variant separately.

book/cli/reth/debug/execution.md (outdated review thread, resolved)
@cody-wang-cb (Contributor Author) commented Sep 3, 2024

Hey @emhane @mattsse, I was merging main into the branch in my forked repo and just realized that it updated/reverted this PR too.
Sorry, I haven't had a chance to keep looking into this since I last worked on it; I got occupied with other tasks. I can continue working on this over the next two weeks and try to get it merged.
But to confirm: should I continue based on the discussions from a couple of weeks back, or are there new proposals (it looks like @emhane has been adding new changes)?
cc: @danyalprout

@emhane (Member) commented Sep 3, 2024

@cody-wang-cb can you try out the solution I made and see if it works for your setup?

@mattsse there is a test for the docker IP resolution which passes in CI, so that means it works in docker.

@mattsse (Collaborator) commented Sep 3, 2024

> But to confirm: should I continue based on the discussions from a couple of weeks back, or are there new proposals?

Imo the earlier approach makes the most sense here: set a provided --nat:extip as the local node record. That was already very close.

> there is a test for the docker IP resolution

But this isn't quite the same feature as setting a given extip? So I'd like to add support for this based on 27da136.

@@ -467,6 +479,12 @@ impl NetworkConfigBuilder {
        self.build(Default::default())
    }

    /// Sets the NAT resolver for external IP.
    pub const fn add_nat(mut self, nat: Option<NatResolver>) -> Self {
Review comment (Contributor Author):

Added this mainly so that the unit test can be written more conveniently; would love to know if there's a better way.
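For context, a rough sketch of how a test might wire this up (heavily simplified; apart from add_nat and the ExternalIp variant mentioned elsewhere in this thread, the builder calls, payload type, and bindings here are assumptions):

// Sketch: configure the network with a statically resolved external IP, so the
// locally advertised node record should carry 10.1.1.1 instead of 127.0.0.1.
// `secret_key` and `client` are assumed test fixtures.
let config = NetworkConfigBuilder::new(secret_key)
    .add_nat(Some(NatResolver::ExternalIp("10.1.1.1".parse().unwrap())))
    .build(client);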

@emhane (Member) left a comment:

Can you please explain in more detail the scenario in which you will use this? If the docker container can't connect to the Internet anyway, why would you want its ENR to have the host's IP?

@cody-wang-cb (Contributor Author):

@emhane the reason is that we don't want these nodes connected to the Internet directly, but we do want to manually peer them with other internal nodes, so exposing their host IPs (which are private IPs, 10.xx.xx.xx) is necessary.

@emhane (Member) commented Sep 17, 2024

Thanks for the explanation @cody-wang-cb. How about using the provided solution with --net-if and setting up one container with a static IP to use as a boot node? This will also result in all the nodes on the 172.17.0.X network discovering each other if you enable discovery. I'm not sure the RLPx protocol can support this custom use where all nodes in the network have the same IP address + port. Alternatively, you can set up all nodes in the network with static IP addresses and add them all to each other.

@cody-wang-cb (Contributor Author):

> This will also result in all the nodes on the 172.17.0.X network discovering each other if you enable discovery.

@emhane I don't know networking very well, but I don't think this works across hosts? I think the 172.x.x.x IPs are isolated per host. This is what I tested using your PR, and it didn't work (the reth nodes couldn't peer with each other via 172.x.x.x).
Our current setup runs op-reth on separate hosts and peers each op-reth container via the host address + p2p port:

host 1 <host 1 private ip> <=== we want to advertise this IP, so it's reachable by the others
- docker network
   - reth <ip> (listening on a host port)
   - ...other containers
host 2
- docker network
   - reth <ip> (listening on a host port)
   - ...other containers
host 3
- docker network
   - reth <ip> (listening on a host port)
   - ...other containers

cc: @danyalprout

Also, I think it makes more sense for us to focus on whether exposing an external IP via NAT in non-discovery mode is a valid use case. This is a feature geth already supports, so I'm not sure why reth shouldn't have it.
(It looks like geth sets the IP before it checks the discovery mode:
https://github.com/ethereum-optimism/op-geth/blob/optimism/p2p/server_nat.go#L64)

@emhane (Member) commented Sep 17, 2024

> @emhane I don't know networking very well, but I don't think this works across hosts? I think the 172.x.x.x IPs are isolated per host. This is what I tested using your PR, and it didn't work (the reth nodes couldn't peer with each other via 172.x.x.x).

I think this is because you didn't have one container with a static IP as a boot node. Usually docker assigns IPs sequentially: 172.17.0.2 for the first one you start, 172.17.0.3 for the next one, and so on. So if you start 5 containers with default settings, you know their IP addresses will be 172.17.0.2, 172.17.0.3, 172.17.0.4, 172.17.0.5 and 172.17.0.6. You can see it by running some simple linux containers in interactive mode and querying for their IP address: run these commands in a few shells at the same time and read the IP address of the eth0 interface.

$ docker run -it --rm alpine /bin/sh
/ # ifconfig

@ziyincody:

> [emhane's reply above about sequential docker IP assignment, quoted]

But this is for a single host? I don't think we want to run multiple reth containers on a single host.
If you mean having one container act as the boot node for another reth container on the host, wouldn't that boot node container still need to expose its external IP for other hosts to connect to?

@emhane (Member) commented Sep 18, 2024

Why do you specifically want to avoid the host network driver? It would be best suited for your setup.

@cody-wang-cb (Contributor Author):

@emhane In theory we could run it in host mode, but that would require a bunch of other changes, like fixing port collisions. We've been running our chain infra with geth in non-host mode since the beginning, and I don't think it makes sense for us to change our infra setup when the change in this PR is a very straightforward way to make it work.

@emhane (Member) commented Sep 18, 2024

> @emhane In theory we could run it in host mode, but that would require a bunch of other changes, like fixing port collisions. [...]

Sure, we are open to making changes to cater for your specific setup :) Regarding the implementation: you want to manually pass the host's LAN IP to the node running in docker, if I understood it right. Pending requested changes from @mattsse.

@cody-wang-cb (Contributor Author):

> Regarding the implementation: you want to manually pass the host's LAN IP to the node running in docker, if I understood it right. Pending requested changes from @mattsse.

Yes, that's correct. Would be great if y'all could help review this; hoping to iterate on it and have it merged by end of week if possible :) so that we don't have to keep using my forked repo.

@cody-wang-cb (Contributor Author):

@mattsse @emhane curious when we could merge this PR given it's already approved?

@emhane (Member) commented Sep 23, 2024

I'd like @mattsse to check it out too; I think he has a better idea of what we are including and why.

@cody-wang-cb (Contributor Author):

@mattsse mind taking a look at this PR when you have a chance?

@cody-wang-cb (Contributor Author):

Wanted to bump this PR again; would love to have it included in the next release so we don't need to keep using a forked commit. @mattsse

@mattsse (Collaborator) left a comment:

lgtm, sorry I overlooked this

@mattsse enabled auto-merge October 10, 2024 11:04
@mattsse added this pull request to the merge queue Oct 10, 2024
Merged via the queue into paradigmxyz:main with commit 47c3a48 Oct 10, 2024
37 checks passed
vandenbogart pushed a commit to testmachine-ai/reth that referenced this pull request Oct 14, 2024
reymom pushed a commit to reymom/reth that referenced this pull request Oct 15, 2024