
Keep Hydra head peerIDs between restarts #128

Closed
aschmahmann opened this issue Jul 21, 2021 · 3 comments

Comments


aschmahmann commented Jul 21, 2021

There doesn't seem to be a good reason for us to rotate our peerIDs (and therefore locations in the Kademlia keyspace) just because we OOM, update the version we're running, etc.

The negative effects of us rotating our keys are:

  1. If you're running a small number of heads, you're effectively making the records previously stored with you useless, since no one will look for them at your new location in the keyspace.
  2. If you're running many heads, you're invalidating a bunch of people's routing tables, which can make clients less efficient. It'll all work itself out over time, but we might as well be nice.

Solution for always reusing the same balanced IDs in the hydra deployment

The ID generator is a pseudorandom (i.e. seeded) algorithm that generates an infinite sequence of mutually-balanced IDs:
ID0, ID1, ID2, ID3, ...

Any one ID in this sequence is uniquely determined by the seed (which determines the sequence) and its index (i.e. sequence number) in the sequence.

Therefore, to ensure that a collection of Hydra heads (1) have mutually-balanced IDs and (2) always reuse the same IDs after a restart, it suffices to parameterize each of them at execution time with the same seed and with its own index, such that each head has a different index in the space of positive integers.
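
For illustration, here is a minimal Go sketch of the determinism property: the i-th identity is fully derived from (seed, index), so a restarted head given the same pair recovers the same peer ID. The deriveID helper and its key-derivation scheme are hypothetical (not the hydra-booster ID generator) and skip the keyspace-balancing step entirely; they only show that seed plus index is enough to pin down an identity.

package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"

	"github.com/libp2p/go-libp2p-core/crypto"
	"github.com/libp2p/go-libp2p-core/peer"
)

// deriveID deterministically derives the index-th peer identity from a shared
// seed. Hypothetical helper: it demonstrates determinism only, without any
// balancing of the resulting IDs in the Kademlia keyspace.
func deriveID(seed []byte, index uint64) (peer.ID, error) {
	// Mix seed and index into 32 bytes of deterministic key material.
	idx := make([]byte, 8)
	binary.BigEndian.PutUint64(idx, index)
	material := sha256.Sum256(append(append([]byte{}, seed...), idx...))

	// Ed25519 key generation reads exactly 32 bytes from the reader, so the
	// same material always yields the same key pair and the same peer ID.
	priv, _, err := crypto.GenerateEd25519Key(bytes.NewReader(material[:]))
	if err != nil {
		return "", err
	}
	return peer.IDFromPrivateKey(priv)
}

func main() {
	a, _ := deriveID([]byte("xyz"), 1)
	b, _ := deriveID([]byte("xyz"), 1)
	fmt.Println(a, a == b) // same seed and index -> same peer ID after a restart
}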

For example, heads can be parameterized as:
id_seed=xyz, id_index=1
id_seed=xyz, id_index=2
id_seed=xyz, id_index=3
...

Note that it is irrelevant which machines or processes the heads run on.
The key requirement is that each head (across the entire fleet) gets a unique index!

Therefore, heads should be parameterized at the infra/deployment level, perhaps using command-line arguments. Restarting a head then guarantees that it reuses the same ID and that the ID is unique across the fleet.
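
As a sketch of that deployment-level parameterization (the flag names here are illustrative, not the actual hydra-booster CLI), each head could be started with the shared seed and its own index and derive its identity with a helper like the deriveID sketched above:

package main

import (
	"flag"
	"fmt"
	"log"
)

func main() {
	// Illustrative flag names; the real hydra-booster options may differ.
	seed := flag.String("id-seed", "", "seed shared by every head in the fleet")
	index := flag.Uint64("id-index", 0, "this head's unique index in the ID sequence")
	flag.Parse()

	if *seed == "" {
		log.Fatal("an id-seed shared across the fleet is required")
	}

	// deriveID is the hypothetical helper from the sketch above (seed + index -> peer ID).
	id, err := deriveID([]byte(*seed), *index)
	if err != nil {
		log.Fatal(err)
	}
	// Every restart with the same flags brings the head back with the same peer ID.
	fmt.Printf("head %d uses peer ID %s\n", *index, id)
}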

Furthermore, this methodology enables easy (auto)scaling: Just assign unused index numbers to heads that are being added. (The space of positive integers is large enough!)

There is no requirement that the indices are consecutive numbers, only that they are unique. This lets ops engineers use different blocks of integers for different scaling purposes. For example, two entirely independent hydra fleets (with no coordination between them) can be deployed: the first fleet uses only even index numbers for its heads, while the second uses only odd ones. This example clearly generalizes in various ways.

Note that this methodology completely alleviates the need for any kind of direct network coordination/connection between heads, making the system considerably more robust!

Progress

A first step in this direction is provided in #130.

@aschmahmann added the effort/days, exp/expert, kind/enhancement, and need/triage labels on Jul 21, 2021

dennis-tra commented Jul 23, 2021

Hi @aschmahmann,

For the past ~2 weeks I have been running my crawler continuously and noticed a few things that could be of interest for this issue.

  1. There is a significant fraction of provider records in the DHT that yield "Peer ID mismatch" errors when the crawler tries to connect. Correct me if I'm wrong: this can only happen if the PeerID was rotated while retaining the same host/port combination (or multi-address in general). In the screenshot below these errors constitute 13% of all connection errors.
    [screenshot: breakdown of connection errors]
    The day before yesterday the ratio was 26%. My data indicates that something was going on (a deployment?) with the hydra boosters around midnight on 22.07, but that's not relevant here I guess.

  2. The list below shows the redacted top IP addresses and their corresponding numbers of distinct peer IDs. So, an IPFS host at the first IP address in that list was online with over 5,000 different peer IDs over the course of ~6 days. Running ipfs swarm peers | grep <ip address> and then ipfs id <PeerID> returned the hydra agent version for the top 8 IPs.

        maddrs     | count
-------------------+-------
 /ip4/138.xx.xx.xx |  5194
 /ip4/138.xx.xx.xx |  4951
 /ip4/165.xx.xx.xx |  4822
 /ip4/138.xx.xx.xx |  4487
 /ip4/138.xx.xx.xx |  4197
 /ip4/138.xx.xx.xx |  4005
 /ip4/165.xx.xx.xx |  3872
 /ip4/165.xx.xx.xx |  3786
 /ip4/138.xx.xx.xx |  3134
 /ip4/138.xx.xx.xx |  2945
 /ip4/138.xx.xx.xx |  1627
 /ip4/138.xx.xx.xx |  1627
 /ip4/159.xx.xx.xx |   223
  3. There is a huge number of undialable peers in the DHT:
    [screenshot: undialable peers in the DHT]
    This could be partly (13–26%) related to the hydra nodes rotating their PeerIDs, as the records will stay in the DHT for up to 24 h.


petar commented Jul 23, 2021

Yep. This confirms our observation that in the past 2 weeks hydras were restarting constantly and therefore rotating their IDs.

@aschmahmann

closed by #130
