core/src/peer_id: Support and output peer IDs as CIDs (RFC 0001) #2259

mxinden · 2021-10-01T19:01:22Z

rust-libp2p should implement RFC 0001.

Also tackle:

rust-libp2p/protocols/kad/src/protocol.rs

Lines 102 to 104 in 9f331e5

    
           // TODO: this is in fact a CID; not sure if this should be handled in `from_bytes` or 
        
           //       as a special case here 
        
           let node_id = PeerId::from_bytes(&peer.id).map_err(|_| invalid_data("invalid peer id"))?;

JerryHue · 2021-10-05T23:35:42Z

@mxinden, I have spent a few hours reading over the several documents related to this.

Would it be okay for me to work on this issue with some guidance?

mxinden · 2021-10-06T16:44:47Z

Hi @JerryHue,

Would it be okay for me to work on this issue

Yes for sure. Your help would be very much appreciated.

with some guidance?

👍 I don't know how familiar you are with the stack. Feel free to post questions here or via mail. Note that there is a CID implementation in Rust.

JerryHue · 2021-10-20T02:58:38Z

Hello again @mxinden!

Despite my absence, I haven't forgotten about this issue!

I believe I have implemented the changes as per the RFC stated, but I would like to make sure that I have not forgotten anything before I make my PR.

I would like to ask certain questions that have passed through my mind while working on this.

Previous public methods

Encoding to the string representation

As per the RFC requirements, the PeerId should still support the base58btc representation when printing it. This means that the Debug and Display traits use the base58 representation.

To encode the PeerId to the base32 representation, one may use to_base32.

Decoding from the string representation

This one was tougher to separate, since we can only implement the FromStr once. At the end, I just had them in the same function. So, that means we can use the same function to parse the following strings:

bafzbeie5745rpv2m6tjyuugywy4d5ewrqgqqhfnf445he3omzpjbx5xqxe,
QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N,
12D3KooWD3eckifWpRn9wQpMG9R9hX3sD158z7EqHWmweQAJU5SA.

I believe this fits the criteria of the RFC the best, and causes the least friction when upgrading.

`to_bytes` and `from_bytes`

I am not sure how to handle the to_bytes and the from_bytes functions. Right now, there are two versions of each: one version to handle the multihash representation (the original to_bytes and from_bytes), and another version to handle the cid representation.

I believe that the from_bytes function should behave very similarly to the implementation of the FromStr trait, but I am not sure if I should leave the to_bytes as two separate functions (to_bytes and to_bytes_cid), or having a single one that represents the raw byte representation of the CID. If I leave them separate, we would need to choose one of them as the default representation for the trait From<PeerId> for Vec<u8> to work.

This question is also related to the following snippet:

rust-libp2p/protocols/kad/src/protocol.rs

Lines 102 to 104 in 9f331e5

    
           // TODO: this is in fact a CID; not sure if this should be handled in `from_bytes` or 
        
           //       as a special case here 
        
           let node_id = PeerId::from_bytes(&peer.id).map_err(|_| invalid_data("invalid peer id"))?;

The comment associated is confusing me a little. What does it mean by "handling as a special case"? Isn't the new representation of the PeerId a CID? If so, why handle it as a special case?

Tests

Since this is a substantial PR, I would need to write some tests. Could you recommend me some behaviour to test whether it is working correctly? I am aware I need to implement the tests that confirm the behaviour stated in the RFC, but I am not sure if I should test other external places.

mxinden · 2021-10-26T19:26:51Z

Sorry for the late reply here. I was sick for the last week.

Encoding to the string representation

As per the RFC requirements, the PeerId should still support the base58btc representation when printing it. This means that the Debug and Display traits use the base58 representation.

To encode the PeerId to the base32 representation, one may use to_base32.

👍 sounds good to me. In the future we would switch the Debug and Display implementations to use to_base_32, though that would happen in a coordinated fashion across the implementations.

Decoding from the string representation

This one was tougher to separate, since we can only implement the FromStr once. At the end, I just had them in the same function. So, that means we can use the same function to parse the following strings:
* `bafzbeie5745rpv2m6tjyuugywy4d5ewrqgqqhfnf445he3omzpjbx5xqxe`,

* `QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N`,

* `12D3KooWD3eckifWpRn9wQpMG9R9hX3sD158z7EqHWmweQAJU5SA`.
I believe this fits the criteria of the RFC the best, and causes the least friction when upgrading.

In my eyes a single function that can parse both is the way to go.

to_bytes and from_bytes

I am not sure how to handle the to_bytes and the from_bytes functions. Right now, there are two versions of each: one version to handle the multihash representation (the original to_bytes and from_bytes), and another version to handle the cid representation.

I believe that the from_bytes function should behave very similarly to the implementation of the FromStr trait, but I am not sure if I should leave the to_bytes as two separate functions (to_bytes and to_bytes_cid), or having a single one that represents the raw byte representation of the CID. If I leave them separate, we would need to choose one of them as the default representation for the trait From<PeerId> for Vec<u8> to work.

I would suggest a single function for from_bytes, in line with your above suggestion for FromStr and two functionf for to_bytes (to_bytes and to_bytes_cid) with From<PeerId> for Vec<u8> using the old behaviour for now until we do the above mentioned coordinated upgrade.

This question is also related to the following snippet:

rust-libp2p/protocols/kad/src/protocol.rs

Lines 102 to 104 in 9f331e5

// TODO: this is in fact a CID; not sure if this should be handled in `from_bytes` or

// as a special case here

let node_id = PeerId::from_bytes(&peer.id).map_err(|_| invalid_data("invalid peer id"))?;

The comment associated is confusing me a little. What does it mean by "handling as a special case"? Isn't the new representation of the PeerId a CID? If so, why handle it as a special case?

I think the comment suggests to either support PeerID as CIDs, as you are implementing today, or add a hack in libp2p-kad that would support PeerIDs as CIDs in this special case only. With your work everything can happen within PeerId::from_bytes.

Tests

Since this is a substantial PR, I would need to write some tests. Could you recommend me some behaviour to test whether it is working correctly? I am aware I need to implement the tests that confirm the behaviour stated in the RFC, but I am not sure if I should test other external places.

It would be great to have some quickcheck tests (see e.g. ed25519.rs tests as an example) that test through the various from_XXX to_XXX methods.

While still minimal, we should as make sure we can parse the examples listed at the bottom of https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md.

Thank you for the very thorough work thus far @JerryHue!

JerryHue · 2021-11-01T07:08:53Z

Hello again, Max!

Thank you for your follow up! I believe I have finished my changes, so I will follow up with my PR shortly.

Thank you for the help and support!

tomaka · 2022-03-02T15:19:29Z

I find this RFC 1 (that I discover for the first time) very questionable. It's extremely over-engineered.

But beyond that, I don't understand the motivation for storing a Cid instead of Multihash?
Modifying the internals of the PeerId struct just adds a couple of bytes to it, and these bytes are always the same. It's as if you were adding another field called prefix to the PeerId struct and that this field always has the same constant value.

Why not build the bytes on the fly when to_cidv1_bytes() or to_base32() are called? It seems that me that we're building the bytes on the fly anyway, and so storing a Cid instead of a Multihash doesn't bring anything to the table? Or am I mistaken?

mxinden added difficulty:moderate getting-started Issues that can be tackled if you don't know the internals of libp2p very well labels Oct 1, 2021

JerryHue mentioned this issue Nov 1, 2021

Change representation of PeerId to CIDv1 #2322

Closed

mxinden mentioned this issue Dec 30, 2021

*: Add Serialize and Deserialize to PeerId gossipsub MessageId and kad Key #2408

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

core/src/peer_id: Support and output peer IDs as CIDs (RFC 0001) #2259

core/src/peer_id: Support and output peer IDs as CIDs (RFC 0001) #2259

mxinden commented Oct 1, 2021

JerryHue commented Oct 5, 2021

mxinden commented Oct 6, 2021

JerryHue commented Oct 20, 2021

mxinden commented Oct 26, 2021

Encoding to the string representation

Decoding from the string representation

`to_bytes` and `from_bytes`

Tests

JerryHue commented Nov 1, 2021

tomaka commented Mar 2, 2022 •

edited

Loading

core/src/peer_id: Support and output peer IDs as CIDs (RFC 0001) #2259

core/src/peer_id: Support and output peer IDs as CIDs (RFC 0001) #2259

Comments

mxinden commented Oct 1, 2021

JerryHue commented Oct 5, 2021

mxinden commented Oct 6, 2021

JerryHue commented Oct 20, 2021

Previous public methods

Encoding to the string representation

Decoding from the string representation

to_bytes and from_bytes

Tests

mxinden commented Oct 26, 2021

Encoding to the string representation

Decoding from the string representation

to_bytes and from_bytes

Tests

JerryHue commented Nov 1, 2021

tomaka commented Mar 2, 2022 • edited Loading

`to_bytes` and `from_bytes`

`to_bytes` and `from_bytes`

tomaka commented Mar 2, 2022 •

edited

Loading