Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/src/peer_id: Support and output peer IDs as CIDs (RFC 0001) #2259

Open
mxinden opened this issue Oct 1, 2021 · 6 comments
Open

core/src/peer_id: Support and output peer IDs as CIDs (RFC 0001) #2259

mxinden opened this issue Oct 1, 2021 · 6 comments
Labels
difficulty:moderate getting-started Issues that can be tackled if you don't know the internals of libp2p very well

Comments

@mxinden
Copy link
Member

mxinden commented Oct 1, 2021

rust-libp2p should implement RFC 0001.

Tracking issue: libp2p/specs#216

Also tackle:

// TODO: this is in fact a CID; not sure if this should be handled in `from_bytes` or
// as a special case here
let node_id = PeerId::from_bytes(&peer.id).map_err(|_| invalid_data("invalid peer id"))?;

@mxinden mxinden added difficulty:moderate getting-started Issues that can be tackled if you don't know the internals of libp2p very well labels Oct 1, 2021
@JerryHue
Copy link
Contributor

JerryHue commented Oct 5, 2021

@mxinden, I have spent a few hours reading over the several documents related to this.

Would it be okay for me to work on this issue with some guidance?

@mxinden
Copy link
Member Author

mxinden commented Oct 6, 2021

Hi @JerryHue,

Would it be okay for me to work on this issue

Yes for sure. Your help would be very much appreciated.

with some guidance?

👍 I don't know how familiar you are with the stack. Feel free to post questions here or via mail. Note that there is a CID implementation in Rust.

@JerryHue
Copy link
Contributor

Hello again @mxinden!

Despite my absence, I haven't forgotten about this issue!

I believe I have implemented the changes as per the RFC stated, but I would like to make sure that I have not forgotten anything before I make my PR.

I would like to ask certain questions that have passed through my mind while working on this.

Previous public methods

Encoding to the string representation

As per the RFC requirements, the PeerId should still support the base58btc representation when printing it. This means that the Debug and Display traits use the base58 representation.

To encode the PeerId to the base32 representation, one may use to_base32.

Decoding from the string representation

This one was tougher to separate, since we can only implement the FromStr once. At the end, I just had them in the same function. So, that means we can use the same function to parse the following strings:

  • bafzbeie5745rpv2m6tjyuugywy4d5ewrqgqqhfnf445he3omzpjbx5xqxe,
  • QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N,
  • 12D3KooWD3eckifWpRn9wQpMG9R9hX3sD158z7EqHWmweQAJU5SA.

I believe this fits the criteria of the RFC the best, and causes the least friction when upgrading.

to_bytes and from_bytes

I am not sure how to handle the to_bytes and the from_bytes functions. Right now, there are two versions of each: one version to handle the multihash representation (the original to_bytes and from_bytes), and another version to handle the cid representation.

I believe that the from_bytes function should behave very similarly to the implementation of the FromStr trait, but I am not sure if I should leave the to_bytes as two separate functions (to_bytes and to_bytes_cid), or having a single one that represents the raw byte representation of the CID. If I leave them separate, we would need to choose one of them as the default representation for the trait From<PeerId> for Vec<u8> to work.

This question is also related to the following snippet:

// TODO: this is in fact a CID; not sure if this should be handled in `from_bytes` or
// as a special case here
let node_id = PeerId::from_bytes(&peer.id).map_err(|_| invalid_data("invalid peer id"))?;

The comment associated is confusing me a little. What does it mean by "handling as a special case"? Isn't the new representation of the PeerId a CID? If so, why handle it as a special case?

Tests

Since this is a substantial PR, I would need to write some tests. Could you recommend me some behaviour to test whether it is working correctly? I am aware I need to implement the tests that confirm the behaviour stated in the RFC, but I am not sure if I should test other external places.

@mxinden
Copy link
Member Author

mxinden commented Oct 26, 2021

Sorry for the late reply here. I was sick for the last week.


Encoding to the string representation

As per the RFC requirements, the PeerId should still support the base58btc representation when printing it. This means that the Debug and Display traits use the base58 representation.

To encode the PeerId to the base32 representation, one may use to_base32.

👍 sounds good to me. In the future we would switch the Debug and Display implementations to use to_base_32, though that would happen in a coordinated fashion across the implementations.

Decoding from the string representation

This one was tougher to separate, since we can only implement the FromStr once. At the end, I just had them in the same function. So, that means we can use the same function to parse the following strings:

* `bafzbeie5745rpv2m6tjyuugywy4d5ewrqgqqhfnf445he3omzpjbx5xqxe`,

* `QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N`,

* `12D3KooWD3eckifWpRn9wQpMG9R9hX3sD158z7EqHWmweQAJU5SA`.

I believe this fits the criteria of the RFC the best, and causes the least friction when upgrading.

In my eyes a single function that can parse both is the way to go.

to_bytes and from_bytes

I am not sure how to handle the to_bytes and the from_bytes functions. Right now, there are two versions of each: one version to handle the multihash representation (the original to_bytes and from_bytes), and another version to handle the cid representation.

I believe that the from_bytes function should behave very similarly to the implementation of the FromStr trait, but I am not sure if I should leave the to_bytes as two separate functions (to_bytes and to_bytes_cid), or having a single one that represents the raw byte representation of the CID. If I leave them separate, we would need to choose one of them as the default representation for the trait From<PeerId> for Vec<u8> to work.

I would suggest a single function for from_bytes, in line with your above suggestion for FromStr and two functionf for to_bytes (to_bytes and to_bytes_cid) with From<PeerId> for Vec<u8> using the old behaviour for now until we do the above mentioned coordinated upgrade.

This question is also related to the following snippet:

// TODO: this is in fact a CID; not sure if this should be handled in `from_bytes` or
// as a special case here
let node_id = PeerId::from_bytes(&peer.id).map_err(|_| invalid_data("invalid peer id"))?;

The comment associated is confusing me a little. What does it mean by "handling as a special case"? Isn't the new representation of the PeerId a CID? If so, why handle it as a special case?

I think the comment suggests to either support PeerID as CIDs, as you are implementing today, or add a hack in libp2p-kad that would support PeerIDs as CIDs in this special case only. With your work everything can happen within PeerId::from_bytes.

Tests

Since this is a substantial PR, I would need to write some tests. Could you recommend me some behaviour to test whether it is working correctly? I am aware I need to implement the tests that confirm the behaviour stated in the RFC, but I am not sure if I should test other external places.

It would be great to have some quickcheck tests (see e.g. ed25519.rs tests as an example) that test through the various from_XXX to_XXX methods.

While still minimal, we should as make sure we can parse the examples listed at the bottom of https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md.


Thank you for the very thorough work thus far @JerryHue!

@JerryHue
Copy link
Contributor

JerryHue commented Nov 1, 2021

Hello again, Max!

Thank you for your follow up! I believe I have finished my changes, so I will follow up with my PR shortly.

Thank you for the help and support!

@tomaka
Copy link
Member

tomaka commented Mar 2, 2022

I find this RFC 1 (that I discover for the first time) very questionable. It's extremely over-engineered.

But beyond that, I don't understand the motivation for storing a Cid instead of Multihash?
Modifying the internals of the PeerId struct just adds a couple of bytes to it, and these bytes are always the same. It's as if you were adding another field called prefix to the PeerId struct and that this field always has the same constant value.

Why not build the bytes on the fly when to_cidv1_bytes() or to_base32() are called? It seems that me that we're building the bytes on the fly anyway, and so storing a Cid instead of a Multihash doesn't bring anything to the table? Or am I mistaken?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
difficulty:moderate getting-started Issues that can be tackled if you don't know the internals of libp2p very well
Projects
None yet
Development

No branches or pull requests

3 participants