add HTTP spec #508
Thanks for writing this, it looks very promising!
It does have a few aspects in it that I don't quite understand yet. I thought the idea was that we can access existing protocols over HTTP, is the following something that you expect to work? (Incorporating some ideas from my comments)
- Browser makes GET request to `https://example.com/.well-known/libp2p/%2Fipfs%2Fid%2F1.0.0`
- Response is the protobuf-encoded identify payload
Or:
- Browser makes POST request to `https://example.com/.well-known/libp2p/%2Frendezvous%2F1.0.0`
- Body of request is a protobuf-encoded `DISCOVER` message
- Response is a protobuf-encoded `DISCOVER` response
If this is meant to work, then I think we should have some way for clients to discover which HTTP method to use for which protocol, so we don't need to carry this information out-of-band. It might be useful to attach other metadata to it as well. (Hypertext - yay!)
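To make the URL shape in the example above concrete, here is a minimal sketch of how a protocol ID could be percent-encoded into a single path segment and decoded again on the server side. The helper names `protocol_url` and `protocol_id_from_path` are made up for illustration, and the `/.well-known/libp2p/` layout is only the one proposed in this comment, not a finalized spec.

```python
from urllib.parse import quote, unquote

# Assumption: the path layout from the example in this comment, not a final spec.
WELL_KNOWN_PREFIX = "/.well-known/libp2p/"


def protocol_url(origin: str, protocol_id: str) -> str:
    """Build the URL for a protocol ID, escaping every '/' inside the ID."""
    # safe="" forces '/' to be encoded as %2F so the ID stays one path segment.
    return origin.rstrip("/") + WELL_KNOWN_PREFIX + quote(protocol_id, safe="")


def protocol_id_from_path(path: str) -> str:
    """Recover the protocol ID from a request path built by protocol_url."""
    if not path.startswith(WELL_KNOWN_PREFIX):
        raise ValueError("not a libp2p well-known path")
    return unquote(path[len(WELL_KNOWN_PREFIX):])


if __name__ == "__main__":
    print(protocol_url("https://example.com", "/ipfs/id/1.0.0"))
```

Note that this only covers the path; which HTTP method to use would still have to come from somewhere else, which is the open question raised above.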
Thank you for your quick review :)
We can use some existing protocols over HTTP, but not all. Your example is one that works well over HTTP, but not all of them do, for multiple reasons:
Regarding JSON vs. Protobuf: I was under the impression that JSON is more commonly used in REST APIs than Protobuf (but I’m no expert in designing REST APIs). Using JSON would certainly make it easier to access the API using curl. Given that HTTP is a pretty large and breaking step for libp2p as a protocol suite, this would be the right time to make a switch, if we wanted to. Historical reasons aside, do you think we should prefer JSON or Protobuf (or something else)?
UPDATE: The two authentication methods described in this PR are terribly broken, as an attacker could just forward the requests to the actual server (or client). What we'd need is a way to bind the authentication to the underlying connection. Unfortunately, there's no browser API for that. https://datatracker.ietf.org/doc/html/draft-schinazi-httpbis-transport-auth would help for client auth (if it ever becomes an RFC and is implemented by browsers), but that still leaves the arguably more important server auth unsolved.
Following up on a 1-on-1 conversation with @marten-seemann, here is my updated take on this. I am calling it the "http2p"-initiative 😁
HTTP on top of libp2p
What makes HTTP great is its richness in semantics. That is what allows us to define middlewares. It allows application developers to focus on their use case without having to re-implement things over and over again. For example, with HTTP we get:
What makes libp2p great is the ability to open light-weight streams in both directions of a connection. Application developers can operate under the assumption that the communication is encrypted and authenticated and don't need to care about who opened the connection. If we combine the two, we get a networking stack where we can design protocols using HTTP where both parties can simultaneously act as server and client. For example, retrieving the supported protocols from the other peer (aka identify) could be:
Kademlia's put value could be:
I don't think we should try to automatically map existing protocols to HTTP. Our protocol definitions are not expressive enough to leverage all the cool things HTTP gives us. Paraphrasing @marten-seemann here: The number of protocols defined in the future is likely much greater than what we have today. Based on that, I don't think it is worth investing in a transition period. I'd rather bite the bullet and re-design all existing protocols to fully leverage HTTP semantics. With the ability to open streams in both directions, even protocols like gossipsub are not that hard. Nodes can just send POST requests to each other or subscribe to a topic via SSE (Server-Sent-Events: https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events)
Accessing libp2p nodes via HTTP
If we define our protocols already with HTTP, wouldn't it be cool if any browser could also just access them? Yes it would be cool. Should we do that? I don't think so. Here is why:
What about protocols that are better expressed as streams
At the cost of an additional round-trip, we can leverage the
Moving forward
Today, libp2p implementations assume that
Likely, implementations will also have to undergo massive internal changes to support this new proposal. In particular, it is most likely easiest not to support stream-based and HTTP-based protocols in the same libp2p implementation. Hence, my proposal is to mint a new multiaddr protocol
Users can then gradually migrate from one to the other by depending on two different versions / implementations of libp2p within their application.
tl;dr
Thank you @thomaseizinger for this super detailed post! I think I agree with almost everything!
Agreed for protocols like Gossipsub, which are hard to map onto HTTP. The browser version will necessarily look different than the full-node version. This is not only due to the different capabilities, but also because Gossipsub expects peers to stay around for a long(ish) time to be able to build a reputation score. Making Gossipsub usable from browsers will likely require changes to the protocol itself to accommodate extremely short-lived clients. For other protocols, on the other hand, namely those that already use request-response semantics, we won’t need any changes. The best example for this is probably Kademlia. Maybe a middle ground would be to require all protocols to define whether it is appropriate to speak them from the browser?
I think this can work for some, it might not work for others. I also think this is a detail that we don't have to decide now. On an architectural level, the decision I think we should make is: The same functionality (i.e. sending a message via gossipsub) can have a different interface depending on the transport it is accessed on. We can put a recommendation out that protocol designers should design their protocols to be client-server friendly, but if the protocol comes out cleaner for p2p with a different design, they should design two. In the end, this guidance is just an intuition on what I think will be the easier implementation. Separate constraints warrant different designs. However, we can't stop implementations from plugging a handler for a p2p protocol into the client-server transport.
Thinking about it more, what we could do is e.g. define that a certain route must only be called by authenticated clients. If a client can authenticate themselves somehow, why shouldn't we allow it to be called from a browser or curl? The beauty of HTTP is that it is stateless, so we don't care where they got the credentials from. The way this may play out is that a protocol defines certain bits of functionality twice, once for the client-server context and once for the p2p context. Implementations would be free to only implement a subset as well and thus e.g. omit client-server support.
I updated the server and client auth. We now rely on the domain (incl. subdomain) to bind the handshake to the specific endpoints. I hope this 1. actually works and 2. is feasible in the CDN setting.
First pass. I need to refresh my libp2p+http thoughts again.
Thanks for this :)
Dropped some comments around statelessness, HTTP caching, and reusing the `Authorization` header
http/README.md (outdated)
### Server Authentication
Since HTTP requests are independent from each other (they are not bound to a single connection, and when using HTTP/1.1, will actually use different connections), the server needs to authenticate itself on every single request.
* better motivation for libp2p+HTTP
* incorporate review feedback
Co-authored-by: Marcin Rataj <[email protected]>
Sorry for being late to the party, and sorry for the possible pain the message may cause: I think this effort is fundamentally misguided, the goal is wrong. While this does sound like a radical statement, please hear me out.
What I care about are peer-to-peer systems, possibly even an Internet of them. People who know me can already unfold the rest of this post from the previous sentence because I aspire to use precise and succinct language: when I say “peer” I mean exactly that, so P2P is a system of equals. This is why I built a whole cloud-free product, devoid of servers of any kind, on top of
The crux of the matter is that HTTPS is fundamentally a client–server protocol where the server cannot be an edge device (like a mobile phone; a data center in my home country is by no means “edge”, it is centralised backbone). The latter restriction comes from TLS and the situation that edge devices today never have public IP addresses, so they cannot have public names and thus no certificates. The rest of HTTPS is the result of fine-tuning a protocol where a client requests resources from or uploads data to a server. The server is always passive, the client is never addressable or diallable. In other words: HTTPS does NOT model an interaction between peers, every fibre of its design rejects this notion.
It is for this reason that I think this PR would have a devastating effect on libp2p. @thomaseizinger gave more details above, focusing on some particular areas where the different design goals are visible as a technical impedance mismatch. It would be a mistake to “deal with this on a technical level” due to the fundamental incompatibility that will never be bridged. If libp2p embraces HTTPS then it gives up on the peer-to-peer part. This is a possibility, I am not a maintainer or core contributor, so you may choose to do so. My intention with this post is to ensure that you know what such a decision will entail.
Regarding the topic of using HTTP for all sorts of negotiation within libp2p streams: while this doesn’t clash as fiercely with the design principles of either system, it still is a mismatch. The hypertext transfer protocol has been designed to transfer TEXT, which is therefore in its name. None of the data my code ships over libp2p is text, there are much better ways nowadays. The promise of HTTP negotiation is that the client could send to the server in whatever format they want and then the server will hopefully understand it or, rarely, reject it. This promise never materialised, the range of formats accepted by a given HTTP endpoint is usually tiny. The reason is that both semantics and syntax are essential parts of protocol design, efficiency is usually an important goal, and there are no generic translations between even the common formats (JSON, XML, CBOR, protobuf) due to needing case-specific semantic knowledge. The only thing that works transparently is encryption and compression.
For this reason I think it is a bad idea to implement HTTP-over-libp2p on the libp2p level. If there is a use-case where a given libp2p protocol benefits from HTTP semantics, then that protocol can speak HTTP over an established stream to a peer. In all other cases a ProtocolName implies the structure, format, and permissible sequence of messages on the stream. We can (and should) have multiple ProtocolNames where there is more than one way to solve the problem at hand.
How to let AWS lambda functions participate in a libp2p world?
To my mind, the best and most honest solution is to provide APIs that such constrained environments are designed for, backed by a libp2p node. Instead of shoehorning everything into a multiaddr scheme, offer a purpose-built HTTP API that the “serverless” function calls, because that is all such a function can do. A function can never be a peer, it has a fundamentally different shape.
There are many reasons to want an HTTP transport for libp2p, so I won't assume the problems and motivations that have led us (nft.storage/web3.storage) to invest in verifiable HTTP protocols for IPFS rather than libp2p are the same motivations for this proposal. That said, this isn't something we would adopt as a replacement for those protocols, as none of the problems that motivated us to build them would be resolved by it. We need two things in order to operate these protocols at scale:
I can imagine an implementation of this that was less stateful than our current libp2p infra, but I suspect it would perform quite poorly compared to our current HTTP protocols because the totality of the data being addressed is built into those interfaces, which ensures a single roundtrip, and I cringe a little at the prospect of debugging a protocol like this given the HTTP logs would relate to peer information rather than data.
The vast majority of caching infrastructure is built for HTTP. It's ubiquitous and relatively cheap to operate. It's pretty trivial to map IPFS/IPLD semantics into URL structures in such a way that you can leverage the hashes to provide caching systems that will outperform traditional (non-IPFS) systems, so that's what we do today. In other words, we can't do much with a stateful HTTP protocol that doesn't address the underlying data layer in HTTP terms 🤷
But we aren't everyone. Other folks might have a pressing need for something like this, but I feel it's important to point out the problems we have operating stuff like this and what has motivated us to take a different approach.
I wrote this up to provide some clearer understanding of how we've come to think about the transports in general, for our engineers and a few other groups. As we aren't investing in libp2p much as a transport, it seemed appropriate for me to write up what our approach to the transport layer actually is.
We don't have any particular loyalty to HTTP beyond practicality. IPFS protocols are verifiable at the data layer, so on a public IP address there's not a clear job-to-be-done for libp2p if you're integrating IPFS addressing into an HTTP protocol, and nobody can make the claim that it's easier or more widely supported to run a non-HTTP protocol.
It may seem like we run a bunch of centralized HTTP services, but we actually encode more verifiability into our new UCAN-based protocols than existing IPFS protocols implemented on libp2p, and the UCAN protocols are actually transport agnostic because they're just IPLD proof chains encoded into tiny CARs 😁 We tend to send them over HTTP because... everyone has it.
Co-authored-by: Marcin Rataj <[email protected]>
http/README.md (outdated)
libp2p does not squat the global namespace. libp2p application protocols can be
discovered by the [well-known resource](https://www.rfc-editor.org/rfc/rfc8615)
`.well-known/libp2p`. This allows server operators to dynamically change the
We use well-known resources elsewhere, e.g. `.well-known/libp2p-webtransport`. Perhaps we should namespace this one too?
Suggested change:
- `.well-known/libp2p`. This allows server operators to dynamically change the
+ `.well-known/libp2p-http`. This allows server operators to dynamically change the
Aesthetically it's nicer if we aren't referencing HTTP multiple times. `.well-known/libp2p` is an HTTP resource. It's kind of like saying "ATM machine".
Technically this doesn't matter, but my vote is for `.well-known/libp2p`.
I'm not sure I agree. Given we have other well-known resources, `.well-known/libp2p` is ambiguous, `.well-known/libp2p-http` is not.
Though yes, from a technical perspective it just has to be a predictable string with a low chance of collision.
@lidel can you be our tie breaker? `.well-known/libp2p` vs `.well-known/libp2p-http` or something else
Squatting `.well-known/libp2p` for a file may cause us problems in the future if we need to add something else.
I think if we want to avoid that, there is still time to change it (this spec is still a draft). We should go with either `.well-known/libp2p-http`, or make a directory and put the protocol mapping configuration in `.well-known/libp2p/http`.
The latter (`libp2p/foo`) feels a bit better to me, as it creates a libp2p-specific directory/namespace, which is easier to map via reverse proxies, but other than that it is mostly aesthetics.
If we don't want 'http' twice, it could be `.well-known/libp2p/protocols`.
This is different though. This is metadata about a peer's supported protocols. The well-known webtransport URI is about where to send the HTTP CONNECT request to.
I'm not sure it is, I think how you interact with the well-known resource is a detail of the protocol that's unrelated to the path it uses.
The `.well-known/libp2p/` suffix has the benefit of being a single thing we would need to register. Future versions of libp2p-webtransport may be placed under that suffix. Remember WebTransport isn't even out of draft status yet. We'll probably need to make a new multiaddr for webtransport-v1 just like we did with QUIC.
Any other well-known resource we would want could also fit under that suffix.
Changing this from `/.well-known/libp2p` to `/.well-known/libp2p/protocols` will break the ability to communicate with existing servers deployed without upgrading them first. Since these servers are running in places not under our control, they will not be immediately upgradable.
Because there are already deployments that use the old value, I suggest keeping it as `/.well-known/libp2p`.
I understand the pain in this rollout. Especially when the full rollout is outside your control. Maybe go-libp2p can provide a transition period to help out?
Since this spec itself is in draft we need flexibility in how things work. As much as I prefer the `.well-known/libp2p`, I don't think the argument of "we should use it because we've deployed this draft" is a strong one. Deployments of drafts are extremely useful to get some experience, but we shouldn't ossify to those decisions because we already made them.
Once this is merged, things will be stable and we won't break existing users. That's the guarantee that comes with merging this. And partly the reason why I haven't merged it yet. I want to build the js-libp2p side before merging and make sure that the interop works and is reasonable. I'm pretty close on that. Hoping to get it done by the end of the month :)
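As a rough illustration of what a client-side transition period could look like, the sketch below tries the newer path first and falls back to the older one. This is not how go-libp2p or js-libp2p actually handle the migration; `discover`, `fetch`, and `CANDIDATE_PATHS` are hypothetical names, and the only facts taken from this thread are the two paths being debated.

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumption: the two draft paths discussed in this thread, newest first.
CANDIDATE_PATHS = ("/.well-known/libp2p/protocols", "/.well-known/libp2p")


def discover(
    fetch: Callable[[str], Optional[str]],
    paths: Sequence[str] = CANDIDATE_PATHS,
) -> Tuple[str, str]:
    """Return (path, body) for the first candidate path that yields a document.

    `fetch` is a hypothetical callback: it takes a path and returns the
    response body, or None when the server does not serve that path.
    """
    for path in paths:
        body = fetch(path)
        if body is not None:
            return path, body
    raise LookupError("no well-known libp2p resource found")


if __name__ == "__main__":
    # Simulate a server that still serves only the older path.
    old_server = {"/.well-known/libp2p": "{}"}
    print(discover(old_server.get))
```

The cost of such a fallback is one extra request against servers that only serve the older path, which is part of why it would only make sense for a limited period.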
libp2p does not squat the global namespace. libp2p application protocols can be
discovered by the [well-known resource](https://www.rfc-editor.org/rfc/rfc8615)
`.well-known/libp2p/protocols`. This allows server operators to dynamically change the
If the path now has `/protocols`, do we need the top-level map key `protocols` in the JSON response?
I'd say it is good practice to keep a list in a named field like that.
It makes it easy to quickly validate the JSON and allows us to add more things to the file, if the need arises, without breaking legacy clients.
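To illustrate the point about a named field, here is a small sketch of a tolerant parser: it requires the top-level `protocols` key, ignores any other top-level fields, and skips entries it does not understand. The document shape used here (a per-protocol object with a `path` key) and the protocol IDs `/kad/1.0.0` and `/ipfs/gateway` are assumptions for the sake of the example, as is the function name `parse_protocols`.

```python
import json
from typing import Dict


def parse_protocols(document: str) -> Dict[str, str]:
    """Return {protocol ID: path prefix} from a well-known JSON document."""
    data = json.loads(document)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    protocols = data.get("protocols")
    if not isinstance(protocols, dict):
        raise ValueError("missing or malformed 'protocols' field")
    result = {}
    for protocol_id, entry in protocols.items():
        # Skip entries we do not understand instead of rejecting the document.
        if isinstance(entry, dict) and isinstance(entry.get("path"), str):
            result[protocol_id] = entry["path"]
    return result


# Assumed document shape; "some-future-field" stands in for a later addition.
EXAMPLE = """
{
  "protocols": {
    "/kad/1.0.0": {"path": "/kademlia"},
    "/ipfs/gateway": {"path": "/"}
  },
  "some-future-field": true
}
"""

if __name__ == "__main__":
    print(parse_protocols(EXAMPLE))
```

Because unknown top-level keys are ignored, a client written this way would keep working if more fields were added to the file later.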
1. That the Kademlia application protocol is available with prefix `/kademlia` and,
2. The [IPFS Trustless Gateway API](https://specs.ipfs.tech/http-gateways/trustless-gateway/) is mounted at `/`.
If I understand correctly, this only specifies the path but not the method (GET / POST) to use when accessing this protocol over HTTP and that's up to the specific protocol to define how to run it over HTTP?
If so, should we add this explainer in the spec?
Hm.. the methods will be specific to each protocol at each mount point, so not part of this spec?
That makes sense. As I see it there are two sections in this document:
- Running libp2p protocols over standard http like h2 or h3
- Running http protocols over libp2p streams.
So should we mention in the spec that a libp2p protocol supporting the HTTP transport should specify the HTTP method and headers to be used for the protocol? For the path, they can expose it via the well-known endpoint `/.well-known/libp2p/protocols`.
I'm not sure I understand. An application protocol would be built using HTTP semantics, and that protocol would then be able to run on libp2p streams or "standard" HTTP transports like h2, h3.
What do you mean by:
> Running libp2p protocols over standard http like h2 or h3
This spec does not define how you would take an existing libp2p protocol and map it to HTTP semantics. That is best done by the specific protocol itself. But maybe I'm misunderstanding your point?
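One way to picture the split being discussed: the well-known resource supplies only the mount prefix, while each application protocol defines its own methods and relative routes. The sketch below assumes that split; the `Route` type, the `resolve` helper, and the `find-node` operation are invented for illustration and are not defined by this spec or by any Kademlia-over-HTTP document.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Route:
    """One operation of an application protocol, as its own spec would define it."""
    method: str  # HTTP method chosen by the protocol, e.g. "GET" or "POST"
    path: str    # path relative to wherever the server mounts the protocol


def resolve(mount_prefix: str, route: Route) -> Tuple[str, str]:
    """Combine a discovered mount prefix with a protocol-defined route."""
    full_path = mount_prefix.rstrip("/") + "/" + route.path.lstrip("/")
    return route.method, full_path


# Hypothetical operation; real Kademlia-over-HTTP routes are not defined here.
FIND_NODE = Route(method="POST", path="find-node")

if __name__ == "__main__":
    print(resolve("/kademlia", FIND_NODE))
```

Under this reading, a server operator can move a protocol to a different prefix without the protocol's own definition of methods and routes changing.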
Changed in libp2p spec draft and go-libp2p: libp2p/specs#508 (comment) libp2p/specs@3c0ac40 libp2p/go-libp2p#2757
libp2phttp: Define the multiaddr URI
We have two implementations now in go-libp2p and js-libp2p. We have an approval on this spec. And we've had a lot of eyes and time for it to bake. I think this is ready to merge! Of course we are still able to adjust the spec in follow-up PRs, but I don't expect any fundamental changes. Thank you all for the input, and I hope you make use of this new spec and APIs :)
This PR adds a specification for libp2p+HTTP.
It builds on countless discussions with various people interested in this topic, and incorporates the thinking outlined in #477 and #481.
Peer ID authentication over HTTP is deferred to a separate proposal: #564. That work is put on hold for now until we find a strong use case for it.
The work was started by Marten & Marco, and continued by @MarcoPolo. Recent edits are from @MarcoPolo.