[RFC] Split HTTP::Client::Response in two #9901

asterite · 2020-11-11T12:48:10Z

Context

I think it's a shame that having a compiled and type-safe language we are using the same type for non-streaming (consumed) and streaming responses:

require "http/client"

response = HTTP::Client.get("...")
response.body # okay
response.body_io # not okay

HTTP::Client.get("...") do |response|
  response.body # not okay
  response.body_io # okay
end

The "okay" and "not okay" here mean that when it's "not okay" we get a runtime error. This is just a minor thing because you'll probably notice it right away and change it to use the other. However, it can be quite confusing to understand why you get that error. And also, we could catch this at compile-time.

Proposal

I propose we have this hierarchy:

abstract HTTP::Client::Response
- HTTP::Client::Response::Consumed: returned in the non-block version and has a String body.
- HTTP::Client::Response::Streaming: returned in the block version and has an IO body

Some decisions to make

Should we do this? I personally think so.
Do we use the same name for the body property in both cases? That means you can always use body (nice) but the type changes depending on whether it's in a block or not. The downsides are that it can be a bit confusing, and it's also a breaking change. Alternatively we can keep it as body and body_io so it's not a breaking change.

The text was updated successfully, but these errors were encountered:

jhass · 2020-11-11T13:00:42Z

I was having the same thought process for it actually, so I'm all for it. body and body_io always felt weird and out of place. So also 👍 to calling it body on both.

If wanted Streaming could maintain body_io as deprecated alias to body for a while. Consumed will not need body_io as that was never working for its usecases in the first place and body keeps working as expected.

The biggest breakage here will be people typing stuff as Response which in this approach will loose its body and body_io properties, so people will have to update their type declarations to continue using it.

The alternative to subclassing is making body be String|IO and have people type check that... which still is a runtime assertion and barely better than body and body_io.

Third option I see would be to make body always be IO, a prefetched IO::Memory instance for the non-streaming case, to not leak FDs

asterite · 2020-11-11T13:04:58Z

Another idea: let both have body and body_io:

for non-streaming we have body and we could expose body_io as an IO::Memory over body.
for streaming we could make a call to body consume the body_io into a String, then replace body_io with IO::Memory over body. The problem with this is if someone consumes a bit of the IO before calling body.

But maybe the downsides are not that bad. And we could keep a single Response type.

But I still feel having two types is better.

jhass · 2020-11-11T13:08:29Z

Yeah, that seems like it's equally brittle to use as the current interface. I would prefer forcing users to always call .body.gets_to_end if they want a string explicitly (building on the always IO option outlined above).

straight-shoota · 2020-11-11T13:12:42Z

Yeah, this part of the API is really shady and I'm all in for improving that. I'm not to sure about having two separate types, though. Feels a bit overhead and IMO might not be that user-friendly.

There's another solution I've already mentioned in #8461 (comment): Yield the IO as a second argument to the block in the streaming scenario.

I'd like to inspect some real examples to see how this is used in the field. That should help us discuss an optimal improvement for this API.

jhass · 2020-11-11T13:16:41Z

Hm, yielding it as second argument still seems to have potential runtime errors or weird behavior when mixing using that IO with calling Response#body at some point.

straight-shoota · 2020-11-11T13:43:05Z

According to skimming the code available on Github, by far the most common scenario is a quick and dirty call which doesn't really need a response object and just retrieves the body as a string:

HTTP::Client.get("http://example.com").body

(more elaborate versions also include a .success? check or - rarely - some crude redirect logic)

Maybe there should be a convenience method for this which also handles some HTTP protocol context (like status code).
If we had this, the majority of use cases (the simple ones) would be dealt with. Then there would actually be little reason to have a non-streaming variant of Reponse... All more complex use case which still only want the body as string would just have to call gets_to_end.

I think that might really be a great solution. Essentially, it just means removing Response#body and providing an easy-to-use alternative for the most common use case.

j8r · 2020-11-11T13:48:16Z

Why there is one that consumes the body to a string, and not the other way around? It's not obvious at a first glance, so I think having a single behavior for both will be less confusing.

We could have:

getter body_io : IO
# Consummes the body IO
def body!
  @body_io.gets_to_end
end

jhass · 2020-11-11T13:49:39Z

Why there is one that consumes the body to a string, and not the other way around? It's not obvious at a first glance, so I think having a single behavior for both will be less confusing.

We could have:
getter body_io : IO
# Consummes the body IO
def body!
  @body_io.gets_to_end
end

That's #9901 (comment) with different names, so 👎

bcardiff · 2020-11-11T13:53:03Z

If we want to do something for 1.0 I am inclined to:

keep a single response type
allow the non-streaming to have a body_io with a IO::Memory over body
make the streaming response raise on runtime when body is called

That way:

body_io is always available if needed as lower level fetch of the body, but potentially only once.
There is no ambiguity if some data from the body_io is consumed in the streamed response

If we want to tweak further the design I think is after 1.0.

j8r · 2020-11-11T13:54:09Z

The one-liner argument is not a good enough one, because we can do:

p HTTP::Client.get("http://example.com").body
p HTTP::Client.get "http://example.com", &.body_io.gets_to_end
# Perhaps even p HTTP::Client.get "http://example.com", &.body!

straight-shoota · 2020-11-11T13:56:37Z

@j8r I don't understand your last comment. What point are you trying to make? Is it hat the yielding variant is only marginally longer and the "simple" one isn't necessary?

j8r · 2020-11-11T13:59:49Z

Yes, I mean that all could be streaming, and as user-friendly as non-streaming. Both kind of users - those needing more low level/performance and those just wanting something easy - will be happy then.

straight-shoota · 2020-11-11T14:01:30Z

@bcardiff Sounds good to me. That would allow a later discussion about bigger changes, possibly in conjuction with #6011.

jhass · 2020-11-11T14:02:39Z

If we want to do something for 1.0 I am inclined to:

I'm not sure whether that really improves the situation, it just seems to shift the inconsistency away from the particular case that triggered this discussion, so I think I would then rather prefer to leave things as is. (Personally I think 1.0 is the perfect opportunity for a breaking change like this and starting with a clean API into the stable phase).

To me this is about getting rid of having both body and body_io, which is where all the confusion, weirdness and inconsistency stems from. I don't see benefit in changes that don't achieve that goal.

asterite · 2020-11-11T14:03:15Z

@bcardiff

make the streaming response raise on runtime when body is called

With this RFC I'm trying to avoid runtime errors and move them to compile-time errors.

@straight-shoota

If the most common case is fetching a response without a block and getting the body as a string then it shouldn't matter if we introduce two response types, because most don't have to deal with a specific type name. The body property will continue working for them. The advantage is that when you use the streaming version you will get a compile-time error if you type body instead of body_io (same for the other way around).

straight-shoota · 2020-11-11T14:08:16Z

@asterite

If the most common case is fetching a response without a block and getting the body as a string then it shouldn't matter if we introduce two response types

But what's the point in having two separate types when one of them is rarely used and can simply be emulated using the other one?

asterite · 2020-11-11T14:09:31Z

But what's the point in having two separate types when one of them is rarely used and can simply be emulated using the other one?

Type safety.

straight-shoota · 2020-11-11T14:14:46Z

I know that's the goal for this RFC. My suggestion to remove Response#body would be completely type safe. A second type for type safety is only required if there's a real need for a non-streaming response type. And that's what I'm questioning.

asterite · 2020-11-11T14:19:48Z

A second type for type safety is only required if there's a real need for a non-streaming response type

I think in most cases you want a non-streaming response because responses are small. However, in a few cases you'd like a streaming response, for example when downloading a large file, or when the response is meant to be streaming (like sending lines of JSON that you parse individually, and the connection is kept open).

The way we design it, for the streaming way you have to do it in a block so you can't forget to close the IO. In Go you always get an "IO" and you have to remember to close it, and we didn't like that.

So, we have two use cases and the way you use them in the same except that in one case you deal with a String, in the other you deal with an IO. We could extract that outside of the Response object, but then it would be a bit weird:

response, body = HTTP::Client.get("...")

When you talk about an HTTP response body, the body really belongs to the response, it's not a separate entity. And it's just nice to be able to pass the response to other functions.

What I'm wondering is: if we split it into two types, in which way that produces friction?

jhass · 2020-11-11T14:27:27Z

I feel like the most simple breaking change we could do right now is removing body_io, making body return IO always and let that be IO::Memory for the non streaming case (avoiding Go's pitfall). This increases API usage verbosity for the moment, but would also provide a simple baseline for adding convenience API in a non-breaking manner, say for example HTTP.get_body(...) : String.

straight-shoota · 2020-11-11T14:31:52Z

@asterite

I totally agree to your assessment of the typical use cases. I already mentioned that in #9901 (comment)
My point is, that when you want the response as a string, chances are high you don't really care about the response object. The vast majority of use cases are literally just HTTP::Client.get("http://example.com").body.
So why not make it easier for these use cases and just return the body (as String) and not the Response with cached body?
For example, we could end up with two API methods like this:

HTTP::Client.get("http://example.com") # returns String
HTTP::Client.get("http://example.com") do |response|
  # here we consume response body io
end

If you just want a simple response body, you use the non-yielding method and get a string result. The method would also provide basic status code handling (for example raise or return nil unless successful). That's more included features for robustness and simpler to use than the current API.

If you have more advanced use case, like responding to specific status codes, handling response headers etc. you use the yielding method and get a response object (which only provides a streaming IO). If you need the body as string, you have to consume the IO.

asterite · 2020-11-11T14:37:28Z

The vast majority of use cases are literally just HTTP::Client.get("http://example.com").body

Then the vast majority of use cases are wrong. Consuming the body without checking that the status is okay It seems like a latent bug to me.

So why not make it easier for these use cases and just return the body (as String) and not the Response with cached body?

Because a body without an HTTP status code can't be trusted.

straight-shoota · 2020-11-11T15:21:28Z

Totally agree and that's why I mentioned that the method should check the status code and fail if it's not a success.

asterite · 2020-11-11T15:23:44Z

Ouch, I should finish reading before replying 😊

asterite · 2020-11-14T18:50:10Z

Guess what someone just asked in gitter? https://gitter.im/crystal-lang/crystal?at=5fb01edab4283c208a656378

I can't believe there's so much resistance to make this trivial change.

asterite · 2020-11-14T18:50:44Z

And I say it's trivial because I already have this implemented in my machine, I'm just waiting for an "okay" so I can send a PR.

straight-shoota · 2020-11-14T19:25:06Z

I don't see any resistance to improving this. We just need to agree on the solution.

IMO having to separate types is completely unnecessary. We really need only a single type with IO. I think the best improvement we can do right now is to add IO::Memory for non-streaming instance. So that's pretty much what @bcardiff and @jhass have been proposing as well. There's only different opinions on whether the IO should be named body_io or body (and subsequently what to do with the string-returning method).

Just adding a fallback for body_io is the easiest change. It doesn't break anything. It doesn't improve much either, but it would fix the issue discussed on gitter.
Renaming body_io to body OTOH is a breaking change. It's easy to fix though (just insert to_s). And when it's used in a place where either IO or string do, there's no need to change any code either. So maybe it's fine. However, it's not necessarily better than renaming to body. IMO it doesn't matter whether the property ends up being named body or body_io. Staying with the latter won't break code that is already using the streaming interface. And it would allow to properly deprecate body before removing/breaking its return type.

asterite · 2020-11-14T19:33:16Z

If you take a look at newer languages like Rust, Elm... even Haskell has this... you define all possible states a type can be (algebraic data type) and then there's no way to represent an invalid state. In my proposal, when you fetch a non streaming response, you can only read it as a string. If you are streaming it, you can only treat it as an IO. Of course you can explicitly go from one to the other (build an IO memory or read everything into a string) but as it's been avoided all the time here, implicit behavior is not good.

I sincerely can't see the downside of having two types other than hypothetical situations.

straight-shoota · 2020-11-14T19:37:54Z

A very practical downside from the API design is that IMO a non-streaming type is simply not necessary at all. So we can eventually have just a single type with streaming IO. No type with buffered string body.
It means we can't have a proper fix right now. That's a benefit of the two-type solution. But the final solution to have only one type sounds much better to me.

asterite · 2020-11-14T19:41:41Z

How can you make it so that users don't forget to close the IO? If you have a solution for that, I can concede.

straight-shoota · 2020-11-14T20:05:15Z

I'd suggest to remove the overload that returns a Response and only keep the one that yields the instance.

That would also free up the non-yielding overload for a convenience method that directly returns String as suggested in #9901 (comment). Note that it doesn't have to be that way. The convenience method needs extra discussion. But it would be an option.

asterite · 2020-11-14T20:34:08Z

I think we are going in circles here so I'll just stop. Sorry :-)

straight-shoota · 2020-11-15T11:39:21Z

You're probably right.

I'll still add this to my last response because I might have been thinking to far ahead: We can obviously also keep the non-yielding overload and have it return a Response instance. That one would just not provide access to the network IO, but the buffered body via IO::Memory. So the data situation and the responsibility to close a response are exactly the same as now. What changes is just that you can't access the body as a string directly. Instead you always have to consume an IO, either memory or network depending on how the response is obtained.

jhass · 2020-11-15T14:22:25Z

Which is what I suggested in #9901 (comment) and #9901 (comment) already :P

straight-shoota · 2020-11-15T14:27:07Z

Yes and I in #9901 (comment) and @bcardiff in #9901 (comment). It essentially just differs on whether to stay with body_io or move to body for the streaming IO.

asterite · 2020-11-15T14:35:17Z

Oh, I really like that. @jhass I think I missed that third option, sorry.

So, to see if I understand things correctly, the idea is:

have a single HTTP::Client::Response type whose body is always an IO
In the case of a streaming response, the IO is directly connected to the socket, and it will be closed once the block ends
In the case of a non-streaming response, we read the HTTP body, close the response/connection, and set body to an IO::Memory over the string body

The nice thing about this approach is that you always deal with a response in the same way: it's an IO. The not-so-nice thing about this is that for the non-streaming version you have to do response.body.gets_to_end to get a string, which has three "problems":

it's a bit more verbose than just body: but I'm fine with
IO::Memory#to_s and IO::Memory#gets_to_end will create a duplicate string: however, we can make IO::Memory know that it was initialized from a string, and in that case return the original string. There's still one more memory allocation for the IO::Memory, but it's fine (it's just 80 extra bytes).
if the body is always an IO one could pass it down to JSON.parse or X.from_json, or X.from_yaml, and parsing an IO is slower than parsing a String: but you can always read the IO into a string if you want to avoid that performance issue.

So I think I'm 👍 for this approach. It's a breaking change, but it's probably better to do it before 1.0 is released.

jhass · 2020-11-15T14:44:17Z

Yes, that's the idea.

As mentioned previously it's easy to build helpers on top of it to get rid of the verbosity, both first class in stdlib or second class as or inside a shard.
A "read only" mode/flag sounds good, as a generally useful performance optimization too. This way we can use IO::Memory as an IOish view over a string, even a static one (think file embedding).
If the parsing slow down due to this turns out to be a real world issue, it seems easy to have JSON and YAML detect an IO::Memory in "read only" mode and access the underlying string instead.

straight-shoota · 2020-11-15T15:02:16Z

Great 👍 Sounds like we're pretty much agreed then. It seems body vs. body_io is still to be settled.

body is the better alternative because the _io suffix is simply unnecessary. But the change to a bit more concise name has to match the cost of a breaking change.

Staying with body_io has the benefit of not breaking any existing code using the streaming response. body returning string (for non-streaming response) could be properly deprecated and continue to work for now. So effectively, we wouldn't need to break anything immediately.
IMO it's better to go into 1.0 with a deprecated method than a broken one.

I'm in favor of body_io but going for body is fine with me either.

jhass · 2020-11-15T15:10:15Z

I would strongly prefer body, making body_io a deprecated alias to it is fine for me, but not necessary. Turning Response#body from returning a String to returning an IO is a breaking change already anyhow.

asterite added kind:feature status:discussion kind:refactor topic:stdlib:networking labels Nov 11, 2020

asterite mentioned this issue Nov 11, 2020

Unhandled exception: HTTP::Client::Response#body_io cannot be nil (NilAssertionError) #9900

Closed

jhass mentioned this issue Nov 20, 2020

Add form data shortcut method to HTTP::Request #9941

Open

j8r mentioned this issue Nov 29, 2020

Security issue: how to avoid payload attacks for the HTTP server #9985

Open

SamantazFox mentioned this issue Nov 28, 2021

Temporarily fix for #2612 (round 2) iv-org/invidious#2677

Merged

[RFC] Split HTTP::Client::Response in two #9901

[RFC] Split HTTP::Client::Response in two #9901

Comments

asterite commented Nov 11, 2020

Context

Proposal

Some decisions to make

jhass commented Nov 11, 2020 • edited Loading

asterite commented Nov 11, 2020

jhass commented Nov 11, 2020 • edited Loading

straight-shoota commented Nov 11, 2020

jhass commented Nov 11, 2020

straight-shoota commented Nov 11, 2020

j8r commented Nov 11, 2020

jhass commented Nov 11, 2020

bcardiff commented Nov 11, 2020

j8r commented Nov 11, 2020 • edited Loading

straight-shoota commented Nov 11, 2020

j8r commented Nov 11, 2020

straight-shoota commented Nov 11, 2020

jhass commented Nov 11, 2020 • edited Loading

asterite commented Nov 11, 2020

straight-shoota commented Nov 11, 2020

asterite commented Nov 11, 2020

straight-shoota commented Nov 11, 2020 • edited Loading

asterite commented Nov 11, 2020

jhass commented Nov 11, 2020 • edited Loading

straight-shoota commented Nov 11, 2020

asterite commented Nov 11, 2020 • edited Loading

straight-shoota commented Nov 11, 2020

asterite commented Nov 11, 2020

asterite commented Nov 14, 2020

asterite commented Nov 14, 2020

straight-shoota commented Nov 14, 2020

asterite commented Nov 14, 2020

straight-shoota commented Nov 14, 2020 • edited Loading

asterite commented Nov 14, 2020

straight-shoota commented Nov 14, 2020

asterite commented Nov 14, 2020

straight-shoota commented Nov 15, 2020

jhass commented Nov 15, 2020

straight-shoota commented Nov 15, 2020 • edited Loading

asterite commented Nov 15, 2020

jhass commented Nov 15, 2020

straight-shoota commented Nov 15, 2020 • edited Loading

jhass commented Nov 15, 2020 • edited Loading

jhass commented Nov 11, 2020 •

edited

Loading

jhass commented Nov 11, 2020 •

edited

Loading

j8r commented Nov 11, 2020 •

edited

Loading

jhass commented Nov 11, 2020 •

edited

Loading

straight-shoota commented Nov 11, 2020 •

edited

Loading

jhass commented Nov 11, 2020 •

edited

Loading

asterite commented Nov 11, 2020 •

edited

Loading

straight-shoota commented Nov 14, 2020 •

edited

Loading

straight-shoota commented Nov 15, 2020 •

edited

Loading

straight-shoota commented Nov 15, 2020 •

edited

Loading

jhass commented Nov 15, 2020 •

edited

Loading