-
Notifications
You must be signed in to change notification settings - Fork 150
Invalid Content-Length header (and/or issue with output-buffering) #251
Comments
No comment? I'm looking at $response = $this->injectContentLength($response);
$this->emitStatusLine($response);
$this->emitHeaders($response);
$this->flush($maxBufferLevel);
$this->emitBody($response); There is a serious logical flaw in this code: you cannot inject the content-length until you know what the content-length is going to be - and if you both So you have to decide, either:
In my opinion, you should be doing (1) - though I realize this is a BC break, it's the only correct thing to do, because no one should be emitting the content part of the response via the output buffer and the other parts (headers etc) via the PSR-7 model. This is a serious issue, guys - dead, blank pages are difficult to debug, and when this happens in production, it leads to very unhappy clients and users. |
@mindplay-dk Honestly, I had no idea how to reproduce it until your comment today, making it hard to comment. I'll discuss with the review team how we should handle this. As you note, whatever approach we take will result in a BC break, but taking no action can leave things unusable. |
Just encountered this too and it took some quite debugging till found cause because all I got was white page without any errors anywhere. Basically what happened is that |
+1 on this being a bug with serious consequences. Had a ZFE app where a ZFE config file was emitting a blank line by mistake (a blank line had been added to the top of a ZFE config file, before the '<?php' marker. This was a mistake made in an Ansible template, therefore only showed up on deployed instances - not local dev environments). As a result, the 'Content-Length' header was out by 1 for every reply the ZFE app made. cURL will discard any extra bytes after 'Content-Length' bytes in the response - and the PHP extension doesn't tell you that it's done so. cURL from the command-line (by default) doesn't warn you either. End result: client apps were seeing truncated JSON from the ZFE app, but the ZFE app logs showed valid JSON sent. The serious consequences come from the time it takes to work out what's going on. Because the ZFE app looks like it's working (the logs show valid JSON sent, and the Content-Length header matches the length of the JSON that the app had in the Response object), my client has just spent a week looking everywhere else for the bug. It's the kind of problem that junior folks just aren't going to spot. One possible fix is to simply stop sending the Content-Length header. Seems appropriate, because there's no 100% reliable way for Diactoros to know what the right value for Content-Length is. Diactoros only knows about what's in the Response object. It can't know about anything else emitted outside that context. Unless I'm missing something, no value is better than sending the wrong value. |
I meant to address this with 1.5.0, but because I'd not created a pull request, missed it. I'm looking into it again now. The second option is essentially a non-starter, though technically potentially do-able. It would require that we The third already works if there is only buffered content or only PSR-7 response content. In the latter case, the My current thought is this: we know the content-length by the time we currently call The problem I've run up against, however, in trying to understand when exactly the buffered content is emitted is that when the The easiest way to detect the condition would be to loop through every buffer (not just those above the I'll try a few more experiments, and hopefully get a patch up for review in the next day or two. |
New discovery: if we throw an exception during This lends credence to what @stuartherbert suggests: that the emitter simply not auto-inject the header. Middleware can easily be registered to calculate and inject the $app->pipe(function ($request, DelegateInterface $delegate) {
if ($response->hasHeader('Content-Length')) {
return $response;
}
$length = $response->getBody()->getSize();
return null === $length
? $response
: $response->withHeader('Content-Length', $length);
}); This still poses a problem, however, as the contents will be truncated and/or inconsistent if As such, I suggest that the only possible step forward is as follows:
|
I've spent around 8 hours on this now, and I'm stumped. I can easily test for an empty response, and, if detected, simply flush the output buffer: if ($this->isEmptyResponse($response)) {
$this->flush($maxBufferLevel);
return;
}
// where:
private function isEmptyResponse(ResponseInterface $response)
{
if (200 !== $response->getStatusCode()) {
return false;
}
if ([] !== $response->getHeaders()) {
return false;
}
return 0 === $response->getBody()->getSize();
} That works fine. The bigger issue is determining what to do after that. I tried writing a method to check for an empty output buffer, and, if not empty, raise an exception, and slip-streaming that call immediately after the conditional checking for an empty response. I tried three different approaches. The first, and current, approach, looks like this: private function checkForEmptyOutputBuffer($maxBufferLevel = null)
{
$maxBufferLevel = $maxBufferLevel ?: ob_get_level();
$maxBufferLevel = $maxBufferLevel > -1 ? $maxBufferLevel : ob_get_level();
$bufferLevel = ob_get_level();
while ($maxBufferLevel <= $bufferLevel && 0 < $bufferLevel) {
$content = ob_get_contents();
if ('' !== $content) {
throw new RuntimeException(
'Unable to emit response; content emitted outside request/response lifecycle: '
. $content
);
}
ob_end_flush();
$bufferLevel = ob_get_level();
}
} The second is similar, but does a private function checkForEmptyOutputBuffer($maxBufferLevel = null)
{
$maxBufferLevel = $maxBufferLevel ?: ob_get_level();
while (ob_get_level() > $maxBufferLevel) {
$content = ob_get_contents();
if ('' !== $content) {
throw new RuntimeException(
'Unable to emit response; content emitted outside request/response lifecycle: '
. $content
);
}
ob_end_flush();
}
} The third approach is brute-force: private function checkForEmptyOutputBuffer()
{
while (ob_get_level()) {
$content = ob_get_contents();
if ('' !== $content) {
throw new RuntimeException(
'Unable to emit response; content emitted outside request/response lifecycle: '
. $content
);
}
ob_end_flush();
}
} The third approach raises problems (risky test detection) in PHPUnit:
Essentially, we also end up closing the output buffer PHPUnit initializes; PHPUnit detects that, and flags the test. It's not a failure, but the verbose output gives me pause; it feels like swatting a fly with a hammer. The second approach raises an issue in that most of the time, the current output buffer level is the SAME AS the The first approach addresses the problems of the second approach, but, believe it or not, still has the problems of the third approach in that the output buffers are not correctly closed. Because PHPUnit's error message is vague, I do not know if I did not close enough buffers, or too many; this, then, also makes me wonder if the approach is viable. Of more interest is the fact that NONE of these are compatible with the private function clearOutputBuffer($maxBufferLevel)
{
$maxBufferLevel = $maxBufferLevel ?: ob_get_level();
while (ob_get_level() > $maxBufferLevel) {
ob_end_clean();
}
} and then to call it within
What it comes down to is:
In either of the latter two cases, we likely need to use something like I'm leading towards the second option at this point, but really, really need feedback from those of you who have experienced the issue before proceeding. |
The second approach mostly works, but requires a check for content in the current output buffer to be robust, as PHP may not yet have flushed that content. You can see a patch here: The primary issue with this approach is that if the developer has called I think that this approach, however, is the simplest, most robust solution, and we can document the remaining issues to aid developers. Thoughts? |
It's really hard to say here what would be best solution but it sounds fine. In our codebase we worked around this issue by simply discarding all output buffers since we assume that they shouldn't contain anything and everything should be written inside Response. while (ob_get_level() > 0)
{
ob_end_clean();
}
$emitter->emit($response); |
@davispuh That was one approach I tried with the
Both of these are really hard to discover and/or debug, which is what led to the solution I've proposed. It means the library does less to protect the developer from their self, but also ensures that problems like these are exposed clearly and early. It also means that you can write your own emitter or |
A few thoughts:
Either way, I wouldn't be going to the lengths you are to make 'Content-Length' accurate 90-odd% of the time. That kind of effort and complexity has a nasty habit of breaking somewhere down the road - today's clever code is tomorrow's bug and all that. Maybe I'm just scarred by bitter experience with what people do on top of ZF :) |
@stuartherbert Would you be able to do some research for me, please, and check to see what each of Apache/mod_php, Apache/fastcgi + phpfpm, nginx + phpfpm, and the built-in PHP webserver do? My guess is one or more of these is not injecting the However, if any of these are NOT injecting the header, we need some way of injecting it, whether that's via middleware or an emitter. The problems I'm worried about are:
In each of those two cases, we need an approach, as they lead to the same issues we're seeing currently - we don't get the entire content back to the user. This could be as simple as removing the header if it was set and we discover content in the current buffer, or it could be doing something like raising an exception. I just need some guidance on what approach to take — as in, what would be most valuable or helpful to users. 😄 |
I'll get it done this weekend, and post the results.
Did the original contributor provide any details on what problem their contribution fixed? Can we reach out to them and ask for their experience?
As mentioned earlier, I wouldn't go looking in the output buffers. I'd flush them, try to emit the headers, and let PHP's existing SAPI code detect that we're writing headers after having sent content. That'll (correctly) show up as an application bug that people can see and do something about.
I haven't come up with a credible reason yet why a PSR7 app should accommodate content being emitted from elsewhere in the app code.
… On 24 Aug 2017, at 22:52, weierophinney ***@***.***> wrote:
@stuartherbert Would you be able to do some research for me, please, and check to see what each of Apache/mod_php, Apache/fastcgi + phpfpm, nginx + phpfpm, and the built-in PHP webserver do? My guess is one or more of these is not injecting the Content-Length, as this feature was provided by a contributor, which I assume was for a reason... If these are all injecting the header for us (with the exception of the built-in server, which I don't really care about), then I have zero issue dropping the functionality entirely.
However, if any of these are NOT injecting the header, we need some way of injecting it, whether that's via middleware or an emitter. The problems I'm worried about are:
if we inject it in the emitter, what should be the behavior when there's content in the output buffer?
if the header is present already in the response, what should we do if we discover content in the output buffer?
In each of those two cases, we need an approach, as they lead to the same issues we're seeing currently - we don't get the entire content back to the user. This could be as simple as removing the header if it was set and we discover content in the current buffer, or it could be doing something like raising an exception.
I just need some guidance on what approach to take — as in, what would be most valuable or helpful to users. 😄
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Pinging @mnapoli — you contributed #88, which adds the |
That would be semantically similar to what I'm attempting now, and my previous attempts as well. The main issue I run into with that is testing, as it causes quite a number of warnings from PHPUnit (not test failures, mind you, but warnings of risky tests and/or tested code, due to flushing the output buffer PHPUnit uses). We may have to find another way to test these classes if we go that route. |
Pragmatically, does it need a unit test? It's something that clearly falls outside what unit testing was invented to cover.
In systems I've built in the past, I've wrapped this kind of code in @codeCoverageIgnore tags, and tested it at a different layer of testing. (Always a huge advocate of a layered approach :) ). I appreciate that different folks have different approaches :)
… On 24 Aug 2017, at 23:19, weierophinney ***@***.***> wrote:
@stuartherbert —
As mentioned earlier, I wouldn't go looking in the output buffers. I'd flush them, try to emit the headers, and let PHP's existing SAPI code detect that we're writing headers after having sent content. That'll (correctly) show up as an application bug that people can see and do something about.
That would be semantically similar to what I'm attempting now, and my previous attempts as well. The main issue I run into with that is testing, as it causes quite a number of warnings from PHPUnit (not test failures, mind you, but warnings of risky tests and/or tested code, due to flushing the output buffer PHPUnit uses). We may have to find another way to test these classes if we go that route.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Hi, I don't recall exactly why I looked into that but I think it may have been because some downloads did not work with Diactoros (because of the missing header). Whatever the reason I had a look at Slim which did add the header at the time (2015): https://github.com/slimphp/Slim/blob/ca438e786d24d9d96bb3dcb993c33ab8be8af5e9/Slim/App.php#L473 I then opened #84 to suggest adding the header in Diactoros as well. Slim seems to have changed on this point as well:
So I would say "go for it" and remove that behavior? But there may be some problems with some kind of HTTP responses that need a Content-Length, I don't know HTTP well enough to know which one though… |
Some test results ...
Test environment:
What do these results mean? I think they're really good news.
There will be problems around non-compliant / incomplete HTTP client implementations - ones that do not correctly / fully support HTTP/1.1 streaming (chucked transport encoding). They are going to need a 'Content-Length' header from somewhere. My advice is:
I've changed my mind on |
I'm mostly with @stuartherbert on this subject, the last remark of his last post: stay away from output buffering entirely. It's global state. You shouldn't be mixing, at all - from my point of view, it defeats the entire purpose of PSR-7 and Diactoros, which was to avoid the aches and pains caused by using anything that manipulates or depends upon global state. We're struggling to take control of something that is almost by definition out of control. From my point of view, you can simplify the whole thing and boil it down to: if (headers_sent() || (ob_get_level() > 0 && ob_get_length() > 0)) {
throw new GlobalStateException();
} In other words, don't allow global state in the mix. In my opinion, you either choose PSR-7 and work with it, or you don't. As demonstrates by all the edge cases and marginal issues with every conceivable work-around discussed above, there isn't really any peaceful way to reconcile global state mixed with something designed to avoid global state. I personally wouldn't struggle to provide a solution for those who are unwilling to move away from global state. Note that this doesn't block anyone from integrating with legacy scripts that do use output buffering - they'll simply need to capture the output and take control of it, by injecting it into a Response object; or in other words, integrate it properly with PSR-7 and Diactoros which they've elected to use. I don't think Diactoros needs to enable code that relies on global state - I think it's actually better to help developers learn how to avoid it. |
Thanks for the research and feedback, @stuartherbert, and thanks to @mindplay-dk for the follow-up. I'm currently recovering from pneumonia, but plan to tackle this soon when I've got a bit more energy. I'll ping each of you with the pull request I create. |
I've created TODO:
I have decided against prompting to add the |
Per zendframework#251, this patch removes Content-Length injection from SAPI stream emitters. It also renames the `checkForPreviousOutput()` method to `assertNoPreviousOutput()` to make it more clear that the method is a no-op under proper usage. The `assertNoPreviousOutput()` method also borrows an idea from the comment to the issue, indicating we can check for (1) a non-zero output buffer, and (2) use `ob_get_length()` to determine if it has content.
Per zendframework#251, this patch removes Content-Length injection from SAPI stream emitters. It also renames the `checkForPreviousOutput()` method to `assertNoPreviousOutput()` to make it more clear that the method is a no-op under proper usage. The `assertNoPreviousOutput()` method also borrows an idea from the comment to the issue, indicating we can check for (1) a non-zero output buffer, and (2) use `ob_get_length()` to determine if it has content.
Per zendframework#251, this patch removes Content-Length injection from SAPI stream emitters. It also renames the `checkForPreviousOutput()` method to `assertNoPreviousOutput()` to make it more clear that the method is a no-op under proper usage. The `assertNoPreviousOutput()` method also borrows an idea from the comment to the issue, indicating we can check for (1) a non-zero output buffer, and (2) use `ob_get_length()` to determine if it has content.
Whew, nice, thanks for fixing this! I think we (Flarum) have been bitten by this before as well. :) |
The
SapiEmitter
generates aContent-Length
header based on the length of the body stream length.But it also flushes any existing output buffer, and doesn't take measures to prevent both of those those things happening during the same request.
This can leads to an invalid HTTP response body getting emitted, which e.g. Chrome will refuse to even display.
For example, if you
var_dump('something')
and then emit, you end up with a broken response.I suspect this is what leads to bug-reports such as #216.
A check should probably be introduced to prevent this, as it's likely a common mistake - rather than creating an invalid response, we should probably throw an exception?
The text was updated successfully, but these errors were encountered: