Earlier in IE9, we tried to change the WinINET networking component to reject as incomplete any HTTP responses for which the Content-Length header specified more bytes than the server actually sent back. It turns out that some sites and applications expect to be able to specify an incorrect Content-Length without the client raising a fuss.
This was a pretty surprising finding, and we spent a lot of time looking at it. It turns out that all common browsers tolerate this violation of the HTTP protocol.
I built a script for Meddler which tests the browser’s behavior in a variety of Incorrect Content-Length circumstances; the results of running this script are shown below:
In this test, the browser navigates a frame to a page which sends an incorrect content length. The scenario where the server sends fewer bytes than promised is the “Underrun” case, and the server closes the connection either gracefully (“FIN”) or abruptly (“RST”). The scenario where the server sends more bytes than promised is the “Overrun” case.
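To make the four cases concrete, here is a rough Python sketch of a test server that produces them. It is not the Meddler script used for the results in this post; the names `build_response` and `serve_once` are hypothetical, and the `SO_LINGER` packing assumes a POSIX-style `struct linger` (two ints).

```python
import socket
import struct


def build_response(body: bytes, declared_length: int) -> bytes:
    """Return a raw HTTP/1.1 response whose Content-Length may not match the body."""
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {declared_length}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


def serve_once(listener: socket.socket, payload: bytes, abrupt: bool) -> None:
    """Answer one connection with payload, then close with FIN or RST."""
    conn, _ = listener.accept()
    try:
        conn.recv(65536)  # read (and ignore) the request
        conn.sendall(payload)
        if abrupt:
            # Assumption: POSIX layout for struct linger. Linger enabled with a
            # zero timeout makes close() reset the connection (RST) rather
            # than close it gracefully (FIN); queued data may be discarded.
            conn.setsockopt(
                socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
            )
    finally:
        conn.close()


BODY = b"<html><body>0123456789</body></html>"
UNDERRUN = build_response(BODY, len(BODY) + 100)  # promises more than it sends
OVERRUN = build_response(BODY, len(BODY) - 10)    # sends more than it promises
```

Binding a listening socket to a local port and calling `serve_once(listener, UNDERRUN, abrupt=True)` would approximate the "Underrun, RST" case; `abrupt=False` gives the FIN variant.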
ShowContent to C-L
IE8 / IE9 RC
ShowContent reloads it once
Safari 5.0 (7533.18)
Redownloads content 3 times, shows the concatenated output. Bizarre.
As you can see, all browsers simply ignore the additional content in the “Overrun” case and render the page as if nothing is amiss. Underruns are handled less consistently: in the RST cases, behavior varies from browser to browser (and Safari 5’s behavior is utterly bizarre). However, all browsers except for IE9 Beta accommodate the case where the server returns less content than promised, so long as the connection is gracefully closed with FIN.
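The lenient navigation behavior can be summarized in a few lines of Python. This is only an illustration of the outcome described above, not code from any browser; `LengthCheck`, `check_length`, and `render_leniently` are hypothetical names.

```python
from enum import Enum


class LengthCheck(Enum):
    EXACT = "exact"
    UNDERRUN = "underrun"  # fewer bytes arrived than Content-Length promised
    OVERRUN = "overrun"    # more bytes arrived than Content-Length promised


def check_length(declared: int, received: int) -> LengthCheck:
    """Compare the declared Content-Length with the number of body bytes received."""
    if received < declared:
        return LengthCheck.UNDERRUN
    if received > declared:
        return LengthCheck.OVERRUN
    return LengthCheck.EXACT


def render_leniently(declared: int, body: bytes) -> bytes:
    """Ignore surplus bytes on an overrun; show whatever arrived on an underrun."""
    return body[:declared]
```

A strict client would instead refuse to render when `check_length` reports `UNDERRUN`, which is roughly what the IE9 Beta change attempted.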
We ended up reverting IE9 RC to IE8’s behavior here for compatibility reasons. We found at least one major ecommerce site and one major streaming product which always triggered an underrun with a graceful close, and we elected to keep compatibility by accepting the invalid content.
In the File Download scenario, we were particularly expecting browsers to be strict about Content-Length, as an underrun seems like a good clue that the transfer is incomplete.
Treat as complete
Truncate at C-L
Offer Retry (“Stopped”)
Offer Retry (“Error”)
Shows empty DLM window (??)
Here too, we found that all browsers agree on the behavior of an overrun: the downloaded file is simply truncated at the length specified by Content-Length. In the case of an Underrun ended by RST, only Chrome treats the file as correctly completed. Safari simply shows a blank download manager window, and all other browsers display an error.
In the gracefully-closed Underrun case, we found that only Opera and IE9 Beta treated the file as incomplete. During testing of the beta, we encountered some internal line-of-business applications that send an invalid Content-Length header, and again, concerned about the compatibility impact of the change, we reverted to IE8 behavior for the IE9 Release Candidate.
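The two download policies can be contrasted with a small sketch. It is a simplification under stated assumptions: `finish_download` and its `strict` flag are hypothetical, `strict=True` stands in for the Opera / IE9 Beta behavior, `strict=False` for the IE8 / IE9 RC behavior, and it does not model Chrome's acceptance of an RST-terminated underrun.

```python
from typing import Tuple


def finish_download(
    declared: int, received: bytes, graceful_close: bool, strict: bool
) -> Tuple[bytes, str]:
    """Decide what to save and how to report it once the connection has ended."""
    if len(received) >= declared:
        # Overrun (or exact match): keep only the promised number of bytes.
        return received[:declared], "complete"
    if strict or not graceful_close:
        # Underrun treated as a failed transfer; a client could offer a retry.
        return received, "incomplete"
    # Lenient handling: an underrun ended by FIN is accepted as-is.
    return received, "complete"
```

For example, `finish_download(10, b"abc", graceful_close=True, strict=False)` reports the short file as complete, while the same call with `strict=True` reports it as incomplete.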
Ultimately, this was a surprising and disappointing exercise; Content-Length is one of the most fundamental aspects of HTTP, and it’s one of the most important things to get right in order to ensure reliable operation of the protocol across myriad server, proxy, and client products. Alas, the “real world web” is polluted by buggy behavior and accommodating implementations, and hence it will likely be some time before products can increase their strictness without significant compatibility risk.
Fiddler will show an HTTP Protocol Violation warning if it encounters content with an invalid Content-Length (or missing the proper message length indications altogether). If you encounter such a warning on a site or service that you maintain, please help clean up the web by fixing the issue.
Update: IE10 updated the behavior here somewhat; see my post Content-Length Validation in IE10.
I was wondering if there is any good reason a server would return an incorrect Content-Length?
Another interesting thing to check is what happens if there are multiple C-L header fields. See trac.tools.ietf.org/.../95.
Also, it would be good if handling of truncated downloads could be fixed sometime in the future; the difference to viewing content is that the user may not be even *aware* of the problem until much later.
There are an infinite number of things to check; I simply blog about the things we're working on. :-) I believe IE uses the first C-L found (which is what we do with headers in general). I don't know what the other guys do. I'm not sure what you had in mind for "fixed" since my point was that our attempt to "fix" this in IE9 Beta died at the hands of the cruel mistress "Compatibility."
HTTPbis p1 currently says:
Request messages that are prematurely terminated, possibly due to a cancelled connection or a server-imposed time-out exception, MUST result in closure of the connection; sending an HTTP/1.1 error response prior to closing the connection is OPTIONAL. Response messages that are prematurely terminated, usually by closure of the connection prior to receiving the expected number of octets or by failure to decode a transfer-encoded message-body, MUST be recorded as incomplete. A user agent MUST NOT render an incomplete response message-body as if it were complete (i.e., some indication must be given to the user that an error occurred). Cache requirements for incomplete responses are defined in Section 2.1.1 of [Part6].
... so it looks like those ShowContents are a problem. Were there any indications of error?
The overrun behaviours aren't talked about in bis yet, as they'd be considered separate responses. Not sure what could be said.
One additional thing to check is whether and how browsers cache something that has an incorrect Content-Length; bis p6 currently says:
A cache that receives an incomplete response (for example, with fewer bytes of data than specified in a Content-Length header field) can store the response, but MUST treat it as a partial response [Part5]. Partial responses can be combined as described in Section 4 of [Part5]; the result might be a full response or might still be partial. A cache MUST NOT return a partial response to a client without explicitly marking it as such using the 206 (Partial Content) status code. A cache that does not support the Range and Content-Range header fields MUST NOT store incomplete or partial responses.
bill ii: a server may not know the content length, e.g. if it is creating zip files on the fly.
@mARK: That's no excuse to return an incorrect Content-Length. If the server doesn't know the length, it MUST omit the Content-Length header and send the Transfer-Encoding: Chunked header instead, following the rules for chunked encoding.
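As a rough illustration of the chunked alternative, a server that generates its body on the fly can frame each piece as it becomes available and never needs to know the total length. The sketch below is not taken from any particular server; the generator name `chunked` is hypothetical, and it assumes the response headers already carry `Transfer-Encoding: chunked` and no Content-Length.

```python
from typing import Iterable, Iterator


def chunked(parts: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each piece of the body as an HTTP/1.1 chunk."""
    for part in parts:
        if part:  # skip empty pieces: a zero-length chunk marks the end of the body
            # Each chunk is its size in hex, CRLF, the data, CRLF.
            yield b"%x\r\n" % len(part) + part + b"\r\n"
    # The last chunk has size zero and is followed by an empty trailer section.
    yield b"0\r\n\r\n"
```

Writing each yielded value to the socket as it is produced lets the client detect a truncated transfer, because the terminating zero-length chunk never arrives.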
Too bad this bad behaviour is ingrained like this and sites can't even opt-in to stricter handling in this case. :-(
BTW, Firefox bug tracker entry concerning multiple Content-Length statements: bugzilla.mozilla.org/show_bug.cgi