Background
Last year, I wrote about two bugs in IE8’s Lookahead Downloader that would cause IE8 to make spurious download requests for non-existent URLs. These spurious download requests generally went unnoticed by users, because the main parser would eventually retrieve the correct resource when it was needed. However, for a small number of sites (where requesting non-existent URLs has side effects), significant user-experience problems occurred when spurious requests were issued. For instance, on some sites, ASP.NET ViewState Corruption exceptions would result in the sites defensively logging the user out.
In October, we fixed one of the two bugs, correcting the URLs requested by the Lookahead Downloader when the markup contained a BASE tag.
After that fix, one more type of bug remained: a timing-related problem whereby the Lookahead Downloader would sometimes request a malformed URL consisting of the part of a URL preceding the 4096th byte of the markup, combined with whatever text follows the 8192nd byte, up to the next quotation mark (sometimes called "the 4kb bug"). Our investigation determined that there were two scenarios that could lead to the 4kb bug:
Scenario #1: a restart of the parser, which can occur when the CHARSET is declared in the markup rather than in the HTTP Content-Type response header.
Scenario #2: a suspension of the parser.
Web developers could easily avoid Scenario #1 by specifying the CHARSET in the HTTP Content-Type response header, but, critically, Scenario #2 had no easy, comprehensive workaround.
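The shape of the malformed URL can be illustrated with a toy model. This is not how IE8's buffer handling worked internally (the real bug was a timing problem); the sketch assumes fixed offsets of 4096 and 8192, treats characters as bytes, and only looks at double-quoted src or href attributes. The function name model_spurious_url is hypothetical.

```python
import re
from typing import Optional

# Offsets as described in the post; actual values varied (typically multiples of 4KB).
CUT = 4096
RESUME = 8192

# A double-quoted src/href value that is still open at the very end of the string.
_OPEN_ATTR = re.compile(r'(?:src|href)\s*=\s*"([^"]*)\Z', re.IGNORECASE)


def model_spurious_url(markup: str, cut: int = CUT, resume: int = RESUME) -> Optional[str]:
    """Return the malformed URL this toy model predicts, or None.

    Models only the shape of the bug: the part of a URL preceding `cut`,
    joined with whatever text follows `resume`, up to the next quotation mark.
    """
    match = _OPEN_ATTR.search(markup[:cut])
    if match is None:
        return None  # no URL straddles the cut point
    end = markup.find('"', resume)
    if end == -1:
        return None  # no quotation mark after the resume point
    return match.group(1) + markup[resume:end]


# Build markup where a SCRIPT SRC straddles offset 4096 (hypothetical content).
tag_start = '<script src="/js/ap'
middle = 'p.js"></script>'
markup = ('a' * (CUT - len(tag_start)) + tag_start + middle
          + 'b' * (RESUME - CUT - len(middle)) + 'tail.css">')
print(model_spurious_url(markup))  # /js/aptail.css
```

A model like this may help when comparing odd 404 entries in a server log against the page that was being served at the time.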
Yesterday’s Fix
Yesterday’s IE8 Cumulative Update (KB980182) resolves the timing problems such that IE8’s Lookahead Downloader will no longer issue spurious requests. The Update resolves the problem in Scenario #2 outright: parser suspensions will no longer lead to problematic behavior. In Scenario #1, however, the Update avoids the bug by disabling the Lookahead Downloader when a restart is encountered. Hence, we continue to strongly recommend that web developers specify the CHARSET in the HTTP Content-Type response header, as this ensures that the performance benefit of the Lookahead Downloader is realized. Even if a future version of IE addresses Scenario #1 more elegantly, specifying the CHARSET in the HTTP header has other performance and security benefits for pages targeting any browser.
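As a minimal sketch of that recommendation, here is a Python WSGI app that declares the CHARSET in the HTTP Content-Type response header instead of relying on a META tag. The app function and the page body are hypothetical; any server framework can do the same, and the important part is the header value.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]


def app(environ: dict, start_response: Callable[[str, Headers], object]) -> Iterable[bytes]:
    """Minimal WSGI app that declares the CHARSET in the Content-Type header."""
    body = "<!DOCTYPE html><html><head><title>Hello</title></head><body>Hi</body></html>"
    encoded = body.encode("utf-8")
    start_response("200 OK", [
        # Headers arrive before the body, so the browser knows the encoding
        # before it starts parsing the markup.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(encoded))),
    ])
    return [encoded]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve it.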
I’ve built a Meddler Script which demonstrates the Restart-related timing issue, but keep in mind that it shouldn’t do anything interesting in IE8 after the 3/30/2010 IE Cumulative Update is applied.
Note: regarding the 4096th and 8192nd byte offsets, actual values varied, but were typically a multiple of 4KB.
Note: technically, using a Unicode BOM at the top of the document would also prevent the restart, but it doesn't confer the same security benefit.
Thanks for tracking and fixing these issues, and for giving insight into how browsers work. Is there any kind of "spec" around speculative downloading? Do you know whether the other major browsers do this, and if so, how much their implementations differ? Is there benefit in getting more common behavior in this area? It would be beneficial if there were more sharing here; we'd end up with a better algorithm and well-understood behavior.
@Steve: I haven't seen any documentation of Lookahead Downloader implementations, but I'm pretty sure that every major browser does it. I'm hoping to write a comprehensive explanation of the behavior at some point in the future, because trying to "reverse engineer" how it works by looking at the wire traffic can be very misleading.
IE8's "Lookahead Downloader" begins scanning ahead of the preparser (a confusing name) when the end of a script block has been found, on the theory that the blocking behavior of script (especially script from an external SRC) leads to wasted time.
The Lookahead Downloader looks *only* for SCRIPT SRCs to download, in order to ensure that the next time we hit a SCRIPT block, we're less likely to have to wait on the network.
What makes things a bit confusing is that the normal preparser also performs speculative downloads, for a wider variety of content types (e.g. images, CSS, script). The behavior of the preparser's speculative downloader is one of the things that your "browser download tester" tests.
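To make the distinction concrete, here is a toy scanner built on Python's html.parser (SpeculativeScanner and speculative_urls are hypothetical names). With scripts_only=True it collects only SCRIPT SRCs, as described for the Lookahead Downloader; with scripts_only=False it also collects images and stylesheets, roughly like the preparser. It deliberately ignores timing, such as when scanning starts, and is not IE's implementation.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class SpeculativeScanner(HTMLParser):
    """Toy model of two speculative download strategies.

    scripts_only=True:  SCRIPT SRCs only (like the Lookahead Downloader).
    scripts_only=False: scripts, images and stylesheets (like the preparser).
    """

    def __init__(self, scripts_only: bool) -> None:
        super().__init__()
        self.scripts_only = scripts_only
        self.urls: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        a = dict(attrs)  # html.parser lowercases tag and attribute names
        if tag == "script" and a.get("src"):
            self.urls.append(a["src"])
        elif self.scripts_only:
            return  # Lookahead mode ignores everything except SCRIPT SRCs
        elif tag == "img" and a.get("src"):
            self.urls.append(a["src"])
        elif tag == "link" and (a.get("rel") or "").lower() == "stylesheet" and a.get("href"):
            self.urls.append(a["href"])


def speculative_urls(markup: str, scripts_only: bool) -> List[str]:
    """Return the URLs the toy scanner would fetch speculatively, in document order."""
    scanner = SpeculativeScanner(scripts_only)
    scanner.feed(markup)
    scanner.close()
    return scanner.urls
```

For markup containing a stylesheet, an external script and an image, the scripts-only mode returns just the script URL, while the broader mode returns all three.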
Eric - KB980182 appears to be for IE6 SP1? (http://www.microsoft.com/downloads/details.aspx?FamilyID=daf199c4-da56-4a7f-80e6-3936ce5c267b&displaylang=en)
Does this apply to IE8 as well?
@Kenza: The same KB number is used for the entire cumulative update, regardless of which IE version / OS it applies to. See http://www.microsoft.com/downloads/en/results.aspx?pocId=&freetext=KB980182&DisplayLang=en, or better yet, just let Windows Update install it for you.
This problem seems to be happening again. We're seeing spurious GETs of objects with spurious HTML content appended to the URL. Pretty much what was described in:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)
Can you confirm whether this bug re-appeared around a month ago?
FWIW, our application already provides this header:
@Yves: No, there are no known regressions here. How many hits are you seeing? It's possible that the user in question simply hasn't installed current updates from Windows Update. Do you have a URL I might have a look at?
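For the "how many hits?" question, one rough approach is to count request targets in a server log that contain markup-like characters after URL-decoding. This heuristic is an assumption, not a documented signature of the bug; the character list and the function names are hypothetical, and extracting request targets from a particular log format is left out.

```python
from typing import Iterable
from urllib.parse import unquote, urlsplit

# Hypothetical heuristic: characters that rarely appear in legitimate request
# paths but do appear when a chunk of HTML markup has been glued onto a URL.
MARKUP_HINTS = ("<", ">", '"')


def looks_like_markup_fragment(request_target: str) -> bool:
    """Return True if the decoded path or query contains markup-like text."""
    parts = urlsplit(request_target)
    decoded = unquote(parts.path) + "?" + unquote(parts.query)
    return any(hint in decoded for hint in MARKUP_HINTS)


def count_suspicious(request_targets: Iterable[str]) -> int:
    """Count request targets that the heuristic flags."""
    return sum(1 for target in request_targets if looks_like_markup_fragment(target))
```

A count like this is only a starting point; the flagged URLs still need to be compared with the pages that referenced them.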
I concur that this bug seems to have re-appeared. We're getting copious amounts of these errors each day.
To reiterate, there are no known regressions here, and without any additional information, I expect your errors are related to something else entirely.
It looks like IE8 can't get the BASE if the href has a relative path (".."):
<a href="../zzz.action?zzzId=198&action=View" class="nodec"><img src="../images/view.gif" border="0" alt="View">
Result: http://localhost/zzz.action and http://localhost/images/view.gif
which means the action returns a 404 and the GIF isn't displayed.
@Dex: That is incorrect. Here's a test case: www.debugtheweb.com/.../base
@Eric: thanks for the reply.
The page was generated by Struts 1 and works in IE6 (with the correct URL, verified by hovering over the link).
But somehow IE8 loses the BASE or can't get it. To make it work in IE8, I have to get rid of the relative path.
@Dex: What's the URL? Keep in mind that the standard (and IE8) requires the BASE to be specified in the HEAD. IE6 had no such requirement.
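Relative URLs like "../images/view.gif" are resolved against the BASE when one applies, and otherwise against the document's own URL. The sketch below uses Python's urllib.parse.urljoin, which follows RFC 3986 reference resolution; the document and BASE URLs are hypothetical, and passing base_href=None models a page whose BASE is missing or not in the HEAD.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve(relative: str, document_url: str, base_href: Optional[str] = None) -> str:
    """Resolve `relative` against the BASE if one applies, else the document URL.

    Pass base_href=None to model a page with no BASE, or with a BASE outside
    the HEAD (the standard and IE8 require it to be in the HEAD).
    """
    return urljoin(base_href or document_url, relative)


# Hypothetical URLs for illustration only.
DOC = "http://localhost/app/list.do"
BASE = "http://localhost/app/pages/"

print(resolve("../images/view.gif", DOC, BASE))  # http://localhost/app/images/view.gif
print(resolve("../images/view.gif", DOC))        # http://localhost/images/view.gif
```

With these hypothetical URLs, the no-BASE case yields the same paths Dex reported (/zzz.action and /images/view.gif), which is consistent with the BASE not being applied.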
Hi @Eric, I know it's been a while, but I'm seeing this exact behavior in IE9: the browser makes a request to a non-existent URL that combines text from two points in the markup about 4KB apart. This is a rough illustration of the markup:
<link rel="stylesheet" href="foo-bar.css">
... 4KB worth of text including -bar.css"> from above...
The requested URL then looks something like this, which triggers a 404 response:
I know it's tough to say, but have you heard of any regressions lately?
@Eddie: I haven't heard any complaints. Do you have access to the client? If so, the version number from Help > About and a Netmon or Wireshark capture would be super useful. If not, the URL might help too, if it's public.