IEInternals

A look at Internet Explorer from the inside out. @EricLaw left Microsoft in 2012, but was named an IE MVP in '13 & an IE userAgent (http://useragents.ie) in '14

Bugs in IE8's Lookahead Downloader

All bugs mentioned in this post are now fixed. 

Internet Explorer has a number of features designed to render pages more quickly. One of these features is called the "Lookahead Downloader" and it's used to quickly scan the page as it comes in, looking for the URLs of resources which will be needed later in the rendering of the page (specifically, JavaScript files). The lookahead downloader runs ahead of the main parser and is much simpler: its sole job is to hunt for those resource URLs and get requests into the network request queue as quickly as possible. These download requests are called "speculative downloads" because it is not known whether the resources will actually be needed by the time the main parser reaches the tags containing the URLs. For instance, inline JavaScript runs during the main rendering phase, and such script could (in theory) remove the tags which triggered the speculative downloads in the first place. However, this "speculative miss" corner case isn't often encountered, and even if it happens, it's basically harmless: the speculative request simply downloads a file which is never used.

IE8 Bugs and their impact
Unfortunately, since shipping IE8, we've discovered two problems in the lookahead downloader code that cause Internet Explorer to make speculative requests for incorrect URLs. Generally this has no direct impact on the visitor's experience, because when the parser actually reaches a tag that requires a subdownload, if the speculative downloader has not already requested the proper resource, the main parser will at that time request download of the proper resource. If your page encounters one of these two problems, typically:

  • The visitor will not notice any problems such as script errors
  • The visitor will have a slightly slower experience when rendering the page because the speculative requests all "miss"
  • Your IIS/Apache logs will note requests for non-existent or incorrect resources

If your server is configured to respond in some unusual way (e.g. logging the user out) upon request of a non-existent URL, the impact on your user-experience may be more severe.

The BASE Bug

Update: The BASE bug is now fixed.

The first problem is that the speculative downloader "loses" the <BASE> element after its first use. This means that if your page at URL A contains a tag sequence as follows:

<html><head><base href=B><script src=relC><script src=relD><script src=relE><body>

which requests three JavaScript files from the path specified in "B", IE8's speculative downloader will incorrectly request download of URLs "B+relC", "A+relD", and "A+relE". Correct behavior is to request download of URLs "B+relC", "B+relD", and "B+relE". Hence, in this case, two incorrect requests are sent, usually resulting in 404s from the server. Of course, when the main parser gets to these script tags, it will determine that "B+relC" is already available but that "B+relD" and "B+relE" have not yet been requested, and it will request those two correct URLs and complete rendering of the page.
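To make the A/B substitution concrete, here is a small Python sketch that models the behavior described above. It is only an illustration, not IE's code: the function name, the `lose_base_after_first_use` flag, and the example.com-style URLs are all made up for the example, and URL resolution is delegated to the standard library's `urljoin`.

```python
from urllib.parse import urljoin


def speculative_urls(page_url, base_href, script_srcs, lose_base_after_first_use):
    """Return the URLs a lookahead scanner would request for each script src.

    When lose_base_after_first_use is True, the BASE href is applied to the
    first script only and every later script is resolved against the page URL,
    which models the IE8 behavior described in this post.
    """
    requested = []
    base_available = True
    for src in script_srcs:
        effective_base = base_href if base_available else page_url
        requested.append(urljoin(effective_base, src))
        if lose_base_after_first_use:
            base_available = False  # the scanner "forgets" BASE after one use
    return requested


# Hypothetical stand-ins for "A", "B", relC, relD and relE.
page = "http://a.example/app/page.html"
base = "http://b.example/static/"
scripts = ["relC.js", "relD.js", "relE.js"]

buggy = speculative_urls(page, base, scripts, lose_base_after_first_use=True)
correct = speculative_urls(page, base, scripts, lose_base_after_first_use=False)
```

With these inputs, `buggy` holds one URL under `b.example` followed by two under `a.example/app/`, while `correct` holds three URLs under `b.example/static/`.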

At present, there is no simple workaround for this issue. Technically, the following syntax will result in proper behavior:

 <html><head><base href=B><script src=relC><base href=B><script src=relD><base href=B><script src=relE><body>

...but this is not standards-compliant and is not recommended. If the page removes its reliance upon the BASE tag, the problem will no longer occur.

Remember: The BASE bug is now fixed.

The Missing 4k Bug

Update: The 4k bug is now fixed. 

The second problem is significantly more obscure, although a number of web developers have noticed it and filed a bug on Connect. Basically, the problem here is that there are a number of tags which will cause the parser and lookahead downloader to restart scanning of the page from the beginning. One such tag is the META HTTP-EQUIV Content-Type tag which contains a CHARSET directive. Since the CHARSET specified in this tag defines what encoding is used for the page, the parser must restart to ensure that it is parsing the bytes of the page in the encoding intended by the author. Unfortunately, IE8 has a bug where the restart of the parser may cause incorrect behavior in the Lookahead downloader, depending on certain timing and network conditions.

The incorrect behavior occurs if your page contains a JavaScript URL which spans exactly the 4096th byte of the HTTP response. If such a URL is present, under certain timing conditions the lookahead downloader will attempt to download a malformed URL consisting of the part of the URL preceding the 4096th byte combined with whatever text follows the 8192nd byte, up to the next quotation mark. Web developers encountering this problem will find that their logs contain requests for bogus URLs with long strings of URLEncoded HTML at the end.
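If you want to check whether a given page is even a candidate for this bug, a rough sketch like the following can help. It assumes the byte count starts at the first byte of the response body (the text above doesn't say whether headers are counted), it only looks at quoted `src` attributes on `script` tags, and the function name and regular expression are my own, not anything from IE.

```python
import re

# Matches <script ... src="..."> and captures the URL (quoted values only).
SCRIPT_SRC = re.compile(
    rb'<script\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE
)


def script_urls_spanning(body, boundary=4096):
    """Return script URLs whose bytes include the boundary-th byte of body.

    body is the raw response body as bytes; boundary is 1-based, so the
    4096th byte lives at index 4095.
    """
    hits = []
    boundary_index = boundary - 1
    for match in SCRIPT_SRC.finditer(body):
        start, end = match.span(1)  # 0-based, end is exclusive
        if start <= boundary_index < end:
            hits.append(match.group(1).decode("latin-1"))
    return hits
```

A page whose list comes back empty has no script URL sitting on that byte; a non-empty list only means the page could hit the bug if a parser restart and the timing conditions also line up.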

As with the previous bug, end users will not typically notice this problem, but examination of the IIS logs will show the issue.

For many instances of this bug, a workaround is available-- the problem only appears to occur when the parser restarts, so by avoiding parser restarts, you can avoid the bug.  By declaring the CHARSET of the page using the HTTP Content-Type header rather than specifying it within the page, you can remove one cause of parser restarts.

So, rather than putting

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">

in your HEAD tag, send the following HTTP response header:

Content-Type: text/html; charset=utf-8

Note that specification of the charset in the HTTP header results in improved performance in all browsers, because the browser's parsers need not restart parsing from the beginning upon encountering the character set declaration. Furthermore, using the HTTP header helps mitigate certain XSS attack vectors.
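How you set that header depends on your server. As one hedged illustration, here is a minimal Python WSGI application that declares the charset in the response header and leaves the META tag out of the markup. The `app` name and the page content are placeholders; on IIS or Apache you would configure the equivalent header in the server or framework instead.

```python
def app(environ, start_response):
    """Minimal WSGI app that declares the charset in the HTTP header."""
    # No <META HTTP-EQUIV="Content-Type"> in the markup; the header carries it.
    body = (
        "<!DOCTYPE html><html><head><title>Example</title></head>"
        "<body>Hello</body></html>"
    ).encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server` can serve `app` on a port of your choice.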

Unfortunately, however, suspension of the parser (e.g. when it encounters an XML Namespace declaration) can also result in this problem, and it's not feasible for a web developer to avoid suspension of the parser.

But, remember: The 4k bug is now fixed. 

Summary
While these problems are significant, they are not so dire as some readers will conclude at first glance. The second bug, in particular, is quite rarely encountered due to its timing-related nature and the requirement that the page have a JavaScript URL spanning a particular byte in the response. Encountering the second issue is not nearly as prevalent as some web developers believe-- for instance, we've heard claims that IE6, 7, and Firefox all have this problem, which is entirely untrue. Readers can easily determine if a page is hitting either bug by examining server logs, or watching network requests with Fiddler.
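For the server-log route, a heuristic along these lines may be a useful starting point. It is a sketch under assumptions: it expects Apache-style access log lines with a quoted request and a status code (IIS W3C logs use a different layout and would need a different pattern), and it treats a 404 whose decoded path contains both "<" and ">" as a likely 4k-bug request. The function name and pattern are illustrative only.

```python
import re
from urllib.parse import unquote

# Assumed layout: ... "GET /path HTTP/1.1" 404 ...
REQUEST_LINE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')


def suspicious_requests(log_lines):
    """Return request paths that 404'd and carry URL-encoded HTML markup."""
    hits = []
    for line in log_lines:
        match = REQUEST_LINE.search(line)
        if not match:
            continue
        path, status = match.group(1), match.group(2)
        decoded = unquote(path)
        if status == "404" and "<" in decoded and ">" in decoded:
            hits.append(path)
    return hits
```

Requests caused by the BASE bug won't match this heuristic, since they are ordinary-looking paths resolved against the wrong base; for those, look for 404s on script names that exist under your BASE path.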

The IE team will continue our investigation into these bugs and, as with any reported issues, may choose to make available an IE8 update to resolve the issues.

Remember: All bugs mentioned in this post are now fixed. 

Apologies for the inconvenience, and thanks for reading!

-Eric

  • Eric-

    is there a more exhaustive list of tags that can cause the lookahead downloader to reset?  Our web-app doesn't use the BASE tag, and rarely uses the meta tag to set the charset, which we're already specifying in the HTTP headers.  I'd love to reduce the amount of spam our error tracking software generates, thanks to the 404s, and every little bit helps.

  • @Tim: Unfortunately, as @hofstee mentioned, I believe that XML Namespace declarations (commonly used in XHTML) also trigger the restart logic.

  • @EricLaw

    Are the base tags, the meta tags, and the XML declarations all that we know so far?  What I'd love to be able to do is go to my developers with a list of problematic tags and say "fix these, and the e-mails will stop."

  • @Tim: Unfortunately, I wouldn't feel confident in suggesting that XML Namespaces and META tags are the only cause of restarts, because I know very little about the overall parsing architecture.

    The BASE tag is obviously the biggest cause of incorrect requests, because the incorrect speculative requests due to BASE are not at all related to timing.

    While unfortunately I'm not able to make any statements or speculations about IE code fixes (either availability or timeframe) I can say that this is an issue that we're getting a significant amount of customer escalations about because the workarounds are unappealing.

  • @EricLaw:

    I appreciate everything you're doing for this problem.  Thanks for your help.

  • And this bug will be fixed **when**?

    Thanks.

  • Sorry Randy, as mentioned previously:

    Unfortunately I'm not able to make any statements or speculations about IE code fixes (either availability or timeframe). I can say that this is an issue that we're getting a significant amount of customer escalations about because the workarounds are unappealing.

  • Eric,

    Sorry, I didn't catch that.  Thanks for responding.

  • Eric:

    I appreciate you taking the time to investigate this issue and help the community.  I saw a post you made several months ago like so:

    "If you can send me the HTML (or a network capture: www.fiddler2.com) of the affected page, I'll have a look to see if there's any other cause for the parser restart. Email me at microsoft.com, username ericlaw"

    Here is my issue: we are definitely getting the 4k bug.  Our site is ASP.NET 3.5 and we get the invalid viewstate error on DecryptStringIV, etc.  I can tell you for a fact this ALWAYS seems to focus around ScriptResource.axd and WebResource.axd.  We have a standard master page, css, etc.

    The issue I'm having is that it's intermittent and occurs on various pages, so I can't send you a single HTML snapshot of the problem, since I also personally can't get it to fail.

    Can you provide any insight? We do have a development server with a public IP where the full application is running.  Should I send this to you via email? Can you help me? Given the facts I have presented, what is the best way for us to combat this error?

    Thanks

  • @Martin: Does your page use a META CHARSET tag that specifies a character set?  Do you specify any namespaces?

    In terms of workarounds, I can think of several unappealing ones (e.g. use a comment at the top of the page that pushes all of the relevant script tags out of the 4096th byte).

  • Eric:

    Thanks for your reply.  We did a very simplistic/clean overall design for the web site.  We basically have a base aspx page which all other pages derive from.  We have a single Master Page that is used for all pages, etc.  We have all logic encapsulated within User Controls, ascx, etc.

    Regarding your question.  In our Master Page this is defined:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    <html xmlns="http://www.w3.org/1999/xhtml" >

    <!--start content-->

    <head runat="server">

    Does this answer your question? It's fairly vanilla; we use standard CSS and jQuery as well, if that matters.

    So is there anything we can do, or do you see anything in the Master Page top declaration we could or should change that won't affect anything else?

    What would be the least unappealing workaround?  This is actually a part of a big eCommerce site and we use ELMAH and IIS logging, so we are flooded with this error for IE 8 users.

    Thanks

  • Sorry, eeyor145, but I'm pretty sure the XMLNS declaration ends up triggering a parser restart as well.

    In terms of "least unappealing workarounds"-- I'm afraid I no longer have any.  I took a look at sticking a huge buffer comment just inside the HEAD tag to push the first script URL outside of the 4096th byte, but it looks (not surprisingly) like this impacts the 8192nd byte as well.

  • Thank you Eric for clarifying the issue for us. As stated by many others, I am also requesting that a fix be made available as soon as possible. Most high volume sites log errors and we are being flooded with these, making our logging a pain to analyze. At the very least IE should include the "lookahead downloader" in the user agent string.

  • I am seeing the missing 4k bug about every two minutes on one of my sites.  Sure would like to see a fix.

    Thanks.

  • Would putting an IE conditional that re-states the base tag before each resource be considered standards-compliant or problematic?

    <!--[if IE 8]><base href="blah.com" /><![endif]-->

    <script type="text/javascript" language="javascript" src="foo/bar.js"></script>

    <!--[if IE 8]><base href="blah.com" /><![endif]-->

    <script type="text/javascript" language="javascript" src="foo/bar2.js"></script>

    <!--[if IE 8]><base href="blah.com" /><![endif]-->

    <link rel="stylesheet" href="foo/bar.css" type="text/css" />
