IEInternals

A look at Internet Explorer from the inside out. @EricLaw left Microsoft in 2012, but was named an IE MVP in '13 & an IE userAgent (http://useragents.ie) in '14

Bugs in IE8's Lookahead Downloader

All bugs mentioned in this post are now fixed. 

Internet Explorer has a number of features designed to render pages more quickly. One of these features is the "Lookahead Downloader", which quickly scans the page as it comes in, looking for the URLs of resources that will be needed later in the rendering of the page (specifically, JavaScript files). The lookahead downloader runs ahead of the main parser and is much simpler; its sole job is to hunt for those resource URLs and get requests into the network request queue as quickly as possible.

These download requests are called "speculative downloads" because it is not known whether the resources will actually be needed by the time the main parser reaches the tags containing the URLs. For instance, inline JavaScript runs during the main rendering phase, and such script could (in theory) remove the tags that triggered the speculative downloads in the first place. However, this "speculative miss" corner case isn't often encountered, and even when it happens it's basically harmless: the speculative request simply downloads a file that is never used.

IE8 Bugs and their impact
Unfortunately, since shipping IE8, we've discovered two problems in the lookahead downloader code that cause Internet Explorer to make speculative requests for incorrect URLs. Generally this has no direct impact on the visitor's experience, because when the parser actually reaches a tag that requires a subdownload, if the speculative downloader has not already requested the proper resource, the main parser will at that time request download of the proper resource. If your page encounters one of these two problems, typically:

  • The visitor will not notice any problems such as script errors
  • The visitor will have a slightly slower experience when rendering the page because the speculative requests all "miss"
  • Your IIS/Apache logs will note requests for non-existent or incorrect resources

If your server is configured to respond in some unusual way (e.g. logging the user out) upon request of a non-existent URL, the impact on your user-experience may be more severe.

The BASE Bug

Update: The BASE bug is now fixed.

The first problem is that the speculative downloader "loses" the <BASE> element after its first use. This means that if your page at URL A contains a tag sequence as follows:

<html><head><base href=B><script src=relC><script src=relD><script src=relE><body>

which requests 3 JavaScript files from the path specified in "B", IE8's speculative downloader will incorrectly request download of URLs "B+relC", "A+relD", and "A+relE". Correct behavior is to request download of URLs "B+relC", "B+relD", and "B+relE". Hence, in this case, two incorrect requests are sent, usually resulting in 404s from the server. Of course, when the main parser gets to these script tags, it will determine that "B+relC" is already available but that "B+relD" and "B+relE" have not yet been requested, and it will request those two correct URLs and complete rendering of the page.
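To make the difference concrete, here is a minimal Python sketch that models the two behaviors described above. It is only an illustration, not IE's actual code: the function name, the flag, and the example.com URLs are all made up for this example, and the "lose the BASE after its first use" rule is simply transcribed from the description.

```python
from urllib.parse import urljoin


def resolve_script_urls(page_url, base_href, script_srcs,
                        lose_base_after_first_use=False):
    """Return the absolute URL requested for each script src, in order.

    With lose_base_after_first_use=True this models the bug as described:
    only the first script is resolved against the BASE href, and later
    scripts fall back to the URL of the page itself. (Hypothetical model.)
    """
    base = urljoin(page_url, base_href)  # a BASE href may itself be relative
    urls = []
    for index, src in enumerate(script_srcs):
        if lose_base_after_first_use and index > 0:
            urls.append(urljoin(page_url, src))  # BASE "lost": resolve against A
        else:
            urls.append(urljoin(base, src))      # correct: resolve against B
    return urls


# Hypothetical URLs standing in for "A", "B", and relC/relD/relE.
page = "http://example.com/app/page.html"
base = "http://example.com/static/"
scripts = ["c.js", "d.js", "e.js"]

print(resolve_script_urls(page, base, scripts))
print(resolve_script_urls(page, base, scripts, lose_base_after_first_use=True))
```

In the second call, only c.js is resolved under /static/; d.js and e.js are resolved under /app/, which is where the stray 404s in the server log would come from.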

At present, there is no simple workaround for this issue. Technically, the following syntax will result in proper behavior:

 <html><head><base href=B><script src=relC><base href=B><script src=relD><base href=B><script src=relE><body>

...but this is not standards-compliant and is not recommended. If the page removes its reliance upon the BASE tag, the problem will no longer occur.

Remember: The BASE bug is now fixed.

The Missing 4k Bug

Update: The 4k bug is now fixed. 

The second problem is significantly more obscure, although a number of web developers have noticed it and filed a bug on Connect. Basically, the problem here is that there are a number of tags which will cause the parser and lookahead downloader to restart scanning of the page from the beginning. One such tag is the META HTTP-EQUIV Content-Type tag which contains a CHARSET directive. Since the CHARSET specified in this tag defines what encoding is used for the page, the parser must restart to ensure that it is parsing the bytes of the page in the encoding intended by the author. Unfortunately, IE8 has a bug where the restart of the parser may cause incorrect behavior in the Lookahead downloader, depending on certain timing and network conditions.

The incorrect behavior occurs if your page contains a JavaScript URL which spans exactly the 4096th byte of the HTTP response. If such a URL is present, under certain timing conditions the lookahead downloader will attempt to download a malformed URL consisting of the part of the URL preceding the 4096th byte combined with whatever text follows the 8192nd byte, up to the next quotation mark. Web developers encountering this problem will find that their logs contain requests for bogus URLs with long strings of URLEncoded HTML at the end.
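The shape of the bogus URL can be sketched in a few lines of Python. This is a rough model of the description above, not a reproduction of IE's behavior: the function name is invented, the offsets are treated as 0-based positions in the response body (the exact off-by-one and whether headers count are assumptions), and the timing conditions are ignored entirely.

```python
from urllib.parse import quote

# Assumed split point; the post says "the 4096th byte", and the exact
# off-by-one is not specified, so treat this as an approximation.
BOUNDARY = 4096


def malformed_lookahead_url(body: bytes, url_start: int) -> str:
    """Model the bogus request: the part of the script URL before the
    4 KB boundary, followed by whatever text comes after the 8 KB mark,
    up to the next quotation mark. The result is URL-encoded, as it
    would appear in a server log."""
    head = body[url_start:BOUNDARY]
    tail_start = 2 * BOUNDARY
    quote_pos = body.find(b'"', tail_start)
    tail = body[tail_start:] if quote_pos == -1 else body[tail_start:quote_pos]
    return quote(head + tail)
```

Given a page whose script URL starts a few bytes before offset 4096, the result is the truncated URL followed by percent-encoded markup taken from later in the page, which matches the kind of log entry described above.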

As with the previous bug, end users will not typically notice this problem, but examination of the IIS logs will show the issue.
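One rough way to spot these requests in a log is to look for request paths that decode to something containing HTML angle brackets. The helper below is only a heuristic sketch with an invented name; it assumes you have already extracted the request path from each log line, and it will not catch the BASE bug, whose bad requests look like ordinary (but wrong) paths.

```python
from urllib.parse import unquote


def looks_like_lookahead_garbage(request_path: str) -> bool:
    """Heuristic: a script URL with URL-encoded HTML appended will contain
    '<' or '>' once percent-decoding is applied. (Assumption: legitimate
    paths on the site never contain angle brackets.)"""
    decoded = unquote(request_path)
    return "<" in decoded or ">" in decoded
```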

For many instances of this bug, a workaround is available: the problem only appears to occur when the parser restarts, so by avoiding parser restarts, you can avoid the bug. By declaring the CHARSET of the page using the HTTP Content-Type header rather than specifying it within the page, you can remove one cause of parser restarts.

So, rather than putting

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">

in your HEAD tag, send the following HTTP response header:

Content-Type: text/html; charset=utf-8

Note that specification of the charset in the HTTP header results in improved performance in all browsers, because the browser's parsers need not restart parsing from the beginning upon encountering the character set declaration. Furthermore, using the HTTP header helps mitigate certain XSS attack vectors.
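How you send that header depends on your server. As one illustration, here is a minimal sketch using Python's standard http.server module; the handler name, the make_server helper, and the page content are invented for this example, and a real site would normally set the header in its IIS/Apache or application framework configuration instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Note: no META charset tag in the page; the charset travels in the header.
PAGE = ("<!DOCTYPE html><html><head><title>Demo</title></head>"
        "<body>h\u00e9llo</body></html>")


class CharsetInHeaderHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        # Declare the encoding in the HTTP response header.
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging for this demo


def make_server(port=0):
    """Create (but do not start) a server on localhost; port 0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), CharsetInHeaderHandler)

# To run it: make_server(8000).serve_forever()
```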

Unfortunately, however, suspension of the parser (e.g. when it encounters an XML Namespace declaration) can also result in this problem, and it's not feasible for a web developer to avoid suspension of the parser.

But, remember: The 4k bug is now fixed. 

Summary
While these problems are significant, they are not as dire as some readers will conclude at first glance. The second bug, in particular, is quite rarely encountered due to its timing-related nature and the requirement that the page have a JavaScript URL spanning a particular byte in the response. The second issue is not nearly as prevalent as some web developers believe; for instance, we've heard claims that IE6, IE7, and Firefox all have this problem, which is entirely untrue. Readers can easily determine if a page is hitting either bug by examining server logs, or watching network requests with Fiddler.

The IE team will continue our investigation into these bugs and, as with any reported issues, may choose to make available an IE8 update to resolve the issues.

Remember: All bugs mentioned in this post are now fixed. 

Apologies for the inconvenience, and thanks for reading!

-Eric

  • In thinking on how to collect info on this bug, I realized that this problem can be reproduced using the standard J2EE web app security constraints.

    Here's a demo site with downloadable code (since none of it's proprietary).

    http://www.skylarking.org:8180/ie8-bug

    Note the port number and that since it's running on my personal server via RR, I may pull the plug on this in a few weeks or so.  

  • @CG: I'm not sure I understand your repro.  Here's the HTTP Traffic:

    --------

    GET http://www.skylarking.org:8180/ie8-bug/private/

    200 OK

    GET http://www.skylarking.org:8180/ie8-bug/login/login.js

    304 Not Modified

    [**** INCORRECT LOOKAHEAD DOWNLOAD HERE ****]

    GET http://www.skylarking.org:8180/ie8-bug/private/login.css

    200 OK

    [*** CORRECT URL RETRIEVAL ***]

    GET http://www.skylarking.org:8180/ie8-bug/login/login.css

    304 Not Modified

    [*** YOUR CODE REDIRECTS TO AN ERROR PAGE ***]

    POST http://www.skylarking.org:8180/ie8-bug/login/j_security_check

    302 Moved Temporarily to http://www.skylarking.org:8180/ie8-bug/private/login.css

    GET http://www.skylarking.org:8180/ie8-bug/private/login.css

    404 Not Found

    --------

    As you can see, the "j_security_check" page redirects, using a HTTP header, to an invalid URL. Hence, it is your server that is navigating the browser to the incorrect target URL.

    Why it does that, I cannot tell for sure. My assumption would be that there's a session variable on the server that keeps track of the last "protected URL" requested and navigates the user to that page after login, but such a feature is under the control of the server, not the client. Such an architecture is not common, and as I noted above: "If your server is configured to respond in some unusual way (e.g. logging the user out) upon request of a non-existent URL, the impact on your user-experience may be more severe." From the looks of it, this applies to your design.

  • Yes, the session tracking the last protected URL is the behaviour that is happening.  You get the same behaviour in other browsers if you put a link statement to a protected css file on the login page.

    But some important points here are:

    First, the authentication method is not MY code... this example is based on Tomcat's implementation of the Java Servlet API's security definitions. Most likely, the same error will occur on any standard Java webapp platform, e.g. JBoss, Websphere, and the like.

    The use of a BASE tag for "view" elements like the login.jsp example is not that uncommon.

    The HTML generated does not refer in any way to the invalid login.css URL... It's written correctly to all standards. The tracking of the last called protected URL is not the bug; it is IE's disregard of the BASE tag that is causing the problem.

    Finally, you asked a while back why this should be considered a show stopper or critical bug... hopefully, this discussion has shown that this bug is not just a "nuisance" bug but can cause problems for a wide set of web applications that require security.  I would hope that this discussion finds its way into the MS bug priority setting and that this problem will be fixed in an update cycle.

    Oh, should have said this sooner.. Thank you for finding the underlying IE bug in the first place!

  • @CG: Thanks for the clarification.

    If the architecture used on your site was widespread, it's unlikely that it would have taken this long to discover; it would have likely been discovered in one of the beta cycles.

    As it stands, it's still a significant bug, both for the corner cases where there is an end-user impact, but also for the performance implications. Lookahead downloading is intended to make the browser faster, not slower. :-)

  • My beautiful, long-in-the-making, highly admired, highly sophisticated, super secure business application has suddenly been rendered unusable by the 'base tag' bug.

    At least there is Firefox...

    Is there going to be a fix (?)

  • CG: before the days of stick-everything-in-session-variables, typically you redirected people to /login?ref=/secureUrl.

    This is also causing us issues, in that it's causing an extra 20 requests (for all the .js files) to our app servers for each page request.

  • The Missing 4k Bug:

    The article states: "By declaring the CHARSET of the page using the HTTP Content-Type header rather than specifying it within the page, you can remove one cause of parser restarts."

    Eric in an email:

    "Unfortunately, another known cause of parser restarts is use of XML namespaces, which your site appears to use." So if you use XHTML the 4K issue can occur!

  • The BASE-tag bug is causing our customers much grief as well. Our system is based on the BASE tag, as our pages are rendered by a servlet which is located in a different directory from the css- and js-files that we refer to in the head-tag on all pages.

    The third-party servletexec that we use is logging every request for a non-existing file as an invalid call for a non-existing class. The user is not affected by this, but our server is being bogged down by the enormous amount of logging that we can not disable. Introducing extra BASE-tags in our pages is not an alternative, and non-standard at that.

    A "smart" downloader is only smart if the number of requests sent to the server is the same or reduced, but this generates a lot more requests than necessary.

  • <<A "smart" downloader is only smart...>>

    David, let me reiterate because clearly you missed it:

    This is a bug. A plain old boring bug. Bug. Bug. Bug bug bug.

  • No, I didn't miss the "This is a bug"-part...

    So, when are we going to see a fix for this, not only boring bug, but a bug killing business for a lot of people?

  • So much drama here.

    Harry/David: If one bug in one browser "kills" your business, I think your business suffers from some more fundamental problems.

    It sounds like you already know how to workaround this problem, and refuse to do so.

  • @Bob - Why should we not "refuse" to redesign our solution and deploy new code to our customers that doesn't follow standard in order to counter a problem in a faulty browser?

  • David: You're missing the point.

    There are two possibilities:

    1> This bug doesn't "kill" your business, and you were just being sensational/dramatic.

    2> This bug does somehow "kill" your business. In this case allowing your business to die because you're married to "standards" ("to death do you part") is an option, although a rather unusual choice.

  • Drama or not... Possible to work around the bug or not...

    The underlying issue is still if or when MS will fix this. As with any bug, the question of priority comes up and that is based on a lot of factors including impact on your users.

    There is also a question of commitment to your own company/product's stated goals.  There was much fanfare with IE about it being more standards compliant and developers not needing to do all the tricks they needed to do to get things to "behave" the same across different browsers.

    There always are fixes to browsers misbehaving.. but it has been illustrated here that IE8 has yet another quirk that developers have to either know about or find the hard way when code written to standards doesn't work.

    I hope the people who rank and prioritize bugs are not being blinded by the drama, but are thinking about MS's stated commitment to have IE be standards compliant (so we all start complaining about FF/Chrome compliance bugs :) ) and the fact that it's been shown to be more than some extra 404 entries in logs.

  • The ISSUE is how much time you waste in trying to determine if the bug is in the browser, or your software, and then develop a fix.

    IE yet again proves to be a browser with some evil bad gotchas, which waste valuable developer time.

    IE 8 is just another release which proves the old adage "Let's make the site work for standards compliant browsers, and then we will fix it for IE"

    Too bad you can't invoice MS for your wasted hours.

    I'm having to track down a different IE-8 related bug... :P
