IEInternals

A look at Internet Explorer from the inside out. @EricLaw left Microsoft in 2012, but was named an IE MVP in '13 & an IE userAgent (http://useragents.ie) in '14

Bugs in IE8's Lookahead Downloader

All bugs mentioned in this post are now fixed. 

Internet Explorer has a number of features designed to render pages more quickly. One of these features is called the "Lookahead Downloader". It quickly scans the page as it comes in, looking for the URLs of resources that will be needed later in the rendering of the page (specifically, JavaScript files). The lookahead downloader runs ahead of the main parser and is much simpler-- its sole job is to hunt for those resource URLs and get requests into the network request queue as quickly as possible. These download requests are called "speculative downloads" because it is not known whether the resources will actually be needed by the time the main parser reaches the tags containing the URLs. For instance, inline JavaScript runs during the main rendering phase, and such script could (in theory) remove the tags that triggered the speculative downloads in the first place. However, this "speculative miss" corner case isn't often encountered, and even if it happens, it's basically harmless: the speculative request simply downloads a file that is never used.
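To make the idea concrete, here's a minimal sketch of what a lookahead scan over a partially received page might look like. It's an illustration only: the regular expression and the `speculative_script_urls` name are hypothetical. The real IE8 scanner is not public and has to cope with far more (comments, encodings, broken markup).

```python
import re

# Hypothetical, greatly simplified pattern: the src attribute of a <script> tag,
# whether double-quoted, single-quoted, or unquoted.
SCRIPT_SRC = re.compile(
    r"""<script\b[^>]*?\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)

def speculative_script_urls(chunk: str) -> list[str]:
    """Return the script URLs visible in the part of the page received so far.

    The main parser later decides whether each URL is really needed; these are
    only candidates for speculative download.
    """
    urls = []
    for match in SCRIPT_SRC.finditer(chunk):
        # Exactly one of the three alternatives captured the URL.
        urls.append(next(group for group in match.groups() if group is not None))
    return urls
```

Note that the scanner only needs the bytes received so far, which is what lets it run ahead of the main parser.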

IE8 Bugs and their impact
Unfortunately, since shipping IE8, we've discovered two problems in the lookahead downloader code that cause Internet Explorer to make speculative requests for incorrect URLs. Generally this has no direct impact on the visitor's experience, because when the parser actually reaches a tag that requires a subdownload, if the speculative downloader has not already requested the proper resource, the main parser will at that time request download of the proper resource. If your page encounters one of these two problems, typically:

  • The visitor will not notice any problems like script errors, etc.
  • The visitor will have a slightly slower experience when rendering the page because the speculative requests all "miss"
  • Your IIS/Apache logs will note requests for non-existent or incorrect resources

If your server is configured to respond in some unusual way (e.g. logging the user out) upon request of a non-existent URL, the impact on your user experience may be more severe.

The BASE Bug

Update: The BASE bug is now fixed.

The first problem is that the speculative downloader "loses" the <BASE> element after its first use. This means that if your page at URL A contains a tag sequence as follows:

<html><head><base href=B><script src=relC><script src=relD><script src=relE><body>

which requests 3 JavaScript files from the path specified in "B", IE8's speculative downloader will incorrectly request download of URLs "B+relC", "A+relD", and "A+relE". Correct behavior is to request download of URLs "B+relC", "B+relD", and "B+relE". Hence, in this case, two incorrect requests are sent, usually resulting in 404s from the server. Of course, when the main parser gets to these script tags, it will determine that "B+relC" is already available, but "B+relD" and "B+relE" have not yet been requested, and it will request those two correct URLs and complete rendering of the page.
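The difference between the correct and the buggy behavior can be modeled with Python's `urllib.parse.urljoin`. This is a hypothetical model of the symptom described above (BASE honored for the first script only), not IE's actual code, and the function names are made up for illustration.

```python
from urllib.parse import urljoin

def resolve_correct(page_url: str, base_href: str, srcs: list[str]) -> list[str]:
    """Every relative script URL resolves against the document's BASE."""
    base = urljoin(page_url, base_href)
    return [urljoin(base, src) for src in srcs]

def resolve_ie8_buggy(page_url: str, base_href: str, srcs: list[str]) -> list[str]:
    """Model of the reported bug: BASE is used for the first URL only, after
    which resolution falls back to the page's own URL."""
    base = urljoin(page_url, base_href)
    return [urljoin(base if i == 0 else page_url, src) for i, src in enumerate(srcs)]
```

For a page at http://example.com/pages/index.htm with a BASE of http://cdn.example.com/js/ and scripts c.js, d.js, and e.js, the buggy model produces http://example.com/pages/d.js and http://example.com/pages/e.js for the second and third scripts. Those are exactly the sort of 404s that show up in the server logs.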

At present, there is no simple workaround for this issue. Technically, the following syntax will result in proper behavior:

 <html><head><base href=B><script src=relC><base href=B><script src=relD><base href=B><script src=relE><body>

...but this is not standards-compliant and is not recommended. If the page removes its reliance upon the BASE tag, the problem will no longer occur.

Remember: The BASE bug is now fixed.

The Missing 4k Bug

Update: The 4k bug is now fixed. 

The second problem is significantly more obscure, although a number of web developers have noticed it and filed a bug on Connect. Basically, the problem here is that there are a number of tags which will cause the parser and lookahead downloader to restart scanning of the page from the beginning. One such tag is the META HTTP-EQUIV Content-Type tag which contains a CHARSET directive. Since the CHARSET specified in this tag defines what encoding is used for the page, the parser must restart to ensure that it is parsing the bytes of the page in the encoding intended by the author. Unfortunately, IE8 has a bug where the restart of the parser may cause incorrect behavior in the Lookahead downloader, depending on certain timing and network conditions.

The incorrect behavior occurs if your page contains a JavaScript URL which spans exactly the 4096th byte of the HTTP response. If such a URL is present, under certain timing conditions the lookahead downloader will attempt to download a malformed URL consisting of the part of the URL preceding the 4096th byte combined with whatever text follows the 8192nd byte, up to the next quotation mark. Web developers encountering this problem will find that their logs contain requests for bogus URLs with long strings of URLEncoded HTML at the end.
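To recognize the symptom, it can help to reproduce the malformed URL offline. The sketch below models the behavior described above under stated assumptions. The boundary is treated as byte offset 4096 (0-based) and scanning is assumed to resume at offset 8192. The helpers `spans_first_boundary`, `simulate_bogus_url`, and `as_logged` are hypothetical. The real bug's off-by-one details may differ, and it only fires under certain timing conditions.

```python
from urllib.parse import quote

CHUNK = 4096  # assumption: the boundary sits at byte offset 4096 (0-based)

def spans_first_boundary(start: int, end: int) -> bool:
    """True if a URL occupying body[start:end] begins before the boundary and ends after it."""
    return start < CHUNK < end

def simulate_bogus_url(body: bytes, start: int) -> bytes:
    """Model of the malformed request: the URL bytes before the boundary, glued
    to whatever follows offset 8192, up to the next double quote."""
    head = body[start:CHUNK]
    resume = 2 * CHUNK
    stop = body.find(b'"', resume)
    tail = body[resume:] if stop == -1 else body[resume:stop]
    return head + tail

def as_logged(bogus: bytes) -> str:
    """Percent-encode the bogus URL, roughly as it would appear in a server log."""
    return quote(bogus, safe="/")
```

If a script URL such as /scripts/app.js starts a few bytes before offset 4096 and the text after offset 8192 is `<p>hi</p>"`, the model yields a request like /scrip%3Cp%3Ehi%3C/p%3E: a truncated path followed by URL-encoded HTML.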

As with the previous bug, end users will not typically notice this problem, but examination of the IIS logs will show the issue.

For many instances of this bug, a workaround is available-- the problem only appears to occur when the parser restarts, so by avoiding parser restarts, you can avoid the bug.  By declaring the CHARSET of the page using the HTTP Content-Type header rather than specifying it within the page, you can remove one cause of parser restarts.

So, rather than putting

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">

in your HEAD tag, send the following HTTP response header instead:

Content-Type: text/html; charset=utf-8

Note that specification of the charset in the HTTP header results in improved performance in all browsers, because the browser's parsers need not restart parsing from the beginning upon encountering the character set declaration. Furthermore, using the HTTP header helps mitigate certain XSS attack vectors.
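As an illustration, here's a minimal sketch using Python's standard `http.server` module. It declares the charset in the Content-Type header and leaves the META tag out of the page entirely. The page content, port, and the names `page_headers`, `CharsetInHeaderHandler`, and `serve` are placeholders. On IIS or Apache you'd configure the equivalent header in the server rather than in code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder page: note there is no <META HTTP-EQUIV="Content-Type"> tag in it.
PAGE = ('<html><head><script src="/app.js"></script></head>'
        '<body>caf\u00e9</body></html>').encode("utf-8")

def page_headers(body: bytes) -> list[tuple[str, str]]:
    """Headers for an HTML response whose charset is declared at the HTTP level."""
    return [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]

class CharsetInHeaderHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        self.send_response(200)
        for name, value in page_headers(PAGE):
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(PAGE)

def serve(port: int = 8000) -> None:
    # Blocks until interrupted; call serve() to try it locally.
    HTTPServer(("127.0.0.1", port), CharsetInHeaderHandler).serve_forever()
```

Because the encoding is known before the first byte of the body is parsed, there's no in-page CHARSET declaration to trigger a restart.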

Unfortunately, however, suspension of the parser (e.g. when it encounters an XML Namespace declaration) can also result in this problem, and it's not feasible for a web developer to avoid suspension of the parser.

But, remember: The 4k bug is now fixed. 

Summary
While these problems are significant, they are not so dire as some readers will conclude at first glance. The second bug, in particular, is quite rarely encountered due to its timing-related nature and the requirement that the page have a JavaScript URL spanning a particular byte in the response. Encountering the second issue is not nearly as prevalent as some web developers believe-- for instance, we've heard claims that IE6, 7, and Firefox all have this problem, which is entirely untrue. Readers can easily determine if a page is hitting either bug by examining server logs, or watching network requests with Fiddler.
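As a rough starting point for the log check, here's a hypothetical heuristic. It treats a 404 whose path contains a percent-encoded `<`, `>`, or `"` as a likely 4k-bug request, since the bogus URLs carry URL-encoded HTML. Parsing your actual IIS or Apache log format into `(url, status)` pairs isn't shown. The heuristic can produce false positives and negatives, and it won't catch BASE-bug requests, which look like ordinary 404s.

```python
import re

# Percent-encoded '<', '>' or '"': a hint that HTML markup ended up inside a requested URL.
ENCODED_MARKUP = re.compile(r"%3[CE]|%22", re.IGNORECASE)

def likely_4k_bug_requests(entries: list[tuple[str, int]]) -> list[str]:
    """Return URLs of 404 responses whose path appears to contain URL-encoded HTML.

    `entries` holds (url, status) pairs already parsed from a server log.
    """
    return [url for url, status in entries
            if status == 404 and ENCODED_MARKUP.search(url)]
```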

The IE team will continue our investigation into these bugs and, as with any reported issues, may choose to make available an IE8 update to resolve the issues.

Remember: All bugs mentioned in this post are now fixed. 

Apologies for the inconvenience, and thanks for reading!

-Eric

  • @Chris

    We are in the same situation.

    Load-balanced servers in production, same machine keys across the web farm, and we see this consistently in production, but we HAVE also seen it in our QA/Staging environments.

    Very few times though compared to production.

    We have a very robust logging system in our application that captures all of these events, which are registered as many different types of errors:

    Invalid Viewstate

    Invalid character in a Base-64 string

    Invalid length for a Base-64 char array

    Length of the data to decrypt is invalid

    This is an invalid script resource request

    This is an invalid webresource request

    I actually reproduced the issue by chance myself, attached the debugger that IE 8 provides, and found that a validator required on a page was erroring out because the script.axd didn't load properly.

    I wasn't able to confirm that it eventually 'did' load, and that the error I saw was during the pre-parsing, but nevertheless, the error WAS customer facing and this issue is causing a serious headache for us.

    Eric,

    Any update on the status of a possible fix? Or do you have any suggestions for us?

    Could we insert a block "IF IE 8" that would push content past the span of bytes that cause the issue?

  • @Mark, as noted in the comments above, unfortunately I'm not able to make any statements or speculations about IE code fixes (either availability or timeframe). I can say that this is an issue that we're getting a significant amount of customer escalations about because the workarounds are unappealing.

    In terms of "least unappealing workarounds"-- I'm afraid I no longer have any.  I took a look at sticking a huge buffer comment just inside the HEAD tag to push the first script URL outside of the 4096th byte, but it looks (not surprisingly) like this impacts the 8192nd byte as well, and so on.

  • We never really bothered with this since we're using a powerful logger that is able to filter out these kinds of errors.

    Recently however we've been receiving reports like the one below:

    Requested URL: /TemplateModule/Web/Scripts/TemplateBaseContr.aspx?ID=6036&Action=Delete
    URL Referrer: http://XXXX/COMModule/ComViewArticle.aspx?MODULE=CONTENT&ID=6036

    It's the same type of error but the requested URL is disturbing! Doesn't this mean that the Lookahead downloader could form a valid request to a page that deletes stuff from the database without us knowing?

  • @Aaron: Sure, it seems fairly unlikely, but it's technically possible.

    A "valid" URL could be incorrectly retrieved if the pre-parser stream broke the first script URL at exactly the right byte and then skipped to continue the URL from exactly the right byte later in the page.

  • Hello!

    I've been tracing this bug from the beginning.

    It has cost me plenty of time to get information about it.

    I built test applications trying to recreate the bug and implemented some solutions I read about in forums and posts.

    First of all I want to thank Eric for his first helpful text concerning this issue. SERIOUSLY

    Now I only have to send the linked text to our customers (some of them get mailed error reports) and they believe me that site users don't experience any problems, and that it's not my responsibility / ability to fix it.

    Thanks Eric...

    Since I spent so much time on investigation, it's funny that this error now saves investigation time.

    Some portals generated hundreds of error mails a day.

    Since nobody reads them anymore (e.g. our customers, and me), less time goes into investigating bugs than before this bug (real bug reports are lost somewhere in the huge amount of "placebo bugs"). Now we just react to errors reported by website users by phone or mail.

    That bug information is typically more precise.

    So one more time, Eric, I want to THANK YOU for having the courage to report some information on this issue from an authoritative source.

    But I'm wondering how this bug could go unsolved for about a year.

    I'm a developer and love MS for the .NET Framework and especially ASP.NET...

    I work with many MS technologies/products, like MS-SQL, VS.NET or the Office suite.

    In my opinion the mentioned apps are more complex than a browser. (That's just my thoughts)

    The funny thing is that since I started developing in 2000, no bug / problem in an MS product has hit me this often without even a clear workaround.

    What do these IE8 dev guys do all day? Work on IE9 only...

    Or is there just one poor guy spending 90% of his time responding to emails who can only use 10% for developing...

    A solution seems so simple to me...

    E.g.

    If the "IE8's Lookahead Downloader" feature has some malfunction, it could be disabled with the next IE8 update

    OR

    As I understand it, this feature starts preloading JS data from URLs before document parsing is done.

    Maybe a simple check for a preceding and following " char (quotation marks and similar escape chars) would fix it for well-formed HTML.

    OR

    At least a simple

    if not (LookAheadUrl contains "webresource.axd" or LookAheadUrl contains "scriptresource.axd") then
        call DoLookAhead(LookAheadUrl)
    end if

    would help in my case (sounds like I'm an egoist).

    Maybe these solutions are stupid and not complex enough, but I came up with them at 4:00 AM while writing this comment, after I found this blog during yet another search for a solution to this issue.

    Along the way I always pick up some other interesting information about unrelated things, so I didn't waste my time.

    Thank you Eric and greetings from Cologne, Germany

  • >In my opinion the mentioned apps are more complex than a browser. (That's just my thoughts)

    It's really, really hard to measure complexity, but I wouldn't assume that the other projects you mention are more complicated than the browser. If nothing else, all of the other types of software you mention can be implemented and run inside a browser. :-)

  • We have some users experiencing a lockup in IE8 on a certain page, and that page also tends to be the one that shows up in our logs as having this issue.  Have you heard of any correlation?  I’m trying to find the cause of the lockups, and this is about the only thing I have left that isn’t eliminated.

  • Eric, can we get an update confirmation that this bug is still not fixed?

    I'm surprised no one has added "Don't let your users use IE8 because it's going to generate lots of wasted requests" to the lists of how to optimize your web applications.

    My app gets about 60 of these errors a day (I receive an email about each and every one unfortunately).

    Had I known 6 months ago that this bug wasn't going to be resolved (ever?!) perhaps I would have added a filter to my logging code to throw away errors with URLs of ScriptResource and WebResource at least.

  • BrettJ: Correct-- there's no publicly available fix for the 4k bug at this point in time.

    (FWIW, even in spite of this issue, IE8's network performance will soundly trump that of IE7 in virtually every non-contrived case).

  • @Dean: There's no known way for this issue to cause a hang of any sort. Does the page in question have any ActiveX content (e.g. Flash) on it?

  • No, no flash.  Our client having problems with IE freezing found that the problem doesn't exist when they create a new user profile.  A test user having problems have wiped their profiles and created a new one, and are using it successfully.  However, this isn't an acceptable solution for all their users.  We're continuing to look for a cause.

  • Using fiddler, I can see the lookahead base tag bug.  I was thrilled to see a fix for this.  However, after trying to download Windows6.1-KB974455-x86.msu and install, I get...

    "The update is not applicable to your computer."

    I'm running Win7 Professional.  Please help..

    Mike

  • @Michael: You may have this update installed already-- if you're up-to-date on patches for Windows, then it's already installed.

    If not, the problem is probably that you're trying to install the x86 package on a 64-bit computer. You should try to install the update via WindowsUpdate, and if you must install directly, make sure you install the proper bitness.

  • Thanks for the quick reply!  

    I have checked the patches on my machine using appwiz.cpl and it doesn't show that I have 974455 installed.  

    I have spoken with my domain admin and WSUS 3.x tells him I don't require this update.  

    Using windows updates, it does not offer 974455 and I don't have any critical updates waiting to be installed.

    I have checked system properties and it shows that my system type is "32-bit operating system"

    The real issue is I'm still seeing many 404's from what I assume is the Lookahead Downloader not using my base tag.

    Thanks again for the help...

    Mike

  • @Michael: Sorry, it looks like the IE8+Win7 version of this fix hasn't yet been released. The prior fix went out for XP/Vista. To answer the obvious question, sorry, no, I'm not allowed to make any statements about timelines for unreleased fixes, but I understand the urgency in getting this fixed.
