IEInternals

A look at Internet Explorer from the inside out. @EricLaw left Microsoft in 2012, but was named an IE MVP in '13 & an IE userAgent (http://useragents.ie) in '14

Best Practice: Get your HEAD in order

To ensure optimal performance and reliability when rendering pages, you should order the elements within the HEAD element carefully. First, I’ll explain the optimal order, and then explain the reasoning for this structure.

Optimal Head Ordering

<doctype>
    <html>
        <head>
            <meta http-equiv content-type charset>
            <meta http-equiv x-ua-compatible>
            <base>
            <title, favicon, comments, script blocks, etc>

Why Order Matters

In order to understand why the ordering of the elements in the HEAD matters, it’s important to understand how the browser parses webpages, and what impact each element has on the parsing of the page.

When the browser starts parsing a page, it begins reading the bytes of the HTTP response body. If the response’s Content-Type header specifies a charset attribute, those body bytes can immediately be interpreted as text using the specified character encoding. However, if a charset declaration is not present, the browser must begin scanning the bytes of the response body, checking for a Unicode Byte Order Mark (BOM) at the top or scanning for a META HTTP-EQUIV element that specifies the charset. When such a declaration is reached, the parser may need to restart in order to ensure that the bytes previously read were interpreted properly. When a restart occurs, the F12 Developer Tools will show the following note in the console:

image

This restart can impact performance, as we’ll discuss momentarily.

If a character set declaration is not found, the browser is forced to “Autodetect” the character encoding based on the nature of the bytes read or other factors, potentially resulting in a mismatch between the web developer’s intent and the browser’s guess. That mismatch can result in a broken page, or a page which contains gibberish in some places. Therefore, for functionality and performance reasons, it is a best practice to specify the encoding[1] using HTTP response headers. If you must specify the character set using a META tag for some reason, it is critical that the META tag is the first element in the HEAD.
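The lookup order described above can be sketched roughly as follows. This is a simplified illustration, not Internet Explorer's actual algorithm: the function name `find_charset`, the regular expressions, and the short list of BOMs are all assumptions made for the example.

```python
import codecs
import re

# Byte Order Marks recognized by this sketch (UTF-8 and UTF-16 only).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Loose patterns for "charset=..."; a real parser is considerably stricter.
HEADER_CHARSET = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)
META_CHARSET = re.compile(rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)


def find_charset(content_type, body):
    """Return (charset, source, may_restart) for a response.

    content_type is the Content-Type header value (or None);
    body is the raw bytes of the response body.
    """
    # 1. A charset on the HTTP header wins and needs no scanning of the body.
    if content_type:
        match = HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower(), "header", False
    # 2. Otherwise look for a Byte Order Mark at the very top of the body.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name, "bom", False
    # 3. Otherwise scan for a META declaration. Bytes read before it were
    #    interpreted under a guess, so the parser may need to restart.
    match = META_CHARSET.search(body)
    if match:
        return match.group(1).decode("ascii").lower(), "meta", True
    # 4. Nothing declared: the browser has to guess.
    return None, "autodetect", False
```

Only the first case avoids touching the body at all, which is why the header is the preferred place for the declaration.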

Internet Explorer 8 and later versions allow page authors to specify which document mode should be used to render the page, so that a site can ask later versions of IE to render a given page in a legacy mode for compatibility reasons. Because the document mode can impact how the browser parses a page, Internet Explorer will need to restart the parsing process if a META element is found that specifies an X-UA-Compatible value different from the one originally used to start parsing. The F12 Developer Tools will note when a restart was needed:

image

For that reason, it is a best practice to specify any X-UA-Compatible value as an HTTP response header. If you must specify the X-UA-Compatible value using a META tag for some reason, this element MUST appear before any script blocks and SHOULD appear as early in the HEAD element as possible. In some cases, a specified X-UA-Compatible META tag can be ignored (e.g. because the document mode was already finalized due to earlier markup[2]). When this happens, the F12 Developer Tools’ console will show the following message:

image

The BASE element controls how any relative URLs in your page are made absolute in order to retrieve the specified resources from the network. Ordinarily, a relative URL is combined with the URL of a page in order to make it absolute. However, when a BASE tag is present, the specified HREF is used for URL-combination, and the page’s URL is not used. Because the BASE element impacts the combination of all relative URLs in a page, it MUST appear before any relative URLs in your page. As of IE7, the BASE element MUST appear within the HEAD element and will be ignored if it appears in the body. While it is technically possible to use JavaScript (e.g. via document.write) to specify a BASE element, doing so is strongly discouraged.
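To make the URL-combination rule concrete, here is a small sketch using Python's standard `urljoin`. The helper name `resolve` and the example.com / example.net URLs are hypothetical, and the sketch assumes the BASE href is itself an absolute URL.

```python
from urllib.parse import urljoin


def resolve(relative_url, page_url, base_href=None):
    """Make a relative URL absolute.

    When a BASE href is given it replaces the page's own URL as the
    starting point for combination; otherwise the page URL is used.
    """
    return urljoin(base_href or page_url, relative_url)


page = "http://example.com/blog/post.html"

# No BASE: the page's URL is the starting point.
without_base = resolve("img/logo.png", page)

# With BASE: the page's URL is not used at all.
with_base = resolve("img/logo.png", page, base_href="http://cdn.example.net/assets/")
```

Here `without_base` is `http://example.com/blog/img/logo.png` and `with_base` is `http://cdn.example.net/assets/img/logo.png`, which is why a BASE that arrives after a relative URL has already been combined comes too late.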

After you’ve specified any of the charset, X-UA-Compatible, and BASE declarations you need, finish out your HEAD tag with a TITLE element and any other markup.
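If you want to check a page against this ordering, something like the following sketch could help. It is an illustrative linter built on Python's `html.parser`, not a tool that ships with IE; the class name, the function name, and the problem messages are all made up for the example, and it only checks the three rules described above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HeadOrderChecker(HTMLParser):
    """Collects ordering problems among the elements of a page's HEAD."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_elements = 0          # elements seen so far inside HEAD
        self.seen_script = False
        self.seen_relative_url = False
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
            return
        if not self.in_head:
            return
        attrs = {name: (value or "") for name, value in attrs}
        http_equiv = attrs.get("http-equiv", "").lower()
        if tag == "meta" and ("charset" in attrs or http_equiv == "content-type"):
            if self.head_elements > 0:
                self.problems.append("charset META is not the first element in HEAD")
        elif tag == "meta" and http_equiv == "x-ua-compatible":
            if self.seen_script:
                self.problems.append("X-UA-Compatible META appears after a script block")
        elif tag == "base":
            if self.seen_relative_url:
                self.problems.append("BASE appears after a relative URL")
        else:
            if tag == "script":
                self.seen_script = True
            url = attrs.get("href") or attrs.get("src")
            # Treat anything without a scheme (and not protocol-relative) as relative.
            if url and not urlparse(url).scheme and not url.startswith("//"):
                self.seen_relative_url = True
        self.head_elements += 1

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def check_head(markup):
    """Return a list of ordering problems found in the markup's HEAD."""
    checker = HeadOrderChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

Calling `check_head` on a page that follows the recommended order returns an empty list; each rule that is violated adds one message.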

Understanding the Lookahead Pre-parser

To reduce the delay inherent in downloading script, stylesheets, images, and other resources referenced in an HTML page, Internet Explorer needs to request the download of those resources as early as possible in the loading of the page. The key problem is that the browser’s parser must pause when it encounters a non-async/non-defer script element, so that scripts execute in document order, as the standards require. In the absence of mitigations, this pause would result in a significant delay in the parsing of the rest of the page, and thus result in resources being requested from the network much later.

To mitigate this issue when loading a page, Internet Explorer runs a second instance of a parser whose job is to hunt for resources to download while the main parser is paused. This mode is called the lookahead pre-parser[3] because it looks ahead of the main parser for resources referenced in later markup. The download requests triggered by the lookahead are called “speculative” because it is possible (not likely, but possible) that the script run by the main parser will change the meaning of the subsequent markup (for instance, it might adjust the BASE against which relative URLs are combined) and result in the speculative request being wasted.
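As a rough mental model of what a lookahead scanner does, consider the sketch below. It is not IE's implementation: the class name, the choice of which tags to scan (script, img, and stylesheet links), and the example URLs are assumptions. It simply walks the markup without running any script and lists the absolute URLs it would request speculatively.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LookaheadScanner(HTMLParser):
    """Finds resource URLs in markup without executing any script."""

    def __init__(self, page_url):
        super().__init__()
        self.base_url = page_url        # replaced if a BASE element is seen
        self.speculative_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and attrs.get("href"):
            self.base_url = urljoin(self.base_url, attrs["href"])
        elif tag in ("script", "img") and attrs.get("src"):
            self._queue(attrs["src"])
        elif tag == "link" and (attrs.get("rel") or "").lower() == "stylesheet" and attrs.get("href"):
            self._queue(attrs["href"])

    def _queue(self, url):
        # The request is speculative: a script the main parser runs later
        # could still change the BASE and make this URL the wrong one.
        self.speculative_urls.append(urljoin(self.base_url, url))


def scan_ahead(markup, page_url):
    """Return the absolute URLs a lookahead pass would request."""
    scanner = LookaheadScanner(page_url)
    scanner.feed(markup)
    scanner.close()
    return scanner.speculative_urls
```

Because the scanner resolves each URL against the base it knows about at that moment, any later change to the BASE (for instance, from script) turns the earlier requests into wasted work.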

Critically, when the parser is forced to incur a document mode or charset restart, IE aborts all of that page’s in-flight requests and begins parsing of the page anew, again looking for resources to speculatively download. Beyond the CPU cost of these restarts, there can be a network cost as well.

For instance, consider a page that restarts from IE9 Standards mode to Quirks Mode. The F12 console shows the restart:

image

…and you can see the aborted speculative requests listed in the Network tab of the F12 Developer Tools:

image

Because the restart occurs very early on in page load, only the first speculative request actually made it through URLMon and WinINET and was requested from the network.

In Fiddler, you can see the aborted request for /1.js. The Reason column shows that the first request was from the original html lookahead tokenizer while the subsequent (successful) downloads were triggered by the restarted html lookahead tokenizer:

image

Because the browser aborted the first request for the script file, it didn’t read the entire response from the network; instead it issued a TCP RST on the request’s connection, immediately closing it. When the restarted lookahead identified the same resource to download, it required the establishment of a new TCP/IP connection.

For best performance, specify your page’s character set and X-UA-Compatible (if desired) using HTTP response headers, helping the browser avoid expensive restarts.
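How you set these headers depends on your server. As one minimal sketch, here is a handler built on Python's standard `http.server`; the page content, the port, the helper names, and the `IE=edge` value are placeholders chosen for the example, so substitute whatever document mode (if any) your site needs.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = "<!DOCTYPE html><html><head><title>Demo</title></head><body>Hello</body></html>"


def response_headers(body):
    """Headers that declare the charset and document mode up front."""
    return [
        ("Content-Type", "text/html; charset=utf-8"),
        ("X-UA-Compatible", "IE=edge"),  # example value; send only if you need one
        ("Content-Length", str(len(body))),
    ]


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        for name, value in response_headers(body):
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve the demo page on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

With both values on the response, the markup itself needs neither META tag, so there is nothing in the body that could trigger a restart.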

-Eric

[1] In case you’re wondering, UTF-8 encoding is the best choice. For web content, UTF-8 is almost always more efficient than UTF-16, and there are some peculiarities with UTF-16 support that make it inadvisable for use in web content.

[2] Internet Explorer 10 PPB2 introduced a new architecture where a versioning pre-scan occurs before the parsers begin their work; the X-UA-Compatible META tag must appear in the first 4 KB of the markup or it will be ignored.

[3] Regular readers may recall that I’ve written about Internet Explorer’s Lookahead downloading feature previously.

  • The HTML5 standard requires that a <base> tag outside the <head> still be respected, and that's how all non-IE browsers behave.  Test case:

    <!DOCTYPE html>
    <body>
    <base href="http://google.com" rel="nofollow" target="_new">
    <a href=/>test</a>
    <script>document.body.textContent = document.querySelector("a").href;</script>

    IE10PP2 sets the body's content to the current domain.  Firefox 7.0a2, Chrome 14 dev, and Opera 11.50 all set it to "http://google.com".  Citation from HTML5:

    "If there is no base element that has an href attribute, then the document base URL is fallback base url; abort these steps. Otherwise, let url be the value of the href attribute of the first such element." dev.w3.org/.../urls.html Nothing in the standard says where the base element has to be in the document tree, so other browsers match the spec here.

  • >Nothing in the standard says where the base element has to be in the document tree

    dev.w3.org/.../semantics.html says:

    Contexts in which this element can be used: In a head element containing no other base elements.

  • @EricLaw: See html5.org/.../web-apps-tracker.

    github.com/.../69 may also be of interest, along with mathiasbynens.be/.../base.

  • >  dev.w3.org/.../semantics.html says:
    > Contexts in which this element can be used: In a head element containing no other base elements.

    That's an authoring requirement and has nothing to do with how UAs should behave. Aryeh referenced the requirements that apply to implementors.

  • Thanks for the links, gents.

    While I appreciate the notes, I think you folks might have missed a few points: 1> This is a post that applies to multiple versions of IE, including those authored before HTML5 reached last call (or even existed at all). 2> This is a post that describes best practices. Putting the BASE in the HEAD is a best practice, and is required if you want your BASE to be respected across all browser versions.

    It seems a bit bizarre to me that HTML5 would impose an "authoring requirement" which it doesn't bother to add to its implementation requirements, particularly when its failure to do so could result in a security vulnerability. But I don't work on much HTML5 stuff myself. Perhaps there's a section in that spec that explains what should happen when a BASE element is the last tag in the page, and it appears after external scripts and styles have already been downloaded and run using the page's URI as the implicit base.

  • > particularly when its failure to do so could result in a security vulnerability.

    Could you file a bug on the spec, or email public-html with some details? Obviously the spec shouldn't mandate insecure behaviour.

  • Yeah, please do report the security problem. You can mail me directly if you'd rather not make it public: ian@hixie.ch

  • EricLaw: Here is some history, BTW: krijnhoetmer.nl/.../20110426. The reason that authoring requirements are not always implementation requirements, BTW, is not only for compatibility, but also that HTML5 defines parsing for *all markup*, even invalid. Previously browsers had to reverse engineer other browsers for error handling.

     

  • @eric how come the aborts happen on this page on IE8 http://www.axlscloset.com

    they do specify the meta tags first thing in the <head>

  • @John: You don't avoid restarts by putting these in the HEAD, you avoid restarts by specifying the document mode and CHARSET using the HTTP Response Headers.

    Beyond that, this page has a lot of other performance opportunities, including reducing the currently huge number of external references (e.g. tens of script and CSS files).

  • Eric, is CHARSET needed in the HTTP Response headers of JS and CSS files? Will the browser do the same thing like looking for the meta tag information in the response body? The reason I am asking is that JS & CSS will not have meta tag in the response body so it is no use scanning through looking for charset.

  • @Senthil: Correct, setting the charset in the HTTP Response headers is a best-practice for JS and CSS files. You're right to note that JavaScript files don't have a way to specify their character set internally to the script content itself. You may be surprised to learn that CSS files actually do have such a method, using the @charset declaration.

  • Thanks Eric. I am trying to reach out to our Edge Caching Service folks to include "charset: utf-8" in the HTTP headers for JS & CSS static resources. Do you have any rough performance measurements (like millisecond improvements)  of including charset in headers?

  • @Senthil: I don't have any hard numbers, but I'd expect the improvement to be under a millisecond. The primary advantage in specifying charset for CSS/JS is to ensure proper functionality (e.g. that they aren't misinterpreted as being in a different charset). I believe we've seen some cases where charset misinterpretation in such files could be abused by an attacker.

  • @Eric you might want to ask the Expression Web Developer team to remove the Content-Type meta tag from the default template. It is there when you create a new HTML file, at least on Web Dev 3.0!
