Advances in JavaScript Performance in IE10 and Windows 8

IEBlog

Windows Internet Explorer Engineering Team Blog


On Thursday, May 31, 2012, we delivered the Windows 8 Release Preview and the Sixth IE10 Platform Preview. Windows 8 includes one HTML5 browsing engine that powers both browsing experiences (Metro style and desktop) as well as Metro style applications that use HTML5 and JavaScript. The release preview represents a major revision of the same modern JavaScript engine, Chakra, which debuted with IE9. With each platform preview we make progress against our goals to create an engine that delivers great performance on the Web while ensuring that it is highly compatible, interoperable, and secure. This post will explore how the JavaScript engine has been enhanced to deliver great performance for emerging Web application scenarios.

Performance for Real Web Applications

Web applications have been evolving rapidly in recent years. A decade ago the Web consisted primarily of Web sites with static content, like what you may encounter in a blog, a small business landing page, or on Wikipedia. The emergence of AJAX helped spawn more complex and interactive sites like what you see on Facebook or JetSetter. Subsequent advances in performance allowed for large and complex applications to be created, such as Office 365, Bing Maps, etc. Most recently, the expansion of the W3C standard APIs, gains in JavaScript performance, and hardware accelerated graphics made building even sophisticated games on the Web possible, for example, Angry Birds, Pirates Love Daisies, Cut The Rope, etc.

Diagram showing spectrum of Web pages and their performance characteristics. On the left are Basic Web Pages where Page Load Time is the driving performance goal. On the right are Web Applications, HTML5 Games, and Windows 8 Metro style apps where JavaScript Execution Speed, DOM Interactions, and Accelerated graphics have the biggest impact on performance.

As applications evolve, the performance factors affecting user experience change. For traditional Web sites, initial page load determines how quickly the user can see the content. Interactive Web sites and large Web applications may be gated by the efficiency of DOM operations, CSS processing, and manipulation of large internal state in memory. HTML5 games often depend on fast canvas rendering, JavaScript execution and efficient garbage collection. In short, browser performance is a complex problem, which requires taking into account the needs of a broad spectrum of diverse applications.

In this post we will focus on the performance of only one browser subsystem, the JavaScript engine. With recent gains in JavaScript performance, for many Web applications JavaScript execution is no longer a limiting factor. On the other hand, as performance increases, new scenarios emerge that place additional demands on the JavaScript engine. We continually look for opportunities to evolve Chakra to match the performance requirements of real JavaScript-intensive applications.

Two-dimensional chart showing screenshots of various sites plotted on two axes: Use of Other Browser Components (Y) and JavaScript Execution (X). Content sites are illustrated in the lower left (least use of other browser components and least use of JavaScript). Graphics-intensive games such as Angry Birds are shown in the top right quadrant.
Dimensions of Web Application Performance

Internals of Chakra

From its inception in IE9, the Chakra JavaScript engine was designed around two guiding principles, which remain equally important in IE10:

  • Minimize the amount of work on the critical path for the user experience. This involves deferring as much work as possible until absolutely necessary, avoiding work altogether, making use of periods of inactivity, and parallelizing work to minimize impact on the responsiveness of the application.
  • Take advantage of all available hardware. This translates to utilizing all available CPU cores, as well as generating advanced specialized CPU instructions, for example, Intel’s SSE2, if available.

Diagram illustrating the Chakra JavaScript engine's use of two processor cores.
Chakra’s Parallel Architecture

Chakra, though only one of the browser subsystems, is itself composed of several components that work together to process and execute JavaScript code. When the browser downloads a JavaScript file it hands its content over to Chakra’s parser to verify its syntactical correctness. This is the only operation that applies to the entire file. Subsequent steps are performed individually on each function (including the global function). As a function is about to be executed (the global function is run immediately after parsing) Chakra’s parser builds an abstract syntax tree (AST) representation of the code, and hands it off to the bytecode generator, which produces an intermediate form (bytecode) suitable for execution by the interpreter (but not directly by the CPU). Both the AST and the function bytecode are preserved so they don’t need to be recreated on subsequent executions. The interpreter is then invoked to run the function. As the interpreter executes individual operations it collects information (a profile) about the types of inputs it encounters and keeps track of how many times the function was called.

As the number of calls reaches a certain threshold, the interpreter queues the function up for compilation. Unlike in other browsers, Chakra’s just-in-time (JIT) compiler runs on a separate dedicated thread and thus does not interfere with script execution. The sole job of the compiler is to generate optimized machine instructions for each function in the compilation queue. Once a function is compiled, the availability of the machine code is signaled to the main script thread. Upon the next invocation, the entry point to the function is redirected to the newly compiled machine code and execution proceeds directly on the CPU. It’s important to note that functions that are called only once or twice never actually get compiled, which saves time and resources.
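The hand-off from interpreter to compiled code can be pictured with a small simulation. This is only a sketch under stated assumptions: the `makeTiered` helper, the threshold of 3, and the inline "compile" step are made up for illustration. Chakra's real threshold and its background compiler thread are not modeled here.

```javascript
// Hypothetical simulation of tiered execution; not Chakra's actual code.
// `interpret` stands in for the bytecode interpreter, `compile` for the JIT.
function makeTiered(interpret, compile, threshold) {
    var callCount = 0;
    var compiled = null;       // set once "machine code" is available
    var stats = { interpretedCalls: 0, compiledCalls: 0 };

    function entryPoint() {
        if (compiled !== null) {
            // Entry point has been redirected to the compiled version.
            stats.compiledCalls++;
            return compiled.apply(this, arguments);
        }
        callCount++;
        stats.interpretedCalls++;
        var result = interpret.apply(this, arguments);
        if (callCount >= threshold) {
            // In Chakra this work happens on a separate thread; it is done
            // inline here purely to keep the sketch short.
            compiled = compile();
        }
        return result;
    }
    entryPoint.stats = stats;
    return entryPoint;
}

var add = makeTiered(
    function (a, b) { return a + b; },                          // slow path
    function () { return function (a, b) { return a + b; }; },  // "compiled"
    3                                                           // assumed threshold
);

for (var i = 0; i < 5; i++) { add(i, 1); }
// add.stats is now { interpretedCalls: 3, compiledCalls: 2 }
```

A function called fewer than three times in this model never triggers `compile`, which mirrors the point that rarely called functions are never compiled.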

JavaScript is a managed runtime in that memory management is hidden from the developer and performed by an automatic garbage collector, which runs periodically to clean up any objects that are no longer in use. Chakra employs a conservative, quasi-generational, mark-and-sweep garbage collector that does most of its work concurrently on a dedicated thread to minimize script execution pauses that would interrupt the user experience.

This architecture allows Chakra to start executing JavaScript code almost immediately during page load. On the other hand, during periods of intense JavaScript activity, Chakra can parallelize work and saturate up to three CPU cores by running script, compiling and collecting garbage at the same time.

Fast Page Load Time

Even relatively static Web sites tend to use JavaScript for interactivity, advertising, or social sharing. In fact, the volume of JavaScript included in Alexa’s top 1 million pages has been steadily increasing, as reported by Steve Souders’ HTTP Archive.

Chart showing volume of JavaScript in Alexa’s Top 1 Million Pages
Volume of JavaScript in Alexa’s Top 1 Million Pages

The JavaScript code included in these Web sites must be processed by the browser’s JavaScript engine and the global function of each script file must be executed before the content can be fully rendered. Consequently, it is crucial that the amount of work performed on this critical path be minimized. Chakra’s parser and bytecode interpreter were designed with this objective in mind.

Bytecode Interpreter. JavaScript code executed during page load often performs initialization and setup that is executed only once. To minimize the overall page load time it is imperative to start executing this code immediately – without waiting for a just-in-time compiler to process the code and emit machine instructions. The interpreter starts running JavaScript code as soon as it is translated into bytecode. To further reduce the time to first executed instruction, Chakra processes and emits bytecode only for functions that are about to be executed using a mechanism called deferred parsing.

Deferred Parsing. The JSMeter project from Microsoft Research showed that typical Web pages use only a fraction of the code that they download, generally on the order of 40-50% (see chart). Intuitively, this makes sense: developers often include popular JavaScript libraries like jQuery or dojo or custom ones like those used in Office 365, but only leverage a fraction of the functionality the library supports.

Chart showing the fraction of code executed in 11 popular Web sites. The amount ranges from a little over 30% to a little over 50%.

To optimize such scenarios, Chakra performs only the most basic syntax-only parsing of the source code. The rest of the work (building the abstract syntax tree and generating bytecode) is performed one function at a time only when the function is about to be invoked. This strategy not only helps with the responsiveness of the browser when loading Web pages, but also reduces the memory footprint.
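The idea of deferring the expensive work until a function is first invoked can be imitated in script itself. The sketch below is an analogy, not how Chakra is implemented: the `defer` helper and the `library` object are hypothetical, and the standard `Function` constructor stands in for "build the AST and generate bytecode".

```javascript
// Hypothetical sketch of deferred function compilation; names are illustrative.
function defer(params, bodySource) {
    var compiled = null;
    return function () {
        if (compiled === null) {
            // Full parsing and code generation happen only on first call.
            compiled = new Function(params, bodySource);
            defer.compiledCount++;
        }
        return compiled.apply(this, arguments);
    };
}
defer.compiledCount = 0;

var library = {
    square: defer("x", "return x * x;"),
    cube:   defer("x", "return x * x * x;"),
    negate: defer("x", "return -x;")
};

var nine = library.square(3);   // compiles `square` only
library.square(4);              // reuses the already compiled function
// defer.compiledCount is 1: `cube` and `negate` were never compiled.
```

In this model a page that uses one of three library functions pays the compilation cost for only that one.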

In IE9 there was one limitation of Chakra’s deferred parsing. Functions nested inside other functions had to be parsed immediately with their enclosing functions. This restriction proved important because many JavaScript libraries employ the so-called “module pattern,” in which most of the library’s code is enclosed in a large function that is immediately executed. In IE10 we removed this restriction and Chakra now defers parsing and bytecode generation of any function that is not immediately executed.
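For readers unfamiliar with the module pattern, here is a minimal example. The `MyLib` name and its two functions are invented for illustration. The outer function runs as soon as the script loads, while the nested functions run only if something calls them, which is what makes them candidates for deferral.

```javascript
// Illustrative library using the module pattern; `MyLib` is a made-up name.
var MyLib = (function () {
    var callLog = [];

    // Nested functions: defined when the outer function runs,
    // but not executed until someone calls them.
    function format(name) {
        callLog.push("format");
        return "Hello, " + name;
    }
    function rarelyUsed() {
        callLog.push("rarelyUsed");
        return 42;
    }

    return { format: format, rarelyUsed: rarelyUsed, log: callLog };
}());   // the enclosing function is executed immediately

var greeting = MyLib.format("IE10");
// `rarelyUsed` has been defined but never invoked at this point.
```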

Performance Improvements for JavaScript-Intensive Applications

In IE10, as in IE9 before, we strive to improve the performance of real Web applications. However, Web applications depend on JavaScript performance to a varying degree. To discuss the enhancements in IE10 it’s most useful to focus on those applications which are JavaScript-intensive; where improvements in Chakra yield substantial performance gains. An important class of JavaScript-intensive applications includes HTML5 games and simulations.

At the outset of IE10 development we analyzed a sample of popular JavaScript games (for example, Angry Birds, Cut the Rope, or Tankworld) and simulations (for example, FishIE Tank, HTML5 Fish Bowl, Ball Pool, Particle System) to understand what performance improvements would have the most significant impact on the user experience. Our analysis revealed a number of common characteristics and coding patterns. All of the applications are driven by a high frequency timer callback. Most of them use canvas for rendering, but some rely on animating DOM elements, and some use a combination of the two. In most applications at least portions of the code are written in the object oriented style, either in application code or in included libraries (for example, Box2d.js). Short functions are common, as are frequent property reads and writes, and polymorphism. All of the applications perform floating point arithmetic, and many allocate a fair amount of memory, putting pressure on the garbage collector. These common patterns became the focus of our performance work in IE10. The following sections describe the changes we’ve made in response.

Just-in-Time Compiler – Reconsidered and Improved

IE10 includes substantial improvements to Chakra’s JIT compiler. We added support for two additional processor architectures: x64 and ARM. As a result, whether your JavaScript application is experienced by the user on a 64-bit PC or an ARM-based tablet, it enjoys the benefits of executing directly on the CPU.

We also changed the fundamental approach to generating machine code. JavaScript is a very dynamic language, which limits how much a compiler can know when generating code. For example, when compiling the function below, the compiler doesn’t know the shape (property layout) of the objects involved or types of their properties.

function compute(v, w) {
    return v.x + w.y;
}

In IE9 Chakra’s compiler generated code that located every property at runtime and handled all plausible operations (in the example above: integer addition, floating point addition, or even string concatenation). Some of these operations were handled directly in machine code, while others required help from Chakra’s runtime.

In IE10, the JIT compiler generates profile-based, type-specialized machine code. In other words, it generates machine code that is tailored to objects of a particular shape and values of a particular type. To emit the right code the compiler needs to know what types of input values to expect. Because JavaScript is a dynamic language, this information is not available in the source code. We enhanced Chakra’s interpreter to collect it at runtime, a technique we call dynamic profiling. When a function is scheduled for JIT compilation, the compiler examines the runtime profile gathered by the interpreter and emits code tailored to the expected inputs.

The interpreter gathers information for the runs it observes, but it’s possible that the execution of the program will lead to runtime values which violate assumptions made in the generated optimized code. For every assumption it makes, the compiler emits a runtime check. If a later execution results in an unexpected value, the check fails, execution bails out of the specialized machine code, and is continued in the interpreter. The reason for bailout (the failed check) is recorded, additional profile information is collected by the interpreter, and the function is recompiled with different assumptions. Bailout and re-compilation are two fundamentally new capabilities in IE10.
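The check-then-bail-out structure can be modeled in plain JavaScript. This is a hand-written analogy, not generated machine code: `specializedAdd`, `genericAdd`, and the `bailouts` counter are hypothetical names, and a real bailout resumes in the interpreter rather than calling another function.

```javascript
// Hypothetical model of type-specialized code with a bailout; not Chakra internals.
var bailouts = 0;

// Generic path: handles numbers, strings, and anything else `+` accepts.
function genericAdd(v, w) {
    return v.x + w.y;
}

// "Specialized" path: compiled under the assumption that both
// properties hold numbers, guarded by a runtime check.
function specializedAdd(v, w) {
    if (typeof v.x !== "number" || typeof w.y !== "number") {
        bailouts++;                 // assumption violated: record and fall back
        return genericAdd(v, w);
    }
    return v.x + w.y;               // fast path under the assumption
}

var fast = specializedAdd({ x: 1.5 }, { y: 2.25 });   // 3.75, no bailout
var slow = specializedAdd({ x: "a" }, { y: "b" });    // "ab", one bailout
```

The result is correct on both paths. Only the cost differs, which is why a violated assumption is a performance event rather than an error.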

The net effect is that Chakra’s IE10 compiler generates fewer machine instructions for your code, reducing the overall memory footprint and speeding up execution. This particularly impacts apps with floating point arithmetic and object property access, like the HTML5 games and simulations we previously discussed.

If you write JavaScript code in the object oriented style, your code will also benefit from Chakra’s support for function inlining. Object oriented code commonly contains a large proportion of relatively small methods, for which the overhead of the function call is significant compared to the execution time of the function. Function inlining allows Chakra to reduce this overhead, but more importantly it greatly expands the scope of other traditional compiler optimizations, such as loop invariant code motion or copy propagation.
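To make the effect of inlining concrete, the two functions below compute the same result. The first calls a small method on every iteration. The second shows, written by hand, roughly what the loop looks like once the call has been inlined. The `Point` type and function names are illustrative, and the exact transformations Chakra applies are not shown in this post.

```javascript
// Before: each iteration pays for a call to a tiny method.
function Point(x, y) { this.x = x; this.y = y; }
Point.prototype.lengthSquared = function () {
    return this.x * this.x + this.y * this.y;
};

function sumLengths(points) {
    var total = 0;
    for (var i = 0; i < points.length; i++) {
        total += points[i].lengthSquared();
    }
    return total;
}

// After: what the loop conceptually looks like once the call is inlined.
// With the method body visible in the loop, further optimizations
// (for example, hoisting invariant work) become possible.
function sumLengthsInlined(points) {
    var total = 0;
    for (var i = 0; i < points.length; i++) {
        var p = points[i];
        total += p.x * p.x + p.y * p.y;
    }
    return total;
}

var pts = [new Point(3, 4), new Point(1, 2)];
// Both functions return 30 for `pts`.
```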

Faster Floating Point Arithmetic

Most JavaScript programs perform some amount of integer arithmetic. As the example below illustrates, even in programs that don’t focus primarily on arithmetic, integer values are commonly used as iteration variables in loops or as indices into arrays.

function findString(s, a) {
    for (var i = 0, al = a.length; i < al; i++) {
        if (a[i] == s) return i;
    }
    return -1;
}

Floating point math, on the other hand, is typically restricted to certain classes of applications such as games, simulations, sound, image or video processing, etc. Historically, few such applications were written in JavaScript, but recent advances in browser performance have made JavaScript implementations viable. In IE9 we optimized Chakra for the more common integer operations. In IE10 we dramatically improved floating point math.

function compute(a, b, c, d) {
    return (a + b) * (c - d);
}

Given the simple function above, a JavaScript compiler cannot determine the types of arguments a, b, c and d from the source code. The IE9 compiler would assume that the arguments were likely to be integer numbers and generate fast integer machine instructions. This worked very well if during execution the arguments were, indeed, integers. If floating point numbers were used instead, the code had to rely on much slower helper functions in Chakra’s runtime. The overhead of function calls was further exacerbated by boxing and unboxing of intermediate values on the heap (in most 32-bit JavaScript engines, including Chakra, individual floating point values must be allocated on the heap). In the expression above the result of each operation required a heap allocation, followed by storing the value on the heap, and then retrieval of the value from the heap for the next operation.

In IE10, the compiler takes advantage of the profile information collected by the interpreter to generate dramatically faster floating point code. In the example above, if the profile indicates that all arguments are likely to be floating point numbers, the compiler will emit floating point machine instructions. The entire expression will be computed in just three machine instructions (assuming all arguments are already in registers), all intermediate values will be stored in registers, and only one heap allocation will be required to return the final result.
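The difference in heap traffic can be counted with a toy model. Here a `box` helper (hypothetical, as are both `compute...` variants) stands in for allocating a floating point value on the heap, and local variables stand in for registers.

```javascript
// Hypothetical model of boxed vs. unboxed floating point evaluation.
var allocations = 0;
function box(value) {
    allocations++;               // one simulated heap allocation
    return { value: value };
}

// Every intermediate result is boxed.
function computeBoxed(a, b, c, d) {
    var sum = box(a + b);
    var diff = box(c - d);
    return box(sum.value * diff.value);
}

// Intermediates stay in local variables ("registers");
// only the final result is boxed.
function computeUnboxed(a, b, c, d) {
    var sum = a + b;
    var diff = c - d;
    return box(sum * diff);
}

allocations = 0;
var r1 = computeBoxed(1.5, 2.5, 4.0, 1.0);
var boxedAllocations = allocations;      // 3

allocations = 0;
var r2 = computeUnboxed(1.5, 2.5, 4.0, 1.0);
var unboxedAllocations = allocations;    // 1
```

Both variants produce the same value. The unboxed one simply leaves two fewer short-lived objects behind for the garbage collector.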

For floating point intensive applications this is a massive performance gain. Experiments show that in IE10 floating point operations execute about 50% faster than in IE9. In addition, the reduced rate of memory allocation means fewer garbage collections.

Faster Objects and Property Access

JavaScript objects are a convenient and broadly used mechanism for grouping logically related sets of values. Whether you’re using JavaScript objects in a structured object oriented programming style or merely as flexible packaging for values, your code will greatly benefit from the improvements in object allocation and property access performance added in IE10.

As mentioned earlier, efficient property access is complicated in JavaScript because the shape of an object isn’t known during compilation. JavaScript objects can be created ad hoc without a predefined type or class. New properties can be added to (or even removed from) objects on the fly and in any order. As a result, when compiling the following method, the compiler doesn’t know where to find the values of properties x, y, and z on the Vector object.

Vector.prototype.magnitude = function() {
    return Math.sqrt(this.x * this.x + this.y * this.y + this.z * this.z);
}

In IE9 we introduced inline caches which greatly speed up access to properties. Inline caches remember the shape of the object and the location in the object’s memory where a given property can be found. Inline caches can remember only one object shape and work well if all objects a function works with are of the same shape. In IE10 we added a secondary caching mechanism which improves performance of code operating on objects of different shapes (polymorphic).
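A single-shape inline cache can be modeled as follows. Everything here is an assumption made for illustration: `Shape`, `makeObject`, and `makeCachedGetter` are invented names, and objects are represented as an explicit shape plus a slot array, which engines do internally but scripts normally never see.

```javascript
// Hypothetical model of a monomorphic inline cache; not Chakra's data structures.
function Shape(names) {
    this.offsets = {};
    for (var i = 0; i < names.length; i++) { this.offsets[names[i]] = i; }
}
function makeObject(shape, values) {
    return { shape: shape, slots: values };
}

function makeCachedGetter(name) {
    var cachedShape = null, cachedOffset = -1;
    var getter = function (obj) {
        if (obj.shape === cachedShape) {        // shape check: cache hit
            getter.hits++;
            return obj.slots[cachedOffset];
        }
        getter.misses++;                        // slow lookup, then remember it
        cachedShape = obj.shape;
        cachedOffset = obj.shape.offsets[name];
        return obj.slots[cachedOffset];
    };
    getter.hits = 0;
    getter.misses = 0;
    return getter;
}

var xy = new Shape(["x", "y"]);
var yx = new Shape(["y", "x"]);
var getX = makeCachedGetter("x");

getX(makeObject(xy, [1, 2]));   // miss: cache is empty
getX(makeObject(xy, [3, 4]));   // hit: same shape
getX(makeObject(yx, [5, 6]));   // miss: a different shape replaces the cached one
```

The third call shows the weakness of a one-entry cache when shapes alternate, which is the polymorphic case a secondary cache is meant to help with.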

Before a property value can be read the compiler must verify that the object’s shape matches that stored in the inline cache. To do that, in IE9, the compiler generates a runtime shape check before every property access. Because programs often read or write multiple properties of the same object in close succession (as in the example below), all these checks add overhead.

function collide(b1, b2) {
    var dx = b1.x - b2.x;
    var dy = b1.y - b2.y;
    var dvx = b1.vx - b2.vx;
    var dvy = b1.vy - b2.vy;
    var distanceSquare = (dx * dx + dy * dy) || 1.0;
    //...
}

In IE10, Chakra generates code tailored to the expected object shape. Through careful symbol tracking combined with bailout and re-compilation capabilities the new compiler dramatically reduces the number of runtime shape checks performed. In the example above, instead of 8 separate shape checks, only 2 are done, one each for b1 and b2. In addition, once the shape of an object has been established, all property locations are known, and read or write operations are as efficient as in C++.
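The 8-versus-2 count can be reproduced with a small model in which shape checks are explicit function calls. The `checkShape` helper, the `BALL` marker object, and both `collide...` variants are hypothetical. In a real engine a failed check would trigger a bailout rather than throw.

```javascript
// Hypothetical model counting shape checks; `checkShape` is illustrative only.
var shapeChecks = 0;
function checkShape(obj, expected) {
    shapeChecks++;
    return obj.shape === expected;
}
var BALL = { name: "ball" };   // stand-in for an engine-internal shape
function ball(x, y, vx, vy) {
    return { shape: BALL, x: x, y: y, vx: vx, vy: vy };
}

// IE9-style: one check in front of every property read.
function readChecked(obj, name) {
    if (!checkShape(obj, BALL)) { throw new Error("unexpected shape"); }
    return obj[name];
}
function collidePerAccess(b1, b2) {
    var dx = readChecked(b1, "x") - readChecked(b2, "x");
    var dy = readChecked(b1, "y") - readChecked(b2, "y");
    var dvx = readChecked(b1, "vx") - readChecked(b2, "vx");
    var dvy = readChecked(b1, "vy") - readChecked(b2, "vy");
    return [dx, dy, dvx, dvy];
}

// IE10-style: check each object once, then read freely.
function collideHoisted(b1, b2) {
    if (!checkShape(b1, BALL) || !checkShape(b2, BALL)) {
        throw new Error("unexpected shape");   // a real engine would bail out
    }
    return [b1.x - b2.x, b1.y - b2.y, b1.vx - b2.vx, b1.vy - b2.vy];
}

var a = ball(5, 7, 1, 0), b = ball(2, 3, 0, 1);
shapeChecks = 0; collidePerAccess(a, b); var perAccess = shapeChecks;  // 8
shapeChecks = 0; collideHoisted(a, b);   var hoisted = shapeChecks;    // 2
```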

In ECMAScript 5 objects may contain a new kind of property, called an accessor property. Accessor properties differ from traditional data properties in that custom get and set functions are invoked to handle the read and write operations. Accessor properties are a convenient mechanism for adding data encapsulation, computed properties, data validation, or change notification. Chakra’s internal type system and inline caches were designed to accommodate accessor properties and facilitate efficient reading and writing of their values.
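As a brief illustration, an accessor property with validation and a simple change counter might look like this. It uses the standard ES5 `Object.defineProperty` API. The `account` object and its fields are made up for the example.

```javascript
// Accessor property defined with the standard ES5 Object.defineProperty API.
var account = { _balance: 0, changes: 0 };   // `account` is an illustrative object

Object.defineProperty(account, "balance", {
    get: function () { return this._balance; },
    set: function (value) {
        // Data validation on every write.
        if (typeof value !== "number" || value < 0) {
            throw new RangeError("balance must be a non-negative number");
        }
        this._balance = value;
        this.changes++;            // simple change notification
    },
    enumerable: true
});

account.balance = 100;             // runs the setter
var current = account.balance;     // runs the getter, yields 100
```

To calling code, `account.balance` reads and writes look exactly like ordinary property access, even though a function runs each time.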

If you write an HTML5 game or animation, you often need a physics engine which performs the computations required to produce realistic movement of objects under the force of gravity, simulate collisions, etc. For very simple physics, you may build your own engine, but for more complex requirements, you would typically use one of the popular physics libraries now available in JavaScript, such as Box2d.js (ported from Box2d). These libraries often use small objects, such as Point, Vector or Color. On every animation frame a large number of these objects are created and promptly discarded. Therefore, it’s important that the JavaScript runtime create objects efficiently.

var Vector = function(x, y, z) {
    this.x = x;
    this.y = y;
    this.z = z;
}

Vector.prototype = {
    //...
    normalize : function() {
        var m = Math.sqrt((this.x * this.x) + (this.y * this.y) + (this.z * this.z));
        return new Vector(this.x / m, this.y / m, this.z / m);
    },

    add : function(v, w) {
        return new Vector(w.x + v.x, w.y + v.y, w.z + v.z);
    },

    cross : function(v, w) {
        return new Vector(-v.z * w.y + v.y * w.z, v.z * w.x - v.x * w.z, -v.y * w.x + v.x * w.y);
    },
    //...
}

In IE10, the internal layout of JavaScript objects is optimized to streamline object creation. In IE9 every object consisted of a fixed-size header and an expandable property array. The latter is necessary to accommodate additional properties that may be added after the object has been created. Not all JavaScript applications exploit this flexibility, and objects often receive most of their properties at construction. This trait allows Chakra to allocate most of the properties for such objects directly with the header, which results in only one memory allocation (instead of two) for every newly created object. This change also reduces the number of memory dereferences required to read or write the object’s property, and improves register utilization. Improved object layout and fewer runtime shape checks result in up to 50% faster property access.

Garbage Collection Enhancements

As discussed above, HTML5 games and animations often create and discard objects at a very high rate. JavaScript programs don’t explicitly destroy discarded objects and reclaim memory. Instead, they rely on the engine’s garbage collector to periodically reclaim memory occupied by unused objects to make room for new ones. Automatic garbage collection makes programming easier, but typically requires JavaScript execution to pause every now and then for the collector to do its work. If the collector takes a long time to run, the whole browser may become unresponsive. In HTML5 games, even short pauses (tens of milliseconds) are disruptive because they are perceptible by the user as glitches in animation.

In IE10 we made a number of enhancements to our memory allocator and garbage collector. We already discussed object layout changes and generation of machine code specialized for floating point arithmetic, which result in fewer memory allocations. In addition, Chakra now allocates leaf objects (for example, numbers and strings) from a separate memory space. Leaf objects don’t hold pointers to other objects, so they don’t require as much attention during garbage collection as regular objects. Allocating leaf objects from a separate space has two advantages. First, this entire space can be skipped during the mark phase, which reduces its duration. Second, during concurrent collection, new allocations from the leaf object space don’t require rescanning affected pages. Because Chakra’s collector works concurrently with the main script thread, the running script may modify or create new objects on pages that have already been processed. To make sure such objects aren’t prematurely collected, Chakra write-protects pages before the mark phase starts. Pages that have been written to during the mark phase must be later rescanned on the main script thread. Because leaf objects don’t require such processing, pages from the leaf object space don’t need to be write-protected or rescanned later. This saves precious time on the main script thread, reducing pauses. HTML5 games and animations benefit significantly from this change, because they often work heavily with floating point numbers and devote much of the allocated memory to heap-boxed numbers.
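The benefit of knowing that an object is a leaf can be seen in a toy mark phase. This sketch is a deliberately simplified model: `markFrom`, `leaf`, and `node` are hypothetical, the mark set is a plain array, and nothing about separate memory spaces, concurrency, or write protection is simulated. It only shows that leaves never need to be scanned for outgoing pointers.

```javascript
// Hypothetical model of a mark phase that never scans leaf objects.
function markFrom(roots) {
    var scanned = 0;
    var marked = [];             // ES5-friendly stand-in for a mark bitmap
    var stack = roots.slice();
    while (stack.length > 0) {
        var obj = stack.pop();
        if (marked.indexOf(obj) !== -1) { continue; }   // already visited
        marked.push(obj);
        if (obj.leaf) { continue; }     // leaf: reachable, but nothing to scan
        scanned++;
        for (var i = 0; i < obj.refs.length; i++) { stack.push(obj.refs[i]); }
    }
    return { marked: marked.length, scanned: scanned };
}

function leaf(value) { return { leaf: true, value: value }; }
function node(refs) { return { leaf: false, refs: refs }; }

// A sprite holding a position made of three heap-boxed numbers plus a name.
var position = node([leaf(1.5), leaf(2.5), leaf(3.5)]);
var sprite = node([position, leaf("ship")]);
var result = markFrom([sprite]);
// result is { marked: 6, scanned: 2 }
```

Of the six reachable objects in this model, only two require scanning, which hints at why number-heavy applications benefit.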

When the user interacts directly with a Web application, it is critical that the application’s code be executed as fast as possible, ideally without interruptions for garbage collection. However, when the user switches away from the browser, or even just changes tabs, it is important to reduce the memory footprint of the now inactive site or application. That’s why in IE9 Chakra triggered collection upon exiting JavaScript code if enough memory had been allocated. This worked well for most applications, but proved problematic for applications driven by high frequency timers, such as HTML5 games and animations. For such applications collections were triggered too frequently and resulted in dropped frames and overall degradation of the user experience. Perhaps the most apparent manifestation of this problem was the Tankworld game, but other HTML5 simulations also exhibited pauses in animation induced by frequent garbage collections.

In IE10 we solved this problem by coordinating garbage collections with the rest of the browser. Chakra now delays the garbage collection at the end of script execution and requests a callback from the browser after an interval of script inactivity. If the interval elapses before any script executes, Chakra starts a collection, otherwise collection is further postponed. This technique permits us to shrink memory footprint when the browser (or one of its tabs) becomes inactive, while at the same time greatly reducing frequency of collections in animation-driven applications.
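The scheduling policy can be sketched with a simulated clock. The `IdleCollector` type, its method names, the 100 ms idle interval, and the 16 ms frame timing are all assumptions chosen for illustration. The post does not state the actual interval or the browser interface involved.

```javascript
// Hypothetical model of idle-time collection scheduling; times are in milliseconds.
function IdleCollector(idleInterval) {
    this.idleInterval = idleInterval;   // assumed value; the post gives none
    this.deadline = null;               // when the pending callback would fire
    this.collections = 0;
}
// Script finished running: ask for a callback instead of collecting now.
IdleCollector.prototype.onScriptExit = function (now) {
    this.deadline = now + this.idleInterval;
};
// Script started running again before the callback fired: postpone.
IdleCollector.prototype.onScriptEnter = function () {
    this.deadline = null;
};
// The simulated clock advanced; fire the callback if the interval elapsed.
IdleCollector.prototype.advanceTo = function (now) {
    if (this.deadline !== null && now >= this.deadline) {
        this.collections++;
        this.deadline = null;
    }
};

var gc = new IdleCollector(100);
// An animation: a timer callback runs every 16 ms for 10 frames,
// and each callback takes about 4 ms of script time.
for (var t = 0; t < 160; t += 16) {
    gc.advanceTo(t);
    gc.onScriptEnter();
    gc.onScriptExit(t + 4);
}
var duringAnimation = gc.collections;   // 0: script never stays idle long enough
gc.advanceTo(1000);                     // the page goes quiet, e.g. a tab switch
var afterIdle = gc.collections;         // 1
```

While frames keep arriving the deadline is always pushed back, so no collection interrupts the animation. Once script stops, a single collection runs.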

Combined, these changes reduced the time spent in garbage collection on the main thread by an average factor of four on the HTML5 simulations measured. As a proportion of JavaScript execution time, garbage collection dropped from around 27% to about 6%.

Summary

IE10 achieves dramatic performance gains for JavaScript-intensive applications, particularly HTML5 games and simulations. These gains were accomplished through a range of important improvements in Chakra: from new fundamental capabilities of the JIT compiler to changes in the garbage collector.

As we wrap up development on IE10 we celebrate the progress we’ve made, but we are keenly aware that performance is a perpetual quest. New applications emerge almost daily that test the limits of modern browsers and their JavaScript engines. Without a doubt there will be plenty to work on in the next release!

If you’re a JavaScript developer, we’d love to hear from you. If the new capabilities and performance advances in IE10 helped you create entirely new experiences for your users, or make existing applications better, please let us know. If you’ve hit any performance limitations in IE, please drop us a note as well. We carefully read all the comments on this blog, and we strive to make IE10 and Windows 8 the most comprehensive and performant application platform available.

—Andrew Miadowicz, Program Manager, JavaScript
