To deliver on the industry promise of “same markup,” browsers must implement features in the same way. Open Web standards and their associated public test suites provide the means for making sure this happens. Using the same process for tests as for the standards themselves is so important that Microsoft contributes to this industry effort in many ways, such as submitting conformance tests to the W3C.
In addition, we believe it’s important for these tests to be comprehensive and test the depth and quality of standards support, not just the existence of features. Compliance tests from standards bodies do that.
We’ve written about these tests before. They come from the W3C working groups that write the specifications. The purpose of the test suite is to ensure the specification is complete and has two conforming implementations, the requirement for a W3C specification to become a Recommendation.
Examples of W3C conformance test suites:
The HTML5 Test Suite is particularly noteworthy given the excitement around HTML5, and we’ll blog more about it next week.
Many different types of tests exist beyond those created by the W3C and take various approaches. You’ve probably seen many of these already. While not an exhaustive list, here are a few examples with descriptions of what they test and how.
Some tests check for just the existence of features. Examples include The HTML5 Test, “When can I use…,” and findmebyIP. They perform a basic feature check, but don’t test the depth of an implementation or how well the feature’s behavior matches the standard. These tests rarely reflect the current status of a specification. For example, among the sites listed above, only “When can I use…” lets you filter the view to show only features with a chosen W3C specification status.
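To illustrate the distinction, here is a minimal sketch of the kind of existence check these sites typically run (the `hasFeature` helper is ours, for illustration only): it confirms an API is present, but says nothing about whether the API’s behavior matches the specification.

```javascript
// Sketch of a feature-existence check: verifies an API exists
// on an object, without testing what it actually does.
function hasFeature(obj, name) {
  return typeof obj[name] === "function" || name in obj;
}

const results = {
  "JSON.parse": hasFeature(JSON, "parse"),               // true in any modern engine
  "Array.prototype.forEach": hasFeature(Array.prototype, "forEach"),
};
console.log(results);
```

A check like this would report “supported” even for a buggy or partial implementation, which is exactly the limitation described above.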
Some tests in this category compute a total score, though there is no consistent approach to calculating one. While a single score is easy to communicate, it doesn’t help people understand what the score means. When reading these tests, it’s important to see which features are and are not included and to what depth, rather than depending only on a final score.
These tests attempt to verify more than whether a feature exists, and typically focus on a single specification. They’re usually developed by various people outside the W3C. They can be useful when the W3C working group hasn’t finished their test suite or as additional testing, but there is no assurance they represent the entire specification or are otherwise complete.
These suites use test cases to create compatibility information so Web developers can understand the behavior of certain features across different browsers. Similar to the Feature Existence tests, they check support for a feature but typically test the implementation’s correctness to some degree. This is helpful when you need information on a specific element, property, or other feature. Like the previous group of tests, these don’t benefit from the working group’s review and evaluation, and it isn’t clear what they actually test.
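The difference between an existence check and a correctness check can be shown in a few lines, using `JSON.parse` (one of the features mentioned below) as the example; this is our own illustration, not a test from any of these suites.

```javascript
// Existence check: the function is present.
const exists = typeof JSON.parse === "function";

// Behavior check: one spec-defined behavior is verified.
// Per ECMAScript, parsing this string must yield an object
// whose property "a" has the number value 1.
let behaves = false;
try {
  behaves = JSON.parse('{"a": 1}').a === 1;
} catch (e) {
  behaves = false;
}

console.log({ exists, behaves });
```

A compatibility table built from behavior checks like the second one tells a developer more than a bare existence check, but it still covers only the behaviors someone chose to test.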
Some often-cited tests are a mix of the kinds described above.
Acid3, for example, is unique in that it tests about 100 fragments of a dozen different technologies. Some parts check for the existence of a feature, some check for specific behavior of a feature, and others check for behaviors where browsers were known to differ when the test was created. Acid3 gives each test one point and computes a final score based on the number of tests passed. While people often communicate the score, it’s unclear what that score represents given the variety of tests included.
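An Acid3-style score can be sketched as follows (this is our illustration of the scoring model, not the actual Acid3 harness or its subtests): each subtest contributes one point, and the score is simply the count of passing subtests, regardless of what each one measures.

```javascript
// One point per subtest; the score is the number of passes.
function score(subtests) {
  return subtests.filter(fn => {
    try { return fn() === true; } catch (e) { return false; }
  }).length;
}

// Hypothetical subtests mixing the kinds of checks described above.
const subtests = [
  () => typeof JSON.parse === "function",   // existence check
  () => JSON.parse("[1,2]").length === 2,   // behavior check
  () => false,                              // a subtest this engine fails
];
console.log(`${score(subtests)}/${subtests.length}`); // → 2/3
```

Because an existence check and a deep behavior check each count for the same single point, the final number says little about what was actually tested.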
Browserscope is a collection of tests in many areas, ranging from security and network behavior to RichText and Selectors API. Many of the tests check if a feature exists, as in the tests for postMessage and JSON.parse. Other tests check for specific behavior. For example, the RichText section tests many small behaviors in different categories. Various groups participate in creating these tests, but there’s no assurance they represent an entire specification and they don’t benefit from a working group’s evaluation.
—John Hrvatin, Lead Program Manager, Internet Explorer