Dave Massy's Blog

Bad HTML

I just posted about the Internet Explorer focus on backwards compatibility over on the IE team blog:

http://weblogs.asp.net/ie/archive/2004/10/15/243074.aspx

There's plenty of bad HTML on the web today, and by that I mean a lack of closing tags, overlapping tags and implied tags. 

I thought I'd add some historical perspective from the back of my mind.

I remember back in 1996, when we started coding up the then-new Trident rendering engine for Internet Explorer 4. At that time there was already lots of bad HTML content on the web. Indeed, there was probably a higher percentage of bad content then, as there were fewer good HTML editing tools available. We knew that if we couldn't render existing content on the internet, our browser would immediately be rejected by our customers. So we built in a tolerance for bad HTML. I particularly recall our developers pulling their hair out when it came to matching the table rendering algorithm of the then-dominant Netscape browser.

The fact is that content on the internet can and does live forever and any browser must continue to be able to render that content. Does that mean we should encourage such content? Clearly not, but it's important that developers know they can rely on rendering behavior not changing as the internet evolves.

Published Friday, October 15, 2004 4:36 PM by DMassy

Comments

# re: Bad HTML @ Friday, October 15, 2004 4:41 PM

It would be nice if broken rendering STOPPED being broken though. Consistency is nice; consistently broken isn't. How about an option in settings, "I want things to be non-compliant" or somesuch? There are so many good things about IE that it sucks to see the bad things still there (PNG alpha, anyone?)

Brian Hampson

# re: Bad HTML @ Friday, October 15, 2004 4:50 PM

Hey Brian,
We have something similar with IE6 and the strict doctype flag for CSS, although I admit that this could go further by enforcing strict compliance rather than just rendering the CSS differently. It's certainly something to think about as we move forward. It's never quite as simple as it seems, though, as changing the parser, which is central to the engine, can easily impact issues such as performance, which we know we also need to look at as we move forward. PNG alpha, as I've blogged about before, is an even trickier issue due to the way the display tree works, but it is certainly something we'd all like to address.
-Dave

Dave Massy

# re: Bad HTML @ Friday, October 15, 2004 5:47 PM

I don't really agree that you need THAT level of backward compatibility. Old content on the web needs to continue to be ACCESSIBLE (readable), not necessarily rendered the exact same way over time. I think that's a more reasonable standard. Otherwise, you force the bugs of older versions to become design goals of future versions.

Stephen Duncan Jr

# re: Bad HTML @ Friday, October 15, 2004 8:37 PM

There's a real simple solution: when you get bad HTML, display the page, with a warning ... like the new popup blocking warning. Nobody will mind it and eventually the site authors will feel like they have to get their sites in shape.

Joel Spolsky

# re: Bad HTML @ Friday, October 15, 2004 9:41 PM

Right... just some kind of warning. Then every one who wants good clean content can get on with producing it.

At the risk of getting personal, I really feel like the IE team's ethos of backward compatibility is the wrong road to take. I know, I know, we're just the *little* guys with no say, you really should start listening to the community. Ultimately it is they (us) that you serve... right?

matt

# re: Bad HTML @ Saturday, October 16, 2004 2:04 AM

CypherXero: You obviously care since you commented.

I know this has been said before but please consider this: If you find a page that is sent as application/*+xml (especially XHTML), text/xml or application/xml, go to a super strict rendering mode where no quirks are present (at least strive for that) and a DOM that follows DOM 3 (and do not allow document.all, global id mapping and other abominations). Since IE does not support XHTML or XML+CSS today, this cannot break any existing sites/applications.

Erik Arvidsson

# re: Bad HTML @ Saturday, October 16, 2004 4:35 AM

I have to agree with Joel. There needs to be some sort of motivation for the people to fix their pages and make them be HTML compliant. Give the users a warning whenever they see a page that isn't valid. Eventually people will get it and either change or realize that the person authoring the site doesn't care enough to make their content viewable.

Aaron Weiker

# re: Bad HTML @ Saturday, October 16, 2004 9:50 AM

I just deleted a comment because it used offensive language. I have a zero tolerance policy when it comes to this. Any comment that uses offensive language will be deleted.
I want to make it clear that I will not delete comments because I disagree with them. However anything offensive or totally off topic will get the delete key treatment.
Thanks
-Dave

Dave Massy

# re: Bad HTML @ Saturday, October 16, 2004 9:59 AM

It's an interesting idea to consider putting up the goldbar on "bad HTML", but there are plenty of issues to consider.

First, there would be the user issue of "I was using my favorite site "XYZ.COM" yesterday just fine, and now with the new IE I always get this warning thing at the top of the page". There would be a certain class of user confused and upset by this. Perhaps the solution is not to have the HTML warning on by default, but that would defeat the purpose of forcing change on the Web.

Second, there is the issue of fixing sites. Imagine, for example, department level corporate intranet sites, small hobbyist sites, elementary school sites put up by classrooms, etc. They may use tools to generate their pages, and have no idea or real care that the tool generates "bad HTML". So the user would have to get a new version of the tool that doesn't generate this bad HTML and republish their pages.

These people may or may not have the resources to go fix historical HTML content on their sites to avoid the goldbar. Forcing change on this category of user doesn't do them any favors.

Third, there's the issue of what's bad enough to get a goldbar and what's not. While it's easy to say something like "anything that's non-standard or won't validate on W3C", it's another thing to implement it. There is plenty of gray area to keep us busy for quite a while.

For example, I've heard it said that the "alt" attribute is required for the "img" element. Imagine how many goldbars there would be if IE warned on every page that contained an img tag without an alt. Same for overlapping tags or extraneous end tags and the like. They may not be "valid", but they're generally harmless from a parsing and/or rendering perspective. While I'm sure we could come up with a reasonable compromise, we then have a situation where there's "the standard", and then there's what IE displays without warning (looser than the standard), and then what IE warns about but renders the same anyway. That may be improvement enough for some but probably not for others.

Last, I think one of the great things about HTML is that it's so approachable. A casual user can do a relatively credible job on a web page with a thin HTML reference plus a little time, imagination, and effort. My ten year old can code a relatively functional webpage via almost trial and error, and she's quite proud of the result. A lot of the ease of this sort of development comes from the forgiving characteristics of most browsers. I certainly wouldn't want to lose that approachability for casual web page creators.

Bruce Morgan [MSFT]

# re: Bad HTML @ Saturday, October 16, 2004 11:36 AM

Bruce,

For point 1, it's obvious it should be activated on demand (like the Web Developer extension in FF), but that kind of tool helps a lot during web development; it's a lot better than the "it looks good in IE so my code is good" scheme.

For point 2 I agree: HTML editors are not doing their job; they don't produce valid HTML at all. Some tools are better than others for specialized usage (for example wikis, blogs...), but a simple and affordable solution for managing and publishing content on a web site while respecting standards doesn't exist. That's a huge issue, because web browsers are not strict and web editing software is lazy, to put it politely. Just think about delivering web content as XHTML, I mean application/xhtml+xml: it changes a lot of things. Editors *must* produce valid code, and browsers *must* discard any document not conforming to a schema. That makes sense, as with any programming or description language: strict respect of the grammar and semantics.
There's something to do in this area, but it's not only a matter of web browsers; it also concerns HTML/web editing software.

I disagree on point 3. Using your example of goldbar and the alt attribute, it already exists. Use any non visual navigator and you have your gold bars:

[spacer.gif][banner.swf][spacer.gif]
Welcome on XXX web site
[spacer.gif]
[spacer.gif]
[spacer.gif]
[star.gif] News [spacer.gif]
[star.gif] Products [spacer.gif]
[star.gif] About us [spacer.gif]
...

And that's the utopian case; most of the time it's "Sorry your browser is not supported please upgrade to IE 4 or NS 4 and install Flash player and Quick time..."

This is what you see/hear on most web sites (try Home Page Reader with your screen switched off, or lynx, for example). It's "harmless" when you're using a visual web browser, but when you think about accessibility it's a *major* concern.

By the way if the IE team can fix the object behaviour according to W3C's specs it will be a good thing since this html tag is *fundamental* to allow a good accessibility approach.

Thanks for reading my bad English.

François Battail

# re: Bad HTML @ Saturday, October 16, 2004 1:08 PM

How many times does Anne van Kesteren have to mention this solution?

http://annevankesteren.nl/archives/2004/06/standard-compliant-ie

That way the standards people can have what they want and you can still maintain backwards compatibility. Am I missing something?

Dean Edwards

# re: Bad HTML @ Saturday, October 16, 2004 2:07 PM

Bruce, making HTML is *VERY* difficult. The approachability you're talking about results in tag soup, not HTML. These are two very different things.

[unknOwn]

# re: Bad HTML @ Saturday, October 16, 2004 3:39 PM

I agree with Joel except that I think the warning should be more annoying--something that would prompt sloppy Web developers to fix their code. If only there could be a pop-up window with the picture of and contact information for the Web developer next to a heading like "Lazy, Sloppy Web Developer"--then we might see things get cleaned up very quickly (unless the developers are no longer in charge of those pages, of course).

I have to disagree with you, though, Dave, about it being important for developers to know that they can count on rendering behavior not changing. I think invalid markup should be treated as such regardless of how it was treated in the past. Disallowing invalid markup is not what breaks the Web; allowing it is.

Brian Sexton

# re: Bad HTML @ Saturday, October 16, 2004 5:08 PM

Also, how come Presto (Opera), Gecko (Mozilla, Firefox), KHTML (Konqueror) and whatever's used in Safari don't break anything?

[unknOwn]

# re: Bad HTML @ Saturday, October 16, 2004 6:22 PM

One idea: There's already [in SP2] an Outlook-style strip that appears when IE blocks a pop-up; rather than popping up a dialog warning that bad HTML was used (which would then be dismissed and never seen again), how about putting a message there, instead?

I have to agree with Brian, though - depending on undefined behaviour is just wrong. By continuing to support it, web "developers" continue to churn out poor approximations of HTML, safe in the knowledge that IE will continue to render whatever rubbish they throw at it forever. They still do it today. Personally, I take pride in producing (at the very least) valid XHTML 1.0; but I know that I'm in a minority in that respect.

The 'warning strip' could, in typical fashion, be hyperlinked with "Click here for more information", which would take you to a page which read something like this:

"The site you are visiting does not conform to web standards. Content may not be rendered correctly in this or future versions of Internet Explorer. If you experience problems, please contact the webmaster of the site for assistance."

Do this, but keep the rendering engine status quo; this should give fair warning to just about everybody that at some point, IE will stop accepting non-standard stuff for the sake of backwards compatibility. Don't do it now, but do it in a couple of versions' time, once everybody knows it's coming. Nobody will have any excuse.

Mo

# re: Bad HTML @ Saturday, October 16, 2004 6:23 PM

Heh, in all the comments, I missed Joel's. Apologies for the dupe, there, then. :)

Mo

# re: Bad HTML @ Saturday, October 16, 2004 6:36 PM

Also, I hate to point it out, but producing valid XHTML is *easy*, even when accounting for backwards compatibility with HTML 4.

Yes, even a ten year-old can be taught about proper nesting and contracted empty tags, *provided there's something there to point out when she's going wrong*. IE makes that learning process *more* difficult, because it's so forgiving.

Yes, the results of an unstyled structured document don't look very flashy - but they work, they're clear, they're unambiguous, and they're not difficult to *make* flashy without breaking everything.

Mo

# re: Bad HTML @ Sunday, October 17, 2004 7:41 AM

Making invalid HTML is easy, but then so is making valid XHTML. The difficulty arises in trying to debug pages. Debugging valid XHTML is easy, debugging 'tag soup' invalid HTML is (from experience) extremely taxing.

And this is not just on a human level. It's far easier to write a parser to read valid XHTML than it is to read invalid HTML. Vice versa, it's easier to write a WYSIWYG editor to output valid XHTML than it is to output invalid HTML.

Yes, so some older content's layout would break if you moved to a validating parser. I'm sympathetic - it's not easy being blamed for anything. But like Joel / Mo / everyone else here, please spare a thought for the other developers - the ones who want to do more with code than have a single browser read it.

Robin
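
A rough illustration of Robin's point about parsers, not anything taken from IE or any real browser: the strict path below is a few lines because an XML parser simply accepts or rejects the document, while the tolerant path has to track open tags and decide what to do with every mistake. The names `is_well_formed`, `SoupChecker` and `check_soup` are made up for this sketch, the void-element list is deliberately incomplete, and the recovery rule is one arbitrary choice among many.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take a closing tag (partial list, for illustration only).
VOID = {"br", "img", "hr", "input", "meta", "link"}


def is_well_formed(markup: str) -> bool:
    """Strict path: the XML parser either accepts the document or rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class SoupChecker(HTMLParser):
    """Tolerant path: keep parsing after an error and record what went wrong."""

    def __init__(self):
        super().__init__()
        self.stack = []      # tags opened and not yet closed
        self.problems = []   # human-readable notes on each mistake

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag not in self.stack:
            self.problems.append(f"stray </{tag}>")
            return
        # Anything opened after `tag` and still open is overlapping or unclosed.
        while self.stack[-1] != tag:
            self.problems.append(f"<{self.stack.pop()}> still open at </{tag}>")
        self.stack.pop()


def check_soup(markup: str) -> list:
    """Return a list of problems; an empty list means the nesting was clean."""
    checker = SoupChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems + [f"<{t}> never closed" for t in checker.stack]
```

Even this toy version has to invent a recovery policy (implicitly close whatever is still open), which is exactly the kind of decision that ends up being copied between browsers.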

# re: Bad HTML @ Tuesday, October 19, 2004 6:59 AM

Bruce, I just wanted to respond to your comment where you say:
"For example, I've heard it said that the "alt" attribute is required for the "img" element."

Well (I believe) you heard right, as there is legislation in both the US and UK (and elsewhere I would guess) that makes it a legal requirement to provide web pages that are accessible to disabled users. One of the common requirements is that all images must have an alt attribute. Thus, depending on the nature of the website and its target audience, leaving out alt attributes could leave the site owners open to legal action.
Given this, warnings might be quite a bonus to developers...

tre

# re: Bad HTML @ Tuesday, October 19, 2004 7:33 AM

Dave, your post is quite timely! I think the discussion that Michal Zalewski and his "mangleme" tool have just started (see the BugTraq post at http://www.securityfocus.com/archive/1/378632/2004-10-16/2004-10-22/0 ; now on Slashdot at http://it.slashdot.org/article.pl?sid=04/10/19/0236213&tid=113&tid=128&tid=154&tid=218 ) shows the wisdom of IE's "backwards"-compatibility with bad code...

People the world over--including me, I admit--have complained about bugs and insecurities in Microsoft software, but c'mon, when you get down to it--the software is on so many millions of PCs with so many bazillions of combinations of software, it's impossible to predict what garbage will be thrown at it next. Kudos to the developers for being able to take garbage in but _not_ spit back garbage out!

~ewall

Eric Wallace

# re: Bad HTML @ Tuesday, October 19, 2004 9:56 AM

The perspective thus is fatally flawed. You see, when a person codes a page incorrectly, instead of having to fix it IF THEY WANT IT ACCESSIBLE ON THE WEB, they are given a pacifier, a bottle and a blanky and told that it is ok to pee their pants even though they are 30, 40, 50 years old, and should be responsible for themselves. If I want something to work correctly on the web, it should be ME that ensures that it displays correctly. The big problem with your fatal philosophy is that when I try to code CORRECTLY, 98% of the internet users in the world see that code INCORRECTLY. That is because you tried to babysit, rather than present an application that can be relied upon as "par" with standards and trusted for secure display of content. MSIE is fatally flawed in my book, and that is because MS started out on a quest to squash all the competition at any cost, without fully understanding and ACCEPTING the responsibility that came with that goal. Now get ready for the boom.

Ron Adair

# re: Bad HTML @ Tuesday, October 19, 2004 10:29 AM

Eric, that's an interesting one. I don't think anyone can seriously claim that IE is the only browser to suffer from bugs... BUT (challenge coming up), in that slashdot thread there is a link to a page of VALID html that crashes IE (at least on the couple of machines I tried).
Now here's the challenge - which browser is going to fix these bugs first? (If this is already fixed in SP2, then feel free to mock me, and delete this post...)
The page is at http://www.diplo.nildram.co.uk/crashie.html

tre

# re: Bad HTML @ Wednesday, October 20, 2004 2:03 PM

I think Anne van Kesteren's method is the best way to go forward. Only those truly interested in standards-based coding would move to application/xhtml+xml, so you can use the super-strict parser with all the CSS3 and PNG goodies there. If you see a DOCTYPEd page with a text/html MIME type, use the IE6 "standards mode" rendering (that needs to be fixed as well, since anything above the <!DOCTYPE> in source code, like a comment or line break, will force IE into quirks mode), and if you see no DOCTYPE or an old one like HTML 3.2 then use quirks mode/IE4-5 rendering. This way, you have to make ZERO changes to the current way things are done and can focus on moving to a true XML browser.

Vinnie Garcia
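
The three-way split Vinnie Garcia describes can be sketched as a small decision function. This is only an illustration of the proposal as stated in the comment, not how IE actually chooses a mode: the function name `choose_mode`, the mode labels, and the simplified DOCTYPE test (only "HTML 3.2" counts as old) are all assumptions. It searches the whole source for the DOCTYPE rather than requiring it at the very start, which reflects the fix the comment asks for.

```python
import re

# Matches a DOCTYPE declaration anywhere in the source, case-insensitively.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+([^>]*)>", re.IGNORECASE)


def choose_mode(content_type: str, source: str) -> str:
    """Pick a rendering mode from the MIME type and DOCTYPE (hypothetical labels)."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";")[0].strip().lower()
    if mime == "application/xhtml+xml":
        return "strict-xml"      # opt-in: the author asked for XML rules

    match = DOCTYPE_RE.search(source)
    if match is None:
        return "quirks"          # no DOCTYPE at all
    if "html 3.2" in match.group(1).lower():
        return "quirks"          # an old DOCTYPE, per the comment's example
    return "standards"           # DOCTYPEd text/html
```

For example, `choose_mode("text/html", "<html>")` returns `"quirks"`, while the same page served as `application/xhtml+xml` returns `"strict-xml"` regardless of its contents.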

# re: Bad HTML @ Saturday, October 23, 2004 5:10 AM

As an outsider I like the idea of a bimodal IE, Standards vs Quirks mode using the MIME type "application/xhtml+xml". It is opt-in.

Microsoft has already dealt with backwards compatibility in DOS using "setver" and in XP with flagging executables for Win95 or NT compatibility. Those required the user to specify the required version-level behaviour of the OS but defaulted to the current standard. I see no problem with recording a compliance level for a website along with Security Zones/PopUp behaviour (for example).

Developers could be "encouraged" either by yellow bars or even a version of Microsoft Error Reporting that invited users to send error feedback to the website. That would require a non-standard IE specific tag (for developers to optionally include) that identified that the website wanted to be told of non-compliance with the standard and how it wanted to be told (presumably an email address).

Whatever you do, have mercy on the user. Compliance/quirkiness is an attribute of a website along with security, cookie behaviour (session, permanent, 3rd party, private headers), picture/webbug downloading and on and on, and should be set via a single dialog for each site, not spread all over the Options dialog as they are at present.

Per site records need only be kept for the nominated History period unless saved (as an offline page eg).

AllanCorfield

# re: Bad HTML @ Saturday, October 23, 2004 12:12 PM

re: the suggestion above, a non-standard IE specific tag that triggers a non-standards-compliance warning.
Think about it...
...figured out the problem yet?

tre

# re: Bad HTML @ Saturday, October 23, 2004 5:36 PM

There is no need for a non standard _tag_. If you look at the suggestions on Channel9, you'll find many good ideas for handling the problem: using a new parser/rendering engine depending on the MIME type (html vs xhtml), more DOCTYPE sniffing (it could even be made configurable), a special http header that specifies the desired rendering mode (can be used in a meta tag), a per-site configuration dialog similar to the security zones configuration dialog, etc. These ideas do not require breaking the standard conformance in your pages. Like conditional comments, they add hints for IE in a nearly orthogonal way, and could even be implemented simultaneously (e.g. a new engine for xhtml, improvements in DOCTYPE sniffing, overridden by an http header, overridden by per-url user settings). There are solutions.
The problem lies in what the default settings should be. If a page is valid html4 but has poorly designed workarounds for IE's CSS handling, should you break it? As Dave pointed out, you can't always say "Just fix it". It's an imperfect world, with lots of imperfect pages, and some will never be fixed.
I've no doubt that IE's developers will try to create a mode for high conformance to the standards, while keeping a high tolerance mode for sloppy html, but they face a real challenge in the gray area in-between. (If you want to help, I guess they would find statistics on the number of broken pages due to this or that change quite useful).
As for the debate on html vs xhtml, may I point out that you can have both? (That's why I like the idea of a completely new engine for xhtml). Current-style html, parsed as tag soup by everybody except validators, is better for copy-pasted, half understood html snippets, while xhtml is designed for people who want a stricter syntax and cleaner code. Why not accept to live with both?

Lionel Fourquaux
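
The layering Lionel Fourquaux lists (sniffed mode, overridden by an HTTP header, overridden by per-site user settings) amounts to a simple precedence chain. The sketch below is purely illustrative: the header name `X-Rendering-Mode` is invented for the example, no such header is known to exist, and `resolve_mode` and the settings dictionary are hypothetical.

```python
from typing import Mapping
from urllib.parse import urlsplit

# Hypothetical header name, made up for this sketch.
HINT_HEADER = "x-rendering-mode"


def resolve_mode(
    url: str,
    sniffed_mode: str,
    headers: Mapping[str, str],
    site_settings: Mapping[str, str],
) -> str:
    """Return the rendering mode, letting later layers override earlier ones."""
    # 1. The user's per-site choice wins over everything else.
    host = urlsplit(url).hostname or ""
    if host in site_settings:
        return site_settings[host]

    # 2. Otherwise honour a hint from the server, if present.
    lowered = {name.lower(): value for name, value in headers.items()}
    hint = lowered.get(HINT_HEADER)
    if hint:
        return hint.strip().lower()

    # 3. Fall back to whatever MIME/DOCTYPE sniffing decided.
    return sniffed_mode
```

The hard part the comment points at, what the default should be when no layer says anything, is still whatever gets passed in as `sniffed_mode`.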

# re: Bad HTML @ Sunday, October 24, 2004 12:26 AM

I've been reading through these postings and have to throw in my 2 cents.

Two years ago I designed and developed what has grown to a 1700 page website. I had no HTML experience when I started.

I spent dozens and dozens of hours "figuring out" on my own that some of my HTML was "incorrect". It began when I noticed that some pages would render differently on different browsers. I was confused by this - I had to search for information. When I found information it was often confusing and fixing errors took huge chunks of time, and I'm talking about dozens and dozens if not hundreds of hours.

My point is this: Had IE warned me in some way that my HTML was invalid it would have saved me time and frustration. It took me about a year to learn correct HTML - all by trial and error. I'm still going back and fixing old pages as I write this. If from the very beginning IE flagged a page with incorrectly formatted HTML and referred me to W3.org or maybe a white paper outlining correct HTML I would have hit the ground running. I WANTED to be told of my errors. How else would I learn?

Due to lack of information and growing frustration I settled with "it looks good enough in IE so it must be OK". I had the desire to fix it, but if it looks fine and searching for information was a huge pain then why bother?

I'm sure there is no shortage of kindergartners putting up webpages as class projects. Obviously, they don't need to care nor should they care about correctly formatted HTML. But, for those who care it would have been nice to have been guided in the right direction. I think an IE warning when it encounters an incorrectly formatted webpage would go a very long way in cleaning up the web.

andrew

# re: Bad HTML @ Sunday, October 24, 2004 5:02 AM

What about a nonobtrusive warning? Someone (possibly on channel9) suggested another icon in the status bar to signal whether the page is valid. It would probably be a lot of work to implement correctly, but it might be a help for self-taught web developers like andrew without causing too many problems for non expert users.

To give it more visibility, you could add a chapter "How Internet Explorer helps you to write better HTML code" in the help.

If it takes too much work, it could be a powertoy for some time.

(This is for text/html; xhtml is a different problem).

Lionel Fourquaux

# re: Bad HTML @ Monday, October 25, 2004 9:26 PM

Hmm..I see your point with the "users being pissed" part if the warning is added, however, in one person's post they said that the user would have to buy new HTML authoring software because of bad HTML warnings, that is not IE's liability, it's the authoring software's fault for not implementing valid code.

Also, I don't see why the warning has to be annoying. IE already has a "bad Javascript warning" in the lower left corner of the window does it not? I don't see why you couldn't just put a simple small warning there. As long as you get the message across.

I also find it funny (and quite sad) how reluctant Microsoft is to implement XML and (X)HTML standards in their browser when they boast their XML capabilities with the .NET framework. I would think they would embrace it more than they have.

And as for the "Harder than it seems" excuse -- It's called "Hire more developers for the project." I doubt that would make a severe cut in your company's profit now would it.

Sorry if I was a bit harsh, it's just this whole thing is very frustrating and I'm slowly losing my favor towards Microsoft's products, it seems you care more about profit and the legalities of your actions than you do about customers, yet you've failed to uphold the ideal responsibilities of 90% of the market's browser share.

Best Regards,
Chris

Chris McAndrews

# re: Bad HTML-Allan Corfield's opt-in switch @ Sunday, October 31, 2004 8:41 PM

This is a good idea that should be extended to all of Microsoft's web developer and web content generation tools, like Visual Studio, FrontPage, Word, Commerce Server, etc. Developers and users set the opt-in switch and they are committed to strict W3C HTML, XML, DOM and ECMA JavaScript/ECMAScript standards: no bad HTML, no proprietary extensions. Microsoft used to do this for VC++ on other standards. Adobe's GoLive, Photoshop and ImageReady plus Macromedia's Dreamweaver and Fireworks provide such opt-in-to-strict-standards switches. So it's not unheard of on the Redmond campus or in current competitive products. And it is simple and elegant to implement. Both Adobe and Macromedia issue a to-do list with suggested workarounds, which really helps the web developer. As long as the opt-in switch is not done in a slipshod manner (e.g. all the code is flagged and the workarounds are draconian), the developers and their masters are responsible for persisting bad code.

Jacques Surveyer
