What's New in HTML 5

HTML 5 Specifications focus on the DOM

The specifications for HTML 5 define the elements of the language in terms of the operation and effects of the document's internal object model, making HTML 5 more of an abstract language than earlier versions. As a result, the language can be encoded in more than one syntax, as determined by the media type (for example, text/html for the HTML syntax or application/xhtml+xml for the XML syntax). Documents can even exist without any external representation, as objects created and manipulated entirely through the DOM APIs.
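As a rough illustration of a document that exists only as an object model, the following Python sketch builds a small document purely through DOM calls and serializes it only at the very end. It uses the standard library's xml.dom.minidom as a stand-in for a browser DOM, which is an assumption made for illustration, and the helper name build_document is hypothetical.

```python
from xml.dom.minidom import getDOMImplementation

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_document(title_text, body_text):
    """Create a minimal document using DOM calls only (no markup is parsed)."""
    impl = getDOMImplementation()
    doc = impl.createDocument(XHTML_NS, "html", None)
    html = doc.documentElement
    # minidom does not write namespace declarations on its own,
    # so the xmlns attribute is set explicitly for serialization.
    html.setAttribute("xmlns", XHTML_NS)

    head = doc.createElement("head")
    title = doc.createElement("title")
    title.appendChild(doc.createTextNode(title_text))
    head.appendChild(title)

    body = doc.createElement("body")
    paragraph = doc.createElement("p")
    paragraph.appendChild(doc.createTextNode(body_text))
    body.appendChild(paragraph)

    html.appendChild(head)
    html.appendChild(body)
    return doc


doc = build_document("Example", "Built through the DOM")
# The external representation is produced only when it is asked for.
print(doc.documentElement.toxml())
```

Until toxml() is called, the document has no textual form at all; the same tree could equally be written out in either syntax.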

Different requirements for HTML authors and document parsers

HTML pages on the web have been created somewhat haphazardly under various standards and proprietary formats, and in many cases with no particular standard or format in mind. As a result, browser vendors have developed a variety of incompatible methods of handling non-standard and malformed HTML coding.

For backward compatibility with the majority of existing documents, the HTML 5 specification requires document parsers and viewers (browsers) to support older, deprecated elements and attributes and other non-standard HTML coding in as consistent a manner as possible. This means that HTML 5 compliant user agents will gracefully handle some of the more common errors in HTML coding. It's sort of a "Do What I Mean, Not What I Say" feature of web browsers. It also means that different browsers should start handling certain types of errors in the coding of HTML documents in a more consistent and predictable way.
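The difference between a forgiving parser and a strict one can be sketched in Python. The example below feeds the same sloppy markup to the standard library's html.parser, which tolerates unclosed tags, and to an XML parser, which rejects them. Note that html.parser is only a tolerant tokenizer and not an implementation of the HTML 5 parsing algorithm, and the names TagCollector, lenient_tags and strict_parse_ok are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Records start tags in document order without validating nesting."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(markup):
    """Return the start tags a tolerant parser sees, even in bad markup."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def strict_parse_ok(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


SLOPPY = "<p>one<p>two<b>bold"  # no closing tags at all

print(lenient_tags(SLOPPY))
print(strict_parse_ok(SLOPPY))
```

The tolerant parser reports the three start tags it found, while the strict parser refuses the input outright.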

In layman's terms, there are two different branches of the HTML standard - one for HTML authors and another for developers of browsers and other HTML parsers. While browsers may recognize deprecated and non-standard HTML coding, developers creating new HTML pages should avoid deprecated HTML elements and attributes and try to create documents that conform to the authoring requirements of the HTML 5 standard. The benefits of conforming to the HTML authoring standard include more consistent presentation among traditional web browsers, better support in handheld and mobile devices and greatly increased longevity of the HTML pages being created.

Document Sections in HTML

One of the most significant new features of HTML 5 is the ability to mark up sections of an HTML document using the sectioning tags (such as <section>, <article>, <nav> and <aside>) and related tags (such as <header> and <footer>), which identify types of content within a section such as headers, footers and sidebars displayed along with a web page.
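To show how sectioning markup exposes the structure of a page, here is a minimal Python sketch that walks a small sample page and lists its sectioning and related elements with their nesting depth. The sample page, the partial tag lists and the names SectionOutline and outline are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Sectioning elements and related tags from HTML 5 (a partial list).
SECTIONING = {"section", "article", "nav", "aside"}
RELATED = {"header", "footer"}

PAGE = """<body>
<header><h1>Site title</h1></header>
<nav><a href="/">Home</a></nav>
<article>
  <section><h2>Part one</h2><p>Text.</p></section>
  <aside><p>Sidebar note.</p></aside>
</article>
<footer><p>Contact details.</p></footer>
</body>"""


class SectionOutline(HTMLParser):
    """Collects (depth, tag) pairs for sectioning and related elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING or tag in RELATED:
            self.outline.append((self.depth, tag))
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SECTIONING or tag in RELATED:
            self.depth -= 1


def outline(markup):
    parser = SectionOutline()
    parser.feed(markup)
    parser.close()
    return parser.outline


for depth, tag in outline(PAGE):
    print("  " * depth + tag)
```

Because the tags name the role of each block, a program can recover the page structure without guessing from class names or layout.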

Other New Tags in HTML 5

Other Changes in HTML 5

Differences Between HTML 5 and Earlier Versions
Style vs. Semantics
Tags and Attributes Deprecated in HTML
Character Entities in HTML 5
The HTML 5 Document Type Definition (HTML DTD)
Differences in the <html> Tag

Differences between HTML 4 / XHTML and HTML 5

alternative HTML syntax standards
For backward compatibility with most existing web pages, HTML 5 allows documents to be coded using either the syntax traditionally used for the 1997-1999 versions of HTML (version 4.x), or the XML-based syntax of the 2000-2001 W3C extended HTML recommendations (XHTML 1.x), which works with XSLT and is mobile-friendly. (For example, compare how a site such as this one looks on a mobile device with how other sites such as WhiteHouse.gov look.)
An HTML page can be coded such that it adheres to a common subset of the syntactic requirements that are shared by both syntaxes and avoids anything that is unique to one syntax or the other. HTML documents that can be properly delivered and processed as either syntactic flavor of HTML are known as polyglot HTML documents.
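A small Python sketch can make the idea concrete: the same document text is handed to an XML parser and to a tolerant HTML tokenizer, and both report the same sequence of elements. The sample document is a simplified assumption (a complete polyglot document has further requirements that are not shown here), and the function names are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# A simplified document written in the common subset of both syntaxes.
POLYGLOT = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><title>Polyglot example</title></head>
<body><p>First line<br/>second line</p></body>
</html>"""


class StartTags(HTMLParser):
    """Collects element names as seen under the HTML syntax rules."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.append(tag)


def names_from_html_syntax(markup):
    parser = StartTags()
    parser.feed(markup)
    parser.close()
    return parser.names


def names_from_xml_syntax(markup):
    root = ET.fromstring(markup)
    # ElementTree reports names as "{namespace}local"; keep the local part.
    return [element.tag.split("}", 1)[-1] for element in root.iter()]


print(names_from_html_syntax(POLYGLOT))
print(names_from_xml_syntax(POLYGLOT))
```

Both calls list the same elements in the same order, which is the property a polyglot document is meant to have.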
attribute coding for polyglot HTML documents
There are some specific requirements for coding attributes in HTML documents that can be parsed using either set of syntax rules. Every attribute, even a boolean attribute, must include an equal sign and a value, and the value must always be enclosed in quotes. The value of a boolean attribute must be the name of the attribute itself, or else the attribute must be omitted; the value of a boolean attribute should not be true, false or an empty string. For example:

<option selected="selected"/>

The following ways of coding HTML attributes should be avoided:

<option selected/>
<option selected=""/>
<option selected="true"/>
<option selected="yes"/>
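The rule above can be turned into a simple check. The following Python sketch scans markup for boolean attributes and reports any that are not coded as name="name". The set of boolean attributes is a partial list chosen for illustration, and the names boolean_attribute_ok, BooleanAttributeChecker and find_problems are hypothetical.

```python
from html.parser import HTMLParser

# A partial list of boolean attributes, chosen for illustration.
BOOLEAN_ATTRIBUTES = {"selected", "checked", "disabled", "readonly", "multiple"}


def boolean_attribute_ok(name, value):
    """True when a boolean attribute is coded as name="name".

    A value of None means the attribute was written with no value at all.
    """
    return value == name


class BooleanAttributeChecker(HTMLParser):
    """Collects (tag, attribute, value) for badly coded boolean attributes."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in BOOLEAN_ATTRIBUTES and not boolean_attribute_ok(name, value):
                self.problems.append((tag, name, value))


def find_problems(markup):
    checker = BooleanAttributeChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


print(find_problems('<option selected="selected"/>'))
print(find_problems('<option selected="true"/>'))
```

The first call finds nothing to report, while the second flags the "true" value, matching the list of forms to avoid.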
Style vs. Semantics
migration toward separation of style from content and structure
Previous versions of HTML have deprecated presentational elements that provide no semantic value and are used exclusively to style text. HTML 5 continues the move in this direction, encouraging styles to be applied through the use of CSS.
separation of content and layout
One of the most significant improvements of the 2000-2001 version of HTML (XHTML 1.x) over the 1997-1999 version (HTML 4.x) was the ability to take advantage of XSLT style sheets. Using XSLT templates allows separating the content of individual pages from the page layout and general structure of related pages. For example, you could create templates for the layout of various sections of a web site, such as a shopping cart, as well as import a site-wide template for the entire site's "look and feel". Templates are stored in separate documents, which web browsers only need to cache once, thus improving page load times. HTML version 5 continues to support these templates, which were not available in HTML version 4.
CSS style sheets for any number of HTML pages can be referenced by the <link> tag, whose media attribute allows creating specific layouts for various devices. For example, the styles for a printer-friendly version would be indicated by <link media="print" .../> while a version for small screen devices such as cell phones would be indicated by <link media="handheld" .../>. XSLT style sheets are usually attached to documents in the XML syntax with an <?xml-stylesheet?> processing instruction.
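As a sketch of how the media attribute selects a style sheet, the Python example below reads the <link> tags from a sample head section and returns the style sheets that apply to a given media type. The file names, the sample markup and the names LinkCollector and stylesheets_for are assumptions made for illustration; a link without a media attribute is treated as applying to all media.

```python
from html.parser import HTMLParser

# Hypothetical style sheet references for three kinds of output device.
HEAD = """<head>
<link rel="stylesheet" media="screen" href="screen.css"/>
<link rel="stylesheet" media="print" href="print.css"/>
<link rel="stylesheet" media="handheld" href="handheld.css"/>
</head>"""


class LinkCollector(HTMLParser):
    """Collects (media, href) pairs from stylesheet links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        if attributes.get("rel") == "stylesheet":
            media = attributes.get("media", "all")
            self.links.append((media, attributes.get("href")))


def stylesheets_for(markup, media):
    """Return the hrefs that apply to the requested media type."""
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return [href for link_media, href in collector.links
            if link_media in (media, "all")]


print(stylesheets_for(HEAD, "print"))
```

A browser makes the same kind of choice internally, so one page can carry separate layouts for screen, print and small devices.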
case-sensitive tags and attributes
In the 2000 W3C recommendation and later versions of HTML, tag names and attribute names should be coded in all lower case, because names are case-sensitive in the XML syntax. Class names in a class attribute are also case-sensitive when matched by CSS selectors.
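The effect of case sensitivity can be seen by giving the same mixed-case markup to both kinds of parser. In this Python sketch, the tolerant HTML tokenizer from the standard library folds tag and attribute names to lower case, while the XML parser treats <P> and </p> as different names and rejects the input. The function names are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class StartTagRecorder(HTMLParser):
    """Records each start tag with its attributes as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def html_syntax_view(markup):
    recorder = StartTagRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.seen


def xml_syntax_accepts(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


MIXED_CASE = '<P CLASS="Note">text</p>'

print(html_syntax_view(MIXED_CASE))
print(xml_syntax_accepts(MIXED_CASE))
```

Names are normalized under the HTML syntax but the attribute value "Note" keeps its case, which is why class names still have to match exactly.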
consistent namespace for HTML elements (either explicit or implied)
In HTML version 4, element names were not in any namespace. If you applied a style sheet (such as a site-wide template for the site's "look and feel") to a "well-formed" HTML 4 document (where all tags were properly closed with matching end tags), the element names would have to be matched without a namespace prefix (for example: match="img"). The template would use <xsl:output method="html" omit-xml-declaration="yes"/> to convert the document to true HTML, for example by removing extraneous closing tags. That is, coding in the source document such as <img src="..." alt="..."></img> would automatically become <img src="..." alt="..."> with no closing </img> tag or self-closing />.
In the 2000 W3C standard, a namespace is explicitly specified, usually within the top element, which applies to all HTML elements (<html xmlns="http://www.w3.org/1999/xhtml">). Therefore, template matching rules would include a prefix for the namespace (ex: match="html:img" xmlns:html="http://www.w3.org/1999/xhtml").
In HTML version 5, the namespace can be explicitly specified in well-formed HTML documents or, in HTML syntax documents, the "http://www.w3.org/1999/xhtml" namespace is implied, which allows the matching rules with namespace prefixes to be applied to either syntax.
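The matching behaviour described above can be imitated in Python. The sketch below uses the path language of xml.etree.ElementTree, a small XPath subset, as an analogy to XSLT match patterns; it is not XSLT itself. The sample documents, the file name a.png and the helper count_images are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
PREFIXES = {"html": XHTML_NS}  # plays the role of xmlns:html="..."

# Well-formed HTML 4 style markup: elements are in no namespace.
UNNAMESPACED = '<html><body><img src="a.png" alt="A"></img></body></html>'

# XHTML style markup: the namespace is declared on the top element.
NAMESPACED = ('<html xmlns="http://www.w3.org/1999/xhtml"><body>'
              '<img src="a.png" alt="A"/></body></html>')


def count_images(markup, path, prefixes=None):
    """Count the elements that a path expression matches in the markup."""
    root = ET.fromstring(markup)
    return len(root.findall(path, prefixes))


# An unprefixed pattern only matches elements that have no namespace.
print(count_images(UNNAMESPACED, ".//img"))
print(count_images(NAMESPACED, ".//img"))

# A prefixed pattern only matches elements in the XHTML namespace.
print(count_images(NAMESPACED, ".//html:img", PREFIXES))
```

Because HTML 5 places elements in the XHTML namespace in both syntaxes, a single set of prefixed patterns is enough once the document has been parsed into a tree.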

Last updated Friday September 11, 2009

