Converting to HTML 5
Converting from 2000 W3C standard HTML
(Some links on this page take you to details in the HTML Tag Reference. Bookmark this page in your Favorites so you can come back to it later.)
Converting to HTML 5 polyglot code from a strict implementation of the previous version of HTML (XHTML 1.0 Strict) is as simple as 1-2-3:
- Remove the PUBLIC FPI and SYSTEM identifiers from the DOCTYPE declaration.
- Replace any obsolete or deprecated HTML tags or constructs with HTML code that is compliant with the HTML 5 standard. For example, make sure that any set of <col> tags for HTML table columns always has a <colgroup> element as its parent.
- Start taking advantage of new features of HTML 5, such as converting from <div> tags to HTML 5 sectioning tags.
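As a rough illustration of the three steps above, the following sketch shows what a small XHTML 1.0 Strict page might look like after conversion. The page content, the table, and the choice of <section> as the replacement for a <div> are all hypothetical; the comments mark where each step applies.

```html
<!DOCTYPE html>
<!-- Step 1: the DOCTYPE above no longer carries the PUBLIC identifier
     "-//W3C//DTD XHTML 1.0 Strict//EN" or its SYSTEM URL. -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Price list (hypothetical example)</title>
  </head>
  <body>
    <!-- Step 3: a sectioning element replaces <div class="section">. -->
    <section>
      <h1>Price list</h1>
      <table>
        <!-- Step 2: every <col> sits inside an explicit <colgroup>. -->
        <colgroup>
          <col />
          <col />
        </colgroup>
        <tbody>
          <tr>
            <td>Widget</td>
            <td>9.99</td>
          </tr>
        </tbody>
      </table>
    </section>
  </body>
</html>
```

The explicit <colgroup> and <tbody> matter for polyglot code because an HTML parser would otherwise insert them implicitly while an XML parser would not, leaving the two parsers with different document trees.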
Converting from 20th Century versions of HTML (prior to 2000)
Converting from earlier versions of HTML or transitional HTML code requires more extensive changes, since there were significant changes to the HTML standard between 1997 and 2000 in order to support XML-based parsers, mobile devices with stricter parsing rules, cacheable client-side templates and aggregation with other types of content such as RSS feeds.
- Make sure the document starts with an XML declaration and a DOCTYPE declaration.
- Make sure the <html> top element tag includes the xmlns="http://www.w3.org/1999/xhtml" attribute. (The namespace URI for XHTML and later versions of HTML includes the year "1999" because it was assigned that year, while the W3C HTML Recommendation released in 2000 was still being developed.)
- Make sure that all tags are matched with end tags, or are self-closed with />.
- Make sure the element names in the start and end tags match and are lower case.
- Make sure that all attribute values are enclosed in quotation marks.
- Make sure boolean attributes are coded in their full form, using the attribute name in quotes as the value (attribute="attribute") when the value is true and omitting the attribute completely when the value is false. The full form will be properly understood by web browsers parsing polyglot documents with either the HTML syntax or the XML syntax of HTML 5. Avoid using a minimized form for boolean attributes, such as selected, or values with an empty string, such as in selected="", which XPath treats as false rather than true.
Note that the HTML 5 specifications explicitly state that:
The values "true" and "false" are not allowed on boolean attributes.
This is because browsers that look at the coded value for boolean attributes would treat the string "false" as false, while browsers that only look for the presence or absence of the attribute would treat that code as true, resulting in very inconsistent behavior.
Boolean attributes that may need to be changed include:
- async: change to async="async"
- checked: change to checked="checked"
- compact: change to compact="compact"
- declare: change to declare="declare"
- defer: change to defer="defer"
- disabled: change to disabled="disabled"
- ismap: change to ismap="ismap"
- multiple: change to multiple="multiple"
- noresize: change to noresize="noresize"
- noshade: change to noshade="noshade"
- nowrap: change to nowrap="nowrap"
- open: change to open="open"
- readonly: change to readonly="readonly"
- required: change to required="required"
- reversed: change to reversed="reversed"
- scoped: change to scoped="scoped"
- selected: change to selected="selected"
Note that "true" and "false" are valid values for some non-boolean attributes, in particular enumerated attributes such as the draggable attribute.
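The checklist in this section can be sketched on a small form. Everything in the example is hypothetical (the form, the field names, the order.cgi target), and it assumes the XHTML 1.0 Strict DOCTYPE as the intermediate target before the HTML 5 steps in the first section are applied. The older-style markup being converted is kept in a comment for comparison.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Order form (hypothetical example)</title>
  </head>
  <body>
    <!-- Before (older style: upper-case names, unquoted values,
         minimized boolean attributes, missing end tags):
         <FORM ACTION=order.cgi>
           <INPUT TYPE=checkbox NAME=gift CHECKED>
           <SELECT NAME=size MULTIPLE>
             <OPTION SELECTED>Small
             <OPTION>Large
           </SELECT>
           <BR>
         </FORM>
    -->
    <form action="order.cgi">
      <p>
        <!-- Lower-case names, quoted values, full-form boolean attributes. -->
        <input type="checkbox" name="gift" checked="checked" />
        <select name="size" multiple="multiple">
          <!-- Every start tag now has a matching end tag. -->
          <option selected="selected">Small</option>
          <option>Large</option>
        </select>
        <!-- Empty elements are self-closed. -->
        <br />
      </p>
    </form>
  </body>
</html>
```

The second <option> simply omits the selected attribute, which is how a false value is expressed for a boolean attribute.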
Detecting which version of HTML is being used
A good way to determine which version of HTML a web site is using is to submit the URL of the web site to the W3C Markup Validation Service. Possible results include:
- HTML5: indicates that the site has already been converted to HTML 5. For example, Google's web site is using HTML 5.
- XHTML 1.0 Strict: indicates that the site is using the 2000 W3C standard version of HTML. For example, the W3C web site itself adheres to that standard.
- XHTML 1.0 Transitional or HTML 4.01 Transitional: indicates that the site is using a transitional format between the 1997 HTML 4 standard and the 2000 W3C standard version of HTML. For example, AltaVista uses the HTML 4.01 Transitional format and Microsoft's web site uses the XHTML 1.0 Transitional format. A web site that is using the 2000 XHTML 1.0 Transitional format is easier to convert to HTML 5 than one that is using the older HTML 4.01 Transitional format.
- HTML 4.01 Strict: indicates that the site is using the older 1997 HTML 4 version of HTML. For example, the Yahoo! web site uses the older HTML standard.
If the W3C HTML validator reports any errors while checking the HTML syntax of your own web site, fixing those errors first will make the rest of the conversion to HTML 5 go more smoothly.
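The result reported by the validator generally follows the DOCTYPE declaration at the top of the page, so viewing the page source is another quick way to check. For reference, these are the standard DOCTYPE declarations that correspond to the results listed above. They are shown together only for comparison; a real document contains exactly one of them.

```html
<!-- HTML5 -->
<!DOCTYPE html>

<!-- XHTML 1.0 Strict -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- XHTML 1.0 Transitional -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!-- HTML 4.01 Transitional -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">

<!-- HTML 4.01 Strict -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
```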