Structure of an HTML 5 page
Which structural tags does an HTML page require under version 5 of the specification?
The basic structure is as follows:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title></title>
<meta name="description" content="">
</head>
<body>
</body>
</html>
What changes with HTML 5? The format is much simpler than in the previous standard, most notably through a minimal doctype.
The document type was introduced to mark the difference between the old browsers of the 1990s, which followed the usual format of the time, and newer browsers that adhere more closely to the HTML specifications (3, then 4 and 5).
In most browsers, a missing DOCTYPE triggers a compatibility mode for older formats.
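The effect of the doctype can be observed directly in a browser. The page below is a minimal sketch that displays the rendering mode through the standard document.compatMode property; the element id "mode" and the page title are placeholders chosen for this illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Rendering mode check</title>
</head>
<body>
<p id="mode"></p>
<script>
// document.compatMode is "CSS1Compat" in standards mode
// and "BackCompat" in the compatibility (quirks) mode.
document.getElementById("mode").textContent = document.compatMode;
</script>
</body>
</html>
```

Removing the first line and reloading the page should switch the displayed value from CSS1Compat to BackCompat in current browsers.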
The language and the lang attribute
The lang attribute is aimed less at browsers than at processing tools that must interpret content according to its language.
Search engines are not among these tools: they ignore the attribute and prefer to rely on the content itself to determine the language.
It can therefore be considered optional. Although the lang attribute is typically set for the whole document, it can also be assigned to a particular element, for example:
<p lang="fr">Citation en Français</p>
The <head> tag contains several types of elements:
- The encoding, declared with a meta tag (Content-Type or charset).
- The title of the page.
- Links with the link tag.
- Other information supplied by meta tags.
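As an illustration, the four kinds of elements listed above could appear together as follows. This is only a sketch: the title and description text are placeholders, and the file names are the ones used in the link samples of this page.

```html
<head>
<!-- Encoding, declared first -->
<meta charset="UTF-8">
<!-- Title of the page -->
<title>Example page</title>
<!-- Links: style sheet and favicon -->
<link rel="stylesheet" href="style.css">
<link rel="icon" type="image/gif" href="/favicon.gif">
<!-- Other information given by meta tags -->
<meta name="description" content="A short summary of the page.">
</head>
```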
The most common meta tag has the following form:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
It defines the content type (the format, generally text/html) and the encoding, usually the UTF-8 charset.
This tag stands in for the information the server sends to the browser. It may be omitted if the server is configured, for example through .htaccess, to assign the format to pages with a given extension, such as html.
This tag should come first in the HEAD section, because the browser processes any text above it as ASCII: the actual encoding is known only once the tag has been parsed.
This basic format is generally sufficient for all situations. There are other charsets, like iso-8859-1, but they add nothing in the Latin world. For pages in Chinese or Japanese, an encoding other than iso-8859-1 is required; UTF-8 covers these scripts as well.
Care must be taken, however, when you include dynamic content: it must be encoded in the specified charset.
HTML 5 can simplify the encoding declaration:
<meta charset="UTF-8">
This short form was actually implemented by browsers before HTML 5 but was not previously part of the specification. The quotes are not required.
HTML is assumed by default, so only the charset needs to be specified. You still have to check that the page itself is saved in this encoding, which not all HTML editors do automatically.
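For comparison, the three spellings below declare the same encoding. They are shown side by side for illustration only; a real page should contain just one of them.

```html
<!-- Long form, as written for HTML 4 -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<!-- Short form accepted by HTML 5 -->
<meta charset="utf-8">

<!-- Same short form without quotes -->
<meta charset=utf-8>
```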
Many links can be specified in the head. Some are essential to the browser, such as the link to a style sheet, the RSS feed, or the favicon.
Others are optional, such as the prefetch value, which loads a page in the background and speeds up its later display.
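A prefetch link might look like the line below. The target URL is a hypothetical example, and prefetch is only a hint: the browser is free to ignore it.

```html
<!-- Ask the browser to fetch a page the visitor is likely to open next -->
<link rel="prefetch" href="/next-page.html">
```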
Sample links
<link rel="icon" type="image/gif" href="/favicon.gif" />
<link rel="stylesheet" type="text/css" href="style.css">
RSS or Atom
<link rel="alternate" type="application/rss+xml" href="" title="">
Another common value of the rel attribute is nofollow, which tells search engines not to follow the links it is applied to.
HTML 4 has no specialized structural tags; content is structured with <div>, <span>, and other containers.
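A page body structured in this older style might look like the sketch below. The id and class names are arbitrary conventions chosen for this illustration; nothing in HTML 4 gives them a meaning.

```html
<div id="header">
  <h1>Site title</h1>
</div>
<div id="menu">
  <a href="/">Home</a>
</div>
<div id="content">
  <p>Main text with a <span class="highlight">highlighted passage</span>.</p>
</div>
<div id="footer">
  <p>Contact information</p>
</div>
```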
HTML 5 introduces multiple tags to help represent the usual structure of documents.
- <header>: Contains an introduction to a part of the page or to the whole page.
- <footer>: Contains information usually placed at the end of a section. It can go at the end of a section or page, but also anywhere within the section.
For example, it may contain a link to the index, which can be placed below the title.
- <section>: Sections mark out parts of the content. It is then up to the webmaster to style them with a style sheet or to use them dynamically in scripts.
At the very simplest, a section can be framed with a border or separated from the preceding content by a space.
- <hgroup>: Represents the header of a section. The <header> tag may begin with an <hgroup> tag.
- <nav>: This container is intended to enclose a group of links.
- <article>: Denotes self-contained content that can be found on different pages, or even different sites. This can be a forum post or a newspaper article; the tag lets tools extract the content more easily (by separating it from unnecessary data such as navigation menus).
- <aside>: Delimits something separate from the main content, and may define a sidebar.
- <address>: Contains contact information, e.g. the name of the author.
- <mark>: Marks a portion of text as highlighted, like the older <strong> but more general.
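To see how the HTML 5 structural tags fit together, here is a minimal sketch of a page using header, nav, article, section, aside, footer, address and mark. All text content and URLs are placeholders, and this nesting is only one reasonable arrangement among others.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Example article</title>
</head>
<body>
<header>
  <h1>Site title</h1>
  <nav>
    <a href="/">Home</a>
    <a href="/archive.html">Archive</a>
  </nav>
</header>
<article>
  <header>
    <h2>Article title</h2>
  </header>
  <section>
    <h3>First part</h3>
    <p>Some text with a <mark>highlighted passage</mark>.</p>
  </section>
  <aside>
    <p>A side note related to the article.</p>
  </aside>
  <footer>
    <address>Written by the author (placeholder)</address>
  </footer>
</article>
<footer>
  <p>Site-wide footer.</p>
</footer>
</body>
</html>
```

Note that header and footer appear twice: once for the whole page and once inside the article, since both tags can apply to a section as well as to the page.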
There are many other semantic tags, described in the documents listed in the references.
- Table. Organizing data into a table.
- HTML 5 overview. New structure tags.
- Charset in HTML 4. W3C.
- Values of the rel attribute.