HTML Document - Accessing content of a webpage

The Document object defined in the DOM specification provides access to the contents of an HTML page or an XML file. Its most commonly used methods are getElementsByTagName and getElementById. The write method comes from the HTMLDocument interface, which extends Document and is the one used for web pages.

Purpose of the object

The document object represents the whole page: its attributes describe how the page is defined and its methods give access to its content.
In a browser script it is available as the global variable document:

document.write("demo")

Attributes of Document

These are read-only attributes defined when creating the page.

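DocumentType doctype
The document type declaration (<!DOCTYPE ...>) on the first line of the page, or null if the page has none.
DOMImplementation implementation
An object providing methods that do not depend on a particular document, such as hasFeature and createDocument.
Element documentElement
The root element of the document, that is the <html> tag of an HTML page. An Element is an object that represents a tag and has its own methods.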
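These attributes can be read directly from the document object. The sketch below uses a hypothetical helper, describeDocument, that takes the document as a parameter, so in a page it would be called as describeDocument(document).

```javascript
// Hypothetical helper: reads two read-only attributes of Document.
function describeDocument(doc) {
  // doctype is null when the page has no <!DOCTYPE> declaration.
  var doctypeName = doc.doctype ? doc.doctype.name : null;
  // documentElement is the root tag: <html> for an HTML page.
  var rootName = doc.documentElement.nodeName;
  return { doctypeName: doctypeName, rootName: rootName };
}

// Usage in a page:
// var info = describeDocument(document);
// alert(info.doctypeName + " / " + info.rootName);
```

For an HTML page starting with <!DOCTYPE html>, the alert shows "html / HTML": the doctype name keeps its case, while nodeName is in capitals for HTML elements.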

Attributes of HTMLDocument

The HTMLDocument interface is derived from Document. In a web page, the object named document implements HTMLDocument, and therefore also Document, its superclass.
You can access the attributes and collections by the following names:

DOMString title
The contents of the tag <title> in the <head> section of the page.
DOMString referrer
When the page is loaded by following a link, the URL of the page containing that link. If the address was typed into the browser's address bar, an empty string.
DOMString domain
The domain name of the website.
DOMString URL
The full URL of the page. Note that the attribute name is in capitals: URL, not url.
HTMLElement body
Corresponds to the <body> tag, or to the outermost <frameset> tag in a frameset page.
HTMLCollection images
List of all <img> tags.
HTMLCollection applets
List of the <applet> tags and of the <object> tags that contain applets.
HTMLCollection anchors
List of the <a> tags that have a name attribute; <a> tags without it are not included.
HTMLCollection links
List of all the <a> and <area> tags that have an href attribute.
HTMLCollection forms
List of forms on the page.
DOMString cookie
The cookies of the document, as a single string of name=value pairs separated by semicolons.
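Since cookie is one string, reading a single value means splitting it. A minimal sketch with a hypothetical helper, parseCookies; it assumes the values were stored with encodeURIComponent, which is why it decodes them with decodeURIComponent.

```javascript
// Hypothetical helper: turns a string such as document.cookie
// ("a=1; b=hello%20world") into an object { a: "1", b: "hello world" }.
function parseCookies(cookieString) {
  var result = {};
  if (!cookieString) return result;            // no cookies: empty string
  var pairs = cookieString.split(";");
  for (var i = 0; i < pairs.length; i++) {
    var pair = pairs[i].trim();
    var eq = pair.indexOf("=");
    if (eq < 0) continue;                        // skip entries without "="
    var name = pair.substring(0, eq);
    var value = pair.substring(eq + 1);          // may itself contain "="
    // Assumption: the value was encoded with encodeURIComponent when set.
    result[name] = decodeURIComponent(value);
  }
  return result;
}

// Usage in a page: var cookies = parseCookies(document.cookie);
```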

Collections can be indexed like an array, for example document.forms[x], or with their item(x) method.
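The sketch below gathers several of these attributes and shows both ways of indexing a collection. summarizePage is a hypothetical name; it takes the document as a parameter, so in a page it would be called as summarizePage(document).

```javascript
// Hypothetical helper: reads HTMLDocument attributes and collections.
function summarizePage(doc) {
  // Array-style index; guarded because forms[0] is undefined when empty.
  var firstForm = doc.forms.length > 0 ? doc.forms[0] : null;
  // item() returns null by itself when the index is out of range.
  var firstImage = doc.images.item(0);
  return {
    title: doc.title,
    url: doc.URL,                        // note the capitals
    referrer: doc.referrer || "(none)",  // empty when the address was typed
    imageCount: doc.images.length,
    linkCount: doc.links.length,
    firstForm: firstForm,
    firstImage: firstImage
  };
}

// Usage in a page: alert(summarizePage(document).linkCount);
```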

Methods of Document

DOM methods take DOMString parameters, which in JavaScript are simply ordinary strings.

Element createElement(DOMString)
Creates an element, that is a tag, with the given name and returns it as an Element. Attributes can then be added with the setAttribute method of Element.
Text createTextNode(DOMString)
Creates a Text node containing the given string and returns it. The text appears in the page only once the node is inserted into an element, for example with appendChild.
Element getElementById(DOMString)
Returns an element whose identifier is given in argument.
var x = document.getElementById("myid");
This requires a tag in the page with the attribute id="myid"; if there is none, the method returns null.
NodeList getElementsByTagName(DOMString)
Returns the list of elements whose tag name is given as a parameter.
var x = document.getElementsByTagName("a");
The example returns all the <a> tags of the page. Each element of the NodeList can be accessed with the item(n) method, n being its index.
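The creation methods are used together: create the tag, give it attributes, create its text, then attach everything to an element found by its id. A minimal sketch with a hypothetical helper, appendNote; the "note" class name is only an example.

```javascript
// Hypothetical helper: builds <p class="note">text</p> and appends it to
// the element whose id is given. Returns null if no tag has that id.
function appendNote(doc, parentId, text) {
  var parent = doc.getElementById(parentId);
  if (parent === null) return null;            // no tag with this id
  var p = doc.createElement("p");
  p.setAttribute("class", "note");             // Element method, not Document
  p.appendChild(doc.createTextNode(text));     // the text is a child node
  parent.appendChild(p);                       // now visible in the page
  return p;
}

// Usage in a page containing <div id="box"></div>:
// appendNote(document, "box", "Saved.");
```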

Methods of HTMLDocument

In addition to the methods of Document, HTMLDocument offers additional functions to directly alter the content of the page.

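void open()
Opens the document for writing, clearing its current content.
void close()
Closes the stream opened by open() and displays what was written.
void write(DOMString)
Writes the string into the document.
void writeln(DOMString)
Writes the string followed by an end-of-line character.

Note that to change the text inside an existing tag, we assign its innerHTML attribute instead. The write method is used to generate a new document dynamically: calling it once the page has finished loading implicitly calls open() and replaces the whole page.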
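The three methods are called in sequence: open, write, close. A minimal sketch with a hypothetical helper, writePage, which takes the target document as a parameter, for example the document of a window opened with window.open("").

```javascript
// Hypothetical helper: replaces the content of a document with a
// generated page, using the open / writeln / close sequence.
function writePage(doc, title, lines) {
  doc.open();                                   // clears the current content
  doc.writeln("<!DOCTYPE html>");
  doc.writeln("<title>" + title + "</title>");
  for (var i = 0; i < lines.length; i++) {
    doc.writeln("<p>" + lines[i] + "</p>");     // text is not HTML-escaped here
  }
  doc.close();                                  // ends the stream
}

// Usage: var w = window.open(""); writePage(w.document, "Report", ["one", "two"]);
```

The strings are inserted as HTML, so text coming from users would need escaping before being written this way.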

NodeList

This interface has one attribute and a single method.

unsigned long length
Number of items in the list.
Node item(unsigned long)
Returns the Node at the given index, which ranges from 0 to length minus 1, or null if the index is out of range.
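A NodeList is usually walked with a loop from 0 to length minus 1. The sketch below uses a hypothetical helper, headingTexts, and assumes each <h2> contains plain text only, so that its first child is a Text node.

```javascript
// Hypothetical helper: returns the text of every <h2> in the page.
function headingTexts(doc) {
  var list = doc.getElementsByTagName("h2");
  var texts = [];
  for (var i = 0; i < list.length; i++) {       // indexes 0 .. length-1
    var h = list.item(i);
    // Assumption: the heading holds plain text, stored in its first child.
    texts.push(h.firstChild ? h.firstChild.nodeValue : "");
  }
  return texts;
}

// Usage in a page: alert(headingTexts(document).join("\n"));
```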

Node

A node is any item of the document tree: it may be contained in another node and may itself contain other nodes, except for Text nodes, which cannot have children.

The full interface would require a chapter of its own; here are only the attributes and methods needed to read a static document. All the attributes listed are read-only, except nodeValue.

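DOMString nodeName
The name of the node: the tag name for an element, in capitals in an HTML page (for example "TABLE"), or "#text" for a Text node.
DOMString nodeValue
The value of the node: the text itself for a Text node, null for an element. The text inside a tag is therefore read from its child Text node.
Node firstChild
The first child node, or null if the node has no children.
If tbl designates a table, tbl.firstChild is not necessarily the first row: the HTML parser places the tr tags inside an implicit <tbody> element, and whitespace between tags becomes Text nodes.
Node nextSibling
The node that follows this one among the children of the same parent, or null if it is the last child. For example:
var y = x.nextSibling;
y is the node following x. For a table whose rows are written without whitespace between the tags:
var tr1 = tbl.tBodies[0].firstChild;   // first row, inside the tbody
var tr2 = tr1.nextSibling;             // next row
boolean hasChildNodes()
Returns true if the node has children, false otherwise.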
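Combining firstChild and nextSibling gives a loop over all the children of a node, which avoids the whitespace problem described above. The sketch uses a hypothetical helper, childTagNames, and one attribute not listed above, nodeType (1 for an element), to skip Text nodes.

```javascript
// Hypothetical helper: lists the tag names of the element children of a
// node, following firstChild then nextSibling until null.
function childTagNames(node) {
  var names = [];
  if (!node.hasChildNodes()) return names;
  for (var c = node.firstChild; c !== null; c = c.nextSibling) {
    // nodeType 1 = element; Text nodes from whitespace are skipped.
    if (c.nodeType === 1) names.push(c.nodeName);
  }
  return names;
}

// Usage in a page: childTagNames(tbl.tBodies[0]) lists "TR" once per row.
```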

In JavaScript, a Node that represents a tag is also an Element, so the same object can be used with the methods of either interface as needed. To read the attributes of a tag (for example href on an <a> tag), use the Element methods such as getAttribute.
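For example, the items of document.links are returned as Node objects, yet getAttribute can be called on them directly. A minimal sketch with a hypothetical helper, linkTargets.

```javascript
// Hypothetical helper: collects the href attribute of every link in the page.
function linkTargets(doc) {
  var hrefs = [];
  for (var i = 0; i < doc.links.length; i++) {
    // Each item is also an Element, so getAttribute is available.
    hrefs.push(doc.links.item(i).getAttribute("href"));
  }
  return hrefs;
}

// Usage in a page: alert(linkTargets(document).join("\n"));
```

getAttribute returns the value as written in the HTML source, for example a relative path, whereas the href property of the element returns the resolved absolute URL.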

HTMLCollection

A collection represents a list of HTML tags in a page, such as images, links, etc.

unsigned long length
Number of elements in the list.
Node item(unsigned long)
Returns the element at the given index.
Node namedItem(DOMString)
Returns the element whose id matches the argument or, failing that, whose name attribute matches, for tags that accept a name attribute.

Both methods return null when no element matches.
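Because both methods return null on failure, the result should be tested before use. A minimal sketch with a hypothetical helper, findForm, that accepts either an index or an id/name.

```javascript
// Hypothetical helper: finds a form by index (number) or by id/name (string).
// Returns null if no form matches.
function findForm(doc, key) {
  if (typeof key === "number") return doc.forms.item(key);
  return doc.forms.namedItem(key);   // id first, then name attribute
}

// Usage in a page:
// var f = findForm(document, "login");
// if (f !== null) { f.submit(); }
```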

Demonstration

Demonstration of the methods of the DOM Document object and of the HTMLDocument interface, which inherits from it, showing how to access data in the document.

Each of the following onclick handlers is attached to a button; clicking it displays data obtained with these interfaces.

onclick="alert(document.title)"

onclick="alert(document.URL)"

onclick="alert(document.domain)"

onclick="alert(document.referrer)"

onclick="alert(document.anchors.item(0))"
Note: <a> tags without a name attribute are not included.

onclick="alert(document.links.item(0))"
All links are included.

onclick="alert(document.doctype.name)"

Specifications from the W3C

© 2008-2014 Xul.fr