HTML Document - Accessing content of a webpage
The Document object defined in the DOM specification provides access to the contents of an HTML page or an XML file. Its most commonly used methods are getElementsByTagName and getElementById. The write method comes from the HTMLDocument interface, which is the one used for webpages.
Purpose of the object
The object represents the whole page. Its attributes indicate how the page is defined and its methods allow access to its content.
It is accessed through the global name document.
Attributes of Document
These are read-only attributes defined when creating the page.
- DocumentType doctype
- The document type declaration (doctype), given on the first line of the page.
- DOMImplementation implementation
- Element documentElement
- The root element of the document, the <html> tag in an HTML page. An Element is an object that represents a tag and has methods.
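As a minimal sketch of reading these attributes, the helper below takes a Document and returns the name of its doctype and the name of its root tag. The name describeDocument and the shape of the returned object are assumptions made for this illustration. In a browser you would pass the global document.

```javascript
// Hypothetical helper: summarizes two read-only attributes of a Document.
function describeDocument(doc) {
  return {
    // doctype is null when the page has no document type declaration
    doctypeName: doc.doctype ? doc.doctype.name : null,
    // documentElement is the root element of the document
    rootTag: doc.documentElement.nodeName
  };
}

// Only runs where a global document exists, for example in a browser.
if (typeof document !== "undefined") {
  console.log(describeDocument(document));
}
```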
Attributes of HTMLDocument
The HTMLDocument interface is derived from Document. The name document gives access to HTMLDocument, and therefore also to Document, its superclass.
The attributes and collections are accessed by the following names:
- DOMString title
- The contents of the tag <title> in the <head> section of the page.
- DOMString referrer
- When the page is loaded by clicking on a link, this contains the URL of the calling page. If the address is typed in the URL bar of the browser, it contains an empty string.
- DOMString domain
- The domain name of the website.
- DOMString URL
- The URL of the page (capitalized attribute).
- HTMLElement body
- Corresponds to the <body> tag or the <frameset> tag.
- HTMLCollection images
- List of all IMG tags.
- HTMLCollection applets
- List of object and applet tags.
- HTMLCollection anchors
- List of all <a> tags that have a name attribute.
- HTMLCollection links
- List of all <a> and <area> tags that have an href attribute.
- HTMLCollection forms
- List of forms on the page.
- DOMString cookie
- String containing the cookies of the page.
Collections are accessed by an index, like an array, for example: document.forms[x].
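The following sketch shows how these attributes and collections might be read together. The helper name summarizePage and the fields of the object it returns are assumptions chosen for this example. Only the attribute names listed above come from the interface.

```javascript
// Hypothetical helper: gathers a few HTMLDocument attributes and collection sizes.
function summarizePage(doc) {
  var imageSources = [];
  // A collection has a length attribute and is indexed like an array
  for (var i = 0; i < doc.images.length; i++) {
    imageSources.push(doc.images[i].src);
  }
  return {
    title: doc.title,
    url: doc.URL,            // the attribute name is written in capitals
    referrer: doc.referrer,  // empty string when there is no calling page
    formCount: doc.forms.length,
    linkCount: doc.links.length,
    imageSources: imageSources
  };
}

// Only runs where a global document exists, for example in a browser.
if (typeof document !== "undefined") {
  console.log(summarizePage(document));
}
```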
Methods of Document
- Element createElement(DOMString)
- Creates an element, a tag, whose name is given, and returns an Element. We can give this tag attributes with the setAttribute method of the Element object.
- Text createTextNode(DOMString)
- Creates a Text object holding the given string and returns it. The text appears in the page once this object is appended to an element.
- Element getElementById(DOMString)
- Returns an element whose identifier is given in argument.
This implies that a tag should have an attribute id = "myid".
var x = document.getElementById("myid");
- NodeList getElementsByTagName(DOMString)
- Returns the list of elements whose tag name is given as a parameter.
The example below returns all the links in the page, defined by <a> </a> tags. You can access an element of the NodeList with the method item(n), where n is the index.
var x = document.getElementsByTagName("a");
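To show how these methods combine, here is a minimal sketch that finds a container by its id, creates a link, gives it text, and attaches it to the page. The helper name appendLink and its parameters are hypothetical. The DOM calls it relies on are getElementById, createElement, createTextNode, setAttribute and appendChild.

```javascript
// Hypothetical helper: appends a link to the element whose id is containerId.
function appendLink(doc, containerId, href, label) {
  var container = doc.getElementById(containerId);
  if (container === null) {
    return null; // no element carries this id
  }
  var link = doc.createElement("a");            // new tag, not yet in the page
  link.setAttribute("href", href);              // attributes are set on the Element
  link.appendChild(doc.createTextNode(label));  // the text becomes a child of the link
  container.appendChild(link);                  // the link is now part of the page
  return link;
}

// Only runs where a global document exists, for example in a browser.
if (typeof document !== "undefined") {
  appendLink(document, "myid", "https://example.com/", "Example");
}
```

A created element stays outside the page until appendChild attaches it to an element that is already in the document.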
Methods of HTMLDocument
In addition to the methods of Document, HTMLDocument offers additional functions to directly alter the content of the page.
- void open()
- Opens the document for writing. The current content of the document is cleared.
- void close()
- Closes the document stream opened by open() and forces rendering.
- void write(DOMString)
- Writes the string given as argument in the document.
- void writeln(DOMString)
- Writes the string given as argument and adds an end of line.
Note that we place text in HTML tags by assigning the innerHTML attribute. The write method is used to create a new document in a dynamic way.
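The difference between the two approaches can be sketched as follows. Both helper names, rewriteDocument and setContent, are hypothetical, and the strings passed in are assumed to be trusted HTML, since they are inserted without escaping.

```javascript
// Hypothetical helper: replaces the whole document with a generated page.
function rewriteDocument(doc, heading) {
  doc.open();                               // clears the current content
  doc.writeln("<h1>" + heading + "</h1>");  // writes a line followed by an end of line
  doc.write("<p>Generated with write.</p>");
  doc.close();                              // ends the stream so the page is rendered
}

// Hypothetical helper: changes the content of a single tag and leaves the rest intact.
function setContent(doc, id, html) {
  var target = doc.getElementById(id);
  if (target !== null) {
    target.innerHTML = html;
  }
  return target;
}
```

In a browser, setContent(document, "myid", "<b>Hello</b>") changes one tag only, whereas rewriteDocument(document, "Title") discards everything the page contained.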
The NodeList interface, returned by getElementsByTagName, has one attribute and a single method.
- unsigned long length
- Number of items in the list.
- Node item(unsigned long)
- Returns a Node object corresponding to the index as a parameter. This index is between 0 and length minus 1.
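A loop over a NodeList using length and item might look like the sketch below. The helper name listHrefs is hypothetical. It relies on getAttribute, a method of Element, to read the href of each link.

```javascript
// Hypothetical helper: returns the href attribute of every <a> tag in the document.
function listHrefs(doc) {
  var anchors = doc.getElementsByTagName("a");
  var hrefs = [];
  // Valid indexes run from 0 to length minus 1
  for (var i = 0; i < anchors.length; i++) {
    var node = anchors.item(i);
    hrefs.push(node.getAttribute("href"));
  }
  return hrefs;
}

// Only runs where a global document exists, for example in a browser.
if (typeof document !== "undefined") {
  console.log(listHrefs(document));
}
```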
A Node is any element of the tree of the document, possibly contained in another element, and which may contain other elements (except for Text, which cannot have children).
We do not describe the Node interface in full here, as it would require an entire chapter. Only the attributes and methods needed to read a static document are listed. Apart from nodeValue, the attributes are read-only.
- DOMString nodeName
- The name of the tag.
- DOMString nodeValue
- The data contained in the node: the text for a Text node, null for an Element.
- Node firstChild
- The first child node.
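If tbl designates a table, and this table directly contains tr tags, the rows of the table, tbl.firstChild returns the first row. In an HTML page, a whitespace Text node or an implicit tbody may come before the first row.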
- Node nextSibling
- The successor of the node of which it is the attribute. For example:
var y = x.nextSibling;
y is the Node following x in the list of children of a node or in the document. Example:
var tr1 = tbl.firstChild;   // first row of the tbl table
var tr2 = tr1.nextSibling;  // next row
- boolean hasChildNodes()
- Returns true if the node has children, otherwise false.
A Node can be assigned to an Element and vice versa, which makes it possible to use the methods of one or the other as needed. To access attributes (for example href in the case of an <a> tag), we use the Element.
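As an illustration of firstChild, nextSibling and hasChildNodes, the sketch below walks through the children of a node and collects their names. The helper name childNames is an assumption made for this example.

```javascript
// Hypothetical helper: returns the nodeName of each child of a node, in document order.
function childNames(node) {
  var names = [];
  if (!node.hasChildNodes()) {
    return names; // nothing to walk through
  }
  // nextSibling is null after the last child, which ends the loop
  for (var child = node.firstChild; child !== null; child = child.nextSibling) {
    names.push(child.nodeName);
  }
  return names;
}
```

In a browser, the whitespace between tags is itself stored as Text nodes, whose nodeName is #text, so the result often contains more entries than the number of tags.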
An HTMLCollection represents a list of HTML tags in a page, such as images, links, etc.
- unsigned long length
- Number of elements in the list.
- Node item(unsigned long)
- Returns an element of the list, according to the index.
- Node namedItem(DOMString)
- Returns an element of the list based on its id or, failing that, on its name attribute, provided that this type of tag accepts the name attribute.
The methods return null in case of failure.
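The two access methods can be combined in a small lookup, sketched here for the forms collection. The helper name findForm and its key parameter are hypothetical. A number is treated as an index and a string as an id or name.

```javascript
// Hypothetical helper: finds a form by index (number) or by id or name (string).
function findForm(doc, key) {
  var forms = doc.forms;
  if (typeof key === "number") {
    return forms.item(key);     // null when the index is out of range
  }
  return forms.namedItem(key);  // null when no form has this id or name
}

// Only runs where a global document exists, for example in a browser.
if (typeof document !== "undefined") {
  console.log(findForm(document, 0));
}
```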