XML - eXtensible Markup Language

What is XML and why use it?

XML is a tag-based format like HTML, but it describes the content itself rather than the presentation of that content.
Tags may also have attributes, which makes elements resemble objects in programming languages. In fact, XML can be used to serialize objects from programming languages.
As its name suggests, XML is extensible, and it serves as the basis for many descriptive languages, as discussed below.
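
As a rough illustration of the serialization idea, the sketch below turns an object into XML with Python's standard xml.etree.ElementTree module. The Book class, its fields and the to_xml helper are hypothetical names chosen for this example, and mapping the class name to the tag and each field to a child tag is only one possible convention.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Book:
    # Hypothetical class, used only for this illustration.
    title: str
    author: str
    year: int


def to_xml(obj) -> ET.Element:
    """Serialize a dataclass instance: class name becomes the tag, fields become child tags."""
    element = ET.Element(type(obj).__name__.lower())
    for field in fields(obj):
        child = ET.SubElement(element, field.name)
        child.text = str(getattr(obj, field.name))
    return element


book = Book(title="XML Basics", author="A. Writer", year=2006)
xml_text = ET.tostring(to_xml(book), encoding="unicode")
print(xml_text)
```

Fields could just as well be written as attributes of the tag; child tags were chosen here because they also accommodate nested objects.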

XML markup language and children

XML is increasingly used as a document format: it underlies the file formats of Microsoft Office (Office Open XML) and LibreOffice (OpenDocument).
XML 1.0 became a W3C Recommendation on 10 February 1998 (16 August 2006 is the date of its fourth edition), but its history goes back to 1996, as the W3C's own history indicates.

Advantages of XML

Any type of data may be described in XML provided there is a grammar for its structure (the tags). This universality allows XML to be used in any context and on any system.
Its tree structure and extensibility make it possible to describe almost anything. A document is easy for a script to parse while remaining easy for a human to read.
As the tags contain raw data, it is easy to perform searches on them.
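
A minimal sketch of such a search, assuming a small hypothetical catalog document and Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath:

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog, invented for this example.
CATALOG = """
<catalog>
  <book lang="en"><title>Learning XML</title><price>30</price></book>
  <book lang="fr"><title>XML en pratique</title><price>25</price></book>
  <book lang="en"><title>XSLT Cookbook</title><price>40</price></book>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Search on an attribute with an XPath-like predicate.
english_titles = [book.findtext("title") for book in root.findall("book[@lang='en']")]

# Search on the content of a tag, converted to a number.
cheap_titles = [
    book.findtext("title")
    for book in root.iter("book")
    if int(book.findtext("price")) < 35
]

print(english_titles)
print(cheap_titles)
```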

Defining the grammar

Before writing an XML document, you should write a Document Type Definition (DTD). A DTD declares a grammar of tags, and an XML document is an instance of that grammar, just as an object is an instance of a class or a program is an instance of a language.
The DTD may be included in the XML document, or referenced by a URL (web address).
Without the DTD, the XML document can still be used but cannot be checked for validity. Besides a DTD, a document can be validated with:

  1. DCD (Document Content Description for XML).
    DCD is a language that provides a structural schema facility (using XML syntax), which replaces the functions of the DTD in describing constraints on the tags and content of XML documents. It also describes datatypes and relationships in databases.
    DCD incorporates a subset of XML-Data and is an RDF vocabulary.
  2. A schema, which like a DTD describes the grammar of tags for validating XML documents.
    Schemas are written in XML syntax (DTDs use a different one), allow custom datatypes, and offer many predefined ones.
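
To make the idea of a grammar included in the document more concrete, here is a sketch of a document that carries its own DTD. The note, to and body tags are hypothetical. Python's standard-library parser reads the DTD but does not validate the document against it, so the code only shows that a document with an internal DTD parses normally; actual validation requires a validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with an internal DTD (the part between [ and ]).
DOCUMENT = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ATTLIST note priority CDATA #IMPLIED>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note priority="high">
  <to>Alice</to>
  <body>Meeting at noon</body>
</note>
"""

# The parser accepts the DTD but performs no validity check.
root = ET.fromstring(DOCUMENT)
child_names = [child.tag for child in root]

print(root.tag, root.get("priority"), child_names)
```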

Valid format for an XML document

An XML document has the following form:

<?xml version="1.0" encoding="utf-8" ?>
<xml>
   <tag attribute1="value" attribute2="value">
       ... actual content ...
   </tag>
</xml>  

The file begins with an XML declaration. It is optional, but some tools may require the version and encoding information it carries.

A root tag encloses all the others. Here it is called <xml>, but this is only an example; any other name is possible (the specification reserves names beginning with "xml", so another name is preferable in real documents).
The root tag can contain other tags, each of which can in turn contain other tags or text. Each can have attributes in the form name="value".
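
The structure described above can be explored with a few lines of code. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the library and book tags and their attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a declaration, a root tag, and two child tags.
DOCUMENT = b"""<?xml version="1.0" encoding="utf-8" ?>
<library>
   <book id="1" lang="en">Learning XML</book>
   <book id="2" lang="fr">XML en pratique</book>
</library>
"""

root = ET.fromstring(DOCUMENT)

# Each child exposes its tag name, its attributes (a dict) and its text content.
summary = [(child.tag, child.attrib, child.text) for child in root]

print(root.tag)
for tag, attributes, text in summary:
    print(tag, attributes, text)
```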

If a tag has no content, it can be written in short form, without a separate closing tag:

<tag attribute="value" />
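
A quick check, sketched with Python's standard xml.etree.ElementTree module and a hypothetical tag name, suggests that a parser treats the short form and the explicit open/close form of an empty tag the same way:

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<tag attribute="value" />')
long_form = ET.fromstring('<tag attribute="value"></tag>')

# Both forms yield the same name, the same attributes and no text content.
same = (short_form.tag, short_form.attrib, short_form.text) == (
    long_form.tag,
    long_form.attrib,
    long_form.text,
)

# When writing the tag back, ElementTree uses the short form by default.
serialized = ET.tostring(long_form, encoding="unicode")

print(same, serialized)
```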

Many other formalisms can be added to this basic structure, but it is sufficient for most documents.

How to use XML

To use an XML document, you need a parser. Several kinds of parsers exist.
The parser may translate the document into an in-memory tree, accessible through the Document Object Model (DOM). You can also associate functions with tags, in the SAX (Simple API for XML) way.
Implementations of parsers are available for all programming languages.
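
The two approaches can be compared on the same input. The sketch below uses Python's standard xml.dom.minidom (DOM) and xml.sax (SAX) modules; the document and the BookHandler class are hypothetical and only serve the example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical document shared by both parsers.
DOCUMENT = (
    "<library>"
    "<book id='1'>Learning XML</book>"
    "<book id='2'>XSLT Cookbook</book>"
    "</library>"
)

# DOM: the whole document becomes a tree in memory, then we navigate it.
dom = minidom.parseString(DOCUMENT)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("book")]


# SAX: the parser calls our functions as it meets each tag; no tree is built.
class BookHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # collects text while inside a <book> tag

    def startElement(self, name, attrs):
        if name == "book":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._buffer))
            self._buffer = None


handler = BookHandler()
xml.sax.parseString(DOCUMENT.encode("utf-8"), handler)
sax_titles = handler.titles

print(dom_titles)
print(sax_titles)
```

DOM is convenient when the document fits in memory and needs to be navigated freely; SAX suits large documents that are read once from start to end.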
XML, like HTML, has its own stylesheet language, named XSL, which provides rules to transform XML into another format (XHTML for example).
Tools for accessing XML documents have been standardized by the W3C.

Applications beyond data storage

The power of XML goes beyond simple data storage: it is also used as an interface language for applications.

It is also possible to make XML executable by inserting tags that a special parser recognizes as instructions.
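
As a toy illustration of executable XML, the sketch below interprets a tiny instruction set. The set, add, repeat and print tags are a hypothetical vocabulary invented for this example, not an existing language, and the run function plays the role of the special parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical program written with invented instruction tags.
PROGRAM = """
<program>
  <set name="total" value="0" />
  <repeat times="3">
    <add name="total" value="5" />
  </repeat>
  <print name="total" />
</program>
"""


def run(node, variables, output):
    """Execute the child tags of node in order, treating each tag as an instruction."""
    for instruction in node:
        name = instruction.get("name")
        if instruction.tag == "set":
            variables[name] = int(instruction.get("value"))
        elif instruction.tag == "add":
            variables[name] += int(instruction.get("value"))
        elif instruction.tag == "repeat":
            for _ in range(int(instruction.get("times"))):
                run(instruction, variables, output)
        elif instruction.tag == "print":
            output.append(str(variables[name]))
        else:
            raise ValueError(f"unknown instruction: {instruction.tag}")
    return output


result = run(ET.fromstring(PROGRAM), {}, [])
print(result)
```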

Extensions and languages based on XML

XSL (eXtensible Stylesheet Language)
An XSL document is a set of transformation rules that match chosen elements and attributes in XML documents and map them to new structures.
A set of rules for translating an XML document into HTML is the best example of an XSL stylesheet, but XML can be translated into anything.
XSL is a family of specifications: XSLT defines the transformations, XPath addresses parts of a document, and XSL-FO describes formatting. XSLT itself can produce XML, HTML or plain text.
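
Python's standard library has no XSLT processor, so the sketch below is not XSLT. It only imitates the idea of rule-based transformation in plain Python: each rule maps a source tag to the HTML tag that replaces it. The RULES table, the transform function and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
SOURCE = (
    "<library>"
    "<book><title>Learning XML</title></book>"
    "<book><title>XSLT Cookbook</title></book>"
    "</library>"
)

# Hypothetical rules: source tag -> HTML tag. Unlisted tags are copied unchanged.
RULES = {"library": "ul", "book": "li", "title": "b"}


def transform(source):
    """Rebuild the tree recursively, renaming each tag according to RULES."""
    target = ET.Element(RULES.get(source.tag, source.tag))
    target.text = source.text
    target.tail = source.tail
    for child in source:
        target.append(transform(child))
    return target


html = ET.tostring(transform(ET.fromstring(SOURCE)), encoding="unicode")
print(html)
```

A real XSLT stylesheet expresses such rules declaratively, as templates selected by XPath patterns, and needs an XSLT processor to run.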

SVG (Scalable Vector Graphics)
An XML format that describes graphical objects, which can be scripted with JavaScript through the DOM to make animations. An SVG document can be displayed natively by current web browsers.
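
Since an SVG image is an ordinary XML document, it can be produced with any XML library. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the size, position and color of the circle are arbitrary values chosen for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"  # the SVG namespace

# Root tag with the namespace declaration and the canvas size.
svg = ET.Element("svg", {"xmlns": SVG_NS, "width": "120", "height": "120"})

# One graphical object: a filled circle centered in the canvas.
ET.SubElement(svg, "circle", {"cx": "60", "cy": "60", "r": "50", "fill": "steelblue"})

svg_text = ET.tostring(svg, encoding="unicode")
print(svg_text)
```

Saved to a file with the .svg extension, the resulting text can be opened directly in a browser.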

SMIL (Synchronized Multimedia Integration Language)
Multimedia language that combines data from various sources, to make animations.

XQuery (XML Query)
Query language for extracting and combining data from XML documents, which lets them be used like databases.

XHTML (eXtensible HyperText Markup Language)
HTML 4 rewritten in XML, with an associated DTD.

XForms (XML Forms)
Language for defining forms.

RDF (Resource Description Framework)
Standard for describing various data, including images. It adds a description of the structure to that of the data.

RSS
It is a set of formats for syndication, RSS 1.0 being defined in RDF, and RSS 2.0 in plain XML, as is Atom.
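
Because RSS 2.0 is plain XML, a feed can be read with a generic parser. The sketch below uses Python's standard xml.etree.ElementTree module on a hypothetical, heavily shortened feed; the titles and example.com addresses are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal RSS 2.0 feed.
FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>https://example.com/</link>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Second post</title><link>https://example.com/2</link></item>
  </channel>
</rss>
"""

channel = ET.fromstring(FEED).find("channel")

# Each <item> tag is one syndicated entry.
items = [(item.findtext("title"), item.findtext("link")) for item in channel.findall("item")]

print(channel.findtext("title"))
for title, link in items:
    print(title, link)
```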

XML-oriented language
Scriptol is a programming language that has a syntax similar to XML and can embed XML in the source.

Schema
Standard format to validate any XML document.


© 2006-2021 Xul.fr