RSS 1.0 Tutorial

RDF Site Summary (RSS 1.0) is the syndication format of the Semantic Web, suited to social networking sites. It is based on the Resource Description Framework (RDF) and is written in XML. It is a standard for sharing a list of news items or ads on the Web. It is also a modular format, extensible through namespaces.

This tutorial describes the structure and tags of RSS 1.0 and provides examples and links to useful software.
Note that the content of the examples is purely descriptive and the links do not correspond to real URLs.

History of the format

RSS 0.90 was created by Netscape in March 1999 for its own use. It was the second syndication format, after UserLand's ScriptingNews of 1997. The header was RDF and the body plain XML. The W3C published the RDF specification the same year.
RSS 0.90 inspired UserLand's ScriptingNews 2.0, after which Netscape created RSS 0.91, which in turn adopted improvements from ScriptingNews 2.0.
In August 2000, the publisher O'Reilly proposed the RSS 1.0 format, based entirely on RDF.

The semantic Web

The Semantic Web, as described by the W3C:

It is about common formats for the integration and combination of data drawn from diverse sources.
It is also about a language for recording how the data relates to real-world objects.

This means that when you are connected through the Web to one database, you can reach the other databases linked to it. That requires tools for knowledge representation, and therefore formats.

The RDF format

The RDF format was specified by the W3C in 1999 to record relations in a form usable by computers. For example, the fact that X is the author of a document connects two databases. The basic notion is the resource: a document on the Web, in any form. The format records relations between documents so that they can be exchanged and processed automatically.

It is therefore well suited to RSS feeds in a social network, where relationships between people are processed along with the publication of documents.

RDF gives the feed the rdf:about attribute and a summary, both of which can be handled by specialized software and by user interface languages such as XUL. RDF contributes mainly to the extension modules and very little to the core standard.

What's an RSS 1.0 feed?

The benefit of the Semantic Web is that an RSS feed can be connected to various databases: articles, authors, sites...

Information provided

  1. The channel title, link, description.
  2. Articles: title, description, link.
  3. The summary of the feed: the list of items.

This information can be automatically extracted from the pages.

Differences with RSS 2.0

Formats 1.0 and 2.0 share the same basic information about the channel and the articles (title, link and description), which makes it possible to build universal feed readers that read every RSS file.
However, the 1.0 format is more developed than the 2.0 format:

Differences:

Where is the information taken from?

Content management systems (CMS) such as WordPress must extract the information automatically from the new pages of the site.

Channel
  1. <title>: title of the home page, or its <h1> tag.
  2. <description>: taken from the meta description tag of the home page, or extracted from the content.
  3. The rdf:about attribute of the channel: the URL of the feed.
Item
  1. <title>: title of the page containing the article, or its <h1> tag.
  2. <description>: meta description tag of the article page, or the first paragraph, or text extracted from the content.
  3. <dc:date>: date of last modification of the file (Dublin Core module).
  4. <link>: absolute URL of the page.
  5. The rdf:about attribute of the item: the URL of the page.

A CMS may provide fields in its configuration panel that are used to generate the feed, including the title and description of the site.

Building an RSS 1.0 feed step-by-step

A Web site that follows the standards needs no change to its content in order to generate a feed automatically. The home page provides the information for the channel, and each page corresponds to an item in the feed.

Preparing a Web page

This step is unnecessary if you build the feed by hand, but if a script must generate it automatically, some data must be present in the page:

  1. The <title> tag contains the title of the page as displayed by search engines in their results. The <h1> tag can play the same role.
  2. The description, either in the meta description tag or in the text itself. In the latter case it can be marked with an identifier:
<span id="description">Summary of the page</span>

If a page summarizes the recent articles of a site, that HTML page can be turned into an RSS feed. The W3C proposes a method for this (see the resources). However, with current tools it is easier to do the opposite: generate the feed with a script and display it in an HTML page.
The pages are generated and prepared automatically by the CMS.
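
The extraction described in this step can be sketched in Python with the standard html.parser module. This is only an illustration under assumptions: the names PageInfoParser and extract_page_info are hypothetical, and the sketch reads the <title> tag, the first <h1> tag and the meta description tag, not the identified <span>.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Collect <title>, the first <h1>, and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1 = ""
        self.description = ""
        self._current = None   # tag whose text is currently being captured
        self._h1_done = False  # only the first <h1> is kept

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" or (tag == "h1" and not self._h1_done):
            self._current = tag
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == self._current:
            if tag == "h1":
                self._h1_done = True
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1 += data


def extract_page_info(html):
    """Return (title, description), falling back to <h1> when <title> is empty."""
    parser = PageInfoParser()
    parser.feed(html)
    title = parser.title.strip() or parser.h1.strip()
    return title, parser.description


page = """<html><head><title>Title of the article</title>
<meta name="description" content="Abstract of the article">
</head><body><h1>Heading</h1></body></html>"""
print(extract_page_info(page))
```

A real generator would also need to handle pages with neither tag, for instance by taking the first paragraph as the description.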

Structure of the document

The feed is an XML file that uses RDF. It is defined as follows:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/">
... channel and articles ...
</rdf:RDF>

Two namespaces are declared: that of RDF, defined by the W3C, and that of RSS 1.0, hosted at purl.org.
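
A program can use these two namespaces to recognize an RSS 1.0 document. The following Python sketch uses the standard xml.etree.ElementTree module; the function name is_rss10 is hypothetical, and the test it applies (an rdf:RDF root holding a channel in the RSS 1.0 namespace) is a simplifying assumption, not a full validation.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"


def is_rss10(xml_text):
    """True if the root is rdf:RDF and it holds a channel in the RSS 1.0 namespace."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{RDF_NS}}}RDF":
        return False
    return root.find(f"{{{RSS_NS}}}channel") is not None


feed = f"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="{RSS_NS}">
  <channel rdf:about="https://www.xul.fr/xml/news.rdf"/>
</rdf:RDF>"""

print(is_rss10(feed))
print(is_rss10("<rss version='2.0'><channel/></rss>"))
```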

Defining the channel

The channel is defined by:
- A title.
- A description.
- A link.
- The summary of articles.
- And optionally an image, and a textinput.

The base channel will be defined as follows:

<channel rdf:about="https://www.xul.fr/xml/news.rdf">
    <title>Title of the feed</title>
    <link>https://www.xul.fr</link>
    <description>
        The topic of the feed.
    </description>
</channel>

The optional tags, and the summary, which is required, are covered further on.

The rdf:about attribute holds a unique URL, which is that of the feed.
The title is the subject of the feed, or the title of the website if the site has a single feed.
The link usually points to the home page, a folder, or a news page.
The description states the purpose of the feed.
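
As an illustration, the base channel can be generated with Python's xml.etree.ElementTree. This is a minimal sketch: make_channel is a hypothetical helper, and registering the empty prefix for the RSS namespace is one way to get unprefixed RSS tags in the output.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"

# Serialize with the rdf: prefix and RSS 1.0 as the default namespace.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("", RSS_NS)


def make_channel(feed_url, title, link, description):
    """Build the base <channel>; rdf:about carries the URL of the feed itself."""
    channel = ET.Element(f"{{{RSS_NS}}}channel", {f"{{{RDF_NS}}}about": feed_url})
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, f"{{{RSS_NS}}}{tag}").text = text
    return channel


root = ET.Element(f"{{{RDF_NS}}}RDF")
root.append(make_channel("https://www.xul.fr/xml/news.rdf", "Title of the feed",
                         "https://www.xul.fr", "The topic of the feed."))
print(ET.tostring(root, encoding="unicode"))
```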

Defining an item

Each item is defined by the title, the description and the link to the article.

The title comes from the <title> tag of the page or, if that is missing, from the first <h1> tag.

The description comes from the meta description tag:

<meta name="description" content="my description">

It can also be built from the page contents. This is what the ARA RSS editor does.

The URL of the page is assigned to the <link> tag and also to the rdf:about attribute, for RDF software.

An article will have the following form:

<item rdf:about="https://www.xul.fr/article.html">
    <title>Title of the article</title>
    <link>https://www.xul.fr/article.html</link>
    <description>
        Abstract of the article or content of the announcement.
    </description>
</item>

One item is defined for each article or ad, with a recommended, but not mandatory, maximum of 15 items.
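
A generator can apply this limit while building the items. The sketch below is one possible approach in Python; the Article class, the make_items function and the MAX_ITEMS constant are hypothetical names introduced for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
MAX_ITEMS = 15  # recommended ceiling, not a hard limit


@dataclass
class Article:
    url: str
    title: str
    description: str


def make_items(articles, limit=MAX_ITEMS):
    """Return one <item> element per article, keeping at most `limit` of them."""
    items = []
    for article in articles[:limit]:
        # The page URL is used twice: as rdf:about and as the <link> text.
        item = ET.Element(f"{{{RSS_NS}}}item", {f"{{{RDF_NS}}}about": article.url})
        ET.SubElement(item, f"{{{RSS_NS}}}title").text = article.title
        ET.SubElement(item, f"{{{RSS_NS}}}link").text = article.url
        ET.SubElement(item, f"{{{RSS_NS}}}description").text = article.description
        items.append(item)
    return items


articles = [Article(f"https://www.xul.fr/article{n}.html", f"Article {n}", "Abstract.")
            for n in range(1, 21)]
items = make_items(articles)
print(len(items), items[0].get(f"{{{RDF_NS}}}about"))
```

The sketch assumes the list is already sorted with the most recent articles first, so that the slice keeps the newest ones.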

Creating a summary

The table of contents is defined by the <items> tag.

Elements of summary:

- The <items> container.
- The <rdf:Seq> tag for a list in RDF format.
- As many <rdf:li> tags as articles.

Example of summary:

<items>
    <rdf:Seq>
        <rdf:li rdf:resource="https://www.xul.fr/article1.html" />
        <rdf:li rdf:resource="https://www.xul.fr/article2.html" />
    </rdf:Seq>
</items>

Definition of the complete channel with a summary:

<channel rdf:about="https://www.xul.fr/en/news.rdf">
    <title>Title of the feed</title>
    <link>https://www.xul.fr</link>
    <description>
        The topic of the feed.
    </description>
    <items>
        <rdf:Seq>
            <rdf:li rdf:resource="https://www.xul.fr/article1.html" />
            <rdf:li rdf:resource="https://www.xul.fr/article2.html" />
        </rdf:Seq>
    </items>
</channel>

RDF parsers reject a file without a summary, whereas universal feed readers ignore the summary.
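
Because the summary must agree with the items actually present in the file, it can be useful to check a generated feed before publishing it. The following Python sketch compares the rdf:li entries with the rdf:about attributes of the items; summary_problems is a hypothetical helper and the messages it returns are arbitrary.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
NS = {"rdf": RDF_NS, "rss": RSS_NS}


def summary_problems(xml_text):
    """List mismatches between the channel's <items> summary and the <item> elements."""
    root = ET.fromstring(xml_text)
    listed = [li.get(f"{{{RDF_NS}}}resource")
              for li in root.findall("rss:channel/rss:items/rdf:Seq/rdf:li", NS)]
    defined = [item.get(f"{{{RDF_NS}}}about") for item in root.findall("rss:item", NS)]
    problems = []
    if not listed:
        problems.append("channel has no <items> summary")
    problems += [f"listed but not defined: {url}" for url in listed if url not in defined]
    problems += [f"defined but not listed: {url}" for url in defined if url not in listed]
    return problems


feed = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="{RSS_NS}">
  <channel rdf:about="https://www.xul.fr/en/news.rdf">
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="https://www.xul.fr/article1.html" />
        <rdf:li rdf:resource="https://www.xul.fr/article2.html" />
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="https://www.xul.fr/article1.html" />
</rdf:RDF>"""
print(summary_problems(feed))
```

Here the second article is listed in the summary but has no item, so the function reports one problem.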

Adding an image to the channel

An image of 88x31 pixels, in a standard file format, can be displayed with the feed. For this we use the optional image tag, which is included in the channel tag.

The rdf:resource attribute links the channel to the image file.

<image rdf:resource="https://www.xul.fr/logo.gif" />

The channel is thus linked to an image that is further defined in an image tag at the same level as the item tags. The definition contains the inner tags title, link and url, which give in order: the title of the site, the link to the site, and the URL of the image. Here the rdf:about attribute holds the URL of the image.

<image rdf:about="https://www.xul.fr/logo.gif">
    <title>Ajax et XUL</title>
    <link>https://www.xul.fr</link>
    <url>https://www.xul.fr/logo.gif</url>
</image>

These definitions are somewhat redundant; they are intended for software that processes RDF files automatically.
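
Since the same URL appears in the declaration, in rdf:about and in the url tag, a script can produce all three from a single value. A minimal Python sketch, where add_image is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"


def add_image(root, channel, image_url, title, link):
    """Declare the image inside the channel and define it as a sibling of the items."""
    ET.SubElement(channel, f"{{{RSS_NS}}}image", {f"{{{RDF_NS}}}resource": image_url})
    image = ET.SubElement(root, f"{{{RSS_NS}}}image", {f"{{{RDF_NS}}}about": image_url})
    for tag, text in (("title", title), ("link", link), ("url", image_url)):
        ET.SubElement(image, f"{{{RSS_NS}}}{tag}").text = text
    return image


root = ET.Element(f"{{{RDF_NS}}}RDF")
channel = ET.SubElement(root, f"{{{RSS_NS}}}channel",
                        {f"{{{RDF_NS}}}about": "https://www.xul.fr/xml/news.rdf"})
image = add_image(root, channel, "https://www.xul.fr/logo.gif",
                  "Ajax et XUL", "https://www.xul.fr")
print(ET.tostring(root, encoding="unicode"))
```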

Adding textinput

A textinput is an HTML form field. It lets the user enter text on the page that displays the feed, for example in a search box. The form is handled by the software that displays the feed, which must convert the textinput into an HTML form and process the text entered by the user.

As with the image, the textinput is declared in the channel and defined in the feed at the same level as the articles.

Declaration:

<textinput rdf:resource="https://www.xul.fr/search.php" />

Example definition of a textinput for a search field:

<textinput rdf:about="https://www.xul.fr/search.php">
    <title>Search</title>
    <description>search on xul.fr</description>
    <name>myfield</name>
    <link>https://www.xul.fr/search.php</link>
</textinput>

The name property will be used by DOM functions to locate the form on the page.
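
The conversion into an HTML form that the displaying software must perform could look like the following Python sketch. The function textinput_to_form is hypothetical, and both the use of the GET method and the mapping of title to the submit button are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

NS = {"rss": "http://purl.org/rss/1.0/"}


def textinput_to_form(textinput):
    """Render a <textinput> element as a minimal HTML form (GET is an assumption)."""

    def field(tag):
        # Escape the text so it is safe inside HTML attributes and content.
        text = textinput.findtext(f"rss:{tag}", default="", namespaces=NS)
        return escape(text.strip())

    return (
        f'<form method="get" action="{field("link")}">'
        f'<label>{field("description")} '
        f'<input type="text" name="{field("name")}"></label> '
        f'<input type="submit" value="{field("title")}">'
        "</form>"
    )


snippet = """<textinput xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    rdf:about="https://www.xul.fr/search.php">
  <title>Search</title>
  <description>search on xul.fr</description>
  <name>myfield</name>
  <link>https://www.xul.fr/search.php</link>
</textinput>"""
form = textinput_to_form(ET.fromstring(snippet))
print(form)
```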

Giving dates

There is no date tag in the core RSS 1.0 standard; dates of publication or modification must be given with the Dublin Core module.

In the list of namespaces, we add this line:

xmlns:dc="http://purl.org/dc/elements/1.1/"

And inside the <item> tag, we add the date line in accordance with the specification of the module.

<dc:date>2008-05-16</dc:date>

This module can also give a date to the channel and add metadata about the articles.
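
When the date is taken from the last modification of the file, a script has to format it. The Python sketch below writes a full W3CDTF timestamp in UTC, a more precise variant of the date-only form shown above; the function name dc_date_from_mtime is hypothetical, and the temporary file only stands in for a real page.

```python
import os
import tempfile
from datetime import datetime, timezone


def dc_date_from_mtime(path):
    """Format a file's last-modification time as a W3CDTF timestamp in UTC."""
    mtime = os.path.getmtime(path)
    return datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Stand-in for an article page: a temporary file with a known modification time.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    path = handle.name
os.utime(path, (1210896000, 1210896000))  # 2008-05-16 00:00:00 UTC
print(f"<dc:date>{dc_date_from_mtime(path)}</dc:date>")
os.remove(path)
```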

Example

Using an RSS 1.0 feed

The feed may be placed on a Web site and read by browsers or be handled by an aggregator or RDF software. In all cases some conventions should be followed ...

Mime type:

The RSS 1.0 specification recommends application/xml. The type application/rdf+xml is registered for RDF documents, and application/rss+xml is widely used in practice.

Extension of the file:

The .rdf extension identifies the nature of the feed, and is therefore recommended.

Encoding:

RSS 1.0 is encoded in UTF-8. If the feed is displayed in a Web page, the page must use the same encoding, or the feed must be converted to the encoding of the page.
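
Both directions can be sketched in Python: writing the feed as UTF-8, then converting a value read from it for a page in another encoding. ISO-8859-1 is only an example target, and the title text is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

RSS_NS = "http://purl.org/rss/1.0/"

item = ET.Element(f"{{{RSS_NS}}}item")
ET.SubElement(item, f"{{{RSS_NS}}}title").text = "Actualités"

# Writing side: serialize as UTF-8 bytes, with the XML declaration.
utf8_bytes = ET.tostring(item, encoding="utf-8", xml_declaration=True)

# Reading side: the parser decodes the text, which is then re-encoded for a
# page in ISO-8859-1; characters that encoding lacks become numeric references.
title = ET.fromstring(utf8_bytes).findtext(f"{{{RSS_NS}}}title")
latin1_html = title.encode("iso-8859-1", errors="xmlcharrefreplace")
print(utf8_bytes)
print(latin1_html)
```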

Page header:

For browsers to detect the feed, the head of the page should contain the following line (change the URL and title):

<link rel="alternate" type="application/rss+xml" href="https://www.xul.fr/rss.rdf" title="My RSS 1.0 feed">
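
From the reader's side, detecting the feed means scanning the page for such a link tag. Here is a minimal Python sketch of that detection; FeedLinkFinder and find_feeds are hypothetical names, and accepting application/rdf+xml in addition to the type shown above is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Types treated as feeds here; the second entry is an assumption, not from the tutorial.
FEED_TYPES = {"application/rss+xml", "application/rdf+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect (absolute URL, title) pairs from <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        href = attrs.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative links against the URL of the page.
            self.feeds.append((urljoin(self.base_url, href), attrs.get("title") or ""))


def find_feeds(html, base_url):
    """Return the feeds advertised in the given HTML text."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds


head = ('<head><link rel="alternate" type="application/rss+xml" '
        'href="/rss.rdf" title="My RSS 1.0 feed"></head>')
print(find_feeds(head, "https://www.xul.fr/index.html"))
```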

A link to the feed informs visitors of its presence. It can use the RSS icon. Download the image and host it yourself rather than linking to the copy on this site: hotlinked requests are answered with the xul.fr logo instead.

References

© 2008-2012 Xul.fr