Chapter 4 XML Markup

Table of Contents
4.1 XHTML
4.2 DocBook

This chapter describes the two markup languages you will encounter when you contribute to the FreeBSD documentation project. Each section describes the markup language, and details the markup that you are likely to want to use, or that is already in use.

These markup languages contain a large number of elements, and it is sometimes hard to know which element to use in a particular situation. This chapter goes through the elements you are most likely to need, and gives examples of how you would use them.

This is not an exhaustive list of elements, since that would just reiterate the documentation for each language. The aim of this chapter is to list the elements most likely to be useful to you. If you have a question about how best to mark up a particular piece of content, please post it to the FreeBSD documentation project mailing list.

Inline Versus Block: In the remainder of this document, when describing elements, inline means that the element can occur within a block element, and does not cause a line break. A block element, by comparison, will cause a line break (and other processing) when it is encountered.
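As a small illustration of the distinction, the sketch below uses two XHTML elements described later in this chapter: <p> is a block element, and <em> is an inline element inside it. The sentence text is made up for the example.

```xml
<!-- <p> is a block element: it starts on a new line -->
<p>This paragraph contains an <em>inline</em> element, which
  does not cause a line break.</p>

<!-- A second block element begins on its own line -->
<p>This is a separate paragraph.</p>
```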

4.1 XHTML

XHTML is the XML version of the HyperText Markup Language, which is the markup language of choice on the World Wide Web. More information can be found at http://www.w3.org/.

XHTML is used to mark up pages on the FreeBSD web site. It should generally not be used to mark up other documentation, since DocBook offers a far richer set of elements to choose from. Consequently, you will normally only encounter XHTML pages if you are writing for the web site.

HTML has gone through a number of versions: 1, 2, 3.0, 3.2, and 4.0. An XML-compliant version, called XHTML, has also been created. Its latest widespread version is XHTML 1.0 (available in both strict and transitional variants).

The XHTML DTDs are available from the Ports Collection in the textproc/xhtml port. They are automatically installed as part of the textproc/docproj port.

4.1.1 Formal Public Identifier (FPI)

There are a number of XHTML FPIs, depending upon the version (also known as the level) of XHTML that you want to declare your document to be compliant with.

The majority of XHTML documents on the FreeBSD web site comply with the transitional version of XHTML 1.0.

PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
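In practice, the FPI appears inside the document type declaration at the top of the file. The following is a sketch of the usual form. The system identifier (the URL on the second line) is the one published by the W3C for XHTML 1.0 Transitional; it is not mentioned in the text above.

```xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
```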

4.1.2 Sectional Elements

An XHTML document is normally split into two sections. The first section, called the head, contains meta-information about the document, such as its title, the name of the author, the parent document, and so on. The second section, the body, contains the content that will be displayed to the user.

These sections are indicated with <head> and <body> elements respectively. These elements are contained within the top-level <html> element.

Example 4-1. Normal XHTML Document Structure

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>The Document's Title</title>
  </head>

  <body>

    …

  </body>
</html>
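The head can carry more than the title. A minimal sketch, assuming a hypothetical author name: <meta> is a standard XHTML head element, but the value shown here is invented for illustration.

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>The Document's Title</title>

    <!-- Hypothetical author name, recorded as meta-information -->
    <meta name="author" content="A. N. Author" />
  </head>

  <body>
    <p>Content displayed to the user goes here.</p>
  </body>
</html>
```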

4.1.3 Block Elements

4.1.3.1 Headings

XHTML allows you to denote headings in your document, at up to six different levels.

The largest and most prominent heading is <h1>, then <h2>, continuing down to <h6>.

The element's content is the text of the heading.

Example 4-2. <h1>, <h2>, and Other Heading Tags

Use:

<h1>First section</h1>

<!-- Document introduction goes here -->

<h2>This is the heading for the first section</h2>

<!-- Content for the first section goes here -->

<h3>This is the heading for the first sub-section</h3>

<!-- Content for the first sub-section goes here -->

<h2>This is the heading for the second section</h2>

<!-- Content for the second section goes here -->

Generally, an XHTML page should have one first-level heading (<h1>). It can be followed by many second-level headings (<h2>), each of which can in turn be followed by many third-level headings. Each <hn> element should be preceded by a heading one level further up the hierarchy; for example, an <h3> should come after an <h2>. Avoid leaving gaps in the numbering.

Example 4-3. Bad Ordering of <hn> Elements

Avoid:

<h1>First section</h1>

<!-- Document introduction -->

<h3>Sub-section</h3>

<!-- This is bad, <h2> has been left out -->
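A corrected version of the same outline, as a sketch, supplies the missing level so that no heading number is skipped. The heading text "Section" is made up for the example.

```xml
<h1>First section</h1>

<!-- Document introduction -->

<h2>Section</h2>

<!-- The <h2> level is now present -->

<h3>Sub-section</h3>
```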

4.1.3.2 Paragraphs

XHTML supports a single paragraph element, <p>.

Example 4-4. <p>

Use:

<p>This is a paragraph.  It can contain just about any
  other element.</p>

4.1.3.3 Block Quotations

A block quotation is an extended quotation from another document that should not appear within the current paragraph.

Example 4-5. <blockquote>

Use:

<p>A small excerpt from the US Constitution:</p>

<blockquote>We the People of the United States, in Order to form
  a more perfect Union, establish Justice, insure domestic
  Tranquility, provide for the common defence, promote the general
  Welfare, and secure the Blessings of Liberty to ourselves and our
  Posterity, do ordain and establish this Constitution for the
  United States of America.</blockquote>

4.1.3.4 Lists

You can present the user with three types of lists: ordered, unordered, and definition.

Typically, each entry in an ordered list will be numbered, while each entry in an unordered list will be preceded by a bullet point. Definition lists are composed of two sections for each entry. The first section is the term being defined, and the second section is the definition of the term.

Ordered lists are indicated by the <ol> element, unordered lists by the <ul> element, and definition lists by the <dl> element.

Ordered and unordered lists contain list items, indicated by the <li> element. A list item can contain text directly, or its content can be wrapped in one or more <p> elements.

Definition lists contain definition terms (<dt>) and definition descriptions (<dd>). A definition term can only contain inline elements. A definition description can contain other block elements.

Example 4-6. <ul> and <ol>

Use:

<p>An unordered list.  List items will probably be
  preceded by bullets.</p>

<ul>
  <li>First item</li>

  <li>Second item</li>

  <li>Third item</li>
</ul>

<p>An ordered list, with list items consisting of multiple
  paragraphs.  Each item (note: not each paragraph) will be
  numbered.</p>

<ol>
  <li><p>This is the first item.  It only has one paragraph.</p></li>

  <li><p>This is the first paragraph of the second item.</p>

    <p>This is the second paragraph of the second item.</p></li>

  <li><p>This is the first and only paragraph of the third
    item.</p></li>
</ol>
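A list item can also contain another list, which is how nested lists are written. A short sketch with made-up item text:

```xml
<ul>
  <li><p>First item, with a nested ordered list:</p>

    <ol>
      <li>First nested item</li>

      <li>Second nested item</li>
    </ol></li>

  <li><p>Second item</p></li>
</ul>
```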

Example 4-7. Definition Lists with <dl>

Use:

<dl>
  <dt>Term 1</dt>

  <dd><p>Paragraph 1 of definition 1.</p>

    <p>Paragraph 2 of definition 1.</p></dd>

  <dt>Term 2</dt>

  <dd><p>Paragraph 1 of definition 2.</p></dd>

  <dt>Term 3</dt>

  <dd><p>Paragraph 1 of definition 3.</p></dd>
</dl>
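The difference between what a term and a description may contain can be sketched as follows. The ls command and its options are used only as sample content: the term holds inline markup, while the description holds a paragraph and a list.

```xml
<dl>
  <!-- The term holds only inline content, here <tt> -->
  <dt><tt>ls</tt></dt>

  <!-- The description can hold block elements, here a list -->
  <dd><p>Lists directory contents.  Common options:</p>

    <ul>
      <li><tt>-l</tt>: long format</li>

      <li><tt>-a</tt>: include entries whose names begin with a dot</li>
    </ul></dd>
</dl>
```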

4.1.3.5 Pre-formatted Text

You can indicate that text should be shown to the user exactly as it is in the file. Typically, this means that the text is shown in a fixed font, multiple spaces are not merged into one, and line breaks in the text are significant.

In order to do this, wrap the content in the <pre> element.

Example 4-8. <pre>

You could use <pre> to mark up an email message:

<pre>  From: nik@FreeBSD.org
  To: freebsd-doc@FreeBSD.org
  Subject: New documentation available

  There is a new copy of my primer for contributors to the FreeBSD
  Documentation Project available at

    &lt;URL:http://people.FreeBSD.org/~nik/primer/index.html&gt;

  Comments appreciated.

  N</pre>

Keep in mind that < and & are still recognized as special characters in pre-formatted text. This is why the example shown had to use &lt; instead of <. For consistency, &gt; was used in place of >, too. Watch out for special characters that may appear in text copied from a plain-text source, such as an email message or program code.
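Program code is a common source of such characters. The sketch below uses a made-up C fragment to show how each <, >, and & is written inside <pre>. A browser would display the first line as: if (a < b && c > d)

```xml
<pre>if (a &lt; b &amp;&amp; c &gt; d)
    printf("condition holds\n");</pre>
```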

4.1.3.6 Tables

Note: Most text-mode browsers (such as Lynx) do not render tables particularly effectively. If you are relying on the tabular display of your content, you should consider using alternative markup to prevent confusion.

Mark up tabular information using the <table> element. A table consists of one or more table rows (<tr>), each containing one or more cells of table data (<td>). Each cell can contain other block elements, such as paragraphs or lists. It can also contain another table (this nesting can repeat indefinitely). If the cell only contains one paragraph then you do not need to include the <p> element.

Example 4-9. Simple Use of <table>

Use:

<p>This is a simple 2x2 table.</p>

<table>
  <tr>
    <td>Top left cell</td>

    <td>Top right cell</td>
  </tr>

  <tr>
    <td>Bottom left cell</td>

    <td>Bottom right cell</td>
  </tr>
</table>
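Cells are not limited to plain text. As a sketch with made-up content, the first cell below holds a single piece of text without <p>, while the second holds a paragraph followed by a list:

```xml
<table>
  <tr>
    <!-- A single paragraph: <p> can be omitted -->
    <td>Plain text cell</td>

    <!-- Several block elements in one cell -->
    <td><p>A paragraph, followed by a list:</p>

      <ul>
        <li>First item</li>

        <li>Second item</li>
      </ul></td>
  </tr>
</table>
```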

A cell can span multiple rows and columns. To indicate this, add the rowspan and/or colspan attributes, with values indicating the number of rows or columns that should be spanned.

Example 4-10. Using rowspan

Use:

<p>One tall thin cell on the left, two short cells next to
  it on the right.</p>

<table>
  <tr>
    <td rowspan="2">Long and thin</td>

    <td>Top cell</td>
  </tr>

  <tr>
    <td>Bottom cell</td>
  </tr>
</table>

Example 4-11. Using colspan

Use:

<p>One long cell on top, two short cells below it.</p>

<table>
  <tr>
    <td colspan="2">Top cell</td>
  </tr>

  <tr>
    <td>Bottom left cell</td>

    <td>Bottom right cell</td>
  </tr>
</table>

Example 4-12. Using rowspan and colspan Together

Use:

<p>On a 3x3 grid, the top left block is a 2x2 set of
  cells merged into one.  The other cells are normal.</p>

<table>
  <tr>
    <td colspan="2" rowspan="2">Top left large cell</td>

    <td>Top right cell</td>
  </tr>

  <tr>
    <!-- Because the large cell on the left merges into
         this row, the first <td> will occur on its
         right -->

    <td>Middle right cell</td>
  </tr>

  <tr>
    <td>Bottom left cell</td>

    <td>Bottom middle cell</td>

    <td>Bottom right cell</td>
  </tr>
</table>

4.1.4 In-line Elements

4.1.4.1 Emphasizing Information

You have two levels of emphasis available in XHTML, <em> and <strong>. <em> is for a normal level of emphasis and <strong> indicates stronger emphasis.

Typically, <em> is rendered in italic and <strong> is rendered in bold. This is not always the case, however, and you should not rely on it. According to best practices, web pages hold only structural and semantic information, and stylesheets are applied later to control how that information is presented. Think of semantics, not formatting, when using these tags.
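To illustrate the separation, a stylesheet can change how the two elements are drawn without any change to the markup. A minimal sketch, assuming an embedded <style> element in the document head; the title and the particular CSS rules are arbitrary choices for the example:

```xml
<head>
  <title>Emphasis example</title>

  <!-- These rules override the typical italic and bold rendering -->
  <style type="text/css">
    em     { font-style: normal; text-decoration: underline; }
    strong { font-weight: normal; color: red; }
  </style>
</head>
```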

Example 4-13. <em> and <strong>

Use:

<p><em>This</em> has been emphasized, while
  <strong>this</strong> has been strongly emphasized.</p>

4.1.4.2 Indicating Fixed-Pitch Text

If you have content that should be rendered in a fixed pitch (typewriter) typeface, use <tt> (for “teletype”).

Example 4-14. <tt>

Use:

<p>This document was originally written by
  Nik Clayton, who can be reached by email as
  <tt>nik@FreeBSD.org</tt>.</p>

4.1.5 Links

Note: Links are also inline elements.

4.1.5.1 Linking to Other Documents on the WWW

In order to include a link to another document on the WWW you must know the URL of the document you want to link to.

The link is indicated with <a>, and the href attribute contains the URL of the target document. The content of the element becomes the link, and is normally indicated to the user in some way (underlining, change of color, different mouse cursor when over the link, and so on).

Example 4-15. Using <a href="...">

Use:

<p>More information is available at the
  <a href="http://www.FreeBSD.org/">FreeBSD web site</a>.</p>

These links will take the user to the top of the chosen document.

4.1.5.2 Linking to Other Parts of Documents

Linking to a point within another document (or within the same document) requires that the document author include anchors that you can link to.

Anchors are indicated with <a> and the id attribute instead of href.

Example 4-16. Using <a id="...">

Use:

<p><a id="para1">This</a> paragraph can be referenced
  in other links with the name <tt>para1</tt>.</p>

To link to a named part of a document, write a normal link to that document, but include the id of the anchor after a # symbol.

Example 4-17. Linking to a Named Part of Another Document

Assume that the para1 example resides in a document called foo.html.

<p>More information can be found in the
  <a href="foo.html#para1">first paragraph</a> of
  <tt>foo.html</tt>.</p>

If you are linking to a named anchor within the same document then you can omit the document's URL, and just include the name of the anchor (with the preceding #).

Example 4-18. Linking to a Named Part of the Same Document

Assume that the para1 example resides in this document:

<p>More information can be found in the
  <a href="#para1">first paragraph</a> of this
  document.</p>
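The id attribute is not specific to <a>. In XHTML 1.0 it can be placed on most elements, so a heading can serve as a link target directly. A sketch, using a hypothetical id value and heading text:

```xml
<h2 id="install">Installing the software</h2>

<p>See the <a href="#install">installation section</a> for
  details.</p>
```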