All the vocabularies written in XML share certain characteristics. This is hardly surprising, as the philosophy behind XML will inevitably show through. One of the most obvious manifestations of this philosophy is that of content and elements.
Your documentation (whether it is a single web page, or a lengthy book) is considered to consist of content. This content is then divided (and further subdivided) into elements. The purpose of adding markup is to name and identify the boundaries of these elements for further processing.
For example, consider a typical book. At the very top level, the book is itself an element. This “book” element obviously contains chapters, which can be considered to be elements in their own right. Each chapter will contain more elements, such as paragraphs, quotations, and footnotes. Each paragraph might contain further elements, identifying content that was direct speech, or the name of a character in the story.
You might like to think of this as “chunking” content. At the very top level you have one chunk, the book. Look a little deeper, and you have more chunks, the individual chapters. These are chunked further into paragraphs, footnotes, character names, and so on.
Notice how you can make this differentiation between different elements of the content without resorting to any XML terms. It really is surprisingly straightforward. You could do this with a highlighter pen and a printout of the book, using different colors to indicate different chunks of content.
Of course, we do not have an electronic highlighter pen, so we need some other way of indicating which element each piece of content belongs to. In languages written in XML (XHTML, DocBook, and others) this is done by means of tags.
A tag is used to identify where a particular element starts, and where the element ends. The tag is not part of the element itself. Because each grammar was normally written to mark up specific types of information, each one will recognize different elements, and will therefore have different names for the tags.
For an element called element-name the start tag will normally look like <element-name>. The corresponding closing tag for this element is </element-name>.
XHTML has an element for indicating that the content enclosed by the element is a paragraph, called p.
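A short sketch of how the p element might be used. The paragraph text is placeholder content chosen for illustration.

```xml
<p>This is a paragraph. It starts with the start tag for the p
  element, and it ends with the end tag for the p element.</p>

<p>This is another paragraph. This one is much shorter.</p>
```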
Some elements have no content. For example, in XHTML you can indicate that you want a horizontal line to appear in the document.
For such elements, which have no content at all, XML introduced a shorthand form: a single tag of the form <element-name/>, which is completely equivalent to the start tag immediately followed by the closing tag.
XHTML has an element for indicating a horizontal rule, called hr. This element does not wrap content, so its start tag is followed immediately by its closing tag: <hr></hr>.
In the shorthand form, the same element is written as the single tag <hr/>.
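The two ways of writing an empty element can be compared side by side. In this sketch the surrounding paragraphs are placeholder text, and both hr elements mean exactly the same thing to an XML parser.

```xml
<p>One paragraph.</p>

<!-- Two-tag form: a start tag immediately followed by an end tag -->
<hr></hr>

<p>Another paragraph. A horizontal rule separates it from the
  previous paragraph.</p>

<!-- Shorthand form: a single empty-element tag -->
<hr/>

<p>A third paragraph, separated by a rule written in the shorthand form.</p>
```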
If it is not obvious by now, elements can contain other elements. In the book example earlier, the book element contained all the chapter elements, which in turn contained all the paragraph elements, and so on.
In XHTML, for example, a p element can contain em elements, which mark emphasized text.
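A minimal sketch of nesting in XHTML, with em elements (emphasized text) inside a p element. The sentence itself is placeholder text.

```xml
<p>This is a simple <em>paragraph</em> where some of the
  <em>words</em> have been <em>emphasized</em>.</p>
```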
The grammar will specify the rules detailing which elements can contain other elements, and exactly what they can contain.
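To make the earlier book example concrete, the chunks could be marked up roughly as follows. The element names here (book, chapter, para, speech, character, footnote) are hypothetical and chosen only for illustration; a real grammar defines its own names and its own rules about what may appear inside what.

```xml
<!-- Hypothetical element names, not taken from any particular grammar -->
<book>
  <chapter>
    <para><speech>Who is there?</speech> asked
      <character>Alice</character>.<footnote>A placeholder
      footnote.</footnote></para>
  </chapter>

  <chapter>
    <para>A second chapter, with a single short paragraph.</para>
  </chapter>
</book>
```

Every element that is opened inside another element is also closed inside it, so the chunks nest cleanly and never overlap.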
People often confuse the terms tags and elements, and use the terms as if they were interchangeable. They are not.
An element is a conceptual part of your document. An element has a defined start and end. The tags mark where the element starts and ends.
When this document (or anyone else knowledgeable about XML) refers to “the <p> tag” they mean the literal text consisting of the three characters <, p, and >. But the phrase “the p element” refers to the whole element.
This distinction is subtle, but keep it in mind.
Elements can have attributes. An attribute has a name and a value, and is used for adding extra information to the element. This might be information that indicates how the content should be rendered, or might be something that uniquely identifies that occurrence of the element, or it might be something else.
An element's attributes are written inside the start tag for that element, and take the form attribute-name="attribute-value".
In XHTML, the p element has an attribute called align, which suggests an alignment (justification) for the paragraph to the program displaying the XHTML.
The align attribute can take one of four defined values, left, center, right and justify. If the attribute is not specified then the default is left.
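A brief sketch of the align attribute in use. The paragraph text is placeholder content, and how the alignment is actually rendered depends on the program displaying the XHTML.

```xml
<p align="left">The align attribute on this paragraph is
  superfluous, since the default is left.</p>

<p align="center">This paragraph may appear centered.</p>
```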
Some attributes will only take specific values, such as left or justify. Others will allow you to enter anything you want.
XML requires you to quote each attribute value with either single or double quotes. Double quotes are more common, but single quotes may be used as well. Single quotes are practical when the attribute value itself contains double quotes.
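The quoting rules can be sketched as follows. The first two paragraphs are equivalent. The third uses the XHTML title attribute, picked here only as an example of an attribute that accepts free text, to show a value that contains double quotes.

```xml
<p align="right">I am on the right!</p>

<p align='right'>So am I.</p>

<!-- Single quotes let the value itself contain double quotes -->
<p title='The "p" element'>A paragraph whose title contains a
  quoted word.</p>
```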
The information on attributes, elements, and tags is stored in XML catalogs. The various Documentation Project tools use these catalog files to validate your work. The tools in textproc/docproj include a variety of XML catalog files. The FreeBSD Documentation Project includes its own set of catalog files. Your tools need to know about both sorts of catalog files.
In order to run the examples in this document you will need to install some software on your system and ensure that an environment variable is set correctly.
Download and install textproc/docproj from the FreeBSD ports system. This is a meta-port that should download and install all of the programs and supporting files that are used by the Documentation Project.
Add lines to your shell startup files to set SGML_CATALOG_FILES. (If you are not working on the English version of the documentation, you will want to substitute the correct directory for your language.)
Then either log out, and log back in again, or run those commands from the command line to set the variable values.
Create example.xml, and enter the following text:
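A minimal file for this step might look like the sketch below. It assumes the XHTML 1.0 Transitional DTD, which permits the align attribute on p; the title and paragraph text are placeholders.

```xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>An Example XHTML File</title>
  </head>

  <body>
    <p>This is a paragraph containing some text.</p>

    <p>This paragraph contains some more text.</p>

    <p align="right">This paragraph might be right-justified.</p>
  </body>
</html>
```

Line numbers in any validation messages depend on the exact layout of the file, so with this sketch they may differ from the ones quoted in this document.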
Try to validate this file using an XML parser.
Part of textproc/docproj is the xmllint validating parser.
Use xmllint in the following way to check that your document is valid:
% xmllint --valid --noout example.xml
As you will see, xmllint returns without displaying any output. This means that your document validated successfully.
See what happens when required elements are omitted. Try removing the title and /title tags, and re-run the validation.
% xmllint --valid --noout example.xml
example.xml:5: element head: validity error : Element head content does not follow the DTD, expecting ((script | style | meta | link | object | isindex)* , ((title , (script | style | meta | link | object | isindex)* , (base , (script | style | meta | link | object | isindex)*)?) | (base , (script | style | meta | link | object | isindex)* , title , (script | style | meta | link | object | isindex)*))), got ()
This line tells you that the validation error comes from the fifth line of the example.xml file and that the content of the head element is the part that does not follow the rules described by the XHTML grammar.
Below this line xmllint will show you the line where the error has been found and will also mark the exact character position with a ^ sign.
Put the title element back in.
This, and other documents, can be downloaded from http://ftp.FreeBSD.org/pub/FreeBSD/doc/
For questions about FreeBSD, read the documentation before contacting <questions@FreeBSD.org>.
For questions about this documentation, e-mail <doc@FreeBSD.org>.