All the DTDs written in SGML share certain characteristics. This is hardly surprising, as the philosophy behind SGML will inevitably show through. One of the most obvious manifestations of this philosophy is the distinction between content and elements.
Your documentation (whether it is a single web page, or a lengthy book) is considered to consist of content. This content is then divided (and further subdivided) into elements. The purpose of adding markup is to name and identify the boundaries of these elements for further processing.
For example, consider a typical book. At the very top level, the book is itself an element. This “book” element obviously contains chapters, which can be considered to be elements in their own right. Each chapter will contain more elements, such as paragraphs, quotations, and footnotes. Each paragraph might contain further elements, identifying content that was direct speech, or the name of a character in the story.
You might like to think of this as “chunking” content. At the very top level you have one chunk, the book. Look a little deeper, and you have more chunks, the individual chapters. These are chunked further into paragraphs, footnotes, character names, and so on.
Notice how you can make this differentiation between different elements of the content without resorting to any SGML terms. It really is surprisingly straightforward. You could do this with a highlighter pen and a printout of the book, using different colors to indicate different chunks of content.
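To make the chunking idea concrete, here is a minimal Python sketch that models content as a tree of named chunks. The Element class and the sample book are illustrative assumptions, not part of any SGML tool. A real parser builds a comparable structure from the markup.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    """A named chunk of content; children are nested elements or plain text."""
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


def outline(node: Union[Element, str], depth: int = 0) -> List[str]:
    """Return one indented line per element, walking the tree depth-first."""
    if isinstance(node, str):
        return []  # plain text is content, not a named chunk
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical book: one chapter holding a paragraph and a footnote.
book = Element("book", [
    Element("chapter", [
        Element("paragraph", ["Some text."]),
        Element("footnote", ["A note."]),
    ]),
])

print("\n".join(outline(book)))
```

Each level of indentation in the printed outline corresponds to one level of "looking a little deeper" into the book.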
Of course, we do not have an electronic highlighter pen, so we need some other way of indicating which element each piece of content belongs to. In languages written in SGML (HTML, DocBook, et al.) this is done by means of tags.
A tag is used to identify where a particular element starts, and where the element ends. The tag is not part of the element itself. Because each DTD was normally written to mark up specific types of information, each one will recognize different elements, and will therefore have different names for the tags.
For an element called element-name, the start tag will normally look like <element-name>. The corresponding closing tag for this element is </element-name>.
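The start and end tag convention can be expressed as a pair of tiny helper functions. This Python sketch is illustrative only (the function names are hypothetical). It shows the tags being placed around the content to mark where the element starts and ends.

```python
def start_tag(name: str) -> str:
    """Start tag for an element, e.g. <p>."""
    return f"<{name}>"


def end_tag(name: str) -> str:
    """Matching end (closing) tag, e.g. </p>."""
    return f"</{name}>"


def mark_up(name: str, content: str) -> str:
    """Wrap content in the start and end tags of the named element."""
    return start_tag(name) + content + end_tag(name)


print(mark_up("p", "This is another paragraph."))
```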
HTML has an element for indicating that the content enclosed by the element is a paragraph, called p. This element has both start and end tags.
<p>This is a paragraph.  It starts with the start tag for
  the 'p' element, and it will end with the end tag for the 'p'
  element.</p>

<p>This is another paragraph.  But this one is much shorter.</p>
Not all elements require an end tag. Some elements have no content. For example, in HTML you can indicate that you want a horizontal line to appear in the document. Obviously, this line has no content, so just the start tag is required for this element.
HTML has an element for indicating a horizontal rule, called hr. This element does not wrap content, so it only has a start tag.
<p>This is a paragraph.</p>

<hr>

<p>This is another paragraph.  A horizontal rule separates this
  from the previous paragraph.</p>
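A program that writes markup has to know which elements have no content, so that it emits only the start tag for them. In this Python sketch the set EMPTY_ELEMENTS is a hypothetical, hand-written stand-in for what the DTD actually declares, and it lists only hr for illustration.

```python
from typing import Optional

# Hypothetical stand-in for the DTD: elements declared as having no content.
EMPTY_ELEMENTS = {"hr"}


def serialize(name: str, content: Optional[str] = None) -> str:
    """Emit start and end tags, or just the start tag for an empty element."""
    if name in EMPTY_ELEMENTS:
        if content:
            raise ValueError(f"{name} is an empty element and cannot wrap content")
        return f"<{name}>"
    return f"<{name}>{content or ''}</{name}>"


parts = [
    serialize("p", "This is a paragraph."),
    serialize("hr"),
    serialize("p", "This is another paragraph."),
]
print(" ".join(parts))
```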
If it is not obvious by now, elements can contain other elements. In the book example earlier, the book element contained all the chapter elements, which in turn contained all the paragraph elements, and so on.
For example, HTML's em element (for emphasis) can appear inside a p element:
<p>This is a simple <em>paragraph</em> where some of the <em>words</em> have been <em>emphasized</em>.</p>
The DTD will specify the rules detailing which elements can contain other elements, and exactly what they can contain.
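As a rough illustration of what "which elements can contain other elements" means, this Python sketch checks a nested structure against a hypothetical table of allowed children. Real DTD content models are richer (they also fix the order of children and how often each may occur), so treat ALLOWED_CHILDREN as a simplified assumption, not HTML's actual rules.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical, simplified content models: element name -> allowed child names.
# "#PCDATA" stands for plain character data (text).
ALLOWED_CHILDREN: Dict[str, Set[str]] = {
    "body": {"p", "hr"},
    "p": {"#PCDATA", "em"},
    "em": {"#PCDATA"},
    "hr": set(),
}

Node = Union[str, Tuple[str, list]]  # text, or (element name, children)


def check(node: Node, parent: str = "body") -> List[str]:
    """Return a message for every child that its parent does not allow."""
    problems = []
    name = "#PCDATA" if isinstance(node, str) else node[0]
    if name not in ALLOWED_CHILDREN.get(parent, set()):
        problems.append(f"{name} is not allowed inside {parent}")
    if not isinstance(node, str):
        for child in node[1]:
            problems.extend(check(child, name))
    return problems


good = ("p", ["This is a simple ", ("em", ["paragraph"]), "."])
bad = ("p", ["Text ", ("em", [("p", ["a paragraph inside em"])])])
print(check(good))
print(check(bad))
```

Here check(good) returns an empty list, while check(bad) reports that a p element is not allowed inside em.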
People often confuse the terms tags and elements, and use the terms as if they were interchangeable. They are not.
An element is a conceptual part of your document. An element has a defined start and end. The tags mark where the element starts and ends.
When this document (or anyone else knowledgeable about SGML) refers to “the <p> tag”, they mean the literal text consisting of the three characters <, p, and >. But the phrase “the <p> element” refers to the whole element.
This distinction is very subtle. But keep it in mind.
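The distinction can be seen directly on a string. This Python sketch uses a regular expression that is adequate only for this simple example with a single p element (an assumption; regular expressions are not a general SGML parser).

```python
import re

source = "<p>This is a <em>short</em> paragraph.</p>"

# The <p> tag: just the three characters '<', 'p' and '>'.
tag = "<p>"
print(list(tag))

# The p element: everything from its start tag through its end tag.
match = re.search(r"<p>.*?</p>", source)
element = match.group(0) if match else None
print(element)
```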
Elements can have attributes. An attribute has a name and a value, and is used for adding extra information to the element. This might be information that indicates how the content should be rendered, or might be something that uniquely identifies that occurrence of the element, or it might be something else.
An element's attributes are written inside the start tag for that element, and take the form attribute-name="attribute-value".
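Following that form, a start tag with attributes can be assembled as in this Python sketch. The helper name start_tag_with_attributes is hypothetical, and the sketch always wraps values in double quotes.

```python
from typing import Dict


def start_tag_with_attributes(name: str, attributes: Dict[str, str]) -> str:
    """Build a start tag, writing each attribute as name="value"."""
    parts = [name] + [f'{attr}="{value}"' for attr, value in attributes.items()]
    return "<" + " ".join(parts) + ">"


print(start_tag_with_attributes("p", {"align": "center"}))
print(start_tag_with_attributes("p", {}))
```

The first call yields <p align="center">, and the second, with no attributes, falls back to a plain <p>.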
In sufficiently recent versions of HTML, the p element has an attribute called align, which suggests an alignment (justification) for the paragraph to the program displaying the HTML.
The align attribute can take one of four defined values: left, center, right and justify. If the attribute is not specified, then the default is left.
<p align="left">The inclusion of the align attribute
  on this paragraph was superfluous, since the default is left.</p>

<p align="center">This may appear in the center.</p>
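A program reading the align attribute might resolve its value as in this Python sketch. The four allowed values and the left default come from the text above; effective_alignment is a hypothetical helper name, and the sketch compares values exactly as written.

```python
from typing import Dict

ALIGN_VALUES = {"left", "center", "right", "justify"}
ALIGN_DEFAULT = "left"


def effective_alignment(attributes: Dict[str, str]) -> str:
    """Return the p element's alignment, applying the default when align is absent."""
    value = attributes.get("align", ALIGN_DEFAULT)
    # Simplification: values are compared exactly as written.
    if value not in ALIGN_VALUES:
        raise ValueError(f"invalid align value: {value!r}")
    return value


print(effective_alignment({}))
print(effective_alignment({"align": "center"}))
```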
Some attributes will only take specific values, such as left or justify. Others will allow you to enter anything you want. If you need to include double quotes (") within an attribute value, use single quotes around the attribute value.
Sometimes you do not need to use quotes around attribute values at all. However, the rules for doing this are subtle, and it is far simpler just to always quote your attribute values.
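The quoting advice can be captured in a small Python sketch, using the title attribute as an example. quote_attribute is a hypothetical helper: it always quotes, switches to single quotes when the value contains a double quote, and refuses values containing both kinds, since neither quote character alone can delimit such a value (a character or entity reference would be needed, which this sketch does not attempt).

```python
def quote_attribute(name: str, value: str) -> str:
    """Write name="value", or name='value' if the value contains a double quote."""
    if '"' not in value:
        return f'{name}="{value}"'
    if "'" not in value:
        return f"{name}='{value}'"
    # Both quote characters present: neither delimiter works on its own.
    raise ValueError("value contains both kinds of quote")


print(quote_attribute("title", "plain"))
print(quote_attribute("title", 'say "hello"'))
```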
The information on attributes, elements, and tags is stored in SGML catalogs. The various Documentation Project tools use these catalog files to validate your work. The tools in textproc/docproj include a variety of SGML catalog files. The FreeBSD Documentation Project includes its own set of catalog files. Your tools need to know about both sorts of catalog files.
In order to run the examples in this document you will need to install some software on your system and ensure that an environment variable is set correctly.
Download and install textproc/docproj from the FreeBSD ports system. This is a meta-port that should download and install all of the programs and supporting files that are used by the Documentation Project.
Add lines to your shell startup files to set SGML_CATALOG_FILES. (If you are not working on the English version of the documentation, you will want to substitute the correct directory for your language.)
# For sh(1)-style shells, e.g. in .profile
SGML_ROOT=/usr/local/share/xml
SGML_CATALOG_FILES=${SGML_ROOT}/jade/catalog
SGML_CATALOG_FILES=${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=${SGML_ROOT}/docbook/4.1/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=/usr/doc/share/xml/catalog:$SGML_CATALOG_FILES
SGML_CATALOG_FILES=/usr/doc/en_US.ISO8859-1/share/xml/catalog:$SGML_CATALOG_FILES
export SGML_CATALOG_FILES
# For csh(1)-style shells, e.g. in .cshrc
setenv SGML_ROOT /usr/local/share/xml
setenv SGML_CATALOG_FILES ${SGML_ROOT}/jade/catalog
setenv SGML_CATALOG_FILES ${SGML_ROOT}/iso8879/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES ${SGML_ROOT}/html/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES ${SGML_ROOT}/docbook/4.1/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES /usr/doc/share/xml/catalog:$SGML_CATALOG_FILES
setenv SGML_CATALOG_FILES /usr/doc/en_US.ISO8859-1/share/xml/catalog:$SGML_CATALOG_FILES
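Apart from the first assignment, each of the startup-file lines puts a catalog in front of the existing colon-separated list, so the catalog added last ends up listed first. This Python sketch rebuilds the same value to make that order visible. It only manipulates strings and does not check that the files exist.

```python
SGML_ROOT = "/usr/local/share/xml"

# Catalogs in the order the startup-file lines add them.
catalogs_in_order_added = [
    f"{SGML_ROOT}/jade/catalog",
    f"{SGML_ROOT}/iso8879/catalog",
    f"{SGML_ROOT}/html/catalog",
    f"{SGML_ROOT}/docbook/4.1/catalog",
    "/usr/doc/share/xml/catalog",
    "/usr/doc/en_US.ISO8859-1/share/xml/catalog",
]

value = ""
for catalog in catalogs_in_order_added:
    # First line sets the variable; later lines mirror
    # SGML_CATALOG_FILES=<catalog>:$SGML_CATALOG_FILES
    value = catalog if not value else f"{catalog}:{value}"

for entry in value.split(":"):
    print(entry)
```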
Then either log out and log back in again, or run those commands from the command line to set the variable values.
Create example.xml, and enter the following text:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<html>
  <head>
    <title>An example HTML file</title>
  </head>

  <body>
    <p>This is a paragraph containing some text.</p>

    <p>This paragraph contains some more text.</p>

    <p align="right">This paragraph might be right-justified.</p>
  </body>
</html>
Try to validate this file using an SGML parser.
Part of textproc/docproj is the nsgmls validating parser. Normally, nsgmls reads in a document marked up according to an SGML DTD and returns a copy of the document's Element Structure Information Set (ESIS, but that is not important right now).
However, when nsgmls is given the -s parameter, it suppresses its normal output and just prints error messages. This makes it a useful way to check whether your document is valid.
Use nsgmls to check that your document is valid:

% nsgmls -s example.xml
As you will see, nsgmls returns without displaying any output. This means that your document validated successfully.
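If you want to run the same check from a script, this Python sketch calls nsgmls -s through subprocess and treats "no output and a zero exit status" as success. It assumes nsgmls is on your PATH; because it is not certain which stream a given version writes its messages to, the sketch collects both. The validate name and its cmd parameter are hypothetical; cmd exists only so the command can be swapped out, for example in tests.

```python
import subprocess
from typing import List, Sequence, Tuple


def validate(path: str, cmd: Sequence[str] = ("nsgmls", "-s")) -> Tuple[bool, List[str]]:
    """Run the validator on path; return (is_valid, message_lines)."""
    result = subprocess.run([*cmd, path], capture_output=True, text=True)
    # nsgmls -s prints nothing for a valid document, so any output is a message.
    messages = (result.stdout + result.stderr).splitlines()
    return (result.returncode == 0 and not messages), messages
```

With the example file above, validate("example.xml") should report success with an empty message list; after an error is introduced, the returned lines are the same messages nsgmls prints on the terminal.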
See what happens when required elements are omitted. Try removing the <title> and </title> tags, and re-run the validation.
% nsgmls -s example.xml
nsgmls:example.xml:5:4:E: character data is not allowed here
nsgmls:example.xml:6:8:E: end tag for "HEAD" which is not finished
The error output from nsgmls is organized into colon-separated groups, or columns.
| Column | Meaning |
|---|---|
| 1 | The name of the program generating the error. This will always be nsgmls. |
| 2 | The name of the file that contains the error. |
| 3 | Line number where the error appears. |
| 4 | Column number where the error appears. |
| 5 | A one-letter code indicating the nature of the message. I indicates an informational message, W is for warnings, E is for errors[a], and X is for cross-references. As you can see, these messages are errors. |
| 6 | The text of the error message. |

[a] It is not always the fifth column either.
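Because the fields are colon-separated, they are easy to pick apart in a script. This Python sketch splits a message into the six columns described in the table. Since the footnote warns that the type code is not always in column 5, the sketch also accepts a shorter program:code: text form without a file position; that shorter form is an assumption about how such messages look. It also assumes file names contain no colons. The Message and parse names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# program:file:line:column:code: text   (the six columns in the table)
FULL = re.compile(r"^([^:]+):([^:]+):(\d+):(\d+):([IWEX]):\s?(.*)$")
# Assumed shorter form without a file position: program:code: text
SHORT = re.compile(r"^([^:]+):([IWEX]):\s?(.*)$")


class Message(NamedTuple):
    program: str
    file: Optional[str]
    line: Optional[int]
    column: Optional[int]
    code: str
    text: str


def parse(raw: str) -> Optional[Message]:
    """Split one nsgmls message into its columns; None if neither form matches."""
    m = FULL.match(raw)
    if m:
        prog, path, line, col, code, text = m.groups()
        return Message(prog, path, int(line), int(col), code, text)
    m = SHORT.match(raw)
    if m:
        prog, code, text = m.groups()
        return Message(prog, None, None, None, code, text)
    return None


msg = parse("nsgmls:example.xml:5:4:E: character data is not allowed here")
print(msg)
```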
Simply omitting the title tags has generated two different errors.
The first error indicates that content (in this case, characters, rather than the start tag for an element) has occurred where the SGML parser was expecting something else. In this case, the parser was expecting to see one of the start tags for elements that are valid inside head (such as title).
The second error is because head elements must contain a title element. Because this one does not, nsgmls considers that the element has not been properly finished. However, the closing tag indicates that the element has been closed before it has been finished.
Put the title element back in.
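This document and others can be downloaded from ftp://ftp.FreeBSD.org/pub/FreeBSD/doc/.
If you have questions about FreeBSD, please read the FreeBSD documentation first; if that does not resolve them, contact <questions@FreeBSD.org>.
For questions about this documentation, contact <doc@FreeBSD.org>.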