
© Martin Bryan 1997 from SGML and HTML Explained published by Addison Wesley Longman


Chapter 2
Document Analysis and Information Modelling

An SGML document forms a self-contained unit that can be delivered either electronically or in printed form. There are no size constraints on SGML documents - they can range in size from a one-line memo or letter to a multi-volume set such as the Encyclopedia Britannica.

SGML documents consist of a number of inter-related elements. Each element contains data which serves a specific purpose. A particular word can be a subcomponent of more than one element. For instance, a word can be part of a highlighted phrase within a paragraph that forms part of a section in a chapter, and so on.

Each SGML document is associated with a document type definition (DTD) which defines the structure of the document in terms of the elements it can contain and the order in which these elements can occur. Within the DTD each element in the document is given a name (generic identifier) by which its role can be recognized. When placed within markup delimiters these generic identifiers form the tags used to identify the start and end of each element.

To allow large documents to be generated efficiently, SGML documents can be developed as a sequence of subdocuments, each subdocument forming a document in its own right. Authors can opt to process each subdocument separately, or can create a master document that links the various subdocuments together in the required order.

To use SGML effectively authors should be able to recognize:

Each of these concepts will be explained in this chapter. The chapter is split into the following sections:

2.1 Identifying document types

From the author's point of view the most important thing about a document is that it is a unit that can be passed to another user. How big this unit is will depend on circumstances. For example, if an author has contracted to supply certain chapters of a book to his publisher at specific dates, he will probably find it easiest to treat each chapter as an individual document, whereas if the contract calls for the delivery of the completed manuscript at one point, the whole book can be considered as the document, with individual parts or chapters forming subelements of the main document.

Another consideration in determining what constitutes a document may be the form in which the data is delivered. For example, documents designed to be delivered over the WWW are typically written in smaller units to reduce transmission delays. By linking together a set of related small files it is possible to create what is referred to as a hyperdocument. In this case the contents of each file may be thought of as forming a subcomponent of a chapter, which itself is a subelement of a master document that combines the chapters into a coherent set.

Parts of documents that need separate processing should be treated as separate entities. For example, complex tables that require special document type definitions, or features not used elsewhere in the document, can be treated as subdocuments and stored in a separate file. By processing complex tables separately the complexity of the DTD used for the main document can be reduced. Where pre-processed graphics are incorporated into an SGML document by reference to external files, however, it is not necessary to create a separate subdocument for the processed data, as an SGML external entity reference can both identify the location of non-SGML information and control its processing.

In many applications creating a document can be thought of as completing a pre-printed form. For example, a memo will normally be output on a sheet of paper that has been pre-printed with the name of the company and special fields for the entry of the names of the sender and recipient, and possibly the subject and date of the memo. These pre-printed fields do not, as such, constitute part of the document. For pre-printed forms the SGML document just contains the text to be added to the pre-printed sheet.

It is important, when analyzing documents, to separate the role of a piece of data from its format. It is the role of a piece of information that determines how it should be processed. Part of the processing of the element is its formatting but, as you will see, this may be only a small part of the processing required to properly manage the elements that make up a document set.

2.2 The structure of documents

While the structure of a document can vary in complexity from the simple format of a memo or letter to the complex format of a technical manual or a textbook, the concepts used to identify the structure of each document remain the same. One advantage of this is that the elements used to generate a simple memo or letter can also be used within a more complex document such as a textbook.

2.2.1 The structure of a memo

Figure 2.1 shows the structure of a simple memo. Each element of the structure has been ringed and allocated an identifier. In the case of the first three elements the identifier is the pre-printed name of the field (<FROM>, <DATE> and <TO>). The <SUBJECT> element can be considered as an optional heading to the memo. The text of the memo, in this case, consists of two paragraphs of text, which have been assigned the generic identifier PARA in the marked up document.

Example Memorandum

Figure 2.1 The structure of a memo

In SGML this memo could be marked up as:

   <!DOCTYPE memo PUBLIC "-//The SGML Centre//DTD Memo//EN">
   <MEMO>
   <FROM>Martin Bryan
   <TO>All staff
   <DATE>5th November
   <SUBJECT>Fireworks Reminder
   <PARA>Please remember to ensure that the cats are locked into one of the
         inner rooms before going to tonight's firework party.
   <PARA>The barbecue will start at 6.30pm, and we will start the firework
         display promptly at 7pm.
   </MEMO>

The first line of the coded text contains a document type declaration identifying the document type definition (DTD) required for the document. In this example the definition required is one declared previously for the production of English (EN) language memos for The SGML Centre. (A document type declaration is required at the start of each document. In many cases, however, this declaration will be generated automatically as part of the file conversion/transmission process to avoid the need for manual entry.)

The document type declaration is immediately followed by an SGML start-tag whose name (generic identifier) is identical to the one used for the document type declaration (e.g. <MEMO>). This base document element is the markup instruction that is used to check that the correct DTD has been associated with the document.

The elements following the <MEMO> base document element are fairly self-explanatory. The start of each element is identified by a start-tag consisting of the element's name entered between a pair of delimiters (in this case angle brackets).

Notice how the <DATE> element has not been entered in the sequence that might be expected from its position on the printed memo. This illustrates one of the powers of SGML - it is based on the logical structure of the contents rather than their position or appearance. In many cases this will allow entries to be made in a more convenient sequence, the formatting program determining exactly where each element should be placed on the printed page or display screen.

The end of the coded memo is identified by an end-tag indicating that the memo has been completed. The tag consists of the name of the base document element between a pair of special end-tag delimiters (</ and >). In many simple documents this final end-tag will be the only one identifying the end of an element, all other end-tags being omitted to reduce the amount of embedded coding required. (In general, end-tags may be omitted whenever no ambiguity would occur from their omission.)

2.2.2 The structure of a letter

Letter Structure

Figure 2.2 The structure of a letter

Figure 2.2 shows the structure of a typical letter. This letter could be coded as:

   <!DOCTYPE letter PUBLIC "-//The SGML Centre//DTD Letter//EN">
   <LETTER>
   <REF>MTB/290296-4
   <DATE>29th February 1996
   <ADDRESS>Hugh Tucker
   Documenta ApS
   Marievej 7
   DK-2900 Hellerup
   Denmark
   <DEAR>Hugh
   <P>
   Many thanks for placing the latest version of the editor on your 
   FTP server. I had no trouble downloading the program and loading
   it onto my system.
   <P>
   I particularly liked the new feature to prompt users for missing
   elements if they try to save incomplete files. It certainly helps
   to bring home the fact that partial documents are just that - 
   something that needs completing as soon as possible.
   <SIGNED type=grateful>
   <NAME>Martin Bryan
   </LETTER>

As is to be expected, a letter requires more elements to define its structure than a memo, but perhaps surprisingly it does not contain many more markup instructions within the text, despite its greater length. This is made possible by the use of "implied" tags and short references, as will be explained shortly.

The start of the coded letter is similar to that of the memo, with a document type declaration requesting a previously declared set of element and entity definitions being immediately followed by the base document element, in this case <LETTER>, that activates the declared set. (Once again this element is the only one with an explicit end-tag.) The next two elements contain the document's unique reference number and its date. Note that these precede the address in the markup, though they appear to be printed after specific lines of the address.

Each address consists of a number of lines, each of which can have a specific purpose. It is here that some care needs to be taken in clearly identifying the true structure of the document, rather than its apparent one. The apparent structure of an address is that it starts with the recipient's name, possibly followed by a company name. This is then followed by one or more lines of address information, which may contain a postcode. But beware: if the markup is simply being entered to produce a printed letter there is nothing to be gained, and much to be lost, by treating the name, company and postcode as separate elements of the document where this information is not used elsewhere in the document.

In the coded example above the various components of the address have not been identified individually. To reduce the amount of markup needed, the document designer has asked the program to recognize the end of each line of the address as the end of a nested <LINE> element, but this fact is not visible to the typist. By using the power of SGML short references the amount of markup required to code the address has been reduced to a single start-tag preceding the first line.
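
The effect of this short reference can be imitated in a few lines of Python. This is only a sketch of what the parser ends up seeing, not of how an SGML parser implements short references (those are declared in the DTD); the function name expand_address is hypothetical.

```python
def expand_address(raw):
    """Show the markup a parser might infer from a typed address block.

    Each line end is treated as closing one nested <LINE> element, so the
    typist only ever keys the single <ADDRESS> start-tag.
    """
    lines = [line.strip() for line in raw.strip().splitlines()]
    body = "".join(f"<LINE>{line}</LINE>" for line in lines if line)
    return f"<ADDRESS>{body}</ADDRESS>"


typed = """Hugh Tucker
Documenta ApS
Marievej 7
DK-2900 Hellerup
Denmark"""
print(expand_address(typed))
```

The typist's input contains no <LINE> tags at all; they appear only in the expanded form that the program works with.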

The rest of the SGML-coded letter is fairly straightforward, with <DEAR> being used as the tag for the form of the recipient's name to be placed after the word "Dear" at the top of the letter and, in this case, <P> being used to identify the start of each paragraph of the text.

As is shown in Figure 2.2, the formatting program may treat two paragraph tags differently, depending on where they occur in the document. Often the first paragraph of a letter will be indented while other paragraphs will be set full out, with a blank line of space identifying the paragraph break. Whereas most current word processing programs would expect the typist to remember to indent the first paragraph manually, such features can be taken care of automatically by SGML's text formatting procedures. All the operator needs to do is to identify the start of the logical element called a paragraph so that the program can apply the relevant house rules to produce the final letter.

The <SIGNED type=grateful> tag in the coded letter illustrates another SGML feature - the use of attributes to qualify a tag. In this case, selection of the appropriate type of signature is all that is needed to tell the formatting program:

For this example the name of the sender has been entered as a separate <NAME> element, embedded within the <SIGNED> element.

2.2.3 The structure of a report

Example of simple report

Figure 2.3 The structure of a report

Figure 2.3 shows the start of a typical report, each element of the structure once again being ringed for clarity. The report could be marked up, in SGML form, as:

   <!DOCTYPE report PUBLIC "-//The SGML Centre//DTD Report//EN">
   <REPORT>
   <TITLE>The Advantages of SGML
   <AUTHOR>Martin Bryan
   <P>SGML provides a standardized technique for marking up
   electronically prepared copy that does not presume any
   typographic knowledge on the part of users. It can be
   applied by almost any author/editor to code any type
   of book.
   <P>SGML identifies<LIST number=alpha>
   <ITEM>the structure of the document
   <ITEM>the characters to be used in the document
   <ITEM>entities that are used more than once
   <ITEM>externally stored information.</LIST>
   <SECTION>
   <TITLE>The Structure of a Document
   <P>The <EMPHASIS>structure</EMPHASIS> of a document can be thought
   of in terms of a series of nested <INDEX>elements</INDEX>, the
   start and end of each element being at some clearly
   definable point in the text.
   <SUBSECTION>
   <TITLE>The Structure of a Memo
   <P>The memo shown in Figure 2.1 of <CITE>SGML and HTML
   Explained</CITE> shows how the structure of a memo can be
   thought of as<LIST>
   <ITEM>a From statement
   <ITEM>one or more To statements
   <ITEM>a Date
   <ITEM>an optional subject
   <ITEM>one or more paragraphs of text.</LIST>
   ...

The first two elements of the report (<TITLE> and <AUTHOR>) have been given fairly self-explanatory identifiers. Note that the <TITLE> start-tag has been used in three places. In each place the immediately preceding tag indicates which element the title identifies.

As in the previous example, the start of each paragraph of text is identified by <P>. This time, though, the format of the first paragraph of text is the same as that of subsequent paragraphs. Within reports a paragraph can contain a number of embedded elements (identifying lists, highlighted phrases, index entries, etc.), making the overall structure of the document slightly more complex than was the case for the earlier memo and letter examples.

A level of structure that occurs within a report, but which does not normally occur in a memo or letter, is that of sections (divisions) of text. Such sections are normally readily identifiable because they have headings explaining their purpose. More than one level of such heading may apply, as illustrated in Figure 2.3. The number of different types of headings, and the use to which they are put, varies according to the purpose of the report.

Another point to note about the SGML coding used for the text in Figure 2.3 is the fact that the four elements of the document that are printed in italics have each been coded differently. The advantages of separate coding for the heading, which could end up being set in a different typeface or size, are fairly clear, but the reason for coding the other three italicized phrases differently may not be immediately obvious.

If you look at the wording of the text you can see that the roles of these italicized words differ. The first italicized word is simply an emphasized phrase, which in this case happens to have been printed in italic. (The graphic designer could equally well have asked for such words to appear in small caps, or another face.) The second italicized word appears at first sight to have exactly the same function as its predecessor, but there is one important difference - this word also has to be included in the index. To indicate this, the word has been flagged by tags identifying the start and end of an index term (<INDEX> and </INDEX>) rather than those identifying a particular style of emphasized phrase (<EMPHASIS> and </EMPHASIS>).

The last item of embedded italicized text identifies a book that has been cited in the text. As the publisher may request that such citations be expanded to form a bibliography (which may result in their being replaced by cross-references to the bibliography) yet another pair of tags has been used to identify the italicized phrase as a citation (<CITE> and </CITE>).

It should be noticed that end-tags have been entered for each of the embedded elements. End-tags are compulsory wherever the end of an element occurs at a point which is not immediately followed by the start-tag of another element or the end-tag of the parent element.

Another form of embedded element shown in Figure 2.3 is a list. The start of the list is identified by the <LIST> start-tag, which also marks the point where the program is to output the colon required by the publisher's house style to indicate the start of a list. As with the other embedded items, an end-tag is used to identify the end of a list. Actually, in this example the </LIST> end-tag isn't really necessary because end-tags are only compulsory for lists that occur in the middle of other elements, i.e. where the list is immediately followed by more of the current paragraph's text, or where one list is embedded within another.

Each list is made up of a number of list items, in this case identified by the tag <ITEM>. Each item can, if required, be numbered or otherwise identified on output. In Figure 2.3 the items in the first list have been identified by individual letters, while those in the second list are simply preceded by dashes. This is achieved by the addition of a number=alpha attribute to the first <LIST> tag to specify that items in the list are to be "numbered" alphabetically, whereas the second <LIST> tag contains no attributes, invoking the default style for the item identifier at the start of its embedded items. (The styles applied depend on how the text formatting program has been set up. There are no pre-defined styles associated with SGML.)

If you look carefully at the coded text you will notice that the letters used to identify each item have not been specified by the author - they have been added by the program as the text has been formatted for output. Besides reducing the amount of keying required, this also means that a new item can be added to the list without having to renumber all the entries.
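
As a rough illustration of how a formatter might generate these identifiers, the following Python sketch labels the items of a list on output. The function name label_items, the "a)" label form and the dash default are assumptions made for the example; a real formatter would take its styles from its own configuration.

```python
from string import ascii_lowercase


def label_items(items, number=None):
    """Return one output line per list item, with a generated identifier.

    number="alpha" mirrors the number=alpha attribute on <LIST>; any other
    value uses a dash, which is only an assumed default style.
    """
    if number == "alpha" and len(items) > len(ascii_lowercase):
        raise ValueError("this sketch only letters up to 26 items")
    lines = []
    for index, text in enumerate(items):
        prefix = ascii_lowercase[index] + ")" if number == "alpha" else "-"
        lines.append(f"{prefix} {text}")
    return lines


first_list = [
    "the structure of the document",
    "the characters to be used in the document",
    "entities that are used more than once",
    "externally stored information.",
]
for line in label_items(first_list, number="alpha"):
    print(line)
```

Because the letters are computed from each item's position, adding an item at the start of first_list shifts every later label without any change to the keyed text.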

Where items in a list are referred to elsewhere in the document, renumbering of individual items can prove to be a trap for the unwary. Fortunately SGML has provided a neat solution to this problem. Any element can be assigned an attribute that provides a unique identifier to specific occurrences of that element. If a list item is given such an attribute, reference can be made to the item by entry of a tag containing a special cross-reference attribute at the appropriate point in the text. For example, if the tag for the fourth item in the first list shown in Figure 2.3 is entered as <ITEM id="external"> it could be cross-referred to within the coded text as:

  ... as shown in <ITEM-REF refid="external">, the ...

An advanced text formatting program could check the page and number of the item being referred to, which it could then output in the form:

  ... as shown in item d) on page 24, the ...

Whenever the list is extended, or positioned on another page, the program will automatically adjust the reference when it reaches the cross-reference point.
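
The lookup behind such a cross-reference might be sketched as follows. This is a minimal Python illustration under stated assumptions: the function resolve_item_ref is hypothetical, items are lettered by position, and the page number is taken to be known to the formatter once the list has been placed.

```python
from string import ascii_lowercase


def resolve_item_ref(items, refid, page):
    """Build the output text for a cross-reference to a list item.

    items is a list of (id, text) pairs in document order, with id set to
    None for items that carry no identifier. page is assumed to be supplied
    by the formatter once it knows where the list has been placed.
    """
    for index, (item_id, _text) in enumerate(items):
        if item_id == refid:
            return f"item {ascii_lowercase[index]}) on page {page}"
    raise KeyError(f"no item with id {refid!r}")


items = [
    (None, "the structure of the document"),
    (None, "the characters to be used in the document"),
    (None, "entities that are used more than once"),
    ("external", "externally stored information."),
]
print(resolve_item_ref(items, "external", page=24))
```

If a new item is inserted ahead of the identified one, the same call yields the next letter, because the reference is tied to the identifier rather than to the item's position.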

2.2.4 The structure of scientific papers

Structure of scientific paper

Figure 2.4 The structure of a simple scientific paper

Figure 2.4 illustrates the structure of a simple scientific paper. As with our report, the basic text consists of a title element, an author element, two introductory paragraphs of text, and two levels of heading, each with associated text paragraphs. The fact that the title and author elements extend across both columns of the formatted page does not affect the way in which the text is marked up in SGML. As far as the author is concerned the main structure of this text is exactly the same as for the previous report example.

Of course, a few additional elements are required for scientific papers, such as those used to reserve space for figures and their captions. Because the position of elements such as figures, tables and footnotes can vary according to the page layout, and the position at which they are referenced within the text, such elements are sometimes referred to as floating elements. To simplify the entry of floating elements, SGML allows them to be defined at the point at which they are initially referred to, their final position being determined by the formatting process on output.

One major difference between this scientific paper and the previous report is in the treatment of citations. Though three different citations are made in the first paragraph, none of the details are shown there because, in common with most scientific papers, the citations have been moved to a separate reference bibliography at the end of the paper, where they have been numbered in the sequence they are referred to in the paper. Within the SGML coded text, however, the citation details are still entered at the point of reference, even though they are formatted in a different way from the citations in the report shown in Figure 2.3. Typically the citations for Figure 2.4 would be entered as:

   Recent studies<CITE>Simonian, KO
   <TITLE>Open University Thesis</TITLE> (unpub.
   1975).</CITE><CITE>Bear, LM <TITLE>Geol. Survey Dept.
   Cyprus Mem. 3</TITLE>, 180pp (1960).</CITE><CITE>Panayiotou,
   A <TITLE>Geol. Survey Dept. Cyprus Thesis</TITLE> (unpublished
   thesis, 1977)</CITE> of the ...

Notice that the entered citations have not been given numbers, and no commas or spaces have been entered between them: this is all handled by the text formatter, as is the positioning of the references at the end of the main text.
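
One way a formatter could carry out this numbering is sketched below in Python. The function name number_citations is hypothetical, and the decision to reuse a number when the same citation text recurs is an assumption of the sketch, not something the markup requires.

```python
def number_citations(citations):
    """Number citations in the order they are first referred to.

    Returns (markers, bibliography): markers holds the number to print at
    each point of reference, and bibliography holds the numbered entries to
    be placed at the end of the paper.
    """
    numbers = {}  # citation text -> assigned number, in order of first use
    markers = []
    for cite in citations:
        if cite not in numbers:
            numbers[cite] = len(numbers) + 1
        markers.append(numbers[cite])
    bibliography = [f"{n}. {text}" for text, n in numbers.items()]
    return markers, bibliography


cites = [
    "Simonian, KO Open University Thesis (unpub. 1975).",
    "Bear, LM Geol. Survey Dept. Cyprus Mem. 3, 180pp (1960).",
    "Panayiotou, A Geol. Survey Dept. Cyprus Thesis (unpublished thesis, 1977)",
]
markers, bibliography = number_citations(cites)
print(",".join(str(m) for m in markers))
print("\n".join(bibliography))
```

The separating commas are added when the markers are joined for output, which is why none need to be keyed between the <CITE> elements.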

Another feature of scientific papers that is illustrated by Figure 2.4 is the use of mathematical signs such as the superior 2 used to produce the 90km² sequence at the end of the second paragraph. As far as physical appearance is concerned there is no difference between this and the 2 used in the citation at the start of the previous paragraph, but both a reader and an SGML program recognize this symbol as a character with special significance. Such special characters will be entered either by using special entity references of the form &sup2; or by using a special element to identify the text to be treated in a different way, e.g. <SUP>2</SUP>.

At the end of the second paragraph there is a reference to the identifier assigned to the map through its associated caption. Note that this illustration has not been given a figure number: it has been assigned a map number instead. In most SGML-based systems figure numbering is handled automatically by the program. To be able to assign a map number rather than a figure number the system must be able to identify which figures contain maps. Typically this would be done using markup of the following form:

   ...shown in <FIG-REF idref=cyprus>.
   <FIGURE id=cyprus type=map>
   <ARTWORK file=cyprus.map notation=GIF>
   <CAPTION>Site of Limassol Forest Plutonic Complex
   </FIGURE>

Note that neither the reference to the figure, in the <FIG-REF> element, nor the caption indicates the map number. The map number is obtained by counting how many preceding <FIGURE> elements there are whose type is map. The figure reference is a pointer to the unique identifier, id, of the <FIGURE> element that acts as a container for both the artwork and the associated caption.
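
The counting involved can be sketched in Python as follows. The function name map_number and the identifiers other than cyprus are hypothetical, and the sketch assumes the number assigned is the count of map figures up to and including the one referred to.

```python
def map_number(figures, target_id):
    """Return the map number of the figure whose id is target_id.

    figures is a list of dicts with "id" and "type" keys, in document order.
    Only figures whose type is "map" are counted.
    """
    count = 0
    for figure in figures:
        if figure["type"] == "map":
            count += 1
        if figure["id"] == target_id:
            if figure["type"] != "map":
                raise ValueError(f"{target_id!r} is not a map")
            return count
    raise KeyError(f"no figure with id {target_id!r}")


figures = [
    {"id": "section", "type": "diagram"},  # hypothetical earlier figure
    {"id": "region", "type": "map"},       # hypothetical earlier map
    {"id": "cyprus", "type": "map"},
]
print(f"Map {map_number(figures, 'cyprus')}")
```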

Notice that the caption has been specified after the artwork, though on the formatted page it appears above the map. This is another example of positioning being determined by the text formatting process. For example, there could be a house style rule that figures floated to the top of the page are to have captions at the foot whilst those placed at the foot of a column have their captions above the artwork. SGML only specifies the logical relationship between the artwork and the caption, not their physical relationship.

2.2.5 The structure of books

Structure of simple textbook

Figure 2.5 The structure of a textbook

Figure 2.5 shows how the divisional structure of a simple textbook can be illustrated in the form of a tree diagram. Six basic text divisions have been specified: parts, chapters, sections, subsections, sub-subsections and low-level divisions. Each division must be given a title, which can optionally be followed by some text before the next lower division of text is encountered.

Main textual elements in a textbook

Figure 2.6 Text elements within a textbook

Figure 2.6 shows the structure of a typical set of text elements within a textbook. Within each text section authors can identify:

The last of these options, when combined with the list of special elements, allows quotations and poems to be placed at the start of the text without a preceding <P> paragraph element. For example, a chapter might start:

   <CHAPTER><TITLE id="three">Cold War: Provocation and Prevarication
   <QUOTE>
   <LINE><EMPHASIS>Words to the heat of deeds too cold breath gives</EMPHASIS>
   <SOURCE align=right>Macbeth II i 58</QUOTE>
   <P>The discovery of the Spaniards ...

If quotations are not allowed at the main level a <P> would be needed in front of the <QUOTE> tag identifying the long quotation, though this may not be obvious to the author. (It is for this reason that definition of the structure of complicated documents will normally be left to specially trained document designers, whose job it is to design the element structure to fit the author's perceptions of the format of the text as exactly as possible.)

Figure 2.6 is not exhaustive, as the subelements associated with the lower levels of elements could not be added without clouding the overall structure. Many elements have a similar substructure, consisting of text, emphasized phrases, embedded quotes, cross-references, embedded lists, etc. Fortunately SGML allows users to define a commonly used group of elements as a referenceable parameter entity, whose name can then be used in a number of places within other markup declarations as a shorthand identifier for shared elements.

It can be seen from the above analyses of the structure of documents that different structures are required for different types of documents, but that in many cases the basic textual elements will be shared by most types of document. Authors will not, however, normally be expected to analyse documents: this will be done by specially trained information analysts. An author will be provided with a previously defined document structure, which he may need to modify slightly to meet special requirements. The techniques used to interpret and modify existing document type definitions are described in this book.

2.3 Formatting structured documents

Before going into detail about the use of SGML, a word needs to be said about how the structure of documents is related to the appearance (format) of the final document.

A formatting program needs to know the format required for each element defined in the document type definition. Where an element can be used in more than one context a different set of formatting rules may need to be applied in each context. The techniques employed for this will depend on the type of text formatter being used, and the level to which it has been integrated with SGML.
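
A context-sensitive rule table of this kind might look like the following Python sketch. All of the style names, and the function style_for itself, are hypothetical; the sketch only looks at the immediate parent, whereas a real formatter may need to consider the whole chain of enclosing elements.

```python
# Hypothetical style names keyed by (parent element, element).
CONTEXT_RULES = {
    ("REPORT", "TITLE"): "report-title",
    ("SECTION", "TITLE"): "section-heading",
    ("SUBSECTION", "TITLE"): "subsection-heading",
}

# Fallback styles used when no context-specific rule applies.
DEFAULT_RULES = {"TITLE": "plain-heading", "P": "body-text"}


def style_for(element, parent):
    """Pick a formatting rule for element, given the element that contains it."""
    if (parent, element) in CONTEXT_RULES:
        return CONTEXT_RULES[(parent, element)]
    return DEFAULT_RULES.get(element, "unstyled")


for parent in ("REPORT", "SECTION", "SUBSECTION"):
    print(parent, "->", style_for("TITLE", parent))
```

The same <TITLE> tag therefore receives a different format in each of the three places it is used in the report example, without the author having to choose a differently named tag.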

At the simplest level, formatting will simply involve a replacement of each SGML tag with a set of markup codes relevant to the output device being used. In such cases a replacement string will need to be defined for each tagged element, and for each attribute, or combination of attributes, that can be used to qualify the element.
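
A minimal Python sketch of this kind of tag replacement is shown below. The output codes are invented for an imaginary output device, and the names REPLACEMENTS and replace_tags are hypothetical; note that a tag with an attribute needs its own entry in the table.

```python
import re

# Hypothetical output codes for an imaginary output device.
REPLACEMENTS = {
    "<P>": "\n.para\n",
    "<EMPHASIS>": "{italic}",
    "</EMPHASIS>": "{roman}",
    "<LIST number=alpha>": "\n.list alpha\n",
    "<LIST>": "\n.list dash\n",
}

# Matches a start-tag or end-tag, including any attributes it carries.
TAG = re.compile(r"</?[A-Za-z][^>]*>")


def replace_tags(text):
    """Swap each known tag for its replacement string.

    Tags with no entry in REPLACEMENTS are left in the text unchanged.
    """
    return TAG.sub(lambda m: REPLACEMENTS.get(m.group(0), m.group(0)), text)


print(replace_tags("<P>The <EMPHASIS>structure</EMPHASIS> of a document"))
```

A lookup this simple cannot take account of where a tag occurs, which is why the more capable approaches described next call procedures instead of substituting fixed strings.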

At a slightly higher level, the formatting program may be given the names of one or more procedures (computer program macros) to be carried out when an element or attribute is encountered. These procedures may range in complexity from ones that position the element at the required point to procedures for extracting and numbering index or content list entries.

A fully featured SGML system will be able to use specifications coded using the Document Style Semantics and Specification Language (DSSSL) defined in ISO/IEC standard 10179 to control formatting. As this standard was only published in April 1996, however, there are currently no fully-featured DSSSL formatters, though first generation tools that process a subset of the DSSSL specifications are available. Alternative strategies include the adoption of the rules defined for the US Department of Defense's CALS Formatting Output Specification Instance (FOSI) for the interchange of formatting information.

SGML allows special, application specific, processing instructions to be added to the text to control the output format in those cases where the formatter cannot itself determine the correct formatting rules. This facility should, however, be used with extreme caution as it makes it difficult to move documents between platforms that use different text formatting engines.

2.4 The difference between formatting and structure

Most word processors define the way in which text should be formatted using a mixture of explicit formatting instructions and named styles. A style name identifies a set of formatting instructions that have been predefined using one or more style sheets.

As word processors do not use a formal model to describe documents they cannot use the context in which an element occurs to control its formatting. Word processor users have, therefore, to create different style sheets for each change of presentation. If, for example, paragraphs within an annex are presented in a smaller size than those within the main body of the text, a differently named style would need to be defined for the two types of paragraph.

The lack of a formal model also makes it impossible to confirm that headings are used in the correct sequence on word processors. While word processors often define headings in terms of numbered levels there are no checks that, for example, a level 3 heading occurs before each level 4 heading, and after a level 2 heading.
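
The kind of check that is missing can be sketched in a few lines of Python. The function name heading_sequence_errors is hypothetical, and the rule applied (a heading may be at most one level deeper than the heading before it) is one plausible reading of "correct sequence", assumed here for illustration.

```python
def heading_sequence_errors(levels):
    """Report headings that skip a level.

    levels is the sequence of heading levels in document order, e.g.
    [1, 2, 3, 2]. Returns a list of (position, level) pairs for each heading
    that is more than one level deeper than the heading before it.
    """
    errors = []
    previous = 0  # nothing precedes the first heading, so it must be level 1
    for position, level in enumerate(levels):
        if level > previous + 1:
            errors.append((position, level))
        previous = level
    return errors


print(heading_sequence_errors([1, 2, 3, 4, 2, 3]))  # well-formed sequence
print(heading_sequence_errors([1, 2, 4]))           # level 3 is missing
```

With a formal model of the document this rule does not have to be checked after the event: a structure in which each division can only contain the next lower division makes the incorrect sequence impossible to enter.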

To make it easier to import text from word processors many first-time users of SGML try to mimic the facilities of word processors in their initial DTDs. The most commonly cited example of this is provided by the HyperText Markup Language (HTML), which is described in detail in the latter part of this book.

The first version of HTML was not an application of SGML - it was simply a way of describing the functions of a simple word processor in the form of interchangeable, ASCII-encoded, text. It contained markup tags to identify the start and end of six types of heading (<H1> to <H6>), the end of paragraphs (<P>) and the start and end of italic, bold or fixed width, typewriter-style, text (<I>, <B> and <TT> respectively).

The concept of using SGML to formally describe HTML was introduced with version 2.0 of HTML. By this time it had been recognized that it was better practice to identify the role of elements rather than the way in which they were formatted. To make this possible elements such as emphasis (<EM>), <STRONG> and <CODE> had been introduced to replace the original, format-specific, definitions of italic, bold and typewriter.

Version 3.2 of the HTML DTD saw the introduction of other logically based elements. For example, a new division (<DIV>) element has been introduced so that, at long last, users can indicate which paragraphs and lists belong with which headings. In addition the latest version of HTML re-introduced a number of element control attributes, such as those used to define the way in which lists should be numbered and presented to users.

The development cycle of HTML is typical of the development cycle of many SGML-based systems, and indicates many of the pitfalls that normally occur through the failure to undertake full information analysis at an early stage in the development process. The typical development cycle for the formalization of documentation within a user community can be summarized as follows:

  1. A set of style sheets is developed to ensure that users follow an agreed house style.
  2. Style sheet use is enforced to stop users from adding formatting instructions to override the defaults set by the house style.
  3. Users complain that style sheet information is lost during transformation from one word processor to another (because most word processors interpret the style sheet instructions immediately, and only export the interpreted file).
  4. SGML is adopted to transfer information on style use between word processors - the original element set is based on a set of styles defined for word processors.
  5. Additional SGML elements are defined to allow logical grouping of elements so that partial documents can be interchanged.
  6. The realization that information management is the key to identifying the role each element plays leads to a full analysis of information processing requirements.

2.5 Information management

Information management is the key to good SGML practice. Information management involves both the identification of the role of each information element and the management of the relationship between elements.

For example, in a letter there is a relationship between the name used as the content of the <DEAR> element and the address that heads the letter. The first line of the address is normally a formalized version of the name, formal or informal, used to address the recipient at the start of the letter. Information management involves recording the relationship between these two information elements.

Another aspect of information management is ensuring the validity of data. For example, the logical role and structural position of an element such as <PART-NO> are not sufficient to validate a part number. To validate the entry, the contents of the element must be checked against a parts database. If the part number is associated with a <PARTNAME> element, the entered name must also be checked against the name recorded for that part number, unless the name was generated automatically from the part number.
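The two checks described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions: a dictionary stands in for the parts database, and the part numbers, part names and the function name validate_part are all hypothetical.

```python
# Hypothetical parts database: a dictionary stands in for the real system.
PARTS_DB = {
    "A-1001": "Flange bolt",
    "A-1002": "Lock washer",
}


def validate_part(part_no, part_name=None, parts_db=PARTS_DB):
    """Return a list of error messages; an empty list means the entry is valid.

    `part_no` is the content of a PART-NO element and `part_name` the
    content of an associated PARTNAME element, if one was entered by hand.
    """
    errors = []
    if part_no not in parts_db:
        errors.append(f"unknown part number: {part_no}")
        return errors  # the name cannot be checked without a known part
    if part_name is not None and part_name != parts_db[part_no]:
        errors.append(
            f"name {part_name!r} does not match {parts_db[part_no]!r} for {part_no}"
        )
    return errors


print(validate_part("A-1001", "Flange bolt"))
print(validate_part("A-1002", "Flange bolt"))
print(validate_part("Z-9999"))
```

Passing None for the name models the case where the name is generated automatically from the part number, so that only the number itself needs to be checked.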

The relationships between elements form a vital part of information management. When creating a DTD it is important that information relationships be properly recorded. Typically this will be done by ensuring that related elements are placed within a container element that clearly shows their relationship, or by assigning unique identifiers to key elements and ensuring that related elements make a formal reference to the relevant identifiers.
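The second technique, references to unique identifiers, can be illustrated with a short Python sketch that reports references pointing at identifiers that do not exist. The element names, the attribute names id and refid, and the sample document are hypothetical. The sample uses XML syntax only because the Python standard library has no SGML parser. A validating SGML parser performs this check itself when the attributes are declared as ID and IDREF in the DTD; the sketch merely makes the logic visible.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are illustrative only.
SAMPLE = """
<manual>
  <part id="p1"><partname>Flange bolt</partname></part>
  <part id="p2"><partname>Lock washer</partname></part>
  <para>Fit <partref refid="p1"/> before <partref refid="p3"/>.</para>
</manual>
"""


def unresolved_references(document_text):
    """Return the refid values that do not match any id in the document."""
    root = ET.fromstring(document_text)
    # Collect every identifier assigned anywhere in the document.
    known_ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    # Report each reference whose target identifier was never assigned.
    return [
        el.get("refid")
        for el in root.iter()
        if el.get("refid") is not None and el.get("refid") not in known_ids
    ]


print(unresolved_references(SAMPLE))
```

Here the reference to p3 is reported because no element carries that identifier, while the reference to p1 resolves to the first part element.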

Containerization of related elements is one of the keys to the efficient storage and retrieval of parts of SGML documents. When a set of related elements share a common parent that element can be used to quickly identify and reposition all of its subelements. Most of the advanced uses to which SGML is put are based on this ability to create information containers. Many of the advances introduced to the SGML community in recent years by SGML-based standards, such as the Hypermedia/Time-based Structuring Language (HyTime) defined in ISO/IEC standard 10744, are based on SGML's ability to identify sets of related elements and define the relationships between these information elements.

References

International Organization for Standardization/International Electrotechnical Commission (1996), Information technology - Text and office systems - Document Style Semantics and Specification Language (DSSSL) (ISO/IEC 10179:1996) Geneva: ISO.

International Organization for Standardization/International Electrotechnical Commission (1992), Information technology - Hypermedia/Time-based Structuring Language (HyTime) (ISO/IEC 10744:1992) Geneva: ISO.