
© Martin Bryan 1997 from SGML and HTML Explained published by Addison Wesley Longman


Chapter 10
Multiple Document Structures (SUBDOC, CONCUR and LINK)

This chapter explains some of the optional SGML features that are provided by a few advanced SGML tools.

10.1 Types of multiple document structures

More than one document structure may be required to cope with the varying roles of the data stored in an SGML document. Sometimes the different document structures are used individually; at other times concurrent multiple roles can be allocated to a single piece of text. Three different techniques for relating document structures are provided in SGML:

  1. Embedding files coded using an alternative document structure (SUBDOCuments).
  2. Marking up documents using multiple CONCURrent DTDs.
  3. Creating automatically processable LINKs between the structures declared in different DTDs.

Where a document is made up from a number of previously prepared subsections, each of which may have its own document structure, the individual subsections can be created as externally stored subdocuments of the main document. Subdocuments must be declared as external entities in the document type declaration used by the calling document. At the appropriate point in the text an entity reference is used to call the previously declared subdocument into the main document.

Concurrent document structures can be used, for example, to distinguish between the different purposes to which data may be put (e.g. for book production, CD-ROM delivery or on-line database retrieval), or to identify structures generated during the processing of a document (e.g. the layout structure of a formatted book). Each concurrent structure used within the document must be declared in the document's prolog using a separate document type declaration.

Note: The concept of concurrent structures is particularly important to the Association for Computers and the Humanities' Text Encoding Initiative (TEI), where it is used for recording the way in which particular editions of a work have been paginated or edited.

Where one structure can be created automatically from another, the parser can be instructed to generate the alternative structure using SGML's explicit link (EXPLICIT) feature. Simpler controls on processing are provided by the simple link (SIMPLE) option, which controls the processing of the whole document, and the implicit link (IMPLICIT) option, which controls the processing of individual elements.

Note: Links can only be associated with a base document type - never with concurrent document types. Explicit links cannot be used at the same time as concurrent markup structures because concurrent document types cannot be activated at the same time as link types are active.

10.2 SGML subdocuments

SGML subdocuments are self-contained, externally stored, entities which consist of a document type declaration followed by text marked up using the entities, elements and attributes defined in the local declaration.
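
As a minimal sketch of this arrangement, the contents of a subdocument entity might look like the following. The document type name, the element names and the text are hypothetical, and the example assumes that the SGML declaration in force permits end-tag omission:

   <!DOCTYPE notice [
    <!ELEMENT notice - - (para+)>
    <!ELEMENT para   - O (#PCDATA)>
   ]>
   <notice>
   <para>Text of the first paragraph of the notice.
   <para>Text of the second paragraph of the notice.
   </notice>

Only the declarations in this local document type declaration apply to the text that follows them.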

SGML subdocuments are particularly useful where a document contains special sections of data that are, typically, produced from different sources. If, for example, complex tables are to be generated from a spreadsheet package, the DTD used to validate tables produced by the package can be transmitted with the table when it is imported into another document, avoiding the need to ensure that the table structure in the receiving document matches that of all the programs supplying it with information.

Before preparing text for use as a subdocument of another document it is important to ensure that the same SGML declaration will be used for each subdocument in the overall document. It is the SGML declaration of the main document that applies to an SGML subdocument, as the subdocument may not contain its own SGML declaration. If a local SGML declaration has been added to the subdocument while it is being prepared, it will need to be removed before the file can be used as a subdocument within another document.

Web SGML Adaptations Extension
When the Web SGML adaptations provided by Annex K to ISO 8879 are available each subdocument can have its own SGML declaration. If the SGML declaration is omitted, the subdocument uses the SGML declaration applicable to the entity from which the subdocument entity is referenced.

If subdocuments are to be used the SUBDOC NO entry in the OTHER section of the FEATURES clause of the SGML declaration must be changed to SUBDOC YES n, where n indicates the maximum number of subdocuments that will be open at any point in the document.
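
For illustration, the FEATURES clause of a declaration that enables subdocuments, but none of the other optional features discussed in this chapter, might read as follows. The value 2 is an arbitrary choice, allowing two subdocuments to be open at once:

   FEATURES MINIMIZE DATATAG NO   OMITTAG  YES   RANK     NO SHORTTAG YES
            LINK     SIMPLE  NO   IMPLICIT  NO   EXPLICIT NO
            OTHER    CONCUR  NO   SUBDOC   YES 2 FORMAL   NO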

Once a subdocument file has been prepared it must be declared as an external entity before it can be called. If the subdocument is stored locally it can be declared as a system-specific entity by entry of an entity declaration such as:

   <!ENTITY paper1 SYSTEM "c:\pub\captured.txt" SUBDOC>

If the system can automatically recognize the name of the file from the entity's name, the optional system identifier (e.g. "c:\pub\captured.txt") can be omitted to give an entity declaration such as:

   <!ENTITY captured SYSTEM SUBDOC>

Where the subdocument's contents are already known to all systems likely to receive the document, the entity can be publicly declared by entry of a declaration such as:

   <!ENTITY copyrite PUBLIC "-//OPOCE//DOCUMENT Copyright notice//EN" SUBDOC >

It should be noted that, even though they contain markup declarations, SGML subdocuments are defined as general entities rather than parameter entities. This is because the entity reference for the subdocument must occur at the appropriate point within the text of the document instance, rather than within the document prolog.

A previously defined subdocument should be called by entering an entity reference, such as &copyrite;, at the appropriate point in the text. Before requesting a subdocument from the system, an SGML parser will record the current state of the processor for recall after the subdocument has been processed.

A typical document instance referencing stored subdocuments might look like this:

   <!DOCTYPE SUP PUBLIC "-//OPOCE//DTD OJ Supplement//EN" [
    <!ENTITY rec94372 SYSTEM "rec94372.cat" SUBDOC>
    <!ENTITY txt94372 SYSTEM "al94-372.enc" SUBDOC>
   ]>
   <SUP><RECORD ID="FXAL94372ENC">&rec94372;&txt94372;</sup>

Within a subdocument only locally defined markup declarations apply. This means that you cannot cross-refer to an identifier declared in the main document from within a subdocument, or vice versa.

Note: ISO/IEC 10744, the Hypermedia/Time-based Structuring Language (HyTime), shows how SGML can be extended to allow references in one document to reference identifiers in another document, which could be an embedded subdocument.

10.3 Concurrent document structures

Where two or more "views" of a document's contents can exist concurrently, more than one document type declaration can be specified at the start of a document. For example, if data is to be stored in a controlled document database and also displayed on the World Wide Web, it may be necessary to indicate two different roles for a piece of text in its markup, as the following (somewhat simplified) example shows:

    <(TEI.2)TEI.2><(HTML)HTML>
    <!--TEI header elements omitted here for simplicity-->   
    <(TEI.2)BIBL><(TEI.2)MONOGR><(HTML)HEAD>
    <(TEI.2)AUTHOR><(HTML)TITLE>Shirley, James</(HTML)TITLE></(TEI.2)AUTHOR>
    </(HTML)HEAD><(HTML)BODY>
    <(TEI.2)TITLE type=main><(HTML)H2>
    The Gentlemen of Venice
    </(HTML)H2></(TEI.2)TITLE>
    <(TEI.2)TITLE type=subordinate><(HTML)H3>
    A tragi-comedie presented at the private house in Salisbury Court
    by Her Majesties servants
    </(HTML)H3></(TEI.2)TITLE>
    <(TEI.2)IMPRINT><(HTML)ADDRESS>
    <(TEI.2)PUBLISHER>H. Moseley</(TEI.2)PUBLISHER><(HTML)BR>
    <(TEI.2)PUBPLACE>London</(TEI.2)PUBPLACE></(HTML)ADDRESS>
    <(TEI.2)DATE><(HTML)P><(HTML)STRONG>
    1655
    </(HTML)STRONG></(HTML)P></(TEI.2)DATE></(TEI.2)IMPRINT>
    <(TEI.2)EXTENT><(HTML)P>78pp</(HTML)P></(TEI.2)EXTENT>
    </(TEI.2)MONOGR></(TEI.2)BIBL></(HTML)BODY>
    </(TEI.2)TEI.2></(HTML)HTML>

Notice how the names of the elements have been qualified by a bracketed document type specification to indicate which of the DTDs declared in the document prolog they are associated with. In this case two well known industry standard DTDs have been used, that of the Text Encoding Initiative (TEI) and that of the HyperText Markup Language (HTML) used on the World Wide Web.

Note: For simplicity's sake I have omitted the TEI header elements that should precede the start of the bibliographic entry (<BIBL>) information. The reasons for this omission will be explained shortly.

It is important to note that there is not, in this example, a one-to-one correspondence between the occurrences of elements within the two DTDs. For example, within the address element of HTML there is no equivalent of a line element. Instead the HTML linebreak (<BR>) empty element is associated with the end-tag of the TEI publisher element.

More than one element may sometimes need to be used in one structure to obtain the correct result in another structure. For example, to print the TEI date in a bolder typeface and on a separate line within HTML it is necessary to associate the <(TEI.2)DATE> element with two HTML elements, <(HTML)P> and <(HTML)STRONG>.

Normally such concurrent document structures will not be entered directly by authors but will be created through automatic processes. The concurrent structures produced by these processes will be used to tell the system how it should present data to different processes without having to maintain a separate copy of the file for each process.

Concurrent document structures can also be used to record intermediate stages in a process, or the final state of a file when processing has been completed. For example, in a typical publishing application a document will pass through a number of production stages to produce galley proofs, paginated text, imposed sheets, etc. Traditionally each of these stages has resulted in an output file which is coded to suit the use to which it will be put. The problem with this is that, if changes are required to the text, more than one version of the file may have to be updated. To avoid having to create different files at each stage in the production process SGML allows the details required for each structure to be stored in the same file. SGML marked sections can then be used to identify text that is specific to a particular version of the file, other structures defining the relevant parameter entities as IGNOREd text.

When the FEATURES clause of the SGML declaration has been altered to contain a CONCUR YES n entry, where n indicates the maximum number of document structures that can be used concurrently, DTDs can be declared in the document's prolog in addition to that of the base document type (which is always the first DTD declared in the prolog). Each document structure must be declared by entry of a document type declaration.

Typically a document with two concurrent document structures might start:

   <!SGML    "ISO 8879:1986"
     BASESET "ISO 646:1983//CHARSET
              International Reference Version (IRV)//ESC 2/5 4/0"
     .
     .
     .
     FEATURES MINIMIZE DATATAG NO   OMITTAG  YES   RANK     NO SHORTTAG YES
              LINK     SIMPLE  NO   IMPLICIT  NO   EXPLICIT NO
              OTHER    CONCUR YES 2 SUBDOC   YES 2 FORMAL   NO
     APPINFO NONE
   >
   <!DOCTYPE TEI.2 SYSTEM "tei2.dtd"
    [<!ENTITY % TEIonly  "INCLUDE">
     <!ENTITY % HTMLonly "IGNORE" >]>
   <!DOCTYPE  HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"
    [<!ENTITY % TEIonly  "IGNORE" >
     <!ENTITY % HTMLonly "INCLUDE">]>

The first document type declaration in the prolog above is that for the TEI P3 DTD. As the locally stored file identified in this declaration is the first DTD in the prolog, it defines the base document type for the document, whose name the SGML parser automatically associates with any tag that does not have a document type specified.

When the document is processed to create the equivalent HTML file the second structure, which is aimed at presentation of the information on a screen, can be added to the TEI logical structure used to capture and store the data.

The above example also illustrates how parameter entity definitions can be added to the document type declaration subset to allow users and applications to identify data that is specific to a particular structure. The definitions given above could be used to extend the earlier example as follows:

    <(TEI.2)TEI.2><(HTML)HTML>
    <![ %(HTML)TEIonly; [
     <!--TEI header elements-->
     <(TEI.2)TEIHEADER> ... </(TEI.2)TEIHEADER>
    ]]>   
    <(TEI.2)BIBL><(TEI.2)MONOGR>
    <![ %(TEI.2)HTMLonly; [
     <(HTML)HEAD><(HTML)TITLE>TEI Monograph Catalogue</(HTML)TITLE>
     <(HTML)BASE href="http://www.u-net.com/~sgml/TEI">
     <(HTML)LINK rel=translation title="Fran&ccedil;ais" href="Bibl-FR.htm">
     </(HTML)HEAD>
    ]]>
    <(HTML)BODY>
    <(TEI.2)AUTHOR><(HTML)H1>Shirley, James</(HTML)H1></(TEI.2)AUTHOR>
    <(TEI.2)TITLE type=main><(HTML)H2>
    The Gentlemen of Venice
    </(HTML)H2></(TEI.2)TITLE>
    <(TEI.2)TITLE type=subordinate><(HTML)H3>
    A tragi-comedie presented at the private house in Salisbury Court
    by Her Majesties servants
    </(HTML)H3></(TEI.2)TITLE>
    <(TEI.2)IMPRINT><(HTML)ADDRESS>
    <(TEI.2)PUBLISHER>H. Moseley</(TEI.2)PUBLISHER><(HTML)BR>
    <(TEI.2)PUBPLACE>London</(TEI.2)PUBPLACE></(HTML)ADDRESS>
    <(TEI.2)DATE><(HTML)P><(HTML)STRONG>
    1655
    </(HTML)STRONG></(HTML)P></(TEI.2)DATE>
    </(TEI.2)IMPRINT><(TEI.2)EXTENT><(HTML)P>78pp</(HTML)P></(TEI.2)EXTENT>
    </(TEI.2)MONOGR></(TEI.2)BIBL>
    <![ %(TEI.2)HTMLonly; [
     <(HTML)HR>
     <(HTML)P>Webmaster: 
     <(HTML)A href="mailto:webmaster@our-site.com">
     webmaster@our-site.com</(HTML)a>
     </(HTML)P>
    ]]>
    </(HTML)BODY>
    </(TEI.2)TEI.2></(HTML)HTML>

The indented material belongs to only one of the structures. The marked section delimiters will ensure that any enclosed text will not be reproduced as part of the structure it does not belong to.

Marked sections provide a useful method of overcoming one of the major problems with using concurrent structures. Where data occurs in the document instance it must be valid in all structures at the point entered. To understand the types of problems that can occur because of this rule you should note the difference between the example given above and the earlier example which did not use marked sections. In the initial example the TEI <AUTHOR> element is associated with the HTML <TITLE> element, which forms part of the header of the HTML document, rather than part of the main body text. Once marked sections are used it becomes possible to provide a separate HTML title (which is only used as a window title) and move the author details into the text body.

Note: If the latter version of the <TITLE> element had been used in the first example an error would be reported because the <MONOGR> element cannot contain parsed character data. (Its model is purely element content.)

Not all SGML features can be used within concurrent document structures. In particular, those SGML features that may only be used within the base document type, such as empty start-tags and net-enabling start-tags, cannot be used within concurrent document structures. Similarly, care must be taken to ensure that notation names associated with data entities or attribute lists are declared in each of the DTDs they are associated with.

10.4 Linking document structures

SGML uses five types of declarations to link concurrent document structures:

Link type declarations are similar in structure to document type declarations. They must be entered in the document prolog after the DTDs they relate to. Like other markup declarations, link type declarations begin with the markup declaration open (MDO) delimiter followed, without any intervening spaces, by a reserved name, LINKTYPE by default. This is followed by a link type name that uniquely identifies the link process definition. As well as being different from that of any other link type declaration in the same prolog, this name must also be different from the names used for document type definitions in the same prolog.

Link type declarations may be separated from DTDs, and other link type declarations, by comment declarations, spaces, record start and end codes, valid separator characters or processing instructions, which are collectively referred to within ISO 8879 as other prolog.

Except for link set use declarations, which are used in a way similar to short reference maps, link related declarations must either be embedded within a link type declaration subset within the link type declaration, or stored in a separate file that is referenced as all or part of the link type declaration subset. The mechanism used is similar to that used for document type definitions (see Chapter 11), with any definitions called from a separate file being read after any local definitions encountered between a matched pair of declaration subset open (DSO) and declaration subset close (DSC) delimiters.

Three types of link are recognized by SGML: simple, implicit and explicit.

The types of links that can be used within a document are controlled by the LINK entries in the FEATURES clause of the SGML declaration. In the reference concrete syntax the LINK features are disabled by entry of the following line:

   LINK     SIMPLE NO     IMPLICIT NO    EXPLICIT NO

If simple links are required in a document the first entry in this line must be changed to SIMPLE YES followed by a number indicating the maximum number of simple links to be used in the document. If implicit links are to be used the second entry changes to IMPLICIT YES (without a qualifying number). Where explicit links are required the maximum number of links to be used within a single chain in the document must be stated after the entry EXPLICIT YES, giving a composite entry of the form:

   LINK     SIMPLE YES 4  IMPLICIT NO    EXPLICIT YES 2

If EXPLICIT YES is specified, multiple DTDs can be declared in the prolog. The number of explicit links allowed in the FEATURES clause must be, at least, one less than the number of document type declarations in the longest chain of linked documents declared in the prolog.
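
To illustrate this rule, consider a sketch of a prolog in which three document types are chained together by two explicit links. All of the document type names, link type names and file names used here are hypothetical:

   <!DOCTYPE a SYSTEM "a.dtd">
   <!DOCTYPE b SYSTEM "b.dtd">
   <!DOCTYPE c SYSTEM "c.dtd">
   <!LINKTYPE a2b a b SYSTEM "a2b.lpd">
   <!LINKTYPE b2c b c SYSTEM "b2c.lpd">

As the longest chain (a to b to c) contains three document type declarations, the FEATURES clause would need to allow for at least two explicit links:

   LINK     SIMPLE NO     IMPLICIT NO    EXPLICIT YES 2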

10.4.1 Simple links

A simple link specification takes the form:

   <!LINKTYPE proof #SIMPLE #IMPLIED
    [<!ATTLIST book style CDATA #FIXED "300dpi.prn">]>

This declaration tells the system to "use the style sheet that generates 300 dot per inch proofs of books when this linktype declaration is active".

The link type name (proof) that follows the LINKTYPE declaration type keyword is followed by two, compulsory, keywords (#SIMPLE and #IMPLIED) to indicate the type of link being specified and the type of result expected.

The link type declaration subset that follows the specification, between the square brackets, contains a single attribute definition list declaration that defines one or more fixed attributes that are to be assigned to the base document type element (the first one defined in the prolog).

Note: When used in link type declaration subsets, the attribute's declared value cannot be defined by use of the ID, IDREF, IDREFS or NOTATION keywords, and #CURRENT and #CONREF cannot be used for the default value. (These restrictions apply because link attributes cannot be used within an element's start-tag. In each of the above cases users would need to specify the applicable attribute values or supply the contents the attributes are to apply to.)

More than one simple link can be specified in a prolog. For example, the following link types could be associated with the TEI DTD:

   <!LINKTYPE print #SIMPLE #IMPLIED
    [<!ATTLIST tei.2 style CDATA #FIXED "postscript">]>
   <!LINKTYPE load-dbs #SIMPLE #IMPLIED
    [<!ATTLIST tei.2 dbs-name CDATA #FIXED "catalogue">]>

to specify the way the TEI.2 document should be processed before being sent for printing or loading into a bibliographic catalogue. Note, however, that a simple link could not be associated with the HTML DTD shown in the concurrent document examples above as only the base document type can be linked to the implied structure.

As with document type declarations, the declaration subset can be stored in an external file which can be referenced using system or public identifiers. For example, the two entries shown above could be shortened to:

   <!LINKTYPE print #SIMPLE #IMPLIED SYSTEM "print.lpd">
   <!LINKTYPE load-dbs #SIMPLE #IMPLIED 
     PUBLIC "-//our-firm//LPD Database loading link process definition//EN">

Note particularly the use of the LPD public class name in the formal public identifier to indicate that the file to be referenced contains a link type declaration subset.

The way a link is activated depends on the application. For conformance testing purposes an SGML parser should be able to use a processing instruction of the following form to activate both of the link type definitions given above:

   <?rast-active-lpd: print load-dbs>

10.4.2 Implicit links

Implicit link specifications can be used to associate processing attributes with any element in any DTD. When the LINK entries in the FEATURES clause of the SGML declaration contain the entry IMPLICIT YES, the link type name can be followed by the name of a document type whose document type declaration precedes it in the prolog, and the word #IMPLIED. This tells the system that this link type declaration, when activated as detailed above, will add link attributes to elements in the named DTD.

The following example shows how some printing properties could be associated with elements making up a TEI bibliographic entry for a monograph:

   <!LINKTYPE print tei.2 #IMPLIED
   [<!ENTITY % bibl  "(author|title|publisher|pubPlace|date|extent)" >
    <!ATTLIST %bibl; align    (start|end|centred|justified)  start
                     family   CDATA      "Times Roman"
                     weight   NAME       medium
                     posture  NAME       upright
                     size     NUTOKEN    12pt
                     measure  NMTOKEN    36pi
                     l-indent NUTOKEN    0
                     r-indent NUTOKEN    0  
                     attcond  CDATA      #IMPLIED                      >
    <!LINK #INITIAL  
           author       [centred weight=bold size=18pt]
           title        [attcond="type=main" centred size=24pt]
           title        [attcond="type=subordinate" centred size=16pt]
           publisher    [weight=bold]
           date         [family="Arial" posture=italic]
           extent       [end family="Arial"]                            >
   ]>

The parameter entity defined at the start of the link type declaration subset identifies all the elements in a monograph's bibliographic entry that have text associated with them. The attribute definition list declaration then associates 9 attributes with each of these elements, and assigns default values to 8 of the 9 attributes.

Each implicit link type declaration must have at least one link set declaration whose associated name is a special reserved name, #INITIAL. This identifies the start point for the link process. Like other markup declarations the link set declaration begins with a markup declaration open (MDO) delimiter followed, without intervening spaces, by the reserved name identifying the type of declaration, LINK. The name assigned to the link set must follow this reserved name, separated from it and the subsequent link rules by one or more spaces or other separator characters. The link set declaration ends when the next markup declaration close (MDC) delimiter is encountered.

For implicit links the link rules take the form of one or more source element specifications each of which consists of the name of an associated element type, which must be that of an element defined in the DTD identified by the link type declaration, and a link attribute specification. As in other SGML declarations, the associated element type specification can be either a single element type name or a bracketed name group. The link attribute specification consists of an attribute specification list, as found within an element's start-tag, bracketed by the current declaration subset open (DSO) and declaration subset close (DSC) delimiters.

Notice that there are two declarations for the title element. Multiple entries are permitted where the selection of an appropriate option can be determined using some application-specific rule. In this example an attribute condition (attcond) attribute has been defined to check the current value of the type attribute. If the value of the type attribute is main the title will be set in 24pt Times Roman centred on the 36 pica (6 inch) default measure. If the value is subordinate the title will be set, centred on the measure, in 16pt Times Roman.

Notice also that there is no entry for the pubPlace element. This is because this element should be set using the default settings for the attributes, which specify that the text should be set in 12pt Times Roman, using a medium weight and upright posture, so that the start of the place of publication is aligned with the start of the 36 pica measure, to which no indents are to be applied.
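
As noted above, the associated element type in a link rule may be a bracketed name group rather than a single element type name. The following sketch, which is illustrative only and is not part of the print link type shown earlier, shows how each rule of an #INITIAL link set could cover a pair of elements, assuming the link attributes declared for that link type:

   <!LINK #INITIAL
          (publisher|pubPlace)  [weight=bold]
          (date|extent)         [family="Arial"]   >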

Where style sheets are used to record the details of the parameters to be associated with each element, implicit links provide a natural route for linking elements to style sheets. As the following example shows, the name assigned to the style sheet specification (e.g. h1) is the only attribute needed to control the link process:

   <!LINKTYPE format tei.2 #IMPLIED
   [<!ENTITY  % bibl  "(author|title|publisher|pubPlace|date|extent)" >
    <!ATTLIST %bibl; style     NAME      "normal"
                     attcond   CDATA     #IMPLIED     >
    <!LINK #INITIAL
           author       [style=h1]
           title        [attcond="type=main" style=h2]
           title        [attcond="type=subordinate" style=h3]
           publisher    [style=p]
           date         [style=date]
           extent       [style=extent]                            >
   ]>

In this case the style sheet name, which is passed to the text formatting routines, activates a predefined set of formatting instructions which control the appearance of the element's text.

As explained above, the link type declaration subset can be stored in a separate, easily reusable, file that can be associated with many document instances. A typical implementation might add the following prolog to the TEI example document shown above:

   <!DOCTYPE TEI.2 SYSTEM "tei2.dtd">
   <!LINKTYPE format tei.2 #IMPLIED
     PUBLIC "-//our-firm//LPD Styles for TEI Bibliography//EN">

10.4.3 ID-specific links

Sometimes you need to associate a specific set of processing rules with a particular occurrence of an element. If an element has been assigned a unique identifier, an ID link set declaration can be used to assign link attributes to elements with relevant IDs.

When the reference concrete syntax is being used ID link set declarations are defined, within the link type declaration subset, in a declaration that begins <!IDLINK and ends with >. Between these markup delimiters there must be one or more entries consisting of:

The following example shows how an ID link set declaration could be used to provide overrides for specific instances of a publisher name:

   <!IDLINK
    isea     publisher [family="Avant Garde"]
    OPOCE    publisher [family="Helvetica"]
    sgml-cen publisher [weight=bold posture=italic] >

If the publisher element in the TEI bibliographic entry used above had been:

<(TEI.2)PUBLISHER id=isea>isea sa</(TEI.2)PUBLISHER>

the IDLINK definition would ensure that the company name would be printed using the house style for that company, which requires that the name be set in 12pt Avant Garde.

Note: Only one ID link set declaration may be specified in each link type declaration.
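
Because ID link set declarations are defined within the link type declaration subset, a complete link type declaration containing one might be sketched as follows. This is a cut-down, illustrative variant of the print link type shown earlier, declaring only the two link attributes needed for the publisher element:

   <!LINKTYPE print tei.2 #IMPLIED
   [<!ATTLIST publisher family  CDATA  "Times Roman"
                        weight  NAME   medium        >
    <!LINK #INITIAL
           publisher [weight=bold]                   >
    <!IDLINK
     isea     publisher [family="Avant Garde"]
     OPOCE    publisher [family="Helvetica"]         >
   ]>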

10.4.4 Explicit links

Explicit links are used to link elements in a source document structure to elements in a result document structure. The explicit link specification that follows the LINKTYPE keyword and the link type name consists of the names of two DTDs defined earlier in the prolog. The subsequent link type declaration subset, between the square brackets, can contain:

The following prolog could be used to link a TEI bibliographic entry for a monograph to an HTML document structure:

   <!DOCTYPE TEI.2 SYSTEM "tei2.dtd">
   <!DOCTYPE  HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
   <!LINKTYPE CreateCD TEI.2 HTML 
    [<!ATTLIST title attcond CDATA #IMPLIED>
     <!LINK #INITIAL  author                                  title
                      title     [attcond="type=main"]         h2 
                      title     [attcond="type=subordinate"]  h3
                      imprint                                 address
                      pubPlace                                br
                      date                                    strong
                      extent                                  p         >
    ]>

For explicit links the link set name (e.g. #INITIAL) is followed by matched pairs of element type names, optionally qualified by attribute specifications, that form an explicit link rule. The first element type name and attribute specification pair is known as the source element specification. This starts with the name(s) of one or more of the elements defined in the first DTD named in the link type declaration. The element type name can be qualified by one or more attributes forming a link attribute specification, as can be seen in the entries for the two variants of the TEI <TITLE> element.

The source element specification is followed by a result element specification, which also starts with an element type name. This second name must be one of the element type names defined in the second DTD named in the link type declaration. This element type name can, optionally, be followed by a result attribute specification showing the attributes to be associated with the selected element in the result document. These attributes must be ones whose attribute definition list declaration has been declared as part of the second of the DTDs named in the link type declaration.
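
A result attribute specification is written in the same bracketed form as a link attribute specification. As a sketch only, a single link rule that maps the TEI publisher element to an HTML anchor, supplying a value for the anchor's href attribute, might read as follows. This rule does not form part of the CreateCD link type, and the file name used is hypothetical:

   <!LINK #INITIAL
          publisher      a [href="publishr.htm"]      >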

The following document instance could be processed using the prolog shown above:

    <TEI.2>
    <!--TEI header elements omitted here for simplicity-->   
    <BIBL><MONOGR>
    <AUTHOR>Shirley, James</AUTHOR>
    <TITLE type=main>
    The Gentlemen of Venice
    </TITLE>
    <TITLE type=subordinate>
    A tragi-comedie presented at the private house in Salisbury Court
    by Her Majesties servants
    </TITLE>
    <IMPRINT>
    <PUBLISHER>H. Moseley</PUBLISHER>
    <PUBPLACE>London</PUBPLACE>
    <DATE>1655</DATE>
    </IMPRINT><EXTENT>78pp</EXTENT>
    </MONOGR></BIBL></TEI.2>

The HTML document structure produced as a result of processing the link process definition will be:

<TITLE>
Shirley, James
</TITLE><H2>
The Gentlemen of Venice
</H2><H3>
A tragi-comedie presented at the private house in Salisbury Court
by Her Majesties servants
</H3><ADDRESS>
H. Moseley
<BR>
London
<STRONG>
1655
</STRONG></ADDRESS><P>
78pp
</P>

This is not a complete HTML document, but it will be accepted by most HTML-based programs as the only elements missing from it are ones that are declared omissible in the HTML DTD. When the output of the link process is parsed against this DTD it will generate a file of the form:

<HTML VERSION="-//IETF//DTD HTML 2.0//EN">
<HEAD><TITLE>
Shirley, James
</TITLE></HEAD><BODY><H2>
The Gentlemen of Venice
</H2><H3>
A tragi-comedie presented at the private house in Salisbury Court
by Her Majesties servants
</H3><ADDRESS>
H. Moseley
<BR>
London
<STRONG>
1655
</STRONG></ADDRESS><P>
78pp
</P></BODY></HTML>

Note that the parser has determined from the HTML DTD that the <TITLE> element should be placed in the <HEAD> section of the HTML document instance, and that all the other elements should be placed in the <BODY> section. It has also added the default values to those attributes that were assigned one in the DTD.

If you compare this document structure with that given as an example of concurrent markup earlier in the chapter you will find that they are not identical. Using SGML links it is not possible to position the date outside of the <ADDRESS> result element generated by the <IMPRINT> source element, or to generate more than one start-tag in the result structure when a source element is encountered. (To get the structures to be the same both a <P> and a <STRONG> start-tag would have needed to be generated in response to the <DATE> start-tag.)

If you compare the document structure with the example showing the use of both concurrent markup and marked sections you will find that the TEI <AUTHOR> element has been linked to the HTML <TITLE> element rather than to the <H1> element. Without this link the compulsory <TITLE> element would be missing from the header, and an error would be reported when the HTML file was parsed.
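
The way explicit link rules pair source and result elements in this example can be modelled in a few lines of Python. This is only an illustrative sketch, not a description of how an SGML parser implements the LINK feature. The rule table is inferred from the result structure shown above rather than copied from the prolog, and the names RULES and link are hypothetical. Each rule yields at most one result element, which is the limitation noted above.

```python
# Hypothetical model of explicit link rules: each rule pairs a source
# element specification (name plus optional attribute condition) with
# a single result element type name.
RULES = [
    # (source element, required attribute values, result element)
    ("AUTHOR",   {},                      "TITLE"),
    ("TITLE",    {"type": "main"},        "H2"),
    ("TITLE",    {"type": "subordinate"}, "H3"),
    ("IMPRINT",  {},                      "ADDRESS"),
    ("PUBPLACE", {},                      "BR"),
    ("DATE",     {},                      "STRONG"),
    ("EXTENT",   {},                      "P"),
]


def link(name, attrs=None):
    """Return the result element for a source element, or None if unlinked."""
    attrs = attrs or {}
    for source, required, result in RULES:
        if source == name and all(attrs.get(k) == v for k, v in required.items()):
            return result
    return None


for name, attrs in [("AUTHOR", {}),
                    ("TITLE", {"type": "main"}),
                    ("TITLE", {"type": "subordinate"}),
                    ("PUBLISHER", {})]:
    print(name, attrs, "->", link(name, attrs))
```

In this model an element with no matching rule, such as PUBLISHER, contributes only its text to the result.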

10.4.5 Using alternative link sets

Sometimes problems such as those illustrated by the last example can be overcome by using one of the two mechanisms provided by SGML for switching link sets:

USELINK

The #USELINK option is particularly useful for differentiating between the ways in which an element can be processed in different contexts. For example, one of the DTDs used by the Office for Official Publications of the European Communities (OPOCE) contains the following specification for a paragraph:

   <!ELEMENT (p|elem)     - -  (#PCDATA|list)+ >
   <!ELEMENT list         - O  (elem)+ >

Notice that this definition is recursive: lists are made up of elements (elem) that can themselves contain lists.

The following, made-up, example shows how a complex paragraph could be marked up using these elements. The marked-up text, which contains four levels of nested lists, has been indented to make it clear which level each component is nested to.

   <P>The budget for 1996 will be distributed as follows:
      <LIST>
         <ELEM>Payments to Directorates
         <LIST>
            <ELEM>DGI
            <LIST>
               <ELEM>Division: A
               <LIST>
                  <ELEM>Brussels:    12 Mecus</ELEM>
                  <ELEM>Luxembourg:   8 Mecus</ELEM></LIST>
               <ELEM>Division: B
               <LIST>
                  <ELEM>Brussels:    10 Mecus</ELEM>
                  <ELEM>Luxembourg:   6 Mecus</ELEM></LIST>
               <ELEM>Division: C
               <LIST>
                  <ELEM>Brussels:     9 Mecus</ELEM>
                  <ELEM>Luxembourg:  12 Mecus</ELEM></LIST></LIST>
            <ELEM>DGII
            <LIST>
               <ELEM>Division: D
               <LIST>
                  <ELEM>Brussels:    21 Mecus</ELEM>
                  <ELEM>Luxembourg:  18 Mecus</ELEM></LIST>
               <ELEM>Division: E
               <LIST>
                  <ELEM>Brussels:     5 Mecus</ELEM>
                  <ELEM>Luxembourg:   2 Mecus</ELEM></LIST>
               <ELEM>Division: F
               <LIST>
                  <ELEM>Brussels:    19 Mecus</ELEM>
                  <ELEM>Luxembourg:   2 Mecus</ELEM></LIST></LIST></LIST>
         <ELEM>Payments to Member States
         <LIST>
            <ELEM>Greece:     25 Mecus</ELEM>
            <ELEM>Austria:    12 Mecus</ELEM>
            <ELEM>Finland:     8 Mecus</ELEM></LIST>
         <ELEM>Payments to Other Bodies
         <LIST>
            <ELEM>OPOCE:      10 Mecus</ELEM></LIST>
   </P>

It could be decided that, when distributed over the World Wide Web, the first two levels of list should be numbered, but subsequent levels should be bulleted. The following (simplified) example of an explicit link shows how the #USELINK option can be used to enforce the correct nesting and numbering of lists:

   <!DOCTYPE blk0 SYSTEM "cat2.dtd">
   <!DOCTYPE  HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
   <!LINKTYPE to-HTML blk0 HTML [
    <!LINK #INITIAL p                      p
                    list #USELINK level-1  ol                 >
    <!LINK level-1  elem                   li
                    list #USELINK level-2  ol [type=a]   >
    <!LINK level-2  elem                   li
                    list                   ul [type=disc]     >
   ]>

HTML distinguishes between two types of lists, ordered lists (<OL>), which are preceded by an automatically generated number or letter (depending on the level of nesting), and unordered lists (<UL>), which are preceded by a bullet or dash (depending on both level and the style specifications of the browser you are using).

The rules for processing paragraphs are defined in the initial link set. OPOCE paragraphs are to be mapped to HTML paragraphs. In this case the same markup tag (<P>) is used in both DTDs. (It should be noted, however, that the two elements have different content models.)

When a top level list (<LIST>) element is identified within the OPOCE paragraph it will be associated with an HTML ordered list (<OL>) element with the default rules relating to numbering. Within the top level list the parser will use the link set called level-1 to link elements. This will link any OPOCE <ELEM> elements to HTML list items (<LI>). If a nested list is encountered this will be mapped to a nested ordered list whose type attribute is set to a. (In HTML 3.2 this value requests lower-case alphabetic labels; type=1 would be needed for arabic numbering.) Within the nested list the rules defined in the link set called level-2 will be used.

Within a second level list any <ELEM> elements will be linked to HTML list items and any nested lists will be linked to HTML unordered lists (<UL>). To ensure that the list will be bulleted a type=disc attribute value has been specified. Note that there is no #USELINK statement at this level. If there are further levels of nesting they will continue to use the current link set, level-2, which will map all levels of nested list in the same way.

When the link rules are applied to the sample paragraph they will generate an HTML file of the form:

   <P>The budget for 1996 will be distributed as follows:
      <OL>
         <LI>Payments to Directorates
         <OL type="a">
            <LI>DGI
            <UL type="disc">
               <LI>Division: A
               <UL type="disc">
                  <LI>Brussels:    12 Mecus</LI>
                  <LI>Luxembourg:   8 Mecus</LI></UL>
               <LI>Division: B
               <UL type="disc">
                  <LI>Brussels:    10 Mecus</LI>
                  <LI>Luxembourg:   6 Mecus</LI></UL>
               <LI>Division: C
               <UL type="disc">
                  <LI>Brussels:     9 Mecus</LI>
                  <LI>Luxembourg:  12 Mecus</LI></UL></UL>
            <LI>DGII
            <UL type="disc">
               <LI>Division: D
               <UL type="disc">
                  <LI>Brussels:    21 Mecus</LI>
                  <LI>Luxembourg:  18 Mecus</LI></UL>
               <LI>Division: E
               <UL type="disc">
                  <LI>Brussels:     5 Mecus</LI>
                  <LI>Luxembourg:   2 Mecus</LI></UL>
               <LI>Division: F
               <UL type="disc">
                  <LI>Brussels:    19 Mecus</LI>
                  <LI>Luxembourg:   2 Mecus</LI></UL></UL></OL>
         <LI>Payments to Member States
         <OL type="a">
            <LI>Greece:     25 Mecus</LI>
            <LI>Austria:    12 Mecus</LI>
            <LI>Finland:     8 Mecus</LI></OL>
         <LI>Payments to Other Bodies
         <OL type="a">
            <LI>OPOCE:      10 Mecus</LI></OL></OL>

This will generate a listing of the following form:

The budget for 1996 will be distributed as follows:

  1. Payments to Directorates
    a. DGI
      • Division: A
        • Brussels: 12 Mecus
        • Luxembourg: 8 Mecus
      • Division: B
        • Brussels: 10 Mecus
        • Luxembourg: 6 Mecus
      • Division: C
        • Brussels: 9 Mecus
        • Luxembourg: 12 Mecus
    b. DGII
      • Division: D
        • Brussels: 21 Mecus
        • Luxembourg: 18 Mecus
      • Division: E
        • Brussels: 5 Mecus
        • Luxembourg: 2 Mecus
      • Division: F
        • Brussels: 19 Mecus
        • Luxembourg: 2 Mecus
  2. Payments to Member States
    a. Greece: 25 Mecus
    b. Austria: 12 Mecus
    c. Finland: 8 Mecus
  3. Payments to Other Bodies
    a. OPOCE: 10 Mecus
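
The switching of link sets described above can be imitated with a short Python sketch. It is a simplified model under stated assumptions, not the behaviour of any particular SGML tool: the dictionary LINK_SETS mirrors the three link sets of the example, while the function convert and the tuple-based tree are hypothetical. A link set named by #USELINK applies only to the content of the element whose rule names it.

```python
# Hypothetical model of the three link sets declared in the example.
# Each rule maps a source element to (result tag, result attributes,
# link set to use inside the element, or None to keep the current one).
LINK_SETS = {
    "#INITIAL": {"p":    ("P", "", None),
                 "list": ("OL", "", "level-1")},
    "level-1":  {"elem": ("LI", "", None),
                 "list": ("OL", ' type="a"', "level-2")},
    "level-2":  {"elem": ("LI", "", None),
                 "list": ("UL", ' type="disc"', None)},
}


def convert(node, current="#INITIAL"):
    """Convert a (name, children) tree; children are strings or nodes."""
    name, children = node
    tag, attrs, uselink = LINK_SETS[current][name]
    inner = uselink or current      # #USELINK governs the content only
    body = "".join(child if isinstance(child, str) else convert(child, inner)
                   for child in children)
    return f"<{tag}{attrs}>{body}</{tag}>"


# A cut-down version of the budget paragraph, four lists deep
level4 = ("list", [("elem", ["Brussels"])])
level3 = ("list", [("elem", ["Division A", level4])])
level2 = ("list", [("elem", ["DGI", level3])])
level1 = ("list", [("elem", ["Directorates", level2])])
sample = ("p", ["Budget:", level1])

print(convert(sample))
```

Because level-2 contains no #USELINK entry, the third and fourth lists are both converted with the same rule, as in the full example.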

Where special instructions are to be associated with specific instances of an element, the source document type definition must include an attribute definition list declaration for that element containing an attribute whose declared value is the ID keyword. In the source document instance each element that is to be treated differently must be given a unique identifier. The system uses this identifier to determine where one of the link rules in the ID link set declaration (<!IDLINK ...>) of the current link type declaration subset is to be applied.

When preparing ID link sets it is important to remember that each link set must cater for any subelements that the model allows to be embedded within the element being linked to the result document. Failure to do so may result in embedded elements not being formatted properly, and at best will result in their being given default formatting parameters. To overcome this problem the #USELINK option can be combined with the IDLINK option to give a declaration of the form:

   <!IDLINK special p #USELINK #INITIAL block [style=special-p] >

POSTLINK

The #POSTLINK option can be used to specify that an alternative link set is to be activated when the end-tag for the specified element is encountered, or implied by the program.

The classic example of the need to change the link rules immediately after a specific element is provided by books in which the first paragraph of text after a chapter heading is set in a different format from other paragraphs. This could be handled using a link set of the following type:

  <!LINKTYPE fromHTML HTML page [
   <!LINK #INITIAL
     h1 #POSTLINK firstpar block [style=chapter-head]
     p                     block [style=para]              >
   <!LINK firstpar
     p #POSTLINK #INITIAL  block [style=initial-para]      >
   ] >

In the initial link set the first level of heading, e.g. the chapter heading, has been declared in such a way that, when the end-tag for the heading (</H1>) is encountered, the parser will switch to a special link set (firstpar) before processing the following text. The first paragraph differs from other paragraphs: it uses the style sheet known as initial-para to format the block it forms on the page.

When the end-tag for the first paragraph (</P>) is detected by the parser the #POSTLINK #INITIAL entry associated with the paragraph element type name (P) in the firstpar link set will cause the parser to revert to using the link definition for paragraphs given in the initial link set. This will ensure that subsequent paragraphs use the normal style sheet for paragraph blocks (para).
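
The effect of #POSTLINK on a run of sibling elements can be traced with the following Python sketch. It is an informal model that assumes a flat sequence of elements. LINK_SETS mirrors the two link sets above, and the function styles is hypothetical.

```python
# Hypothetical model of the #INITIAL and firstpar link sets above.
# Each rule gives (result style, link set to activate after the end-tag).
LINK_SETS = {
    "#INITIAL": {"h1": ("chapter-head", "firstpar"),
                 "p":  ("para", None)},
    "firstpar": {"p":  ("initial-para", "#INITIAL")},
}


def styles(elements):
    """Return the block style chosen for each sibling element in turn."""
    current = "#INITIAL"
    chosen = []
    for name in elements:
        style, postlink = LINK_SETS[current][name]
        chosen.append(style)
        if postlink is not None:    # switch only once the element has ended
            current = postlink
    return chosen


print(styles(["h1", "p", "p", "p"]))
```

For a heading followed by three paragraphs the model selects chapter-head, initial-para, para and para, which is the sequence described in the text.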

10.4.6 Short cuts

As even these simplified examples show, preparing link type declarations for a document structure as complex as that defined for a book can be a time-consuming task. Fortunately a number of short cuts are available.

As with the other markup declarations, parameter entities can be used to reduce the amount of repetitive keying required. As the following example shows, this can greatly reduce the length of link attribute specifications:

   <!LINKTYPE print act page [
   <!ENTITY % catalog  SYSTEM "catalog.lpd"  >
   <!ENTITY % preamble SYSTEM "preamble.lpd" >
   <!ENTITY % terms    SYSTEM "terms.lpd"    >
   <!LINK #INITIAL    %catalog;
                      title [attcond="type='main'"]         block [style="big-title"]
                      title [attcond="type='subordinate'"]  block [style="small-title"]
                      %preamble;
                      %terms;                                      >
   ]>

Here the majority of the links to be used are defined in three link process definition files stored on the local system. These standard link set definitions are called in using parameter entities, with any elements not covered by standard definitions being added to the stored definitions.

Entity declarations defined within the document type declaration subset of the source document type named in the current link type declaration can also be used within the link type declaration subset, where they are appropriate. If entity declarations of the same name occur in both the link type declaration and the document type declaration, however, the one in the link type declaration will have priority while the link is active.

Another short cut is to use a name group for the associated element type part of a source element specification in the link set declaration (or in the link attribute set declarations). For example, if the HTML tags for entering computer coding examples are to have the same format when printed they could be declared as:

   <!LINK #INITIAL  ...
     (xmp|listing|code|plaintext)  block [face=courier]
                    ...                                   >

Care must, however, be taken in using this technique. In particular, it is important to check that the linked elements require the same set of attributes for both the source and result elements. If different sets of attributes apply on either side of the link this technique should not be used.

Where a number of elements share the same output format the values should be used as the default values in the specification of the attributes in the result DTD. This will allow the attribute specification list to be omitted where the result element only requires these default values.

Another possible short cut is to let the program imply the elements that a link applies to, and the relevant attribute values. This can be done for individual elements within an explicit link set by replacing the details of the result element specification with the keyword #IMPLIED to give a link set declaration of the form:

   <!LINK linkname element [attributes] #IMPLIED>

Here the result of the link process will automatically be implied by the formatting program whenever the specified element is encountered, the optional link attribute specification associated with the source element specification defining parameters to be passed to the program.

The parser can also be instructed to link all otherwise unlinked source elements to a default result element by using the #IMPLIED keyword in place of the source element specification, e.g.:

   <!LINK linkset1 #IMPLIED block [style=normal]>
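
Using #IMPLIED in place of the source element specification amounts to a catch-all rule. A minimal Python sketch of that lookup follows. The h1 rule and the names LINKSET1 and resolve are hypothetical, added only to give the fallback something to contrast with.

```python
# Hypothetical link set: one explicit rule plus an "#IMPLIED" source
# entry that catches every element without a rule of its own.
LINKSET1 = {
    "h1":       ("block", {"style": "chapter-head"}),
    "#IMPLIED": ("block", {"style": "normal"}),
}


def resolve(name, link_set=LINKSET1):
    """Return (result element, attributes) for a source element."""
    # Fall back to the #IMPLIED entry when no explicit rule exists;
    # the result is None if the link set has no such entry either.
    return link_set.get(name, link_set.get("#IMPLIED"))


print(resolve("h1"))
print(resolve("blockquote"))
```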

10.4.7 Overriding link declarations

Link type declarations can be controlled from within the text by use of link set use declarations. (These have a very similar form to the short reference map use declarations.) The general form of such declarations is:

   <!USELINK setname linkname>

where USELINK is the default version of the keyword defined in the reference concrete syntax, setname is the name given to one of the link set declarations (<!LINK ...>) in the document's prolog and linkname is the link type name used to identify the link type declaration (<!LINKTYPE ...>) that contains the relevant link set declarations. A typical entry would be:

   <!USELINK #INITIAL to-HTML >

As with the USEMAP declaration, the special #EMPTY keyword can be used to switch off a link. To disable the to-HTML link type once it has been enabled, for example, you could enter the following declaration at any point in the text:

   <!USELINK #EMPTY to-HTML >

If the link set map is switched off within an element by entry of a link set use declaration with the keyword #EMPTY, the original link set can be restored by entering a declaration of the form:

   <!USELINK #RESTORE to-HTML >

On seeing this markup declaration, the program will restore the link set that was associated with the current element prior to the preceding link set use declaration (e.g. the one that was current when the element began).

10.4.8 Using publicly declared link type declaration subsets

Where a publicly declared link type declaration subset is already known to the receiving system it can be invoked, like other publicly declared declaration sets, by use of a formal public identifier. In this case the public identifier qualifies a link type declaration, and so the public text class keyword used in the formal public identifier is LPD. A typical declaration might be:

   <!LINKTYPE create-CD OPOCE HTML PUBLIC
        "-//OPOCE//LPD CD creation link set//EN">

If the publicly declared link set is to be extended by local definitions, which may override some of the definitions in the publicly declared set, the formal public identifier can be followed by a link type declaration subset. It should be noted that, as with DTDs, externally stored link declarations are added to the end of the local definitions, the first definition always taking precedence. To ensure the proper handling of entity references, all entities declared within the link are treated as preceding entities declared in the source DTD. This means that any entity declarations within the link with the same name as entities declared in the DTD will take precedence. Similarly, if the link declarations contain attributes which reference general entities not declared in the link type declaration or the link's source DTD, they will, if no default entity has been defined in the link's source DTD, be taken from the declarations within the document type definition for the same element.
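
The precedence rule for entities (the first declaration encountered wins) can be illustrated with a small Python sketch. The entity names and values are invented for the illustration, and resolve_entities is a hypothetical helper. The groups are passed in the order described above: local link declarations, then the publicly declared set, then the source DTD.

```python
def resolve_entities(*declaration_groups):
    """Merge entity declarations in reading order; the first one wins."""
    resolved = {}
    for group in declaration_groups:
        for name, value in group:
            resolved.setdefault(name, value)   # later duplicates are ignored
    return resolved


# Hypothetical declarations, listed in the order they would be read
local_subset = [("style-file", "local.lpd")]
public_set = [("style-file", "public.lpd"), ("terms", "terms.lpd")]
source_dtd = [("terms", "dtd-terms.ent"), ("logo", "logo.gif")]

entities = resolve_entities(local_subset, public_set, source_dtd)
print(entities)
```

Here the local definition of style-file overrides the public one, and the link's definition of terms overrides the one in the source DTD, while logo is still available from the DTD.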

From the above examples it can be seen that SGML provides both document creators and document designers with a number of techniques for controlling how entered text is to be processed. It should not, however, be thought that link statements and concurrent document types provide all the tools needed to produce paginated text. Fully paginated text requires a powerful text formatter, which will normally need to be set up for specific applications. The degree of interaction possible between the SGML document designer and the text formatter will depend on the skill of the system's designers in linking the formatter to the information stored as an integral part of the SGML document.

A new standard, ISO/IEC 10179, published in 1996, provides the power needed to manage the more complex forms of transformation required for formatting documents. The Document Style Semantics and Specification Language (DSSSL) defined in ISO/IEC 10179 uses a variant of the LISP-based Scheme programming language to control the way in which SGML document trees are converted for formatting. The language also defines an SGML Document Query Language (SDQL) that can be used to identify specific components of an SGML document tree.

References

Guidelines for Electronic Text Encoding and Interchange (TEI P3). Edited by C. M. Sperberg-McQueen and Lou Burnard for The Association for Computers and the Humanities (ACH), The Association for Computational Linguistics (ACL) and The Association for Literary and Linguistic Computing (ALLC), Chicago/Oxford, 1994, 1289pp.

International Organization for Standardization (1996), Information technology - Text and office systems - Document Style Semantics and Specification Language (DSSSL) (ISO/IEC 10179:1996) Geneva: ISO.

International Organization for Standardization (1992), Information technology - Hypermedia/Time-based Structuring Language (HyTime) (ISO/IEC 10744:1992) Geneva: ISO.