
© Martin Bryan 1997 from SGML and HTML Explained published by Addison Wesley Longman


Chapter 7
Short References

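This chapter explains the use of SGML short references. It is split into the following sections:

   7.1 Introduction to short references
   7.2 Short reference mapping declarations
   7.3 Short reference use declarations
   7.4 Using short references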

7.1 Introduction to short references

Short references are characters, or strings of characters, that provide a shorthand reference to an SGML entity. In the reference concrete syntax most non-alphanumeric characters not used as markup delimiters are defined as valid short reference delimiters (see Figure 4.6). The reference concrete syntax also defines five strings of function character and special character sequences that are commonly used, when preparing text on a word processor, as possible short references:

   String       Codes     Meaning
   &#RS;&#RE;   10,13     Empty record
   &#RS;B&#RE;  10,66,13  Blank records (one or more blanks)
   B&#RE;       66,13     Trailing blank(s) followed by record end
   &#RS;B       10,66     Record start followed by leading blanks
   BB           66,66     Two or more blanks (spaces or tabs)
   

These strings allow blank lines, indents and other occurrences of multiple spaces to be identified and processed as appropriate.

Note: Within short reference strings SGML uses the letter B to indicate a sequence of one or more blanks (separator characters).
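As a rough illustration of the B notation, the five strings can be approximated by regular expressions. The following Python sketch is not part of SGML or of any parser: the names PATTERNS and find_short_refs are hypothetical, blanks are assumed to be spaces or tabs, and the patterns are simply tried in a fixed order rather than by the full delimiter recognition rules of the standard.

```python
import re

# Assumption: record start is character 10 and record end is character 13,
# as in the reference concrete syntax.
RS, RE = "\n", "\r"
BLANK = r"[ \t]"  # assumption: a blank is a space or a tab

# Hypothetical table: more specific strings are listed before their prefixes.
PATTERNS = [
    ("&#RS;B&#RE;", re.compile(f"{RS}{BLANK}+{RE}")),  # blank record
    ("&#RS;&#RE;", re.compile(f"{RS}{RE}")),           # empty record
    ("&#RS;B", re.compile(f"{RS}{BLANK}+")),           # leading blanks
    ("B&#RE;", re.compile(f"{BLANK}+{RE}")),           # trailing blanks
    ("BB", re.compile(f"{BLANK}{{2,}}")),              # two or more blanks
]


def find_short_refs(text):
    """Return the names of the short reference strings found, in order."""
    found = []
    pos = 0
    while pos < len(text):
        for name, pattern in PATTERNS:
            match = pattern.match(text, pos)
            if match:
                found.append(name)
                pos = match.end()
                break
        else:
            pos += 1  # ordinary character: move on
    return found


# Four records: empty, indented, containing a double space and a trailing
# blank run, and blank.
sample = (RS + RE + RS + "  indented" + RE
          + RS + "a  b  " + RE + RS + "   " + RE)
found = find_short_refs(sample)
print(found)
```

Each of the five strings is recognized once in the sample, which shows how a single B stands for a run of blanks of any length.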

Other characters declared in the document's character set can be added to this set by entering them as part of the DELIM section of the SYNTAX clause of the SGML declaration. Each additional short reference string required is declared, between literal delimiters (e.g. "), at the end of the SHORTREF entry in the SGML declaration. Typically the extended entry will take the form:

   DELIM GENERAL  SGMLREF
         SHORTREF SGMLREF "<{" "<{/" "}>" "&{" "};"

In the above example five new short reference sequences have been added to the default set shown in Figure 4.6.

If one or more of the standard short reference delimiters are not to be recognized as short reference delimiter strings, the standard set of delimiters can be disabled by replacing SGMLREF with NONE, e.g.:

   DELIM GENERAL  SGMLREF
         SHORTREF NONE    "<{" "<{/" "}>" "&{" "};"

If no strings are entered after the NONE keyword, short reference mapping will be prohibited.

Short references are mapped to entities in short reference mapping declarations. More than one mapping declaration can be defined in a document, but only one map will be in force at any one time.

The current short reference map is determined by use of short reference use declarations. Where these are declared in the document type definition they are activated automatically whenever a particular element, or group of elements, is encountered in the document instance. Alternatively, a short reference use declaration can be entered at the point in the document instance at which the map is to be invoked.

Where multiple DTDs are associated with a document (see Chapter 10) only the base document type can have short references associated with it.

7.2 Short reference mapping declarations

Short reference mapping declarations are identified by an initial keyword of SHORTREF (or its declared replacement) immediately after the markup declaration open sequence, e.g. <!SHORTREF. This is followed by a unique map name that is used to identify the mapping declaration in short reference use declarations.

The map name is followed by a set of definitions associating short reference delimiter strings which have been defined as valid in the SGML declaration with an entity that has been defined earlier in the same document type definition. Within the short reference map short references are entered as parameter literals, between matched pairs of quotation marks or apostrophes. This string is followed by a space and the name of the entity containing the replacement text for the specified short reference character(s).

The following short reference map has been suggested as a mechanism for providing short cuts within mathematical data in a proposed extension to the HTML DTD:

   <!SHORTREF MAP1     "^" REF1
                       "_" REF3
                       "{" REF5 >

The three entities referred to are declared, before the short reference map, as:

   <!ENTITY REF1 STARTTAG "SUP">
   <!ENTITY REF3 STARTTAG "SUB">
   <!ENTITY REF5 STARTTAG "BOX">

A short reference delimiter character (or string) can be used only once in any particular mapping declaration. If the entered short reference delimiter is not one of those defined in the SGML declaration currently in force the mapping will not take place and the short reference will be treated as part of the text. (At least one valid short reference string must be specified in each short reference mapping declaration.)
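The three rules in the preceding paragraph can be expressed as a small check. This is a minimal Python sketch only: check_map is a hypothetical helper, and the declared set below is an assumed subset of the delimiters that an SGML declaration might allow, not the full default set.

```python
def check_map(pairs, declared):
    """Return (usable mapping, problems) for a list of (delimiter, entity) pairs."""
    mapping, problems = {}, []
    for delim, entity in pairs:
        if delim in mapping:
            # A delimiter can be used only once in any one mapping declaration.
            problems.append(f"{delim!r} is mapped more than once")
        elif delim not in declared:
            # Undeclared strings are not mapped; they stay part of the text.
            problems.append(f"{delim!r} is not a declared short reference delimiter")
        else:
            mapping[delim] = entity
    if not mapping:
        # At least one valid short reference string is required.
        problems.append("no valid short reference string in the map")
    return mapping, problems


# Assumed subset of declared delimiters, for illustration only.
declared = {"^", "_", "{", "}"}
pairs = [("^", "REF1"), ("_", "REF3"), ("^", "REF2"), ("<{", "REF5")]
mapping, problems = check_map(pairs, declared)
print(mapping)
print(problems)
```

Here the second mapping for ^ and the undeclared string <{ are both reported, while the two valid entries remain usable.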

7.3 Short reference use declarations

There are two ways in which a short reference map can be activated: automatically or manually. Automatic mapping of short references is achieved by entering short reference use declarations within a document type definition. Manual mapping occurs when a short reference use declaration is entered as part of the document instance.

An automatically actioned short reference use declaration has the general form:

   <!USEMAP map-name element-list>

where map-name identifies a map defined in an associated short reference mapping declaration, and element-list contains the names of the elements that are to activate the map.

The map declared above can be invoked by the declaration:

   <!USEMAP MAP1 MATH >

This declaration tells the program that the map declared with the name of MAP1 is to be used whenever an element called MATH is being processed. This map will stay in force until the end-tag for the element (</MATH>) is encountered, unless it is overridden by another map that has been set up to apply to a valid subelement of the MATH element.

If more than one element type is to activate a particular map, the element type names must be entered as a name group. When the reference concrete syntax is being used a typical declaration will take the form:

     <!USEMAP quotes (p|footnote) >

If either of the listed elements is started, by entry of an appropriate start-tag, the quotes map will automatically become the current map, overriding any existing map until the associated end-tag is identified.

Maps may be activated from within the text of a document by entry of a declaration of the form:

   <!USEMAP map-name >

In this case no element type name is stated as the map will be associated with the currently open element, remaining in force until its end-tag is identified. As with automatically actioned maps, the requested map will be used for any embedded subelements for which separate maps have not been specified; but if an embedded subelement is associated with another map, that map will override the previous map until such time as the end-tag for the subelement is identified.

A special variant of the short reference use declaration can be used to declare an empty map that temporarily disables any map currently being used. The empty map is specified by replacing the map name with the reserved word #EMPTY to give a declaration of the form:

   <!USEMAP #EMPTY >

for a short reference use declaration within the document instance, or:

   <!USEMAP #EMPTY element-list >

for a map declared within the document type definition.

Note: Please do not confuse the use of the SGML USEMAP keyword with the use of the HTML usemap attribute.

7.4 Using short references

The following example shows how nested short references could be used within mathematical formulae:

   <!ENTITY REF1 STARTTAG "SUP">
   <!ENTITY REF2 ENDTAG   "SUP">
   <!ENTITY REF3 STARTTAG "SUB">
   <!ENTITY REF4 ENDTAG   "SUB">
   <!ENTITY REF5 STARTTAG "BOX">
   <!ENTITY REF6 ENDTAG   "BOX">

   <!SHORTREF MAP1     "^" REF1
                       "_" REF3
                       "{" REF5 >

   <!SHORTREF MAP2     "^" REF2
                       "_" REF3
                       "{" REF5 >

   <!SHORTREF MAP3     "_" REF4
                       "^" REF1
                       "{" REF5 >

   <!SHORTREF MAP4     "}" REF6
                       "^" REF1
                       "_" REF3 >

   <!USEMAP MAP1 MATH>
   <!USEMAP MAP2 SUP >
   <!USEMAP MAP3 SUB >
   <!USEMAP MAP4 BOX >

To understand how these maps interrelate we will consider the way in which an HTML system should process the following boxed formula:

   <MATH>{&int;_0_^n^f(x) dx}</MATH>

The processing sequence for this string will be:

  1. The start-tag for the <MATH> element will activate MAP1.
  2. The { will be replaced by the entity called REF5, which will generate a start-tag for a <BOX> element.
  3. The <BOX> start-tag will activate MAP4.
  4. The &int; entity reference will generate an integral sign.
  5. The first underline ( _ ) will be replaced by the entity called REF3, which will generate a start-tag for a <SUB> (subscript) element.
  6. The <SUB> start-tag will activate MAP3.
  7. The zero (0) will be treated as the data for the <SUB> element.
  8. The second underline will be replaced by the entity called REF4, which will generate an end-tag for the subscript (</SUB>).
  9. Once the end of the subscript element has been identified MAP3 will be deactivated and MAP4 will be reactivated.
  10. The first circumflex (^) will be replaced by the entity called REF1, which will generate a start-tag for a <SUP> (superscript) element.
  11. The <SUP> start-tag will activate MAP2.
  12. The letter n will be treated as the data for the <SUP> element.
  13. The second circumflex will be replaced by the entity called REF2, which will generate an end-tag for the superscript (</SUP>).
  14. Once the end of the superscript element has been identified MAP2 will be deactivated and MAP4 will be reactivated.
  15. f(x) dx will be treated as data for the <BOX> element.
  16. The } code will be replaced by the entity called REF6, which will generate an end-tag for the box (</BOX>).
  17. Once the end-tag for the box element has been identified MAP4 will be deactivated and MAP1 will be reactivated.
  18. When the end-tag for the math container (</MATH>) is identified MAP1 will be deactivated.
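The processing sequence above can be traced with a short Python sketch. It is a simplified model rather than an SGML parser: the function expand and the dictionaries are hypothetical names, only single-character delimiters are handled, the &int; entity reference is passed through unchanged, and every generated end-tag is assumed to close the innermost open element.

```python
# Replacement text of the six entities (STARTTAG and ENDTAG forms).
ENTITIES = {
    "REF1": "<SUP>", "REF2": "</SUP>",
    "REF3": "<SUB>", "REF4": "</SUB>",
    "REF5": "<BOX>", "REF6": "</BOX>",
}

# The four short reference maps: delimiter -> entity name.
MAPS = {
    "MAP1": {"^": "REF1", "_": "REF3", "{": "REF5"},
    "MAP2": {"^": "REF2", "_": "REF3", "{": "REF5"},
    "MAP3": {"_": "REF4", "^": "REF1", "{": "REF5"},
    "MAP4": {"}": "REF6", "^": "REF1", "_": "REF3"},
}

# The four USEMAP declarations: element name -> map name.
USEMAP = {"MATH": "MAP1", "SUP": "MAP2", "SUB": "MAP3", "BOX": "MAP4"}


def expand(content, outer="MATH"):
    """Expand the short references in the content of the element `outer`."""
    stack = [outer]  # open elements; the innermost one selects the map
    out = [f"<{outer}>"]
    for ch in content:
        current_map = MAPS[USEMAP[stack[-1]]]
        if ch not in current_map:
            out.append(ch)  # ordinary data
            continue
        tag = ENTITIES[current_map[ch]]
        out.append(tag)
        if tag.startswith("</"):
            stack.pop()  # end-tag: the enclosing element's map is restored
        else:
            stack.append(tag[1:-1])  # start-tag: the new element's map applies
    out.append(f"</{outer}>")
    return "".join(out)


expanded = expand("{&int;_0_^n^f(x) dx}")
print(expanded)
# <MATH><BOX>&int;<SUB>0</SUB><SUP>n</SUP>f(x) dx</BOX></MATH>
```

The stack of open elements plays the role of the activation and reactivation steps: pushing an element switches to its map, and popping it restores the map of the enclosing element.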

Although short references provide a very useful method of reducing the number of tags that need to be entered by authors, a word of caution is required. When choosing suitable short reference delimiters you must make sure they are unambiguous. For instance, if you choose the apostrophe as the delimiter used to start and end quoted text, how will quote delimiters be distinguished from other apostrophes within the text? It might be possible to declare a short reference delimiter consisting of a space and an apostrophe to identify the start of a quote, with a matching delimiter consisting of an apostrophe followed by a space to generate the end-tag for the quote. This sequence, however, would not recognize the end of quoted text where the closing quotation mark is followed by other punctuation symbols, while an apostrophe on its own might be identified erroneously if the text contained any embedded plural possessive words such as those found in the phrase "the bosses' toilet". While further short reference delimiters can be defined to cover most situations, this can lead to fairly complicated maps, which could be avoided by careful choice of short reference delimiters.

A similar problem can occur if two short reference strings start with the same sequence of characters. For example, if the SGML declaration contained the entry:

   SHORTREF SGMLREF "##"

both a single hash code (#) and a pair of hash codes would be valid short reference delimiters. If the map currently being used only contained a mapping for the single variant, e.g.:

   <!SHORTREF newmap "#" hash >

a string of three hashes (###) would be interpreted as ##&hash;. In this case the first two hashes have been recognized as a valid SGML short reference string which is currently unmapped, so the characters have been passed through as they are, while the final hash has been recognized as the single hash short reference which is to be mapped to the entity whose name is hash.
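The behaviour described here amounts to longest-match recognition followed by a map lookup. The following Python sketch illustrates it under stated assumptions: recognize is a hypothetical function, and the declared list contains only the two delimiters discussed in this example.

```python
def recognize(text, delimiters, current_map):
    """Scan text, always trying the longest declared delimiter first."""
    by_length = sorted(delimiters, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for delim in by_length:
            if text.startswith(delim, i):
                if delim in current_map:
                    # Mapped delimiter: replace it by an entity reference.
                    out.append(f"&{current_map[delim]};")
                else:
                    # Recognized but unmapped: pass the characters through.
                    out.append(delim)
                i += len(delim)
                break
        else:
            out.append(text[i])  # not a delimiter at all
            i += 1
    return "".join(out)


declared = ["#", "##"]   # assumed: only these two delimiters are considered
newmap = {"#": "hash"}   # the map from <!SHORTREF newmap "#" hash >
result = recognize("###", declared, newmap)
print(result)  # ##&hash;
```

Because the pair is always tried first, an even run of hashes never reaches the single-hash mapping; only the leftover hash of an odd run is replaced.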

Before leaving the subject of short references it should be noted that short reference strings are never recognized within markup.