Martin Bryan 1997 from SGML and HTML Explained published by Addison Wesley Longman


Web SGML and HTML 4.0 Explained

Martin Bryan

Dedication

This book is dedicated to Yuri Rubinsky, who did so much to make SGML and HTML widely accepted before his untimely death.

This file contains the following parts of Web SGML and HTML 4.0 Explained:

Updates

An annex defining adaptations of SGML for use over the Internet was approved by ISO's SGML technical committee on 5th December 1997. Whilst the formal date of publication for this annex is not yet known, SGML and HTML Explained has been updated, and this new version of the book has been produced in anticipation of its formal publication. In addition, the book has been updated to reference the strict variant of Version 4.0 of the HTML DTD.

Version 1.0 of Jade, which includes support for TeX, was released in September 1997. The latest version of Jade can be obtained from http://www.jclark.com/jade, where an experimental version of Jade with support for XML and XSL is also available. James Clark is currently (December 1997) upgrading the SP parser used by Jade to cover the Web SGML Adaptations.

Version 4.0 of the HTML DTD was made public on 8th July 1997 and became a formally proposed W3C recommendation on 4th November. It is expected to be formally approved on Christmas Eve 1997. When Version 4.0 became a Proposed Recommendation, a strict version of the DTD, which was conformant with Version 2 of the W3C Cascading Style Sheets specification, was made public. In December 1997 ISO announced plans for a standardized subset of HTML 4.0 which has even stricter interpretation than the HTML proposals. These new features are covered in this updated version of my explanation of SGML and HTML.

Foreword

International Standards come in two flavors. One variety is a useful codification of current practice, providing, for example, an unambiguous definition of a character set for information interchange, as ISO 8859-1 does for the Latin-1 character set. The other category has been aptly described as 'standards imposed by an international body of do-gooders' which, largely as a result of the gestation time of such standards, are found to be irrelevant by the time they appear. The Office Document Architecture (ODA) standard is a case in point. The Standard Generalized Markup Language (SGML, ISO 8879) is an honourable exception to this rule: in the ten years since it was first published it has revolutionized the handling of information in industry, commerce and government circles. Large quantities of the world's information are now organised with the help of SGML, and the World Wide Web community is at last beginning to realize that treating HTML in its true guise as an SGML application opens up a future for Web publishing divorced from the power games of Microsoft and Netscape.

For a long time the practice of SGML was a black art, known only to a select few. Foremost amongst these gurus is Martin Bryan, whose book SGML: an Author's Guide did much to help the spread of SGML in its early days. This present volume stands as a worthy successor, marking SGML's coming of age and its recognition as a major force both in technical documentation and in the wider world of Web publishing. Attempting to extract the message from the text of an International Standard is masochism of the highest order, and readers who need to understand the precise meaning and significance of ISO 8879 or the HTML 3.2 definition will have cause to be eternally grateful to Martin Bryan for his guidance.

Professor David Barron
University of Southampton
December 1996

Preface

Information is the key to success in our fiercely competitive world: without it most businesses and governments would collapse. As the pace of life increases, the ability to communicate information speedily is becoming an increasingly important ingredient in the success, or failure, of most organizations. During the last decade of the 20th century, the development of a World Wide Web (WWW) of connected computer systems has had a major impact on the speed with which organizations can communicate. This book explains the theoretical background and practical implications of one of the key components of the success of the WWW, the HyperText Markup Language (HTML).

HTML has been formalized according to the rules defined in International Standard 8879, which defines a Standard Generalized Markup Language (SGML). This book explains the role of SGML and the concepts for controlling computerized document markup that it has introduced; it then explains how these concepts have been utilized in formalizing HTML.

Rather than concentrating on the 'how to' aspects of producing HTML documents, this book concentrates on 'why it was done this way'. By providing a different perspective on the development of HTML, it is hoped that this book will increase awareness of the benefits that SGML brings to computerized information management.

While this book is specifically designed to appeal to those who want to gain a more in-depth understanding of what makes HTML tick, it also serves a more general purpose in teaching the theory behind SGML. As such, it will be a useful reference book for those being introduced to this important information processing language.

One of the best ways to learn any computing language is by example. One of my principal aims, therefore, has been to try to provide a practical example of the use of every concept in SGML, and every markup tag provided in HTML. To avoid making the book too unwieldy, I have deliberately simplified many of the examples.

To provide a more in-depth understanding of the use of HTML, the book is supplied on disk in the form of a set of HTML-encoded files, which contain navigable hypertext links within and between chapters. Most WWW document browsers provide facilities for viewing the coding of the source document. By using these facilities to view the way in which the files that make up this book have been encoded, you can get some idea of the way in which HTML documents are typically marked up and learn the advantages provided through electronic cross-referencing, which is much more extensive in the electronic version of the book.

Until recently, document creation using word processors was seen as a separate activity from other computing activities of a company. Today, companies are beginning to realize that document production of all types needs to be tightly coupled with other forms of information capture and management. SGML provides a generalized mechanism for the identification of manageable units of information and the documents that can be produced from them. Many of today's largest document management projects already use SGML to control information flow. This book explains how document components can be identified and stored as a reusable resource by marking up their information elements in SGML.

The WWW has led to an increasing awareness of the benefits of building compound documents by pointing to many different information sources rather than copying data directly into new files. Trying to maintain multiple copies of a piece of data is impossible over any length of time. Only by keeping one master copy of each file, and making a reference to this copy at appropriate points from other documents, can document sets be realistically managed over the lifespan of projects such as that required to build an airliner. Typically, such large capital investment projects involve life cycles measured in terms of decades. To maintain data over such periods using technology that has a lifespan that can be measured in months, rather than years, naturally leads to difficulties. One of the big advantages of SGML is that it encodes documents in a way that can be understood by virtually all computers. This is why HTML-encoded documents can be moved between computer systems without difficulty. This ability to create a document on one platform and then view or edit it on a completely different platform is one of the key aspects to the success of the WWW.

The 21st century could be the first for many centuries for which written records are not available. Unless we can learn to archive our electronic mail, our Internet newsgroup discussions, our videoconferences and our telephone messages in a way that will preserve the data over time, future historians will find it difficult to find out about our activities. It is now, at the start of this age of electronic information dissemination, that we need to take a serious look at how we can preserve our electronic data in an easily understood format. As bodies such as the Association for Computers and the Humanities' Text Encoding Initiative (TEI) have already recognized, SGML and HTML will help to provide an electronic record of the 21st century.

Martin Bryan
January 1997
E-mail: mtbryan@sgml.u-net.com

Acknowledgements

This book would not have been possible without the help and encouragement of my family and friends, or the excellent team of reviewers and copy editors for SGML and HTML Explained that my editor, Nicky McGirr, found for me. I would particularly like to stress the contributions made by David Penfold, who had the hard work of copy editing my bad grammar, and Peter Flynn, David Barron, Gary F. Hasman, Jacques Deseyne and Niel Bradley in reviewing the draft text and making substantial improvements to it. David Barron has added to his long list of kindnesses to me by supplying the foreword.

Table of Contents

One of the advantages of electronic files is that you can get to the relevant part of the text simply by double clicking on the relevant title. This means that there is no point in listing the page numbers, as there is no need to break the text into pages on a screen.

  Foreword
  Preface
  1. Background to SGML and HTML
  2. Document Analysis and Information Modelling
  3. The Components of an SGML System
  4. The SGML Declaration
  5. Elements and Attributes
  6. Entity Declaration and Use
  7. Short References
  8. Marked Sections and Processing Instructions
  9. Tag Minimization
  10. Multiple Document Structures (SUBDOC, CONCUR and LINK)
  11. Building a Document Type Definition
  12. Interpreting the HTML DTD
  13. The HyperText Markup Language
  14. Creating HTML Forms
  15. The Future for SGML and HTML
  Index