
© Martin Bryan 1997 from SGML and HTML Explained published by Addison Wesley Longman


Chapter 14
Creating HTML Forms

HTML is particularly useful for on-line data capture. Many companies have found the HTML <FORM> element so useful that they have converted their main data capture processes to HTML. This chapter explains how this can be achieved using a combination of SGML elements and special processing programs. It is split into the following sections:

14.1 Basic principles

An HTML form consists of a set of input fields, selection menus, labels and explanatory text. When users have selected the relevant options from the menus, and completed all the input fields as instructed within the text associated with the form, the contents of the input fields and details of the selected options are packaged up into an encoded format that can be handled by programs conforming to the Internet Common Gateway Interface (CGI) specification. The encoded data package is submitted to a program on a WWW server using the HyperText Transfer Protocol (HTTP). This program uses the CGI specification to control the decoding of the input into the format it requires for processing the data.
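As a minimal sketch of the receiving end, the following Python function shows how a CGI program might pick up the packaged form data. The CGI variables REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH come from the CGI specification; the function name, the field names and the simulated requests are assumptions made for illustration.

```python
import io


def read_form_data(environ, body_stream):
    """Return the raw, still-encoded form data for one CGI request."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # For POST the data arrives on standard input; CONTENT_LENGTH
        # states how many bytes should be read.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return body_stream.read(length).decode("ascii")
    # For GET the data is the part of the URL that follows the "?".
    return environ.get("QUERY_STRING", "")


# Simulated requests; no real server is involved.
get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "claimant=Bryan&SendInfo=Y"}
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "14"}

print(read_form_data(get_env, io.BytesIO(b"")))
print(read_form_data(post_env, io.BytesIO(b"claimant=Bryan")))
```

In a real CGI script the two arguments would typically be os.environ and sys.stdin.buffer.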

HTTP provides two mechanisms for delivering data: GET and POST. While most modern HTTP servers can handle most forms processing using the now deprecated GET option, some older servers restrict the data strings associated with this option to 255 characters. Where large amounts of data need to be transmitted when a form's contents are submitted to a server (e.g. where files for storage at the server need to be attached to forms), or where the requested information will require an update to the forms generation database, the preferred POST method should be used instead.
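The difference between the two methods can be sketched from the client's side using Python's standard urllib module. The field names and values below are invented for illustration, and the requests are only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical field names and values.
fields = {"claimant": "Martin Bryan", "SendInfo": "Y"}
encoded = urlencode(fields)

action = "http://www.myco.com/cgi/process1"

# GET: the encoded fields become part of the URL itself.
get_request = Request(action + "?" + encoded, method="GET")

# POST: the URL is unchanged and the fields travel in the request body.
post_request = Request(action, data=encoded.encode("ascii"), method="POST")

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

Because GET data forms part of the URL, any server limit on URL length applies to it, whereas POST data is carried separately in the body of the request.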

14.2 The <FORM> element

The <FORM> element is defined as follows in the strict variant of Version 4.0 of the HTML DTD released in November 1997:

<!ELEMENT FORM - - (%block;|SCRIPT)+ -(FORM) -- interactive form -->
<!ATTLIST FORM
  %attrs;                          -- %coreattrs, %i18n, %events --
  action      %URL;      #REQUIRED -- server-side form handler --
  method      (GET|POST) GET       -- HTTP method used to submit the form --
  enctype  %ContentType; "application/x-www-form-urlencoded"
  onsubmit    %Script;   #IMPLIED  -- the form was submitted --
  onreset     %Script;   #IMPLIED  -- the form was reset --
  target   %FrameTarget;    #IMPLIED -- render in this frame --
  accept-charset %Charsets; #IMPLIED -- list of supported charsets --
  >

Note particularly the use of the -(FORM) exclusion declaration to preclude the nesting of forms within forms, and the use of parameter entities to identify the type of data that is expected to be found in attributes that have been declared using the CDATA attribute declared value keyword.
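The effect of the exclusion can be approximated outside an SGML parser. The sketch below uses Python's standard html.parser module to report whether a <FORM> start-tag occurs inside a form that is still open. It is a rough check under that one assumption, not a validating parser, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class NestedFormChecker(HTMLParser):
    """Record whether any <FORM> start-tag appears inside an open form."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.nested = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":  # HTMLParser reports tag names in lower case
            if self.depth > 0:
                self.nested = True
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "form" and self.depth > 0:
            self.depth -= 1


def has_nested_form(markup):
    checker = NestedFormChecker()
    checker.feed(markup)
    checker.close()
    return checker.nested


print(has_nested_form('<FORM action="a"><FORM action="b"></FORM></FORM>'))
print(has_nested_form('<FORM action="a"></FORM><FORM action="b"></FORM>'))
```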

While a <FORM> can contain any element defined by the %block parameter entity it will normally consist principally of elements defined as part of the %formctrl parameter entity. This is defined in Version 4.0 as:

<!ENTITY % formctrl "INPUT | SELECT | TEXTAREA | LABEL | BUTTON " >

These elements allow form creators to define:

The attributes associated with the <FORM> element are:

If files are attached to the form using the type=file option of the <INPUT> element the default entries will need to be changed to give a start-tag for the form such as:

<FORM action="http://www.myco.com/cgi/process1" method=POST
      enctype="multipart/form-data" >
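With multipart/form-data each field is sent as a separate part of the message body, with the parts divided by a boundary string. The Python sketch below assembles such a body by hand to show its general shape. The function name, field names, file name and boundary are all assumptions, and real browsers generate their own boundary strings.

```python
def build_multipart(fields, file_field, filename, file_bytes, boundary):
    """Assemble a multipart/form-data body by hand (illustration only)."""
    lines = []
    for name, value in fields.items():
        # One part per ordinary field: header, blank line, value.
        lines += [
            b"--" + boundary,
            b'Content-Disposition: form-data; name="' + name.encode() + b'"',
            b"",
            value.encode(),
        ]
    # The attached file forms a part of its own.
    lines += [
        b"--" + boundary,
        b'Content-Disposition: form-data; name="' + file_field.encode()
        + b'"; filename="' + filename.encode() + b'"',
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        b"--" + boundary + b"--",  # closing boundary
        b"",
    ]
    return b"\r\n".join(lines)


body = build_multipart(
    {"form-id": "Form-123"}, "File", "file1.doc", b"report text", b"XyZ"
)
print(body.decode("ascii"))
```

The request carrying this body would declare the boundary in its header, for example Content-Type: multipart/form-data; boundary=XyZ.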

14.3 The <INPUT> element

The definition assigned to the <INPUT> element in the strict variant of Version 4.0 of the HTML DTD issued in November 1997 was:

<!ELEMENT INPUT - O EMPTY -- form control -->
<!ATTLIST INPUT
  %attrs;                          -- %coreattrs, %i18n, %events --
  type       %InputType; TEXT      -- what kind of widget is needed --
  name        CDATA      #IMPLIED  -- submit as part of form --
  value       CDATA      #IMPLIED  -- required for radio and checkboxes --
  checked   (checked)    #IMPLIED  -- for radio buttons and check boxes --
  disabled  (disabled)   #IMPLIED  -- control is unavailable in this context --
  readonly  (readonly)   #IMPLIED  -- for text and passwd --
  size        CDATA      #IMPLIED  -- specific to each type of field --
  maxlength   NUMBER     #IMPLIED  -- max chars for text fields --
  src         %URL;      #IMPLIED  -- for fields with images --
  alt         CDATA      #IMPLIED  -- short description --
  usemap      %URL;      #IMPLIED  -- use client-side image map --
  tabindex    NUMBER     #IMPLIED  -- position in tabbing order --
  accesskey  %Character; #IMPLIED  -- accessibility key character --
  onfocus     %Script;   #IMPLIED  -- the element got the focus --
  onblur      %Script;   #IMPLIED  -- the element lost the focus --
  onselect    %Script;   #IMPLIED  -- some text was selected --
  onchange    %Script;   #IMPLIED  -- the element value was changed --
  accept  %ContentTypes; #IMPLIED  -- list of MIME types for file upload --
  %reserved;                       -- reserved for possible future use --
  >

The most important entry in this definition is the %InputType parameter entity used to define the permitted values for the type attribute. This has been defined as:

<!ENTITY % InputType
        "(TEXT | PASSWORD | CHECKBOX |
          RADIO | SUBMIT | RESET |
          FILE | HIDDEN | IMAGE | BUTTON)"  >

The types of input field that can be defined using the <INPUT> element are:

Apart from the generic attributes identified by the %attrs parameter entity, use of the attributes associated with the <INPUT> element is dependent on the value of the type attribute.

14.3.1 Text input

Where the contents of a field can be entered in a single line an <INPUT> element whose type attribute has been defined as text can be used. In this case the following additional attributes can be specified:

If maxlength is greater than size the contents of the text field will be scrolled whenever the cursor is positioned outside of the displayable area.

The following attributes can be used, in addition to the attributes for defining intrinsic events defined in the %events parameter entity, to identify functions defined within <SCRIPT> elements that are to be run when specific actions are performed on text input fields:

The following attributes can be used to control access to the field:

A typical example of a text input field specification is:

<P>Name: <INPUT type=text name=claimant size=20 maxlength=40 accesskey=N></P>

This field would be displayed as:

Name:

14.3.2 Password input

Password input is similar to text input, except that when the value of the type attribute is password each character entered is displayed as an asterisk. Typically a fixed size, without a maximum length, will be specified for a password field. A typical example would be:

<P>Password: <INPUT type=password name=password size=6></P>

This field would be displayed as:

Password:

14.3.3 Checkboxes

Checkboxes are buttons that are either on or off. The following attributes can be associated with an <INPUT> element when its type attribute has been set to checkbox:

A typical use of checkboxes would be:

<P>Send further information? <INPUT type=checkbox name=SendInfo value=Y checked>

This would be displayed as:

Send further information?

14.3.4 Radio buttons

Radio buttons form sets of linked buttons, only one of which can be selected for each submission of the form. The following attributes can be associated with an <INPUT> element when its type attribute has been set to radio:

A typical use of radio buttons might be:

<P>Orange Juice <INPUT type=radio name=juice value=orange checked>
Grapefruit Juice <INPUT type=radio name=juice value=grapefruit>
Tomato Juice <INPUT type=radio name=juice value=tomato></P>

This would be displayed as:

Orange Juice Grapefruit Juice Tomato Juice

Selecting any one of the three options automatically deselects the other two.

14.3.5 User-defined buttons

Version 4.0 of the HTML DTD introduced a new type of <INPUT> for definition of user-defined buttons that can perform functions based on the contents of <SCRIPT> elements. To use this option you need to set type=button and then add the following attributes:

A typical use of user-defined buttons might be:

<P><INPUT type=button value="Check field contents" onclick="ValidateFields()"></P>

Browsers that support this new option might display these entries as:

14.3.6 Submit button

The contents of a form are sent to the program identified by its action attribute when a button declared as an <INPUT> element whose value for the type attribute is submit is pressed. The value attribute can be used to label the button. If this value is to be submitted as part of the form contents a name attribute can be added to identify which field the value is to be stored in. An optional onclick attribute can be used to identify a function, declared within a <SCRIPT> element, that is to be performed whenever the button is selected.

A typical use of this option is:

<P>When you have completed the form:
<INPUT type=submit value="Press here to submit form"></P>

Browsers might display this as:

When you have completed the form:

14.3.7 Reset button

When a form contains an <INPUT> element with an attribute of type=reset a reset button will be displayed. When this button is pressed the browser will remove any existing entries without submitting them, returning the form to the state in which it was first displayed (including any default values for fields). As with the submit button, the value attribute can be used to label the button. An optional onclick attribute can be used to identify a function, declared within a <SCRIPT> element, that is to be performed whenever the button is selected.

A typical use of this option is:

<P><INPUT type=reset value="Press here to clear current entries"></P>

Browsers that support this option might display this as:

14.3.8 Hidden input

Many forms require information about themselves to be submitted as part of the information set sent to the form processor. The form may also need to contain information about the state of the client, or server, from which the form was transmitted. Where users do not need to be made aware of transmitted information, hidden fields can be created in an HTML form by creating an <INPUT> element with a type=hidden attribute. Only two attributes, name and value, are used with this element. They identify the name of the field being sent to the server, and the value being assigned to the field.

A typical entry would be:

<INPUT type=hidden name=form-id value=Form-12>

As such fields are not displayed no example can be given here, but a hidden field is included in the sample form shown in Figure 14.1.

14.3.9 File selection

Where users need to identify a file that should be attached to a form a special type of <INPUT> field, with an attribute value of type=file, can be used to associate a Browse button with an input field. The attributes that can be associated with this type of field are:

A typical entry would be:

<INPUT type=file name=file-id size=20 maxlength=40>

This would typically be displayed as:

Pressing the Browse button will display the browser's file selection menu. Selecting a file from this menu will result in the full pathname of the file being placed into the field. When the form is transmitted the file name will be submitted as the value of the field identified by the name attribute.

Note: File selection is only possible when the form is using POST as the value for the method attribute.

14.3.10 Image spot selection

The type=image option for the <INPUT> element can be used to allow users to select a spot on an image. When this option is selected the following attributes apply:

A typical use would be:

<INPUT type=image src=europe.gif name=find-country>

This would typically be displayed as:

Clicking on a point in this map will automatically send the X and Y coordinates of the selected pixel to the program identified by the form's action attribute, as the values of fields named find-country.x and find-country.y.

Warning: Clicking on an image in a form has the effect of submitting the current contents of the form.
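In HTML 4.0 the coordinates of the click are submitted as two fields, whose names are formed by adding .x and .y to the name of the image field. A server-side program could use them along the lines of the following Python sketch, in which the click position and the rectangular regions are invented for illustration.

```python
from urllib.parse import parse_qs

# Made-up click position, as it might arrive from the form.
query = "find-country.x=120&find-country.y=85"
fields = parse_qs(query)
x = int(fields["find-country.x"][0])
y = int(fields["find-country.y"][0])

# Hypothetical bounding boxes (left, top, right, bottom) in image pixels.
REGIONS = {
    "France": (90, 70, 150, 130),
    "Spain": (60, 130, 120, 180),
}


def find_country(x, y):
    """Return the first region containing the point, or None."""
    for country, (left, top, right, bottom) in REGIONS.items():
        if left <= x < right and top <= y < bottom:
            return country
    return None


print(x, y, find_country(x, y))
```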

14.4 The <BUTTON> element

Version 4.0 of the HTML DTD has introduced a new <BUTTON> element that will eventually replace the type=button, type=reset and type=submit variants of the <INPUT> field. The definition of this new element is:

<!ELEMENT BUTTON - -
     (%flow;)* -(A|%formctrl;|FORM|FIELDSET)
     -- push button -->
<!ATTLIST BUTTON
  %attrs;                          -- %coreattrs, %i18n, %events --
  name        CDATA      #IMPLIED  -- for scripting/forms as submit button --
  value       CDATA      #IMPLIED  -- gets passed to server when submitted --
  type (button|submit|reset) submit -- for use as form submit/reset button --
  disabled    (disabled) #IMPLIED  -- control is unavailable in this context --
  tabindex    NUMBER     #IMPLIED  -- position in tabbing order --
  accesskey  %Character; #IMPLIED  -- accessibility key character --
  onfocus     %Script;   #IMPLIED  -- the element got the focus --
  onblur      %Script;   #IMPLIED  -- the element lost the focus --
  %reserved;                       -- reserved for possible future use --
  >

Whereas only unmarked-up text, entered as the CDATA value of the value attribute, can be used to label buttons when using the <INPUT> element, buttons defined using the new <BUTTON> element can have almost any type of content, including multiple paragraphs of text, emphasized and other types of formatted in-line text, and images. The associated exclusion exceptions, however, forbid the embedding of form-related fields, and links to other documents or parts of the current document.

Typically a button defined using the <BUTTON> element will be displayed as a 3-D object whose form of display will change when it is pressed, whereas a button defined using the <INPUT> element will be displayed as a flat, labelled, screened area whose appearance will not change if it is selected.

The following attributes can be associated with a <BUTTON> element:

14.5 The <SELECT> element

Where users need to be asked to pick one or more options from a menu the <SELECT> element can be used in place of a set of <INPUT> buttons. This element is defined in Version 4.0 of the HTML DTD as:

<!ELEMENT SELECT - - (OPTGROUP|OPTION)+ -- option selector -->
<!ATTLIST SELECT
  %attrs;                          -- %coreattrs, %i18n, %events --
  name        CDATA      #IMPLIED  -- field name --
  size        NUMBER     #IMPLIED  -- rows visible --
  multiple    (multiple) #IMPLIED  -- default is single selection --
  disabled    (disabled) #IMPLIED  -- control is unavailable in this context --
  tabindex    NUMBER     #IMPLIED  -- position in tabbing order --
  onfocus     %Script;   #IMPLIED  -- the element got the focus --
  onblur      %Script;   #IMPLIED  -- the element lost the focus --
  onchange    %Script;   #IMPLIED  -- the element value was changed --
  %reserved;                       -- reserved for possible future use --
  >

Each <SELECT> element forms a container for a list of options and option groups that will be selected to form the contents of a particular field in the submitted message. The attributes associated with the <SELECT> element are:

The model for the <OPTGROUP> element introduced into Version 4.0 of HTML to allow nested sets of options is:

<!ELEMENT OPTGROUP - - (OPTGROUP|OPTION)+ -- option group -->
<!ATTLIST OPTGROUP
  %attrs;                          -- %coreattrs, %i18n, %events --
  disabled    (disabled) #IMPLIED  -- control is unavailable in this context --
  label       %Text;     #REQUIRED -- for use in hierarchical menus --
  >

When an option group is specified it must be given a label that can be displayed in the selection list at the point a submenu of the nested options is to be offered to users. Optionally the disabled attribute can be used to prohibit access to the embedded options. The intrinsic events and other generic attributes defined in the %attrs parameter entity may also be used to qualify option group specifications.

The model for the <OPTION> element used within the <SELECT> element is:

<!ELEMENT OPTION - O (#PCDATA) -- selectable choice -->
<!ATTLIST OPTION
  %attrs;                          -- %coreattrs, %i18n, %events --
  selected    (selected) #IMPLIED
  disabled    (disabled) #IMPLIED  -- control is unavailable in this context --
  label       %Text;     #IMPLIED  -- for use in hierarchical menus --
  value       CDATA      #IMPLIED  -- defaults to element content --
  >

Each option element should contain a text string that forms the entry to be displayed in a single line of the menu. Each <OPTION> element can have the following optional attributes associated with it:

The following coding can be used to set up a menu from which users can select the appropriate option:

<SELECT name=juice size=3 title=Juices onChange="reorder()">
<OPTION value=Orange selected>Fresh Orange Juice</OPTION>
<OPTION value=Grapefruit>Grapefruit Juice</OPTION>
<OPTION value=Pineapple>Pineapple Juice</OPTION>
<OPTION value=Tomato disabled>Tomato Juice</OPTION></SELECT>

This menu would be displayed as:

14.6 The <TEXTAREA> element

Where more than one line needs to be allowed for input the <TEXTAREA> element must be used. The attributes associated with this element are:

A typical text area definition might be:

<P>Address:<BR>
<TEXTAREA name=address rows=5 cols=25></TEXTAREA></P>

This would appear on the screen as:

Address:

14.7 The <FIELDSET> element

Version 4.0 of the HTML DTD introduced the concept of a captioned set of fields (<FIELDSET>), which can be used within a form, or independently of a form, to group related text, controls and labels. Grouping controls makes it easier for users to understand their purpose and facilitates navigation for visual user agents using the Tab key as well as speech navigation for speech-oriented user agents. This element is defined as:

<!ELEMENT FIELDSET - - (#PCDATA, LEGEND?, (%flow;)*) -- form control group -->
<!ATTLIST FIELDSET
  %attrs;                          -- %coreattrs, %i18n, %events --
  >

The #PCDATA that precedes the <LEGEND> element in the model is only there to avoid possible problems due to white space preceding the caption. (The %flow; parameter entity includes #PCDATA in its model, making this model a form of mixed content.) Note that this model does not exclude the use of nested <FIELDSET> or <FORM> elements within the block content. The chief difference between a <FIELDSET> used outside of a form and one entered within a <FORM> element is that the <FIELDSET> element has no associated action, method or enctype attributes. This is because such field sets are designed for client-side processing, using functions associated with the individual form fields using the onclick and other intrinsic event generic attributes.

The <LEGEND> element is defined as:

<!ELEMENT LEGEND - - (%inline;)* -- fieldset legend -->
<!ATTLIST LEGEND
  %attrs;                          -- %coreattrs, %i18n, %events --
  accesskey   CDATA      #IMPLIED  -- accessibility key character -->

The accesskey attribute can be used to indicate which key can be used as a shortcut to the fieldset. For example:

<FIELDSET>
<LEGEND accesskey=A><B>A</B>irports used regularly</LEGEND>
<INPUT type=checkbox name=airport value=Heathrow> Heathrow
<INPUT type=checkbox name=airport value=Frankfurt> Frankfurt
<INPUT type=checkbox name=airport value=Schiphol> Schiphol
</FIELDSET>

14.8 The <LABEL> element

Another new element introduced by Version 4.0 of the HTML DTD was the <LABEL> element. The element is defined as:

<!-- Each label must not contain more than ONE field -->
<!ELEMENT LABEL - - (%inline;)* -(LABEL) -- form field label text -->
<!ATTLIST LABEL
  %attrs;                          -- %coreattrs, %i18n, %events --
  for         IDREF      #IMPLIED  -- matches field ID value --
  accesskey  %Character; #IMPLIED  -- accessibility key character --
  onfocus     %Script;   #IMPLIED  -- the element got the focus --
  onblur      %Script;   #IMPLIED  -- the element lost the focus --
  >

Labels are used to indicate which text-containing elements within the form are associated with which data entry elements. The for attribute identifies a unique identifier (id) assigned to one of the data entry fields in the current form/fieldset. The accesskey attribute can be used to define which key can be pressed as a shortcut to get to the appropriately labelled field. The onfocus and onblur attributes can be used to identify script functions that are to be run whenever users tab to or from the field.

Input fields can either be placed inside the label element or in a separate area of the form, as Figure 14.1 shows.

<FORM method=post action="http://special.org/cgi-bin/example">
<CAPTION>Sample Form</CAPTION>
<INPUT type=hidden name="form-id" value="Form-123">
<TABLE>
<TR><TD><LABEL for=surname accesskey=S><B>S</B>urname:</LABEL>
<TD><INPUT id=surname type=text name=Surname maxlength=30>
<TD><LABEL for=forename accesskey=F><B>F</B>orenames:</LABEL>
<TD><INPUT id=forename type=text name=Forenames maxlength=40>
<TR><TD><LABEL for=user accesskey=U><B>U</B>ser Id:</LABEL>
<TD><INPUT id=user type=password name=UserID>
</TABLE>
<P>Type of Problem:<BR>
<LABEL for=crash accesskey=C>Software <B>C</B>rash 
<INPUT id=crash type=radio name=Type value=Crash></LABEL>
<LABEL for=malfunc accesskey=M>Software <B>M</B>alfunction
<INPUT id=malfunc type=radio name=Type value=Malfunction></LABEL>
<LABEL for=bad-doc accesskey=D>Inaccurate <B>D</B>ocumentation
<INPUT id=bad-doc type=radio name=Type value=Documentation></LABEL></P>
<P>Statement of <B>P</B>roblem:<BR>
<TEXTAREA name=Problem rows=5 cols=60>
</TEXTAREA></P>
<TABLE>
<TR><TD><LABEL for=cause accesskey=A>Possible C<B>a</B>use:</LABEL><BR>
<SELECT id=cause name=Cause size=3>
<OPTION value=OpError>Operator Error
<OPTION value=Hardware>Hardware Failure
<OPTION value=OpMemory>Operator Memory
<OPTION value=Memory>Computer Memory
</SELECT>
<TD><LABEL for=file accesskey=F><B>F</B>ile displaying problem:</LABEL>
<INPUT id=file type=file name=File>
</TABLE>
<INPUT type=submit value="Submit Report">
<BUTTON type=reset>Reset Form</BUTTON>
</FORM>

Figure 14.1 Example of HTML Form
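The pairing of for and id values used in Figure 14.1 can be checked mechanically. The following Python sketch, based on the standard html.parser module, lists any label whose for attribute does not match an id in the same markup. The class name, function name and sample markup are hypothetical, and the check is much looser than validation against the DTD.

```python
from html.parser import HTMLParser


class LabelChecker(HTMLParser):
    """Collect for values on <LABEL> elements and id values on all elements."""

    def __init__(self):
        super().__init__()
        self.targets = []  # values of for attributes, in document order
        self.ids = set()   # id values found on any element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        if tag == "label" and attrs.get("for"):
            self.targets.append(attrs["for"])


def unmatched_labels(markup):
    """Return the for values that have no matching id."""
    checker = LabelChecker()
    checker.feed(markup)
    checker.close()
    return [target for target in checker.targets if target not in checker.ids]


sample = (
    "<LABEL for=surname accesskey=S>Surname:</LABEL>"
    "<INPUT id=surname type=text name=Surname>"
    "<LABEL for=forename accesskey=F>Forenames:</LABEL>"
)
print(unmatched_labels(sample))
```

Here the second label is reported because the sample deliberately omits the field it refers to.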

14.9 Form submission

Figure 14.2 shows a typical form ready for submission to the server.

Figure 14.2 Completed Form

When this form is submitted to the server it will generate the following data string:

form-id=Form-123&Surname=Bryan&Forenames=Martin&UserID=sgml66&Type=Malfunction
&Problem=The+program+failed+to+copy+the+selected+area+when+the+%0D%0ACopy+butt
on+was+pressed+and+then+Paste+was+used.%0D%0A%28I%27ve+been+successful+in+copy
ing+smaller+pieces+of+text%2C%0D%0Ait+was+trying+to+copy+the+whole+of+section+
5.3+that+failed.%29&Cause=Memory&File=docs%2Fmtb%2Ffile1.doc

Note that this string has no spaces, and forms one continuous sequence of characters with no line breaks. The boundary of each field of information is identified by the presence of an & character. Each space in the form input has been replaced by a + sign. Each non-alphanumeric character, other than hyphens and periods, has been replaced by a % sign followed by the two-digit hexadecimal code of the character. Each carriage return/line feed sequence in the text area has, therefore, been replaced by %0D%0A.
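These replacement rules can be tried out with the quote_plus function in Python's standard urllib.parse module, which applies a very similar encoding. The sample strings are fragments of the form input shown above. Note that quote_plus also leaves underscores and tildes unchanged, a detail that is not covered by the description given here.

```python
from urllib.parse import quote_plus

samples = [
    "Form-123",               # hyphens and alphanumerics are left alone
    "the selected area",      # spaces become + signs
    "used.\r\n(I've",         # CR/LF, parenthesis and apostrophe
    "text,",                  # comma
    "docs/mtb/file1.doc",     # slashes
]

for text in samples:
    print(repr(text), "->", quote_plus(text))
```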

Each field definition consists of two parts, separated by the = sign. To the left of the = is the name of the field; to the right is the value to be assigned to the field.

Note that the first field definition, form-id=Form-123 does not appear on the displayed form. This entry represents the contents of the name and value attributes of a hidden input field. Note also that the contents of the User ID field are transmitted in clear text form, though the contents of this password field are not displayed on the screen.

For the Type and Cause fields the field values have been taken from the contents of the value attributes of the selected radio button or option, rather than from the displayed text.

14.10 Form processing

The submitted contents of a form are normally sent to a program conforming to the Common Gateway Interface specification. WWW servers often contain software, such as UNCGI, that is specifically designed to accept the continuous strings produced when forms are submitted and to break them into sets of discrete fields, reconverting the + signs to spaces and the hexadecimal entries to the equivalent characters.
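The same decoding step can be sketched with the parse_qsl function from Python's standard urllib.parse module. This is offered as an illustration of the technique rather than a description of how UNCGI itself works, and the data string is an abbreviated form of the one shown in Section 14.9.

```python
from urllib.parse import parse_qsl

# Abbreviated version of the submitted data string.
data = (
    "form-id=Form-123&Surname=Bryan&Forenames=Martin&UserID=sgml66"
    "&Type=Malfunction"
    "&Problem=The+program+failed+to+copy+the+selected+area+when+the+"
    "%0D%0ACopy+button+was+pressed"
    "&Cause=Memory&File=docs%2Fmtb%2Ffile1.doc"
)

# parse_qsl splits on &, separates names from values at the = sign,
# turns + signs back into spaces and decodes the %xx sequences.
fields = dict(parse_qsl(data))

for name, value in fields.items():
    print(name, "=", repr(value))
```

Converting the result to a dictionary keeps only the last value for any repeated field name. Where several controls share a name, as with a set of checkboxes, parse_qs, which returns a list of values for each name, is the safer choice.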

Normally a CGI program will be associated with facilities for validating specific entries and for passing the validated entries to appropriate programs. If an invalid field is found the program may be able to generate an HTML form that requests that the user provides a corrected entry. Such a form could contain supplementary information on completing the faulty field that was not displayed on the original form.
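A validation step of this kind might look like the following Python sketch. The validation rules, the error messages and both function names are assumptions made for illustration; the field names and permitted Type values are taken from Figure 14.1.

```python
from html import escape


def validate(fields):
    """Return {field name: error message}; the rules are illustrative only."""
    errors = {}
    if not fields.get("Surname", "").strip():
        errors["Surname"] = "Please enter your surname."
    if fields.get("Type") not in {"Crash", "Malfunction", "Documentation"}:
        errors["Type"] = "Please select a type of problem."
    return errors


def correction_form(action, fields, errors):
    """Build a small HTML form that asks only for the faulty fields."""
    rows = []
    for name, message in errors.items():
        value = escape(fields.get(name, ""), quote=True)
        rows.append(
            f"<P>{escape(message)}<BR>"
            f'<INPUT type=text name="{escape(name, quote=True)}" value="{value}"></P>'
        )
    return (
        f'<FORM method=post action="{escape(action, quote=True)}">\n'
        + "\n".join(rows)
        + '\n<INPUT type=submit value="Resubmit">\n</FORM>'
    )


submitted = {"Surname": " ", "Type": "Malfunction"}
errors = validate(submitted)
if errors:
    print(correction_form("http://special.org/cgi-bin/example", submitted, errors))
```

Escaping the submitted values before writing them back into the page prevents characters such as < and " in the user's input from being treated as markup.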

When a form has been successfully processed the CGI program will typically respond with a new page of information for the user. Where the POST method has been used, however, the only notification that the form has been successfully submitted may be the presence of a Document Done message in the window's message area.

References

Internet RFC 1945, Hypertext Transfer Protocol - HTTP/1.0 (http://ds.internic.net/rfc/rfc1945.txt)

Erwin, M., Gaither, M., Hassinger, S. and Tittel, E. (1995) Foundations of World Wide Web Programming with HTML & CGI. Foster City: IDG Books Worldwide

The CGI Specification (http://hoohoo.ncsa.uiuc.edu/cgi/interface.html)