This is yet another HTML quick reference, containing advice on how to
write correct HTML, as well as practical tips. It is mainly based on version
1.22 of the HTML 2.0 standard (including some HTML3 extensions which are
already implemented by a number of browsers; the extensions are all clearly
labeled as such). Some material is from another quick reference by Tom Fine.
This is not necessarily a good guide for absolute beginners.
The HTML language represents hypertext data, for use as part of the World-wide Web. HTML is one specific language defined using the general SGML meta-language. HTTP is a transport protocol, used to deliver HTML documents (as well as other types of files) over networks.
<tagname attribute=value attribute="Value"> contained stuff </closingtag>
A "tag" is everything between the `<' and `>' characters. The tag name should come directly after the `<' character, with no intervening whitespace. Tag names and attribute names are case-insensitive, as are the values of certain attributes. If an attribute value contains whitespace, or any characters other than a-z, A-Z, 0-9, `.' or `-', it should be quoted. For this reason, most URL's should be quoted (and since some implementations may tamper with the alphabetic case of unquoted attribute values, it is good style to quote all URL's). Some attributes (such as COMPACT) do not need a value.
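The quoting rules above can be illustrated with a short fragment. This is only a sketch: the file names and the example.com URL are invented for the illustration.

```html
<!-- ALIGN=top contains only letters, so the quotes are optional -->
<IMG SRC="pictures/cat.gif" ALT="Socks the cat" ALIGN=top>

<!-- A URL contains `:' and `/' characters, so it is quoted -->
<A HREF="http://www.example.com/Pets/Socks.html">Socks</A>

<!-- COMPACT is an attribute that takes no value -->
<DL COMPACT>
<DT>Term<DD>Definition
</DL>
```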
An "element" is made up of the opening tag, its matching closing tag, and everything contained between the two (which can include other tags, and also text which is not part of any tag): <X>Stuff In Element</X>. The closing tags for some elements are optional (as noted below), and some tags cannot have a corresponding closing tag (namely, <BR>, <HR>, <IMG>, <INPUT>, the non-<TITLE> tags in <HEAD>...</HEAD>, and the SGML pseudo-tags <!DOCTYPE> and <!-- -->).
Details of text formatting in the HTML source (such as the position of linebreaks) are not preserved when the document is displayed, and extra whitespace is ignored.
All other tags besides these, and all text which is not part of a tag, should be contained within a <HEAD>...</HEAD> or <BODY>...</BODY> element, which should be in turn contained within the overall <HTML>...</HTML>.
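A minimal document skeleton following this nesting might look like the sketch below. The title, heading, and address text are placeholders; the DOCTYPE line uses the public identifier of the HTML 2.0 DTD.

```html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML>
<HEAD>
<TITLE>Example document</TITLE>
</HEAD>
<BODY>
<H1>Example heading</H1>
<P>First paragraph.
<HR>
<ADDRESS>Page maintainer (placeholder)</ADDRESS>
</BODY>
</HTML>
```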
These high-level elements all imply both a preceding and a following paragraph break (except after the optional </P> tag).
The list item closing tags </LI>, </DT>, and </DD> are optional.
Lists can be nested (i.e. included in an <LI>...</LI> item in a <UL>...</UL> or an <OL>...</OL> list, or inside a <DD>...</DD> item in a <DL>...</DL> list). List items are not supposed to directly contain <H1>-<H6> headings, <HR>, or <ADDRESS> (though <LI> and <DD> elements can contain a <BLOCKQUOTE> or <FORM> which itself includes them).
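As a sketch of the nesting described above, the following fragment puts an ordered list inside an <LI> item and an unordered list inside a <DD> item. The optional closing tags for list items are left out, and the list contents are made up.

```html
<UL>
<LI>Fruit
  <OL>
  <LI>Apples
  <LI>Pears
  </OL>
<LI>Vegetables
</UL>

<DL>
<DT>Roots
<DD>Grown underground:
  <UL>
  <LI>Carrots
  <LI>Beets
  </UL>
</DL>
```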
It is better if text contained within a link element is not something relatively meaningless like <A...>Click Here</A>, but rather something which describes what the link is pointing to: <A...>Chelsea's cat Socks</A>. (Remember that not everybody is using a mouse anyway, so the word "Select" is preferable to "Click".)
Anchors/links CANNOT BE NESTED, directly or indirectly, so that even code like <A...>...<X>...<A...>... </A>...</X>...</A> is forbidden. (In the upcoming HTML3 language, the attribute ID="...", which will be usable with most tags, will replace <A NAME="...">, so that almost any element will be able to be the target of a link.)
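A short sketch of a named anchor and two links that point to it, using descriptive link text. The anchor name "socks" and the example.com URL are invented for the illustration.

```html
<!-- Target: a named anchor inside a heading -->
<H2><A NAME="socks">Chelsea's cat Socks</A></H2>

<!-- Link to that anchor from elsewhere in the same document -->
<P>More about <A HREF="#socks">Chelsea's cat Socks</A> appears above.

<!-- Link to the same anchor from another document -->
<P>Select <A HREF="http://www.example.com/pets.html#socks">Chelsea's cat
Socks</A> for the full story.
```

Note that none of the anchors contains another anchor, and that each link's text says what the link points to.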
Be sure to specify meaningful text in the ALT attribute value (for use in non-graphic environments), especially if the image is in a link. If the image is purely decorative, use ALT="" to avoid annoying "[IMAGE]" clutter in Lynx.
Using too many and too large inline bitmaps can be very inconsiderate, especially on your home page and other pages that are linked to from outside (unless they are publicized as picture galleries). Many people are using 14.4k modems, and it is particularly frustrating to have to wait, with no advance warning, for a lot of big .GIF's to load -- before you're even able to decide whether or not there is actually anything of interest on the page. In any case, inline images will often be shown with few colors (only 50 in some versions of Mosaic), whereas external images will be shown with the maximum available number of colors -- so it is best to use a small sample (thumbnail) as a link to the full size image.
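The advice on ALT text and thumbnails could be applied roughly as follows. The file names and the stated file size are hypothetical.

```html
<!-- Small inline thumbnail used as a link to the full-size external image -->
<A HREF="pictures/socks-full.gif"><IMG SRC="pictures/socks-small.gif"
ALT="[Full-size picture of Socks, 150K GIF]"></A>

<!-- Purely decorative image: empty ALT text -->
<IMG SRC="pictures/bullet.gif" ALT="">
```

Mentioning the size of the full image in the ALT text gives readers some advance warning before they select the link.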
It is preferable to use logical styles rather than hard-wired fonts (bold, italic, etc. may not be available in non-graphical environments, anyway). Styles and fonts are NOT guaranteed to be rendered cumulatively (i.e. <B><I>Text</I></B> may look the same as plain <I>Text</I>, and the italic text in <H1>RomanText <I>ItalicText</I></H1> may not be the appropriate size for an H1 heading).
The logical style, font, and link/anchor elements generally can contain only each other (and <IMG> and <BR>), and not lists and high-level tags. The headings <H1>-<H6>, <DT>...</DT> in a <DL>...</DL> list, and <LI>...</LI> in MENU or DIR can also contain only these tags. It is best not to have whitespace after an opening tag of a style, font, or anchor element, or before a closing tag (i.e. <B>Text</B> is preferable to <B> Text </B>); such whitespace produces displeasing visual results on some browsers.
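For comparison, here is a sketch of logical styles next to hard-wired fonts, written with no whitespace just inside the opening and closing tags. The sample text is arbitrary.

```html
<P>Logical styles: <EM>emphasized</EM>, <STRONG>strongly emphasized</STRONG>,
<CODE>program code</CODE>, and <CITE>a citation</CITE>.

<P>Hard-wired fonts: <B>bold</B>, <I>italic</I>, and <TT>typewriter</TT>.

<H2>A heading may contain <EM>styles</EM> but not lists</H2>
```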
These three characters (`<', `>', and `&') should be escaped with the above ampersand entities (&lt;, &gt;, and &amp;) everywhere in a document where they are not intended to be used with their HTML meanings. Other entities (such as "&eacute;" etc.) are available to encode the alphabetic characters in positions 192-255 of the ISO 8859-1 Latin 1 character set for European languages. Numeric entities can be used for characters in the range 160-191 with some hope of success (such as &#169; for the copyright symbol, since not all browsers understand &copy;). Not all browsers understand or even treat &#160; as a space -- a safe alternative is &#32; (but this will not act as non-breaking on most browsers). The range 127-159 is undefined in ISO 8859-1, and should not be used. A double-quote character must be escaped as &quot; or &#34; inside an attribute value. Characters in URL's are best escaped with %-hex-digits (e.g. %26 for "&").
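The escaping rules can be seen together in a fragment like the one below. It is only an illustration: the text, the image file, and the search URL are invented.

```html
<!-- Markup characters escaped in ordinary text -->
<P>The tag &lt;B&gt; starts bold text; AT&amp;T needs an escaped ampersand.

<!-- A named entity for an accented letter, a numeric entity for copyright -->
<P>The caf&eacute; menu, &#169; 1995.

<!-- A double quote escaped inside an attribute value -->
<IMG SRC="sign.gif" ALT="The sign said &quot;Open&quot;">

<!-- An ampersand inside a URL escaped as %26 -->
<A HREF="search?name=Tom%26Jerry">Tom &amp; Jerry</A>
```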
Where protocol is one of http, gopher, ftp, file, telnet, wais, news, mailto, etc. The "#anchor" is optional, and ":port" can be left out, in which case the protocol's standard port is used (80 for http).
A fully absolute URL contains a protocol prefix, and a full hostname for external DNS resolution.
A URL can be relative in several ways:
(Uses the protocol, host, and port of the current document.)
URL's which are document-relative, but specify something outside the current directory (i.e. http URL's which contain a `/' character, but do not start with a `/' character, after the optional protocol prefix) can sometimes confuse browsers (especially relative URL's that start with "../" -- in general, ".." will be interpreted in terms of the logical Web file system, rather than the physical file system).
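The different kinds of URL might appear in links as sketched below. The host name and paths are made up, and the current document is assumed to be http://www.example.com/pets/cats.html.

```html
<!-- Fully absolute: protocol, full hostname, optional port and anchor -->
<A HREF="http://www.example.com:80/pets/dogs.html#top">Dogs</A>

<!-- Relative to the server: protocol, host, and port of the current document -->
<A HREF="/pets/dogs.html">Dogs</A>

<!-- Relative to the document: same directory as cats.html -->
<A HREF="dogs.html">Dogs</A>

<!-- Anchor within the current document -->
<A HREF="#top">Top of this page</A>

<!-- Outside the current directory: the form that can confuse some browsers -->
<A HREF="../index.html">Home</A>
```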
To implement forms (or <ISINDEX> or <IMG ISMAP>) you need special HTTP-server stuff outside your HTML file.
<FORM>...</FORM> is a high-level element, and so should not be contained inside a heading, <ADDRESS>...</ADDRESS>, <PRE>...</PRE>, <P>, <DT>...</DT>, style, font, or anchor element (or <LI>...</LI> in MENU or DIR). <INPUT>, <SELECT>, and <TEXTAREA> should only be contained within a form. Forms cannot be nested.
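A small form that respects these containment rules might be written as follows. This is a sketch only: the ACTION URL is hypothetical and stands for whatever program the HTTP server provides, and the field names are arbitrary.

```html
<FORM METHOD="POST" ACTION="/cgi-bin/example-script">
<P>Your name: <INPUT TYPE="text" NAME="name" SIZE=30>
<P>Favorite pet:
<SELECT NAME="pet">
<OPTION>Cat
<OPTION>Dog
</SELECT>
<P>Comments:<BR>
<TEXTAREA NAME="comments" ROWS=4 COLS=40></TEXTAREA>
<P><INPUT TYPE="submit" VALUE="Send"> <INPUT TYPE="reset" VALUE="Clear">
</FORM>
```

The form sits directly in the document body rather than inside a heading, <PRE>, or style element, and every <INPUT>, <SELECT>, and <TEXTAREA> is inside the one <FORM>.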
</quickreference>