Re: xhtml and character entities, test table

by J_A_B(at)t-online.de (Jens Brueckmann)

 Date:  Thu, 16 Oct 2003 11:26:21 +0200
 To:  Anita Roy Dobbs <a(at)studioae.com>
 Cc:  "hwg-techniques(at)hwg.org" <hwg-techniques(at)hwg.org>
 References:  studioae
Hi Anita,

Using Opera 7.2 or Mozilla 1.4 on Windows 2000, no character is missing 
from your table. IE6 SP1 cannot display characters 19, 24, 25, 26, 82 and 
83.
However, you should keep in mind that apart from possible differences with 
various browsers and operating systems there is another very important 
parameter not to be ignored: the font.
Many fonts are only able to display a very limited part of the Unicode 
characters.
Since you did not specify any font or font family in your markup, the 
viewer's browser uses its default font.
As I am using "Arial Unicode MS", which is capable of displaying most of 
the Unicode characters, I do not encounter any problem with your test page. 
On the other hand, a person whose default font covers only a small part of 
Unicode might see gaps or strange characters.

Concerning the use of number codes rather than name codes, it should not 
make any difference.
A look at http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd 
shows, right at the beginning, where the character entities are declared. 
The entities themselves are defined in the corresponding ent-files, e.g. 
xhtml-special.ent for special characters, which you can view at 
http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_Special_characters or download 
directly from W3C at
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent .
Each name code in these files is simply declared as the matching number 
code, which is why the two are interchangeable.
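
If you want to convince yourself that a name code and its number code 
really denote the same character, a small script will do. The following is 
only a sketch in Python, not something your page needs. It assumes that the 
entity table in Python's standard library (which follows HTML 4) matches 
the XHTML ent-files for the names used; the helper numeric_reference and 
the selection of names are my own.

```python
import html
from html.entities import name2codepoint

# A few names that also appear in xhtml-special.ent; the selection is arbitrary.
names = ["lrm", "rlm", "euro", "ndash"]

def numeric_reference(name):
    """Return the decimal number code equivalent to the name code &name;."""
    return "&#%d;" % name2codepoint[name]

for name in names:
    named = "&%s;" % name
    numeric = numeric_reference(name)
    # Both notations must decode to the same single character.
    assert html.unescape(named) == html.unescape(numeric)
    print(named, "=", numeric)
```

The loop raises an AssertionError as soon as a pair does not match, so a 
silent run means the notations agree for the names listed.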

Characters number 6 and 7 of your table, the right-to-left mark and the 
left-to-right mark, are invisible control characters for bidirectional 
text. You can use these with texts written from right to left, like Arabic 
and Hebrew. Another way of influencing directionality would be to use 
<bdo dir="rtl">text</bdo> or <bdo dir="ltr">text</bdo> respectively, which 
explicitly overrides the direction of the enclosed text.
More information on the issue of bidirectionality can be found at:

- http://www.w3.org/TR/html4/struct/dirlang.html#edef-BDO
- http://www.everything2.com/index.pl?node=Bidirectional%20Category
- http://www.ietf.org/rfc/rfc2070.txt
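
To see why these two characters never show up as glyphs, you can look up 
their properties in the Unicode character database. This is just an 
illustrative sketch in Python; the helper name describe is my own choice.

```python
import unicodedata

LRM = "\u200e"  # &lrm; or &#8206;
RLM = "\u200f"  # &rlm; or &#8207;

def describe(ch):
    """Return (name, general category, bidi class) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )

for ch in (LRM, RLM):
    print("U+%04X" % ord(ch), *describe(ch))
```

Both are reported as category "Cf" (format characters), with bidi class 
"L" and "R" respectively, so they only influence the ordering of their 
neighbours.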

There are several methods of declaring the character set of a document. 
Using a meta tag is absolutely ok. Another method would be including an 
XML declaration right at the beginning of your document, like <?xml 
version="1.0" encoding="iso-8859-1"?>. The problem with a declaration 
like this one is that some browsers, notably IE6, switch into quirks mode 
when anything precedes the doctype, which might not be what you really 
want. A third possibility would be declaring the character set in the 
http-header which is sent from the web server to the client; this can be 
done if you have access to the configuration files of your web server.
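
If you go for the http-header, it is worth checking what your server 
actually sends, because according to the HTML 4 specification a charset 
given there takes precedence over the meta tag. As a rough sketch in 
Python, this is how the charset can be read from a Content-Type header 
value; the function name charset_from_header and the sample values are 
made up for illustration.

```python
from email.message import Message

def charset_from_header(content_type):
    """Extract the charset parameter from a Content-Type header value.

    Returns the charset in lower case, or None if the header has none.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

# Sample header values as a server might send them (invented examples).
for value in ("text/html; charset=ISO-8859-1", "text/html"):
    print(repr(value), "->", charset_from_header(value))
```

If the function returns None, the browser has to rely on the meta tag or 
on its own guess.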

To sum it up:

- using name codes instead of number codes is ok
- using a meta tag for defining the character set is ok (consider using 
utf-8 instead of iso-8859-1)
- specify fonts and font families capable of displaying a wide range of 
characters for your document (hoping that your visitors have these on 
their machines)
- validate your document (there are a few errors in your test page, see 
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.studioae.com%2Fcsstest%2Fcharacters.html 
for details)
- if nothing helps, use images (problems usually arise with IE6, which most 
people use)
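
On the utf-8 suggestion: iso-8859-1 can store only 256 different 
characters directly, so everything else has to be written as a number or 
name code, whereas utf-8 can store any Unicode character as it is. The 
following Python sketch shows the difference; the function name encode_for 
and the sample text are only assumptions for the sake of the example.

```python
# Sample text: "10 EURO-SIGN, e-acute" written with escapes to stay ASCII-safe.
text = "10 \u20ac, \u00e9"

def encode_for(text, charset):
    """Encode text, writing characters the charset lacks as number codes."""
    return text.encode(charset, "xmlcharrefreplace")

print(encode_for(text, "iso-8859-1"))  # euro sign becomes a number code
print(encode_for(text, "utf-8"))       # euro sign is stored directly
```

Either way the browser ends up with the same characters; whether it can 
draw them is then again a question of the font.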

A note on fonts:
If a browser cannot display a certain character in a specified font, it 
_should_ fall back to another font, following the font-matching algorithm 
described in http://www.w3.org/TR/CSS2/fonts.html#algorithm .

I hope you could make sense of my garbled mutterings :)

jens

HWG hwg-techniques mailing list archives, maintained by Webmasters @ IWA