[Templates] UTF8 support and issues

Barry Caplan bcaplan@i18n.com
Tue, 19 Nov 2002 16:00:54 -0800


>I don't think 'L=E9on' is an example of UTF8 encoding. but rather of
>ISO-8859-1 encoding.
>
>In UTF8 an é would be represented with a double-byte sequence: =C3=A9
>or in case of email mangling \xc3 \xa9.

Unicode and ISO-8859-1 are the same for the first 256 code points (in
UTF-8, only the first 128 are encoded as the same single bytes; the rest
take two bytes each). Having it otherwise would slow or halt adoption of
Unicode because of the amount of legacy data in 8859-1 and programs that
are built to use it.
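
The difference between code points and encoded bytes can be checked
directly. What follows is only a minimal sketch, assuming Python 3; the
helper name same_bytes is made up for illustration. It compares the bytes
each encoding produces for the first 256 code points.

```python
# Compare how ISO-8859-1 and UTF-8 serialize the first 256 code points.

def same_bytes(code_point: int) -> bool:
    """True if the code point encodes to identical bytes in both encodings."""
    ch = chr(code_point)
    return ch.encode("iso-8859-1") == ch.encode("utf-8")

# ISO-8859-1 maps byte value n straight to code point n for all 256 bytes.
decoded = bytes(range(256)).decode("iso-8859-1")
code_points_match = all(ord(ch) == i for i, ch in enumerate(decoded))

# Code points whose encoded bytes are identical in both encodings.
matching = [cp for cp in range(256) if same_bytes(cp)]

e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
latin1_bytes = e_acute.encode("iso-8859-1")
utf8_bytes = e_acute.encode("utf-8")

print(code_points_match)
print(matching == list(range(128)))
print(latin1_bytes.hex(), utf8_bytes.hex())
```

The code point numbers agree across the whole 0-255 range, while the
encoded bytes agree only for 0-127; from 128 up, UTF-8 needs two bytes.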

When using Unicode, European characters with accents can be represented in
more than one way: precomposed and decomposed. ISO-8859-1 contains the
precomposed versions. See www.unicode.org for more details.
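
As a rough illustration of the two forms, here is a minimal sketch
assuming Python 3 and its standard unicodedata module. The variable names
are arbitrary; NFC and NFD are the Unicode normalization forms that
compose and decompose, respectively.

```python
import unicodedata

precomposed = "\u00e9"  # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # two code points: "e" + COMBINING ACUTE ACCENT

# The two strings render alike but are not equal code point for code point.
equal_as_is = precomposed == decomposed

# NFD splits the precomposed form; NFC recombines the decomposed form.
nfd = unicodedata.normalize("NFD", precomposed)
nfc = unicodedata.normalize("NFC", decomposed)

print(equal_as_is)
print([f"U+{ord(ch):04X}" for ch in nfd])
print([f"U+{ord(ch):04X}" for ch in nfc])
```

Normalizing both sides to the same form before comparing is one way to
treat the two spellings as the same text.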

In my Unicode 3.0 book, \xc3 is LATIN CAPITAL LETTER A WITH TILDE and \xa9
is COPYRIGHT SIGN. LATIN SMALL LETTER E WITH ACUTE is \xe9.
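
The chart lookups above can be reproduced without the book. This is a
minimal sketch, assuming Python 3 and its standard unicodedata module; it
also shows how the same two bytes read under each encoding.

```python
import unicodedata

# Character names for the three code points mentioned above.
names = {cp: unicodedata.name(chr(cp)) for cp in (0xC3, 0xA9, 0xE9)}
for cp, name in names.items():
    print(f"U+{cp:04X} {name}")

# The same two bytes, interpreted under each encoding.
raw = b"\xc3\xa9"
as_latin1 = raw.decode("iso-8859-1")  # two characters, one per byte
as_utf8 = raw.decode("utf-8")         # one character from a two-byte sequence

print(len(as_latin1), len(as_utf8))
```

Read as ISO-8859-1, the bytes C3 A9 are two separate characters; read as
UTF-8, they are the single character U+00E9.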

Barry Caplan
www.i18n.com