[Templates] UTF8 support and issues
Barry Caplan
bcaplan@i18n.com
Tue, 19 Nov 2002 16:00:54 -0800
>I don't think 'L=E9on' is an example of UTF8 encoding, but rather of
>ISO-8859-1 encoding.
>
>In UTF8 an é would be represented with a double-byte sequence: =C3=A9
>or in case of email mangling \xc3 \xa9.
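Unicode and ISO-8859-1 assign the same characters to the first 256 code
points. Having it otherwise would slow or halt adoption of Unicode because
of the amount of legacy data in 8859-1 and programs that are built to use it.
The UTF-8 byte encoding, however, matches 8859-1 only for the first 128 code
points (ASCII); code points 128 through 255 take two bytes in UTF-8, so é
(code point \xe9) is indeed sent as the bytes \xc3 \xa9.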
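For anyone who wants to check the byte-level behaviour directly, here is a minimal sketch in Python (not something from the original thread; the variable names are only illustrative). It encodes the same string both ways and compares the results:

```python
# Compare how ISO-8859-1 and UTF-8 serialize the same characters.
text = "L\u00e9on"  # "Léon", with é written as code point U+00E9

latin1_bytes = text.encode("iso-8859-1")  # one byte per character
utf8_bytes = text.encode("utf-8")         # é takes two bytes here

print(latin1_bytes)  # b'L\xe9on'
print(utf8_bytes)    # b'L\xc3\xa9on'

# Pure ASCII text is byte-for-byte identical in both encodings.
ascii_text = "Leon"
same_for_ascii = ascii_text.encode("iso-8859-1") == ascii_text.encode("utf-8")
print(same_for_ascii)  # True
```

The code point is the same in both cases (\xe9); only the bytes on the wire differ.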
When using Unicode, European characters with accents can be represented in
more than one way: precomposed and decomposed. ISO-8859-1 contains only the
precomposed versions. See www.unicode.org for more details.
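The precomposed/decomposed distinction can be seen with Python's standard unicodedata module. This is only a sketch under my own naming (fits_latin1 is a hypothetical helper, not a library function):

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT, two code points

# NFD splits the precomposed form apart; NFC puts it back together.
nfd = unicodedata.normalize("NFD", precomposed)
nfc = unicodedata.normalize("NFC", decomposed)

print(nfd == decomposed)          # True
print(nfc == precomposed)         # True
print(precomposed == decomposed)  # False: same glyph, different code points


def fits_latin1(s):
    """Hypothetical helper: can s be encoded as ISO-8859-1?"""
    try:
        s.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin1(precomposed))  # True
print(fits_latin1(decomposed))   # False: U+0301 is not in 8859-1
```

Normalizing to NFC before converting to 8859-1 is one way to avoid losing accented characters that arrive in decomposed form.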
In my Unicode 3.0 book, code point \xc3 is LATIN CAPITAL LETTER A WITH TILDE
and code point \xa9 is COPYRIGHT SIGN. LATIN SMALL LETTER E WITH ACUTE is
code point \xe9.
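The same names can be looked up without the book. A small sketch, assuming Python's unicodedata module (the variable names are mine), which also shows what happens when the UTF-8 bytes for é are read back as 8859-1:

```python
import unicodedata

# Look up the character names for the three code points discussed.
names = {cp: unicodedata.name(chr(cp)) for cp in (0xC3, 0xA9, 0xE9)}
for cp, name in names.items():
    print(f"U+{cp:04X} {name}")

# Encode é as UTF-8, then misread the two bytes as ISO-8859-1.
misread = "\u00e9".encode("utf-8").decode("iso-8859-1")
misread_names = [unicodedata.name(ch) for ch in misread]
print(misread_names)
```

The misread string comes out as two characters, A WITH TILDE followed by COPYRIGHT SIGN, which is the usual symptom of UTF-8 data being treated as 8859-1.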
Barry Caplan
www.i18n.com