[Templates] UTF8 support and issues

Mark Proctor m.proctor@bigfoot.com
Wed, 20 Nov 2002 14:54:31 -0000


I have forwarded my example to the perl-unicode list. I will post any
useful answers back to this list, that's if I get any responses other
than RTFM.

Mark

-----Original Message-----
From: templates-admin@template-toolkit.org
[mailto:templates-admin@template-toolkit.org] On Behalf Of Mark Proctor
Sent: 20 November 2002 12:31
To: templates@template-toolkit.org
Subject: Re: [Templates] UTF8 support and issues


I have managed to knock up a self-contained example, which I have
attached. An example input string is Descripción, although you will
need to have XML::Simple installed.

The example takes an input string and prints it twice: once
concatenated with another string and once on its own. The mangling
occurs when you concatenate an XML string with a CGI string.

I'm not sure why this happens, but here is a first attempt at a
possible theory. All XML parsing is done in UTF8, but Perl has no idea
of the encoding of incoming CGI streams and assumes them to be
ISO-8859-1 (latin1); I read this somewhere but don't know if it's
correct. When a UTF8 string and a non-UTF8 string are combined, Perl
upgrades the non-UTF8 one to UTF8, so it converts the CGI string from
ISO-8859-1 to UTF8, which mangles it because it is already UTF8.
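The theory above can be checked with a small simulation. This is a sketch in Python rather than Perl, under stated assumptions: the CGI value arrives as raw UTF-8 bytes, the XML side is already a character string, Perl's implicit upgrade is modelled as decoding the bytes as latin1 (every byte becomes the character with the same code number), and the combined string is finally written out as UTF-8. The names `perl_upgrade` and `concat_like_perl` are hypothetical.

```python
def perl_upgrade(raw: bytes) -> str:
    """Model of the upgrade described above (an assumption, not Perl's code):
    each byte becomes the character with the same code point, i.e. latin1."""
    return raw.decode("latin-1")


def concat_like_perl(xml_text: str, cgi_raw: bytes) -> str:
    """Join a character string from the XML parser with raw CGI bytes,
    upgrading the bytes first, as the theory suggests."""
    return xml_text + perl_upgrade(cgi_raw)


# 'ó' (U+00F3) is sent by a UTF-8 form as the two bytes C3 B3.
cgi_raw = "Descripción".encode("utf-8")
xml_text = "Label: "  # hypothetical stand-in for a value from XML::Simple

# Shown on its own, the bytes pass through untouched and decode correctly.
alone = cgi_raw.decode("utf-8")                 # 'Descripción'

# Concatenated, the two bytes become two separate characters...
joined = concat_like_perl(xml_text, cgi_raw)    # 'Label: DescripciÃ³n'

# ...and writing that out as UTF-8 encodes the 'ó' twice: C3 83 C2 B3.
output = joined.encode("utf-8")                 # b'Label: Descripci\xc3\x83\xc2\xb3n'
```

Because the damage is a plain latin1-then-UTF-8 round trip, `output` displays as `Ã³` in a browser, which is a quick way to recognise this kind of double encoding.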

Ivan, thank you for your example. I think it shows the same issue as
mine. I'm not sure how your fix would help with mine, as the
concatenation happens at the compiled-template stage, and we would have
to change Template Toolkit to use sprintf, which I expect is not
desirable. Is there some way to tag an incoming value as UTF8 so that
it doesn't get mangled when it is upgraded during the concatenation?
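One way to avoid the mangling, rather than tagging the value after the fact, is to decode each CGI value from UTF-8 bytes into characters as soon as it is read, so both sides of the concatenation are already character strings and no latin1 upgrade takes place. The sketch below shows this in Python; it assumes the form is submitted as UTF-8, and `decode_cgi_param` is a hypothetical helper. In Perl 5.8 the comparable step would be the Encode module's `decode_utf8`, applied before the value is handed to the template.

```python
def decode_cgi_param(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: turn a raw CGI parameter into characters.

    Assumes the browser submitted the form in `encoding`. Strict error
    handling raises UnicodeDecodeError on invalid input instead of
    silently producing mojibake.
    """
    return raw.decode(encoding, errors="strict")


cgi_raw = "Descripción".encode("utf-8")   # bytes as received from the form
xml_text = "Label: "                      # hypothetical value from XML::Simple

# Decode at the input boundary, then concatenate two character strings.
joined = xml_text + decode_cgi_param(cgi_raw)   # 'Label: Descripción'

# Encode exactly once on output: 'ó' stays the two bytes C3 B3.
output = joined.encode("utf-8")           # b'Label: Descripci\xc3\xb3n'
```

Decoding on the way in and encoding once on the way out keeps every string inside the program in a single representation, so there is no need to guess which values are already UTF8.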

Barry, I'm still trying to digest what you said; I'm about to start
reading through the Unicode site you linked to. How does this issue
relate to the two examples from Ivan and me?

Thanks

Mark

