[Templates] UTF8 support and issues

Mark Proctor (mproctor) mproctor@cisco.com
Wed, 20 Nov 2002 19:05:17 -0000



Success - I found this

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=20020429145407.00874.00005678%40mb-me.aol.com&rnum=1&prev=/groups%3Fq%3Dperl%2Bpack%2Bcgi%2Butf%2BOR%2Butf8%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26as_qdr%3Dall%26selm%3D20020429145407.00874.00005678%2540mb-me.aol.com%26rnum%3D1

This line can take a UTF8 input and tag it as UTF8

$text = pack('U*', unpack('U*', $q->param('text')));

Which is actually what Peter Guzis said - sorry for not understanding
this the first time, Peter.
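
A minimal sketch of what "tagging" a string as UTF8 means, using the Encode module rather than the pack/unpack line. Encode ships with Perl 5.8 and later, so this assumes a newer Perl than the one discussed in this thread; the sample bytes and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample input: the two UTF-8 bytes that encode U+00F3,
# as they might arrive from $q->param('text').
my $bytes   = "\xC3\xB3";
my $n_bytes = length $bytes;    # counted as raw bytes: 2

# Decode the bytes into a character string; Perl marks the result
# internally as UTF-8.
my $text = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d, code point: U+%04X\n",
    $n_bytes, length($text), ord($text);
```

Before decoding, Perl sees two separate bytes; afterwards it sees a single character, which is the difference that matters when the value is printed back into a form.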

Is this the only way to tag a string that has come in from CGI as UTF8?
I will also pose this question on perl-unicode.

Not sure how this fits in with Template Toolkit - other than insisting
that if people are working with UTF8 they need to do this with their
variables. I don't think you want to do this by default, as I expect
there is a small performance penalty.

Thanks

Mark

-----Original Message-----
From: Mark Proctor [mailto:m.proctor@bigfoot.com]
Sent: 20 November 2002 18:47
To: 'Barry Caplan'; 'Andreas J. Koenig'
Cc: perl-unicode@perl.org
Subject: RE: CGI and UTF

Unfortunately I have asked the Cisco admins if we can have perl5.8 and
they said no way.

I have tried doing stuff like this:

$text = $q->param('text');
if ($q->param('text')) {
    print $text . $xml->{message};
} else {
    print "\x{00F3}" . $xml->{message};
}

And it works and displays fine. But when I display this in the textarea
so that I can resubmit it, it still comes back mangled :(

Mark

-----Original Message-----
From: Barry Caplan [mailto:bcaplan@i18n.com]
Sent: 20 November 2002 18:42
To: Mark Proctor; 'Andreas J. Koenig'
Cc: perl-unicode@perl.org
Subject: RE: CGI and UTF

Mark,

I think 5.8 has an Encode module with a normalize function. CPAN probably
has something similar. The perl docs for those modules are probably a
good place to start to understand Unicode normalization. unicode.org is
the definitive source but could be pretty pedantic if this is your first
exposure.
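
As an illustration of what Unicode normalization refers to, here is a small sketch using the Unicode::Normalize module, which is bundled with Perl 5.8 and later (so it assumes a newer Perl than the one available in this thread). The sample strings and variable names are invented for the example.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# The same visible letter, written two ways.
my $composed   = "\x{00F3}";     # one code point: o with acute
my $decomposed = "o\x{0301}";    # two code points: o + combining acute accent

# As stored, the strings differ; after normalization to NFC they compare equal.
my $raw_equal = ($composed eq $decomposed) ? 1 : 0;
my $nfc_equal = (NFC($composed) eq NFC($decomposed)) ? 1 : 0;

print "equal as stored: $raw_equal, equal after NFC: $nfc_equal\n";
```

Normalization only chooses one canonical spelling for text that is already a character string; it is a separate step from decoding the raw bytes that CGI hands over.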

Barry Caplan
www.i18n.com

At 05:38 PM 11/20/2002 +0000, Mark Proctor wrote:
>I have checked with the sysadmins at cisco and they said "no way" :(
>So I have to get this working. Someone has said that I need to
>"normalise" the params from cgi - but I have no idea what that means.
>
>Mark

