[Templates] UTF8 support and issues

Mark Proctor m.proctor@bigfoot.com
Tue, 19 Nov 2002 14:16:35 -0000


I've narrowed it down to what causes it, although I don't know why:

The following gives the correct output:
    eval { BLOCK: {
    print "itemText3:", $stash->get(['validFields', 0, 'itemText', 0]), "<br>\n";
    $output .=  "<HTML>\n<HEAD>\n<TITLE>ISTORE</TITLE>\n<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; CHARSET=UTF-8\">\n";
    $output .=  "<HEAD><BODY>\n";
    $output .=  "itemText2:";
    $output .=  $stash->get(['validFields', 0, 'itemText', 0]);
    $output .=  "<br>\n</BODY>\n</HTML>";
    } };

While the following mangles itemText:
    eval { BLOCK: {
    print "itemText3:", $stash->get(['validFields', 0, 'itemText', 0]), "<br>\n";
    $output .=  "<HTML>\n<HEAD>\n<TITLE>ISTORE</TITLE>\n<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; CHARSET=UTF-8\">\n";
    $output .=  "<HEAD><BODY>\n";
    $output .=  $stash->get(['session', 0, 'base_url', 0]);
    $output .=  "itemText2:";
    $output .=  $stash->get(['validFields', 0, 'itemText', 0]);
    $output .=  "<br>\n</BODY>\n</HTML>";
    } };

The only difference is the session.base_url variable, which is retrieved
from a different location: itemText is supplied from the CGI form, while
base_url is read in from an XML configuration file using XML::Simple.
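
One possible explanation, which nothing in this thread confirms: XML
parsers commonly hand back strings with Perl's internal UTF8 flag set,
while CGI form values usually arrive as raw bytes. Appending raw UTF-8
bytes to a flagged string makes Perl treat each byte as a Latin-1
character, so the text gets encoded twice on output. The sketch below is
a minimal illustration under those assumptions, for Perl 5.8 or later
with the core Encode module. The values, the /shop/ path and the hex_of
helper are hypothetical, and utf8::upgrade only stands in for whatever
the XML parser does.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-ins for the two values discussed above.
my $item_text = "L\xC3\xA9on";   # raw UTF-8 bytes for "Léon", as a form field might arrive
my $base_url  = "/shop/";        # assumed value; pretend it came from an XML parser
utf8::upgrade($base_url);        # mark it as a character string (UTF8 flag on)

# Show a byte string as space-separated hex.
sub hex_of { return join ' ', map { sprintf '%02X', ord } split //, shift }

my $plain = "itemText2:" . $item_text;              # bytes only
my $mixed = $base_url . "itemText2:" . $item_text;  # flagged string + bytes

# Perl treats each byte of $item_text as a Latin-1 character here, so
# writing $mixed out as UTF-8 encodes the two bytes of "é" a second time.
my $mixed_out = encode('UTF-8', $mixed);

# Decoding the form value first keeps "é" as a single character.
my $fixed_out = encode('UTF-8',
    $base_url . "itemText2:" . decode('UTF-8', $item_text));

print "plain: ", hex_of(substr($plain, -5)), "\n";      # 4C C3 A9 6F 6E
print "mixed: ", hex_of(substr($mixed_out, -7)), "\n";  # 4C C3 83 C2 A9 6F 6E
print "fixed: ", hex_of(substr($fixed_out, -5)), "\n";  # 4C C3 A9 6F 6E
```

If this is what is happening, decoding the form input before it reaches
the stash (or encoding everything to bytes consistently) would be one
way to keep the two sources from clashing.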

Does any of this help? I'm on ICQ 8106598 if anyone wants to discuss
this quickly with me.

Regards

Mark

-----Original Message-----
From: Andy Wardley [mailto:abw@ourshack.com] On Behalf Of Andy Wardley
Sent: 19 November 2002 13:05
To: Mark Proctor
Cc: templates@template-toolkit.org; Leslie Fuller (lefuller)
Subject: Re: [Templates] UTF8 support and issues


Mark Proctor wrote:
> Is there something I need to do to tell template toolkit to use utf8?
> Will upgrading to the latest version fix this? We have:

The simple answer is yes, upgrading to Perl 5.8 does seem to solve the
problem.  You should probably upgrade TT to 2.08 as well, but I don't
think that's part of the problem or solution.

There's no code in TT that I'm aware of that does something "wrong" to
cause UTF8 support to break.  To the best of my knowledge it's something
in Perl pre 5.8 which doesn't properly handle UTF8 that causes the
problem.  Unfortunately, I don't know what particular Perl feature it is
that we're using that breaks UTF8 support.

I haven't been able to reliably reproduce the problem.  For example,
this test works fine for me under 5.6.1 with TT 2.08c.

  use strict;
  use Template;

  my $leon = 'Léon Brocard';
  print "$leon is my friend\n";

  my $tt2 = Template->new();
  $tt2->process(\*DATA, { leon => $leon }) || die $tt2->error();

  __DATA__
  [% leon %] is my friend
  [% INCLUDE orange person=leon -%]
  [% BLOCK orange -%]
  [% person %]'s favourite colour is orange
  [% END %]

Output:

  Léon Brocard is my friend
  Léon Brocard is my friend
  Léon Brocard's favourite colour is orange

(assuming those accented characters manage to survive the transition
through email... but trust me, it looks fine when I run it here)

> I've been going through the source code trying to understand the flow
> of information, placing print statements everywhere. I have identified
> in the ttc file where you can see the same command, one with correct
> output, one mangled (as part of the output string)
[...]
>     eval { BLOCK: {
>     print "text3", $stash->get(['validFields', 0, 'itemText', 0]),

Can you post the source template, or as short a fragment of it as
replicates the problem?  Also, the complete ttc file would be useful.

Cheers
A