[mythtvnz] xmltv-proc-nz problem with MHEG epg data

David Moore dmoo1790 at ihug.co.nz
Fri Jun 29 01:17:50 BST 2012


On 29/06/12 04:46, Stephen Worthington wrote:
> I run xmltv-proc-nz 0.5.8 on my ChoiceTV data that I get from MHEG. At
> the moment, it has been having a problem with some characters in that
> data:
>
> Running xmltv-proc-nz on the Freeview data
> Movies: TMDB module not found.
> Traceback (most recent call last):
>    File "/usr/local/bin/xmltv-proc-nz", line 563, in <module>
>      tree = ElementTree.XML(data)
>    File "<string>", line 106, in XML
> cElementTree.ParseError: not well-formed (invalid token): line 2174, column 50
>
> The data causing the problem seems to be an e-acute character, 0xE9,
> in the word soufflés:
>
>    <programme start="20120625140000 +1200" stop="20120625150000 +1200"
> channel="tv1.freeviewnz.tv">
>      <title lang="eng">Celebrity Masterchef</title>
>      <sub-title>New Season</sub-title>
>      <desc>Temperatures rise higher than the soufflés as the determined
> celebrities strive to demonstrate their ability to cook great food to
> the judges.</desc>
>      <category>series</category>
>      <category>Education/Science/Factual</category>
>      <rating system="SKY-NZ">
>        <value>M</value>
>      </rating>
>    </programme>
>
> In order to fix this, I have had to get my epg script to add the
> following line to the top of the output file from mhegepgsnoop.py:
>
> <?xml version="1.0" encoding="cp1252"?>
>
> That declares the character encoding as cp1252, which permits
> characters such as 0xE9. But I think it might be better to have
> mhegepgsnoop.py convert what it gets from MHEG and produce UTF-8
> encoded data, as that is the default encoding for XML files.
>
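The failure and both fixes Stephen describes can be reproduced with Python's standard xml.etree.ElementTree. This is a minimal sketch, assuming the MHEG text arrives as cp1252 bytes; the sample document and the helper names (parse_as_is, parse_with_cp1252_declaration, transcode_to_utf8) are hypothetical and not taken from either script.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <desc> text contains the raw byte 0xE9
# (e-acute in cp1252), as in the MHEG-derived data quoted above.
RAW = (b'<tv><programme channel="tv1.freeviewnz.tv">'
       b"<desc>Temperatures rise higher than the souffl\xe9s</desc>"
       b"</programme></tv>")


def parse_as_is(data):
    """Parse bytes with no XML declaration, so the parser assumes UTF-8."""
    return ET.fromstring(data)


def parse_with_cp1252_declaration(data):
    """The workaround from the post: prepend an encoding declaration."""
    return ET.fromstring(b'<?xml version="1.0" encoding="cp1252"?>\n' + data)


def transcode_to_utf8(data, source_encoding="cp1252"):
    """The suggested alternative: decode from the source encoding (assumed
    cp1252 here) and re-encode as UTF-8, XML's default encoding."""
    return data.decode(source_encoding).encode("utf-8")


try:
    parse_as_is(RAW)
except ET.ParseError as err:
    # 0xE9 followed by "s" is not a valid UTF-8 sequence.
    print("as-is:", err)

print(parse_with_cp1252_declaration(RAW).find("programme/desc").text)
print(parse_as_is(transcode_to_utf8(RAW)).find("programme/desc").text)
```

Transcoding once, where the text leaves the MHEG decoder, means every downstream consumer sees plain UTF-8 and needs no declaration; the declaration workaround only helps parsers that honour it. If the broadcast data is actually ISO-8859-1, decoding it as cp1252 gives the same result for accented letters, since the two encodings agree from 0xA0 to 0xFF.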

Odd. The mhegepgsnoop.py code that writes the XML file specifies UTF-8
encoding:

ET.ElementTree(root_element).write(outfile, encoding="utf-8")
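For reference, here is what that call produces when the element text is a proper Unicode string. This is a minimal sketch under Python 3 (the cElementTree name in the traceback suggests the 2012 scripts ran on Python 2, whose str/unicode split behaves differently), and write_epg is a hypothetical wrapper that writes to an in-memory buffer instead of outfile. It also shows that ElementTree omits the XML declaration for UTF-8 unless xml_declaration=True is passed.

```python
import io
import xml.etree.ElementTree as ET


def write_epg(root_element, xml_declaration=None):
    """Serialise a tree with the same write() call as quoted above,
    returning the bytes instead of writing them to a file."""
    buf = io.BytesIO()
    ET.ElementTree(root_element).write(
        buf, encoding="utf-8", xml_declaration=xml_declaration)
    return buf.getvalue()


root = ET.Element("tv")
desc = ET.SubElement(ET.SubElement(root, "programme"), "desc")
desc.text = "souffl\u00e9s"  # a Unicode string, not raw cp1252 bytes

plain = write_epg(root)                           # no declaration for UTF-8
declared = write_epg(root, xml_declaration=True)  # explicit declaration

print(plain)
print(declared)
```

Because UTF-8 is XML's default, a file written this way parses correctly with or without the declaration; adding it only makes the encoding explicit.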

Also, "file -i xmltvaaa.xml" reports "xmltvaaa.xml: text/plain;
charset=utf-8". So maybe xmltv-proc-nz doesn't like non-ASCII UTF-8
characters? Or maybe I need to set the encoding attribute in the XML
declaration? I had various character-encoding issues when writing
mhegepgsnoop. I think one was that Myth didn't accept the XML file until
I specified UTF-8 encoding.
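One way to settle whether a given file really is UTF-8 is to find the first byte that fails to decode. `file -i` only reports a heuristic guess, and here it was run on one locally generated xmltvaaa.xml rather than on the data that failed, so a direct check is more telling. This is a minimal sketch; first_invalid_utf8 is a hypothetical helper. Lines are 1-based and the column is a 0-based byte offset within the line, which is meant to be comparable to the "line 2174, column 50" in the traceback when the earlier characters on that line are plain ASCII (an assumption about how expat counts columns).

```python
def first_invalid_utf8(data):
    """Return (line, column, byte value) for the first byte that is not
    part of a valid UTF-8 sequence, or None if all of data decodes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        offset = err.start                       # index of the bad byte
        line = data.count(b"\n", 0, offset) + 1  # 1-based line number
        line_start = data.rfind(b"\n", 0, offset) + 1
        return line, offset - line_start, data[offset]
    return None


if __name__ == "__main__":
    sample = b"<tv>\n  <desc>souffl\xe9s</desc>\n</tv>"
    print(first_invalid_utf8(sample))  # (2, 14, 233)
```

For example, calling it on open("xmltvaaa.xml", "rb").read() should return None if the `file -i` report is right, or the position of a stray byte such as 0xE9 (233) otherwise.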


