[mythtvnz] xmltv-proc-nz problem with MHEG epg data
David Moore
dmoo1790 at ihug.co.nz
Fri Jun 29 01:17:50 BST 2012
On 29/06/12 04:46, Stephen Worthington wrote:
> I run xmltv-proc-nz 0.5.8 on my ChoiceTV data that I get from MHEG. At
> the moment, it has been having a problem with some characters in that
> data:
>
> Running xmltv-proc-nz on the Freeview data
> Movies: TMDB module not found.
> Traceback (most recent call last):
> File "/usr/local/bin/xmltv-proc-nz", line 563, in <module>
> tree = ElementTree.XML(data)
> File "<string>", line 106, in XML
> cElementTree.ParseError: not well-formed (invalid token): line 2174,
> column 50
>
> The data causing the problem seems to be an e-acute character, 0xE9,
> in the word soufflés:
>
> <programme start="20120625140000 +1200" stop="20120625150000 +1200"
> channel="tv1.freeviewnz.tv">
> <title lang="eng">Celebrity Masterchef</title>
> <sub-title>New Season</sub-title>
> <desc>Temperatures rise higher than the soufflés as the determined
> celebrities strive to demonstrate their ability to cook great food to
> the judges.</desc>
> <category>series</category>
> <category>Education/Science/Factual</category>
> <rating system="SKY-NZ">
> <value>M</value>
> </rating>
> </programme>
>
> In order to fix this, I have had to get my epg script to add the
> following line to the top of the output file from mhegepgsnoop.py:
>
> <?xml version="1.0" encoding="cp1252"?>
>
> That specifies the character encoding to be cp1252 which permits
> characters such as 0xE9. But I think it might be better to have
> mhegepgsnoop.py convert what it gets from MHEG and produce UTF-8
> encoded data, as that is the default encoding for XML files.
>
Odd. The mhegepgsnoop.py code that writes the XML file specifies UTF-8
encoding:
ET.ElementTree(root_element).write(outfile, encoding="utf-8")
Also, "file -i xmltvaaa.xml" reports "xmltvaaa.xml: text/plain;
charset=utf-8". So maybe xmltv-proc-nz doesn't like non-ASCII UTF-8
characters? Or maybe I need to set the encoding attribute in the XML
declaration? I had various issues with character encoding when writing
mhegepgsnoop. I think one was that MythTV didn't like the XML file before
I specified UTF-8 encoding.
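
For anyone wanting to test the two guesses above, here is a rough sketch of how the encoding cases can be reproduced in isolation. It assumes Python 3 and the standard xml.etree.ElementTree module (the original scripts use the older cElementTree), and the tiny <tv>/<desc> document is made up for illustration; it is not the real mhegepgsnoop.py or xmltv-proc-nz code.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature document containing the e-acute from "soufflés".
root = ET.Element("tv")
desc = ET.SubElement(root, "desc")
desc.text = "soufflés"

# Write as UTF-8 and explicitly request the XML declaration.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="utf-8", xml_declaration=True)
utf8_bytes = buf.getvalue()

# In UTF-8, e-acute is the two-byte sequence 0xC3 0xA9, not a lone 0xE9.
has_utf8_eacute = b"\xc3\xa9" in utf8_bytes

# The same text as cp1252 bytes (lone 0xE9) with no declaration:
# the parser assumes UTF-8 and should reject the byte.
bad_bytes = "<tv><desc>soufflés</desc></tv>".encode("cp1252")
try:
    ET.fromstring(bad_bytes)
    bad_parsed = True
except ET.ParseError:
    bad_parsed = False

# Prepending a cp1252 declaration (the workaround quoted above) lets
# the parser decode the 0xE9 byte correctly.
declared = b'<?xml version="1.0" encoding="cp1252"?>\n' + bad_bytes
declared_text = ET.fromstring(declared).find("desc").text
```

If the file that reaches xmltv-proc-nz really contains a lone 0xE9 byte, it is not valid UTF-8 at that point, which would suggest the byte is introduced somewhere other than the write() call shown above.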