[mythtvnz] xmltv-proc-nz problem with MHEG epg data

Stephen Worthington stephen_agent at jsw.gen.nz
Thu Jun 28 17:46:45 BST 2012


I run xmltv-proc-nz 0.5.8 on the ChoiceTV EPG data that I get from MHEG.
At the moment it is failing on some of the characters in that data:

Running xmltv-proc-nz on the Freeview data
Movies: TMDB module not found.
Traceback (most recent call last):
  File "/usr/local/bin/xmltv-proc-nz", line 563, in <module>
    tree = ElementTree.XML(data)
  File "<string>", line 106, in XML
cElementTree.ParseError: not well-formed (invalid token): line 2174, column 50
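
The failure can be reproduced outside xmltv-proc-nz with a few lines of Python 3. This is a minimal sketch, not the script's own code. It uses the standard library's xml.etree.ElementTree, which is the modern equivalent of the cElementTree module in the traceback. The input is a cut-down version of the programme entry quoted below, with "é" stored as the single byte 0xE9:

```python
import xml.etree.ElementTree as ElementTree

# Cut-down programme entry in which "é" is the single byte 0xE9 (as
# cp1252/Latin-1 would encode it). With no XML declaration the parser
# assumes UTF-8, where 0xE9 followed by "s" is not a valid sequence.
data = (b'<tv><programme><desc>Temperatures rise higher than the '
        b'souffl\xe9s</desc></programme></tv>')

error = None
context = b""
try:
    ElementTree.XML(data)
except ElementTree.ParseError as err:
    error = err
    # ParseError.position holds the (line, column) that expat reported.
    line, column = err.position
    bad_line = data.splitlines()[line - 1]
    # Show the raw bytes either side of the reported column.
    context = bad_line[max(0, column - 10):column + 10]
    print("ParseError:", err)
    print("bytes near error:", context)
```

Printing the bytes around the reported position is a quick way to find the offending character in a large EPG file like the Freeview one (line 2174 above).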

The data causing the problem seems to be an e-acute character, 0xE9,
in the word soufflés:

  <programme start="20120625140000 +1200" stop="20120625150000 +1200" channel="tv1.freeviewnz.tv">
    <title lang="eng">Celebrity Masterchef</title>
    <sub-title>New Season</sub-title>
    <desc>Temperatures rise higher than the soufflés as the determined celebrities strive to demonstrate their ability to cook great food to the judges.</desc>
    <category>series</category>
    <category>Education/Science/Factual</category>
    <rating system="SKY-NZ">
      <value>M</value>
    </rating>
  </programme>
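
A quick check in Python 3 supports this diagnosis. The single byte 0xE9 decodes to é under cp1252, but on its own it is not valid UTF-8. UTF-8 is what an XML parser assumes when there is no encoding declaration, and in UTF-8 é is the two-byte sequence 0xC3 0xA9:

```python
raw = b"souffl\xe9s"  # the bytes as they appear in the MHEG-derived file

as_cp1252 = raw.decode("cp1252")
print(as_cp1252)  # soufflés

try:
    raw.decode("utf-8")
    utf8_reason = None
except UnicodeDecodeError as err:
    # 0xE9 starts a three-byte UTF-8 sequence, but "s" is not a continuation byte.
    utf8_reason = err.reason
print(utf8_reason)  # invalid continuation byte

# The same word correctly encoded as UTF-8 uses two bytes for "é".
print(as_cp1252.encode("utf-8"))  # b'souffl\xc3\xa9s'
```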

To work around this, I have had to make my EPG script add the following
XML declaration to the top of the output file from mhegepgsnoop.py:

<?xml version="1.0" encoding="cp1252"?>

That declares the character encoding as cp1252, in which the byte 0xE9
is a valid character (é).  But I think it would be better to have
mhegepgsnoop.py convert the text it gets from MHEG and produce UTF-8
encoded data, as UTF-8 is the default encoding for XML files that have
no encoding declaration.
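
Both approaches can be sketched in Python 3. The sketch assumes, as the cp1252 declaration does, that the MHEG text really is cp1252. The helper name to_utf8 is hypothetical and not part of mhegepgsnoop.py, and where such a conversion would go inside that script is left open:

```python
import xml.etree.ElementTree as ElementTree

# Illustrative sample of cp1252 output with no XML declaration.
raw = b'<tv><programme><desc>higher than the souffl\xe9s</desc></programme></tv>'

# Current workaround: prepend a cp1252 declaration (it must be the very
# first thing in the file) so the parser decodes 0xE9 as "é".
declared = b'<?xml version="1.0" encoding="cp1252"?>\n' + raw
print(ElementTree.XML(declared).find("programme/desc").text)


def to_utf8(data, source_encoding="cp1252"):
    """Hypothetical helper: re-encode EPG bytes from source_encoding to UTF-8.

    errors="replace" maps bytes undefined in the source encoding to U+FFFD
    instead of raising, so one bad byte does not abort the whole EPG run.
    """
    return data.decode(source_encoding, errors="replace").encode("utf-8")


# Suggested fix: emit UTF-8, which needs no encoding declaration.
utf8 = to_utf8(raw)
print(ElementTree.XML(utf8).find("programme/desc").text)
```

With errors="strict" instead, any byte that cp1252 leaves undefined would be reported as an error rather than silently replaced. Which behaviour is preferable for EPG data is a judgement call.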


