[mythtvnz] XMLTV headers changed
Robin Gilks
g8ecj at gilks.org
Sat Jul 7 10:16:42 BST 2012
> On 07/07/12 19:25, Robin Gilks wrote:
>> Greetings
>>
>> It seems that the data contained in http://nzepg.org/freeview.xml.gz
>> had a change of header last weekend.
>>
>> I followed the conversation about mhegepgsnoop having a header change
>> but didn't realise it would propagate to the online data.
>>
>> This is a real problem for me as I merge data from epgsnoop with the
>> online stuff (which has more detail for FreeView) using 'tv_cat' from
>> the xmltv package and it barfs with:
>> "/tmp/listings-freeview-31681.xml: this file's encoding utf-8 differs
>> from others' ISO-8859-1 - aborting"
>>
>> So the online data now has utf-8 in the header but the epgsnoop data is
>> ISO-8859-1. I tried changing outputter.py (in epgsnoop) to utf-8 but the
>> data from satellite EPG has some interesting 8 bit characters (which I
>> assume really are ISO-8859-1 codes).
>>
>> So who is right, utf-8 or ISO-8859-1, and how can I merge the two
>> different encodings if they are both right?
>>
>> Cheers
>>
>
> Try iconv to convert the encoding of one file to match the other. For
> example:
>
> iconv -c -f UTF-8 -t ISO_8859-1 this_file -o that_file
>
> The -c will skip invalid chars.
>
> Also change the header or delete it. And check the encoding with
> "file -i that_file".
>
> UTF-8 vs ISO_8859-1? Well UTF-8 is a more universal char set. ISO_8859-1
> is mostly for Western European or Latin languages. I believe UTF-8 is or
> is becoming the preferred encoding.
>
> Interestingly you may have revealed a bug. UTF-8 encoding created by
> mhegepgsnoop is displayed properly (e.g., by "less file" and myth) but
> iconv choked on one character. Seems the byte order might be backwards
> for this char but most apps handle it because bytes in multi-byte UTF-8
> chars are unambiguous so order doesn't really matter. iconv may be less
> tolerant and simply abort if it gets bytes in the wrong order.
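For the merge itself, going the other way (ISO-8859-1 to UTF-8) loses
nothing, since every ISO-8859-1 character exists in UTF-8. Below is a
minimal Python sketch of that conversion plus the header change, roughly
what "iconv -f ISO-8859-1 -t UTF-8" followed by a hand edit would do. The
function name is made up for illustration, and it assumes the file starts
with an XML declaration carrying an encoding attribute; it is not part of
epgsnoop or xmltv.

```python
import re

# Matches the encoding attribute inside a leading XML declaration,
# e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL = re.compile(rb'^(<\?xml[^>]*?encoding=["\'])([^"\']+)(["\'])')


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 XML bytes as UTF-8 and update the declaration.

    Hypothetical helper for illustration only.
    """
    # Every byte value is a valid ISO-8859-1 character, so this cannot fail.
    text = data.decode("iso-8859-1")
    out = text.encode("utf-8")
    # Rewrite only the declared encoding; the rest of the header is kept.
    # If there is no declaration, the data is returned unchanged.
    return _DECL.sub(rb"\g<1>UTF-8\g<3>", out, count=1)


# Example use (file names are placeholders):
#   with open("epgsnoop.xml", "rb") as f:
#       converted = latin1_to_utf8(f.read())
#   with open("epgsnoop-utf8.xml", "wb") as f:
#       f.write(converted)
```

With both inputs declared and encoded as UTF-8, tv_cat should no longer
see a mismatch, though I have not tested that against the real feeds.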
So the online data is now created by mhegepgsnoop? I'm surprised it
doesn't follow the existing (for at least the last 4 years) encoding of
ISO_8859-1 from epgsnoop, or at least check for compatibility with an
existing schema.
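
A compatibility check of that kind could be fairly small. The sketch
below reads the encoding declared in the XML header and reports the
offset of the first byte that does not decode in it. Both function names
are hypothetical. Note that in UTF-8 the lead byte of a multi-byte
character has to come before its continuation bytes, so a swapped pair
is invalid and a strict decoder will stop there. The check is only
meaningful for UTF-8, because any byte sequence decodes as ISO-8859-1.

```python
import re

# Captures the encoding named in a leading XML declaration.
_ENC = re.compile(rb'^<\?xml[^>]*?encoding=["\']([^"\']+)["\']')


def declared_encoding(data: bytes) -> str:
    """Return the declared encoding, or utf-8 (the XML default) if absent."""
    m = _ENC.match(data)
    return m.group(1).decode("ascii") if m else "utf-8"


def first_bad_byte(data: bytes, encoding: str):
    """Return the offset of the first undecodable byte, or None if clean."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        return err.start
    return None


# Example: the two bytes of an e-acute (C3 A9) in reversed order.
sample = b'<?xml version="1.0" encoding="utf-8"?><tv>\xa9\xc3</tv>'
offset = first_bad_byte(sample, declared_encoding(sample))
print("first bad byte at offset", offset)
```

Running something like this over the mhegepgsnoop output would show
which byte iconv is objecting to.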
--
Robin Gilks