[mythtvnz] CTV (CanterburyTV) EPG scraper anyone?

Robin Gilks mythtvnz@lists.linuxnut.co.nz
Mon, 20 Nov 2006 14:55:44 +1300 (NZDT)


> Robin Gilks wrote:
>>
>>> Ok we now have permission from CTV to redistribute their EPG data. I'll
>>> tidy this up in the next couple of days, and put up an icon file, but
>>> for the moment can someone please test the following.
>>>
>>> http://www.mythtv.co.nz/epg/ctv.xml.gz
>>>
>>> This means we now have permission for AltTV, Triangle Auckland and CTV.
>>>
>>> Steve
>>>
>>
>> I've just compared it to the CTV web site (and info I had heard "down
>> the grapevine") and the data is totally different. For example I was
>> going to schedule a recording of "Irish Last Night of the Proms" at
>> 19:00 to 20:30 but according to the XML file, that is some footy match
>> program running from 19:00 to 21:00.
>>
>> I hope they are not trying to pull a fast one...
>
> The info is actually scraped off their website. I'll re-run the scraper
> and repost it later today.
>
> At the moment they provide their data in a Word document, so their website
> is actually easier to parse.
>
> The nice bit is I have official permission to do this and re-distribute the
> information.
>
> If it all appears to be OK, I'll arrange for the information to be updated
> every night around 12:30

Looks like the list is only updated on a Monday morning - that means for
example that right now, Sat and Sun are out of date (i.e. not days 6 & 7
from now). Does this mean some scraping magic is required to ensure we
don't go back in time thinking we have next week's data? I certainly found
that on Saturday I was trying to check from the program duration whether
the Last Night of the Proms was a repeat on Monday or another part - but
Monday was still a week behind, not a week ahead.
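
As a rough sketch of the kind of check I mean (Python, untested against the
real site): it assumes the page lists one Monday-to-Sunday week and is
refreshed on Monday morning, so a day name always belongs to the week that
started on the most recent Monday. The function names are made up for
illustration and are not part of Steve's scraper. Anything that resolves to a
time already in the past gets flagged so it is not emitted as future data.

```python
from datetime import date, datetime, time, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def week_start(today: date) -> date:
    """Return the most recent Monday on or before `today`."""
    return today - timedelta(days=today.weekday())


def resolve_listing_date(day_name: str, today: date) -> date:
    """Map a day name from the listing to a calendar date.

    Assumption: the listing covers the week beginning on the most recent
    Monday, so "Monday" seen on a Saturday is five days in the past.
    """
    return week_start(today) + timedelta(days=WEEKDAYS.index(day_name))


def is_stale(day_name: str, start: time, now: datetime) -> bool:
    """True if the listed programme would start before `now`."""
    start_dt = datetime.combine(resolve_listing_date(day_name, now.date()), start)
    return start_dt < now


if __name__ == "__main__":
    # Saturday 18 Nov 2006, midday: the Monday entry is last Monday's.
    now = datetime(2006, 11, 18, 12, 0)
    for day in WEEKDAYS:
        print(day, resolve_listing_date(day, now.date()), is_stale(day, time(19, 0), now))
```

This cannot tell whether the page has actually been refreshed yet on a Monday
morning, so the scraper would still want some other signal for that (a date
printed on the page, if there is one).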

Perhaps a prod at them if they only update once a week!!

If the Word document is more up-to-date then perhaps catdoc (or antiword)
will help get the data out in a format that can be xml-ified.
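
Something along these lines might be a starting point (Python again). Both
antiword and catdoc write the document's text to stdout when given a file
name, so the sketch just uses whichever one is installed. I have not seen the
Word document, so the "HH:MM title" line layout and the function names below
are pure guesses for illustration and would need adjusting to the real thing.

```python
import re
import shutil
import subprocess


def doc_to_text(path: str) -> str:
    """Convert a Word document to plain text with antiword or catdoc."""
    tool = shutil.which("antiword") or shutil.which("catdoc")
    if tool is None:
        raise RuntimeError("neither antiword nor catdoc is installed")
    result = subprocess.run([tool, path], capture_output=True,
                            text=True, check=True)
    return result.stdout


# Hypothetical layout: a start time, whitespace, then the programme title.
LINE_RE = re.compile(r"^\s*(\d{1,2})[:.](\d{2})\s+(.+?)\s*$")


def parse_schedule(text: str) -> list:
    """Return (hour, minute, title) tuples for lines that look like entries."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append((int(match.group(1)), int(match.group(2)),
                            match.group(3)))
    return entries
```

The tuples from parse_schedule() could then be paired with a resolved date
and written out as XMLTV programme elements.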

Cheers

-- 
Robin Gilks