[mythtvnz] Pointers on how to track down machine lockup

Wade Maxfield mythtvnz at hotblack.co.nz
Mon Oct 25 03:42:25 BST 2010


On 25/10/2010, at 2:45 PM, Solor Vox wrote:

> Did you notice the last script /usr/local/bin/mythlink.sh?  Looks to me
> that is the last thing logged before the reboot.
> 
> <snip>
>> Oct 25 11:35:01 mythtv CRON[19514]: (root) CMD
>> (/usr/local/bin/mythlink.sh)
> 
> <crash/reboot>
> 
>> Oct 25 11:39:41 mythtv kernel: imklog 4.2.0, log source = /proc/kmsg
>> started.
> 

That's the last thing kern.log recorded, but mythbackend.log has:

2010-10-25 11:38:16.622 MainServer, Warning: Unknown socket closing MythSocket(0x7f5c8481a520)
2010-10-25 11:38:16.640 MainServer, Warning: Unknown socket closing MythSocket(0x7f5c843e9040)
2010-10-25 11:39:48.512 mythbackend version: branches/release-0-23-fixes [24158] www.mythtv.org
2010-10-25 11:39:48.669 Using runtime prefix = /usr
2010-10-25 11:39:48.717 Using configuration directory = /home/mythtv/.mythtv
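
A minimal sketch for pulling out whatever was logged just before each boot, assuming the first kernel line after a restart contains "kernel: imklog" as in the excerpt quoted above. The function name last_before_boot is made up for illustration, and the log path in the usage note is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: show the N lines logged immediately before each
# boot marker in a syslog-style file.
#   $1 = log file to search
#   $2 = number of preceding lines to show (default 5)
last_before_boot() {
    # grep -B prints that many lines of context before every match.
    grep -B "${2:-5}" 'kernel: imklog' "$1"
}
```

Something like `last_before_boot /var/log/kern.log 3` would then show the last three entries before every restart, which makes it easier to see whether the same job keeps turning up.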



> <snip>
> 
> Looks like that script is for symlinks to human readable names.  Have you
> run fsck on your filesystem(s) with all those crashes?
> 

Not this time, but I did after the previous one. We were watching a recording on a remote frontend; it paused, then the screen went white and we were dumped back to the recordings menu with a message about not being able to connect to the master backend. That's how we knew it had died again. Less than a minute later the backend had finished rebooting, and everything was back up and working again. At the time of the reboot there was a single recording in progress on the PVR-350, and a single recording being watched via a remote frontend.

The script runs twice an hour, at 5 and 35 minutes past, and normally completes in less than 30 seconds.


> Also, how good is your power from the mains?  Any power protection
> surge/UPS in fault condition?  

The box is connected to an APC UPS, running at about 20-25% load. The Myth box and the Sky decoder are the only devices running off it. apcupsd doesn't have anything out of the ordinary logged (just a few startups and self tests), and there are no warnings or flashing LEDs on the UPS. The self test says there's nothing wrong with the UPS.


> If you're comfortable with bash, I'd
> suggest writing a wee script to log CPU load, free memory,
> temperatures/etc. and current date/time to a file every 20-30 seconds.
> 

Temps are currently logged every 30 min using sensord. Last readings were CPU 27-31° at 11:27, and 28-33° just after boot at 11:39. I'm also using monit to keep an eye on CPU load and memory usage. Nothing out of the ordinary, but I'll look at logging everything a bit more explicitly.
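
A rough sketch of the kind of logger suggested above, not a tested solution. The log path, the 20 second interval and the function names are all assumptions, and the temperature is read from /sys/class/thermal/thermal_zone0/temp, which many boards don't expose (in that case it logs "n/a" and the sensors output would have to be parsed instead).

```shell
#!/bin/bash
# Hypothetical sketch: append one line of system stats to a file every
# INTERVAL seconds so there is something recent to read after a lockup.
LOGFILE="${LOGFILE:-/var/log/lockup-stats.log}"   # assumed path
INTERVAL="${INTERVAL:-20}"                        # seconds between samples

sample_line() {
    now=$(date '+%Y-%m-%d %H:%M:%S')
    # The first three fields of /proc/loadavg are the 1, 5 and 15 minute load averages.
    load=$(cut -d' ' -f1-3 /proc/loadavg)
    # MemFree is reported in kB in /proc/meminfo.
    memfree=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
    # Assumption: thermal_zone0 exists and reports millidegrees Celsius.
    temp="n/a"
    raw=$(cat /sys/class/thermal/thermal_zone0/temp 2>/dev/null)
    if [ -n "$raw" ]; then
        temp=$((raw / 1000))
    fi
    echo "$now load=$load memfree_kB=$memfree temp_C=$temp"
}

log_forever() {
    while true; do
        sample_line >> "$LOGFILE"
        sync    # flush to disk so the last sample has a chance of surviving a hard reset
        sleep "$INTERVAL"
    done
}

# Only start the loop when asked to, e.g. "lockup-log.sh run".
if [ "${1:-}" = "run" ]; then
    log_forever
fi
```

The sync after each sample is deliberate: without it the most recent lines may still be sitting in the page cache when the box dies, and those are the ones that matter.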


> A kernel panic could cause a reboot as well.  Have you changed or looked
> at /etc/sysctl.conf for kernel.panic times?  (or look at sysctl -a)
> 

I've never looked here before.  What should I keep an eye out for?

kernel.panic = 0
kernel.panic_on_oops = 0
kernel.unknown_nmi_panic = 0
kernel.nmi_watchdog = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.panic_on_io_nmi = 0
kernel.softlockup_panic = 0
kernel.softlockup_thresh = 60
kernel.hung_task_panic = 0
kernel.hung_task_check_count = 4194304
kernel.hung_task_timeout_secs = 120
kernel.hung_task_warnings = 10
vm.overcommit_memory = 0
vm.panic_on_oom = 0
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
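
For reference, kernel.panic is the number of seconds the kernel waits before rebooting after a panic: 0 means it does not reboot by itself, a positive value means reboot after that many seconds, and a negative value means reboot immediately. A small sketch for checking this on a running box follows; the function names are made up for illustration.

```shell
#!/bin/bash
# Describe what a given kernel.panic value means:
#   0  -> no automatic reboot (the machine stays at the panic)
#   >0 -> reboot after that many seconds
#   <0 -> reboot immediately
describe_panic_timeout() {
    if [ "$1" -eq 0 ]; then
        echo "no automatic reboot after a panic"
    elif [ "$1" -gt 0 ]; then
        echo "reboot $1 seconds after a panic"
    else
        echo "reboot immediately after a panic"
    fi
}

# Print a few panic-related settings straight from /proc/sys.
show_panic_settings() {
    for key in panic panic_on_oops softlockup_panic hung_task_panic; do
        f="/proc/sys/kernel/$key"
        # Not every kernel exposes every key, so skip any that are missing.
        [ -r "$f" ] && echo "kernel.$key = $(cat "$f")"
    done
    return 0
}
```

Running `describe_panic_timeout "$(cat /proc/sys/kernel/panic)"` with the values listed above would report no automatic reboot, so a panic on this box would be expected to look like a hang rather than a restart within a minute.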

 - Wade



