[mythtvnz] Pointers on how to track down machine lockup
Wade Maxfield
mythtvnz at hotblack.co.nz
Mon Oct 25 03:42:25 BST 2010
On 25/10/2010, at 2:45 PM, Solor Vox wrote:
> Did you notice the last script /usr/local/bin/mythlink.sh? Looks to me
> that is the last thing logged before the reboot.
>
> <snip>
>> Oct 25 11:35:01 mythtv CRON[19514]: (root) CMD
>> (/usr/local/bin/mythlink.sh)
>
> <crash/reboot>
>
>> Oct 25 11:39:41 mythtv kernel: imklog 4.2.0, log source = /proc/kmsg
>> started.
>
That's the last thing kern.log recorded, but mythbackend.log has:
2010-10-25 11:38:16.622 MainServer, Warning: Unknown socket closing MythSocket(0x7f5c8481a520)
2010-10-25 11:38:16.640 MainServer, Warning: Unknown socket closing MythSocket(0x7f5c843e9040)
2010-10-25 11:39:48.512 mythbackend version: branches/release-0-23-fixes [24158] www.mythtv.org
2010-10-25 11:39:48.669 Using runtime prefix = /usr
2010-10-25 11:39:48.717 Using configuration directory = /home/mythtv/.mythtv
> <snip>
>
> Looks like that script is for symlinks to human readable names. Have you
> run fsck on your filesystem(s) with all those crashes?
>
Not this time, but I did after the previous one. We were watching a recording on a remote frontend; it paused, then the screen went white and we were dumped back to the recordings menu with a message about not being able to connect to the master backend. That's how we knew it had died again. Less than a minute later the backend had finished rebooting, and everything was back up and working again. At the time of the reboot there was a single recording in progress on the PVR-350, and a single recording being watched via the remote frontend.
The script runs twice an hour, at 5 and 35 minutes past, and normally completes in less than 30 seconds.
> Also, how good is your power from the mains? Any power protection
> surge/UPS in fault condition?
The box is connected to an APC UPS, running at about 20-25% load. The Myth box and the Sky decoder are the only devices running off it. apcupsd doesn't have anything out of the ordinary logged (just a few startups & self tests), and there are no warnings or flashing LEDs on the UPS. The Self Test says there's nothing wrong with the UPS.
> If you're comfortable with bash, I'd
> suggest writing a wee script to log CPU load, free memory,
> temperatures/etc. and current date/time to a file every 20-30 seconds.
>
Temps are currently logged every 30 minutes, using sensord. Last readings were CPU 27-31° at 11:27, and 28-33° just after boot at 11:39. I'm also using monit to keep an eye on CPU load and memory usage. Nothing out of the ordinary, but I'll look at logging everything a bit more explicitly.
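Something along these lines is probably what's being suggested. It's only a rough sketch, not what is running on the box: the log path, the 30 second interval and the function name are made up for illustration, and the temperature part assumes the lm-sensors `sensors` command is installed (it falls back to "n/a" if not).

```shell
#!/bin/sh
# Sketch only: append one line of system stats per sample.
# LOGFILE, INTERVAL and log_sample are illustrative names, not from the real setup.
LOGFILE="${LOGFILE:-/var/log/sysstats.log}"
INTERVAL="${INTERVAL:-30}"

log_sample() {
    ts=$(date '+%Y-%m-%d %H:%M:%S')
    # First three fields of /proc/loadavg: 1, 5 and 15 minute load averages.
    load=$(cut -d' ' -f1-3 /proc/loadavg 2>/dev/null || true)
    # Free memory in kB, as reported by /proc/meminfo.
    mem_free=$(awk '/^MemFree:/ {print $2}' /proc/meminfo 2>/dev/null || true)
    # Temperatures: assumes lm-sensors is installed; otherwise record "n/a".
    if command -v sensors >/dev/null 2>&1; then
        temps=$(sensors 2>/dev/null | grep -i 'temp' | tr -s ' ' | tr '\n' ';')
    else
        temps="n/a"
    fi
    printf '%s load=%s memfree_kb=%s temps=%s\n' \
        "$ts" "${load:-n/a}" "${mem_free:-n/a}" "${temps:-n/a}"
}

# Loop only when called with "run", so the function can be tried on its own.
if [ "${1:-}" = "run" ]; then
    while true; do
        log_sample >> "$LOGFILE"
        sync    # flush to disk so the last sample has a chance of surviving a hard reset
        sleep "$INTERVAL"
    done
fi
```

Started with "run" as the first argument it loops forever, so it would need to be backgrounded or launched from an init script. The sync after each sample is there because a sudden reboot could otherwise lose the last few lines, which are the interesting ones.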
> A kernel panic could cause a reboot as well. Have you changed or looked
> at /etc/sysctl.conf for kernel.panic times? (or look at sysctl -a)
>
I've never looked here before. What should I keep an eye out for?
kernel.panic = 0
kernel.panic_on_oops = 0
kernel.unknown_nmi_panic = 0
kernel.nmi_watchdog = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.panic_on_io_nmi = 0
kernel.softlockup_panic = 0
kernel.softlockup_thresh = 60
kernel.hung_task_panic = 0
kernel.hung_task_check_count = 4194304
kernel.hung_task_timeout_secs = 120
kernel.hung_task_warnings = 10
vm.overcommit_memory = 0
vm.panic_on_oom = 0
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
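To re-check just the panic-related ones later without wading through all of sysctl -a, a small sketch like this could read them straight from /proc/sys. The function name and the choice of keys are mine, picked from the list above; a key the running kernel doesn't have is reported as unavailable.

```shell
#!/bin/sh
# Sketch only: print selected sysctl values by reading /proc/sys directly.
# show_sysctl is an illustrative helper name, not an existing tool.
show_sysctl() {
    for key in "$@"; do
        # kernel.panic -> /proc/sys/kernel/panic
        path="/proc/sys/$(printf '%s' "$key" | tr . /)"
        if [ -r "$path" ]; then
            printf '%s = %s\n' "$key" "$(cat "$path")"
        else
            printf '%s = (unavailable)\n' "$key"
        fi
    done
}

show_sysctl kernel.panic kernel.panic_on_oops kernel.softlockup_panic \
    kernel.hung_task_panic vm.panic_on_oom
```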
- Wade