[mythtvnz] Pointers on how to track down machine lockup

Wade Maxfield mythtvnz at hotblack.co.nz
Tue Sep 14 12:34:46 BST 2010


On 14/09/2010, at 10:25 PM, Solor Vox wrote:

> Hello Wade,
> 
> Standard things to check when having hardware instability:
> (in order)
> 
> * Heat, try and setup lmsensors and monitor your CPU/chipset temps, make
> sure fans are all running and the heatsinks are free of dust
> 

lm-sensors was already installed. I had added a CPU temp applet to the xfce panel, and the CPU runs at 40-45°C when I do a transcode, which doesn't seem too hot to me. I'll disassemble in the morning to check everything's spinning.

I've just installed sensord, so I'll see how that goes.  I have CPU scaling turned on, and most of the time the machine sits at 1.2GHz.  Is it worth disabling the scaling?
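
For logging temperatures alongside sensord, a minimal sketch is to read the kernel's hwmon files directly. This assumes the sensors are exposed as temp*_input files under /sys/class/hwmon (values in millidegrees Celsius); on some kernels they sit one level down in a device/ subdirectory, so both are checked. The function name read_temps is made up for illustration.

```python
import glob
import os


def read_temps(base="/sys/class/hwmon"):
    """Return {sensor file path: temperature in degrees Celsius}."""
    temps = {}
    # Assumption: sensors appear in one of these two sysfs layouts.
    patterns = [
        os.path.join(base, "hwmon*", "temp*_input"),
        os.path.join(base, "hwmon*", "device", "temp*_input"),
    ]
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path) as f:
                    # The kernel reports millidegrees Celsius as an integer.
                    temps[path] = int(f.read().strip()) / 1000.0
            except (OSError, ValueError):
                # Skip sensors that cannot be read or return junk.
                continue
    return temps


if __name__ == "__main__":
    for path, celsius in read_temps().items():
        print(f"{path}: {celsius:.1f} C")
```

Run from cron every minute and appended to a file, the last entries before a lockup would show whether heat was climbing at the time.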


> * Bad RAM, run memtest86 for a few hours, it's listed on ubuntu boot menu
> 

OK, will try that in the morning.


> * Hard disk, run fsck from a livecd to check for filesystem errors, run
> SMART tools to check for hardware problems
> 

All the drives have just passed the simple SMART status check, have been used for around 2400 hours total now, and all went through a rigorous 16-19 hr certification test before I started using them.  I'll try fsck before memtest tomorrow.


> * Power supply, bad power supplies can cause problems, see if your local
> IT shop can run a PSU tester on it.  Make sure you're not pulling too many
> watts.
> 

OK. Total system load at the wall is around 140W and I'm using a new 650W Antec Signature. That shouldn't be overloading it, but I'll call around tomorrow and see if I can find someone who can test it.
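
As a rough check on those numbers, the arithmetic can be sketched as below. The helper name load_fraction is made up for illustration. Wall draw includes the PSU's own conversion losses, so the real load on the PSU's output is somewhat lower than this figure suggests.

```python
def load_fraction(wall_watts, rated_watts):
    """Upper bound on PSU load as a fraction of its rated output.

    Wall draw includes conversion losses, so the true output load
    is below wall_watts; this errs on the pessimistic side.
    """
    if rated_watts <= 0:
        raise ValueError("rated_watts must be positive")
    return wall_watts / rated_watts


if __name__ == "__main__":
    # Figures from this message: about 140W at the wall, 650W rated PSU.
    print(f"{load_fraction(140, 650):.1%} of rated output at most")
```

That puts the load at a bit over a fifth of the rating, so an overload looks unlikely, although a faulty unit could still misbehave at low load.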


> * If it's a new system, ensure the clock speeds, FSB or BLK for I7 are
> correct.  Also verify your memory settings.
> 

I haven't changed any of the Motherboard Tweaking settings in the BIOS as I had no idea about them, plus the machine was plenty fast compared to the one it's replacing.  What am I looking for, and how do I spot something that may be incorrect? Would resetting the BIOS to its Failsafe Defaults be a good start?


> * Lastly, check dmesg/logs for any signs.  Lots of segfaults might
> indicate faulty memory or CPU.
> 


There are a couple of segfaults in kern.log and syslog, though none just before a lockup. I'll keep an eye on this.

kern.log
Sep  6 15:29:14 mythtv-mkII kernel: [ 7308.534457] Xorg[1315]: segfault at b554a000 ip b5645a92 sp bfbffe00 error 7 in nvidia_drv.so[b558f000+53a000]
Sep  6 21:39:22 mythtv-mkII kernel: [ 1274.887416] xfce4-sensors-p[1879]: segfault at c029088c ip b77c98bb sp bf926180 error 5 in libc-2.11.1.so[b775e000+153000]
Sep 14 23:06:32 mythtv-mkII kernel: [   39.640717] mythfrontend.re[2019]: segfault at 24 ip b365d61b sp ad6fec70 error 4 in libQtCore.so.4.6.2[b34f5000+276000]

syslog
Sep 14 23:06:32 mythtv-mkII kernel: [   39.640717] mythfrontend.re[2019]: segfault at 24 ip b365d61b sp ad6fec70 error 4 in libQtCore.so.4.6.2[b34f5000+276000]
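
To make these lines easier to compare, here is a minimal sketch that pulls the fields out of a kernel segfault line. The regex and the names parse_segfault and decode_error are made up for illustration. The "error" value is decoded using the x86 page-fault error code bits (bit 0: protection violation rather than page not present, bit 1: write, bit 2: user mode, bit 4: instruction fetch), and the offset is simply ip minus the library's load address.

```python
import re

# Matches e.g. "Xorg[1315]: segfault at ... ip ... sp ... error 7 in lib[base+size]"
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_error(code):
    """Decode the low bits of an x86 page-fault error code."""
    if code & 16:
        access = "instruction fetch"
    elif code & 2:
        access = "write"
    else:
        access = "read"
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": access,
        "mode": "user" if code & 4 else "kernel",
    }


def parse_segfault(line):
    """Return a dict of fields from a segfault log line, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = {
        "proc": m.group("proc"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "lib": m.group("lib"),
        "offset": None,
    }
    # The kernel prints the error code in hex.
    info.update(decode_error(int(m.group("err"), 16)))
    if m.group("base"):
        # Offset of the faulting instruction inside the mapped library.
        info["offset"] = int(m.group("ip"), 16) - int(m.group("base"), 16)
    return info
```

For the Xorg line above this gives a user-mode write that hit a protection violation, at offset 0xb6a92 inside nvidia_drv.so. The mythfrontend line decodes as a user-mode read of a page that is not present, at address 0x24, which looks like a near-null pointer dereference. Three different processes crashing in three different libraries is not conclusive either way, but the same offset turning up repeatedly would point at software rather than hardware.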


Many thanks for the starters.


 - Wade

