[mythtvnz] Pointers on how to track down machine lockup
Wade Maxfield
mythtvnz at hotblack.co.nz
Tue Sep 14 12:34:46 BST 2010
On 14/09/2010, at 10:25 PM, Solor Vox wrote:
> Hello Wade,
>
> Standard things to check when having hardware instability:
> (in order)
>
> * Heat, try to set up lm-sensors and monitor your CPU/chipset temps, make
> sure fans are all running and the heatsinks are free of dust
>
lm-sensors was already installed. I had added a CPU temp applet to the xfce panel, and the CPU runs at 40-45°C during a transcode, which doesn't seem too hot to me. I'll still disassemble the machine in the morning to check everything's spinning.
I've just installed sensord, so I'll see how that goes. I have CPU scaling turned on, and most of the time the machine sits at 1.2GHz. Is it worth disabling the scaling?
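For comparing behaviour with scaling on and off, the current governor and clock can be read from the kernel's cpufreq sysfs files. This is only a minimal sketch: it assumes the usual /sys/devices/system/cpu/cpu0/cpufreq directory exists on this machine, and the function name is made up for illustration.

```python
from pathlib import Path

# Usual Linux cpufreq location for the first core (assumed to exist here).
CPU0_CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_cpufreq(cpufreq_dir=CPU0_CPUFREQ):
    """Return the scaling governor and frequencies (in MHz) from a cpufreq dir."""
    cpufreq_dir = Path(cpufreq_dir)
    info = {"governor": (cpufreq_dir / "scaling_governor").read_text().strip()}
    for name in ("scaling_cur_freq", "scaling_min_freq", "scaling_max_freq"):
        khz = int((cpufreq_dir / name).read_text().strip())  # sysfs reports kHz
        info[name] = khz / 1000.0
    return info
```

Calling read_cpufreq() with no arguments and logging the result alongside the sensord output would show whether lockups line up with a particular frequency, which is a cheaper first step than disabling scaling outright.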
> * Bad RAM, run memtest86 for a few hours, it's listed on ubuntu boot menu
>
OK, will try that in the morning.
> * Hard disk, run fsck from a livecd to check for filesystem errors, run
> SMART tools to check for hardware problems
>
All the drives have just passed the simple SMART status check, have been used for around 2400 hours in total, and all went through a rigorous 16-19 hr certification test before I started using them. I'll try fsck before memtest tomorrow.
> * Power supply, bad power supplies can cause problems, see if your local
> IT shop can run a PSU tester on it. Make sure you're not pulling too many
> watts.
>
OK. Total system load at the wall is around 140W and I'm using a new 650W Antec Signature. That shouldn't be overloading it, but I'll call around tomorrow and see if I can find someone who can test it.
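As a rough check on those numbers, the load can be expressed as a fraction of the PSU's rating. The sketch below is only arithmetic: the 0.85 efficiency is an assumed figure, not a measured one, and the function name is hypothetical.

```python
def psu_load_fraction(wall_watts, rated_watts, efficiency=0.85):
    """Estimate the DC load as a fraction of the PSU's rated output.

    The wall reading includes conversion losses, so the DC load is
    wall_watts * efficiency. The default efficiency is an assumption.
    """
    dc_watts = wall_watts * efficiency
    return dc_watts / rated_watts


# Upper bound: treat every watt drawn at the wall as DC load.
worst_case = psu_load_fraction(140, 650, efficiency=1.0)
# With the assumed 85% efficiency.
typical = psu_load_fraction(140, 650)
print(f"worst case {worst_case:.1%}, assumed-efficiency estimate {typical:.1%}")
```

Even the upper bound is a little over a fifth of the rated output, so overload looks unlikely. That says nothing about whether this particular unit is faulty, which is what a PSU tester would show.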
> * If it's a new system, ensure the clock speeds (FSB, or BCLK for i7) are
> correct. Also verify your memory settings.
>
I haven't changed any of the Motherboard Tweaking settings in the BIOS as I had no idea about them, plus the machine was plenty fast compared to the one it's replacing. What am I looking for, and how do I spot something that may be incorrect? Would resetting the BIOS to its Failsafe Defaults be a good start?
> * Lastly, check dmesg/logs for any signs. Lots of segfaults might
> indicate faulty memory or CPU.
>
There are a couple of segfaults in kern.log and syslog, though none just before a lockup. I'll keep an eye on this.
kern.log
Sep 6 15:29:14 mythtv-mkII kernel: [ 7308.534457] Xorg[1315]: segfault at b554a000 ip b5645a92 sp bfbffe00 error 7 in nvidia_drv.so[b558f000+53a000]
Sep 6 21:39:22 mythtv-mkII kernel: [ 1274.887416] xfce4-sensors-p[1879]: segfault at c029088c ip b77c98bb sp bf926180 error 5 in libc-2.11.1.so[b775e000+153000]
Sep 14 23:06:32 mythtv-mkII kernel: [ 39.640717] mythfrontend.re[2019]: segfault at 24 ip b365d61b sp ad6fec70 error 4 in libQtCore.so.4.6.2[b34f5000+276000]
syslog
Sep 14 23:06:32 mythtv-mkII kernel: [ 39.640717] mythfrontend.re[2019]: segfault at 24 ip b365d61b sp ad6fec70 error 4 in libQtCore.so.4.6.2[b34f5000+276000]
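To keep an eye on these, the segfault lines can be pulled apart with a small script. This is a sketch that assumes the line format shown above. The error number is read as the standard x86 page-fault error code (bit 0: protection violation rather than page not present, bit 1: write rather than read, bit 2: user mode). The function and field names are made up for illustration.

```python
import re

# Matches the "proc[pid]: segfault at ... in lib[base+size]" part of the line.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return a dict describing a kernel segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    return {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "library": m.group("lib"),
        # Offset of the faulting instruction inside the mapped library.
        "offset": int(m.group("ip"), 16) - int(m.group("base"), 16),
        "access": "write" if err & 2 else "read",
        "cause": "protection violation" if err & 1 else "page not present",
        "user_mode": bool(err & 4),
    }


SAMPLE = (
    "Sep 6 15:29:14 mythtv-mkII kernel: [ 7308.534457] Xorg[1315]: "
    "segfault at b554a000 ip b5645a92 sp bfbffe00 error 7 "
    "in nvidia_drv.so[b558f000+53a000]"
)
result = parse_segfault(SAMPLE)
print(result["process"], result["library"], hex(result["offset"]), result["access"])
```

Running every line of kern.log through parse_segfault and counting by library would show whether the faults cluster in one place, which points at software, or are scattered across unrelated libraries as in the three lines above, which is more in keeping with the memory or CPU suspicion.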
Many thanks for the starters.
- Wade