May 11, 2004

Troubleshooting Linux: Don't forget the obvious

Well, I've been pulling my hair out for the past week or so trying to figure out what the heck was wrong with my little MythTV DVR box. I've been getting more and more crashes lately, which I assumed were due to some package incompatibility caused by the incessant "apt-get dist-upgrade" commands I've been throwing at it in an attempt to keep current. I'd also had the misfortune of letting my primary HD fill up while installing packages, which may have corrupted my RPM database.

Last weekend, I did a full fsck (file system check) to make sure the HD was good. Everything checked out OK. Preparing for the worst, I dumped my MySQL database out to the HD as an SQL file - or at least, I tried to.

Segmentation fault


What the? Okay - so this was something bigger than just MythTV if the database alone was crashing.

Over the course of the next week, I managed to get the machine to crash (albeit randomly) while: watching MythTV (mostly when a new program started recording or I fast-forwarded a lot), running a few consecutive dumps of the MySQL database, and watching a plain old movie file with mplayer (no database involved).

Instead of my troubleshooting narrowing the problem down to one piece, the trouble seemed to be manifesting itself in new ways the more I tried different tactics to pinpoint it.

Figuring that some underlying library must be corrupt, I decided to just reinstall the original packages from Fedora Core 1.

When not only my first but also my second set of CDs failed to boot, I went to the old standby - my trusty Knoppix 3.3 disc. This thing boots on anything that's not terribly broken.

No good. It kernel-panics before even booting up.

So now I'm just despondent. My computer is once again dead. This isn't that bad unless you know the context. I spent two months of my life in valiant battle against this piece of crap before Christmas, and I wasn't giving up now.

Forty-five minutes and one short floor-nap later (the nap left the leather pattern of my bracelet imprinted on my forehead), the computer had been off for a while and I wanted to try an experiment.

I booted the system cold, and it went straight up. Right into Knoppix, lickety-split.

I was confounded and happy at the same time. The good news was, whatever was broken wasn't that bad. The bad news was, this was now a hardware error.

Popping the system open, I found the source of my random segfaults, non-booting install CDs and other headaches: one of the two CPU fans was dead... and not just dead, dead and melting. This thing was, and still is, 15 minutes after being turned off, HOT.

I don't know if any permanent damage has been done but I do know one thing - just because it's running Linux doesn't mean it's invulnerable to stupid hardware problems and erratic behavior. Don't forget to check the obvious.
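
For what it's worth, "checking the obvious" doesn't always mean opening the case. Here is a rough sketch of a temperature check in Python, and it rests on some assumptions: it reads /sys/class/thermal/thermal_zone*/temp, which newer kernels expose in thousandths of a degree Celsius but which was not available on the kernel this box was running, and which not every machine provides. The function names and the 80 C limit are made up for illustration and are not from any tool I actually used.

```python
from pathlib import Path


def read_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_C} for every readable thermal zone under base."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            raw = (zone / "temp").read_text().strip()
            # The kernel reports millidegrees Celsius, e.g. "45000" for 45 C.
            temps[zone.name] = int(raw) / 1000.0
        except (OSError, ValueError):
            # Zone is missing its temp file or reports garbage; skip it.
            continue
    return temps


def too_hot(temps, limit_c=80.0):
    """Return only the zones at or above limit_c (an assumed, arbitrary limit)."""
    return {name: t for name, t in temps.items() if t >= limit_c}


if __name__ == "__main__":
    temps = read_temps()
    if not temps:
        print("No thermal zones found; try lm-sensors or the BIOS hardware monitor.")
    hot = too_hot(temps)
    for name, t in temps.items():
        flag = "  <-- HOT" if name in hot else ""
        print(f"{name}: {t:.1f} C{flag}")
```

If nothing shows up, that only means the kernel isn't reporting temperatures this way, not that the machine is cool. A hand on the heatsink (carefully) still works.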