PC troubles over the weekend (trouble shooting marathon story)


JamminJoeyB
10-04-2005, 03:08 PM
Well guys I had one heck of a weekend.

First a breakdown of my system.
2.0 GHz P4
640 MB RAM
GeForce FX 5700 Ultra
80 GB Maxtor ATA/133 IDE
Sound Blaster Audigy 2 ZS
HP CD-RW drive
BTW this is a dual monitor rig.


Ok so this is a pretty average to high end rig. My distro is Slackware 10.1.

I'd been having an occasional problem with my system freezing up and knew something was brewing. A hardware failure was happening or going to happen soon. On 1 Oct 2005 things got crazy. I was just editing some pics and got a seg fault. OK, that's not a good thing, I think to myself. Good thing this was just a personal project and not something I had a deadline for.

Well, the system was frozen so I had to hard cycle the power, and then I rebooted with Morphix to fsck the drive.

Ok after that is done I reboot back to slack, everything good for about an hour. Then the system just starts acting flaky again and freezes. Ok, time to power down and enter troubleshooting mode.

So I think: what has changed about my system to cause the stability problems I had been having lately? The 5700 has only been in the system about a month. So let's pull that and put in my trusty 4 MB Matrox card. This PCI wonder has been in my little kit for a long time. Just about any PC of any age will ID this puppy and run it with no problems.

Well the vid card change did nothing. Before you guys think I forgot the obvious I did make sure all fans and pc temps were where they are normally at.

So I think ok it's either the HD or memory. No problem I can eliminate the hd with this 10 gig here and see how that goes.

So I start to install Slack to the 10 gig and I don't even get thru the install before errors start happening. Now I did have my other hd slaved in and slack picked it up to use the swap in conjunction with the swap I made on the 10 gig. So I am still thinking hard drive.

Ok so I think, just for the hell of it, let's pull out one of the memory modules and see if the install goes. I also pulled my 80 gig out of the system, figuring I would just add it in later, mount it, and copy my most important files over for backing up to CD-RW. I only have about 600 megs of mission critical stuff so this works well for me. I can always re-rip my music.

So the install goes well and I am back up and running. Now it's about 0230 (2:30 a.m. to you civilian type folks), I'm tired and I have a working system. Ok, time for bed, as I will need to get a new hd since this 10 gig just won't cut the mustard.

I get up around 1000 in the morning and fire up Mozilla to check for prices on hard drives. Well, my local vendors are closed on Sundays; they usually have competitive prices, otherwise I would have supported them. I find a pretty good deal on a HD at Circuit City. 160 gig WD ATA/100, $119 with $80 in rebates, that is a lot of storage for under $50 after taxes. This looks like a great deal, just got to mail off for the rebates, no big deal there. So off to Circuit City I go.

Fast forward to Monday. Installed the new hard drive, got slack and everything patched and running smoothly. Decided to go with the 2.6.10 kernel as another option. So I take care of that and get ready to reboot.

Anyone recall that memory module I removed? Well I spotted that thing next to my slackware cd box in the static pack I put it in. So instead of just rebooting I decide to reinstall that. So a quick halt command and take care of that.

Well I didn't even get halfway through the initial start up of my system and I get a kernel panic. I do my best Homer Simpson "DOH!" at this point. It wasn't the hard drive, it was the memory module. So I learned the hard way what I have read on here before from other people who have been troubleshooting late at night or really tired: you need to be well rested or you are gonna screw something up.

Now at least mine didn't cost me a whole new mobo, processor, etc. Here is the ironic thing. That new HD after rebates cost me about the same amount as another 512 MB stick of memory would have. It would have been a lot less time invested to install that instead of a new hd. Thank god it was only a 128 meg stick.

JayMan8081
10-04-2005, 05:10 PM
Ahh yes. That is the joy of computer hardware. Most people are familiar with the steps to debug software (programmers at least), but hardware failure happens so infrequently that the average home user doesn't really know a good procedure to go through. I had a problem with not enough power in my main rig, but I checked that last after swapping out video cards and running memtest86 for a good 8 hours (overnight). I enjoy playing with hardware but finding and fixing problems isn't always the easiest thing to do. I think now though I will always check the memory first, then hard drives, then video card, then take the entire system down to the least power-hungry components. Glad you got everything back up and running. :)
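
For anyone who wants that order written down, here is a rough sketch in Python. The order (memory, hard drives, video card, power) is the one from the post above; the function name and the check callables are made up for illustration and just stand in for whatever test you actually run by hand.

```python
from typing import Callable, Dict, Optional

# Order taken from the post above. The check functions themselves are
# hypothetical placeholders for tests you would run by hand.
CHECK_ORDER = ["memory", "hard drives", "video card", "power"]


def first_suspect(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run the supplied checks in CHECK_ORDER.

    Each check returns True if the component passed. The first component
    whose check fails is returned; None means nothing failed.
    Components with no check supplied are skipped.
    """
    for component in CHECK_ORDER:
        check = checks.get(component)
        if check is not None and not check():
            return component
    return None


# Example: the memory check passes, the hard drive check fails.
print(first_suspect({"memory": lambda: True, "hard drives": lambda: False}))
```

It stops at the first failure on purpose, so you deal with one suspect at a time instead of swapping several parts at once.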

mrBen
10-05-2005, 10:00 AM
I did the same recently.

Started getting problems when switching to 3d modes in games 'n' stuff. So I replaced the video card. Same problem. Did a memtest, and it came up fine.

Turns out it was a dodgy CPU fan, and the video card was pushing the CPU temp over the limit. Glad I didn't have to replace the processor, and having a new gfx card is sweet :)

JamminJoeyB
10-05-2005, 11:24 AM
Yeah I think it's the little things that people overlook that can cause problems. The good side of my situation is at least I know I have a solid 80 gig drive in addition to the 160 gig I just installed. I am thinking that will become my distro test drive or I will get an enclosure and make it a portable USB drive. Not sure which yet.

retsaw
10-05-2005, 03:42 PM
I've just had a similar problem with my "new" Athlon 850 MHz 512 MB system that I got given. Everything seemed to be working fine with it until I had to recompile the kernel to add something that wasn't included in stock. Being lazy I just used the stock .config file from my Arch install, so it was set to compile all the modules as well. Well, this does give the system a thorough workout, and it kept stopping with random errors. So, the first thing I did was run the mprime torture test for a few hours before I went to bed, no problems. The next morning, I got it going on memtest while I went out, it went through 9 passes, so the memory must be fine. The next thing to try was the processor. I tried underclocking it to 765 MHz (as low as it would go), I still got kernel compilation errors. Next, I changed the heatsink and fan (I was going to do that anyway as I had one rated for an Athlon XP 3200+ lying around), still errors. Next, I upped the vcore (it's got plenty of cooling now, so heat shouldn't be an issue), that didn't help. I exported the kernel source via NFS and compiled it from my main box, no problems there, so the hard drive must be fine.

After all this I'm thinking, WTF is wrong with it, is it just a bit knackered? I was going to leave it since it only had a problem under heavy load, then I read this thread and thought what the hell, I'll try removing the memory sticks and try them one by one, even though memtest came back with no errors. First stick, kernel compiles without errors, great. Second stick, no problems. Third stick, errors. So the moral of this is to not rely on memtest to tell you if your RAM is dodgy.

I'm down 128 MB on what I thought I had, but it's still got a nice amount for something I got for free.
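
The kernel compile worked as a test because it is a heavy job that should always come out the same. A minimal sketch of that idea in Python, assuming nothing beyond the standard library: do the same deterministic work over a big buffer several times and count how often the result differs. This is not what I actually ran (I used the kernel build), the function names are made up, and a clean run proves nothing, just like memtest passing proved nothing here.

```python
import hashlib
import os


def workload_digest(data: bytes, rounds: int = 200) -> str:
    """Deterministic CPU- and memory-heavy work: repeated SHA-256 over a buffer."""
    digest = hashlib.sha256(data).digest()
    for _ in range(rounds):
        digest = hashlib.sha256(digest + data).digest()
    return digest.hex()


def count_mismatches(size_mb: int = 64, repeats: int = 10, rounds: int = 200) -> int:
    """Repeat the same workload and count runs that disagree with the first one.

    On healthy hardware every run gives the same digest, so the count is 0.
    Any mismatch means the same input produced different output, which
    points at hardware (RAM, CPU, heat, power) rather than software.
    """
    data = os.urandom(size_mb * 1024 * 1024)  # random fill, kept in RAM
    reference = workload_digest(data, rounds)
    return sum(
        1 for _ in range(repeats) if workload_digest(data, rounds) != reference
    )


# Small settings so it finishes quickly; raise them for a longer soak.
print(count_mismatches(size_mb=4, repeats=3, rounds=10))
```

It only touches whatever memory the OS hands it, so it is no substitute for pulling the sticks and trying them one by one.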

lnx_nu_b
10-06-2005, 03:31 PM
Similar problem myself....

Two motherboards
Two Sound Cards
Three Harddrives
2 Video Cards
4 Memory Modules (2 -- Mushkin, 2 -- Corsair)
2 PSU's

and MANY attempts to reinstall an OS.

Finally, it was the RAM. Everything was either sold or returned, and I'm up and running!! Oh, and I tracked this down over 3 months. I couldn't sleep very well during that time.