Problem with network connection (and cron?)


Grobbendonk
10-17-2003, 09:00 AM
Hi,

I've got two problems here (sort of - the second is my failure to solve the first!)

I've got the following hardware setup:
internet - cable-modem - [ eth1 - smoothwall on old PC - eth0 ] - hub - other machines.

Every now and then, eth0 stops working. /var/log/dmesg shows something about the network card entering "promiscuous mode" for no apparent reason, and a few minutes later the interface can no longer send or receive packets. I've found no pattern to it. eth1 is fine, as is eth0 on the Red Hat server sat next to it (identical network cards, which I've swapped a couple of times!)

It would be nice to fix this properly, but I can't find anything to tell me WHY it's happening or how to stop it.

Anyway, the second problem is an attempt at a dodgy fix:
The connection can be re-enabled by simply "ifconfig eth0 down" then "ifconfig eth0 up"

So, as a near-clueless newbie, I created this script:
#!/bin/sh
ping -c 3 192.168.0.5
if [ $? != 0 ] ; then
    ifconfig eth0 down
    ifconfig eth0 up
fi

(my first ever script which does anything more complex than set a few environment variables! So probably the worst way to do it?)

This works when run in the foreground, but not when it's added to cron (10 * * * * /usr/local/bin/nics_stayalive.sh >/var/log/SA.log). When run by cron, it executes (SA.log contains the three ping responses and the expected timestamp), but doesn't seem to do the down/up bit. There's no output from ifconfig in foreground or background, so I've no idea where to look next!

Any help will be very gratefully received, thanks in advance!

kam
10-19-2003, 02:16 AM
I don't really know, but I have a guess:
Is your eth0 IP address assigned via DHCP? Maybe your lease is expiring. :confused:

As for the script, try adding full paths, such as /usr/bin/ping (check this one; it might be /bin/ping) and /sbin/ifconfig.
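
A likely reason nothing about ifconfig shows up in SA.log: the cron line redirects only stdout, so a "not found" error from a command missing from cron's PATH goes to stderr and never reaches the file. Here is a minimal sketch of the difference. run_job and no_such_command_xyz are made-up names standing in for the real script and for a command cron cannot find.

```shell
#!/bin/sh
# Hypothetical stand-in for a cron job whose second command is not on PATH.
run_job() {
    echo "ping ok"            # normal output: written to stdout
    no_such_command_xyz       # lookup fails: the shell reports it on stderr
}

# Write the job's stdout to the file named by $1, as ">/var/log/SA.log" does.
# stderr is discarded here; under cron it is typically mailed or dropped.
log_stdout_only() { run_job >"$1" 2>/dev/null; }

# Write both stdout and stderr to the file named by $1.
log_both() { run_job >"$1" 2>&1; }
```

On the real crontab line, that would mean ending it with ">>/var/log/SA.log 2>&1" so any error lands in the log. Also worth checking: "10 * * * *" runs once an hour at ten past, not every ten minutes. Many cron implementations accept "*/10 * * * *" for the latter, but I don't know which cron Smoothwall ships.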

cowanrl
10-19-2003, 01:52 PM
I have seen this type of problem caused by devices sharing an interrupt. Even though modern computers are supposed to support it, two busy devices sharing the same interrupt can sometimes cause problems, especially if a poorly written driver is involved.

You can see what interrupts are being used by your computer by executing this command:

cat /proc/interrupts

You should be able to identify the interrupts your NICs are using by the names in the list. You should be able to determine if your NICs are sharing interrupts and with what other devices.
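
To save eyeballing the whole list, something like this prints only the lines where more than one device is named on an interrupt. It assumes the usual layout, in which devices sharing an IRQ appear as a comma-separated list at the end of the line. shared_irqs is just a name made up for this sketch.

```shell
#!/bin/sh
# Print the /proc/interrupts lines that name more than one device.
# Assumption: devices sharing an IRQ are listed separated by commas.
# $1 optionally names a saved copy of the file (defaults to the live one).
shared_irqs() {
    grep ',' "${1:-/proc/interrupts}"
}
```

Running shared_irqs with no argument reads the live file. No output means no line lists two devices, so nothing appears to be sharing.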

Sometimes upgrading to a newer driver can solve the problem. I know there is a web site that has almost every driver available for Linux, but I can't remember the URL. Maybe someone else can help you with that.

Sometimes you have to try to change what interrupt your NIC is using. This is usually done via the /etc/modules.conf file. However, the exact setting you would need to use can vary according to what device and module you are using. For newer PCI NICs, this may not be possible at all.

If you look under /lib/modules/<kernel version number>/build/Documentation/ you can usually find documentation on the module you are using. Sometimes it will list the options available when the module is loaded. You can also see the man pages for modules.conf and insmod for help with loading the modules.

Grobbendonk
10-27-2003, 01:53 PM
Well, on the first problem, I'd never thought about the interrupts, but I checked, and there's no sharing. Eth1 is the same make of PCI card, but is active on a different interrupt.

I checked DHCP and I don't think it's that. I'm using a fixed IP address, cos the machine is a firewall. However, it's also running as a DHCP server for my network. I think I might have found some pattern: the network dropouts have only ever happened when one particular machine is active (Win98, uses DHCP on this server to get its IP address). Could that be related?

On the second problem, I have had some success. I didn't need to add a path to ping (cos the logs were showing that was working fine). However, a bit of a rummage showed that ifconfig wasn't in the path for root under cron (it's only added in the default interactive shell), and adding the full path to the ifconfig lines SEEMS to have fixed that! I'm going to add some logging to it (i.e. appending "network dropped date/time" to a file) to see if I can get a better pattern.

Thanks for your help, much appreciated (especially as it means I don't have to keep trogging down to the cellar to re-start the NIC every couple of hours).
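
For anyone finding this later, the fix and the planned logging described above could look roughly like this. It is only a sketch: the absolute paths, the log file name and the function name restart_if_down are assumptions, so check where ping and ifconfig actually live on your box.

```shell
#!/bin/sh
# Sketch of the keep-alive script with absolute paths and dropout logging.
# All paths and the log file name below are assumptions, not verified values.
PING=${PING:-/bin/ping}
IFCONFIG=${IFCONFIG:-/sbin/ifconfig}
TARGET=${TARGET:-192.168.0.5}
IFACE=${IFACE:-eth0}
LOGFILE=${LOGFILE:-/var/log/nic_dropouts.log}

# Ping the target; if it does not answer, log the time and bounce the interface.
restart_if_down() {
    if ! "$PING" -c 3 "$TARGET" >/dev/null 2>&1; then
        echo "network dropped $(date '+%Y-%m-%d %H:%M:%S')" >>"$LOGFILE"
        "$IFCONFIG" "$IFACE" down
        "$IFCONFIG" "$IFACE" up
    fi
}
```

Adding a final line that just says restart_if_down makes the file do the work when cron runs it. It is left out of the sketch so the function can be tried safely first with stand-in commands, for example with PING=false, IFCONFIG=echo and LOGFILE=/tmp/test.log set beforehand.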