Just curious about something with crontab


TGrimace
04-25-2003, 02:28 PM
I wrote a nice simple little bash program that goes through several hundred thousand lines of log files and sorts them into something readable for SQL. I'm just curious why, when I run it from the command line, it takes about 2.5 hours from start to finish, but when I schedule it as a cron job it takes less than 4 minutes. At first I was thrilled that this long annoying program that I thought would have to run for 2.5 hours every day could be shortened to 4 min, but now I'm wondering why I'm not getting that kind of speed from my terminal emulator.
Any thoughts would be welcome.

chrism01
04-27-2003, 09:02 AM
Personally I'd be very careful to check that both versions are working properly and compare the output. Frankly, that perf increase is unbelievable (if run at more or less the same time of day).
It indicates either the cron version is not actually doing the work, or it's running at a time when the system is very quiet and the terminal version is run when you are also doing a LOT of other stuff.

TGrimace
04-28-2003, 12:44 PM
That's what I thought when it happened the first time. My first thought was "Oh, great. Now what's wrong with it?" but I checked the output from the crontab run against the output from when I ran it manually, and they were exactly the same. I ran it several times over a week to make sure and they always came out the same. Time of day doesn't seem to be a factor. I can schedule it for any time and it'll always run much much faster than if I ran it manually.

chrism01
04-28-2003, 01:17 PM
Well, unless I'm going mad :) I don't believe it, OR you've just discovered a new principle of computing.
Seriously though, the 2 methods should be roughly the same; in fact I'd expect interactive processes to have a higher default priority.
Maybe you could check that, i.e. interactive priority vs cron priority....

Anybody else got any ideas here...

TGrimace
04-28-2003, 01:40 PM
hmm. I was thinking this was a normal effect, but apparently not.
I believe the main lines of the program that seem to be accelerated in crontab are these:

<code>
echo " Sorting FileNames"
sort -f FILENAME.txt > SortedFiles.txt

echo " Removing duplicate filenames"
awk -F" " '{
n = $0
if ( n != z ) { print n }
z = $0
}' SortedFiles.txt > NoDuplicates.txt
#cp NoDuplicates.txt testparttwo.txt
rm SortedFiles.txt

echo " Counting filenames"
awk -f AWKfiles/newCountFilename.awk NoDuplicates.txt > FILENAMEcount.txt
</code>

The counting of the filenames takes a very long time when I run it from the Konsole. The NoDuplicates.txt file is a list of roughly 4000 filenames (give or take) and it's checked against a log file with close to 400000 lines of log entries. Basically the idea is to pull individual filenames out of a logfile, and then count up how many times each filename is in there. Oh, I guess the newCountFilename.awk code would be useful too:

<code>
{ system( "grep -icwe "$0" FILENAME.txt") }
</code>

From the command line, these lines can take 45 min to 1 hour to complete, and there's another part of the code that does almost the exact same thing which also takes about that long. But crontab runs it all in less than 5 min.
I don't believe cron has a higher priority, but I'll check on that. (as soon as I figure out how :D )
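For what it's worth, one simple way to check is to have the script record its own niceness and then compare a cron run with a Konsole run. This is only a sketch: `log_nice` and the log path are made-up names, and it assumes a `nice` that prints the current niceness when run with no arguments (GNU coreutils does this; other systems may not).

```shell
#!/bin/sh
# log_nice LOGFILE: append a timestamp and the current niceness to LOGFILE.
# Assumption: running `nice` with no arguments prints the current niceness
# (GNU coreutils behaviour). LOGFILE is a hypothetical path chosen by the caller.
log_nice() {
    echo "$(date) nice=$(nice)" >> "$1"
}
```

Calling `log_nice /tmp/nice.log` at the top of the script, once from cron and once from the terminal, leaves two lines to compare. A value of 0 is the usual default; lower numbers mean higher priority.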

chrism01
04-29-2003, 05:42 AM
So you've got a list of filenames, and you want the unique subset.
How about using the -u (unique) switch on the sort cmd?
Then effectively you're saying


<code>
for pattern in `cat uniquelist.txt`
do
    echo $pattern
    grep -ic $pattern largefile.txt
done
</code>


Is that about it?
I've been working in Unix for about 7 yrs, and I've never heard anybody claim cron is any faster than e.g. interactive. It's very strange....
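As an aside, the loop above rescans the large file once per name. If the input really is one filename per line (the thread suggests this but doesn't confirm it), the counting could be done in a single pass instead. A rough sketch, with `count_names` as a made-up name; note it counts whole lines case-insensitively, which is not quite the same as `grep -w`, so the totals could differ if a line carries anything besides the bare name.

```shell
#!/bin/sh
# count_names FILE: print "count name" for every distinct line in FILE,
# ignoring case, reading the file only once.
# Assumption: FILE holds exactly one filename per line.
count_names() {
    awk '{ count[tolower($0)]++ }
         END { for (name in count) print count[name], name }' "$1" | sort -k2
}
```

Something like `count_names FILENAME.txt > FILENAMEcount.txt` would then stand in for the sort, dedupe, and grep steps, with names reported in lower case.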

TGrimace
04-29-2003, 11:14 AM
That does seem more concise than the way I have it. I started writing this script to learn scripting, so sometimes things are done a bit oddly, but it does finally work.
However, now I'm feeling a bit of concern. The output is exactly the same either way, but the crontab run takes much much less time. Minutes instead of hours. If this isn't a normal thing, what could be causing it? Maybe cron is running at the speed it's supposed to, but my shell in Konsole is running much slower than it's supposed to? I've tried running the program as different users, and setting the 'nice' to -20, but still cron is much faster. Any ideas would be helpful.
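One way to narrow this down would be to time each stage separately in both environments and see which stage accounts for the gap. A minimal sketch, assuming a `date` that understands `+%s` (GNU and BSD `date` do); `timed` is a made-up helper name, not something from the script.

```shell
#!/bin/sh
# timed LABEL COMMAND [ARGS...]: run COMMAND, then report the elapsed
# wall-clock seconds on stderr so that stdout redirections stay clean.
# Assumption: `date +%s` prints seconds since the epoch.
timed() {
    label=$1
    shift
    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s" >&2
    return "$status"
}
```

For example, `timed "Counting filenames" awk -f AWKfiles/newCountFilename.awk NoDuplicates.txt > FILENAMEcount.txt` would print the elapsed time for just that step without touching the output file.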

chrism01
04-29-2003, 11:26 AM
The phrase 'both scripts' worries me. You should be running the same script, i.e. the actual one and only script.
There may be a difference if you've got 2 versions....

TGrimace
04-29-2003, 11:28 AM
Whoops. My bad. It's one script, I meant the output of the script is the same whether I run it from Konsole or from cron.

chrism01
04-29-2003, 11:42 AM
That's incredibly bizarre... I believe you are telling the truth (as you see it) but I just don't believe the implications.
Does the script talk to the terminal or stdout at all?
Also, can you add the script as an attachment, so I can look at it?
Where are you based geographically? If not near me, have you got a local Unix prog you can run it by?
You're not submitting it to cron on another box??
Can you tell I'm getting desperate here ;)
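A quick way to answer some of this is to snapshot what the script actually sees in each case and diff the two. cron normally starts jobs with a much smaller environment than a login shell; whether that explains the speed gap here is only a guess. Locale variables such as LANG and LC_ALL are worth a look, since they affect how `sort -f` and `grep -i` compare text. In this sketch `snapshot_env` and the output paths are made-up names.

```shell
#!/bin/sh
# snapshot_env OUTFILE: write the sorted environment to OUTFILE, plus a
# note on whether stdin is a terminal (normally yes in Konsole, no under cron).
# OUTFILE is a hypothetical path chosen by the caller.
snapshot_env() {
    if [ -t 0 ]; then tty_state=yes; else tty_state=no; fi
    {
        echo "stdin_is_tty=$tty_state"
        env | LC_ALL=C sort
    } > "$1"
}
```

Calling `snapshot_env /tmp/env.cron` from the cron run and `snapshot_env /tmp/env.konsole` from the terminal, then running `diff /tmp/env.cron /tmp/env.konsole`, would show any differences at a glance.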

TGrimace
04-29-2003, 12:14 PM
umm. Ok, I can add the script. It's probably not the easiest program in the world to read tho. Like I said, it was one of my first. StartRealStats is the first part of the script, and it runs the other parts. What I don't have is a log file I can send you. The log files this script is used on are over 100 megs, which would make it extremely difficult to get them to ya.
Ok, basic rundown of what the script does (and keep in mind that it _does_ work now, I'm just curious as to why it runs so fast in cron). We stream classes, events and a couple of radio stations here at work using RealProducer Plus. If you've ever used RealProducer, you know what a complete mess the logfiles it generates are. Just horrible, useless junk. So finally I wrote this to tear into the logfiles and spit out something that could be useful and easily imported into SQL server, where it could be used in asp pages.
(what? I can't attach a tar file? *sigh* hold on, let me go zip it up)

chrism01
04-29-2003, 05:10 PM
You got me going with that zip; I thought you meant gzip... shows I wasn't concentrating enough ;)
Anyway, why don't we take this to the PM (private message) system for a bit.