Linux_cat
01-05-2005, 11:07 AM
Right, I hope someone can help or point me in the right direction. I know that Perl would solve the problem, but I don't know it and don't have time to learn it, as this has to be done within a week or so.
I have two text files that merge into one; each file holds around 15,000,000 lines of data, and they compress into a 5,000,000-line file. I have recreated the rules that it matches on and have written the following script:
#!/bin/bash
YMOUT031=${1:?"requires an argument" }
YMOUT015=${2:?"requires an argument" }
zcat $YMOUT031 > YMOUT31unzipped.$$
zcat $YMOUT015 |cut -c7-11,18,18-24,46,47-52,75- > YMOUT15unzipped.$$
while read line_text
do
    field=`echo $line_text|awk -F'|' '{printf $5"|"$4"|"$3"\n"}'`
    matchedrows=`grep "$field" YMOUT15unzipped.$$|tr '\n' ' '`
    if [ "$matchedrows" == "" ]
    then
        echo "<$line_text>" >> nomatches
    else
        echo "The row: $line_text Matches:$matchedrows " >> matches #this is where the error lies
    fi
done < "YMOUT31unzipped.$$"
Although this works, it is way too slow: each pass takes around 4-5 seconds to grep and pipe the result to either output file, so the whole run will take me well over half a year to complete!
Does anyone know of a way to speed up the script dramatically, so that it takes only a couple of days, or a week at most?
Any help will be much appreciated.