awk/nawk question
Linux_cat
12-22-2004, 06:46 AM
Hello beautiful and wonderful people of this forum, I was wondering if anyone could answer a quick question.
When working with awk/nawk, how would one print specific characters in a field? E.g.,
say the following was a field in a text file:
afreessaW2005jjkkajkjd|
How would I print or isolate just the W2005, filtering out the other characters around it?
Thank you, I wait eagerly for a response :)
Linux_cat
12-22-2004, 07:52 AM
Please help?
DrChuck
12-22-2004, 11:20 AM
Extracting a substring in bash, 9 characters long, beginning at offset 8 (counting from 0):
[drchuck@dphn531 ~]$ txt="afreessaW2005jjkkajkjd|"
[drchuck@dphn531 ~]$ echo ${txt:8:9}
W2005jjkk
An awk 1-liner is best for splitting text by delimiters, like whitespace or csv, and you don't have any of those in your example.
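For completeness, awk can also pull a substring out of a field. In the bash form above, ${txt:8:5} would return just W2005. The sketch below shows two awk equivalents: one by position with substr(), one by pattern with match(). The pattern (a capital letter followed by four digits) is only an assumption about what the key looks like.

```shell
txt="afreessaW2005jjkkajkjd|"

# substr() counts from 1, so bash offset 8 becomes awk position 9.
by_position=$(echo "$txt" | awk '{ print substr($0, 9, 5) }')

# match() sets RSTART and RLENGTH when the pattern is found.
# Assumed key shape: one capital letter followed by four digits.
by_pattern=$(echo "$txt" | awk 'match($0, /[A-Z][0-9][0-9][0-9][0-9]/) { print substr($0, RSTART, RLENGTH) }')

echo "$by_position"
echo "$by_pattern"
```

The position form only works if the key always sits at the same column; the pattern form tolerates a varying amount of junk in front of it.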
Cheers,
Linux_cat
12-22-2004, 12:22 PM
Thank you so much, your help is very much appreciated, you're the man.
Linux_cat
12-22-2004, 12:31 PM
Also, an awk one-liner? I am working with some very big text files with '|' as the delimiter; could you give me an example of what you're talking about?
Thank you.
bsh152s
12-22-2004, 03:11 PM
To specify the field separating character, just use the -F option like so:
awk -F'|' '{print $1,$2,$3;}' yourfile.txt
Linux_cat
12-22-2004, 03:40 PM
Thanks bsh152s, however I was aware of that option already.
My basic problem is that I have two very large text files output from a database that merge into one. When merging the files, I need to match certain columns so that they can be put into the right place in the merged file. I have been given the wonderful task of testing it and recreating the rules that the system will use.
My main problem is that one of the files has clearly defined columns with values, e.g.
W2001|GHNJJS|123456
However, the text file I am matching it against uses those values inside a key, amongst other data in the field, e.g.
DFKKE?.W2001LLSKII|WORLD______GHNJJS____1|QWERSDFT __123456__|
I need to isolate the exact strings in the columns in the second file to match them, which means stripping away the rest of the characters around them.
Does this make it clearer for anyone?
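One way to sketch this kind of match in awk alone, without stripping the surrounding characters first, is to test whether each clean value occurs anywhere inside the corresponding noisy field using index(). The file names and the second noisy row below are made up for illustration, and the assumption is that column N of the clean file corresponds to column N of the noisy one.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'W2001|GHNJJS|123456' > "$workdir/clean.txt"
printf '%s\n' 'DFKKE?.W2001LLSKII|WORLD______GHNJJS____1|QWERSDFT __123456__|' \
              'DFKKE?.X9999LLSKII|WORLD______ZZZZZZ____1|QWERSDFT __000000__|' > "$workdir/noisy.txt"

matches=$(awk -F'|' '
    # First file: remember each clean record and its three key values.
    NR == FNR { n++; k1[n] = $1; k2[n] = $2; k3[n] = $3; rec[n] = $0; next }
    # Second file: index() returns non-zero when the key occurs inside the field.
    {
        for (i = 1; i <= n; i++)
            if (index($1, k1[i]) && index($2, k2[i]) && index($3, k3[i]))
                print rec[i] " <=> " $0
    }
' "$workdir/clean.txt" "$workdir/noisy.txt")

echo "$matches"
rm -rf "$workdir"
```

This compares every clean record against every noisy row, so it gets slow on very large files; it is meant to show the idea, not to be the fastest approach.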
chrism01
12-23-2004, 07:04 PM
You could use something like:
file1_rec='W2001|GHNJJS|123456'
tmp=`echo "$file1_rec" | cut -d'|' -f1`
echo "$tmp"
# file2_rec holds one record read from the second file
echo "$file2_rec" | grep "$tmp"
if [[ $? -eq 0 ]]
then
echo file rec matched
fi
You'd need to create a loop to extract the recs from the files.
Personally, I'd say it's a classic Perl problem, but that assumes you know Perl.
Linux_cat
12-27-2004, 10:08 PM
Thanks for your help.
I don't know Perl; in fact I am in the process of learning shell scripting and awk/grep. I've written the following so far to solve my problem:
#!/bin/bash
YMOUT031=${1:?"requires an argument" }
YMOUT015=${2:?"requires an argument" }
zcat $YMOUT031 > YMOUT31unzipped.$$
while read line_text
do
field=`echo $line_text|nawk -F'|' '{print $5,$4,$3}'`
matched=`zcat $YMOUT015|nawk -F'|' '{print $1,$2,$3}'|grep "$field"`
echo "The row: <$line_text> \tMatches:$matched" >> matched
done < "YMOUT31unzipped.$$"
This is the basic skeleton; I'm going to need to modify it with extra commands if I am to filter out the characters I don't need in the target fields.
Can this problem not be solved with awk? Would I need to learn Perl as well?
Thanks again.
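For what it's worth, the lookup itself can probably be done in a single awk pass instead of running zcat and grep once per input line. The sketch below assumes the keys match exactly (fields 5, 4, 3 of one file against fields 1, 2, 3 of the other, as in the script above); the file names and sample rows are made up.

```shell
workdir=$(mktemp -d)
# Made-up sample rows in the layout the script assumes.
printf '%s\n' '771037|2001-11-27-00:00|B01018|CBLCBH|W2001| 0| 0' > "$workdir/ymout031.txt"
printf '%s\n' 'W2001|CBLCBH|B01018|FDF|2002-04-29-00:00' \
              'S2002|ICFYXA|Y01064|FDF|2002-05-06-00:00' > "$workdir/ymout015.txt"

joined=$(awk -F'|' '
    # First file (015): index rows by their first three fields.
    NR == FNR { key = $1 "|" $2 "|" $3; rows[key] = rows[key] " " $0; next }
    # Second file (031): build the same key from fields 5, 4 and 3.
    { key = $5 "|" $4 "|" $3; if (key in rows) print "<" $0 "> Matches:" rows[key] }
' "$workdir/ymout015.txt" "$workdir/ymout031.txt")

echo "$joined"
rm -rf "$workdir"
```

The array lookup reads each file once, which should matter a lot on very big inputs. It holds the whole first file in memory, so that trade-off is worth checking against the real file sizes.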
Linux_cat
12-28-2004, 02:24 PM
Right, I've figured out exactly what I need to do to isolate the characters I need, so that's sorted; thank you chrism01 for that. However, there seems to be an error in my script and I can't figure out what it is. I get the following output from the debug option:
+ YMOUT031=ymout031_23012004_1307200409.z
+ YMOUT015=ymout015031_07092004_2047080.z
+ zcat ymout031_23012004_1307200409.z
+ read line_text
++ echo '771037|2001-11-27-00:00|B01018|CBLCBH|W2001|' '0|' 0
++ nawk '-F|' '{print $5,$4,$3}'
./script: line 1: nawk: command not found
+ field=
++ zcat ymout015031_07092004_2047080.z
++ nawk '-F|' '{print $1,$2,$3}'
./script: line 1: nawk: command not found
++ grep ''
+ matched=
+ echo 'The row: <771037|2001-11-27-00:00|B01018|CBLCBH|W2001| 0| 0> \tMatches:'
Can anyone please help? I'm new to scripting and am trying my best to get my head round this stuff... thank you.
bwkaz
12-28-2004, 03:51 PM
Originally posted by Linux_cat
./script: line 1: nawk: command not found
<...>
./script: line 1: nawk: command not found
You don't have nawk installed. You probably have gawk instead; nawk is the older implementation and isn't installed by default on most Linux distributions, while gawk is actively maintained. Alternatively, you could just use awk instead, for portability: whichever implementation is installed normally provides a symlink named awk pointing at it.
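If the script has to run on machines with different awk implementations, one option is to pick whichever one is present at the top of the script. A rough sketch, where the candidate list and the awk_cmd variable name are just illustrative:

```shell
awk_cmd=""
# command -v prints the path of a command and fails if it is not installed.
for candidate in nawk gawk mawk awk; do
    if command -v "$candidate" >/dev/null 2>&1; then
        awk_cmd=$candidate
        break
    fi
done

echo "using: ${awk_cmd:-none found}"
```

The rest of the script would then call "$awk_cmd" -F'|' ... instead of a hard-coded nawk. Plain awk is usually enough unless a feature of one specific implementation is needed.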
Linux_cat
12-28-2004, 04:47 PM
cheers mate, I was going mad as to why it was not working, you just saved me some hair :).
Linux_cat
01-04-2005, 10:59 AM
Right next question,
dudes, in my following script
#!/bin/bash
YMOUT031=${1:?"requires an argument" }
YMOUT015=${2:?"requires an argument" }
zcat $YMOUT031 > YMOUT31unzipped.$$
zcat $YMOUT015 > YMOUT15unzipped.$$
while read line_text
do
field=`echo $line_text|awk -F'|' '{printf $5"|"$4"|"$3"\n"}'`
matchedrows=`cat YMOUT15unzipped.$$|cut -c7-11,18,18-24,46,47-52,75-|grep "$field"`
echo "The row: <$line_text> \t Matches:$matchedrows \n"
done < "YMOUT31unzipped.$$"
The escape characters in echo "The row: <$line_text> \t Matches:$matchedrows \n" are not working! Does anyone know why? I need to do a line count on each group of matches, so I thought that putting an escape character after the matches would be fine. Obviously not! Does any genius know what's wrong?
Also, does anyone know how I can modify my script so that it doesn't print any lines if there is no match? At the moment it prints the following even if there is not a match: "The row: <771037|2001-12-04-00:00|Y01064|ICFYXA|S2002| 0| 0> \t Matches:", which is going to make my matched line count inaccurate!
Linux_cat
01-04-2005, 12:46 PM
OK, I've figured out how to get the script to print non-matches to a separate file. My only question now is: why aren't my escape characters working?
bwkaz
01-04-2005, 07:42 PM
/bin/echo requires a -e option to enable substitution of escape characters. bash's builtin echo requires the same thing.
Check the manpage for echo and also the one for bash (search through it for "echo" to find the relevant builtin). ;)
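An alternative that behaves the same across shells is printf, which always interprets escapes in its format string. A small sketch with shortened sample values standing in for the real rows:

```shell
line_text='771037|2001-12-04-00:00|Y01064|ICFYXA|S2002'
matchedrows='S2002|ICFYXA|Y01064'

# printf interprets \t and \n in the format string; %s inserts the data verbatim.
formatted=$(printf 'The row: <%s>\tMatches:%s\n' "$line_text" "$matchedrows")
printf '%s\n' "$formatted"
```

Keeping the data in %s arguments also means a stray backslash or percent sign in a row is printed as-is rather than being treated as an escape.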
Linux_cat
01-05-2005, 05:42 AM
Thank you bwkaz, yet again your wealth of knowledge has pointed me in the right direction; I now have the escape characters working.
I am however not getting the result I need. I get multiple matches to one line and want to count the row and its matches as 1 line, but each match is still printing on a separate line. How would I overcome this problem? E.g.,
the following is counted as 11 or so lines:
The row: <771037|2001-12-04-00:00|Y01064|ICFYXA|S2002| 0| 0> Matches:S2002|ICFYXA|Y01064|FDF|2002-04-29-00:00|1| |10080|0
S2002|ICFYXA|Y01064|FDF|2002-05-06-00:00|1| |10080|0
S2002|ICFYXA|Y01064|FDF|2002-05-13-00:00|1| |10080|0
S2002|ICFYXA|Y01064|FDF|2002-05-20-00:00|1| |10080|0
S2002|ICFYXA|Y01064|FDF|2002-05-27-00:00|1| |10080|0
S2002|ICFYXA|Y01064|FDF|2002-06-03-00:00|1| |10080|0
S2002|ICFYXA|Y01064|FDF|2002-06-10-00:00|1| |10080|0
S2002|ICFYXA|Y01064|FDF|2002-06-17-00:00|1| |10080|0
S2002|ICFYXA|Y01064|FDF|2002-06-24-00:00|1| |10080|0
S2002|ICFYXA|Y01064|FDF|2002-07-01-00:00|1| |10080|0
I need this to be counted as 1.
Thanks again for all your help.
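One possible way to get a single line per row is to join the matches with paste and count them separately with grep -c, rather than relying on the line count of the report. The sample rows below are shortened and partly made up; grep -F treats the key as a fixed string, so characters like ? or . in a key are not read as regex syntax.

```shell
matchedrows=$(printf '%s\n' 'S2002|ICFYXA|Y01064|FDF|2002-04-29-00:00' \
                            'S2002|ICFYXA|Y01064|FDF|2002-05-06-00:00' \
                            'W2001|CBLCBH|B01018|FDF|2002-05-13-00:00' |
              grep -F 'S2002|ICFYXA|Y01064')

# Join all matches onto one line, separated by single spaces.
one_line=$(printf '%s\n' "$matchedrows" | paste -s -d' ' -)
# Count the non-empty match lines.
match_count=$(printf '%s\n' "$matchedrows" | grep -c .)

echo "Matches($match_count): $one_line"
```

With the count stored on the line itself, wc -l on the report gives the number of matched rows and the per-row match totals are still available.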
Linux_cat
01-05-2005, 06:50 AM
Right, cancel my last question, I used the tr command to sort it out. Thanks for all the help; below is my current script:
#!/bin/bash
YMOUT031=${1:?"requires an argument" }
YMOUT015=${2:?"requires an argument" }
zcat $YMOUT031 > YMOUT31unzipped.$$
zcat $YMOUT015 > YMOUT15unzipped.$$
while read line_text
do
field=`echo $line_text|awk -F'|' '{printf $5"|"$4"|"$3"\n"}'`
matchedrows=`cat YMOUT15unzipped.$$|cut -c7-11,18,18-24,46,47-52,75-|grep "$field"|tr '\n' ' '`
if ["$matchedrows" = ""]
then
echo "<$line_text>" >> nomatches
else
echo -e "The row: <$line_text> Matches:$matchedrows ">> matches
fi
done < "YMOUT31unzipped.$$"
When I run it, however, I get the following output to the screen, which is baffling me as I can't see what is wrong:
./script: [S2002|ICFYXA|Y01064|FDF|2002-04-29-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-05-06-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-05-13-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-05-20-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-05-27-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-06-03-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-06-10-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-06-17-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-06-24-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-07-01-00:00|1| |10080|0 : command not found
./script: [S2002|ICFYXA|Y01064|FDF|2002-04-29-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-05-06-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-05-13-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-05-20-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-05-27-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-06-03-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-06-10-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-06-17-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-06-24-00:00|1| |10080|0 S2002|ICFYXA|Y01064|FDF|2002-07-01-00:00|1| |10080|0 : command not found
What is the "command not found" about?
Thanks.
Linux_cat
01-05-2005, 09:21 AM
Don't worry, I figured it out yet again: I didn't put an extra '=' in the test brackets.
Thanks anyway, everyone has been very helpful.
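For anyone who hits the same "command not found" message: in bash, [ is itself a command, so it needs a space after it and another before the closing ]. Without the spaces, the shell glues the bracket onto the variable's value and tries to run the result as a command name, which is likely what produced the output above. Both = and == are accepted inside bash's [ ]. A minimal sketch of the corrected test:

```shell
matchedrows=""

# "[" is a command name, so it needs a space after it and before the closing "]".
if [ "$matchedrows" = "" ]; then
    result="no match"
else
    result="match"
fi
echo "$result"

# -z tests for an empty string directly.
if [ -z "$matchedrows" ]; then
    echo "still empty"
fi
```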