sed - search and delete in an XML Array - Shell Script
pwharff
10-16-2003, 02:48 AM
Hello,
What I want to do is simple: I just want my shell script to delete the array from a text file (including the strings, obviously). Below is the text from the XML file that I want my shell script to remove. The only unique string is "Our Fun Vacation", so I'm thinking that will be what sed searches on, since the other values appear in the XML file many times. I think this can be done with sed, but I am not sure of the syntax. I really appreciate the help since I am a newbie.
<dict>
<key>AlbumName</key>
<string>Our Fun Vacation</string>
<array>
<string>5459</string>
<string>5403</string>
<string>5402</string>
<string>5401</string>
<string>5400</string>
<string>5399</string>
</array>
</dict>
bsh152s
10-16-2003, 02:00 PM
I see a fairly easy way to do this with awk/gawk. I've never really liked sed and so I'm not too familiar with it.
This will delete (or not print) the next array after the string "Our Fun Vacation". Note that this is all one continuous command.
gawk '{if($0~/Our Fun Vacation/) {foundstring=1;} if(foundstring==1 && $0~/<array>/) {del=1;} if(del!=1) {print $0;} if(foundstring==1 && del==1 && $0~/<\/array>/) {del=0; foundstring=0;} }' xmlfile_name > new_xmlfile_name
pwharff
10-16-2003, 07:09 PM
bsh152s
Thanks a bunch for your help. Maybe you could help me figure your code out.
First off, I want to delete everything shown above, from <dict> to </dict>.
Second, can you explain your awk command? I'm fairly new to awk. I've used it in several of my shell scripts before, but nothing as complicated as your command.
Thanks again for your help.
pwharff
10-17-2003, 12:46 AM
I guess I'm just gonna have to pick up a book on awk to fully understand and create my own custom awk commands.
micio
10-17-2003, 06:50 AM
I rearranged what bsh152s wrote a little and put it into a script file rather than a single command line. As you already understood, you'd better read something about awk. It's great and worth studying, and it is not too difficult. The best thing you can do is download the gawk guide from the GNU site.
Anyway, this is a short explanation: awk reads the document line by line and splits each line into whitespace-separated fields. The fields are named $1, $2, ... while $0 is the whole line.
The syntax: $0 ~ /Our Fun Vacation/ is true if the line content ($0) matches the regular expression "Our Fun Vacation".
So as long as no match has occurred, del != 1 and awk prints the line. Once the string has matched, the next line that matches <array> sets del=1 and nothing is printed. When awk reaches </array>, del is set back to 0 and awk resumes printing with the following line.
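#!/bin/awk -f
# Delete the first <array>...</array> block that follows "Our Fun Vacation".
{
    if ($0 ~ /Our Fun Vacation/) {
        foundstring = 1;    # album title seen; watch for its <array>
    }
    if (foundstring == 1 && $0 ~ /<array>/) {
        del = 1;            # start suppressing output
    }
    if (del != 1) {
        print $0;
    }
    if (foundstring == 1 && del == 1 && $0 ~ /<\/array>/) {
        del = 0;            # resume printing after </array>
        foundstring = 0;    # reset so later arrays are left alone
    }
}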
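To see the $0, $1 and ~ behaviour described above in isolation, here is a small sketch. The function name show_fields and the sample input are made up for illustration and are not part of the script above.

```shell
#!/bin/sh
# Hypothetical demo: for each input line, print the first field ($1),
# the whole line ($0), and whether the line matches the regular expression.
show_fields() {
    awk '{
        hit = ($0 ~ /Our Fun Vacation/) ? "match" : "no match"
        print "$1=" $1 " | $0=" $0 " | " hit
    }'
}

# Two sample lines taken from the XML snippet earlier in the thread.
printf '%s\n' '<string>5459</string>' '<string>Our Fun Vacation</string>' | show_fields
```

Note that on the second line $1 is only "<string>Our", because awk splits fields on whitespace by default, while $0 still holds the full line and is what the match is tested against.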
bsh152s
10-17-2003, 08:42 AM
Originally posted by pwharff
bsh152s
First off, I want to delete everything shown above, from <dict> to </dict>.
I guess I misinterpreted your question.
Anyway, if you want to delete everything from <dict> to </dict>, here's what you'll need to do (in English):
First, you'll need to find an instance of <dict>. Until one has been found, you'll just print each line. Once one is found, you'll need to start looking for either the "Our Fun Vacation" string or </dict>. While doing this, you'll need to save all of the lines from the <dict> onward instead of printing them. If </dict> is found before "Our Fun Vacation", you'll need to print out all of the saved lines. If "Our Fun Vacation" is found first, you'll skip all lines up to and including </dict>.
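The steps above could be sketched roughly like this. It is only one possible way to do it, not tested against a real album file: the function name delete_album_dict is made up, and it assumes that <dict> blocks are not nested and that each tag sits on its own line, as in the sample posted earlier.

```shell
#!/bin/sh
# Hypothetical helper: print the file, leaving out every <dict>...</dict>
# block that contains the given title.
# Assumptions: no nested <dict> blocks, one tag per line.
#   $1 = title to look for (plain substring, not a regular expression)
#   $2 = input file
delete_album_dict() {
    awk -v title="$1" '
        /<dict>/ { inblock = 1; buf = ""; drop = 0 }  # block starts: begin saving lines
        inblock {
            buf = buf $0 "\n"                         # save the line instead of printing
            if (index($0, title)) drop = 1            # title seen: this block goes
            if ($0 ~ /<\/dict>/) {                    # block ends
                if (!drop) printf "%s", buf           # title never seen: print saved lines
                inblock = 0
            }
            next
        }
        { print }                                     # outside any block: print as is
        END { if (inblock && !drop) printf "%s", buf } # unterminated block: keep it
    ' "$2"
}
```

It would be used the same way as the earlier command, for example: delete_album_dict "Our Fun Vacation" xmlfile_name > new_xmlfile_name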
By the way, here are a few links for awk tutorials: http://robert.wsi.edu.pl/awk/start.html, http://www.cs.hmc.edu/tech_docs/qref/awk.html, http://www.cs.uu.nl/docs/vakken/st/nawk/nawk_toc.html. Seriously, take a look at it because it is a very powerful tool.
pwharff
10-17-2003, 08:42 PM
Thanks a bunch guys for the help. Sometimes it's frustrating being a newbie. :) But I am learning!