Word replacement in a large group of files


Fryguy8
01-07-2005, 01:29 AM
Say I've got a LARGE number of files in a directory structure, and I want to replace every instance of the word "hello" with "goodbye". What's the easiest way to do this?

DrChuck
01-07-2005, 08:25 AM
For each text file, I would suggest a "sed" filter to do the replacement. To descend through a directory tree, you could wrap a bash script around this, using pushd and popd to move through the subdirectories. Here is an example from my collection:
BEGIN CODE
---------------------------
#!/bin/bash
down_1_level ()
{
    # this function is called recursively to descend through a directory tree
    local list name
    list=( `ls` ) # note - `ls` does not handle spaces in file names
    for name in "${list[@]}"; do
        if [ -d "$name" ]; then # test for directories
            # push $PWD to the stack and enter directory $name;
            # skip it if it cannot be entered, to avoid recursing in place
            pushd "$name" &> /dev/null || continue
            echo "push $PWD"
            down_1_level # repeat on the current directory
            popd &> /dev/null # pop the previous directory off the stack
            echo "pop $PWD"
        elif [ -f "$name" ]; then # test for regular files
            echo "file $name"
            # sed filter goes here
        else
            echo "D_OH!"
        fi
    done
}
down_1_level
exit
---------------------------
END CODE
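
As a rough sketch of what could go where the "sed filter goes here" comment sits: the helper below rewrites one file through a temporary file. The name replace_in_file is made up for illustration, and it assumes mktemp is available on your system.

```shell
#!/bin/sh
# Hypothetical helper (not part of the script above): replace every
# "hello" with "goodbye" in the single file given as $1.
replace_in_file() {
    # Write to a temporary file first, so a failing sed cannot truncate the original
    tmp=$(mktemp) || return 1
    # Copy the result back with cat so the original keeps its permissions
    sed 's/hello/goodbye/g' "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Inside the loop it would be called as replace_in_file "$name", with the quotes, so that a name containing spaces is passed as one argument.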

Hope this helps,

Fryguy8
01-07-2005, 12:47 PM
Would something like


grep -ril [keyword] * | xargs <sed command>


work?
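
Filled in, that idea might look like the sketch below. It assumes GNU grep (for -Z), GNU xargs (for -0 and -r) and GNU sed 4.x (for -i); the function name is made up. The -i from the grep line above is left out here, because the sed substitution is case-sensitive anyway.

```shell
#!/bin/sh
# Hypothetical wrapper: edit in place every file under directory $1
# that contains "hello".
replace_with_grep() {
    # grep: -r recurse, -l print only the names of matching files,
    #       -Z end each name with a NUL byte instead of a newline
    # xargs: -0 split the input on NUL bytes, -r do nothing if no file matched
    grep -rlZ 'hello' "$1" | xargs -0 -r sed -i 's/hello/goodbye/g'
}
```

Passing the names NUL-separated is what keeps file names with spaces in one piece; plain xargs would split them.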

ph34r
01-07-2005, 01:14 PM
for i in `ls -1`
do
    cat $i | sed s/hello/goodbye/g > $i-fixed
done

bwkaz
01-07-2005, 08:19 PM
Originally posted by ph34r
for i in `ls -1`

As long as no filenames have spaces in them... Come to think of it, xargs will also break if filenames have spaces, since by default it splits its input on blanks as well as on newlines (which are also valid characters in filenames).

Try this instead:

find . -type f -exec sed -i.bak s/hello/goodbye/g {} \;

Handles spaces in filenames with ease. ;) You have to have GNU sed 4.0 or higher for the -i.bak part to work, though. (It does in-place substitution, after copying the original file to <filename>.bak as a backup.) If you have GNU sed 3.x, or non-GNU sed, then you'll have to do some magic escaping for the sed command to redirect its output, but it is possible.

find is also recursive by default, so no messy huge bash scripts. :p
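
For a sed without -i, one possible shape for that escaping is sketched below: find hands the names to a small inline sh script, which does the backup and the redirect itself. The function name is made up, and the .bak suffix just mirrors the -i.bak example above.

```shell
#!/bin/sh
# Hypothetical wrapper: same effect as "sed -i.bak" for every file under $1,
# using only a redirect inside an inline sh script.
replace_with_backup() {
    # Skip *.bak so that backups created by an earlier batch are not processed
    find "$1" -type f ! -name '*.bak' -exec sh -c '
        # The file names arrive as positional parameters, so spaces and
        # newlines in them are never re-parsed by the shell
        for f in "$@"; do
            cp -p "$f" "$f.bak" && sed "s/hello/goodbye/g" "$f.bak" > "$f"
        done
    ' sh {} +
}
```

Because sed reads from the backup and writes into the existing file, the original keeps its permissions.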

pickarooney
01-08-2005, 07:37 AM
Originally posted by ph34r
for i in `ls -1`
do
    cat $i | sed s/hello/goodbye/g > $i-fixed
done

ls -1 | while IFS= read -r i
do
    cat "$i" | sed s/hello/goodbye/g > "$i"-fixed
done

should avoid problems with spaces

bwkaz
01-08-2005, 10:53 AM
Originally posted by pickarooney
should avoid problems with spaces

But not newlines, which are legal in filenames also.

EVERYTHING except forward slash (/) and the byte with value 0 is legal in a filename.
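
That is also why the NUL byte is the only separator that can never collide with a file name. A small sketch of the difference, with made-up function names, assuming a find that supports -print0 (GNU and the BSDs do):

```shell
#!/bin/sh
# Count the regular files under directory $1 in two ways.

# One line per name: a name containing a newline is counted more than once.
count_by_lines() {
    find "$1" -type f | wc -l
}

# One NUL byte per name: tr keeps only the NUL bytes, wc counts them.
count_by_nul() {
    find "$1" -type f -print0 | tr -dc '\000' | wc -c
}
```

For a directory that holds a single file with a newline in its name, the first count comes out too high and the second one is correct.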

pickarooney
01-10-2005, 03:41 PM
Just out of curiosity, how would a filename with a newline display when you list its parent directory or look at it in Konqueror?

bwkaz
01-10-2005, 08:13 PM
Like this:

$ touch 'test
> file'
$ ls
test?file

That won't work in your loop. But it also depends on the options you give to ls, and on your version of that utility. It has a -q option, which the manpage says is "allowed to be the default option when outputting to a terminal" in POSIX mode (and which is the default in GNU ls when the output is a terminal). Of course, when piping the output into a loop, it's not outputting to a terminal, so -q is not in effect there: GNU ls writes the raw newline, and the loop reads the name as two separate lines. There is also a -b option (with a few GNU long option equivalents), which tells it to output non-graphic characters as escapes. That does this:

$ ls -b
test\nfile

That still won't work in the loop. There is also a -Q option, which does this:

$ ls -Q
"test\nfile"

That also won't work (the \n can't be in there; it has to be an actual newline). There is also a --quoting-style long option, which sounds like it might work when set to "shell" or "shell-always", except:

$ ls --quoting-style=shell
'test?file'
$ ls --quoting-style=shell-always
'test?file'

So nope.

In short: none of these options work. You need to either use find or, if you don't want it to be recursive, a loop constructed like:

for file in * ; do
    # stuff...
done
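
With the "stuff" filled in for the hello/goodbye case, the non-recursive loop might look like the sketch below. The function name is made up, and the -fixed suffix is borrowed from the earlier posts in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: for every regular file directly inside directory $1,
# write a copy with "hello" replaced by "goodbye" to <name>-fixed.
replace_in_dir() {
    # The shell expands the glob itself, so each name stays one word
    # no matter what characters it contains
    for file in "$1"/*; do
        # Skip subdirectories, and the literal pattern left over when nothing matches
        [ -f "$file" ] || continue
        sed 's/hello/goodbye/g' "$file" > "$file-fixed"
    done
}
```

Note that * does not match names beginning with a dot, so hidden files are left alone.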