Text file issues: Removing Tabs and Unwanted Control Characters


evac-q8r
08-30-2004, 11:33 AM
When I copy statistics from a sports page into a file, there are a lot of tabs. Is there a way to remove all tabs using something like sed or awk (or maybe perl)? Is there a way to view the characters that are invisible or otherwise do not ordinarily show up in a file using an editor? I don't use emacs, by the way. Are there any special control characters I need to be aware of when cleaning up files like these?

Thanks a Million,

EVAC

jim mcnamara
08-30-2004, 12:06 PM
try tr - maybe something like this

cat junkfilled | tr -d '\t' > cleanfile

will get rid of tab chars for example.

tr supports classes of chars, such as [:alnum:], and you can use these with
-c -d to get rid of all characters except the ones in the character class.
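
A rough sketch of the -c -d combination described above. The sample data and the choice to keep spaces and newlines alongside the alphanumerics are assumptions for illustration, not something from the original post:

```shell
# Made-up sample input: two lines with tabs and punctuation mixed in.
printf 'Smith\t12\t.301\nJones\t7\t.288\n' > junkfilled

# -c complements the set and -d deletes, so every character that is NOT
# alphanumeric, a space, or a newline is removed. The class is [:alnum:].
tr -cd '[:alnum:] \n' < junkfilled > cleanfile

cat cleanfile
```

Leaving \n out of the set would also delete the line breaks and join the whole file into one line, so it is usually worth keeping.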

bwkaz
08-30-2004, 09:04 PM
Useless Use Of Cat Award time! :p

Try something like tr -d '\t' <junkfilled >cleanfile instead, to remove the Useless Use Of Cat.

Otherwise, if you want to remove control characters and whatnot, try sed -e 's/[[:cntrl:]]//g' junkfilled >cleanfile. The outer [ and ] delimit a set of characters, and the [:cntrl:] inside it matches any control character. Tab is itself a control character, so it needs no separate \t in the set (and outside GNU sed, a \t inside brackets would match a literal backslash or t). The s/ says "match the regex after this", and the // says "replace it with what's between us", in other words nothing. The g says "replace all occurrences on each line, not just the first".
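
A small sketch of that sed cleanup, with od -c used to make the invisible characters visible before and after. The sample data is made up, and because tab already belongs to [:cntrl:], the class is used on its own here:

```shell
# Made-up sample: a tab, a carriage return, and a BEL (\a) across two lines.
printf 'Smith\t12\r\nJones\a7\n' > junkfilled

# od -c dumps the file character by character, showing tab as \t,
# carriage return as \r, BEL as \a, and newline as \n.
od -c junkfilled

# Delete every control character within each line. sed works line by line,
# so the newline that ends each line is kept.
sed -e 's/[[:cntrl:]]//g' junkfilled > cleanfile

# The dump should now show only printable characters and \n.
od -c cleanfile
```

By contrast, tr -d '[:cntrl:]' treats the newline as just another control character and would delete the line breaks too.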

evac-q8r
08-31-2004, 09:30 PM
Did I post something illegal? I could have sworn that I posted some statistics, and now I think they have disappeared. Maybe I just forgot to submit it after I previewed my text. Anyways, thanks for the help. I was able to tackle my situation.

EVAC

bwkaz
09-01-2004, 07:16 PM
I didn't delete anything in this thread... hmm...

Anyway, if it's working, good!