How to remove blank lines by using hex?


MkIII_Supra
05-17-2007, 06:58 PM
Okay, here is the deal. I have tried the following 3 methods to remove blank lines from .csv files that I have (205 of these files).

Grep first:
grep -v '^$' Meter\ History\ 61.csv > Meter\ History\ 61A.csv

Sed second:
sed '/^$/d' Meter\ History\ 61.csv > Meter\ History\ 61A.csv

Cat third:
cat -s Meter\ History\ 61.csv | tr -d "\f" | lpr

None of these worked. So I opened one of the files up in KHexEdit to see what gives. The "blank lines" show up as 4 dots with a value of 0d 0a 0d 0a.

The KHexEdit settings are stream length Fixed 8 bit. The four dots shown in the hex editor show up as a blank line in OpenOffice AND Kate. I already have a bash script to cat these all together, but I have to figure out how the heck to get rid of the doggone blank lines without opening each file up individually.

Attached is a sample file. I can't upload a .csv here, so I had to rename it to .txt; change the extension back to .csv to use it. I am stumped (again).

Thank you,

MkIII_Supra
05-17-2007, 07:29 PM
Okay... I found an answer that seems to work!

grep -E '[0-9]+' Meter\ History\ 1.csv > 01.csv

This seems to be doing the trick, so party on!

ghostdog74
05-17-2007, 11:49 PM
Assuming you only want the lines that contain printable characters:

awk '/[[:print:]]/{print}' file > newfile

MkIII_Supra
05-18-2007, 06:27 PM
ghostdog74 - Thanks! That worked great! But I am curious, why did awk work when the others didn't?

bwkaz
05-18-2007, 07:12 PM
Those lines aren't blank. :p Yes, they look blank, but since they're using Windows newline sequences (CR+LF), they don't get treated as blank by any of the standard Linux text utilities (which treat single LF characters as newlines). To all the standard Linux text utilities, those lines would match "^<CR>$" (assuming you replace <CR> with an actual CR character), not ^$.
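To make that concrete, here is a rough sketch of the difference. The sample file and variable names are made up for illustration, and it assumes a grep that treats a CR in the pattern as an ordinary character (GNU grep does):

```shell
# Build a small CRLF sample: two data rows separated by a "blank" CRLF line.
tmpdir=$(mktemp -d)
printf 'a,1\r\n\r\nb,2\r\n' > "$tmpdir/sample.csv"

# '^$' removes nothing, because the "blank" line still holds a CR.
plain_count=$(grep -c -v '^$' "$tmpdir/sample.csv")

# Put an actual CR character into the pattern instead.
cr=$(printf '\r')
grep -v "^${cr}\$" "$tmpdir/sample.csv" > "$tmpdir/clean.csv"
clean_count=$(grep -c '' "$tmpdir/clean.csv")

echo "lines kept by '^\$': $plain_count"
echo "lines kept by '^<CR>\$': $clean_count"
```

On this sample the first count should be 3 (nothing removed) and the second 2. The surviving lines still end in CR+LF.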

But here's the confusing part: Most Linux text editors (...well, at least vim and gvim; I'm not sure about emacs / kate / whatever else) will autodetect the line ending being used by the file, and suppress the extra CR characters if all the lines have them. So the line still looks blank, until you use a hex editor.
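If you don't have a hex editor handy, od can show the hidden characters from the command line. A quick sketch on a made-up sample file (the names here are just for illustration):

```shell
# Same kind of file as in the question: CRLF line endings with a "blank" line.
tmpdir=$(mktemp -d)
printf 'a,1\r\n\r\nb,2\r\n' > "$tmpdir/sample.csv"

# od -c prints each byte as a character, with escapes such as \r and \n.
dump=$(od -c "$tmpdir/sample.csv")
echo "$dump"

# Count the CR bytes directly: keep only CRs, then count what is left.
cr_count=$(tr -cd '\r' < "$tmpdir/sample.csv" | wc -c | tr -d ' ')
echo "CR characters: $cr_count"
```

Each \r in the od output is one of the 0d bytes KHexEdit was showing; a file with Unix line endings would report zero.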

That awk script works because instead of matching empty lines, it's matching lines that have at least one printable character on them. (The CR character is non-printable.)

An alternative would be to run the file through dos2unix, then do any of your sed/grep -v/whatever commands on it, then run it back through unix2dos. (I believe both of those commands can be used in a pipeline, too.) This will lose information if any line is supposed to end in a bare LF with no CR, because it will come back out as CR+LF, but that shouldn't happen in CSV files.
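Roughly, that round trip looks like the sketch below. In case dos2unix and unix2dos aren't installed, it uses tr and awk as stand-ins, wrapped in two hypothetical helper functions (to_unix and to_dos are my names, not real tools). Note that tr -d '\r' drops every CR, not just the ones in front of an LF, which is fine for a pure CRLF file.

```shell
tmpdir=$(mktemp -d)
printf 'a,1\r\n\r\nb,2\r\n' > "$tmpdir/sample.csv"

# Stand-ins for the real converters (assumptions for this sketch):
to_unix() { tr -d '\r'; }                      # strip all CR characters
to_dos()  { awk '{ printf "%s\r\n", $0 }'; }   # end every line with CR+LF

# Convert, delete the now genuinely empty lines, convert back.
to_unix < "$tmpdir/sample.csv" | sed '/^$/d' | to_dos > "$tmpdir/clean.csv"

line_count=$(grep -c '' "$tmpdir/clean.csv")
cr_count=$(tr -cd '\r' < "$tmpdir/clean.csv" | wc -c | tr -d ' ')
echo "lines: $line_count, CRs: $cr_count"
```

With the real tools installed, the two functions could be swapped for dos2unix and unix2dos reading from standard input. To handle all the files in one go, the pipeline line could sit inside a for loop over *.csv, writing each result to a new name.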