Combining Columns in a text file


lagdawg
06-07-2007, 10:12 AM
I recently needed to take a text file, copy the first column, and paste the copy in between the second and third columns.

I.e., a file with columns A-D:

A B C D

would become:

A B A C D

I was able to accomplish this by using the cut command to create three txt files that looked like:

A B
A
C D

and then used the paste command to combine the three files into one with the proper columns. While this worked okay, the original file resides on a Windows system, so to use the cut and paste commands I had to FTP the file to the UNIX server and then FTP it back to my computer. Otherwise I would have had to use Excel to make the change.

I am interested in other, more efficient ways of accomplishing the same thing. The example above is much simplified: the files I am dealing with may have up to one hundred columns and up to 100,000 lines of text to sift through, so using the cut command creates a lot of extra files on the UNIX server that I must keep track of. Using Excel is just a pain. I would be interested in any command-line tricks that give the same result.

As a side note, our Windows system includes the UXDOS suite of tools, which try to emulate the UNIX command line; however, these commands do not include cut and paste. So if there is a way to do this with commands included in UXDOS, it would save me the trouble of FTPing the files back and forth.

TIA

ph34r
06-07-2007, 10:31 AM
You can get win32 versions of most of the GNU utils ... cut is included, I know that ... google for GnuWin32

JuiceWVU202
06-07-2007, 05:57 PM
seems to me like awk would work great for this
awk '{print $1, $2, $1, $3, $4}'
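
A minimal sketch of how the awk idea could handle any number of columns, assuming whitespace-separated input. Assigning to $2 makes awk rebuild the line, so runs of spaces or tabs come out as single spaces. The function name dup_first_after_second is made up for illustration and is not from the thread.

```shell
# dup_first_after_second: hypothetical helper name (an assumption).
# Reads whitespace-separated lines from the named files, or from
# standard input when no file is given, and writes each line with a
# copy of column 1 inserted after column 2. Lines with fewer than two
# columns are passed through.
dup_first_after_second() {
    awk 'NF >= 2 { $2 = $2 OFS $1 } { print }' "$@"
}

printf 'A B C D\n' | dup_first_after_second    # prints: A B A C D
```

Because the remaining columns are never named, the same command works for four columns or a hundred.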

hotcold
06-07-2007, 09:51 PM
Hi, lagdawg.

I have often been unsatisfied with cut in Linux. I put together a little perl script that does what I thought cut should do -- write the columns in the order the user specifies. I wanted it to handle ranges as does cut, like 53-101, for convenience, as well as being able to change the field separator.

The code is not yet complete, but here's a test with your data in a number of forms, as called from a shell script:
#!/bin/sh

# @(#) s3 Beta test perl script arrange.

A=./arrange

FILE=data2
echo
echo " $FILE:"
cat $FILE
echo

$A -c "1 2 1 3 4" $FILE

FILE=data3
echo
echo " $FILE:"
cat $FILE
echo

$A -s, -c "1 2 1 3 4" $FILE

FILE=data4
echo
echo " $FILE:"
cat $FILE
echo

$A -c "1-2 1 3-4" -s "|" $FILE

FILE=data5
echo
echo " $FILE:"
cat $FILE
echo

$A -c "1-2 1 3-4" -s "|" $FILE

exit 0

which produces:
% ./s3

data2:
A B C D

A B A C D

data3:
A,B,C,D

A,B,A,C,D

data4:
A|B|C|D

A|B|A|C|D

data5:
A A|B B| C C |D D

A A|B B|A A| C C |D D
If you had perl on the Windows box, you might be able to run it there, although perhaps not from a shell script ... cheers, hotcold
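
The input files themselves are not listed in the post. A minimal way to recreate them for a test run, assuming each one holds exactly the single line echoed in the output above (the contents are inferred, not confirmed by the author):

```shell
# Recreate the four test inputs in the current directory.
# Contents are inferred from the output shown in the post (an assumption).
printf 'A B C D\n'           > data2   # space-separated
printf 'A,B,C,D\n'           > data3   # comma-separated
printf 'A|B|C|D\n'           > data4   # pipe-separated
printf 'A A|B B| C C |D D\n' > data5   # pipe-separated, spaces inside fields
```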

ghostdog74
06-08-2007, 02:28 AM
here's a vbscript for use in windows

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("c:\temp\yourfile", 1)
Do Until objFile.AtEndOfStream
    strNextLine = objFile.ReadLine
    MyArray = Split(strNextLine, " ", -1, 1)
    first = MyArray(0)
    newA = InsertAfter(MyArray, 1, " ", first)
    WScript.Echo Join(newA," ")
Loop
objFile.Close
Function InsertAfter(TheArray, TheLocation, TheDelimiter, strNew)
    TheArray(TheLocation) = TheArray(TheLocation) & TheDelimiter & strNew
    strTemp = join(TheArray, TheDelimiter)
    InsertAfter = split(strTemp, TheDelimiter)
End Function

lagdawg
06-08-2007, 09:02 AM
ghostdog, I am trying to stay away from VBscript as I want something that is more platform independent.

Hotcold, I have access to perl as I use it quite often. I would be interested in your perl program.

JuiceWVU, while I can use AWK your command isn't returning anything for me.

Here is a shell script I whipped up recently:

# Extract the columns needed: (A B), (A), (C D)
cut -d' ' -f1,2 "$1" >"$1.1"
cut -d' ' -f1 "$1" >"$1.2"
cut -d' ' -f3- "$1" >"$1.3"

# Combine the columns into one file (paste joins with tabs unless -d is given)
paste -d' ' "$1.1" "$1.2" "$1.3" >"$1.out"

# Remove the temp files created during the process.
rm "$1.1" "$1.2" "$1.3"


But this code still isn't very flexible, and I would need to change it every time the columns changed, which is why your perl script is interesting to me, hotcold.

Thanks
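
One way to avoid editing the script every time the columns change is to pass the column order in as an argument, with no temp files. The following is a rough sketch in awk, not anyone's posted solution. The name arrange_cols, its argument order, and the "1-2 1 3-4" list syntax are assumptions chosen for illustration; open-ended ranges such as "3-" are not handled.

```shell
# arrange_cols: hypothetical helper; name and interface are assumptions.
# Usage: arrange_cols "COLUMN LIST" [SEPARATOR] < input
#   COLUMN LIST  space- or comma-separated columns or ranges, e.g. "1-2 1 3-4"
#   SEPARATOR    a single character; defaults to a space
arrange_cols() {
    awk -v spec="$1" -v sep="${2:- }" '
    BEGIN {
        # Turn the column list into parallel arrays of range bounds.
        n = split(spec, parts, /[ ,]+/)
        for (k = 1; k <= n; k++) {
            m = split(parts[k], r, "-")
            lo[k] = r[1] + 0
            hi[k] = (m > 1) ? r[2] + 0 : lo[k]
        }
    }
    {
        split($0, f, sep)
        out = ""
        first = 1
        # Emit the requested columns in the requested order.
        for (k = 1; k <= n; k++)
            for (i = lo[k]; i <= hi[k]; i++) {
                out = out (first ? "" : sep) f[i]
                first = 0
            }
        print out
    }'
}

printf 'A B C D\n' | arrange_cols "1 2 1 3 4"        # prints: A B A C D
printf 'A,B,C,D\n' | arrange_cols "1-2 1 3-4" ","    # prints: A,B,A,C,D
```

With the default separator, awk splits on any run of blanks, so spaces or tabs in the input come out as single spaces.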

hotcold
06-08-2007, 10:08 AM
Hi.

This should work for your purposes. I don't know how Windows will pass along parameters to perl scripts -- that's certainly a key issue in the convenience of arrange.
#!/usr/bin/perl

# @(#) arrange Arrange columns in user-specified order.
# $Id: arrange,v 1.3 2007/06/08 13:59:08 hotcold Exp hotcold $

# Beta version; no warranties, no maintenance.

use warnings;
use strict;

my ($debug);
$debug = 1;
$debug = 0;

my ( $c, $s );
my ( $i, $j );
my ( @a, @columns, @count );
my ( $first, $last, $leftmost, $howmany );
our ( $opt_a, $opt_b, $opt_c, $opt_s );
$opt_a = $opt_b = $opt_c = $opt_s = undef;

use Getopt::Std;
print " args before getopts :@ARGV:\n" if $debug;
getopts("c:s:");
print " args after getopts :@ARGV:\n" if $debug;

# $a = defined($opt_a) ? "a-opposite" : "a-default (switch)";
# $b = defined($opt_b) ? "b-opposite" : "b-default (switch)";
$c = defined($opt_c) ? $opt_c : "c-default";
$s = defined($opt_s) ? $opt_s : " ";

# my(@arg_names) = qw/ a b c d /;
my (@arg_names) = qw/ c s /;

$j = 0;

# foreach $i ( ($a,$b,$c,$d) ) {
foreach $i ( ( $c, $s ) ) {
    if ( defined($i) ) {
        print "item $j ($arg_names[$j]) is :$i:\n" if $debug;
    }
    else {
        print "item $j ($arg_names[$j]) is not defined.\n" if $debug;
    }
    $j++;
}

@columns = split( /[ ,]/, $c );
print " columns :@columns:\n" if $debug;

# foreach $i ( @columns ) {
for ( $i = 0; $i <= $#columns; $i++ ) {
    ( $first, $last ) = split( /[-]/, $columns[$i] );
    if ( not defined $last ) {
        push @count, 1;
    }
    else {
        push @count, $last - $first + 1;
    }
    $columns[$i] = $first - 1;
}
print " columns :@columns:\n" if $debug;
print " count :@count:\n" if $debug;

# Insert processing code here.

while (<>) {
    chomp;
    @a = split /[$s]/;
    print " line $. split :@a:\n" if $debug;
    $leftmost = 1;

    for ( $j = 0; $j <= $#columns; $j++ ) {
        $first = $columns[$j];
        $howmany = $count[$j];
        for ( $i = $first; $i < $first + $howmany; $i++ ) {
            if ( not $leftmost ) {
                printf( "%s%s", $s, $a[$i] );
            }
            else {
                printf( "%s", $a[$i] );
                $leftmost = 0;
            }
        }
    }

    printf("\n");
}

exit(0);
Best wishes ... cheers, hotcold

ghostdog74
06-08-2007, 12:31 PM
lagdawg wrote: "ghostdog, I am trying to stay away from VBscript as I want something that is more platform independent."

Sure, since you can use awk, here's an awk implementation:

awk '{
    printf "%s %s %s", $1, $2, $1
    for (i = 3; i <= NF; i++) {
        printf " %s", $i
    }
    print ""
}' "file"




lagdawg wrote: "JuiceWVU, while I can use AWK your command isn't returning anything for me."

awk needs input to process: without a file name it just waits on standard input. Most probably you forgot to put the file name.
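
To illustrate the point about input, here is a small sketch showing the same awk program fed once by file name and once through standard input. The temporary file comes from mktemp and is only an assumption for the demo; any file of space-separated columns would do.

```shell
# Build a one-line sample file (hypothetical test data).
tmp=$(mktemp)
printf 'A B C D\n' > "$tmp"

# Same program, two ways of supplying the input.
from_file=$(awk '{ print $1, $2, $1, $3, $4 }' "$tmp")
from_stdin=$(awk '{ print $1, $2, $1, $3, $4 }' < "$tmp")
rm -f "$tmp"

echo "$from_file"     # prints: A B A C D
echo "$from_stdin"    # prints: A B A C D
```

Run with neither a file name nor redirected input, awk sits waiting for lines typed at the terminal, which looks like a command that returns nothing.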