comparing


reaky
05-03-2007, 09:36 PM
Hello all
I have two files, named.1 and named.2, each containing a list of my domains.
For example:
named.1

domain1.com
domain2.com
domain3.cc
domain4.net
domain5.tv
etc..

named.2

domain5.tv
domain3.cc
domain2.com
domain4.net
domain1.com
etc..

I just want to find the domains that are in one file but not in the other, so that I can add them to the file that is missing them.
Note that the domains are not in the same order in the two files.

deathadder
05-04-2007, 05:06 AM
Is Python OK?
#!/usr/bin/env python

import os, sys

def Usage():
    print '\nUsage: ' + str(sys.argv[0]) + ' file1 file2\n'

def printArray(arrayName):
    for item in arrayName:
        print item

def checkArray(array1, array2, diffArray):
    for line in array1:
        find = 'no'
        for item in array2:
            if item == line:
                find = 'yes'
                break
        if find == 'no':
            diffArray.append(line)

firstList = []
seconList = []
diffArray = []
diffArray1 = []

if len(sys.argv) != 3:
    Usage(); sys.exit(1)

if not os.path.isfile(sys.argv[1]):
    Usage(); sys.exit(1)
elif not os.path.isfile(sys.argv[2]):
    Usage(); sys.exit(1)

testFile = open(sys.argv[1], 'r')

for line in testFile.readlines():
    firstList.append(line)

testFile.close()

secondFile = open(sys.argv[2], 'r')

for line in secondFile.readlines():
    seconList.append(line)

secondFile.close()

checkArray(firstList, seconList, diffArray)
checkArray(seconList, firstList, diffArray1)

printArray(diffArray)
printArray(diffArray1)

test1.txt contains:
one
three
two
twerw
ten
seven
test2.txt contains:
three
two
twerw
seven
one
wooooooooo
The output is:

ten

wooooooooo
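
The blank lines in that output appear because each line read with readlines() keeps its trailing newline, and print then adds another one. Here is a rough sketch of the same comparison with the lines stripped first and a set used for the lookup instead of the inner loop. It is written in Python 3 syntax, unlike the script above, and the helper names clean and only_in are made up for illustration:

```python
def clean(lines):
    """Strip surrounding whitespace/newlines and drop empty lines."""
    return [line.strip() for line in lines if line.strip()]

def only_in(first, second):
    """Return the entries of `first` that are missing from `second`, in order."""
    seen = set(second)  # set membership replaces the nested loop
    return [name for name in first if name not in seen]

# Same contents as test1.txt and test2.txt, as readlines() would return them.
test1 = clean(["one\n", "three\n", "two\n", "twerw\n", "ten\n", "seven\n"])
test2 = clean(["three\n", "two\n", "twerw\n", "seven\n", "one\n", "wooooooooo\n"])

print(only_in(test1, test2))  # ['ten']
print(only_in(test2, test1))  # ['wooooooooo']
```

Stripping also means a last line without a trailing newline still compares equal to the same name followed by a newline in the other file.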

deathadder
05-04-2007, 08:36 AM
Out of boredom I didn't really think... that sounds like homework. Oops :o

Well, if it is homework, sorry, I didn't think. At least it's not commented code, so reaky will have to figure out what's happening in order to explain it.

Boredom can really be a burden at times :)

lagdawg
05-04-2007, 09:30 AM
How about:

diff text1.txt text2.txt | grep \> | cut -d' ' -f2 >>text1.txt
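
One thing to watch with the pipeline above: diff compares the files line by line in order, so with unsorted files a line that merely moved may also be reported with ">" and get appended a second time. Below is a minimal Python sketch of the same idea (append to the first file whatever appears only in the second) that does not depend on line order. The function name append_missing is hypothetical, and it assumes one entry per line:

```python
def append_missing(target_path, source_path):
    """Append to target_path every entry found only in source_path.

    Returns the list of entries that were appended.
    """
    with open(target_path) as f:
        text = f.read()
    target = [line.strip() for line in text.splitlines() if line.strip()]

    with open(source_path) as f:
        source = [line.strip() for line in f if line.strip()]

    have = set(target)
    missing = []
    for name in source:
        if name not in have:
            have.add(name)  # guards against duplicates inside the source file
            missing.append(name)

    with open(target_path, "a") as f:
        # Avoid gluing the first new entry onto an unterminated last line.
        if text and not text.endswith("\n"):
            f.write("\n")
        for name in missing:
            f.write(name + "\n")
    return missing
```

Calling append_missing("text1.txt", "text2.txt") and then append_missing("text2.txt", "text1.txt") should leave both files with the same set of entries, though not necessarily in the same order.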

ghostdog74
05-04-2007, 01:17 PM
Is python ok?

....
testFile = open(sys.argv[1], 'r')

for line in testFile.readlines():
    firstList.append(line)

testFile.close()
....


Just a small note:
there is no need to call readlines() once the file has been opened, because a file object can be iterated over directly, line by line. This is sufficient:

testFile = open(sys.argv[1], 'r')
for line in testFile:
    #do something
....

Also, another syntax for iterating over a file:

for line in open("file"):
    .....

Also, for small lists, the built-in set type can be used:

>>> file1 = set([i.strip() for i in open("file1").readlines()])
>>> file2 = set([i.strip() for i in open("file2").readlines()])
>>> print file1
set(['seven', 'ten', 'twerw', 'three', 'two', 'one'])
>>> print file2
set(['seven', 'twerw', 'three', 'one', 'two', 'wooooooooo'])
>>> print file1.symmetric_difference(file2)
set(['wooooooooo', 'ten'])
>>>
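
Note that symmetric_difference says which entries differ, but not which file each one is missing from. A small sketch using plain set subtraction keeps the two directions apart. It uses Python 3 syntax (print is a function there and sets are shown as {...}), with the sets written out by hand instead of read from the files, and the variable names are only illustrative:

```python
file1 = {"one", "three", "two", "twerw", "ten", "seven"}
file2 = {"three", "two", "twerw", "seven", "one", "wooooooooo"}

missing_from_file2 = file1 - file2  # entries only in file1
missing_from_file1 = file2 - file1  # entries only in file2

print(sorted(missing_from_file2))  # ['ten']
print(sorted(missing_from_file1))  # ['wooooooooo']

# The two one-sided differences together make up the symmetric difference.
assert missing_from_file1 | missing_from_file2 == file1 ^ file2
```

Each one-sided difference is then the list of names to add to the other file.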

deathadder
05-04-2007, 01:27 PM
I had no idea about that whatsoever, thanks for the info! :)

reaky
05-04-2007, 02:59 PM
Thanks so much, friends.
This was so helpful.

Yours
Reaky :)

neuron
05-10-2007, 05:14 AM
Eh, a little late, but I think diff is good enough. Just get the files sorted first, with unique lines:

cat file |sort|uniq
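
For anyone who would rather stay in Python, here is a rough sketch of the same sort-then-compare idea. sort_unique is only an approximation of sort | uniq (Python orders strings by code point, while sort follows the locale), and compare_sorted walks the two sorted lists once. Both function names are made up for this example:

```python
def sort_unique(lines):
    """Rough equivalent of `sort | uniq`: stripped, de-duplicated, sorted."""
    return sorted({line.strip() for line in lines if line.strip()})

def compare_sorted(a, b):
    """Walk two sorted, duplicate-free lists once.

    Returns (only_in_a, only_in_b).
    """
    only_a, only_b = [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:      # present in both, skip it
            i += 1
            j += 1
        elif a[i] < b[j]:     # a[i] cannot appear later in b
            only_a.append(a[i])
            i += 1
        else:                 # b[j] cannot appear later in a
            only_b.append(b[j])
            j += 1
    only_a.extend(a[i:])      # leftovers exist on one side only
    only_b.extend(b[j:])
    return only_a, only_b

first = sort_unique(["one", "three", "two", "twerw", "ten", "seven", "one"])
second = sort_unique(["three", "two", "twerw", "seven", "one", "wooooooooo"])
print(compare_sorted(first, second))  # (['ten'], ['wooooooooo'])
```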