# Finding non-common elements in two arrays

Buster wouter.abraham at scarlet.be
Sun Nov 6 19:05:58 EST 2005

```
"Jim Ault"  <JimAultWins at yahoo.com>  wrote:

> One catch I can see is the "set whole matches to true"
> also considering the false hits generated by your definition of a unique
> line (lower case, sub string, number format)
>  "Mary had a little lamb" = line 6 of field 2
> "Mary had a little lamb,  whose fleece was white" = line 8 of field 1
> line 6 of fld 2 is in line 8 of fld 1 => lineoffset would be > 0
>
> "234" & "2345" == offset match, lineOffset not
> "234" & "2,345" == offset match not,  lineOffset not
> "234" & "2345.00" == offset match, lineOffset not
> "234" & "2345, 554, 234, 196" == lineoffset match twice
> "snow"  & "snow shovel" & "snowbound" & "snow-bound"
>
> Jim Ault
> Las Vegas

Good catch :-)

"Alex Tweedly"  <alex at tweedly.net >  wrote:

-snip-

> >     put fld "Field" & cr & "ZZZZZZZZZZ" into t1
> >     put fld "Field" & cr & "test line" & cr  & "ZZZZZZZZZZ" into t2
> >
> >     put the millisecs into tStart
> >     put 1 into i2
> >     put the number of lines in t2 into limit2
> >
> >     sort t1
>
> >     sort t2
> >     split t2 by CR
> >     put t2[1] into L2
> >
> >     repeat for each line L1 in t1
> >         repeat while L2 < L1
> >             add 1 to i2
> >             put t2[i2] into L2
> >         end repeat
> >         if L2 = L1 then
> >             -- put L1 & cr after tBoth
> >             add 1 to i2
> >             put t2[i2] into L2
> >         else
> >             -- put L1 & cr after t1only
> >         end if
> >     end repeat
> >     if i2 < limit2 then
> >         repeat with i = i2 to limit2-1
> >             put t2[i] & cr after t2only
> >         end repeat
> >     end if
> >     put "loop" && the millisecs - tStart & cr after msg
>
>
> P.S. I tried hard to break every one of Jerry's recommendations about
> variable naming as described in his excellent tutorial from the
> stack, you should. It *might* just stop you from writing such ugly
> code as I did above - but my old Fortran habits just keep coming back :-)
>
>
> --
> Alex Tweedly       http://www.tweedly.net

The handler above does not give correct results, neither on numeric
lists nor on word or mixed lists.

Here is a function that combines and adapts techniques mentioned
previously in this thread:

### adapt the handler name and the filter modes to your own taste

function intersectSpecial pList1,pList2,pMode
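### loop bodies assumed from the 1/2/3 flag scheme described below
repeat for each line i in pList1
put 1 into a[i]
end repeat
repeat for each line i in pList2
### 1 or 3 means the line is also in pList1
if a[i] is 1 or a[i] is 3 then put 3 into a[i]
else put 2 into a[i]
end repeat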
combine a with cr and tab
### elements only in pList1 --> 1
### elements only in pList2 --> 2
### elements in both lists     --> 3
if pMode = "bothCommon" then put "*"&tab&"3" into tFilter
else if pMode = "uniqueA" then put "*"&tab&"1" into tFilter
else if pMode = "uniqueB" then put "*"&tab&"2" into tFilter
else if pMode = "bothUnique" then put "*"&tab&"1,*"&tab&"2" into tFilter
repeat for each item tFilterString in tFilter
put a into b
filter b with tFilterString
replace char 2 to -1 of tFilterString with "" in b
put b & cr after tList
end repeat
return tList
end intersectSpecial

on mouseUp
put the millisecs into zap
put intersectSpecial(fld 1,fld 2,"bothUnique") into fld 3
put the millisecs - zap
end mouseUp

Maybe not a real speed monster, but not bad either
(takes < 500 millisecs for 2 fields with > 25000 lines on an iMac G5
1.8 GHz)

Greetings,
Wouter

```
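
For readers who want to check the flag scheme outside Revolution, here is a minimal sketch of the same idea in Python rather than Transcript. It is an illustration under stated assumptions, not the code from the post: the name `intersect_special` and the use of a dictionary in place of the `combine`/`filter` steps are assumptions made for the example. Each line is flagged 1 (only in the first list), 2 (only in the second) or 3 (in both), and the mode selects which flags to return. Because lines are compared as whole dictionary keys, "234" and "2345" stay distinct, which sidesteps the substring false hits Jim Ault describes.

```python
def intersect_special(list1, list2, mode):
    """Return lines selected by mode, using flags 1, 2 and 3.

    1 = line only in list1, 2 = line only in list2, 3 = line in both.
    Hypothetical helper for illustration; not the Transcript handler.
    """
    flags = {}
    for line in list1.split("\n"):
        flags[line] = 1
    for line in list2.split("\n"):
        # A flag of 1 or 3 means the line was already seen in list1.
        flags[line] = 3 if flags.get(line) in (1, 3) else 2

    wanted = {
        "bothCommon": {3},
        "uniqueA": {1},
        "uniqueB": {2},
        "bothUnique": {1, 2},
    }[mode]
    # Dictionaries keep insertion order, so list1 lines come out first.
    return [line for line, flag in flags.items() if flag in wanted]


if __name__ == "__main__":
    a = "234\nsnow\nMary had a little lamb"
    b = "2345\nsnow\nsnow shovel"
    for mode in ("bothCommon", "uniqueA", "uniqueB", "bothUnique"):
        print(mode, intersect_special(a, b, mode))
```

Repeated lines within one list collapse to a single entry here, since each distinct line is stored once as a key.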