Population puzzle

Geoff Canyon gcanyon at gmail.com
Fri Sep 26 21:22:47 EDT 2014


This is a fun problem.

My first, nearly brute force solution simply maintained an array with the
keys being the sum of the value lists stored in the array -- so X[5] might
contain 2,3. The only optimization inherent in this is that it doesn't
worry about duplicate sums along the way. So if there are many duplicate
sums, it will work okay. It won't solve this 26-value problem in a
reasonable time (or maybe at all -- likely it will crash):

function subsetSumBrute L,T
   repeat for each item N in L
      repeat for each line K in the keys of R
         if R[K+N] is empty then put R[K],N into R[K+N]
      end repeat
      put N into R[N]
      if R[T] is not empty then return R[T]
   end repeat
   return "no solution"
end subsetSumBrute

The obvious optimization is not to worry about sums greater than the target
value. That helps, but it's still slow: checking only the first 22
populations (which doesn't return a solution) takes about 14 seconds on my
laptop, and the resulting working array hits over 2 million entries. Trying
to run it on the full 26 value list crashes after about 30 seconds, and it
seems the working array might be trying to reach >30 million entries. Here
it is:

function subsetSumBetter L,T
   repeat for each item N in L
      repeat for each line K in the keys of R
         if K+N <= T and R[K+N] is empty then put R[K],N into R[K+N]
      end repeat
      put N into R[N]
      if R[T] is not empty then return R[T]
   end repeat
   return "no solution"
end subsetSumBetter

I sorted the numbers in descending order, which improves the "larger than
the goal" optimization, but the optimization that did the trick was to
maintain the overall sum of the remaining numbers, and delete any entries
in the working array that can no longer reach the total. For the 26 values
here it reduced the maximum size reached by the working array to just
217,523 elements, 1/10th the number subsetSumBetter used just for 22
values. Further, where subsetSumBetter takes about 14 seconds to check 22
values, this function solves the 26 value problem in under 1.5 seconds.

function subsetSum L,T
   sort items of L descending numeric
   put sum(L) into S
   repeat for each item N in L
      repeat for each line K in the keys of R
         if K + S < T then
            delete variable R[K]
            next repeat
         end if
         if K+N <= T and R[K+N] is empty then put R[K],N into R[K+N]
      end repeat
      put N into R[N]
      if R[T] is not empty then return R[T]
      subtract N from S
   end repeat
   return "no solution"
end subsetSum

Those are all mine, and I like that last one, but it's not as good as this
solution, which I created after reading the wikipedia article at
https://en.wikipedia.org/wiki/Subset_sum_problem

This splits the original list in two by size, largest values in one,
smallest in the other. It calculates all the possible sums for each list.
It parses through the lists, largest to smallest for the large value list,
and smallest to largest for the small value list. It checks each pair to
see if it's a solution, and bails on the low list when the sum exceeds the
target because the values from the small value are getting larger and no
more will work. It also checks each list to see if the target value is in
one of them alone.

For this problem it builds two arrays of about 8,000 elements each and then
pairs them off. It finds the solution in about 0.16 seconds on my computer.

function subsetSumBi L,T
   sort items of L descending numeric
   put (the number of items of L) div 2 into B
   put allSubsetSums(item 1 to B of L) into highList
   put the keys of highList into HLK
   sort lines of HLK descending numeric
   put allSubsetSums(item B + 1 to -1 of L) into lowList
   put the keys of lowList into LLK
   sort lines of LLK ascending numeric

   repeat for each line LL in LLK
      if LL = T then return lowList[LL]
   end repeat
   repeat for each line HL in HLK
      if HL = T then return highList[HL]
      repeat for each line LL in LLK
         if HL + LL = T then return highList[HL],lowList[LL]
         if HL + LL > T then exit repeat
      end repeat
   end repeat
   return "no solution"
end subsetSumBi

function allSubsetSums L
   repeat for each item N in L
      repeat for each line K in the keys of R
         if R[K+N] is empty then put R[K],N into R[K+N]
      end repeat
      put N into R[N]
   end repeat
   return R
end allSubsetSums



More information about the use-livecode mailing list