This simple problem of finding a pair of numbers in a list of random integers, which adds up to a specific number, is easy to get working, but hard to optimize. So we will go through the various implementation strategies and see it metamorph to the final accepted version.
Implementation iterations
1. Maintain a cache of the left number seen. If that number has not generated a pass in the past ignore it. Also maintain an array of left , right indexes which have generated the result. This got the basic tests running, but the final submission timed out.
2. Maintain no cache of previously seen numbers, also maintain a tuple of (left index, right index, gap between two indices). Only if the gap is less for a new no# then overwrite the old entry. It iterates through the whole list hence not a great idea.
3. No need of cache, search only till the new idx + gap indexes, as any further we will get a pair which is farther away. Again as all the indices are being searched not an runtime efficient option.
def sum_pairs(ints, s):
# startidx, endidx, gap
nearest_pair = ()
ints_len = len(ints)
for i in range(ints_len):
diff_s = s - ints[i]
try:
end_idx = ints_len
if nearest_pair and nearest_pair[2] < ints_len:
end_idx = i + nearest_pair[2]
idx = ints.index(diff_s, i+1, end_idx)
gap = idx - i
if not nearest_pair or (nearest_pair[2] > gap):
nearest_pair = (i,idx, gap)
except :
pass
if nearest_pair and nearest_pair[2] == 1:
break
if not nearest_pair:
return None
else:
# sort the cache by start index & gap between indices
return [ints[nearest_pair[0]], ints[nearest_pair[1]]]
4. We need to find the first two entries only, and then just return the one which is earlier. In worst case its O(n^2), but in best case it will be O(2n) only. But the codewars bot says its still too long running.
def sum_pairs(ints, s):
# startidx, endidx, gap
tuples_list = []
ints_len = len(ints)
for i in range(ints_len):
diff_s = s - ints[i]
end_idx = ints_len
if tuples_list:
end_idx = tuples_list[0][1]
for z in range(i + 1, end_idx):
if ints[z] == diff_s:
tuples_list.append((i, z))
break
if len(tuples_list) == 2:
if tuples_list[0][1] > tuples_list[1][1]:
return [ints[tuples_list[1][0]],ints[tuples_list[1][1]]]
return [ints[tuples_list[0][0]],ints[tuples_list[0][1]]]
5. My accepted solution#1: This involves using a cache to note numbers that have not generated any result. Then the same can be used to skip iterations. The worst case we will have O(n^2) with additional n entries needed for storage. Not too elegant, lots of verbose coding style.
def sum_pairs(ints, s):
# startidx, endidx, gap
tuples_list = []
nums_cache = []
ints_len = len(ints)
for i in range(ints_len):
if ints[i] in nums_cache:
continue
diff_s = s - ints[i]
end_idx = ints_len
if tuples_list:
end_idx = tuples_list[0][1]
num_seen = False
for z in range(i + 1, end_idx):
if ints[z] == diff_s:
tuples_list.append((i, z))
num_seen = True
break
if not num_seen:
nums_cache.append(ints[i])
continue
if len(tuples_list) == 2:
fe = tuples_list[0]
se = tuples_list[1]
if fe[1] > se[1]:
return [ints[se[0]],ints[se[1]]]
return [ints[fe[0]],ints[fe[1]]]
if not tuples_list:
return None
if len(tuples_list) == 1:
return [ints[tuples_list[0][0]],ints[tuples_list[0][1]]]
6. My accepted solution#2: I got a good use of for else and was able to shave off a few more milliseconds. This version adopts a more pythonic style.
def sum_pairs(ints, s):
# startidx, endidx, gap
tuples_list = []
nums_cache = []
ints_len = len(ints)
for i in range(ints_len):
if ints[i] in nums_cache:
continue
diff_s = s - ints[i]
end_idx = ints_len if not tuples_list else tuples_list[0][1]
for z in range(i + 1, end_idx):
if ints[z] == diff_s:
tuples_list.append((i, z))
break
else:
nums_cache.append(ints[i])
if len(tuples_list) == 2:
fe = tuples_list[0]
se = tuples_list[1]
if fe[1] > se[1]:
return [ints[se[0]],ints[se[1]]]
return [ints[fe[0]],ints[fe[1]]]
if not tuples_list:
return None
if len(tuples_list) == 1:
return [ints[tuples_list[0][0]],ints[tuples_list[0][1]]]
7. Optimal solution(Not mine) : This simple solution shines as the no of lines is 1/5th of my implementation and still maintains a cache and gets the entries as well. Elegant and really optimal.
def sum_pairs(list, sum):
s = set()
for v in list:
if (sum - v) in s:
return [sum - v, v]
s.add(v)
Lessons learnt
The optimal solution here is implemented in 1/5th of the lines and takes 1/3rd of the running time. So some insights while trying to figure the solution are.
- The optimal solution just focuses on getting the numbers which add up to the required target sum. It does not maintain indexes to get the one with the smallest gap. This eliminates all the need to cache, compare and return the correct indexes.
- Set is slower than list in append operations. So where ever possible use a list as its append() makes it super fast than set’s add().