Python: iteration speed test between list, tuple and set

Let’s start with a practical example:

We have a list of IDs that need to be processed. This list can hold from just a dozen of IDs to up to a few 100,000 IDs. Each ID is processed only once in each iteration and the order of IDs is not important.

This whole thing is presenting a bottleneck in our application, so we decided to take a look at other iterable data structures that Python offers.

To demonstrate this, we will create a simple example:

Function that generates a shuffled list of positive and negative numbers:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
from random import shuffle

def get_numbers(power=2):
    low_end = -10 ** power
    high_end = 10 ** power
    # generate list
    arr = [i for i in range(low_end, high_end)]
    # shuffle list
    shuffle(arr)
    # remove 25% of numbers, less numbers will have it's pair
    remove_n_elements = int(len(arr) * .25)
    arr = arr[remove_n_elements:]
    return arr

This will return:

1
2
get_numbers(2)  # list of 150 unique numbers, between -100 and 100
get_numbers(3)  # list of 1500 unique numbers, between -1000 and 1000

Then we will define a simple function that will iterate through list, tuple or set and find pairs, where the sum of both numbers equals sum:

1
2
3
4
5
6
def find_pairs(arr, sum):
    pairs = {} 
    for i in arr:
        if (sum - i) in arr:
            pairs.add((i, sum - i))
    return pairs

And let’s also create a function that will measure the execution time of find_pairs() function:

1
2
3
4
5
6
7
from time import time

def measure_time(nums, sum):
    start = time()
    pairs = find_pairs(nums, sum)
    end = time()
    return end - start  # seconds

Okay, now we can measure time, based on number of elements.

150 numbers:

1
2
3
4
numbers = get_numbers(2)         # list of 150 numbers
measure_time(numbers, 17)        # ~ 0.000271 seconds
measure_time(tuple(numbers), 17) # ~ 0.000224 seconds
measure_time(set(numbers), 17)   # ~ 0.000046 seconds

With 150 numbers, using set is ~5.89 times faster than using list.

1,500 numbers:

1
2
3
4
numbers = get_numbers(3)         # list of 1,500 numbers
measure_time(numbers, 17)        # ~ 0.025494 seconds
measure_time(tuple(numbers), 17) # ~ 0.026308 seconds
measure_time(set(numbers), 17)   # ~ 0.000418 seconds

With 1,500 numbers, using set is ~60.99 times faster than using list.

15,000 numbers:

1
2
3
4
numbers = get_numbers(4)         # list of 15,000 numbers
measure_time(numbers, 17)        # ~ 3.825252 seconds
measure_time(tuple(numbers), 17) # ~ 3.786150 seconds
measure_time(set(numbers), 17)   # ~ 0.006523 seconds

With 15,000 numbers, using set is ~586.42 times faster than using list.

150,000 numbers:

1
2
3
4
numbers = get_numbers(5)         # list of 150,000 numbers
measure_time(numbers, 17)        # ~ 834.580523 s = 13 min 54.58 sec
measure_time(tuple(numbers), 17) # ~ 854.708630 s = 14 min 14.70 sec
measure_time(set(numbers), 17)   # ~ 0.07081294 seconds

With 150,000 numbers, using set is ~11,785.70 times faster than using list.

Conclusion:

list and tuple work pretty much with same speed, but set made a big difference, here.

By using the set instead of list we significantly sped up the code.