February 08, 2024

Iterate through zip(*product(…)) in batches

 

To get every unique combination of elements from multiple lists, we can use zip(*product(list_1, list_2, ...)), where:

  • the product() function produces a cartesian product of input iterables; and then
  • the zip() function iterates over these iterables in parallel, producing tuples with an item from each one.

In practice it looks like this:

import numpy as np
from itertools import product

range_1 = np.arange(1, 5, step=1, dtype=int)
range_2 = np.arange(1, 5, step=1, dtype=int)

combs_1, combs_2 = zip(*product(range_1, range_2))

print("combinations 1:", combs_1)
print("combinations 2:", combs_2)

Which prints out:

combinations 1: (1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4)
combinations 2: (1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)

Each column represents a unique combination.

In this case range_1 and range_2 each have 4 elements, which results in 2 tuples of 16 elements each.
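As a quick sanity check (this snippet is not part of the original example), the length of each zipped tuple equals the product of the input lengths:

```python
import numpy as np
from itertools import product

range_1 = np.arange(1, 5, step=1, dtype=int)
range_2 = np.arange(1, 5, step=1, dtype=int)

combs_1, combs_2 = zip(*product(range_1, range_2))

# The number of combinations is the product of the input lengths: 4 * 4 = 16
assert len(combs_1) == len(range_1) * len(range_2) == 16
```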

This is all good when we deal with short lists. What about when we have something more demanding?

import numpy as np
from itertools import product

range_1 = np.arange(1, 50, step=1, dtype=int)
range_2 = np.arange(1, 50, step=1, dtype=int)
range_3 = np.arange(2, 30, step=1, dtype=int)
range_4 = np.arange(0.1, 30, step=0.1, dtype=float)
range_5 = np.arange(0.1, 30, step=0.1, dtype=float)

combs_1, combs_2, combs_3, combs_4, combs_5 = zip(*product(range_1, range_2, range_3, range_4, range_5))

Here, we would create 5 tuples, each containing 49 × 49 × 28 × 299 × 299, roughly 6 billion, elements, which would quickly exhaust our memory.
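We can estimate that size up front without materializing anything, since the total count is just the product of the input lengths (a small sketch, not part of the original article):

```python
import math

import numpy as np

range_1 = np.arange(1, 50, step=1, dtype=int)
range_2 = np.arange(1, 50, step=1, dtype=int)
range_3 = np.arange(2, 30, step=1, dtype=int)
range_4 = np.arange(0.1, 30, step=0.1, dtype=float)
range_5 = np.arange(0.1, 30, step=0.1, dtype=float)

ranges = [range_1, range_2, range_3, range_4, range_5]

# Total combinations = product of the input lengths; no need to build them
total = math.prod(len(r) for r in ranges)
print(total)  # on the order of 6 billion tuples per zipped output
```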

 

Optimization

Luckily for us, both product() and zip() return lazy iterators, which means they produce a sequence of results one at a time as we iterate over them.
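To see that laziness in action, we can pull just a few combinations from product() without ever building the full sequence (a small sketch using itertools.islice):

```python
from itertools import islice, product

# Nothing is materialized yet; product() only returns an iterator
pairs = product(range(1, 50), range(1, 50))

# Pull just the first three combinations; the rest are never computed
first_three = list(islice(pairs, 3))
print(first_three)  # [(1, 1), (1, 2), (1, 3)]
```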

We can create a generator to control the amount of data provided to the zip() function. Let’s go ahead and write the code for this:

from itertools import product

def zip_product_in_batches(*iterables, size=1000):
    batch = []
    for combination in product(*iterables):
        batch.append(combination)
        if len(batch) >= size:
            yield zip(*batch)
            batch = []

    # Yield the leftover combinations, but skip an empty final batch
    if batch:
        yield zip(*batch)

Now we can process only a manageable number of elements at a time:

import numpy as np

range_1 = np.arange(1, 5, step=1, dtype=int)
range_2 = np.arange(1, 5, step=1, dtype=int)

# Our generator will yield the lists multiple times,
# therefore we should use it in a for-loop
for combs_1, combs_2 in zip_product_in_batches(range_1, range_2, size=5):
    print("combinations 1:", combs_1)
    print("combinations 2:", combs_2)
    print("-" * 35)

In this case it yields at most 5 elements per tuple in each iteration:

combinations 1: (1, 1, 1, 1, 2)
combinations 2: (1, 2, 3, 4, 1)
-----------------------------------
combinations 1: (2, 2, 2, 3, 3)
combinations 2: (2, 3, 4, 1, 2)
-----------------------------------
combinations 1: (3, 3, 4, 4, 4)
combinations 2: (3, 4, 1, 2, 3)
-----------------------------------
combinations 1: (4,)
combinations 2: (4,)
-----------------------------------

 

Conclusion

We can now iterate through a manageable number of combinations at a time, process them, save the results, and proceed to the next batch — ensuring that we never run out of memory.
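For reference, the same batching can also be sketched with itertools.islice instead of a manually grown list; zip_product_islice below is a hypothetical variant, not part of the code above:

```python
from itertools import islice, product

def zip_product_islice(*iterables, size=1000):
    # Hypothetical alternative to zip_product_in_batches():
    # slice the product iterator directly instead of appending to a list
    it = product(*iterables)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield zip(*batch)

# Same batches as before: 4 * 4 = 16 combinations in chunks of 5
for combs_1, combs_2 in zip_product_islice(range(1, 5), range(1, 5), size=5):
    print(combs_1, combs_2)
```

Both versions yield identical batches; islice simply moves the chunking bookkeeping into the standard library.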
