To get all possible unique combinations between multiple lists, we can use zip(*product(list_1, list_2, ...)), where:
- the product() function produces the Cartesian product of the input iterables; and then
- the zip() function iterates over these iterables in parallel, producing tuples with an item from each one.
In practice it looks like this:
import numpy as np
from itertools import product

range_1 = np.arange(1, 5, step=1, dtype=int)
range_2 = np.arange(1, 5, step=1, dtype=int)

combs_1, combs_2 = zip(*product(range_1, range_2))

print("combinations 1:", combs_1)
print("combinations 2:", combs_2)
Which prints out:
combinations 1: (1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4)
combinations 2: (1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
Here each column represents a unique combination. In this case range_1 has 4 elements and range_2 has another 4 elements, which results in 2 tuples, each containing 4 × 4 = 16 elements.
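To make the column view concrete, here is a quick sketch reusing combs_1 and combs_2 from above: zipping the result tuples back together recovers the individual combinations.

# Zipping the result tuples back together recovers
# the individual (range_1, range_2) combinations
pairs = list(zip(combs_1, combs_2))
print(len(pairs))  # 16, i.e. 4 * 4 unique pairs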
This is all fine when we are dealing with short lists. But what about something more demanding?
import numpy as np
from itertools import product

range_1 = np.arange(1, 50, step=1, dtype=int)
range_2 = np.arange(1, 50, step=1, dtype=int)
range_3 = np.arange(2, 30, step=1, dtype=int)
range_4 = np.arange(0.1, 30, step=0.1, dtype=float)
range_5 = np.arange(0.1, 30, step=0.1, dtype=float)

combs_1, combs_2, combs_3, combs_4, combs_5 = zip(
    *product(range_1, range_2, range_3, range_4, range_5)
)
Here, we would create 5 tuples, each containing roughly 6 billion elements (49 × 49 × 28 × 299 × 299), which would be far too demanding for our memory.
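Before materializing anything, it is worth checking how many combinations we are about to generate. A minimal sketch, reusing the five ranges defined above:

import math

# The number of combinations is the product of the range lengths;
# len() works on 1-D NumPy arrays just as it does on lists
lengths = [len(r) for r in (range_1, range_2, range_3, range_4, range_5)]
print(lengths)             # [49, 49, 28, 299, 299]; float ranges can be off by one due to rounding
print(math.prod(lengths))  # about 6 billion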
Optimization
Luckily for us, both of these functions return lazy iterators, which produce a sequence of results one at a time as we iterate over them.
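To see this laziness in action, here is a tiny sketch; the small range(3) inputs are just for illustration:

from itertools import product

# product() returns an iterator; no combinations exist in memory yet
it = product(range(3), range(3))

print(next(it))  # (0, 0)
print(next(it))  # (0, 1)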
We can create a generator to control the amount of data provided to the zip() function. Let's go ahead and write the code for this:
from itertools import product

def zip_product_in_batches(*iterables, size=1000):
    batch = []
    for combination in product(*iterables):
        batch.append(combination)
        if len(batch) >= size:
            yield zip(*batch)
            batch = []
    # Only yield the final, smaller batch if it is non-empty; otherwise an
    # exact multiple of `size` would yield an empty zip that fails to unpack
    if batch:
        yield zip(*batch)
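For reference, the same batching can also be expressed with itertools.islice pulling from a single product() iterator; this is an alternative sketch, not part of the code above:

from itertools import islice, product

def zip_product_in_batches_islice(*iterables, size=1000):
    # islice() pulls at most `size` combinations from the shared iterator,
    # so each batch continues where the previous one stopped
    combinations = product(*iterables)
    while batch := list(islice(combinations, size)):
        yield zip(*batch)

Both versions behave the same; the explicit batch list above just makes the batching logic more obvious.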
Now we can process only a manageable number of elements at a time:
import numpy as np

range_1 = np.arange(1, 5, step=1, dtype=int)
range_2 = np.arange(1, 5, step=1, dtype=int)

# Our generator will yield the lists multiple times,
# therefore we should use it in a for-loop
for combs_1, combs_2 in zip_product_in_batches(range_1, range_2, size=5):
    print("combinations 1:", combs_1)
    print("combinations 2:", combs_2)
    print("-" * 35)
In this case it yields at most 5 elements per tuple in each iteration:
combinations 1: (1, 1, 1, 1, 2)
combinations 2: (1, 2, 3, 4, 1)
-----------------------------------
combinations 1: (2, 2, 2, 3, 3)
combinations 2: (2, 3, 4, 1, 2)
-----------------------------------
combinations 1: (3, 3, 4, 4, 4)
combinations 2: (3, 4, 1, 2, 3)
-----------------------------------
combinations 1: (4,)
combinations 2: (4,)
-----------------------------------
Conclusion
We can now iterate through a manageable number of combinations at a time, process them, save the results, and proceed to the next batch — ensuring that we never run out of memory.
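To round this off, here is a hedged sketch of such a batch-process-save loop. The results.csv file name and the a * b "score" are made-up placeholders for whatever per-combination computation you actually need; zip_product_in_batches is the generator defined above.

import numpy as np

range_1 = np.arange(1, 50, step=1, dtype=int)
range_2 = np.arange(1, 50, step=1, dtype=int)

with open("results.csv", "w") as f:  # hypothetical output file
    for combs_1, combs_2 in zip_product_in_batches(range_1, range_2, size=100_000):
        a = np.asarray(combs_1)
        b = np.asarray(combs_2)
        scores = a * b  # placeholder per-combination computation
        # Append this batch's rows before moving on to the next one
        np.savetxt(f, np.column_stack([a, b, scores]), fmt="%d", delimiter=",")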