[Python!] - List Comprehensions & Generator Expressions

Hailey Park·2021년 11월 13일

List Comprehensions generator expressions python

Python!

목록 보기

9/11

What is List comprehensions?

List comprehension allows you to create lists for a loop with less code. In short, it’s a truly Pythonic way of coding. Less code – more effectiveness.

& >>> my_list = []
>>> for x in range(10):
... my_list.append(x * 2)
...
>>> print(my_list)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

The example above is to create a list using a for loop and a range() function.

>>> comp_list = [x * 2 for x in range(10)]
>>> print(comp_list)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

And this is how the implementation of the previous example is performed using a list comprehension.

You can use a more complex modifier in the first part of comprehension or add a condition that will filter the list. Something like this:

>>> comp_list = [x ** 2 for x in range(7) if x % 2 == 0]
>>> print(comp_list)
[4, 16, 36]

You can create dicts and sets comprehensions as well.

>>> dict_comp = {x:chr(65+x) for x in range(1, 11)}
>>> type(dict_comp)
<class 'dict'>
>>> print(dict_comp)
{1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F', 6: 'G', 7: 'H', 8: 'I', 9: 'J', 10: 'K'}

>>> set_comp = {x ** 3 for x in range(10) if x % 2 == 0}
>>> type(set_comp)
<class 'set'>
>>> print(set_comp)
{0, 8, 64, 512, 216}

Benefits of Using List Comprehensions

List comprehensions optimize the lists’ generation and help to avoid side effects as gibberish variables.

As a result, you get more concise and readable code.

When Not to Use a List Comprehension in Python

List comprehensions are useful and can help you write elegant code that’s easy to read and debug, but they’re not the right choice for all circumstances. They might make your code run more slowly or use more memory. If your code is less performant or harder to understand, then it’s probably better to choose an alternative.

>>> cities = ['Austin', 'Tacoma', 'Topeka', 'Sacramento', 'Charlotte']
>>> temps = {city: [0 for _ in range(7)] for city in cities}
>>> temps
{
    'Austin': [0, 0, 0, 0, 0, 0, 0],
    'Tacoma': [0, 0, 0, 0, 0, 0, 0],
    'Topeka': [0, 0, 0, 0, 0, 0, 0],
    'Sacramento': [0, 0, 0, 0, 0, 0, 0],
    'Charlotte': [0, 0, 0, 0, 0, 0, 0]
}

Watch Out for Nested Comprehensions

Comprehensions can be nested to create combinations of lists, dictionaries, and sets within a collection. For example, say a climate laboratory is tracking the high temperature in five different cities for the first week of June. The perfect data structure for storing this data could be a Python list comprehension nested within a dictionary comprehension:

You create the outer collection temps with a dictionary comprehension. The expression is a key-value pair, which contains yet another comprehension. This code will quickly generate a list of data for each city in cities.

Nested lists are a common way to create matrices, which are often used for mathematical purposes. Take a look at the code block below:

>>> matrix = [[i for i in range(5)] for _ in range(6)]
>>> matrix
[
    [0, 1, 2, 3, 4],
    [0, 1, 2, 3, 4],
    [0, 1, 2, 3, 4],
    [0, 1, 2, 3, 4],
    [0, 1, 2, 3, 4],
    [0, 1, 2, 3, 4]
]

The outer list comprehension [... for _ in range(6)] creates six rows, while the inner list comprehension [i for i in range(5)] fills each of these rows with values.

So far, the purpose of each nested comprehension is pretty intuitive. However, there are other situations, such as flattening nested lists, where the logic arguably makes your code more confusing. Take this example, which uses a nested list comprehension to flatten a matrix:

matrix = [
...     [0, 0, 0],
...     [1, 1, 1],
...     [2, 2, 2],
... ]
>>> flat = [num for row in matrix for num in row]
>>> flat
[0, 0, 0, 1, 1, 1, 2, 2, 2]

The code to flatten the matrix is concise, but it may not be so intuitive to understand how it works. On the other hand, if you were to use for loops to flatten the same matrix, then your code will be much more straightforward:

>>> matrix = [
...     [0, 0, 0],
...     [1, 1, 1],
...     [2, 2, 2],
... ]
>>> flat = []
>>> for row in matrix:
...     for num in row:
...         flat.append(num)
...
>>> flat
[0, 0, 0, 1, 1, 1, 2, 2, 2]
Now you can see that the code traverses one row of the matrix at a time, pulling out all the elements in that row before moving on to the next one.

Choose Generators for Large Datasets

A list comprehension in Python works by loading the entire output list into memory. For small or even medium-sized lists, this is generally fine. If you want to sum the squares of the first one-thousand integers, then a list comprehension will solve this problem admirably:

>>> sum([i * i for i in range(1000)])
332833500

But what if you wanted to sum the squares of the first billion integers? If you tried then on your machine, then you may notice that your computer becomes non-responsive. That’s because Python is trying to create a list with one billion integers, which consumes more memory than your computer would like. Your computer may not have the resources it needs to generate an enormous list and store it in memory. If you try to do it anyway, then your machine could slow down or even crash.

When the size of a list becomes problematic, it’s often helpful to use a generator instead of a list comprehension in Python. A generator doesn’t create a single, large data structure in memory, but instead returns an iterable. Your code can ask for the next value from the iterable as many times as necessary or until you’ve reached the end of your sequence, while only storing a single value at a time.

If you were to sum the first billion squares with a generator, then your program will likely run for a while, but it shouldn’t cause your computer to freeze. The example below uses a generator:

>>> sum(i * i for i in range(1000000000))
333333332833333333500000000

You can tell this is a generator because the expression isn’t surrounded by brackets or curly braces. Optionally, generators can be surrounded by parentheses.

The example above still requires a lot of work, but it performs the operations lazily. Because of lazy evaluation, values are only calculated when they’re explicitly requested. After the generator yields a value (for example, 567 567), it can add that value to the running sum, then discard that value and generate the next value (568 568). When the sum function requests the next value, the cycle starts over. This process keeps the memory footprint small.

map() also operates lazily, meaning memory won’t be an issue if you choose to use it in this case:

>>> sum(map(lambda i: i*i, range(1000000000)))
333333332833333333500000000

It’s up to you whether you prefer the generator expression or map().

Difference Between Iterable and Iterator

It will be easier to understand the concept of generators if you get the idea of iterables and iterators.

Iterable is a “sequence” of data, you can iterate over using a loop. The easiest visible example of iterable can be a list of integers – [1, 2, 3, 4, 5, 6, 7]. However, it’s possible to iterate over other types of data like strings, dicts, tuples, sets, etc.

Basically, any object that has iter() method can be used as an iterable. You can check it using hasattr()function in the interpreter.

>>> hasattr(str, '__iter__')
True
>>> hasattr(bool, '__iter__')
False

Iterator protocol is implemented whenever you iterate over a sequence of data. For example, when you use a for loop the following is happening on a background:

first iter() method is called on the object to converts it to an iterator object.
next() method is called on the iterator object to get the next element of the sequence.
StopIteration exception is raised when there are no elements left to call.

>>> simple_list = [1, 2, 3]
>>> my_iterator = iter(simple_list)
>>> print(my_iterator)
<list_iterator object at 0x7f66b6288630>
>>> next(my_iterator)
1
>>> next(my_iterator)
2
>>> next(my_iterator)
3
>>> next(my_iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration

Generator Expressions

In Python, generators provide a convenient way to implement the iterator protocol. Generator is an iterable created using a function with a yield statement.

The main feature of generator is evaluating the elements on demand. When you call a normal function with a return statement the function is terminated whenever it encounters a return statement. In a function with a yield statement the state of the function is “saved” from the last call and can be picked up the next time you call a generator function.

>>> def my_gen():
... for x in range(5):
... yield x

Generator expression allows creating a generator without a yield keyword. However, it doesn’t share the whole power of generator created with a yield function. The syntax and concept is similar to list comprehensions:

>>> gen_exp = (x ** 2 for x in range(10) if x % 2 == 0)
>>> for x in gen_exp:
... print(x)
0
4
16
36
64

In terms of syntax, the only difference is that you use parentheses instead of square brackets. However, the type of data returned by list comprehensions and generator expressions differs.

>>> list_comp = [x ** 2 for x in range(10) if x % 2 == 0]
>>> gen_exp = (x ** 2 for x in range(10) if x % 2 == 0)
>>> print(list_comp)
[0, 4, 16, 36, 64]
>>> print(gen_exp)
<generator object <genexpr> at 0x7f600131c410>

The main advantage of generator over a list is that it takes much less memory. We can check how much memory is taken by both types using sys.getsizeof() method.

Note: in Python 2 using range() function can’t actually reflect the advantage in term of size, as it still keeps the whole list of elements in memory. In Python 3, however, this example is viable as the range() returns a range object.

>>> from sys import getsizeof
>>> my_comp = [x * 5 for x in range(1000)]
>>> my_gen = (x * 5 for x in range(1000))
>>> getsizeof(my_comp)
9024
>>> getsizeof(my_gen)
88

We can see this difference because while list creating Python reserves memory for the whole list and calculates it on the spot. In case of generator, we receive only ”algorithm”/ “instructions” how to calculate that Python stores. And each time we call for generator, it will only “generate” the next element of the sequence on demand according to “instructions”.

On the other hand, generator will be slower, as every time the element of sequence is calculated and yielded, function context/state has to be saved to be picked up next time for generating next value. That “saving and loading function context/state” takes time.

Resources
https://djangostars.com/blog/list-comprehensions-and-generator-expressions/

https://realpython.com/list-comprehension-python/