In *A Student’s Guide to Python for Physical Modeling*, we emphasized
NumPy arrays and paid less attention to Python lists. The reason is simple: In
most scientific computing applications, NumPy arrays store data more
efficiently and speed up mathematical calculations, sometimes a thousandfold.

However, there are some applications where a Python list is the better choice. There are also times when the choice between a list and an array has little or no effect on performance. In such cases a list can make your code easier to read and understand, and that is always a good thing.

In this post, I will describe Python lists and explain a special Python
construct for creating lists called a **list comprehension**. I will also
describe a similar construct called a **generator expression**.

## Lists

A list is an ordered collection of items. You may have made a “To Do” list this
morning or a grocery list for a recent trip to the store. In computer science,
a list is a data structure that supports a few basic methods like `insert`,
`remove`, `append`, and `find`. You probably used several of these operations
with your own list. Perhaps you penciled in a new task later in the day
(`append`), then crossed tasks off the list as you completed them (`remove`).

An array is a rigid data structure that stores a fixed number of identical
elements (integers, floats, eight-character strings, etc.). If operations like
`insert`, `remove`, or `append` are important parts of a computational task, a
more flexible data structure like a list may be appropriate. The type of data
may also suggest that a Python list is a better choice than a NumPy array. For
instance, how would you initialize an array to store a grocery list?
Furthermore, if the number of items to be stored is not known at the outset, it
may be easier to store the data in a list and convert it to an array later.
(Perhaps you are taking data at regular intervals but do not know how long an
experiment will run.) Finally, if you are not worried about performance and
scaling, a Python list might be a simpler option than a NumPy array. If you
just need the first 20 perfect cubes, do you really want to import all of NumPy?

Let’s use the example of a grocery list to create and transform a simple list. A more useful example in scientific computing might be managing a collection of data files to process into stunning figures for your latest report, but the principles are the same.

To create a Python list, just enclose a comma-separated list of elements inside square brackets.

```
groceries = ['milk', 'eggs', 'orange juice']
```

Some functions also return a list. I can create the same list from a string
using the `split` method. Note that the separator includes the space after the
comma, so the items do not start with a stray space:

```
groceries = "milk, eggs, orange juice".split(', ')
```

To find an item in a list, use the `index` method. It returns the index of the
first occurrence of the item you request if it is in the list and raises a
`ValueError` if it is not.

```
groceries.index('eggs')   # Returns 1.
groceries.index('bread')  # Raises ValueError: 'bread' is not in the list.
```
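If a missing item should not stop your program, you can either test for membership first or catch the exception. The following is a minimal sketch; the variable name `position` and the use of `None` as a "not found" marker are choices made for this illustration only.

```python
groceries = ['milk', 'eggs', 'orange juice']

# Option 1: test for membership before asking for the index.
if 'bread' in groceries:
    position = groceries.index('bread')
else:
    position = None   # Arbitrary "not found" marker for this sketch.

# Option 2: ask anyway and catch the exception.
try:
    position = groceries.index('bread')
except ValueError:
    position = None

print(position)
```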

It looks like I forgot the bread! I can add it to the list using the `append`
method:

```
groceries.append('bread')
```

Later, I see some orange juice in the back of the refrigerator (not yet expired
…), so I will delete that item from the list using the `remove` method:

```
groceries.remove('orange juice')
```

Since I am headed to ABC Grocery, where everything is organized alphabetically, I will sort the list and review it:

```
groceries.sort()
print(groceries)
```

One more useful operation is joining lists, or *concatenation*. In Python, the
addition operator for lists is defined to join two lists. The `extend` method
of a list will also join two lists. Calling `append` with a list as its argument
will *not* add each element to the original list. It will make the argument
list the final element of the calling list. That is, you will have a list within
a list, not a concatenation of the two lists.

If I had two separate grocery lists, I could join them into a single list in a variety of ways:

```
old_list = ['bananas', 'coffee']
new_list = groceries + old_list # Addition creates a new list.
groceries += old_list # In-place addition extends original list.
old_list.extend(groceries) # extend also extends original list.
bad_list = ['bananas', 'coffee']
bad_list.append(groceries) # append does NOT concatenate lists.
```

After this set of commands, `groceries` and `new_list` contain the same
elements. `old_list` contains `'bananas'` and `'coffee'` twice, since the
commands append `old_list` to `groceries` first, and then append the new
`groceries` to the original `old_list`. As you can see, `bad_list` did not
merge the two lists properly.
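To see the difference between concatenating and appending, it helps to inspect the results directly. This is a small sketch with made-up list contents; the names `merged` and `nested` are introduced here for illustration.

```python
groceries = ['bread', 'eggs', 'milk']
old_list = ['bananas', 'coffee']

merged = old_list + groceries     # Concatenation: one flat list.
nested = ['bananas', 'coffee']
nested.append(groceries)          # append: the whole list becomes one element.

print(len(merged))                # 5
print(len(nested))                # 3
print(nested[-1] is groceries)    # True: the last element is the list itself.
```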

In case you are skeptical of the usefulness of Python lists in scientific
endeavors, here is a function that creates a list of Fibonacci numbers, from
the 0th through the `N`th.

```
def fibonacci(N):
    if N == 0: return [1]   # Handle unusual request for 0th number.
    fib = [1, 1]            # Create a list for all other values of N.
    for k in range(1,N):
        # Next Fibonacci number is the sum of the previous two.
        fib.append(fib[-1] + fib[-2])
    return fib
```

If you are still skeptical of the possible utility of a Python list, try writing
the same function using NumPy arrays and using it to compute `fibonacci(100)`.
I’ve included a solution at the end of the post.

The command inside the loop could also have been written using list addition. Either of the following commands will work:

```
fib += [fib[-1] + fib[-2]]
fib = fib + [fib[-1] + fib[-2]]
```

The second approach is less efficient because it makes a copy of the list every time a new element is added, but it is syntactically correct. Be aware that you can only use list addition to join two lists — not a list and some other object you would like to put inside it. The following alternatives would result in errors:

```
fib += fib[-1] + fib[-2]
fib += (fib[-1] + fib[-2])
fib = fib + (fib[-1] + fib[-2])
```
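The errors in question are `TypeError` exceptions: the right-hand side is an integer, and parentheses alone do not turn it into a list or a tuple. A short sketch that catches the exceptions, so you can run it without interrupting your session:

```python
fib = [1, 1]

try:
    fib += fib[-1] + fib[-2]          # Right-hand side is the integer 2.
except TypeError as error:
    print("TypeError:", error)

try:
    fib = fib + (fib[-1] + fib[-2])   # (2) is still just the integer 2.
except TypeError as error:
    print("TypeError:", error)

print(fib)                            # The list is unchanged: [1, 1]
```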

## List Comprehensions

Sometimes a list is not a collection of random elements; it has a logical structure instead. For instance, suppose you want to find the sum of the first 20 perfect cubes. You could create an array or a list of cubes, then add them up. The familiar procedure using a NumPy array is

```
import numpy as np
cubes = np.arange(1,21)**3
cubes.sum()
```

This does not require too much typing, and the purpose of the code is fairly clear. However, compare it with the following code:

```
cubes = [n**3 for n in range(1,21)]
sum(cubes)
```

This is an example of a **list comprehension**: a Python expression inside of the
square brackets that denote a list. The statement is similar to the
notation used in mathematics to define a set. It also clearly describes what
the list contains. Note the similarity of the list comprehension to the
following loop, which creates the same list:

```
cubes = []                  # Initialize empty list.
for n in range(1,21):
    cubes.append(n**3)      # Append perfect cubes.
```

The list comprehension effectively compresses all of this into a single Python statement.

A list comprehension defines a new list from another collection of objects via
an expression like “`for n in ...`”. Rather than using `range`, you can build
one list from another:

```
poly = [(x+1)*(x-1) for x in cubes]
```

You can also apply conditional statements in a list comprehension:

```
even_squares = [n**2 for n in range(1,51) if n%2 == 0]
odd_squares = [n**2 for n in range(1,51) if n%2 == 1]
```

You can cram quite a lot of code into a list comprehension, but it is not always advisable:

```
pythagoras = [(a,b,c) for a in range(1,31) for b in range(a,31)
              for c in range(1,31) if a**2 + b**2 == c**2]
```

Despite the length and complexity of this single expression, its meaning is still fairly clear.
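If the order of the `for` clauses is unclear, it may help to unroll the comprehension into nested loops. The clauses are read left to right, with the leftmost as the outermost loop. The name `triples` below is introduced only for this comparison.

```python
pythagoras = [(a, b, c) for a in range(1, 31) for b in range(a, 31)
              for c in range(1, 31) if a**2 + b**2 == c**2]

# The same search written as explicit nested loops.
triples = []
for a in range(1, 31):
    for b in range(a, 31):
        for c in range(1, 31):
            if a**2 + b**2 == c**2:
                triples.append((a, b, c))

print(triples == pythagoras)   # True
print(triples[0])              # (3, 4, 5)
```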

Returning to our original task, we can do even better than adding up the first 20 perfect cubes. Using a nested list comprehension, we can make a list of sums of cubes!

```
sums_of_cubes = [sum([n**3 for n in range(1,N+1)]) for N in range(1,21)]
```
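As a sanity check, the nested comprehension can be compared with two other routes to the same numbers: a running total from `itertools.accumulate` in the standard library, and the closed-form identity that the sum of the first N cubes equals (N(N+1)/2)². The names `running` and `closed_form` are introduced here for illustration.

```python
from itertools import accumulate

sums_of_cubes = [sum([n**3 for n in range(1, N+1)]) for N in range(1, 21)]

# Running total: each cube is computed once instead of once per sum.
running = list(accumulate(n**3 for n in range(1, 21)))

# Closed form: 1^3 + 2^3 + ... + N^3 = (N(N+1)/2)^2
closed_form = [(N*(N+1)//2)**2 for N in range(1, 21)]

print(sums_of_cubes == running == closed_form)   # True
print(sums_of_cubes[:4])                         # [1, 9, 36, 100]
```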

## Generators

A list comprehension creates a Python list that stores all of the elements in a
single data structure. Sometimes this is exactly what you need. Other times, you
simply want to iterate over all of the items in a list. If you never need all
of the items in the list at once, you can use a **generator expression**
instead. A generator expression looks like a list comprehension, except
that you enclose the expression in round parentheses instead of square brackets:
`(...)` instead of `[...]`. Despite the round parentheses, a generator
expression *does not* create a tuple, and there is no such thing as a “tuple
comprehension”.

```
cube_list = [n**3 for n in range(1,101)]
cube_generator = (n**3 for n in range(1,101))
```

A generator is simpler than a list. You cannot insert, remove, or append items, nor can you search or sort a generator. A generator knows how to produce the next item in a sequence, and little else. Once it has reached the end of its sequence, it does even less.

```
for x in cube_list: print(x) # Prints numbers stored in list.
for x in cube_list: print(x) # Prints numbers stored in list again.
for x in cube_generator: print(x) # Prints numbers provided by generator.
for x in cube_generator: print(x) # Prints nothing. Generator is finished.
```
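You can watch a generator run out by stepping through it by hand with the built-in `next` function. This sketch uses a deliberately short generator of three cubes; once it is exhausted, `next` raises `StopIteration`, which is how a `for` loop knows to stop.

```python
cube_generator = (n**3 for n in range(1, 4))

print(next(cube_generator))    # 1
print(next(cube_generator))    # 8
print(next(cube_generator))    # 27

try:
    next(cube_generator)       # Nothing left to produce.
except StopIteration:
    print("Generator is finished.")

print(list(cube_generator))    # [] : an exhausted generator stays empty.
```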

The advantages of a generator over a list are size and speed. Compare the
output of the `__sizeof__()` method for the following lists and generators.
This method returns the size of the object in bytes.

```
cube_list = [n**3 for n in range(1,10)]
cube_generator = (n**3 for n in range(1,10))
print(cube_list.__sizeof__())
print(cube_generator.__sizeof__())
cube_list = [n**3 for n in range(1,10**3)]
cube_generator = (n**3 for n in range(1,10**3))
print(cube_list.__sizeof__())
print(cube_generator.__sizeof__())
cube_list = [n**3 for n in range(1,10**6)]
cube_generator = (n**3 for n in range(1,10**6))
print(cube_list.__sizeof__())
print(cube_generator.__sizeof__())
```

The list grows from 168 bytes to 9 kB to 8.7 MB, while the generator remains a constant 48 bytes. (The exact numbers depend on your Python version and platform.) Also, you may have noticed a delay while Python created the large list during the last set of commands.
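When all you need is a single number, such as the sum of the first 20 perfect cubes, you can hand a generator expression directly to `sum` and never build the list at all. When a generator expression is the only argument of a function call, the extra pair of parentheses can be omitted. The variable names below are illustrative.

```python
total_from_list = sum([n**3 for n in range(1, 21)])      # Builds a list first.
total_from_generator = sum(n**3 for n in range(1, 21))   # No list is stored.

print(total_from_list == total_from_generator)   # True
print(total_from_generator)                      # 44100
```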

I generally prefer a generator when I iterate over a large sequence of items once — especially if the program might exit the loop before reaching the end of the sequence.
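Here is a sketch of that early-exit pattern: finding the first perfect cube that exceeds one million. It assumes `itertools.count` from the standard library as an open-ended source of integers, so the sequence has no predetermined end, and no more cubes are computed than the loop actually requests.

```python
from itertools import count

cubes = (n**3 for n in count(1))   # count(1) yields 1, 2, 3, ... without limit.

for cube in cubes:
    if cube > 1_000_000:
        break                      # Stop as soon as the answer is found.

print(cube)                        # 1030301, which is 101**3
```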

## Summary

NumPy arrays are often the most efficient data structure for numerical work in
Python. However, there are some tasks for which a Python list is a better
choice — often when *organizing* data rather than *processing* data. Python
offers a compact syntax for creating lists called a **list comprehension**. A
**generator expression** is similar, but creates an object that can produce a
sequence *without* storing all of its elements. A generator is often a better
choice than a list or an array when iterating over a large sequence of items.

### NumPy version of `fibonacci(N)`

Here is a version of the `fibonacci(N)` function above that uses NumPy arrays.

```
import numpy as np
def fibonacci(N):
    if N == 0: return np.array([1])   # Handle unusual request for 0th number.
    fib = np.zeros(N+1, dtype=int)    # Initialize array for all other values of N.
    fib[0], fib[1] = 1, 1
    for k in range(2,N+1):
        # Next Fibonacci number is the sum of the previous two.
        fib[k] = fib[k-1] + fib[k-2]
    return fib
```

Perhaps you came up with a more elegant solution. I find this version more
difficult to code and more confusing to read. Plus, using a NumPy array forces
a compromise: Either use floating point numbers and lose significant digits for
`N > 78`, or use 64-bit integers and encounter integer overflow for `N > 91`. In
either case, you cannot generate the 100th Fibonacci number!
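The list version does not face this compromise, because Python's built-in integers have arbitrary precision. The sketch below repeats the list-based function so that it runs on its own, then checks that the last entry of `fibonacci(100)` is larger than the largest signed 64-bit integer. The name `largest_int64` is introduced here for illustration.

```python
def fibonacci(N):
    if N == 0:
        return [1]
    fib = [1, 1]
    for k in range(1, N):
        # Next Fibonacci number is the sum of the previous two.
        fib.append(fib[-1] + fib[-2])
    return fib

fib = fibonacci(100)
largest_int64 = 2**63 - 1              # Largest signed 64-bit integer.

print(len(fib))                        # 101 entries, indices 0 through 100
print(fib[-1] > largest_int64)         # True: too large for a 64-bit integer.
print(fib[-1] == fib[-2] + fib[-3])    # True: the recurrence still holds exactly.
```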
