3 different ways of escaping the usual loop in Python

Hojentropia
6 min read · Nov 11, 2021

Summary

In this article we look into 3 different ways of escaping the usual for loop in Python, using:

  • List comprehension
  • map() function
  • NumPy arrays

We use a simple approach: applying a mathematical expression to a list of 16,619 items and discussing the differences we found in completion time for each of the methods above. If you are a data scientist, or are entering a data-related field, take a look at this article for helpful tools that take you faster where you need to go.

Introduction

As data scientists, most of our time should be spent collecting data and gaining insights from it: cleaning it, building a machine learning dataset, or creating elegant visualizations. The code we write should be a helpful tool that takes us faster where we need to go, not something that leaves us waiting around.

Usually we need to deal with loops, and when looping over a list in Python we start with something like:

a = []
for i in range(1000):
    a.append(i)

However, this is not the most efficient way to write most loops. In this post we show some techniques for getting faster loops without much extra work, making your code more Pythonic, along with practical timing comparisons for each technique.

What does efficient Python code mean?

In general, efficient code satisfies two key concepts:

  • Minimal completion time (fast runtime): a small latency between execution and the returned result.
  • Careful resource allocation, with no unnecessary overhead.

Less runtime + less memory consumption = efficiency target

We could also add code readability as a qualitative key concept, since the creators of Python pride themselves on the readability of the language. Readable code follows the best practices recommended in the guiding principles of Python.

Path to efficiency: 3 ways to break out of the for loop

Running out of the loop
  1. List Comprehension
  2. Built-in functions (map())
  3. NumPy

To explain this in a hands-on way, we are going to use a video game sales dataset. We created a new list called “sales”, which holds the sales in Japan and in the USA, in millions, for each game in the list “games”.

sales = [(3.77, 28.96), (6.81, 3.58), (3.79, 12.76), (3.28, 10.93),
         (10.22, 8.89), (4.22, 2.26), (6.5, 9.14), (2.93, 9.18),
         (4.7, 6.94), (0.28, 0.63), (1.93, 10.95), …]

games = ['Wii Sports', 'Super Mario Bros.', 'Mario Kart Wii', 'Wii Sports Resort',
         'Pokemon Red/Pokemon Blue', 'Tetris', 'New Super Mario Bros.', 'Wii Play', …]

Our objective is to compute the total sales (USA + Japan) per game using each of the 3 methods above. To measure the time consumed by each operation, we will use the IPython %%timeit cell magic.
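Outside a notebook, %%timeit is not available, but the standard library timeit module can give a comparable measurement. The following is a minimal sketch under stated assumptions: sample_sales is a small hypothetical stand-in for the full sales list, and total_with_loop is an illustrative helper name, not something from the original notebook.

```python
import timeit

# Hypothetical sample: a few pairs in the same shape as the article's `sales` list.
sample_sales = [(3.77, 28.96), (6.81, 3.58), (3.79, 12.76), (3.28, 10.93)]


def total_with_loop(rows):
    """Sum each row with an explicit for loop (illustrative helper)."""
    totals = []
    for row in rows:
        totals.append(sum(row))
    return totals


# repeat=7 and number=100 mirror the -r7 and -n100 flags passed to %%timeit.
timings = timeit.repeat(lambda: total_with_loop(sample_sales), repeat=7, number=100)

# Each entry is the total time for 100 calls, so divide to get the per-call time.
best_per_call = min(timings) / 100
print(f"best of 7 runs: {best_per_call * 1e6:.2f} µs per call")
```

Note that %%timeit reports the mean and standard deviation across runs, while this sketch prints the fastest run, so the numbers are not directly interchangeable.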

0. Usual for loop

Just to have a baseline, let’s start by performing this operation with a for loop and timing it.

%%timeit -r7 -n100
total_sales = []
for row in sales:
    total_sales.append(sum(row))

And we end up with:

2.77 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

1. List Comprehension

The first common escape from the usual for loop is to replace the multiple lines with a single-line construct called a list comprehension. It is an elegant way to define and create lists based on existing iterables, with a shorter and cleaner syntax.

This matches one of the principles of the Zen of Python: “Flat is better than nested”.

Applying this to our problem, we wrote:

%%timeit -r7 -n100
total_sales = [sum(row) for row in sales]

%%timeit result:

2.34 ms ± 80.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

A modest improvement (roughly 15%) over the normal for loop, and the code is cleaner and more Pythonic.
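To make the equivalence concrete, here is a small sketch that builds the totals both ways and checks that they match. The three-row sample_sales list is a hypothetical stand-in for the full dataset.

```python
# Hypothetical sample in the same shape as the article's `sales` list.
sample_sales = [(3.77, 28.96), (6.81, 3.58), (3.79, 12.76)]

# Usual for loop: create an empty list, then append one total per row.
loop_totals = []
for row in sample_sales:
    loop_totals.append(sum(row))

# List comprehension: the same result expressed in a single line.
comp_totals = [sum(row) for row in sample_sales]

print(loop_totals == comp_totals)  # True
```

Both versions call sum(row) on exactly the same rows, so the two lists are identical; only the way the list is built differs.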

2. Built-in functions (map())

A second option is to think about built-in functions, that is, the functions that ship with the Python interpreter and are available without any import.

One of these functions is map(), which applies a function to each element of an iterable. It receives two arguments: first, the function you would like to apply, and second, the iterable you would like to apply it to.

We can also use map with an unnamed function (lambda function):

sqr_numb = map(lambda x: x ** 2, nums)

The map function provides a clean way to apply a function to every element of an iterable without writing a loop.
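A short sketch of how map behaves in practice may help here. The nums list is hypothetical (the article does not define it), and x ** 2 is the Python spelling of squaring.

```python
# Hypothetical input list; the article does not define `nums`.
nums = [1, 2, 3, 4]

# map() does not compute anything yet: it returns a lazy map object.
sqr_numb = map(lambda x: x ** 2, nums)
print(type(sqr_numb).__name__)  # map

# Converting to a list forces the function to run on every element.
squares = list(sqr_numb)
print(squares)  # [1, 4, 9, 16]

# A map object is an iterator, so it is exhausted after one pass.
leftover = list(sqr_numb)
print(leftover)  # []
```

This laziness is why the result has to be turned into a list before it can be inspected or reused.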

For our target, the total sales in Japan and the USA, timing it gives:

%%timeit -r7 -n100
total_sales = [*map(sum, sales)]
# Note that we have to unpack the result into a list, which adds time to our measurement.
# This is needed because map returns a map object, which is not our expected output.

Results:

1.51 ms ± 80.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

3. NumPy Arrays

The third and fanciest way is to use NumPy. NumPy, or Numerical Python, is an invaluable Python package for data scientists. It is the fundamental package for scientific computing in Python, and it provides a number of benefits for writing efficient code.

One of the most important advantages of NumPy is the NumPy array, which provides a fast and memory-efficient alternative to Python lists.
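As a rough illustration of the memory side, the sketch below compares the bytes used by a list of floats with the bytes used by the equivalent array. It assumes CPython, where every float in a list is a separate object; the list figure is an estimate built from sys.getsizeof, and the names values_list and values_arr are hypothetical.

```python
import sys

import numpy as np

n = 1000
values_list = [float(i) for i in range(n)]
values_arr = np.array(values_list)

# A list stores pointers to separate float objects, so count both parts.
list_bytes = sys.getsizeof(values_list) + sum(sys.getsizeof(v) for v in values_list)

# A float64 array stores the raw 8-byte values contiguously.
array_bytes = values_arr.nbytes

print(f"list:  about {list_bytes} bytes")
print(f"array: {array_bytes} bytes of data")
```

The array figure covers only the data buffer, not the small fixed overhead of the array object itself, so this is a comparison of orders of magnitude rather than an exact accounting.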

NumPy arrays are homogeneous, which means that they must contain elements of the same type. If, for example, we try to create an array from two elements of different types (an int and a float), NumPy will convert the int into a float to keep the array homogeneous.
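The conversion described above can be checked directly. This is a minimal sketch; the variable name mixed is just for illustration.

```python
import numpy as np

# One int and one float: NumPy picks a single dtype that can hold both.
mixed = np.array([1, 2.5])

print(mixed.dtype)     # float64
print(mixed.tolist())  # [1.0, 2.5]
```

The integer 1 is stored as 1.0, so every element of the array shares the same float64 type.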

Our dataset is homogeneous, and we need to compute a sum over the entire collection of values. A NumPy array is far more efficient at this than a for loop or a list comprehension because it supports something a Python list does not: vectorized operations. NumPy performs the operation over all elements in compiled code, instead of stepping through them one at a time in a Python-level loop.

import numpy as np
values_np = np.array(sales)

%%timeit -r7 -n100
total_sales = np.sum(values_np, axis=1)

Results:

177 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
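To see what axis=1 does, here is a small sketch on a hypothetical three-row sample, checked against the list comprehension result.

```python
import numpy as np

# Hypothetical sample in the same shape as the article's `sales` list.
sample_sales = [(3.77, 28.96), (6.81, 3.58), (3.79, 12.76)]

# A list of 2-tuples becomes a 2-D array: one row per game, one column per region.
values_np = np.array(sample_sales)
print(values_np.shape)  # (3, 2)

# axis=1 sums across the columns of each row, giving one total per game.
total_sales = np.sum(values_np, axis=1)

# Compare with the pure-Python result, allowing for floating-point rounding.
loop_totals = [sum(row) for row in sample_sales]
print(np.allclose(total_sales, loop_totals))  # True
```

In the timed cell above, the conversion with np.array(sales) happens before %%timeit starts, so the 177 µs figure covers only the sum itself and not the cost of building the array.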

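To summarize the results, these are the mean times per loop for each method:

  • Usual for loop: 2.77 ms
  • List comprehension: 2.34 ms
  • map(): 1.51 ms
  • NumPy: 177 µs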

Conclusion

In this article we compared the running time of a few different ways of writing loops in Python. We saw that a list comprehension gives a small improvement over a normal loop, while looking cleaner and being more Pythonic. Using map with the built-in sum function ended up being somewhat faster again, although we should note that turning the map object into a list at the end adds time. The winner by far was NumPy with its vectorized operations, roughly 15 times faster than the usual for loop in our test. So, when possible, prefer a vectorized NumPy operation over a loop.

We must also stress that none of this is set in stone. It won’t always be possible to use a specific technique, and it might not be the most organized way to do something. So when possible use the most efficient operation, but go with what works best with the rest of the code.

Written by

Diego Akel

Tassia Forasteiro

Written by Diego Akel and Tassia Forasteiro. We’re both Data Analysts, sharing our job and research experiences.