A little bit of progress

Looking at a console and seeing output along the lines of “HERE 1”, “HERE 2”, …, “HERE [n]”, “HERE 1” used to be an all too common occurrence for me when running code. Be it for debugging or for tracking loop progress, the simplest way to check in on the runtime state of a piece of code is to have print functions (or your language’s equivalent) dump some indexed output to the console. This is a valuable tool (and one that I still use often); however, writing to stdout has quite a lot of overhead and can get messy if you do it a lot. In a future post I will talk about ways to reduce these print statements when debugging (though, to be clear, I believe debugging is where they are most useful). Today, however, I will talk about how to stop using print statements as progress indicators.

Of the two cases I mentioned, debugging and progress indication, the latter is the more important one to remove print statements from. As noted above, printing to stdout tends to carry a high overhead cost, so print calls inside loops tend to slow code down. How much this matters depends on how fast the rest of the loop executes. The stellar modeling code I work with writes to standard output, and to other I/O buffers, quite a lot; however, those I/O calls take far less time than solving the linearized equations of stellar structure, so they do not change the order of magnitude of the runtime.

Luckily for us, in Python at least, this is a solved problem. Many packages provide progress indication, and they vary in feature set and performance. I’m going to focus on tqdm, as it is the most performant. tqdm provides a set of drop-in functions that draw a nicely formatted progress bar in the terminal (on stderr by default) so you can watch loops make progress. Now, it’s a pretty bold statement on my part to suggest that this will improve performance over just using print; after all, both are making I/O calls to the terminal.

Here is a graph showing how execution time scales with the number of iterations on my Linux workstation, for both standard output and tqdm. The red line is the execution time using standard output and the green line is the execution time using tqdm. While the difference is only linear, it is clearly still significant.
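Part of why a progress bar can beat plain printing is that tqdm does not redraw on every iteration: by default it refreshes the display at most once every `mininterval` seconds (0.1 s). The sketch below is a simplified, stdlib-only illustration of that throttling idea, not tqdm's actual code. The function names are hypothetical, and it counts updates written to an in-memory `io.StringIO` stream rather than timing a real terminal.

```python
import io
import time


def print_every_iteration(n, stream):
    """Write one line per iteration, in the "HERE 1", "HERE 2", ... style."""
    updates = 0
    for i in range(n):
        # ... the real per-iteration work would go here ...
        print(f"HERE {i + 1}", file=stream)
        updates += 1
    return updates


def throttled_progress(n, stream, min_interval=0.1):
    """Redraw a one-line status at most once per `min_interval` seconds.

    A simplified illustration of the idea behind tqdm's `mininterval`, not
    tqdm's actual code. The last iteration is always drawn so the line ends
    at n/n.
    """
    updates = 0
    last = time.perf_counter()
    for i in range(n):
        # ... the real per-iteration work would go here ...
        now = time.perf_counter()
        if now - last >= min_interval or i == n - 1:
            stream.write(f"\r{i + 1}/{n}")  # '\r' returns to the start of the line
            updates += 1
            last = now
    stream.write("\n")
    return updates


if __name__ == "__main__":
    n = 100_000
    for func in (print_every_iteration, throttled_progress):
        sink = io.StringIO()  # in-memory stand-in for a terminal
        start = time.perf_counter()
        updates = func(n, sink)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__}: {updates} updates in {elapsed:.3f} s")
```

Because writing to an `io.StringIO` is much cheaper than writing to a real terminal, the timings from this sketch likely understate the gap; the number of updates is the more telling figure.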

Even setting performance aside (in many cases the hit from stdio calls is not huge), using a progress bar cleans up output dramatically.

Using tqdm

Using tqdm is super easy. Let’s say you have the following code whose progress you want to track:

# Let targets be some list of targets
# Let foo be some function which takes a target and operates on it
for target in targets:
    print(f"Working on target: {target}")
    foo(target)

As written, this code prints one line per target, something like:

Out[1]: Working on target: 2MASS J04130560+1514520

With tqdm, you can drop the print call and still see progress by changing the above code to:

from tqdm import tqdm
# Assume targets has 10 elements
for target in tqdm(targets):
    foo(target)
Out[2]: 100%|██████████████| 10/10 [00:00<00:00, 114912.44it/s]

(N.B. the iterations-per-second figure here is meaningless because I generated this output without giving foo any real workload.)
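To see why wrapping the iterable is all it takes, here is a stripped-down, hypothetical `progress` generator that mimics the drop-in pattern: it yields each item unchanged and redraws a one-line bar once the loop body for that item has finished. It is not how tqdm is implemented (there is no rate estimate and no throttling), and it assumes the iterable supports `len()`. Like tqdm's default, it writes to `sys.stderr`.

```python
import sys


def progress(iterable, stream=None, width=20):
    """Yield items from `iterable` while redrawing a simple text progress bar.

    Hypothetical, minimal stand-in for the tqdm(iterable) pattern. Requires
    `iterable` to support len(), so plain generators are not handled here.
    """
    stream = stream if stream is not None else sys.stderr
    total = len(iterable)
    try:
        for done, item in enumerate(iterable, start=1):
            yield item  # the caller's loop body runs here
            filled = width * done // total
            bar = "#" * filled + "-" * (width - filled)
            stream.write(f"\r{100 * done // total:3d}%|{bar}| {done}/{total}")
            stream.flush()
    finally:
        stream.write("\n")  # end the bar's line, even if the loop exits early


if __name__ == "__main__":
    import time

    targets = [f"target {i}" for i in range(10)]  # hypothetical stand-in data
    for target in progress(targets):
        time.sleep(0.05)  # stand-in for foo(target)
```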

You can give tqdm additional parameters if you’d like more info, though note that this might start to slow it down. Let’s say that in addition to showing progress you still want to see which target is being operated on at any given loop iteration. To do this we first need to save the progress bar as an object (so we can call its methods).

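from tqdm import tqdm

# Keep a reference to the bar so we can update it inside the loop
pbar = tqdm(targets, desc="placeholder")
for target in pbar:
    # Show the current target as the bar's label (set_description expects a string)
    pbar.set_description(str(target))
    foo(target)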

There are certainly other things to note about tqdm (for example, if you iterate over a generator, tqdm will show the number of iterations and the iterations per second but no progress bar, since generators do not have a defined length), but for quickly cleaning up output and reducing the performance impact of print statements, this post should cover what you need to know. tqdm can be installed with pip (pip install tqdm), and I highly recommend incorporating it into your workflow; I have, and my life is all the better for it.
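The generator case comes down to `len()`: a list knows its length up front, while a generator does not, so there is nothing to measure progress against. When you do know the count, tqdm lets you supply it yourself through its `total` argument, e.g. `tqdm(gen, total=n)`. The hypothetical `infer_total` helper below is a stdlib-only sketch of that decision, not tqdm's source.

```python
from typing import Iterable, Optional


def infer_total(iterable: Iterable, total: Optional[int] = None) -> Optional[int]:
    """Return the number of items a progress bar should expect, or None.

    Hypothetical helper. An explicit `total` wins; otherwise fall back to
    len(), which sized containers (lists, tuples, ranges) support and
    generators do not. None means "unknown length": a count and a rate can
    still be shown, but not a bar.
    """
    if total is not None:
        return total
    try:
        return len(iterable)
    except TypeError:  # generators define no __len__
        return None


if __name__ == "__main__":
    targets = ["a", "b", "c"]  # hypothetical stand-in data
    print(infer_total(targets))                        # 3
    print(infer_total(t for t in targets))             # None
    print(infer_total((t for t in targets), total=3))  # 3
```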
