Packaged code is usable code is happy code

After a pleasant break and a cancelled conference I’m back at astroBugs. This week I’m going to try and convince you that, for certain kinds of projects and code bases, you should wrap the source into a python package. “But Thomas” I hear you say “I already know this, if I want to distribute my code to others I should to package it”. You would be correct about that; however, here I hope to show you that there are times you should package your code even if you never indent for it to leave the computer you are writing it on.

Packaging Python Code

In line with the conceit of this blog, that is assuming that readers have the basic programming skills an observational astronomy graduate student might have, we will first provide a basic overview of what packaging code looks like and how it is done.

Importing from a local file

We know that in python we can import code from other packages. This might look like

from numpy import random as nprand

From within the source we have this line we can now access all the function in numpy.random using the dot operator. This ability to bring code in from other files is not limited however to large or advanced developers. We too can very easily bring code from one file into another.

Say you have two files in the same directory, foo.py and bar.py, within foo.py is the following code

def hello():
    return "Hello, World!"

In bar.py you want to use the function hello. You could copy and paste it from foo.py into bar.py; however, that is inelegant, especially for longer and more realistic examples. Instead, within bar.py you could write the following

from foo import hello

string = hello()

This same principal works with objects, variables, and anything else in the global namespace of foo.py. Already, you may be see how helpful this is. You might imagine having a file called utils.py which contains commonly used routines for plotting, opening FITS files, parsing through data tables. You could copy the file utils.py to the different directories you need those routines; intern, allowing all scripts in that directory to use those routines without ever having to rewrite them. Moreover, it will keep your code clearer and easier to read. A single file which contains routines to use, such as foo.py in this toy example, is known as a module.

Making a Package

Importing from a local file is super handy; however, if you have a lot of reused code and/or you are developing in many locations on your drive it can get tedious copying the same file everywhere. Further, that one file may start getting very long with all sorts of useful routines. When you start noticing such annoyances it may be time to think about building a package.

Packages are what you import into python, such as numpy and pandas. A package is a directory which contains at least one constructor file. They then almost always also contain at least one module and sometimes subpackages. The general structure of a package might then look like

foobar/
├─ __init__.py (constructor)
├─ foo.py (module)

From one directory without the package you could then type:
from foobar.foo import hello

string = hello()

One immediate advantage to this is you can have multiple modules within the package. So let’s say that you have some functions which parse data files, some functions with plot results, and some functions that fit curves to data; in that case, your package might take on the following structure:

AstroCode/
├─ __init__.py (constructor)
├─ parsing.py
├─ plotting.py
├─ fitting.py

Here we have three modules, one for each of the major genres of task we might want to do. Any scripts in some directory which contains AstroCode will then be able to include lines such as

from AstroCode.parsing import parse_output_file
from AstroCode.plotting import plot_CMD
from AstroCode.fitting import fit_iso_chi2

Assuming of course those specific functions were implemented in the relevant files.

You might be wondering what this constructor file I skimmed over contains; however, in a twist so shocking it would make M. Night blush, __init__.py does not need to contain anything! __init__.py signals to the python interpreter than the directory is a package, but that is done by its mere existence. Any directory containing a file called __init__.py is by definition a python module. You can put stuff into the file if you want, and any code in __init__.py will autorun on import of the package, but in general, don’t worry about it further than simply creating it.

Using a Package Anywhere

Using a package as opposed to a module gave us the advantage of organization; however, we still have to copy the package directory around to every place we want to use it. What if we don’t want to do that? There are two ways to do this, a short way and a long way (and more formally correct way). We are going to totally ignore the long way (see here if you want more details on that). The short way is to decide on some location on your computer where all the packages you write will live. I use a folder I create called ~/.local/python_devel_pkg. Once you pick the folder you want open your shell profile script and export that path to the environmental variable PYTHONPATH (also export JUPYTERPATH if you want to use your packaged from jupyter). So you should have some line in your shell profile that looks like

export PYTHONPATH="~/.local/python_devel_pkg:$PYTHONPATH"

PYTHONPATH defines where python looks for packages when you invoke python from the command line (JUPYTERPATH does the same when running in jupyter). Copy over your python packages to this same directory so that the structure looks something like

PYTHONPATH_dir/
├─ AstroCode/
│  ├─ __init__.py (constructor)
│  ├─ parsing.py
│  ├─ plotting.py
│  ├─ fitting.py

Now when you run the python interpreter, even if the package is not in the folder you are running from it will find your package and let you import and run code from it. Note, this means that when you want to modify or make additions to your package you need only modify the version in the path you exported to PYTHONPATH.

Closing Thoughts

What I have presented here is not the most formal introduction to packaging in python, and if you want to develop code others will use certainly do not rely on this guide. However, I find that keeping my personal code in a well organized package (I call the package I use for research AstroCode_p, p to denote its personal code) saves me huge amounts and time. Those time saving are primary in trying to remember where I put that one function I wrote 2 years ago that I don’t want to spend time rewriting now but also not having to copy files around and not having to worry about organization nearly as much.

Leave a comment