I spend a lot of time letting models of stars evolve, hours and hours per week. Each model might take anywhere from 5 to 30 minutes to finish (depending on the initial mass of the star, the resolution, and a few other factors that would distract from the main point of this post). The way I evolve models is by feeding the initial conditions of some pre-main-sequence star into the Dartmouth Stellar Evolution Program (DSEP), hitting enter, and then checking back when it's done. Actually, that's how I used to evolve models.
The problem with what I used to do is that I had to wait for one model to finish before I could start another. DSEP does not make use of any parallelization; as it turns out, parallelizing solutions to the stellar structure equations + nuclear reaction rates + equations of state is a really hard problem. The folks working on MESA have been at it for years but have yet to see major speed increases. Because of this lack of parallelization, when one model is evolving only a single thread on my computer is being used, which leaves, on my desktop at least, 23 unused threads.
Modern operating systems are really good at multitasking. I could, for example, open up 24 terminal windows, prepare slightly different inputs in each one, and then run DSEP in each of those 24 windows. That would work, and it would make much better use of my computer's resources. However, it's inelegant.[1]
For all the inelegance of that solution, it does present the very real benefit of an ~24x speedup when running some number of models, m, where m >> 24. We might therefore look for some way to achieve the same speedup without all of those separate sessions open.
- [1] A note for those in your life who may not follow when you say your code is inelegant: to someone who has not spent much time programming, it can be hard to describe what makes a solution inelegant. The image of 24 open windows on one screen, all doing different things that you have to keep track of yourself, provides a visceral and easy-to-digest picture of inelegance.
There are many ways to run multiple operations in parallel. Some languages are built from the ground up (or near enough as doesn't matter) to allow for parallelization (Julia), some simply allow it by virtue of not disallowing it, and others, such as Python, actively discourage and in some ways disallow it. Python's Global Interpreter Lock (GIL) is a fraught topic which I am frankly not qualified to speak on; simply know that it prevents much of the standard-fare parallelization you might want to do in other languages.
I am told there are ways to get around this, or at least to work within its constraints (incidentally, one reason the GIL is in place is that it allows single threads to execute much faster). Here I am going to present a parallelization paradigm which has worked very well for me when I am using Python as the glue to call, control, and monitor some external program (in my case DSEP). Note, I am not saying this is the best way to parallelize code, but I am saying it's a comparatively low-upfront-effort way which has given me great results in the past.
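Before getting into any pysep specifics, here is a minimal sketch of that general pattern, assuming a placeholder executable called some_program which takes an input file as an argument (neither of which is DSEP's actual interface). The key idea is that multiprocessing sidesteps the GIL by using separate worker processes, each of which simply blocks on its own copy of the external program:

import multiprocessing as mp
import subprocess

def run_one(input_file):
    # each worker is a separate OS process, so the GIL never comes into
    # play; the worker just blocks while its copy of the program runs
    subprocess.run(["some_program", input_file], check=True)
    return input_file

if __name__ == "__main__":
    inputs = ["run01.conf", "run02.conf", "run03.conf"]
    with mp.Pool() as pool:  # defaults to one worker per available thread
        finished = pool.map(run_one, inputs)  # blocks until every run is done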
I've referenced in past posts a Python interface layer for DSEP I have written called pysep. As a brief reminder, pysep provides an object-oriented interface to set up initial conditions, call DSEP, and parse its output files. When using pysep it looks to the user like everything is happening within Python. But this is a trick! pysep merely takes the user-given inputs, writes them to a file on disk (DSEP does not take command-line arguments; it reads configuration files), and then calls DSEP as a sub-process. This architecture (and any where you are calling some program as a sub-process) can easily be adapted to run multiple instances of that program at once. I've made use of this in pysep, and can now, with just a few lines of code, run 24 (or n, where n is the number of available threads) instances of DSEP at once.
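Stripped of all the bookkeeping, the trick looks something like the following sketch. The configuration format shown and the executable name dsep are stand-ins, not pysep's actual internals:

import subprocess

def evolve(params):
    # DSEP takes no command-line arguments, so the user-given inputs are
    # first serialized to a configuration file on disk...
    with open("dsep.conf", "w") as f:  # placeholder file name and format
        for key, value in params.items():
            f.write(f"{key} = {value}\n")
    # ...and only then is DSEP launched as a sub-process, picking its
    # configuration up from the current working directory
    subprocess.run(["dsep"], check=True)  # placeholder executable name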
We are going to make use of Python's multiprocessing package. Below is a slightly modified version of the code I use to let pysep run multiple instances of DSEP at once. Note the line self._pool.map(iEvolve, self.modelList). That (and the functions it calls in turn) is what actually distributes jobs across threads. pool.map takes some function (in my case iEvolve) and sends out some number of instances of that function, each instance receiving one element of self.modelList (one stellar model).
import os
import tempfile


def iEvolve(model):
    temp_dir = tempfile.TemporaryDirectory()
    initDir = os.getcwd()
    os.chdir(temp_dir.name)
    model.evolve()  # this is a method which starts the DSEP subprocess
    os.chdir(initDir)  # move back out so the temp directory can be cleaned up
    return model, temp_dir


class pStellarModel():
    def __init__(self, modelList):
        ...
        # (pool setup, the _initialized flag, and other details elided)

    def pEvolve(self):
        """
        Evolve models in parallel with the number of workers being equal to the
        number of available threads. Models will be evaluated from their own
        temporary directories.
        """
        if not self._initialized:
            self._init_pool()
        results = self._pool.map(iEvolve, self.modelList)  # This is the key line
        self._desc_pool()
        if results:
            self._evolved = True
            self._temp_dirs = [x[1] for x in results]
            self.evolved_models = [x[0] for x in results]
Because of some of the operating principles of DSEP I have to move each worker to its own working directory (and in the actual pysep source there is more code to clean up stuff in /tmp), but note how little code is actually needed to parallelize, assuming I already have some other code which can run on its own.
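That extra cleanup can be as simple as explicitly closing out each TemporaryDirectory that iEvolve returned. A hypothetical helper method on pStellarModel might look like this (the real pysep code does more):

    def _cleanup_temp_dirs(self):
        # hypothetical helper: explicitly delete each scratch directory
        # returned by iEvolve rather than waiting on garbage collection
        for tdir in self._temp_dirs:
            tdir.cleanup()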
That point about code running on its own is key: each instance, each thing you want to put on a thread, needs to be able to run independently of every other thread for this to be trivial. If one thread needs output from another thread to proceed, your life becomes dramatically more challenging, and you may want to consider other points in your workflow where you can parallelize. To hammer that point home: as I said, the folks at MESA have been working on running single stellar models in parallel for years, but that's a challenging ask, since most of the tasks that take time depend on the results of other tasks that take time. Sure, you can do it, but the speed increases have been minimal thus far, as most of the threads spend most of their time twiddling their proverbial thumbs. What we have done here with DSEP is move our point of parallelization up the chain: instead of making one model run faster, we let multiple models run concurrently.
To be clear, there are downsides to this approach (compared to some theoretical world where single stellar models can be efficiently parallelized). Most notable is the dramatically higher memory usage of keeping the input data for 24 separate models in memory rather than just one. Moreover, the speed benefits are less dramatic for smaller numbers of models: the maximum speedup is min(m, 24) on a 24-thread machine, so if we only need to run 12 models we can achieve at most a 12x speedup instead of the 24x we could get with many more models, and if we are only running one model we get no speedup at all. However, for the work I do at least, I tend to be interested in ensembles of 100s of models and have access to machines with 10s of GB of RAM (also, DSEP has a tiny memory footprint), so those downsides are not of huge concern.
To give a quick example of how clean this solution ends up being (remember, the initial suggestion for this speedup was to open 24 terminals and run DSEP manually in each one), take a look at the following code, which evolves stellar models in parallel. Assume some number of models have already been initialized (but not evolved) and are stored in the directory “initModels”, and assume each model is called model.dsep.
from pathlib import Path

from pysep.api.parallel import pStellarModel
from pysep.dsep import load_model

tdirs = list()
with pStellarModel() as psm:
    for file in Path("initModels").rglob("model.dsep"):
        model, tdir = load_model(file)
        # here I add all the models into the pStellarModel first so that when
        # the jobs get distributed python knows how many there are to do in total
        psm.append(model)
        # load_model loads to a temp directory; this needs to be saved so that
        # python does not delete the directory where the input files are being
        # stored
        tdirs.append(tdir)
    psm.pEvolve(stash=True)
And with that, the models will be evolved in parallel. This small subpackage within pysep has saved me literally 100s of hours and only took about 30 minutes to put together.
This is only one way to do parallelization, and perhaps not the best; however, it is quick to implement and it works, and there is a lot of value in that. If you want to learn more, look into tutorials and the documentation for the multiprocessing library.