Distributed ring network (MPI)¶

In this example, we will build on an earlier tutorial, to demonstrate usage of distributed contexts. The network will be unaltered, but its execution is parallelized using MPI. As these concerns are largely orthogonal, we will discuss only the differences between the two simulations.

Note

Concepts covered in this example:

Building a basic MPI aware arbor.context to run a network. This requires that you have built Arbor with MPI support enabled.
Running the simulation and extracting the results.

The recipe¶

Step (11) is changed to generate a network with five hundred cells.

# (11) Instantiate recipe
ncells = 500

The hardware context¶

An execution context describes the hardware resources on which the simulation will run. It contains the thread pool used to parallelise work on the local CPU, and optionally describes GPU resources and the MPI communicator for distributed simulations. In some Arbor tutorials, the arbor.single_cell_model or arbor.simulation objects created the execution context arbor.context behind the scenes. The details of the execution context can be customized by the user. We may specify the number of threads in the thread pool; determine the id of the GPU to be used; or create our own MPI communicator.

The configuration of the context will need to be added to reflect the change in hardware. We will use MPI to distribute the work over nodes, cores, and threads.

Step (12) uses the Arbor-built-in MPI communicator, which is identical to the MPI_COMM_WORLD communicator you’ll know if you are familiar with MPI. The arbor.context takes a communicator for its mpi parameter. Note that you can also pass in communicators created with mpi4py. We print both the communicator and context to observe how Arbor configures their defaults.

Note

If you use GPUs in combination with MPI, consider using find_private_gpu.

Step (13) creates the simulation using the recipe and the context created in the previous step.

# (12) Create an MPI communicator, and use it to create a hardware context
A.mpi_init()
comm = A.mpi_comm()
print(comm)
context = A.context(mpi=comm)
print(context)

# (13) Create a default domain decomposition and simulation

The execution¶

Step (16) runs the simulation. Since we have more cells this time, which are connected in series, it will take some time for the action potential to propagate. In the ring network, we could see it takes about 5 ms for the signal to propagate through one cell, so let’s set the runtime to 5*ncells.

# (16) Run simulation
sim.run(ncells * 5 * U.ms)

An important change in the execution is how the script is run. Whereas normally you run the Python script by passing it as an argument to the python command, you need to use srun or mpirun (depending on your MPI distribution) to execute a number of jobs in parallel. You can still execute the script using python, but then MPI will not execute on more than one node.

From the command line, we can run the script using mpirun (srun on clusters operated with SLURM) and specify the number of ranks (NRANKS) or nodes. Arbor will spread the cells evenly over the ranks, so with NRANKS set to 5, we’d be spreading the 500 cells over 5 nodes, simulating 100 cells each.

mpirun -n NRANKS python mpi.py

The results¶

Before we execute the simulation, we have to understand how Arbor distributes the computational load over the ranks. After executing mpirun, all nodes will run the same script. In the domain decomposition step, the nodes will use the provided MPI communicator to divide the work. Once arbor.simulation.run() starts, each node will work on their allocated cell gid s.

This is relevant for the collection of results: these are not gathered for you. Remember that in step (15), we store the handles to the probes; these refer to particular gid s. The gid s are now distributed, so on one node, the script will not find the cell referred to by the handle and, therefore, return an empty list (no results were found).

In step (17) we check, for each gid, if the list returned by arbor.simulation.samples() has a nonzero length. The effect is that we collect the results generated on this particular node. Since we now have NRANKS instances of our script, and we can’t access the results between nodes, we have to write the results to disk and analyse them later. We query arbor.context.rank to generate a unique filename for the result.

# (17) Store the recorded voltages
print("Storing results ...")
df_list = []
for gid in range(ncells):
    if len(sim.samples(handles[gid])):
        samples, meta = sim.samples(handles[gid])[0]
        df_list.append(
            pd.DataFrame(
                {"t/ms": samples[:, 0], "U/mV": samples[:, 1], "Cell": f"cell {gid}"}
            )
        )

if len(df_list):
    df = pd.concat(df_list, ignore_index=True)
    df.to_csv(f"result_mpi_{context.rank}.csv", float_format="%g")

In a second script, network_ring_mpi_plot.py, we load the results stored to disk into a pandas table, and plot the concatenated table as before:

#!/usr/bin/env python3
# This script is included in documentation. Adapt line numbers if touched.

import glob
import pandas
import seaborn

results = glob.glob("result_mpi_*.csv")

df_list = []
for result in results:
    df_list.append(pandas.read_csv(result))

df = pandas.concat(df_list, ignore_index=True)
seaborn.relplot(
    data=df, kind="line", x="t/ms", y="U/mV", hue="Cell", errorbar=None
).savefig("mpi_result.svg")

To avoid an overcrowded plot, this plot was generated with just 50 cells:

The full code¶

You can find the full code of the example at python/examples/network_ring_mpi.py and python/examples/network_ring_mpi_plot.py.