Distributed ring network (MPI)

In this example, the ring network created in an earlier tutorial will be used to run the model in a distributed context using MPI. Only the differences with that tutorial will be described.

Note

Concepts covered in this example:

  1. Building a basic MPI aware arbor.context to run a network. This requires that you have built Arbor with MPI support enabled.

  2. Running the simulation and extracting the results.

The recipe

Step (11) is changed to generate a network with five hundred cells.

recipe = ring_recipe(ncells)

# (12) Create an MPI communicator, and use it to create a hardware context

The hardware context

An execution context describes the hardware resources on which the simulation will run. It contains the thread pool used to parallelise work on the local CPU, and optionally describes GPU resources and the MPI communicator for distributed simulations. In some other examples, the arbor.single_cell_model object created the execution context arbor.context behind the scenes. The details of the execution context can be customized by the user. We may specify the number of threads in the thread pool; determine the id of the GPU to be used; or create our own MPI communicator.

The configuration of the context will need to be changed to reflect the change in hardware. First of all, we scrap setting threads=”avail_threads” and instead use MPI to distribute the work over nodes, cores and threads.

Step (12) uses the Arbor-built-in MPI communicator, which is identical to the MPI_COMM_WORLD communicator you’ll know if you are familiar with MPI. The arbor.context takes a communicator for its mpi parameter. Note that you can also pass in communicators created with mpi4py. We print both the communicator and context to observe how Arbor configures their defaults.

Step (13) creates the simulation using the recipe and the context created in the previous step.

comm = arbor.mpi_comm()
print(comm)
context = arbor.context(mpi=comm)
print(context)

# (13) Create a default domain decomposition and simulation
sim = arbor.simulation(recipe, context)

# (14) Set spike generators to record

The execution

Step (16) runs the simulation. Since we have more cells this time, which are connected in series, it will take some time for the action potential to propagate. In the ring network we could see it takes about 5 ms for the signal to propagate through one cell, so let’s set the runtime to 5*ncells.

print("Simulation finished")

# (17) Store the recorded voltages

An important change in the execution is how the script is run. Whereas normally you run the Python script by passing it as an argument to the python command, you need to use srun or mpirun (depending on your MPI distribution) to execute a number of jobs in parallel. You can still execute the script using python, but then MPI will not execute on more than one node.

From the commandline, we can run the script using mpirun (srun on clusters operated with SLURM) and specify the number of ranks (NRANKS) or nodes. Arbor will spread the cells evenly over the ranks, so with NRANKS set to 5, we’d be spreading the 500 cells over 5 nodes, simulating 100 cells each.

mpirun -n NRANKS python mpi.py

The results

Before we execute the simulation, we have to understand how Arbor distributes the computational load over the ranks. After executing mpirun, all nodes will run the same script. In the domain decomposition step, the nodes will use the provided MPI communicator to divide the work. Once arbor.simulation.run() starts, each node wil work on their allocated cell gid s.

This is relevant for the collection of results: these are not gathered for you. Remember that in step (15) we store the handles to the probes; these referred to particular gid s. The gid s are now distributed, so on one node, the script will not find the cell referred to by the handle and therefore return an empty list (no results were found).

In step (17) we check, for each gid, if the list returned by arbor.simulation.samples() has a nonzero length. The effect is that we collect the results generated on this particular node. Since we now have NRANKS instances of our script, and we can’t access the results between nodes, we have to write the results to disk and analyse them later. We query arbor.context.rank to generate a unique filename for the result.

df_list = []
for gid in range(ncells):
    if len(sim.samples(handles[gid])):
        samples, meta = sim.samples(handles[gid])[0]
        df_list.append(
            pandas.DataFrame(
                {"t/ms": samples[:, 0], "U/mV": samples[:, 1], "Cell": f"cell {gid}"}
            )
        )

if len(df_list):
    df = pandas.concat(df_list, ignore_index=True)
    df.to_csv(f"result_mpi_{context.rank}.csv", float_format="%g")

In a second script, network_ring_mpi_plot.py, we load the results stored to disk into a pandas table, and plot the concatenated table as before:

#!/usr/bin/env python3
# This script is included in documentation. Adapt line numbers if touched.

import glob
import pandas
import seaborn

results = glob.glob("result_mpi_*.csv")

df_list = []
for result in results:
    df_list.append(pandas.read_csv(result))

df = pandas.concat(df_list, ignore_index=True)
seaborn.relplot(data=df, kind="line", x="t/ms", y="U/mV", hue="Cell", ci=None).savefig(
    "mpi_result.svg"
)

To avoid an overcrowded plot, this plot was generated with just 50 cells:

../_images/network_ring_mpi_result.svg

The full code

You can find the full code of the example at python/examples/network_ring_mpi.py and python/examples/network_ring_mpi_plot.py.