Hardware management
Arbor provides two library APIs for working with hardware resources:
The core libarbor is used to describe the hardware resources and their contexts for use in Arbor simulations.
The libarborenv provides an API for querying available hardware resources (e.g. the number of available GPUs), and initializing MPI.
libarborenv
The libarborenv API for querying and managing hardware resources is in the arbenv namespace.
This functionality is kept in a separate library to enforce
separation of concerns, so that users have full control over how hardware resources
are selected, either by using the functions and types in libarborenv, or by writing their
own code for managing MPI, GPUs, and thread counts.
-
arb::util::optional<int> get_env_num_threads()
Tests whether the number of threads to use has been set in an environment variable. First checks ARB_NUM_THREADS, and if that is not set checks OMP_NUM_THREADS.
Return value:
no value: the optional return value contains no value if no thread count was specified by an environment variable.
has value: the number of threads set by the environment variable.
Throws:
std::runtime_error if an environment variable is set with an invalid number of threads.
#include <iostream>
#include <arborenv/concurrency.hpp>

if (auto nt = arbenv::get_env_num_threads()) {
    std::cout << "requested " << nt.value() << " threads\n";
}
else {
    std::cout << "no environment variable set\n";
}
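The lookup order described above can be sketched with the standard library alone. This is a minimal illustration, not Arbor's implementation: parse_thread_count and env_num_threads_sketch are hypothetical names, std::optional stands in for arb::util::optional, and treating zero, negative, or non-numeric values as invalid is an assumption.

```cpp
#include <cstdlib>
#include <initializer_list>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse a positive thread count, throwing
// std::runtime_error if the text is not a whole positive integer.
int parse_thread_count(const std::string& text) {
    std::size_t pos = 0;
    int n = 0;
    try {
        n = std::stoi(text, &pos);
    }
    catch (const std::exception&) {
        throw std::runtime_error("invalid thread count: " + text);
    }
    if (pos != text.size() || n < 1) {
        throw std::runtime_error("invalid thread count: " + text);
    }
    return n;
}

// Hypothetical stand-in for the documented lookup order:
// ARB_NUM_THREADS first, then OMP_NUM_THREADS, otherwise no value.
std::optional<int> env_num_threads_sketch() {
    for (const char* name: {"ARB_NUM_THREADS", "OMP_NUM_THREADS"}) {
        if (const char* value = std::getenv(name)) {
            return parse_thread_count(value);
        }
    }
    return std::nullopt;
}
```

Separating the parsing from the environment lookup keeps the validation rule easy to test without modifying the process environment.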
-
int thread_concurrency()
Attempts to detect the number of available CPU cores. Returns 1 if unable to detect the number of cores.
#include <arborenv/concurrency.hpp>

// Set num_threads to value from environment variable if set,
// otherwise set it to the available number of cores.
int num_threads = 0;
if (auto nt = arbenv::get_env_num_threads()) {
    num_threads = nt.value();
}
else {
    num_threads = arbenv::thread_concurrency();
}
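The "return 1 if detection fails" behaviour can be illustrated with std::thread::hardware_concurrency(), which is allowed to return 0 when the core count is unknown. This is a sketch under that assumption and not how libarborenv detects cores; thread_concurrency_sketch and choose_num_threads are hypothetical names.

```cpp
#include <optional>
#include <thread>

// Hypothetical stand-in for a core-count query with a fallback of 1.
int thread_concurrency_sketch() {
    // hardware_concurrency() may return 0 if the value cannot be determined.
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : static_cast<int>(n);
}

// Hypothetical helper: prefer an explicit request (for example one read
// from the environment), otherwise fall back to the detected core count.
int choose_num_threads(std::optional<int> requested) {
    return requested ? *requested : thread_concurrency_sketch();
}
```

With this shape the caller always receives a usable thread count of at least 1, whichever path was taken.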
-
int default_gpu()
Returns the integer identifier of the first available GPU, if a GPU is available.
Return value:
non-negative value: if a GPU is available, the index of the selected GPU is returned. The index will be in the range [0, num_gpus), where num_gpus is the number of GPUs detected using the cudaGetDeviceCount CUDA API call.
-1: if no GPU is available, or if Arbor was built without GPU support.
#include <iostream>
#include <arborenv/gpu_env.hpp>

if (arbenv::default_gpu() > -1) {
    std::cout << "a GPU is available\n";
}
-
int find_private_gpu(MPI_Comm comm)
A helper function that assigns a unique GPU to every MPI rank.
Note
Arbor allows at most one GPU per MPI rank, and furthermore requires that an MPI rank has exclusive access to a GPU, i.e. two MPI ranks cannot share a GPU. This function assigns a unique GPU to each rank when more than one rank has access to the same GPU(s). An example use case is a system with "fat" nodes that have multiple GPUs per node, in which case Arbor should be run with multiple MPI ranks per node. Uniquely assigning GPUs is quite difficult, and this function provides what we feel is a robust implementation.
All MPI ranks in the MPI communicator comm should call this function to avoid a deadlock.
Return value:
non-negative integer: the identifier of the GPU assigned to this rank.
-1: no GPU was available for this MPI rank.
Throws:
std::runtime_error: if there was an error in the CUDA runtime on the local or remote MPI ranks, i.e. if one rank throws, all ranks will throw.
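The return contract (a unique GPU id per rank, or -1 when none is left) can be illustrated with a deliberately simplified model of a single node. This sketch is not the algorithm used by find_private_gpu: it assumes every rank on the node sees the same GPUs numbered from 0, and it does no MPI communication. The function name assign_private_gpus is hypothetical.

```cpp
#include <vector>

// Hypothetical, simplified assignment for one node: entry r of the result
// is the GPU id given to the rank with local index r, or -1 if no GPU is
// left. No two ranks receive the same GPU.
std::vector<int> assign_private_gpus(unsigned ranks_on_node, unsigned gpus_on_node) {
    std::vector<int> gpu_for_rank(ranks_on_node, -1);
    for (unsigned r = 0; r < ranks_on_node && r < gpus_on_node; ++r) {
        gpu_for_rank[r] = static_cast<int>(r);
    }
    return gpu_for_rank;
}
```

For example, four ranks on a node with two GPUs would leave the last two ranks without a GPU in this model, which is why running one rank per GPU is the usual configuration on multi-GPU nodes.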
-
class with_mpi
The with_mpi type is a simple RAII scoped guard for MPI initialization and finalization. On creation with_mpi will call MPI_Init_thread to initialize MPI with the minimum level of thread support required by Arbor, that is MPI_THREAD_SERIALIZED. When it goes out of scope it will automatically call MPI_Finalize.
-
with_mpi(int &argcp, char **&argvp, bool fatal_errors = true)
The constructor takes the argc and argv arguments passed to main of the calling application, and an additional flag fatal_errors that toggles whether errors in MPI API calls terminate the program (fatal_errors = true) or return error codes (fatal_errors = false).
Warning
Handling exceptions is difficult in MPI applications, and it is the user's responsibility to do so.
The with_mpi scope guard attempts to facilitate error reporting of uncaught exceptions, particularly in the case where one rank throws an exception while the other ranks continue executing. In this case there would be a deadlock if the rank with the exception attempts to call MPI_Finalize while other ranks are waiting in other MPI calls. If this happens inside a try-catch block, the deadlock stops the exception from being handled. For this reason the destructor of with_mpi only calls MPI_Finalize if there are no uncaught exceptions. This isn't perfect because the other MPI ranks still deadlock, however it gives the exception handling code the opportunity to report the error for debugging.
An example workflow that uses the MPI scope guard follows. Note that this code will print the exception error message in the case where only one MPI rank threw an exception, though it would then either deadlock or exit with an error code indicating that one or more MPI ranks exited without calling MPI_Finalize.
#include <exception>
#include <iostream>

#include <arborenv/with_mpi.hpp>

int main(int argc, char** argv) {
    try {
        // Constructing guard will initialize MPI with a
        // call to MPI_Init_thread()
        arbenv::with_mpi guard(argc, argv, false);

        // Do some work with MPI here

        // When leaving this scope, the destructor of guard will
        // call MPI_Finalize()
    }
    catch (std::exception& e) {
        std::cerr << "error: " << e.what() << "\n";
        return 1;
    }
    return 0;
}
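The destructor behaviour described above (finalize only when the scope is left normally) can be sketched without MPI. The following is an illustration of the pattern, not the with_mpi source: fake_runtime, scoped_runtime, and run_guarded are hypothetical names, flags stand in for MPI_Init_thread and MPI_Finalize, and comparing std::uncaught_exceptions() counts (C++17) is an assumed way of detecting stack unwinding.

```cpp
#include <exception>
#include <stdexcept>

// Hypothetical stand-in for an MPI-like runtime, so the sketch runs without MPI.
struct fake_runtime {
    bool initialized = false;
    bool finalized = false;
};

// Scope guard: initializes on construction, finalizes on destruction
// unless the scope is being left because of an exception.
class scoped_runtime {
public:
    explicit scoped_runtime(fake_runtime& rt):
        rt_(rt), exceptions_on_entry_(std::uncaught_exceptions())
    {
        rt_.initialized = true;   // stands in for MPI_Init_thread
    }

    ~scoped_runtime() {
        // Skip finalization while the stack is unwinding, so that the
        // enclosing catch block gets a chance to report the error.
        if (std::uncaught_exceptions() == exceptions_on_entry_) {
            rt_.finalized = true; // stands in for MPI_Finalize
        }
    }

    scoped_runtime(const scoped_runtime&) = delete;
    scoped_runtime& operator=(const scoped_runtime&) = delete;

private:
    fake_runtime& rt_;
    int exceptions_on_entry_;
};

// Runs a guarded scope; returns true if the scope threw.
bool run_guarded(fake_runtime& rt, bool fail) {
    try {
        scoped_runtime guard(rt);
        if (fail) throw std::runtime_error("simulated failure on this rank");
    }
    catch (const std::exception&) {
        return true;
    }
    return false;
}
```

In the failing case the runtime is left initialized but not finalized, which mirrors the trade-off described in the warning: the error can be reported, at the cost of exiting without a clean shutdown.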
-
libarbor
The core Arbor library libarbor provides an API for:
prescribing which hardware resources are to be used by a simulation using arb::proc_allocation.
opaque handles to hardware resources used by simulations called arb::context.
-
class proc_allocation
Enumerates the computational resources on a node to be used for simulation, specifically the number of threads and the identifier of a GPU if available.
Note
Each MPI rank in a distributed simulation uses a proc_allocation to describe the subset of resources on its node that it will use.
#include <arbor/context.hpp>
#include <arborenv/concurrency.hpp>
#include <arborenv/gpu_env.hpp>

// default: 1 thread and no GPU selected
arb::proc_allocation resources;

// 8 threads and no GPU
arb::proc_allocation resources(8, -1);

// 4 threads and the first available GPU
arb::proc_allocation resources(4, 0);

// Construct with the detected thread count and default GPU
auto num_threads = arbenv::thread_concurrency();
auto gpu_id = arbenv::default_gpu();
arb::proc_allocation resources(num_threads, gpu_id);
-
proc_allocation() = default
By default selects one thread and no GPU.
-
proc_allocation(unsigned threads, int gpu_id)
Constructor that sets the number of threads and the id gpu_id of the available GPU.
-
unsigned num_threads
The number of CPU threads available.
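The interface documented above can be mirrored by a small stand-in type, which is useful for seeing how the defaults and the -1 convention fit together. This is a sketch, not the arb::proc_allocation definition: the type name proc_allocation_sketch and the has_gpu() helper are assumptions made for illustration.

```cpp
// Hypothetical stand-in mirroring the documented interface.
struct proc_allocation_sketch {
    unsigned num_threads = 1; // default: one thread
    int gpu_id = -1;          // default: no GPU selected

    proc_allocation_sketch() = default;

    proc_allocation_sketch(unsigned threads, int gpu):
        num_threads(threads), gpu_id(gpu) {}

    // Assumed convention: any non-negative id means a GPU was selected.
    bool has_gpu() const { return gpu_id >= 0; }
};
```

A default-constructed value describes one thread and no GPU, while proc_allocation_sketch(4, 0) describes four threads and the GPU with id 0.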
-
-
class context
An opaque handle for the hardware resources used in a simulation. A context contains a thread pool, and optionally the GPU state and MPI communicator. Users of the library do not directly use the functionality provided by context; instead they create contexts, which are passed to Arbor interfaces for domain decomposition and simulation.
Arbor contexts are created by calling make_context(), which returns an initialized
context. There are two versions of make_context(), for creating contexts
with and without distributed computation with MPI respectively.
-
context make_context(proc_allocation alloc = proc_allocation())
Create a local context, with no distributed (MPI) computation, that uses the local resources described by alloc. By default it will create a context with one thread and no GPU.
-
context make_context(proc_allocation alloc, MPI_Comm comm)
Create a distributed context that uses the local resources described by alloc, and uses the MPI communicator comm for distributed calculation.
Helper functions can be used to query a context for which features it has enabled, for example whether it has a GPU and how many threads are in its thread pool.
-
unsigned num_ranks(const context&)
Query the number of distributed ranks. If the context has an MPI communicator, the return value is equivalent to MPI_Comm_size. If the context has no MPI communicator, returns 1.
-
unsigned rank(const context&)
Query the rank of the calling rank. If the context has an MPI communicator, the return value is equivalent to MPI_Comm_rank. If the context has no MPI communicator, returns 0.
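The fallback values of these queries (1 rank and rank 0 without MPI) can be illustrated with a stand-in context type. This is a sketch only: context_sketch and distributed_info are hypothetical types, and the real arb::context is an opaque handle whose internals are not described here.

```cpp
#include <optional>

// Hypothetical description of the distributed part of a context.
struct distributed_info {
    unsigned size; // number of ranks, as MPI_Comm_size would report
    unsigned rank; // this rank's index, as MPI_Comm_rank would report
};

// Hypothetical stand-in for a context; `mpi` is empty for a local context.
struct context_sketch {
    unsigned threads = 1;
    int gpu_id = -1;
    std::optional<distributed_info> mpi;
};

bool has_mpi(const context_sketch& c) { return c.mpi.has_value(); }
bool has_gpu(const context_sketch& c) { return c.gpu_id >= 0; }
unsigned num_threads(const context_sketch& c) { return c.threads; }

// Without MPI a context behaves like a single rank with index 0.
unsigned num_ranks(const context_sketch& c) { return c.mpi ? c.mpi->size : 1u; }
unsigned rank(const context_sketch& c) { return c.mpi ? c.mpi->rank : 0u; }
```

Because the local case reports one rank with index 0, code such as "only rank 0 prints" works unchanged for both local and distributed contexts.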
Here are some simple examples of how to create an arb::context using make_context().
#include <arbor/context.hpp>
// Construct a context that uses 1 thread and no GPU or MPI.
auto context = arb::make_context();
// Construct a context that:
// * uses 8 threads in its thread pool;
// * does not use a GPU, regardless of whether one is available;
// * does not use MPI.
arb::proc_allocation resources(8, -1);
auto context = arb::make_context(resources);
// Construct one that uses:
// * 4 threads and the first GPU;
// * MPI_COMM_WORLD for distributed computation.
arb::proc_allocation resources(4, 0);
auto mpi_context = arb::make_context(resources, MPI_COMM_WORLD);
Here is a more complicated example of creating a context on a
system where GPU and MPI support are conditional.
#include <exception>
#include <iostream>

#include <arbor/context.hpp>
#include <arbor/version.hpp>   // for ARB_MPI_ENABLED

#include <arborenv/concurrency.hpp>
#include <arborenv/gpu_env.hpp>

#ifdef ARB_MPI_ENABLED
#include <mpi.h>
#include <arborenv/with_mpi.hpp>
#endif

int main(int argc, char** argv) {
    try {
        arb::proc_allocation resources;

        // try to detect how many threads can be run on this system
        resources.num_threads = arbenv::thread_concurrency();

        // override thread count if the user set it in the environment
        if (auto nt = arbenv::get_env_num_threads()) {
            resources.num_threads = nt.value();
        }

#ifdef ARB_MPI_ENABLED
        // initialize MPI
        arbenv::with_mpi guard(argc, argv, false);
        // assign a unique gpu to this rank if available
        resources.gpu_id = arbenv::find_private_gpu(MPI_COMM_WORLD);
        // create a distributed context
        auto context = arb::make_context(resources, MPI_COMM_WORLD);
#else
        resources.gpu_id = arbenv::default_gpu();
        // create a local context
        auto context = arb::make_context(resources);
#endif

        // rank() returns 0 when the context has no MPI communicator
        const bool root = arb::rank(context) == 0;

        // Print a banner with information about hardware configuration
        if (root) {
            std::cout << "gpu:     " << (has_gpu(context)? "yes": "no") << "\n";
            std::cout << "threads: " << num_threads(context) << "\n";
            std::cout << "mpi:     " << (has_mpi(context)? "yes": "no") << "\n";
            std::cout << "ranks:   " << num_ranks(context) << "\n" << std::endl;
        }

        // run some simulations!
    }
    catch (std::exception& e) {
        std::cerr << "exception caught in ring miniapp: " << e.what() << "\n";
        return 1;
    }
    return 0;
}