Thomas Herault edited this page Feb 6, 2017 · 21 revisions

How to Write a Performance Instrumentation module

This page will explain how to write a new module for the Performance INStrumentation (PINS) system to instrument the operation of PaRSEC. It uses the existing module (source code available in the directory) as an example of a bare-bones PINS module that does not require the use of the PaRSEC profiling system for the collection of its data, and does not use PAPI for hardware measurement purposes. For help with using PAPI and the profiling system in PaRSEC to perform low-impact hardware-provided event measurements, refer to the built-in module once you understand the architecture of the PINS system.

PINS is implemented as an extension to the Modular Component Architecture (MCA), which originated in Open MPI. This provides very simple integration with the PaRSEC build process and PaRSEC runtime, at the cost of some extra 'markup' code in a file for each module. You are encouraged to use any existing PINS module as a reference for how to write the code necessary for MCA to understand your PINS module. //Please note// that the naming scheme of the module directory, and of the minimum set of files it must contain, is **absolutely mandatory** for the PaRSEC MCA compilation and initialization process to function at all.

A PINS module generally consists of two main pieces: the initialization and finalization code, and your custom instrumentation callback functions themselves.


Initialization Routines

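There are three pairs of initialization and finalization routines, each invoked by a specific part of the runtime, and your module may implement as many or as few of them as it needs. One pair is invoked per PaRSEC runtime, one per handle, and one per thread; each routine is described in detail in the list below.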

It is customary to follow a common naming pattern for the init/fini routines in order to maintain a level of consistency that will be helpful to other module developers. This naming convention is demonstrated in the existing modules.

Any of these six routines may be provided by your module, and they will be invoked automatically by the PaRSEC runtime if they exist. You **must** specify which of the init/fini routines are provided, using a struct similar to the one found in an existing module. The six routines behave as follows:

        • The module init routine is invoked a single time, during initialization of the PaRSEC runtime, after the PaRSEC profiling system (if built) has been initialized but before it is active, and before the runtime's structures (particularly all virtual process and thread structures) are fully instantiated. This is the right place to perform any custom initialization, such as global memory allocation, that does not depend on anything thread- or handle-related.
It is also the usual place to invoke the registration macro, which registers your custom callback code for whichever callback types your module requires. To allow callbacks to be chained, it is customary to store the macro's return value (a callback function pointer) so that it can be called before your callback returns; this is demonstrated in the section on Callback functions.
        • The module fini routine is the companion of the module init routine. It is invoked a single time per PaRSEC runtime, after all processing has been completed and after the virtual process and thread structures have been destroyed, but before the PaRSEC profiling subsystem is flushed and removed. It is generally used to undo everything done by the corresponding initialization routine. This should generally include re-registering the callback pointers previously returned by the registration macro, so that callbacks can continue to occur cleanly throughout PaRSEC shutdown and cleanup in future use cases. The registration macro is guaranteed to return a valid callback function pointer unless a malicious or otherwise broken module has already submitted an invalid function pointer for registration.
The routine may also be used to perform final operations with items stored in self-allocated or otherwise safe global memory.
        • The handle init routine is invoked a single time per instantiated handle. It is called after most of the handle information has been created, so initialization that depends on the handle's attributes should generally work as desired.
Currently, this method is not commonly used, as the initialization process for the handle does not lend itself to the reliable use of the corresponding fini routine, and the PINS system does not allow for separate callbacks per handle. //In the future, it is expected that instrumentation relating to the performance of most tasks would be best initialized and finalized per handle, as opposed to per overall PaRSEC process, so as to provide the full flexibility of the PaRSEC runtime as it potentially runs many separate DAGs in parallel.//
        • The handle fini routine is not currently provided by the PINS system, though it will soon be implemented. It will mirror the handle init routine and resemble the module fini routine, but per handle instead of per PaRSEC runtime.
        • The thread init routine is invoked once per thread created by the PaRSEC runtime, very close to the beginning of that thread's lifetime, but, crucially, **after** the initialization of the PaRSEC profiling system and the registration of the thread with PAPI. This allows for the full and safe use of both the profiling system and PAPI calls. In PaRSEC, each thread is associated with one and only one execution unit for its entire lifetime, and these objects will have been fully initialized before the call to the module thread_init routine. However, the order of these per-thread invocations is //**undefined**//.
In the example module, the thread_init routine simply allocates per-execution-unit data counters.
        • The thread fini routine is invoked once per thread, very near the end of the thread's lifetime, but before the destruction of the profiling system. The order of **these** invocations //**is**// defined: the master thread (thread 0) is finalized first, and the remaining threads are finalized in increasing order of the core they were bound to, e.g.: 0, 1, 2, 3, ..., 46, 47, 48. This is particularly useful for the example module, as it allows an elegant, in-order, row-by-row print of the task-stealing matrix that would otherwise have to be accomplished by nested loops in a single call.
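
The struct that declares which init/fini routines a module provides can be pictured roughly as follows. This is only a sketch: every name in it (`demo_pins_module_t`, `demo_run_lifecycle`, the `demo_*` types) is hypothetical and is not the PaRSEC API. It assumes that a routine the module does not provide is left as NULL, and that the runtime skips NULL entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the runtime, handle and thread objects. */
typedef struct demo_runtime { int nb_threads; } demo_runtime_t;
typedef struct demo_handle  { int id; }         demo_handle_t;
typedef struct demo_thread  { int core; }       demo_thread_t;

/* One slot per init/fini routine; NULL means "not provided by this module". */
typedef struct demo_pins_module {
    const char *name;
    void (*init)(demo_runtime_t *rt);
    void (*fini)(demo_runtime_t *rt);
    void (*handle_init)(demo_handle_t *h);
    void (*handle_fini)(demo_handle_t *h);
    void (*thread_init)(demo_thread_t *t);
    void (*thread_fini)(demo_thread_t *t);
} demo_pins_module_t;

static int demo_events = 0;   /* counts invocations, for demonstration only */

static void demo_init(demo_runtime_t *rt)      { (void)rt; demo_events += 1; }
static void demo_fini(demo_runtime_t *rt)      { (void)rt; demo_events += 1; }
static void demo_thread_init(demo_thread_t *t) { (void)t;  demo_events += 10; }
static void demo_thread_fini(demo_thread_t *t) { (void)t;  demo_events += 10; }

/* This module provides four of the six routines; the handle pair stays NULL. */
static const demo_pins_module_t demo_module = {
    .name        = "demo",
    .init        = demo_init,
    .fini        = demo_fini,
    .handle_init = NULL,
    .handle_fini = NULL,
    .thread_init = demo_thread_init,
    .thread_fini = demo_thread_fini,
};

/* Sketch of a runtime lifecycle: only the routines that exist are called. */
static void demo_run_lifecycle(const demo_pins_module_t *m, demo_runtime_t *rt)
{
    demo_thread_t threads[8];
    int n;

    assert(m != NULL && rt != NULL);
    n = rt->nb_threads > 8 ? 8 : rt->nb_threads;

    if (m->init) m->init(rt);
    for (int i = 0; i < n; i++) {      /* a real runtime gives no order here */
        threads[i].core = i;
        if (m->thread_init) m->thread_init(&threads[i]);
    }
    for (int i = 0; i < n; i++)        /* thread 0 first, then by core number */
        if (m->thread_fini) m->thread_fini(&threads[i]);
    if (m->fini) m->fini(rt);
}
```

Calling `demo_run_lifecycle(&demo_module, &rt)` with three threads runs the module init once, the thread pair once per thread, and the module fini once, while the absent handle routines are silently skipped.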

Callback functions

The callback functions are the core of most PINS modules. Various callback types are built in to PaRSEC, and new callback types can be added easily by modifying a single enum in the PINS source and by inserting corresponding uses of the instrumentation macro wherever the instrumentation is required.
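
To illustrate the "one enum plus one macro" design described above, here is a minimal sketch. It is not the PaRSEC implementation: the enum entries, the `DEMO_PINS` macro, and the `demo_*` helpers are all hypothetical names. It assumes that callback types index a table of function pointers and that the macro does nothing when no callback is registered for a type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types. A new type is added by inserting an entry
 * before DEMO_PINS_TYPE_COUNT. */
typedef enum demo_pins_type {
    DEMO_SELECT_BEGIN,
    DEMO_SELECT_END,
    DEMO_EXEC_BEGIN,
    DEMO_EXEC_END,
    DEMO_PINS_TYPE_COUNT   /* must stay last */
} demo_pins_type_t;

typedef void (*demo_pins_cb_t)(void *exec_unit, void *task, void *data);

/* One slot per callback type; static storage makes every slot NULL at start. */
static demo_pins_cb_t demo_callbacks[DEMO_PINS_TYPE_COUNT];

/* Instrumentation point: invokes the callback for `type`, if there is one. */
#define DEMO_PINS(type, eu, task, data)                         \
    do {                                                        \
        if (demo_callbacks[(type)] != NULL)                     \
            demo_callbacks[(type)]((eu), (task), (data));       \
    } while (0)

static void demo_set_callback(demo_pins_type_t type, demo_pins_cb_t cb)
{
    assert((int)type >= 0 && (int)type < DEMO_PINS_TYPE_COUNT);
    demo_callbacks[type] = cb;
}

static int demo_exec_count = 0;

static void demo_count_exec(void *eu, void *task, void *data)
{
    (void)eu; (void)task; (void)data;
    demo_exec_count++;
}

/* Stand-in for the place in a runtime where a task is executed. */
static void demo_execute_task(void *eu, void *task)
{
    DEMO_PINS(DEMO_EXEC_BEGIN, eu, task, NULL);
    /* ... the task body would run here ... */
    DEMO_PINS(DEMO_EXEC_END, eu, task, NULL);
}
```

With this layout, instrumenting a new location costs one enum entry and one macro use at the call site; code paths with no registered callback only pay for a NULL check.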

PINS callbacks are functions with a very specific prototype: each callback receives a pointer to an execution unit (the thread-specific data structure), a pointer to an execution context (the data structure encapsulating a single task in the DAG), and a third, generic pointer, which may or may not point to data specific to a particular callback type. Most callback types will provide valid pointers for at least the first two. However, any of these three pointers may be NULL, depending on whether it makes sense and is possible to provide each item for the given callback type.

Most modules will likely use the built-in callback types. The most up-to-date list of callback types can be found in the PINS source, but as of this writing, the provided and working callback types are:

  • one type called before the scheduler begins looking for an available task
  • one type called after the scheduler has finished looking for an available task
  • one type called before a thread executes a task
  • one type called after a thread executes a task
  • //special//: a type provided as an option for modules to do work during thread init without using the MCA module registration system
  • //special//: similar to the above, for thread finalization
  • //special//: similar to the thread init type, for handle initialization
  • //special//: similar to the thread finalization type, for handle finalization
The example module uses only one callback type, the one called at the end of task selection by the scheduler. It receives an execution unit, an execution context, and, by the particular implementation of that callback type in the schedulers included with PaRSEC, an integer (in the guise of a pointer) representing the number of the core from which the execution context was stolen. These three items are used to increment one of the steal counters for the provided execution unit. Then, as previously mentioned, the callback is chained to the next registered callback (if any) with a simple invocation of the function pointer returned by the original registration call in the module's initialization routine.
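
The steal-counting callback and its chaining can be sketched as below. This is an illustration under stated assumptions, not the code of the actual module: all `demo_*` names are hypothetical, a single callback slot stands in for the end-of-selection callback type, and registration is assumed to return the callback it replaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NB_CORES 4

/* Hypothetical execution unit: one per thread, with one counter per core. */
typedef struct demo_exec_unit {
    int  core;                     /* core this thread is bound to */
    long steals[DEMO_NB_CORES];    /* steals[v]: tasks taken from core v */
} demo_exec_unit_t;

typedef struct demo_task { int id; } demo_task_t;

typedef void (*demo_cb_t)(demo_exec_unit_t *eu, demo_task_t *task, void *data);

static void demo_empty_cb(demo_exec_unit_t *eu, demo_task_t *task, void *data)
{
    (void)eu; (void)task; (void)data;
}

/* Callback currently registered for the (single) end-of-selection type. */
static demo_cb_t demo_select_end_cb = demo_empty_cb;

/* Registers cb and returns the callback it replaces (never NULL here). */
static demo_cb_t demo_register(demo_cb_t cb)
{
    demo_cb_t prev = demo_select_end_cb;
    assert(cb != NULL);
    demo_select_end_cb = cb;
    return prev;
}

static demo_cb_t demo_prev_cb = NULL;   /* saved by the module at init */

static void demo_count_steal(demo_exec_unit_t *eu, demo_task_t *task, void *data)
{
    /* The victim core number travels as an integer disguised as a pointer. */
    intptr_t victim = (intptr_t)data;

    /* Any of the three arguments may be unusable, so check before counting. */
    if (eu != NULL && task != NULL && victim >= 0 && victim < DEMO_NB_CORES)
        eu->steals[victim]++;

    demo_prev_cb(eu, task, data);       /* chain to the next callback */
}

/* Module init: register our callback and remember the previous one. */
static void demo_module_init(void)
{
    demo_prev_cb = demo_register(demo_count_steal);
}

/* Module fini: put the previous callback back in place. */
static void demo_module_fini(void)
{
    (void)demo_register(demo_prev_cb);
    demo_prev_cb = NULL;
}
```

Saving the previous pointer at init and restoring it at fini keeps the chain intact for any other module registered on the same callback type.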

Limiting Allowable Modules at Runtime

By default, PaRSEC compiles (via MCA) all modules present in the PINS source directory at build time. Since PINS is designed to allow multiple modules to run at the same time, all built modules are assumed to be compatible and will be automatically used during runtime. If modules conflict with each other for whatever reason (e.g., multiple concurrent or overlapping registrations for PAPI events), it may be desirable to exclude some or all PINS modules from activation without the additional work of moving directories out of the source tree and recompiling.

Therefore, there is also a built-in capability to limit activated modules by name during runtime. A dedicated function, invoked once per runtime, receives an array of the module names that you wish to allow for activation during runtime (recall that the module name is also the name of the directory inside the PINS source directory that contains the module code).

It is not necessary to use this function at all: by default, all modules are **//enabled//**. If you //do// wish to use the function, you must call it **before** initializing the PaRSEC runtime, as the function has no effect once initialization has run.
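
The "call it before initialization or it has no effect" rule can be pictured with the following sketch. None of these names belong to PaRSEC: `demo_set_allowed_modules`, `demo_module_is_allowed` and `demo_runtime_init` are hypothetical, and the sketch assumes the list of names is a NULL-terminated array whose strings outlive the runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ALLOWED 16

static const char *demo_allowed[DEMO_MAX_ALLOWED];
static int  demo_nb_allowed  = -1;     /* -1: no restriction, all enabled */
static bool demo_initialized = false;

/* names is a NULL-terminated array (or NULL for "allow nothing").
 * Only the pointers are stored, so the strings must stay valid.
 * Returns false, and changes nothing, if the runtime is already initialized. */
static bool demo_set_allowed_modules(const char **names)
{
    int n = 0;

    if (demo_initialized)
        return false;
    while (names != NULL && names[n] != NULL && n < DEMO_MAX_ALLOWED) {
        demo_allowed[n] = names[n];
        n++;
    }
    demo_nb_allowed = n;
    return true;
}

/* Consulted when modules are activated. */
static bool demo_module_is_allowed(const char *name)
{
    assert(name != NULL);
    if (demo_nb_allowed < 0)
        return true;                   /* function never called: all enabled */
    for (int i = 0; i < demo_nb_allowed; i++)
        if (strcmp(demo_allowed[i], name) == 0)
            return true;
    return false;
}

/* Stand-in for runtime initialization: freezes the allow-list. */
static void demo_runtime_init(void)
{
    demo_initialized = true;
}
```

Note that in this sketch an empty or NULL list, once set, disables every module, whereas never calling the function leaves every module enabled.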

However, the DPLASMA testing executables provided with PaRSEC automatically invoke this function with the contents of a dedicated runtime flag. As a result, all PINS modules are **//disabled//** by default in DPLASMA tests, as the default 'content' of the flag is null. To enable the use of any PINS module(s) when testing with a provided DPLASMA testing executable, you must pass that command line flag with a comma-separated list of PINS module names.
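
As an illustration of how a comma-separated flag value could be turned into the array of names that the limiting function expects, here is a small sketch. It does not reproduce the DPLASMA option parsing; `demo_split_names` and `demo_free_names` are hypothetical helpers, and empty entries are assumed to be ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Frees an array returned by demo_split_names(). */
static void demo_free_names(char **names)
{
    if (names == NULL)
        return;
    for (size_t i = 0; names[i] != NULL; i++)
        free(names[i]);
    free(names);
}

/* Splits "a,b,c" into a NULL-terminated array of newly allocated strings.
 * A NULL or empty list yields an array holding only the terminator.
 * Returns NULL on allocation failure. */
static char **demo_split_names(const char *list)
{
    size_t count = 0, cap = 1;
    const char *p;
    char **names;

    /* Upper bound on the number of names: one more than the comma count. */
    if (list != NULL)
        for (p = list; *p != '\0'; p++)
            if (*p == ',')
                cap++;

    names = calloc(cap + 1, sizeof *names);   /* +1 for the NULL terminator */
    if (names == NULL)
        return NULL;

    p = list;
    while (p != NULL && *p != '\0') {
        size_t len = strcspn(p, ",");         /* length up to the next comma */
        if (len > 0) {
            names[count] = malloc(len + 1);
            if (names[count] == NULL) {
                demo_free_names(names);
                return NULL;
            }
            memcpy(names[count], p, len);
            names[count][len] = '\0';
            count++;
        }
        p += len;
        if (*p == ',')
            p++;                              /* skip the separator */
    }
    assert(count <= cap);
    return names;
}
```

A NULL flag value produces an empty list, which matches the behaviour described above where an unset flag leaves every module disabled.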