ActorProf: A Framework for Profiling and Visualizing Fine-grained Asynchronous Bulk Synchronous Parallel Execution
This document gives brief guidance on how to generate traces for actor applications on COTS (typically x86) systems using ActorProf. For more details about ActorProf, please refer to our paper.
Step-by-Step Guide
Here we take the Triangle Counting selector as an example on Perlmutter. For other machines, please follow the Manual installation and run instructions.
Step 1: Build HClib and Bale
Build the HClib and Bale libraries to set up the environment. To set up the HClib and Bale libraries on Perlmutter, please source the perlumtter_setup.sh script provided.
Note: Every time you log in to a cluster/supercomputer, return to this directory and source the setup script again to set all environment variables.
Step 2: Allocate an interactive compute node
Allocate an interactive compute node on which to run the ActorProf scripts.
Step 3: Run the ActorProf Script
Run the ActorProf bash script (run_actorprof.sh), which takes 4 arguments:
source ./run_actorprof.sh [logical | papi | physical | overall | all] [triangle_selector | triangle_selector_interval] [1...N] [1...N]
[logical | papi | physical | overall | all]: Selects which type of trace (or all) to generate
[triangle_selector | triangle_selector_interval]: Selects the application for which to generate the trace (triangle_selector for cyclic distribution, triangle_selector_interval for range distribution)
[1...N]: Selects the scale of the RMAT graph
[1...N]: Selects the number of cores for the run
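The four arguments can be assembled and sanity-checked before the script is launched. The sketch below only illustrates the usage line above: the helper name build_actorprof_command is hypothetical, and reading [1...N] as "any positive integer" is an assumption.

```python
import shlex

# Accepted values, copied from the usage line above.
TRACE_TYPES = ("logical", "papi", "physical", "overall", "all")
APPLICATIONS = ("triangle_selector", "triangle_selector_interval")


def build_actorprof_command(trace_type, application, scale, cores):
    """Return the shell command for run_actorprof.sh (hypothetical helper)."""
    if trace_type not in TRACE_TYPES:
        raise ValueError(f"unknown trace type: {trace_type!r}")
    if application not in APPLICATIONS:
        raise ValueError(f"unknown application: {application!r}")
    for name, value in (("scale", scale), ("cores", cores)):
        # Assumption: [1...N] means a positive integer.
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    args = ["./run_actorprof.sh", trace_type, application, str(scale), str(cores)]
    # The script is sourced, as in the usage line, not executed directly.
    return "source " + " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(build_actorprof_command("logical", "triangle_selector", 10, 2))
```

Rejecting an unknown trace type or application before sourcing the script avoids spending an interactive allocation on a run that cannot produce a trace.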
Logical Trace
E.g., generate the logical trace of the triangle selector with 1D-Cyclic distribution at scale 10 using 2 cores (following the usage above: source ./run_actorprof.sh logical triangle_selector 10 2).
It generates one trace file (*send.csv) for each PE and a heatmap (application_logical.png). In this example, three files, PE0_send.csv, PE1_send.csv, and triangle_selector_logical.png, were generated since we ran this application on two threads.
HWPC Trace
E.g., generate the HWPC trace of the triangle selector with 1D-Cyclic distribution at scale 10 using 2 cores (following the usage above: source ./run_actorprof.sh papi triangle_selector 10 2).
It generates two trace files (PE*_send.csv and PE*_papi.csv) for each PE, one heatmap (logical.png), and one bar graph (papi.png). In this example, PE0_send.csv, PE1_send.csv, logical.png, and papi.png were generated, together with the per-PE PE*_papi.csv files, since we ran this application on two threads.
Physical Trace
E.g., generate the physical trace of the triangle selector with 1D-Cyclic distribution at scale 10 using 2 cores (following the usage above: source ./run_actorprof.sh physical triangle_selector 10 2).
It generates one trace file (physical.txt) and a stacked bar graph (physical.png).
Overall Trace
E.g., generate the overall trace of the triangle selector with 1D-Cyclic distribution at scale 10 using 2 cores (following the usage above: source ./run_actorprof.sh overall triangle_selector 10 2).
It generates the overall.txt trace file and a stacked bar graph (overall.png).
Note: You can use all to generate all four traces mentioned above at once.
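The outputs described for the four trace types can be summarized in one lookup, which is handy for checking that a run produced everything it should. This is a sketch based only on the descriptions above: the function name expected_outputs is hypothetical, PE numbering is assumed to start at 0 and be contiguous (as in the PE0/PE1 example), and treating all as the union of the four individual runs is an assumption.

```python
def expected_outputs(trace_type, application, num_pes):
    """List the files the descriptions above attribute to each trace type."""
    send = [f"PE{pe}_send.csv" for pe in range(num_pes)]
    papi = [f"PE{pe}_papi.csv" for pe in range(num_pes)]
    per_type = {
        "logical": send + [f"{application}_logical.png"],
        "papi": send + papi + ["logical.png", "papi.png"],
        "physical": ["physical.txt", "physical.png"],
        "overall": ["overall.txt", "overall.png"],
    }
    if trace_type == "all":
        # Assumption: "all" yields the union of the four individual runs.
        merged = []
        for files in per_type.values():
            for name in files:
                if name not in merged:
                    merged.append(name)
        return merged
    return per_type[trace_type]


if __name__ == "__main__":
    for name in expected_outputs("logical", "triangle_selector", 2):
        print(name)
```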
Manual installation and run instructions
If you decide to build and run ActorProf manually without using the ActorProf bash script (run_actorprof.sh), you can use the guide below.
Step 1: Environment Setup
Please refer to the HClib-Actor setup page for details on how to build the HClib and Bale libraries to set up the environment.
Step 2: Build the Application with a Trace Flag Enabled
-DENABLE_TRACE flag for enabling the logical message generation macro.
-DENABLE_TRACE_PAPI flag for enabling the logical message and HWPC trace generation macros.
-DENABLE_TCOMM_PROFILING flag for enabling the overall trace generation macros.
-DENABLE_TRACE_PHYSICAL flag for enabling the physical message trace generation macro.
Below is an example of building the 1D-Cyclic Triangle Counting application on Perlmutter with the logical, HWPC, overall, and physical trace macros, respectively, using the Makefile.
cd $PWD/hclib/modules/bale_actor/test
make triangle_selector_logical
make triangle_selector_papi
make triangle_selector_overall
make triangle_selector_physical
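The relationship between trace types, compile flags, and make targets can be kept in one table when scripting builds. In the sketch below the flag names and target names are the ones listed above, but the pairing of each target with the similarly named flag is an assumption (the Makefile itself is not shown here), and TRACE_BUILDS and make_commands are hypothetical names.

```python
# Flag names come from the flag list above; make targets from the build example.
# Assumption: each target enables the flag whose name matches its suffix.
TRACE_BUILDS = {
    "logical": ("-DENABLE_TRACE", "triangle_selector_logical"),
    "papi": ("-DENABLE_TRACE_PAPI", "triangle_selector_papi"),
    "overall": ("-DENABLE_TCOMM_PROFILING", "triangle_selector_overall"),
    "physical": ("-DENABLE_TRACE_PHYSICAL", "triangle_selector_physical"),
}


def make_commands(trace_types):
    """Return one make invocation per requested trace type (hypothetical helper)."""
    return [f"make {TRACE_BUILDS[trace][1]}" for trace in trace_types]


if __name__ == "__main__":
    for command in make_commands(["logical", "physical"]):
        print(command)
```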
Step 3: Trace Generation
Here we run 1D-Cyclic Triangle Counting as an example on a Perlmutter interactive node.
1) To generate Logical Message Trace and HWPC Trace
It generates two trace files (*send.csv and *PAPI.csv) for each PE.
In this example, 64 data files, i.e., PE0_send.csv, PE0_PAPI.csv, PE1_send.csv, PE1_PAPI.csv, ..., are generated since we ran this application on 32 threads.
Note: To generate the logical trace only, please use the triangle_selector_logical executable.
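Since each PE contributes two files, a run on 32 PEs should leave 2 x 32 = 64 files behind, and a quick check can confirm that none are missing before visualization. The sketch below assumes the PE<i>_send.csv and PE<i>_PAPI.csv naming from the example, with PE indices running from 0 to N-1; both function names are hypothetical.

```python
from pathlib import Path


def expected_trace_files(num_pes):
    """Names of the per-PE logical and HWPC trace files for num_pes PEs."""
    names = []
    for pe in range(num_pes):
        names.append(f"PE{pe}_send.csv")
        names.append(f"PE{pe}_PAPI.csv")
    return names


def missing_trace_files(directory, num_pes):
    """Return the expected trace files that are not present in directory."""
    root = Path(directory)
    return [name for name in expected_trace_files(num_pes)
            if not (root / name).is_file()]


if __name__ == "__main__":
    print(len(expected_trace_files(32)), "files expected for 32 PEs")
    print("missing in current directory:", missing_trace_files(".", 32))
```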
2) To generate Overall Trace
overall.txt contains the overall trace for every PE in one .txt file.
3) To generate Physical Message Trace
physical.txt contains the physical message trace for every PE in one .txt file.
Step 4: ActorProf Visualization
Four types of graphs can be generated by passing different flags to actorprof.py. Please put all generated traces into the data directory before running ActorProf.
transfer.sh can be used to create the data directory in the correct format and move all generated traces into it.
Caution: Please use the script, or manually empty/remove the data directory, every time you generate new trace files to avoid trace data overlap, which may lead to incorrect visualization results.
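The reason for emptying the data directory is that stale per-PE files from an earlier run (for example, one with more PEs) would otherwise be read alongside the new ones. The sketch below illustrates that clean-then-move step in Python. It is not a replacement for transfer.sh: the layout transfer.sh creates is not described here, so a flat directory is assumed, and reset_data_dir is a hypothetical name.

```python
import shutil
from pathlib import Path


def reset_data_dir(data_dir, trace_files):
    """Recreate data_dir from scratch and move trace_files into it.

    Assumption: a flat directory is sufficient; transfer.sh may use a
    different layout.
    """
    data = Path(data_dir).resolve()
    sources = [Path(path).resolve() for path in trace_files]
    # Guard: removing data_dir must not delete the traces we want to keep.
    if any(data in src.parents for src in sources):
        raise ValueError("trace files must live outside the data directory")
    if data.exists():
        shutil.rmtree(data)  # drop traces left over from a previous run
    data.mkdir(parents=True)
    moved = []
    for src in sources:
        dest = data / src.name
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```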
The path to the data directory (path) and the total number of PEs (-n or --num_PEs) used to generate the trace files are required for running ActorProf.
1) Logical Message Trace Heatmap
-l flag is needed to generate the Logical Message Trace Heatmap
2) Physical Message Trace Heatmap
-p option is needed to generate the Physical Message Trace Heatmap
0 for Local Send Message Trace Heatmap (Default)
1 for Non-blocking Message Trace Heatmap
3) HWPC Trace Bar Graph
-lp flag is needed to generate the HWPC Trace bar graph
4) Overall Trace Stacked Bar Graph
-s flag is needed to generate the stacked bar graph for overall absolute and relative execution time
Note: Please specify all flags when profiling all results.
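To make the combination of arguments concrete, the sketch below models the options described above with argparse. It is a hypothetical stand-in, not the parser inside actorprof.py: only the option names (path, -n/--num_PEs, -l, -p, -lp, -s) come from the text, while the destination names and the reading of -p as "optional value 0 or 1, defaulting to 0 when given without a value" are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical model of the options described above, not actorprof.py itself."""
    parser = argparse.ArgumentParser(
        description="Sketch of the ActorProf visualizer options")
    parser.add_argument("path", help="path to the data directory")
    parser.add_argument("-n", "--num_PEs", type=int, required=True,
                        help="total number of PEs used to generate the traces")
    parser.add_argument("-l", dest="logical", action="store_true",
                        help="logical message trace heatmap")
    # Assumption: -p takes 0 (local send, default) or 1 (non-blocking).
    parser.add_argument("-p", dest="physical", type=int, choices=(0, 1),
                        nargs="?", const=0, default=None,
                        help="physical message trace heatmap")
    parser.add_argument("-lp", dest="hwpc", action="store_true",
                        help="HWPC trace bar graph")
    parser.add_argument("-s", dest="overall", action="store_true",
                        help="stacked bar graph of overall execution time")
    return parser


if __name__ == "__main__":
    # All four graphs at once, for traces produced with 32 PEs.
    args = build_parser().parse_args(
        ["data", "-n", "32", "-l", "-lp", "-s", "-p", "1"])
    print(args)
```

In this model the data directory is placed first on the command line, because an optional value after -p would otherwise compete with the positional path.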
Example of running the ActorProf visualizer using actorprof.py to generate the physical trace heatmap.
All results will be saved as .png figures.
Top-Level Directory Organization
The folder structure of this repository is as follows:
.
├── ActorProf # Contains files for the ActorProf Tool
│ ├── hclib # Contains the HClib library and the Actor-based runtime
│ │ ├── ...
│ └── ─── modules
│ │ │ ├── ...
│ └── ─── ─── bale_actor
│ │ │ │ ├── ...
│ └── ─── ─── ─── test # Contains the Triangle Counting Selector application files
│ │ │ │ ├── triangle_selector.cpp # Triangle Counting code for 1D-Cyclic version
│ │ │ │ ├── triangle_selector_interval.cpp # Triangle Counting code for 1D-Range version
│ │ │ │ ├── small.mtx # Scale of 16 Triangle Counting graph
│ └── ─── ─── ─── ...
├── logical.py # Visualization for Logical Message Trace
├── papi.py # Visualization for HWPC Trace
├── physical.py # Visualization for Physical Message Trace
├── overall.py # Visualization for Overall Trace
├── generate_rmate.py # RMAT Graph generation for applications
├── run_actorprof.sh # Top to down complete run script for using ActorProf
└── README.md
Citation
If you use our application in your work, please cite our paper.
ActorProf: A Framework for Profiling and Visualizing Fine-grained Asynchronous Bulk Synchronous Parallel Execution. Jiawei Yang, Shubhendra Pal Singhal, Jun Shirako, Akihiro Hayashi, Vivek Sarkar. Workshop on Programming and Performance Visualization Tools (ProTools2024, co-located with SC24)
Corresponding authors: Jiawei Yang (jyang810@gatech.edu), Shubhendra Pal Singhal (ssinghal74@gatech.edu)