NERSC/ORNL/PACE
Prerequisites¶
- a cluster/supercomputer with OpenSHMEM or UPC installed.
As discussed in the background section, hclib-actor
depends on Bale, which in turn depends on either UPC or OpenSHMEM. Here we explain the steps to load OpenSHMEM, build Bale, and build hclib-actor
on four platforms: Perlmutter@NERSC, Cori@NERSC, Summit@ORNL, and PACE (Phoenix)@GT.
Installation/initialization Scripts¶
Tip
In order to run a job successfully every time after you log in to a cluster/supercomputer, please make sure to:
- change to the directory where you initially ran the respective setup script for your platform, and
- source that script again to set all environment variables.
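For example, on subsequent logins (a minimal sketch; the directory path is a placeholder for wherever you originally ran the setup script):
cd /path/to/setup/directory # placeholder: directory where you first ran the setup script
source ./{PLATFORM}_setup.sh # or: source ./oshmem-{PLATFORM}.sh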
Run¶
Example Slurm script (example.slurm)
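A minimal sketch of such a script (the account, node/task counts, time limit, and binary path are placeholders to adapt to your allocation; only generic Slurm directives are used):
#!/bin/bash
#SBATCH -J histo # name of job
#SBATCH -A XXXXX # account to which job is charged
#SBATCH -N 2 # resources allocated, 2 nodes
#SBATCH -t 00:30:00 # job will run at most 30 min
#SBATCH -o histo.%j.out # stdout/stderr file
cd ./hclib/modules/bale_actor/test # directory containing the test binaries
srun -n 64 ./histo_selector # launch 64 PEs across the allocated nodes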
Example LSF script (example.lsf)
#!/bin/bash
#BSUB -P XXXXX # project to which job is charged
#BSUB -W 0:30 # job will run at most 30 min
#BSUB -nnodes 2 # resources allocated, 2 nodes
#BSUB -alloc_flags smt1 # one logical thread per physical core
#BSUB -J histo # name of job
#BSUB -o histo.%J # stdout file
#BSUB -e histo.%J # stderr file
jsrun -n 84 ./histo_selector
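The job is then submitted to LSF (assuming standard bsub usage on Summit; on some LSF installations the script must be redirected instead, i.e. bsub < example.lsf):
bsub example.lsf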
Example Slurm script (example.sbatch):
#!/bin/bash
#SBATCH -J oshmem # name of job
#SBATCH --account=GT-XXXXXXX # account to which job is charged
#SBATCH -N 2 # 2 nodes
#SBATCH -n 48 # resources allocated, 48 cores
#SBATCH -t 15 # job will run at most 15 minutes
#SBATCH -q inferno # job is submitted to the inferno queue
#SBATCH -o oshmem.out # output file is named oshmem.out
echo "Started on `/bin/hostname`" # prints name of compute node job was started on
cd $SLURM_SUBMIT_DIR # changes into directory where script was submitted from
source ./oshmem-slurm.sh
cd ./hclib/modules/bale_actor/test
srun -n 48 ./histo_selector
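The script is submitted with the standard Slurm command, run from the directory containing it:
sbatch example.sbatch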
Example output:
Running histo on 48 threads
buf_cnt (number of buffer pkgs) (-b)= 1024
Number updates / thread (-n)= 1000000
Table size / thread (-T)= 1000
models_mask (-M)= 0
0.106 seconds
Manual installation instructions¶
Everything in this part is handled by the Installation/initialization Scripts above, but you can also install everything manually by following the guide below.
Load OpenSHMEM¶
- Use Cray OpenSHMEMX
- Use Cray SHMEM
- Use OpenMPI's SHMEM (OSHMEM)
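The exact module commands are site-specific; a minimal sketch of the typical loads (all module names below are assumptions and may differ between systems and software versions):
module load cray-openshmemx # Cray OpenSHMEMX (assumed module name)
module load cray-shmem # Cray SHMEM (assumed module name)
module load openmpi # OpenMPI built with OSHMEM support (assumed module name)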
Note
You need to re-run the above commands every time you log in to a cluster/supercomputer. Alternatively, you can source the pre-prepared script for your platform (source ./{PLATFORM}_setup.sh or source ./oshmem-{PLATFORM}.sh).
Build Bale and HClib¶
Bale¶
git clone https://github.com/jdevinney/bale.git bale
cd bale/src/bale_classic
export BALE_INSTALL=$PWD/build_${PLATFORM} # directory where Bale will be installed
./bootstrap.sh # generate the configure scripts
python3 ./make_bale -s # build and install Bale into $BALE_INSTALL
cd ../../../
Note
On Perlmutter, run patch -p1 < path/to/perlmutter.patch in the bale directory after git clone. You can find perlmutter.patch here.
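Putting the note together with the clone step above, the patch is applied before building (the patch path is a placeholder, as in the note):
git clone https://github.com/jdevinney/bale.git bale
cd bale
patch -p1 < path/to/perlmutter.patch # placeholder path to perlmutter.patch
cd src/bale_classic # then continue with the build steps above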
Note
Bale will be installed in bale/src/bale_classic/build_${PLATFORM}
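A quick way to confirm the installation is to list that directory (assuming BALE_INSTALL was exported as above):
ls $BALE_INSTALL # should contain the installed Bale files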
HClib¶
git clone https://github.com/srirajpaul/hclib
cd hclib
git fetch && git checkout bale3_actor # switch to the bale3_actor branch
./install.sh # build and install HClib into hclib-install
source hclib-install/bin/hclib_setup_env.sh # set the HClib environment variables
cd modules/bale_actor && make # build the bale_actor module
cd test
unzip ../inc/boost.zip -d ../inc/ # unpack the bundled Boost headers
make # build the test/example programs
cd ../../../../
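At this point the example binaries (e.g. histo_selector) are in hclib/modules/bale_actor/test and can be launched with the platform job scripts shown in the Run section above.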