Config Quick guide

Taking Advantage of Simultaneous Multi-Threading (SMT) on Bluevista and Blueice

SCD Documentation: The gory details

Background:

With the current crunch on computing resources at NCAR, it is important for users to maximize the efficiency of both the release and development versions of CCSM on bluevista and blueice. At this time, not all CCSM users may be taking advantage of the SMT capabilities of the IBM platforms, which can offer a roughly 20%-30% efficiency increase with minimal changes to the CCSM scripts and no changes to the code. Note that with SMT, the model runs with double the number of MPI tasks per node; in most production runs this can be used either to increase model throughput or to decrease model cost significantly.

Definitions

SMT threads/node = ptiles x OMP_THREADS
# nodes = MPI tasks / ptiles
Example (bluevista):
4 threads/task
4 tasks x 4 threads = 16 SMT threads = 1 node
ptile (number of tasks on a node) = 4
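
The two definitions above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the CCSM scripts; the function names are hypothetical, and the value of 32 MPI tasks is borrowed from Example 1 purely for illustration.

```python
def smt_threads_per_node(ptile: int, omp_threads: int) -> int:
    """SMT threads/node = ptiles x OMP_THREADS."""
    return ptile * omp_threads


def nodes_needed(mpi_tasks: int, ptile: int) -> int:
    """# nodes = MPI tasks / ptiles, rounded up so a partial node counts as a full one."""
    return -(-mpi_tasks // ptile)  # ceiling division using integers only


# bluevista values from the definitions: 4 tasks per node, 4 threads per task
ptile = 4
omp_threads = 4
mpi_tasks = 32  # illustrative value, taken from Example 1

print("SMT threads per node:", smt_threads_per_node(ptile, omp_threads))
print("nodes needed:", nodes_needed(mpi_tasks, ptile))
```

Rounding up is a choice made for this sketch: the definition itself is a plain division, which is exact whenever the number of MPI tasks is a multiple of ptile.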
SMT Optimal Configurations

machine	OMP_THREADS	ptiles	SMT threads/node
bluevista (See below)	4	4	16
blueice	4	8	32
CAM-MPI only mode	1	32	32
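
As a cross-check, each row of the table should satisfy SMT threads/node = ptiles x OMP_THREADS. A short Python sketch of that check follows; the dictionary layout and the names `SMT_CONFIGS` and `is_consistent` are hypothetical, and only the numbers come from the table.

```python
# Rows of the "SMT Optimal Configurations" table, keyed by the first column.
SMT_CONFIGS = {
    "bluevista": {"omp_threads": 4, "ptile": 4, "smt_threads_per_node": 16},
    "blueice": {"omp_threads": 4, "ptile": 8, "smt_threads_per_node": 32},
    "CAM-MPI only mode": {"omp_threads": 1, "ptile": 32, "smt_threads_per_node": 32},
}


def is_consistent(cfg: dict) -> bool:
    """True when ptile x OMP_THREADS equals the listed SMT threads per node."""
    return cfg["ptile"] * cfg["omp_threads"] == cfg["smt_threads_per_node"]


for name, cfg in SMT_CONFIGS.items():
    status = "ok" if is_consistent(cfg) else "MISMATCH"
    print(f"{name}: {cfg['ptile']} x {cfg['omp_threads']} -> {status}")
```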
Note:
We have introduced new machine support for "bluevista16", beginning with the CCSM tag ccsm_3_1_beta39. Users should simply use the machine name "bluevista16" instead of "bluevista", and the generated scripts will automatically take advantage of SMT to maximize throughput. Users working with pre-ccsm3_1_beta39 tags should follow the directions below for the release-based modifications.
For pre-ccsm3_1_beta39 tags, choose one of the following:
Option 1) Increase throughput: double the number of MPI tasks and use the same resources as before. You can expect a 30-40% increase in performance for the same cost.
OR
Option 2) Decrease cost: leave the number of MPI tasks the same. Because SMT places twice as many MPI tasks on each node, the run occupies half as many nodes. You can expect a 20-30% decrease in throughput with a corresponding 50% decrease in cost.
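
To compare the two options for a particular run, the quoted percentage ranges can be applied to that run's current numbers. The sketch below is illustrative only: the function name, the baseline values, and their units (for example, simulated years per day and processor-hours) are assumptions, and the percentages are the rough expectations stated above rather than measured results.

```python
# Percentage ranges quoted in the text; rough expectations, not guarantees.
OPTION1_THROUGHPUT_GAIN_PCT = (30, 40)  # double the MPI tasks, same resources
OPTION2_THROUGHPUT_LOSS_PCT = (20, 30)  # same number of MPI tasks
OPTION2_COST_CUT_PCT = 50


def estimate(base_throughput: float, base_cost: float) -> dict:
    """Return (low, high) throughput and the cost expected under each option."""
    lo_gain, hi_gain = OPTION1_THROUGHPUT_GAIN_PCT
    lo_loss, hi_loss = OPTION2_THROUGHPUT_LOSS_PCT
    return {
        "option1": {
            "throughput": (base_throughput * (100 + lo_gain) / 100,
                           base_throughput * (100 + hi_gain) / 100),
            "cost": base_cost,  # same cost as before
        },
        "option2": {
            # the larger loss gives the lower bound
            "throughput": (base_throughput * (100 - hi_loss) / 100,
                           base_throughput * (100 - lo_loss) / 100),
            "cost": base_cost * (100 - OPTION2_COST_CUT_PCT) / 100,
        },
    }


# Hypothetical baseline: throughput 10.0 and cost 1000.0 in whatever units you track.
for option, values in estimate(base_throughput=10.0, base_cost=1000.0).items():
    print(option, values)
```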
Example 1. Stand-alone CAM
Resolution: T85
nlat = 128 (Note that 128 threads is optimal)
nlon = 256
4 tasks x 4 threads = 16 threads per node; 16 threads x 8 nodes = 128 threads
32 total tasks (#bsub -n 32)
64 PEs (processor equivalents) with SMT = 128 threads
Configure in CAM run script:
#bsub -n 32 # number of MPI tasks
#bsub -R "span[ptile=4]" # max tasks per node
setenv OMP_NUM_THREADS 4
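
The numbers in Example 1 follow from the node count, ptile, and OpenMP thread count. The Python sketch below reproduces them and prints the three run-script lines shown above; the helper names `cam_smt_layout` and `batch_lines` are hypothetical and are not part of CAM or CCSM.

```python
def cam_smt_layout(nodes: int, ptile: int, omp_threads: int) -> tuple:
    """Return (total MPI tasks, total SMT threads) for a given layout."""
    mpi_tasks = nodes * ptile
    total_threads = mpi_tasks * omp_threads
    return mpi_tasks, total_threads


def batch_lines(mpi_tasks: int, ptile: int, omp_threads: int) -> list:
    """Format the run-script lines in the same style as the example."""
    return [
        f"#bsub -n {mpi_tasks}",
        f'#bsub -R "span[ptile={ptile}]"',
        f"setenv OMP_NUM_THREADS {omp_threads}",
    ]


# Example 1 values: 8 nodes, 4 tasks per node, 4 threads per task
tasks, threads = cam_smt_layout(nodes=8, ptile=4, omp_threads=4)
print("MPI tasks:", tasks)
print("SMT threads:", threads)
for line in batch_lines(tasks, ptile=4, omp_threads=4):
    print(line)
```

For T85, the total thread count matches nlat = 128, which is the case the example notes as optimal.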
