POP 2.0.1 is the Los Alamos National Laboratory’s Parallel Ocean Program. POP is a component of the NCAR Community Climate System Model. The benchmark includes two configurations, of which only the 7-day, 1-degree production benchmark will be run.
Building POP requires GNU make, MPI, a C compiler, a Fortran90 compiler, and the NetCDF libraries.
POP can be downloaded from http://web.ncar.teragrid.org/~bmayer/benchmarks/pop.tar.gz
Follow these steps to build the POP benchmark.
For the 1-degree case, type:
./setup_run_dir run_dir.x1
This will create a directory for the 1-degree run called run_dir.x1 and copy the files necessary to compile and run the model into that directory.
For the 1-degree case, type:
make x1 RUNDIR=../run_dir.x1
This will convert the necessary data files to native binary format and copy all needed data and source files into the run directory for the desired case.
nprocs_clinic=NP
nprocs_tropic=NP
where NP is the desired number of processors. Both values are set in the namelist input file pop_in.
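Editing pop_in by hand for every processor count is error-prone, so the change can be scripted. The following is a minimal sketch, not part of the POP distribution. It assumes that each setting appears in pop_in on its own line as name = integer, with or without spaces around the equals sign. The function name set_nprocs and the sample text are hypothetical.

```python
import re

# Matches "nprocs_clinic = <int>" or "nprocs_tropic = <int>" at the start of a line.
# Assumption: each setting sits on its own line in pop_in.
_NPROCS = re.compile(r"^(\s*nprocs_(?:clinic|tropic)\s*=\s*)\d+", re.MULTILINE)


def set_nprocs(namelist_text: str, nprocs: int) -> str:
    """Return the namelist text with both nprocs settings replaced by nprocs."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    # A function replacement avoids ambiguity between the group reference
    # and the digits of the new value.
    return _NPROCS.sub(lambda m: f"{m.group(1)}{nprocs}", namelist_text)


if __name__ == "__main__":
    # Hypothetical excerpt standing in for the real pop_in contents.
    sample = "nprocs_clinic = 4\nnprocs_tropic = 4\n"
    print(set_nprocs(sample, 32), end="")
```

To apply this to a real run directory, read pop_in into a string, pass it through set_nprocs, and write the result back. Keep a copy of the original file first.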
mpirun --np 32 ./pop
Note: Each time pop is run with a different number of processors, the namelist input file pop_in must be edited to reflect the new processor count. For certain processor counts, the values of max_blocks_clinic and max_blocks_tropic may also need to be adjusted in the domain_size.F90 file (located in the run directory for each test case). If the values are too small, a message is printed to standard output at run time stating what they should be set to. Adjust the values accordingly and recompile the code before running. The number of blocks assigned to each processor can be adjusted by changing the block sizes in domain_size.F90. The number of blocks will be:
(nx_global/block_size_x)*(ny_global/block_size_y).
On some systems, the best performance is obtained with one block per processor. The number of blocks per processor can be tuned to find the best performance on the target system. After changing the block sizes, the code must be recompiled before the run.
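The block-count formula above can be evaluated before recompiling to see how a candidate block size maps onto a given processor count. This is a minimal sketch under two assumptions: the block sizes divide the grid dimensions evenly, and the grid dimensions used in the example are illustrative values only. The actual nx_global and ny_global should be read from domain_size.F90.

```python
def block_count(nx_global: int, ny_global: int,
                block_size_x: int, block_size_y: int) -> int:
    """Evaluate (nx_global/block_size_x)*(ny_global/block_size_y)."""
    if nx_global % block_size_x or ny_global % block_size_y:
        # Simplification of this sketch: uneven divisions are not handled.
        raise ValueError("block sizes must divide the grid dimensions evenly")
    return (nx_global // block_size_x) * (ny_global // block_size_y)


if __name__ == "__main__":
    # Illustrative grid and block sizes (assumed, not taken from the benchmark files).
    nx_global, ny_global = 320, 384
    nprocs = 32
    blocks = block_count(nx_global, ny_global, block_size_x=40, block_size_y=48)
    print(blocks, "blocks,", blocks / nprocs, "blocks per processor on average")
```

With these example numbers, the grid splits into 8 x 8 = 64 blocks, or two blocks per processor on 32 processors. Doubling one block dimension would give one block per processor.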
Numerical validation can be checked by examining the value of “mean K.E.” printed out at the end of the run. This number should agree to at least 5 decimal digits with the baseline values obtained on NCAR’s IBM POWER4 (bluesky) system, which are
0.965524462029654 for the 1-degree run
0.261029444867348 for the 0.1-degree run
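The comparison against the baseline can be automated once the “mean K.E.” value has been copied from the run output. The sketch below assumes that "agree to at least 5 decimal digits" means an absolute difference smaller than half a unit in the fifth decimal place. That interpretation, and the names used, are assumptions rather than part of the benchmark definition.

```python
# Baseline "mean K.E." values quoted above (NCAR IBM POWER4, bluesky).
BASELINE_MEAN_KE = {
    "1-degree": 0.965524462029654,
    "0.1-degree": 0.261029444867348,
}


def agrees_to_digits(value: float, baseline: float, digits: int = 5) -> bool:
    """True if value and baseline differ by less than 0.5 * 10**(-digits)."""
    return abs(value - baseline) < 0.5 * 10 ** (-digits)


if __name__ == "__main__":
    # Hypothetical value, standing in for the number printed at the end of a run.
    measured = 0.965524
    ok = agrees_to_digits(measured, BASELINE_MEAN_KE["1-degree"])
    print("validation passed" if ok else "validation FAILED")
```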
Timing information is written to standard output at the end of the run. The most important timing number is timer number 11, which represents the wallclock execution time less initialization overhead.
The POP production benchmark (1-degree, 7-day simulation) should be run on processor counts ranging from 1 to the maximum number of processors in the system. All of the output should be saved, and the timer number 11 value at the end of each run should be recorded in the benchmark results spreadsheet.