System Overview

Frost is a single-rack Blue Gene/L system with 1024 dual-processor
compute nodes and 32 I/O nodes (one I/O node for every 32 compute
nodes). All Blue Gene/L partition sizes that are a power of 2 between
32 and 2048 tasks are available. Frost uses the Cobalt scheduler from
ANL and a high-performance GPFS parallel filesystem for all I/O.

Recommended Use

The Frost Blue Gene/L system is a highly scalable platform for
developing, testing, and running parallel MPI applications on up to
2048 processors, while also providing efficient computing for smaller
job sizes.

Status

Frost is available for development allocation (DAC) requests starting
July 2007. Requests for medium and large allocations (MRAC/LRAC)
beginning Jan. 1, 2008 will be accepted starting Sept. 17, 2007.

Access

Roaming accounts are not automatically enabled. To access the Frost
system, please open a ticket requesting that your roaming account be
enabled. Once your account has been created on the system, the relevant
DNS entries are as follows:

  • User logins:  tg-login.frost.ncar.teragrid.org
  • GridFTP:      gridftp.frost.ncar.teragrid.org
  • Globus/GRAM:  gatekeeper.frost.ncar.teragrid.org
    (use the 'jobmanager-cobalt' job manager)
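For example, the endpoints above can be used as follows. This is a
minimal sketch: 'username', the local file name, and the /ptmp
destination path are placeholders, and a valid grid proxy (e.g. from
grid-proxy-init or myproxy-logon) is assumed for the Globus commands.

  # Interactive login
  ssh username@tg-login.frost.ncar.teragrid.org

  # Stage a local file onto Frost's /ptmp filesystem via GridFTP
  globus-url-copy -vb file:///home/localuser/input.dat \
      gsiftp://gridftp.frost.ncar.teragrid.org/ptmp/username/input.dat

  # Verify that you can authenticate to the GRAM gatekeeper
  globusrun -a -r gatekeeper.frost.ncar.teragrid.org/jobmanager-cobalt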

CTSS Software

The following CTSSv4 kits are available in production:

  • Core
  • Remote Login
  • Remote Compute
  • Data Movement
  • Application Development and Runtime Support
  • Parallel Application
  • Science Workflow Support

The following CTSSv4 kits will not be supported on this resource:

  • Data Management
  • Visualization Support

System Specific Information

The user documentation for the Frost system is available at
https://wiki.ucar.edu/display/BlueGene/Frost

Filesystems

Path             Soft Quota   Hard Quota   Use                            Backups
/home/username   4MB          16MB         binaries                       ~Monthly
/ptmp/username   600GB        800GB        job data                       Never
/mnt/gpfs-wan    (see SDSC)   (see SDSC)   copy files to/from this
                                           data store and local resources

The /home filesystem is NFS mounted from the Blue Gene/L service node
and must not be used for job data.

/ptmp is a GPFS filesystem that provides up to 600MB/s for job I/O and
job data. The filesystem is 6TB in total, and the quotas exist only to
prevent abuse - please police your own use. Files are not scrubbed, but
under heavy use the administrative staff will contact the heaviest users
and may intervene if necessary.

The GPFS-WAN filesystem from SDSC is available as /mnt/gpfs-wan, but it
is provided only as a convenience for moving files between sites;
running jobs out of this filesystem is currently not supported. Note
that we do not copy the entire set of TeraGrid DNs: if your home
directory on this filesystem appears to be owned by 'nobody', the local
set of DNs for your account does not match the set used for GPFS-WAN.
Please open a ticket at help@teragrid.org to have the additional DNs
added to your account.
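As an illustration of the intended usage, data staged onto GPFS-WAN at
another site would be copied into /ptmp before a run rather than read
directly by jobs. The file names and paths below are placeholders:

  # Copy staged input from the GPFS-WAN convenience mount into /ptmp
  cp /mnt/gpfs-wan/username/input.dat /ptmp/username/

  # Check how much space your job data is using under /ptmp
  du -sh /ptmp/username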

Overview of the Cobalt Scheduler

Command                        Description
cqsub [options] [executable]   Submit a job. Common options:
                                 -p [project_number]   use tgusage to find your project number
                                 -q [queue]            use the 'teragrid' queue for production work,
                                                       'debug' for short debugging runs
                                 -n [nodes]            number of MPI tasks
                                 -t [time]             wallclock time, in minutes
cqstat -f                      queue status, full display
cqstat -q                      display queue limitations and state
showres                        show current and upcoming reservations
partlist -l                    show Blue Gene/L partition-to-queue assignments
nodes                          list free partitions (custom NCAR command)
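For example, a 512-task, one-hour production run could be submitted and
monitored with the commands above. The project number and executable
path shown here are placeholders:

  # See which partitions are free before choosing a job size
  nodes

  # Submit a 512-task job to the production queue for 60 minutes
  cqsub -p TG-ABC123456 -q teragrid -n 512 -t 60 /ptmp/username/my_app

  # Check the job's status in the queue
  cqstat -f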

Archival Storage

NCAR's 3PB mass storage system is available for TeraGrid use. Please see
the following documentation for complete information:

  • MSS main page:
    http://www.cisl.ucar.edu/hss/mssg/mss.jsp
  • Introduction to NCAR's Mass Storage System (MSS): key information,
    file purge policies, and usage instructions:
    http://www.cisl.ucar.edu/docs/mss/
  • MSS commands quick reference:
    http://www.cisl.ucar.edu/mss/quick.html
