Data Analysis Services Group - March 2012

News and Accomplishments

VAPOR Project

Project information is available at: http://www.vapor.ucar.edu

TG GIG PY6 Award:

We met with Pablo Mininni and Duane Rosenberg, who plan to use VAPOR's parallel data writer (PIOVDC) for a large visualization in their ASP project on Yellowstone (if awarded).  Pablo and Duane also expressed the need for VAPOR to make data conversion easier for scientists.

User documentation for PIOVDC was written and is available at https://wiki.ucar.edu/display/dasg/PIOVDC

A PIOVDC test driver and example code were created. The example code is part of the user documentation.

Benchmarking of PIOVDC continued on Lynx and Janus. The Lustre file system on Lynx has posed a number of challenges that make it difficult to get acceptable performance without substantial tuning. With some effort we are now seeing fairly consistent and not unreasonable results on both systems. However, I/O rates when PIO writes directly to a netCDF file are still two to three times higher than when it writes through PIOVDC. The reason for this performance penalty is not yet understood.
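
For reference, the sketch below shows, in a much-simplified serial form, how an effective write rate is derived in this kind of benchmark. The real measurements drive PIO/PIOVDC in parallel from many MPI ranks on Lynx and Janus; the file size, path, and write pattern here are illustrative assumptions only.

    # Minimal, serial stand-in for the write-rate measurements; the actual
    # benchmarks run PIO/PIOVDC in parallel.  This only illustrates how an
    # effective bandwidth figure (MB/s) is computed from bytes and wall time.
    import os
    import time

    def timed_write(path, nbytes, chunk=8 << 20):
        """Write nbytes of zeros to path and return the rate in MB/s."""
        buf = b"\0" * chunk
        written = 0
        start = time.time()
        with open(path, "wb") as f:
            while written < nbytes:
                f.write(buf)
                written += len(buf)
            f.flush()
            os.fsync(f.fileno())   # include the time to reach the file system
        return (written / float(1 << 20)) / (time.time() - start)

    if __name__ == "__main__":
        # Hypothetical output path; on Lustre this would live in a striped directory.
        path = "/tmp/bench_raw.dat"
        print("%.1f MB/s" % timed_write(path, 1 << 30))   # 1 GiB test file
        os.remove(path)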

XD Vis Award:

Work continued on implementing a depth peeling algorithm to support correct rendering of semi-transparent geometry.
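
For background, depth peeling extracts a scene's translucent surfaces one depth layer at a time, front to back, and composites the layers in order. The sketch below illustrates the idea on a per-pixel fragment list in plain Python; the actual implementation runs as GLSL shaders with depth textures on the GPU, and the fragment data here are made up for illustration.

    # Conceptual depth peeling over one pixel's unsorted fragment list.
    # (CPU illustration only; the renderer does this with GLSL and depth buffers.)

    def peel_layers(fragments):
        """fragments: list of (depth, rgba).  Yields them front to back,
        one per 'pass', by peeling the nearest not-yet-peeled fragment."""
        last_depth = float("-inf")
        while True:
            behind = [f for f in fragments if f[0] > last_depth]
            if not behind:
                return
            nearest = min(behind, key=lambda f: f[0])
            last_depth = nearest[0]
            yield nearest

    def composite_front_to_back(fragments):
        """Blend the peeled layers in front-to-back order."""
        rgb, alpha = [0.0, 0.0, 0.0], 0.0
        for _depth, (r, g, b, a) in peel_layers(fragments):
            w = (1.0 - alpha) * a          # contribution not yet occluded
            rgb = [c + w * s for c, s in zip(rgb, (r, g, b))]
            alpha += w
        return rgb + [alpha]

    # Two overlapping translucent fragments at a single pixel:
    print(composite_front_to_back([(0.7, (0.0, 0.0, 1.0, 0.5)),
                                   (0.3, (1.0, 0.0, 0.0, 0.5))]))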

KISTI Award:

We have begun negotiating with KISTI for our second-year contract. KISTI has requested one significant development addition (support for FVCOM and ROMS ocean model data) and has also asked us to host another student visitor in 2012.  We have evaluated both ocean models and decided to prototype the ability to visualize ROMS data, but not to address FVCOM yet. We are still exploring the possibility of hosting a visitor.

Development:

  • Alan has been making numerous changes to enable use of the RegularGrid class that John implemented.  This will facilitate proper handling of missing values in VAPOR datasets, as is needed for visualizing ocean data (a conceptual sketch of missing-value-aware grid access follows this list).  Extensive changes were made to the flow integration library, which now appears to be operating properly, though a few problems identified in the flow library still need to be fixed.  Changes were also made to the probe, two-D, and image renderers, and all are working well.  Many of the methods that deal with voxel-based and user-coordinate-based data were simplified.  We still need to convert our Python libraries to work with the RegularGrid class.
  • A number of minor changes were made to improve the performance of AMR data handling for deep hierarchies. In particular, topography information is now processed lazily and only when changed.
  • A GetMinCell() method was added to the RegularGrid class to facilitate the calculation of the stepping size during flow integration.
  • The DataManager was reorganized slightly to improve performance when caching expensive-to-calculate attributes (e.g., data value range, variable existence).
  • A specialization of the RegularGrid class was implemented to support stretched grids (needed by ocean modelers, for example).
  • Work began on a custom GLSL shader for volume rendering of stretched grids.
  • Support for 2D grids was added to the RegularGrid class and all of its derivatives.
  • A number of regressions introduced by the conversion to the RegularGrid class were uncovered and fixed.
  • The DataMgr methods for mapping between integer grid point coordinates and user coordinates were not correctly handling layered (e.g. terrain following) grids and had to be fixed.
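
The sketch below is a conceptual illustration (not VAPOR's actual API; the class and method names are invented) of the two ideas the RegularGrid work centers on: returning a missing-value sentinel for samples that fall outside the grid or on flagged data, and exposing the smallest cell size so flow integration can bound its step size.

    import numpy as np

    class ToyRegularGrid:
        """Toy stand-in for a regular grid with a missing-value sentinel.
        Names and behavior are illustrative only, not VAPOR's RegularGrid."""

        def __init__(self, data, extents, missing_value):
            self.data = np.asarray(data, dtype=float)    # shape (nz, ny, nx)
            self.min, self.max = extents                 # user-coordinate box
            self.missing_value = missing_value

        def _to_voxel(self, xyz):
            """Map user coordinates to fractional voxel coordinates."""
            span = np.subtract(self.max, self.min)
            cells = np.array(self.data.shape[::-1]) - 1  # (nx-1, ny-1, nz-1)
            return (np.subtract(xyz, self.min) / span) * cells

        def value(self, xyz):
            """Nearest-neighbor sample.  Returns the missing value outside the
            grid; an interpolating version would also reject any cell that has
            a missing corner."""
            ijk = np.rint(self._to_voxel(xyz)).astype(int)
            if np.any(ijk < 0) or np.any(ijk >= self.data.shape[::-1]):
                return self.missing_value
            return self.data[ijk[2], ijk[1], ijk[0]]

        def min_cell_size(self):
            """Smallest cell dimension in user coordinates; roughly what a
            GetMinCell()-style query provides to the flow integrator."""
            span = np.subtract(self.max, self.min)
            cells = np.array(self.data.shape[::-1]) - 1
            return float(np.min(span / cells))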

Administrative:

The VAPOR team held a kickoff meeting and a couple of follow-ups to develop a two-year plan. Preliminary goals were established, and a number of ideas for achieving them were discussed. The intent of the plan is to look beyond incremental feature additions to the VAPOR suite of applications.

Education and Outreach:

  • Cindy Bruyere and Wei Wang invited us to present a VAPOR tutorial at the WRF workshop on June 29, and we have reserved the CTTC for this purpose.
  • Cindy Bruyere and Alan met to discuss increasing VAPOR's presence in the WRF tutorials.  The current plan is to add material on unsteady flow to the WRF tutorial, along with material on Cindy's Colorado case study.
  • Alan and John are preparing to attend and present our work at DOECGF in Albuquerque at the end of April.
  • SIPARCS candidate Ashish Dhital accepted an offer to work with Alan this summer on animation control.

Consulting:

  • Janice Coen is preparing a new wildfire simulation and visualization.  Alan has been helping her with the new capabilities in VAPOR 2.1.

Systems Projects

Storage Research

  • Downloaded and installed Cyberduck, an open-source cloud file transfer client, on a Mac. Tests on OpenStack cloud storage will be performed using curl, the swift CLI, and Cyberduck (a minimal authentication sketch follows this list).
  • Added the PasteDeploy package and upgraded the sqlite libraries on the stratus nodes to finish the proxy-server setup for swift.
  • Rebuilt the entire swift package on a Mac using the following sequence as an exercise:
    python -> (rsync) -> setuptools -> eventlet -> WebOb -> simplejson -> xattr -> nose -> Sphinx -> netifaces -> PasteDeploy -> swift
  • Installed "swauth" package to provide more manageable layer for multiple users for swift (from default tempauth)
  • Tested -S segment option for large files (>5GB) with success.
  • Reconfigured cyberduck with "defaults write ch.sudo.cyberduck cf.authentication.context /auth/v1.0" for swauth authentication
  • Installed keystone package for future option to provide PAM shim to openstack for existing users.
  • Reconfigured the LUNs on flex380 for GPFS Native RAID and added RDAC for consistent mapping on the stratus nodes.
  • Began to reconfigure the stratus systems for testing GPFS Native RAID, only to find that the x86 Linux platform is not fully supported yet because several binaries are missing. With mmcrrecoverygroup unavailable, we proceeded with tscrrecgroup instead, which failed; the log messages indicate the "tslsosdisk" command is also missing.
  • Downloaded Lustre 2.1.1 (the Feb 12 version) from Whamcloud. We plan to remove Lustre 2.1.0 from swtest1, deploy the full Lustre 2.1.1 on the mds+stratus[0-3] setup, and reconfigure the lustre2 test LUN for OpenStack expansion to the stratus1 node.
  • Reconfigured OpenStack to use all four stratus nodes; a glitch on ring-builder add, related to the rebalance crash in bug #943493, was corrected by restoring from backups.
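
As a point of reference for the planned curl/swift/Cyberduck tests, the sketch below shows the v1.0-style authentication handshake that tempauth/swauth (and the Cyberduck "/auth/v1.0" setting above) rely on, followed by a container listing. The proxy address, account, user, and key are placeholders, not our configuration.

    # Minimal v1.0-style authentication against a Swift proxy, roughly what
    # the curl-based tests exercise.  All credentials here are placeholders.
    import requests

    PROXY = "http://stratus0.example.edu:8080"   # hypothetical proxy-server address
    AUTH_USER = "test:tester"                    # account:user
    AUTH_KEY = "testing"

    # Step 1: trade the user/key pair for a storage URL and an auth token.
    resp = requests.get(PROXY + "/auth/v1.0",
                        headers={"X-Auth-User": AUTH_USER, "X-Auth-Key": AUTH_KEY})
    resp.raise_for_status()
    storage_url = resp.headers["X-Storage-Url"]
    token = resp.headers["X-Auth-Token"]

    # Step 2: list the account's containers using the token.
    listing = requests.get(storage_url, headers={"X-Auth-Token": token})
    listing.raise_for_status()
    print(listing.text)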

Security & Administration Projects

  • Finished reading the NSA "Guide to the Secure Configuration of Red Hat Enterprise Linux 5" to get some ideas about what should be done to secure the new GLADE servers located at NWSC.

System Monitoring & Problem Resolution Project

  • Wrote numerous procedures for handling GLADE requests and problems reported through the Help Desk. These procedures cover all tickets we have received since Jan 1, 2012. This is in preparation for support of the new NWSC GLADE environment.

NWSC Planning & Installation

  • Attended two follow-up in-person interviews for the remaining open NWSC SA1 positions.
  • Started testing the xCAT package that will be used for system management on the CFDS servers.  The xCAT documentation is not very good, and some of it has not been updated for a number of years; one has to consult several different documents to get an overview of the architecture, the steps to install xCAT, and the steps to use it to provision and manage systems.  Managed to get xCAT 2.7 installed and used it to create and install several KVM-based virtual servers on a second KVM provisioning server.
  • Reviewed an IBM presentation about the current plans for the hardware configuration of the NWSC components.

System Support

Data Analysis & Visualization Clusters

  • Installed IDV and rar/unrar on all systems.
  • Wrote a script for CSG that logs any user process that runs for over an hour on the mirage systems, to help them get a sense of a reasonable wall-clock limit for the scheduler (a rough sketch follows this list).
  • Investigated a possible Phalanx infection on mirage2 (false alarm).
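
A rough sketch of the shape of that logging script is shown below; the threshold, exclusions, log location, and ps field list are assumptions (the etimes field needs a reasonably recent procps), and the production script may differ.

    # Log user processes that have been running for more than an hour.
    # Threshold, exclusions, and log path are illustrative assumptions.
    import subprocess
    import time

    THRESHOLD = 3600                        # seconds: "over an hour"
    LOGFILE = "/var/log/long_procs.log"     # hypothetical location

    def long_running_processes():
        # etimes = elapsed time in seconds (newer procps; older systems
        # would have to parse the formatted etime field instead).
        out = subprocess.check_output(
            ["ps", "-eo", "pid=,user=,etimes=,comm="], universal_newlines=True)
        for line in out.splitlines():
            parts = line.split(None, 3)
            if len(parts) < 4:
                continue
            pid, user, etimes, comm = parts
            if int(etimes) > THRESHOLD and user != "root":
                yield pid, user, int(etimes), comm

    if __name__ == "__main__":
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(LOGFILE, "a") as log:
            for pid, user, secs, comm in long_running_processes():
                log.write("%s %s %s %ds %s\n" % (stamp, user, pid, secs, comm))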

GLADE Storage Cluster

  • Created the group 'wgomd' for Gary Strand to control access to /glade/data01/wgomd.
  • Created a script that automatically tries to fix the SCSI I/O error issue on the NSD servers (a skeleton of the approach follows this list).
  • Began to investigate some of the more unstable GPFS clients that cause brief filesystem hangs.  Managed to get bross and evans to stop remounting filesystems every day.
  • Asked the admins at NICS about the "DDN_HOST_TIMEOUT=80" value in their ibsrp.conf. None of their machines has that timeout variable set, and a search for the same configuration file at SDSC, NCSA, TACC, and PSC did not turn up the variable either. We plan to increase the value to reduce the number of SCSI errors on GLADE during the power-down on April 14.
  • Had a discussion with Si about posixio.c in the netCDF libsrc source as a possible cause of the blocksize mismatch leading to the poor performance observed by Gary Strand and others.
  • Replaced the GBIC on the oasisa host port to the Brocade switch that was causing SCSI I/O errors on oasis1. The faulty GBIC was the one at the switch port, not the one on the controller (oasisa) side.
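
The remediation details are site specific, but the general shape of such a script is sketched below: scan recent kernel messages for SCSI I/O errors and hand each affected device to a recovery routine. The log path, the message pattern, and the recovery action are assumptions, not the production fix.

    # Skeleton of a watcher for SCSI I/O errors on an NSD server.  The log
    # location, error pattern, and recovery step are placeholders only.
    import re

    ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+)")

    def scan_for_scsi_errors(log_path="/var/log/messages"):
        """Return the set of device names with logged I/O errors."""
        devices = set()
        with open(log_path) as log:
            for line in log:
                match = ERROR_RE.search(line)
                if match:
                    devices.add(match.group(1))
        return devices

    def attempt_recovery(device):
        """Placeholder for the real recovery action (e.g. a rescan or
        path failover); the production script's fix is not shown here."""
        print("would attempt recovery for /dev/%s" % device)

    if __name__ == "__main__":
        for dev in sorted(scan_for_scsi_errors()):
            attempt_recovery(dev)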

Data Transfer Cluster

  • Still awaiting establishment of a production, UCAS-authentication-based MyProxy server for use with the GLADE GridFTP service.  A schedule for implementing this server has not yet been provided.  In the interim, a decision was made to use the temporary testing MyProxy server established late last year so that data transfers via Globus Online can proceed using UCAS token authentication.
  • Took ownership of the "ncar" Globus Online user from Si Liu and reconfigured it as a DASG group-managed login at Globus Online.  Because Globus Online provides no support for organizations to define officially sanctioned and managed endpoints, one simply has to hope that no one else has already registered the institution's name.  Used this login to provide publicly accessible, NCAR-related GLADE endpoint definitions for GridFTP transfers.
  • Answered some questions for Chi-fan about using Globus Online for GridFTP transfers and about X.509 certificates.

Lynx Cluster

  • Had discussions and exchanged messages with Wes Jeannete about lynx-lustre performance monitoring.
  • Had a discussion with John Ellis of SSG about DVS reconfiguration (later deferred in favor of a Moab reconfiguration). The original goal was to provide more interactive nodes to lynx users, but allowing more jobs on the DVS nodes was preferred as an intermediate step before reconfiguring the DVS nodes as login nodes.