Data Analysis Services Group - February 2011

News and Accomplishments

Ben Brown produced an animation of a reversal of the sun’s magnetic field using VAPOR.  This is headlined on the NICS web site at www.nics.tennessee.edu.

VAPOR Project

Project information is available at: http://www.vapor.ucar.edu

2.0.2 release: Work continued in February for a bug fix release of VAPOR following the major 2.0 release last December. The targeted drop date for this patch release was the end of February, but has now been pushed back to mid-March due to the two-week outage of the SourceForge site that hosts the VAPOR code repository. Numerous small bugs have been fixed, including:

  • The wavelet SignificanceMap encoding scheme was enhanced to contain enough information to reconstruct a map without external data
  • The RPATH field was removed from the header of ELF executables on Linux systems to reduce the possibility of loading incorrect dynamic libraries
  • The Mac installer had to be updated after upgrading from Leopard to Snow Leopard
  • The histogram calculator suffered from floating point overflow
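
The report does not say where in the histogram calculator the overflow occurred. As an illustration only, the following Python sketch shows one common way a binning routine can be hardened against floating point overflow: out-of-range values are clamped before any arithmetic, and the operands are halved before subtraction so that the range computation cannot overflow. The function names and the clamping policy are assumptions, not VAPOR code.

```python
import math


def bin_index(value, lo, hi, nbins):
    """Return a bin index in [0, nbins - 1], or None for NaN input.

    Hypothetical helper: values outside [lo, hi] are clamped to the
    first or last bin instead of being scaled, so huge or infinite
    inputs never reach the arithmetic below.
    """
    if math.isnan(value):
        return None
    if value <= lo:
        return 0
    if value >= hi:
        return nbins - 1
    # Halve the operands before subtracting so that (hi - lo) cannot
    # overflow even when lo and hi are near the limits of a double.
    frac = (value * 0.5 - lo * 0.5) / (hi * 0.5 - lo * 0.5)
    return min(int(frac * nbins), nbins - 1)


def histogram(values, lo, hi, nbins):
    """Count values per bin, skipping NaNs."""
    counts = [0] * nbins
    for v in values:
        i = bin_index(v, lo, hi, nbins)
        if i is not None:
            counts[i] += 1
    return counts
```

Clamping before scaling is a design choice: it keeps outliers visible in the end bins rather than discarding them, which may or may not match what the actual fix does.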

All third-party libraries were rebuilt to support our move from CIFS to NFS on the Mac platform.

The entire VAPOR bug list was vetted and updated with current information.

Release candidates for 2.0.2 have been built and VAPOR team members have been performing regression tests on package components.

User outreach: The VAPOR team continues to hold in-person meetings with steering committee members and local users, including Kyle Augustun and Nick Nelson (CU PhD candidates) and Olivier Desjarins (CU, Mech. Eng.).

KISTI proposal: The draft KISTI proposal was reviewed favorably by our technical collaborators at KISTI. KISTI will issue a formal solicitation for the proposal in March. NCAR is expected to be the only group to respond. We met with Sponsored Agreements to work out details on how to proceed once the solicitation becomes available.

XD Vis Award: Alan is working out the framework for renderer extensibility. To validate the approach, he is developing a simple example renderer and a corresponding GUI that exercise the basic capabilities of the API and will test the extensibility changes we plan to provide. The parameter class and the GUI (eventrouter) classes for this example are working; the next step is to implement the example renderer itself.

TG GIG PY6 Award: Yannick has nearly completed development work on a Fortran-callable, parallel API to the VDC2 data format. The goal of this work is to permit modelers to write VDC2 data directly from their simulation, avoiding a post-simulation processing step. An additional benefit is the ability to select a lossy (or lossless) compression level on a per-time-step basis, so that individual time steps can be compressed more aggressively than others. Much of February was spent porting the code to the Cray (Lynx). A couple of stubborn memory bugs were encountered and are still presenting problems. Yannick has also begun working with Duane Rosenburg and Pablo Mininni to integrate the parallel API into the GHOST model. Finally, due to significant performance-related changes in netCDF, versions 4.1.1 and 4.0.1 are being benchmarked.

A SOW was prepared at the request of NSF for a possible 3-month extension to our award. If accepted, an additional ~$15k of funding would be made available. The focus of the SOW is to continue working with friendly users on the parallel API to VDC2.

Misc:

  • Alan provided VAPOR assistance in the WRF tutorial on Feb 3-4. There were several students who wanted to show their output in 3D. It appears that we could get even more interest if a short video clip illustrating VAPOR were presented to the students early in the course. We will approach Cindy Bruyere and see if this is possible.
  • John and Alan met with Pablo and Duane to discuss the benefits of VAPOR's new data model (VDC2). Pablo and Duane have begun using the new data format for a 3096^3 simulation.
  • John participated in the DOE Exascale Workshop on Data Analysis, Management and Visualization in Houston. The workshop was one in a series that the DOE is hosting to prepare for Exascale computing platforms. A report from the Houston workshop will be forthcoming.
  • Alan met with Sherrie Fredrick (at MMM) to discuss WRF requirements for VAPOR.  She provided lots of valuable input, including a list of useful colormaps and a list of derived variables that would make VAPOR more useful for WRF users.  Sherrie also provided good feedback about getting radar data into VAPOR on a WRF grid.
  • Sherrie is making VAPOR images of NCRM tropical storm predictions.

Consulting: We continue to respond to VAPOR queries on the mailing list, and to field reported bugs.

Software Research Projects

Feature Tracking: John and Alan continue to work with visiting scientist Pablo Mininni to develop a CFD feature tracking method that exploits knowledge of the underlying fluid dynamics equations. In brief, the displacement of many structures of interest can be predicted by advecting field lines in areas of low dissipation. Though the idea is conceptually simple, implementing a robust algorithm has proven challenging. Numerous approaches have been tried, but so far they have met with only limited success. We'll keep trying in the hope of having something publishable in time for IEEE Vis (March 31).
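
To make the idea of advecting a field line concrete, here is a minimal Python sketch that moves the sample points of a line through a velocity field with a fourth-order Runge-Kutta integrator. It is not the method under development: the steady rigid-rotation velocity field, the 2D setting, and all function names are assumptions chosen only so that the result is easy to check by hand.

```python
import math


def velocity(x, y):
    """Hypothetical steady velocity field: rigid rotation about the origin."""
    return -y, x


def rk4_step(x, y, dt):
    """Advance one point by dt using classical fourth-order Runge-Kutta."""
    k1x, k1y = velocity(x, y)
    k2x, k2y = velocity(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
    k3x, k3y = velocity(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
    k4x, k4y = velocity(x + dt * k3x, y + dt * k3y)
    x_new = x + dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    y_new = y + dt * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
    return x_new, y_new


def advect_line(points, total_time, steps):
    """Advect every sample point of a line for total_time in equal steps."""
    dt = total_time / steps
    moved = []
    for x, y in points:
        for _ in range(steps):
            x, y = rk4_step(x, y, dt)
        moved.append((x, y))
    return moved


# A line segment along the x axis, rotated by a quarter turn,
# should end up along the y axis.
line = [(1.0, 0.0), (2.0, 0.0)]
predicted = advect_line(line, math.pi / 2.0, 1000)
```

In a real simulation the velocity would come from interpolated, time-varying model output rather than an analytic function, and the prediction would only be trusted where dissipation is low.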

Climate data compression: Clyne and Dennis continued their experiments with compression of data from high-resolution atmospheric GCMs. We are looking at the sensitivity of these atmospheric data to lossy compression using Student's t-test. Preliminary results have been encouraging. However, a few of the model validation routines are presenting problems because they compute some derived quantities by division of fields that may contain zeros, and the compressed versions of these data do not always preserve true zeros. Options for handling this case are currently being explored. The hope is to present results in an invited talk at the Statistical Graphics in Climate Research session of the 2011 Joint Statistical Meeting in Orlando.
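
The zero-preservation problem can be illustrated with a small Python sketch. It shows one possible option, not necessarily one the team is considering: record where the original field is exactly zero, restore those zeros after decompression, and guard the division with a fill value. The helper names and the sample values are hypothetical.

```python
import math


def zero_mask(field):
    """Mark the positions where the original field is exactly zero."""
    return [x == 0.0 for x in field]


def restore_zeros(reconstructed, mask):
    """Force masked positions of a lossy reconstruction back to exact zero."""
    return [0.0 if is_zero else x for x, is_zero in zip(reconstructed, mask)]


def safe_ratio(numerator, denominator, fill=math.nan):
    """Divide element-wise, writing `fill` wherever the denominator is zero."""
    return [n / d if d != 0.0 else fill for n, d in zip(numerator, denominator)]


# Hypothetical values: a lossy reconstruction turns true zeros into
# small non-zero numbers, so a naive ratio yields huge spurious values.
original = [0.0, 2.0, 0.0, 4.0]
reconstructed = [1e-7, 2.0, -3e-8, 4.0]
ones = [1.0] * len(original)

naive = safe_ratio(ones, reconstructed)   # first entry is about 1e7
fixed = safe_ratio(ones, restore_zeros(reconstructed, zero_mask(original)))
```

The mask costs one bit per grid point before any further encoding, which is the trade-off to weigh against the alternative of thresholding small values after decompression.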

Data Analysis & Visualization Lab Projects

File System Space Management Project

  • Read up on parallel software issues.
  • Continued design work on the Scrubber/Quota Enforcer/Accounting Data program to incorporate the broadening of the scope of the uses for the program.

Visualization Test Bed Project

  • xxx

Accounting & Statistics Project

  • Continued work on statistics gathering tools for the GLADE project and scratch spaces. We are working to develop a set of reporting tools to help users more efficiently manage their storage allocations.
  • Space usage reports are now being automatically generated and posted to the wiki site. These reports show the usage of each individual project space along with information on the space allocation. For scratch spaces, the report shows the general usage, a list of the users with the largest amounts of data, and a graph of historical usage. These reports are available from the GLADE project wiki (GLADE Usage Report).
  • Began working with the new test LDAP server. This server should provide us all information necessary for managing user accounts, groups and projects. The long term goal is to be able to utilize LDAP as a live lookup service for authentication and authorization.

Security & Administration Projects

  • Started work on DASG system support for Kerberos Roles (KROLE) to support user HPSS access. Installed the Heimdal Kerberos implementation, and used it to verify KROLE admin principal access. Tested George Williams' KROLE software package and provided feedback to George.
  • Asked for 2,395 role principals for DASG user and GridFTP HPSS access.

System Monitoring Project

  • Continued work on configuration of the Nagios monitoring system. DASG will use this system to monitor its systems, and will forward some of the monitoring data to CPG’s Nagios system to help centralize our monitoring capability. We are also looking at expanding what we monitor with Nagios.

CISL Projects

GLADE Project

  • xxx

Lustre Project

  • Exchanged updates with CPG regarding an ESM card failure in one of the Flex-380 enclosures. On Feb 24 Jim McVey replaced the failed card.

Data Transfer Services Project

  • Checked the GridFTP file transfer bandwidths between ORNL and NCAR. There is still some asymmetry in transfers, but a stress test showed that the bandwidth is sufficient to achieve 600 MB/s from ORNL to NCAR and 1.1 GB/s from NCAR to ORNL.
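
To put the measured rates in perspective, the short Python sketch below converts a sustained rate into a transfer time. The 10 TB dataset size is a hypothetical example, decimal units (1 TB = 1,000,000 MB) are assumed, and real transfers will rarely sustain the stress-test rate for their whole duration.

```python
def transfer_hours(size_tb, rate_mb_per_s):
    """Hours needed to move size_tb terabytes at a sustained rate in MB/s."""
    size_mb = size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return size_mb / rate_mb_per_s / 3600.0


# Hypothetical 10 TB dataset at the two measured rates.
ornl_to_ncar = transfer_hours(10, 600)    # 600 MB/s
ncar_to_ornl = transfer_hours(10, 1100)   # 1.1 GB/s
```

Under these assumptions the same dataset takes roughly 4.6 hours in one direction and 2.5 hours in the other, which is the practical consequence of the asymmetry noted above.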

GridFTP/HPSS Interface

  • xxx

TeraGrid Project

  • xxx

Lynx Project

  • xxx

Batch Systems & Scheduler Project

  • xxx

NWSC Planning

  • Responded to a vendor question on the IOR benchmark for NWSC. The vendor was questioning a possible conflict in the configuration file, namely a POSIX option in an MPIIO mode run. By running a fresh copy of the benchmark on the lynx system, we clarified that the option will not be a factor in the proposed benchmarks.

Production Visualization Services & Consulting

  • xxx

Publications, Papers & Presentations

  • xxx

System Support

Data Analysis & Visualization Clusters

  • Diagnosed why a RHEL 5.5 client could not perform a Kerberized NFS 3 mount from a RHEL 5.5 server, while OpenSUSE and OS X clients could. It looks like a bug in the older Linux kernel used in RHEL 5.5. A workaround is available.
  • Rebooted storm0 after a memory swap crash.
  • mirage4 crashed with a kernel panic from a GPFS thread. Motivated by the incident, checked the kdump configuration and sample sessions: installed and configured kdump on an MDS node to demonstrate that a kernel panic yields a usable dump image that can be analyzed with the "crash" tool. We can consider deploying the feature to all DASG machines if we decide to enable it.

GLADE Storage Cluster

  • Upgraded the DataDomain DD670 Restorer firmware.
  • Upgraded software on the /glade/home backup servers. Configured iceberg to not blank its text console, to try to gather some information about why it periodically becomes unresponsive and requires a reset. Also reseated iceberg's 10G Ethernet HBA, memory DIMMs and RAID controller.
  • Had to research and find a fix for the Networker Management Console (NMC) software after it stopped working after the upgrades of RHEL on polynya the previous week. It turns out that the NMC software apparently records some kind of fingerprint of the system on which it is installed and won't start if too many things have changed. This information was not apparent on the EMC support web site.
  • DDN 9550 SATA bridge chip lockup on enclosure C. Took an emergency maintenance on Thursday, Feb. 3. Rebooted enclosure C and started rebuilds on 13 tiers. All rebuilds on DDN9550 completed successfully.
  • Responded to an emergency page from the machine room regarding the partial loss of the /glade/data02 file system. Diagnosed the symptom and restored file system availability by rebooting one of the GPFS servers (oasis5). The root cause was a brief backend disk error that prevented a section of the file system from being accessed by NSD mode clients.
  • A channel failure on the DDN9900 controller left 15 disks inaccessible to a controller. DDN later sent two replacement DEMs (Driver Expansion Modules), and we scheduled a downtime for the replacement.
  • Replaced two DEMs and rebooted the controller couplet so that it would recognize the entire disk set. Manually launched and monitored the rebuild processes on 90 tiers.

TeraGrid Cluster

  • xxx

Legacy MSS Systems

  • Opened support cases with Oracle about their SDP not noticing tape drive problem dumps were available. It turned out that when tape drives were being replaced for service, the SNMP configuration was not being set to send SNMP traps to the SDP. Worked with Oracle SE to ensure MSS tape drives were reporting problems to the SDP via SNMP.
  • Met with ISS about possible alternatives for the relocation of some MSS equipment if the moves cannot be delayed until the HPSS cutover.
  • Upgraded the DataDomain DD530 Restorer's firmware.
  • Showed Bill Anderson and Marc Genty some diagnostic procedures when a library robot failed.
  • Provided answers to auditor questions about license conformance for the obsolete and no longer used IBM DCE library software.
  • Did some testing on the next beta version of the MDVOP software, sent comments back to Oracle. Tested and installed the latest MDVOP software release, provided additional feedback to Oracle.
  • Sent Bill Anderson and Marc Genty memory dumps on tape drive, library and monitoring information.
  • Changed the Oracle Support contact for the AMSTAR hardware for which Craig was on record to CPG.

Data Transfer Cluster

  • Made some comments about diagnosing possible causes of prematurely terminated file transfers from bluefire for a RAL user.  Tested some scp transfers from bfft to my Mac to make sure an OS X or scp size limitation was not the cause.
    The user was instructed to use the bluefire file transfer (bfft) service hosted on the datagate servers instead of using bluefire directly.
  • Worked to diagnose a problem Chi-fan was having with GridFTP not working from OSC. Spent a lot of time looking at the GridFTP code and enabling debug printing on both the client and server. The GridFTP client at OSC was complaining about a credential used to authenticate the data connection, but it was impossible to get it to say which credential it was. This was not helped by the inconsistency in how debug printing is done by different parts of the GridFTP code. Provided Chi-fan with a copy of uberftp (another GridFTP client), which pointed in the right direction. The problem was finally solved by reinstalling the certificate information for the TACC and DOEgrid certificate authorities (CAs) in Chi-fan's home directory at OSC.
  • Suggested some parameters to be varied to test GridFTP network throughput from OSC to NCAR.
  • Provided Chi-fan with information on where he can obtain the latest certificate revocation list (CRL) files for using GridFTP at OSC.

Other

  • xxx