Data Analysis Services Group - November 2011

News and Accomplishments

VAPOR Project

Project information is available at: http://www.vapor.ucar.edu

TG GIG PY6 Award:

Yannick is wrapping up development of the PIOVDC library and getting the code ready for distribution. At the request of John Dennis, who will distribute the code as part of PIO, a parallel VDC readback capability was added: VDC data sets can now be both written and read in parallel on a distributed-memory platform. In the process of testing, a bug was uncovered in the handling of grids that were not block-aligned; the bug has been fixed. Finally, the current version of the code is again exhibiting performance issues that are still being investigated.
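The block-alignment issue is easiest to see with a small example. The sketch below is illustrative only (it is not PIOVDC code, and the block size and grid dimensions are made up): a grid whose dimensions are not multiples of the block size leaves partially filled blocks at the upper edges, which the parallel read and write paths must handle explicitly.

    // Illustrative only: how non-block-aligned grid dimensions produce
    // partial blocks. Block size and dimensions below are hypothetical.
    #include <cstdio>

    int main() {
        const int bs = 64;                    // hypothetical block edge length
        const int dims[3] = {500, 500, 311};  // hypothetical, not block-aligned
        for (int i = 0; i < 3; i++) {
            int nblocks = (dims[i] + bs - 1) / bs;  // blocks needed (round up)
            int padded  = nblocks * bs;             // block-aligned extent
            int pad     = padded - dims[i];         // unused samples in last block
            std::printf("dim %d: %d samples -> %d blocks, padded to %d (%d pad)\n",
                        i, dims[i], nblocks, padded, pad);
        }
        return 0;
    }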

John resolved a couple of AIX portability bugs in the VDF library.

XD Vis Award:

Yannick continued work on VAPOR's ShaderMgr to provide support for GLSL compilation pre-processing (e.g. #ifdefs).
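The general technique looks roughly like the sketch below, which simply prepends #define lines to the shader source before compilation so that #ifdef blocks in the GLSL are enabled or disabled. This is illustrative only and not the actual ShaderMgr interface; it assumes an OpenGL 2.0+ capable header or extension loader, and that the shader body contains no #version line of its own.

    // Illustrative sketch of GLSL #define injection; not the ShaderMgr code.
    #include <string>
    #include <vector>
    #define GL_GLEXT_PROTOTYPES
    #include <GL/gl.h>   // or use an extension loader such as GLEW

    std::string buildSource(const std::string &body,
                            const std::vector<std::string> &defines) {
        std::string src = "#version 120\n";   // #version must come first
        for (const auto &d : defines)
            src += "#define " + d + "\n";     // enables matching #ifdef blocks
        return src + body;
    }

    GLuint compileShader(GLenum type, const std::string &src) {
        GLuint sh = glCreateShader(type);
        const char *p = src.c_str();
        glShaderSource(sh, 1, &p, nullptr);
        glCompileShader(sh);
        return sh;                            // caller checks GL_COMPILE_STATUS
    }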

KISTI Award:

The period of performance for year one of the three-year KISTI award ended on 11/30. Most of the efforts of the group were focused on meeting required deliverables, all of which were met.

Sang Myeong Oh, from Jeju University in Korea, worked with us for most of this month as part of our collaboration with KISTI. He created sample data sets, converted them to VAPOR format, and visualized them, identifying in the process a number of issues that needed to be resolved. On his last day he showed us a number of visualizations he had created and discussed some of the capabilities he would like to have. He found that VAPOR visualization provided valuable insight into his data. In general his visit was quite useful to us, and we hope to continue the collaboration now that he has returned to Korea.

Alan developed the momvdfcreate and mom2vdf applications, which we had agreed to implement as the first-year deliverables of our KISTI agreement. These applications convert MOM4 datasets to VAPOR datasets. They run roughly 25 times faster than the corresponding Python routines that we prototyped last summer. The Python routines depend on several existing applications, such as CDO and NCL, which were operating too slowly on the data. For the new code, we accelerated the conversion by reimplementing the code that calculates bilinear weights and remaps the data, going back to the original algorithms provided by Phil Jones of LANL.
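The core of the remapping step is small: once the bilinear weights have been computed, each destination point is just a weighted sum of four source points. The sketch below illustrates the technique only; the names and data layout are made up, not the mom2vdf implementation.

    // Illustrative only: apply precomputed bilinear remap weights.
    #include <vector>
    #include <cstddef>

    // For each destination point i, srcIdx/wgt hold 4 source indices and weights.
    void applyRemapWeights(const std::vector<float>  &src,
                           const std::vector<size_t> &srcIdx,
                           const std::vector<double> &wgt,
                           std::vector<float>        &dst) {
        for (size_t i = 0; i < dst.size(); i++) {
            double v = 0.0;
            for (int k = 0; k < 4; k++)
                v += wgt[4*i + k] * src[srcIdx[4*i + k]];
            dst[i] = static_cast<float>(v);
        }
    }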

Niklas Roeblas of DKRZ provided us with several ocean data sets, output from the MPI-OM ocean model that is used at DKRZ.  We found that such data may be visualized using the same code that we are developing for MOM4 data, with minor modification.

Progress was also made toward year-two deliverables. John refactored vaporgui's internal data model to use an abstract class representation of a regular grid as the token for data exchange within the application. The RegularGrid class and its derivatives (SphericalGrid and LayeredGrid) support missing data values as a first-class citizen, in particular offering reasonable behavior when interpolants are undefined during data reconstruction. Two visualizers, the direct volume renderer (dvr) and the isosurface renderer, were rewritten to treat missing values appropriately. Most of the effort involved rewriting the GPU hardware shaders that perform the actual rendering; numerous pathological cases had to be supported. Additionally, the shaders were modified to interpolate layered (e.g. terrain-following) grids on the fly, avoiding the re-gridding step required previously. Finally, a "fast" rendering option was added to the shaders to improve performance during interaction, when lower-quality rendering is acceptable.
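The missing-value behavior amounts to checking each sample that an interpolant touches. A minimal one-dimensional sketch of the idea (illustrative only, not the RegularGrid interface, and assuming the missing value is a finite sentinel rather than NaN):

    // Illustrative only: linear interpolation that tolerates missing samples.
    float interpWithMissing(float v0, float v1, float t, float missing) {
        bool m0 = (v0 == missing), m1 = (v1 == missing);
        if (m0 && m1) return missing;  // both samples undefined -> undefined
        if (m0) return v1;             // fall back to the defined neighbor
        if (m1) return v0;
        return v0 + t * (v1 - v0);     // normal linear interpolation
    }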

John also re-implemented the memory manager supporting vaporgui's data cache. The new memory manager lazily allocates dynamic memory as needed (as opposed to allocating all memory up front), reducing the virtual memory footprint in most instances.
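A minimal sketch of lazy, on-demand allocation (illustrative only, not the actual vaporgui memory manager):

    // Illustrative only: allocate a cached region the first time it is touched,
    // rather than reserving the whole cache up front.
    #include <map>
    #include <vector>
    #include <cstddef>

    class LazyCache {
    public:
        explicit LazyCache(size_t regionSize) : _regionSize(regionSize) {}

        float *getRegion(int id) {
            auto it = _regions.find(id);
            if (it == _regions.end())      // first access: allocate now
                it = _regions.emplace(id, std::vector<float>(_regionSize)).first;
            return it->second.data();
        }

    private:
        size_t _regionSize;
        std::map<int, std::vector<float>> _regions;
    };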

Finally, the year-end report and an accompanying PowerPoint presentation were written and submitted to KISTI.

Development:

John worked with CU's Sam Geen to troubleshoot and correct a number of bugs in the Adaptive Mesh Refinement (AMR) support modules. Additionally, a critical AMR access method that was performing poorly when applied to deep hierarchies was re-implemented to optimize performance. Run time on Sam's data sets dropped from hours to seconds.

Administrative:

The hiring process for a new student assistant was started. We hope to start interviewing in January.

Data Analysis & Visualization Lab Projects

File System Space Management Project

  • Continued to work sporadically on the FMU design and documentation.

Accounting & Statistics Project

  • Began sending sample GLADE accounting records to the AMS development team.

System Monitoring Project

  • Began collecting GPFS's "vfsstats" from all mirage and storm systems to graph with Ganglia.
  • Continued development of the Nagios monitoring infrastructure. Stubs have been completed into which we should be able to plug information from the new storage system components.
  • Began documentation stubs for the new Nagios monitoring infrastructure. These documents will provide troubleshooting steps and procedures for placing calls to vendors and DASG based on events that occur with the storage systems.

CISL Projects

GLADE Project

  • Tested the fileset-based quota feature of GPFS 3.4 on a VM setup.
  • Set up GPFS 3.4 on the stratus servers and enabled fileset quotas for further testing: created a new GPFS cluster called "nwsc34.scd.ucar.edu" with a single file system, "test34". Within the file system, created six filesets of two different sizes for further tests of fileset-based quotas.
  • Read documentation for the BlueArc file server appliance proposed to take over hosting of /glade/home and a few other file systems used for HPC system support.

Data Transfer Services Project

  • Reviewed HPSS documentation, focusing on the VFS interface.
  • Performed random sampling of current HPSS usage: one user has 400k files taking up 33 TB (roughly 80 MB per file on average), while other users have average file sizes as small as 85 kB.
  • Still awaiting establishment of a production UCAS-authentication-based MyProxy server for use with the GLADE GridFTP service. I have not heard of any plans to deploy this yet.
  • Attended a meeting about ESG support for GridFTP and Globus Online.

NWSC Planning

  • Planning for installation is continuing as we get more details on procured systems. Timelines are being refined as delivery schedules are negotiated.

System Support

Data Analysis & Visualization Clusters

  • Built PGI versions of NetCDF and HDF5 (EV ticket 69425).
  • Installed git at CSG's request (EV 69880).
  • Installed the FreeNX remote desktop software on Tesla for Jacob Fugal. This fixed some issues TurboVNC was having with international keyboards.
  • Upgraded CUDA and the Nvidia drivers on Tesla to the latest versions.
  • Installed MATLAB 2011b on all systems.
  • Ran Red Hat updates on both twister systems.
  • Spent time finding the root cause of smeared text rendering while running the VAPOR GUI under VirtualGL in a VNC session. This was made harder by the fact that the different storm system configurations behaved differently, and different users would or would not see the issue. Consulted with the VirtualGL project lead, and we narrowed it down to whether or not the X RENDER extension was supported by the VNC X server (a quick check is sketched below). The issue lies in the underlying Qt or font rendering libraries and is not a VAPOR problem. The available workarounds have their own issues (ugly fonts or other instabilities).
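For reference, a quick way to check whether a given X server (e.g. the VNC server's display) advertises the RENDER extension; this is a generic Xlib/Xrender check, not VAPOR or VirtualGL code:

    // Generic check for the X RENDER extension (build: g++ ... -lX11 -lXrender).
    #include <cstdio>
    #include <X11/Xlib.h>
    #include <X11/extensions/Xrender.h>

    int main() {
        Display *dpy = XOpenDisplay(nullptr);   // uses $DISPLAY
        if (!dpy) { std::fprintf(stderr, "cannot open display\n"); return 1; }
        int evBase, errBase;
        std::printf("RENDER extension %s\n",
                    XRenderQueryExtension(dpy, &evBase, &errBase)
                        ? "supported" : "NOT supported");
        XCloseDisplay(dpy);
        return 0;
    }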

GLADE Storage Cluster

  • Rebooted a DDN9550 disk enclosure (D-channel) upon IBM recommendation to reset a SATA Bridge Chip lockup and monitored the rebuild processes until completion.
  • Removed the remaining LUNs from the proj0 filesystem and added them to proj2. Monitored the LUN migration process (proj0 -> proj2) which incidentally slowed down daily /glade/home backups.
  • Responded to a FLEX380 error alarm: reseated the ESM canister (tray 45) following the online instructions, which cleared the alarm.
  • Updated the NetWorker software and reworked the backup scheduling for the /glade/home file system. Full backups for some subsets can now run in excess of 12 hours with the current hardware configuration of the /glade/home filesystem. I will probably need to further repartition the backup groups from four to six or seven subsets.
  • Finally managed to get the correct HP service contract information to have the failed disks in polynya (the NetWorker master server) replaced.

External Collaborations

  • Had a discussion with Jaczek Braden about Lustre issues on the Janus system, the CU MRI machine. OOM-killer interruption of jobs and failover-related error messages were the most frequent, and over-committed OSS messages were also common. Explained the general approach to Lustre tuning based on usage patterns and error messages.

Data Transfer Cluster

  • Built htar for a machine called "pileus" at RAL and coordinated the installation with Tres Hofmeister (EV ticket 69175).
  • Exchanged 10 messages with Greg Thompson of RAL regarding htar issues on pileus: the initial confusion was due to his unfamiliarity with the tool, but the intended functionality was shown to work with satisfactory performance.
  • Built a Mac version of htar+hsi and packaged it for individual Mac users (at Erich's request).