Data Analysis Services Group - April 2012

News and Accomplishments

VAPOR Project

Project information is available at: http://www.vapor.ucar.edu

TG GIG PY6 Award:

Wes compiled all of the data he collected, along with the methods he used to run the benchmarks, and gathered the data for the wiki. The analysis uncovered some minor performance discrepancies among the PIOVDC, PIO, and PnetCDF run times. The first time the benchmarks were run, PnetCDF was much faster than PIO, even though the PIO run simply writes uncompressed data out through PnetCDF. In a second set of runs a few days later, PIO was much faster. Looking at individual runs, there are cases where the two are essentially the same (what we expect to see), but also times when they differ widely, which is what makes the averaged results look so bad. In hopes of bringing the times closer together, Wes also varied the compression ratio for the PIOVDC compressed version; changing the compression did not affect run time.
While assembling material for the wiki, Wes discovered that the earlier IOR results were not very good. He had been working from an example script and left in the -z flag, which selects random-offset access; once it was removed, the IOR runs saw an increase in speed. Wes also raised IOR's -t (transfer size) flag to 4096k.

Wes and Yannick also looked into the memory issues we were seeing when running on Lynx and Janus. Si pointed Wes to a script he had that checks /proc/<pid> and continually tracks VmPeak along with a few other memory statistics; this did not seem to produce results that could be trusted. Wes spent some more time with TAU but ran into issues on Lynx and Janus. Janus was so heavily loaded that Wes could not access the scratch directory, and people were running jobs on the login nodes because a test in the regular debug queue was taking about an hour.
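
The /proc polling approach Si suggested can be sketched as follows. This is a minimal illustration, not Si's actual script: it reads VmPeak (peak virtual size), VmHWM (peak resident size), and VmRSS from /proc/<pid>/status on Linux, and the 5-second interval is an arbitrary choice.

```shell
#!/bin/sh
# Minimal sketch of a /proc-based memory monitor (illustrative only;
# the script Si shared may differ). Works on Linux, where per-process
# memory high-water marks are exposed in /proc/<pid>/status.

# sample_mem PID: print peak and current memory figures for a live process.
sample_mem() {
    grep -E '^(VmPeak|VmHWM|VmRSS):' "/proc/$1/status"
}

# monitor PID: poll every 5 seconds until the process exits.
monitor() {
    pid="$1"
    while [ -d "/proc/$pid" ]; do
        sample_mem "$pid"
        sleep 5
    done
}
```

Usage would be `monitor <pid>` against the running benchmark; because VmPeak is a kernel-maintained high-water mark, the final sample captures the peak even if polling misses the instant of maximum usage.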

Wes also wrote updated MPI code in Fortran to better gather timing data (average run times across all processes, as well as the max and min in each category). For one of the benchmarks, Wes had to create a new communicator, because the only way the benchmark runs on these systems is with double the process allocation (due to memory issues).

XD Vis Award:

Depth peeling support was enabled in the probe.

KISTI Award:

Negotiations on year two of the KISTI award were completed, and the final contract is expected to be signed in May.

Development:

Alan made numerous changes to support the new RegularGrid class, so that we can now properly handle missing values throughout the VAPOR application, except in Python (which will be dealt with soon). John had to restructure portions of the DataMgr to support the Python calculation engine working with the RegularGrid class.

The RegularGrid changes broke VAPOR's support for moving extents, so we came up with a new plan for supporting them. These changes required a lot of work because moving extents permeate the entire application; Alan has now implemented most of them.

Alan set up the Windows 64-bit environment.  We are waiting for Unidata to release a Windows 64-bit version of NetCDF, before we can release 64-bit VAPOR on Windows.

John cleaned up the volume rendering code to reduce the number of unnecessary data loads and improve overall performance.

We redid the VAPOR visualizations for Hank Childs' new book based on reviewers' comments.

Administrative:

The VAPOR section of the FY13 POP was written.

The VAPOR team continued working on their two-year plan, which is expected to be completed in early May. The intent of the plan is to look beyond incremental feature additions to the VAPOR suite of applications.

Education and Outreach:

John and Alan completed revisions for their chapter on VAPOR for the upcoming book, High Performance Visualization. The completed manuscript has been accepted for publication.

Alan was interviewed by a reporter from the Casper Star-Tribune regarding the proposed work of our SIParCS intern, Ashish Dhital.

Alan and John attended DOECGF in Albuquerque, NM.  Alan presented an update on the status of the VAPOR project.

Alan met with Rick Brownrigg and helped him prepare a demo of VAPOR for the EGU Visualization workshop.

Consulting:

Alan has been helping Janice Coen on a new wildfire visualization.

Alan met with Mel Shapiro and helped him construct a new visualization using backward unsteady advection.

The VAPOR team, along with VETS, will be supporting the ASD campaign again this year. The primary goal is to produce visualizations that will be showcased at the NWSC grand opening in October. The event will be attended by various dignitaries, including the director of NSF and the governor of Wyoming. The VAPOR team has reviewed the ASD proposals and is making arrangements to work with three of the awardees (VETS will support the remaining two).

Software Research Projects

The NREL LDRD pre-proposal submitted in January was accepted, and a full proposal was invited. The full proposal has been submitted and is expected to be reviewed next month. If awarded, the project would extend VAPOR's VDC library to support varying compression rates across the spatial domain. This is a $0 proposal for the VAPOR team, which would serve only as consultants.

Several brainstorming sessions were held to consider responding to the NSF 12-499 Big Data solicitation. We are now exploring partners for a collaboration that would build upon our wavelet-enabled lossy compression and progressive data access work.

Systems Projects

Storage Research

  • Keystone identity-management middleware. As the final stage of the OpenStack evaluation, populated the keystone-client databases (tenants, roles, users, services, and endpoints). Using curl as the guiding test tool, checked the database interfaces and identified how user-role-add authorizes access to the Swift storage front end.
  • Made Keystone work with Swift by providing the correct endpoint incantation. This is one of the most frequently asked user questions and a poorly documented part of the Swift+Keystone integration. The admin-mode and user-mode access documentation were mixed together in the "keystone catalog" and "keystone token-get" sections, which added to the initial confusion.
  • Resolved the "403 Access Denied" errors on the PAM backend with Keystone+Swift by examining the swift_auth.py part of the Keystone middleware package.
  • Installed Lustre 2.2 on the stratus nodes and ran functionality tests. It was actually possible to upgrade in place and still see the contents of the previous 1.8.6 version, although the documentation recommends fresh builds for 1.8->2.x migrations. We plan on baseline tests and an eventual rebuild for native 2.2 contents.
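
The curl-driven token flow tested above can be sketched as follows. All host names, ports, credentials, and tenant names here are placeholders, not values from our systems; the JSON body follows the Keystone v2.0 "tokens" API that the curl tests exercised, and the two commands are printed rather than executed.

```shell
#!/bin/sh
# Hedged sketch of the Keystone v2.0 + Swift access pattern (dry run).
# Endpoints and credentials below are placeholders.
KEYSTONE_URL="http://keystone.example.org:5000/v2.0"   # placeholder endpoint
SWIFT_URL="http://swift.example.org:8080/v1"           # placeholder endpoint

# auth_body USER PASSWORD TENANT: build the JSON body that curl POSTs
# to $KEYSTONE_URL/tokens (the "keystone token-get" equivalent).
auth_body() {
    printf '{"auth": {"passwordCredentials": {"username": "%s", "password": "%s"}, "tenantName": "%s"}}' \
        "$1" "$2" "$3"
}

# Dry run: print the two requests instead of executing them.
echo "curl -s -H 'Content-Type: application/json' -d '$(auth_body demo secret demotenant)' $KEYSTONE_URL/tokens"
echo "curl -s -H 'X-Auth-Token: <token-from-response>' $SWIFT_URL/AUTH_<tenant-id>/"
```

The first request returns a token plus the service catalog; the second uses that token against the Swift endpoint, which is exactly the step that fails when the endpoint incantation is wrong.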

NWSC Planning & Installation

  • Continued testing the xCAT software that will be used for system management on the CFDS servers. Because the xCAT documentation is not very good, I configured a testbed using virtual machines so that I could work out how xCAT behaves. Started writing some wiki documentation based on my experiences.
  • Continued brushing up on RHEL/CentOS system administration.
  • Attended a meeting on NWSC travel procedures.

System Support

GLADE Storage Cluster

  • During the power down and up on April 14, troubleshot the disappearing 10GbE NIC problem by reseating the card.
  • Updated the DDN_HOST_TIMEOUT variable in /etc/ibsrp.conf in the hope of reducing SCSI errors. Also improved the procedure for IB port-speed setup, with the fixed sequence below.
    ibsrp stop -> opensmd stop -> openibd stop -> openibd restart -> opensmd start -> ibsrp start
    Previously we mixed and matched the order, leading to unpredictable results and unnecessary repeats before reaching 20-20 port speeds.
  • Updated /sys/block/sd*/queue/max_sectors_kb on the NSD nodes to bump the value from 512k to 2M.
    This can be automated in the future, but that has been deferred since the change can be made on live systems.
  • Re-checked the impact of DDN_HOST_TIMEOUT by looking at the module options and init scripts for ibsrp.
    The intended longer timeout is actually achieved with the srp_dev_loss_tmo variable of the ib_srp module,
    rather than DDN_HOST_TIMEOUT, which is just a dummy variable orphaned in the /etc/init.d/ibsrp script. (A reference for srp_dev_loss_tmo is available in the release notes of OFED 1.5.2 from Dec 2010.)
  • The following trick may provide more information on SCSI errors in the future on oasis4-7:
    echo 9411 > /proc/sys/dev/scsi/logging_level

Data Transfer Cluster

  • Had to obtain and install updated X.509 certificates for the XSEDE MyProxy servers located at NCSA so that some users could continue with their GridFTP data transfers. Wrote a script to detect when X.509 certificates will expire in the near future and to notify DASG so that preemptive action can be taken.
  • Still awaiting the establishment of a production UCAS-authentication-based MyProxy server for use with the GLADE GridFTP service. A schedule for implementing this server has not yet been provided. In the interim, a decision was made to use the temporary testing MyProxy server established late last year, allowing data transfers via Globus Online to proceed using UCAS token authentication.
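
The certificate-expiry check described in the first bullet above can be sketched with OpenSSL's -checkend test. This is a minimal illustration, not the actual DASG script; the 30-day threshold and notification mechanism are placeholders.

```shell
#!/bin/sh
# Hedged sketch of an X.509 expiry check (illustrative; the real DASG
# script may differ). warn_if_expiring CERT [DAYS]: returns 1 and prints
# a warning if CERT expires within DAYS days (default 30).
warn_if_expiring() {
    cert="$1"
    days="${2:-30}"
    if ! openssl x509 -checkend $(( days * 86400 )) -noout -in "$cert" >/dev/null; then
        enddate=$(openssl x509 -enddate -noout -in "$cert")
        echo "WARNING: $cert expires within $days days ($enddate)"
        return 1
    fi
    return 0
}
```

A cron job would loop this over the trusted-certificate directory and mail the warnings to the group, so replacements can be installed before GridFTP transfers start failing.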

Other

  • A member of DASG attended the LUG12 meeting held in Austin, TX. Major topics of interest were: integrity-check improvements, distributed namespace for version 2.2, the network RPC scheduler for QoS, ID mapping, and big-site issues.
  • Worked on a script to help Alan automatically manage his VAPOR file system NFS mounts on his laptop.