Data Analysis Services Group - January 2011

News and Accomplishments

VAPOR Project

  • Project information is available at: http://www.vapor.ucar.edu
  • 2.0.2 release: Work began in January on a bug-fix release of VAPOR following the major 2.0 release last December. The targeted drop date for this patch release is the end of February. Numerous small bugs have been fixed, including:
    • The API for defining AMR hierarchies was only working correctly for breadth-first, left-to-right traversal of the tree. Arbitrary traversal order is now possible.
    • A regression was introduced in 2.0 that prevented display of some existing AMR data sets.
    • The binary installer was not installing shared libraries with correct base paths, preventing external developers from linking against VAPOR's libraries.
  • All VAPOR library dependencies were rebuilt to reflect the new path on Glade to the VAPOR data repository (/glade/proj3/VAPOR).
  • User outreach: To prepare for the next major release of VAPOR, the team began holding in-person meetings with steering committee members and local users, including Ben Brown (U. of Wisconsin), Leigh Orf (Central Michigan U.), Hsiao-Ming Hsu (NCAR), and Pablo Mininni (U. of Buenos Aires).
  • KISTI proposal: A first draft of the KISTI proposal to extend VAPOR to better support global ocean circulation model data was completed. KISTI informed us that they do not yet have a contract in place with the Korean government for their own 2011 funding, and will not formally issue a funding solicitation until they do, hopefully in the next few weeks.
  • Documentation modernization: A draft plan for modernizing VAPOR’s user documentation was completed. The goals in restructuring the documentation are to: facilitate its use by the user community (e.g., supporting browser-based searches), simplify maintenance of documentation by the VAPOR team, and eliminate redundant material.
  • Evaluated the Mac NFS client as a potential replacement for Samba. The main driver for this effort is Samba’s poor handling of symbolic links. Preliminary results with Mac NFS are good; no problems have been encountered.
  • VAPOR’s new registration web site uncovered a defect in WEG’s UCAS authentication module that prevented logins for usernames containing periods. WEG is working to resolve the problem; in the interim, a workaround that is transparent to VAPOR users was put in place.
  • A brief meeting was held with IMAGE’s Duane Rosenberg and Pablo Mininni to discuss their help as friendly users for the parallel VDC2 API.
  • We held e-mail discussions with several Korean ocean scientists, with Frank Bryan (NCAR) and with Stephen Griffies (one of the designers of MOM) in order to fully identify the tasks required for VAPOR visualization of ocean data.  Most of the basic needs are now fairly well-understood.  There is a great deal of diversity in the variables, dimensions, and grid structures that can be output in MOM4, so our proposed work will need to focus on the most common scenarios.
  • Alan had discussions with Rory Kelly on the use of PyCuda. It appears we can use PyCuda to offload computationally intensive Python calculations in VAPOR to the GPU. Alan installed PyCuda in the VAPOR Python environment on the Mac; however, there are some limitations with using the CUDA capabilities on the Mac that still need to be worked out. These should not be a problem on Linux.
  • Alan worked with Andy Heymsfield (MMM) to finalize an illustration of the effect of aircraft “punching” holes in clouds. He is planning to include the following VAPOR image in his article, demonstrating the simulation of this phenomenon. Below the hole in the original cloud is an icy cloud produced in the wake of the aircraft:

  • We started our discussions for the next major VAPOR release. We will hold discussions with users to help select upcoming features. Our most pressing feature is extensibility, which is required as part of our funding.
  • Alan fixed many of the bugs found in the latest release code. At this point the remaining UI bugs are fairly minor, and we plan to provide a bug-fix release, version 2.0.2, soon, tentatively in February. We continued to collect users’ bug reports during January and expect to freeze the code early in February.
  • We (John, Yannick, Alan) held the first in a series of discussions to establish the rendering extensibility API. The first effort (in Alan’s court) will be to complete a design document describing the full API we will provide to users.
  • We (John, Yannick, Alan) met with Ben Brown, a physics graduate from CU who is now at the U. of Wisconsin. Ben has been making very good use of VAPOR and he showed us a number of new things he would like to do with VAPOR. His ideas are helping us define animation features and other capabilities to enable in an upcoming release.
  • We also held meetings with several WRF/Weather scientists, including Hsiao-Ming Hsu, James Done and Leigh Orf. The feedback has been excellent.
  • John and Alan met with Pablo Mininni to plan our next steps with regard to feature tracking. We identified a few issues that we need to resolve to address the reviewers’ concerns on the paper we submitted last year.
  • Alan is planning to work with the students at the WRF tutorial during the first week of February. The documentation has been updated for VAPOR 2.0, and the new VAPOR version has been installed on the lab machines.
  • Hsiao-Ming Hsu is investigating Typhoon Morakot, which struck Taiwan two years ago with unusually intense rainfall, resulting in extensive flooding and consequent loss of life. We are working to visualize this in VAPOR, which we hope will provide additional insight into why the typhoon produced so much rain. The following image illustrates Morakot’s rainfall and wind as it remained stationary over Taiwan, releasing over a meter of rain in a 24-hour period.

  • Continued rebuilding the third-party libraries on Linux, primarily the Qt library.
  • Re-enabled the ucarAuth module that had been disabled due to the Drupal email registration username validation issue.
  • Continued reviewing open SourceForge bugs.
  • Started rebuilding the third-party libraries in their new location.
  • Worked on the Drupal email registration username validation issue with Markus.
  • Finished reviewing and updating automated test cases.
  • Started reviewing open SourceForge bugs.
  • Researched depth peeling for use in VAPOR.
  • Researched VAPOR rendering API.
  • Built, tested and benchmarked the PIOVDC user library.
  • Began porting the PIOVDC user library to lynx.
  • We continue to respond to VAPOR queries on the mailing list, and to field reported bugs.
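The AMR traversal-order fix noted above concerned removing an insertion-order assumption from the hierarchy API. A minimal sketch of an order-independent AMR hierarchy (hypothetical names and structure, not VAPOR's actual API) keys each cell by its (level, i, j, k) coordinates, so cells may be defined in any order rather than strictly breadth-first, left-to-right:

```python
# Hypothetical sketch of an order-independent AMR hierarchy; not VAPOR's API.
# Cells are keyed by (level, i, j, k), so they may be inserted in any order.

class AMRHierarchy:
    def __init__(self):
        self.cells = {}  # (level, i, j, k) -> data

    def define_cell(self, level, i, j, k, data=None):
        # Insertion order does not matter: the key fully locates the cell.
        self.cells[(level, i, j, k)] = data

    def children(self, level, i, j, k):
        # In an octree refinement, each cell has up to 8 children at level+1.
        return [(level + 1, 2 * i + di, 2 * j + dj, 2 * k + dk)
                for di in (0, 1) for dj in (0, 1) for dk in (0, 1)]

h = AMRHierarchy()
# Define a child before its parent -- arbitrary order is fine.
h.define_cell(1, 0, 0, 1, "child")
h.define_cell(0, 0, 0, 0, "root")
assert (1, 0, 0, 1) in h.children(0, 0, 0, 0)
```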

XD Vis Award

  • We resumed work on refactoring the vaporgui API to support extensibility. The design specification for the rendering component is nearing completion.
  • A number of local TACC Ranger users were contacted to discuss the possibility of using Longhorn for remote visualization. The hope is to generate “science nuggets” to better position the team for continuation funding after the three-year award runs out.
  • The quarterly report for Q4CY10 was submitted.

TeraGrid GIG Award

  • We continue to refine the public API for extensions made to John Dennis’ Parallel IO (PIO) library to support direct output of progressive-access data from an MPI code. Two friendly users have been identified to test the code once it is ready: Pablo Mininni and a group of wind turbine modelers at NREL.
  • The quarterly report for Q4CY10 was submitted.

Software Research Projects

  • Climate data compression research: The netCDF compression utility developed previously, nccompress, was resurrected and updated to use the recently developed codecs for the VAPOR VDC2 data model as its compression engine. The test data sets are high-resolution CAM5 outputs generated as part of the NCAR/COLA PetaApps award; additional test data sets will be generated from CISL’s PetaApps climate outputs and analyzed by John Dennis. We are examining the sensitivity of these atmospheric data to lossy compression using Student’s t-test. Preliminary results have been encouraging, though a few unexpected anomalies have turned up that need to be resolved. The hope is to present results in an invited talk at the Statistical Graphics in Climate Research session of the 2011 Joint Statistical Meetings in Orlando.
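As a sketch of the sensitivity analysis described above (our actual analysis details may differ), a paired Student's t-test can compare a field before and after a lossy step; here simple quantization stands in for the VDC2 codecs, since the statistical machinery rather than the codec is the point:

```python
# Sketch of a paired Student's t-test for lossy-compression sensitivity.
# Quantization stands in for the real compression codecs.
import math
import random

def paired_t_statistic(orig, recon):
    # t = mean(d) / (stddev(d) / sqrt(n)) for paired differences d.
    d = [a - b for a, b in zip(orig, recon)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return mean / math.sqrt(var / n) if var > 0 else 0.0

random.seed(42)
field = [random.gauss(280.0, 5.0) for _ in range(1000)]  # e.g. temperatures, K
quantized = [round(x, 1) for x in field]                 # lossy "compression"
t = paired_t_statistic(field, quantized)
# A small |t| suggests no systematic bias introduced by the lossy step.
print(f"t = {t:.3f}")
```

With a large sample, even a tiny systematic bias in the reconstruction would push |t| well past the usual critical values, which is what makes the test a useful screen for compression artifacts.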

Data Analysis & Visualization Lab Projects

File System Space Management Project

  • Continued reading on the Belnap-Dunn 4-valued logic and Kleene's strong 3-valued logic, their use in policy decision languages, and the application of roles to policy evaluation. Started aligning the Scrubber's design with those logics to ensure logical correctness of file deletion decisions.
  • Continued work on the Scrubber design, focusing on how partitioning for parallelism will affect the design and operation, along with the needs of accounting report generation.
  • Started updating the design documentation with my current thinking and incorporating notes I have been making. Worked out how implementing soft quota enforcement will affect the structure of the Scrubber. Clarified the roles of the actors (system administrators, data administrators, users) and how their scrub criteria will interact via policies. Started prioritizing implementation of features.
  • Continued work on the Scrubber design, further defining how soft quotas can be defined and enforced, and how scrub-decision inputs from system administrators, data administrators, and users will be arbitrated.
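The 3-valued-logic alignment mentioned above can be illustrated with a minimal sketch of Kleene's strong connectives, here paired with a conservative rule that a file is deleted only when the combined policy evaluates to definitely-true (a hypothetical encoding, not the Scrubber's actual design):

```python
# Kleene's strong 3-valued logic: values ordered F < U < T.
# AND is the minimum, OR the maximum, NOT swaps T and F and fixes U.
F, U, T = 0, 1, 2  # false, unknown, true

def k_and(a, b): return min(a, b)
def k_or(a, b):  return max(a, b)
def k_not(a):    return 2 - a

def may_delete(policy_votes):
    # Conservative rule: delete only if every policy input is definitely true.
    result = T
    for v in policy_votes:
        result = k_and(result, v)
    return result == T

# An "unknown" vote (e.g. an unreachable quota service) blocks deletion.
assert may_delete([T, T]) is True
assert may_delete([T, U]) is False
assert may_delete([T, F]) is False
```

The appeal for file deletion is exactly this asymmetry: an "unknown" input degrades the decision to not-delete rather than being silently treated as true or false.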

System Monitoring Project

  • Using the SEC (Simple Event Correlator) program, set up email alerts for failed drives on both DDN storage systems and for any SCSI I/O errors on the GPFS servers that could result in a file system going down.
  • Added both DDN storage systems to our Nagios configuration, which can now monitor failed fans, power supplies, temperatures, and drives.
  • Added services to Nagios to monitor the free space left on each GPFS filesystem.
  • Created graphs with Ganglia to view the individual I/O performance of each GPFS filesystem.
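The free-space services above follow the standard Nagios plugin convention of signaling state through the exit status (0 = OK, 1 = WARNING, 2 = CRITICAL). A minimal Python sketch of such a check, with hypothetical thresholds rather than our production settings, might look like:

```python
# Minimal sketch of a Nagios-style free-space check.
# Thresholds are illustrative, not our production configuration.
import shutil

def check_free_space(path, warn_pct=20.0, crit_pct=10.0):
    usage = shutil.disk_usage(path)
    free_pct = 100.0 * usage.free / usage.total
    if free_pct < crit_pct:
        return 2, f"CRITICAL - {path} {free_pct:.1f}% free"
    if free_pct < warn_pct:
        return 1, f"WARNING - {path} {free_pct:.1f}% free"
    return 0, f"OK - {path} {free_pct:.1f}% free"

status, message = check_free_space("/")
print(message)
```

In production the returned status code would become the process exit status so that Nagios can schedule notifications from it.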

CISL Projects

GLADE Project

  • Supplied configuration and log information to EMC to see if they can determine why too many files are being backed up in certain situations.

Data Transfer Services Project

  • Discussed user issues related to GridFTP-to-HPSS support.
  • Agreed to help RESET and CGD perform some GridFTP testing. It is clear that NCAR/UCAR really needs to have a Certificate Authority of its own that is recognized by other organizations as authoritative.
  • Wrote a draft policy on how to handle requests to export/import large amounts of data from/to GLADE via user supplied storage devices.
  • Talked with Dick Valent about user certificates for GridFTP access, and how we don't currently have a way to provide them to NCAR users who are not TeraGrid-affiliated.

GridFTP/HPSS Interface

  • The goal is to provide a transparent interface for users to transfer data to HPSS directly from remote sites. Instead of providing an independent GridFTP service in front of HPSS, we decided to investigate using the current GridFTP service with a pipe to an "hsi" client, so that the in-flight data stream can go directly to HPSS without hitting a local file system. The initial step was to test the XIO pipe driver function in the GridFTP server. It did not work with the production versions available on bluefire or at remote sites like NICS and TACC. Built a test version of the most recent Globus Toolkit and reproduced the pipe-driver functionality under the simplest server setup.
  • Using one of the current GridFTP servers (datagate0), set up a separate GridFTP service that listens on a different port (49183). The service was able to spawn "hsi" correctly and worked as expected. The service, however, had to be configured in a control-channel-only mode: under a multi-stripe call, where the control-channel server must create several data-channel server threads, the pipe operation breaks down. Further tests were performed using frost and the NICS and TACC machines to transfer files to and from the local GridFTP service, streaming data directly to and from HPSS in both directions.
  • Further tests were performed to demonstrate the capabilities and limitations of the GridFTP+HPSS service. Though only one local server can communicate with HPSS per operation, the remote end can be striped to take advantage of aggregate bandwidth. Users who try aggressive stripe options will not encounter errors, and the operation remains transparent. A rough estimate of the proposed 120 TB transfer from NICS was made: it would take about two weeks to bring in that amount using all four GridFTP servers.
  • Set up the identical service on the other GridFTP nodes (datagate1-3) and changed the service port from 49183 to the easy-to-remember 500. Tests were performed to check the round-robin response of the servers. To make the service available to users, we would need to deploy a role-based Kerberos setup so that the proposed GridFTP+HPSS service can work without interactive intervention for periodic credential-cache updates.
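The pipe approach above streams in-flight data into a child process rather than touching a local file system. A minimal Python sketch of the idea, with a generic byte-counting command standing in for "hsi" so no HPSS connection is needed, might look like:

```python
# Sketch of streaming in-flight data into a child process, as the XIO pipe
# driver does with "hsi". Here "wc -c" stands in for hsi.
import subprocess

def stream_to_pipe(data, cmd):
    # Feed the byte stream to cmd's stdin; nothing is written to local disk.
    proc = subprocess.run(cmd, input=data, capture_output=True, check=True)
    return proc.stdout

payload = b"x" * 4096  # stand-in for a file arriving over GridFTP
out = stream_to_pipe(payload, ["wc", "-c"])
assert int(out.strip()) == 4096
```

For scale, the two-week estimate above is consistent with roughly 120 × 10^12 B / (14 × 86400 s) ≈ 100 MB/s aggregate, or about 25 MB/s per server across the four GridFTP servers.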

System Support

Data Analysis & Visualization Clusters

  • Fixed an issue with our Matlab installation that was preventing the imfilter library from working in version 2010b.
  • Installed CUDA on the Storm systems.
  • Compiled and installed the latest versions of CDO and R on all systems.
  • Assisted David Mitchell with network testing from one of our hosts.
  • Increased /ptmp quota for a user.
  • Updated members of the 'cmip5' group.
  • Created 9 /glade/scratch directories for existing users.
  • Created DASG accounts for 46 new users.

GLADE Storage Cluster

  • Monitored the rebuild processes of the LUNs after the DEM module replacement on the DDN enclosure. All of them finished successfully on Jan 9.
  • Replaced two failed drives in the DDN 9900 storage system.
  • A single user put 90+ TB on /glade/scratch. We monitored the usage growth and the source of the data stream; User Support contacted the user, and voluntary action brought the usage down to the 50% level.

TeraGrid Cluster

  • Rebooted twister2 after a crash and restarted the ssh tunnels for FlexLM.

Legacy MSS Systems

  • Worked with Oracle to replace a failed drive in a FLX210 RAID.
  • Worked with Oracle to replace controller cache batteries in the FLX210 RAIDs. Talked with the Oracle SE about robot reliability issues.
  • Consulted with Bill Anderson about Fibre Channel switch interconnection issues.

Data Transfer Cluster

  • Enabled additional users for GridFTP access.
  • At Jim Edwards' request, analyzed performance issues for a CGD user's file transfers from ORNL to bluefire:/ptmp via GridFTP. Tuning the default stripe block size used by the GridFTP servers to match that used by GPFS yielded an approximate 2x performance improvement. Further improvements do not appear to be possible due to the wide variability of network throughput between the datagate cluster and bluefire. This is probably because the bluefire NSD nodes are also used as routers to access the GLADE GPFS servers, causing contention for their network interfaces. We recommended that the user write the data to /glade/scratch instead, which seems to have more consistent, and higher, performance.
  • Assisted in searching for the source of puzzling GridFTP server behavior, in which the server failed to start a pipe command during the GridFTP/HPSS interface research. A code bug or a compiler optimization bug is a possibility.
  • Re-enabled the anonymous “visftp” ftp server and created a /glade/scratch/incoming directory for remote users to write into.
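The block-size tuning described above can be sketched generically as a chunked copy whose buffer size is the tunable knob matched to the target file system's block size (the sizes below are illustrative, not our production settings):

```python
# Generic sketch of block-size-tuned copying: the buffer size is the knob
# we matched to the GPFS block size. Sizes here are illustrative only.
import io

def chunked_copy(src, dst, block_size):
    # Read and write in fixed-size blocks; a block size matching the target
    # file system's block size avoids partial-block write overhead.
    copied = 0
    while True:
        chunk = src.read(block_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
    return copied

src = io.BytesIO(b"a" * (3 * 1048576 + 123))    # ~3 MiB payload
dst = io.BytesIO()
n = chunked_copy(src, dst, block_size=1048576)  # e.g. a 1 MiB block size
assert n == 3 * 1048576 + 123
```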

Other

  • Craig Ruff had his picture taken with other CISL award recipients.
  • Attended the GTP annual retreat (1/18, Boulder).