Focus areas

Priority in 2013 will be given to two focus areas: improving VAPOR usability (in particular, but not limited to, vaporgui), and refactoring code (again in particular, but not limited to, vaporgui) in order to improve maintainability and allow extensibility.  However, meeting the requirements of the KISTI contract has a higher priority than either of these two focus areas.

Usability improvements

Each new release of VAPOR has added new capabilities to the package. Moreover, with few exceptions, the priority of each release has been the addition of new features above all else. A side effect of increased capability has been increased complexity. As a result VAPOR has become more and more difficult to use, particularly for the uninitiated. A primary activity area for 2013 will be usability improvement. Emphasis will be placed on areas such as:

  1. Lowering the barriers to getting started with VAPOR for new users
  2. Reducing complexity of commonly performed tasks (e.g. preparing data for vaporgui)
  3. Making it easier for users to learn how to perform a task
  4. Making it easier for users to learn what capabilities VAPOR offers
  5. Providing users with up-to-date documentation

Note: much of this activity involves updating and maintaining our Website.

Code refactoring

Growth in capabilities has also resulted in growth in the code base. VAPOR now contains on the order of 250k lines of code. Moreover, new functionality has been, and will continue to be, added that was never foreseen by the original designs. Relatedly, desired future capability, such as more general structured grids, is problematic to support due to limitations of the original design. Lastly, the code's often ad hoc expansion is making maintenance more and more difficult. A second focus area will be refactoring the code, particularly in vaporgui. The broad goals of refactoring are to:

  1. Improve maintainability
  2. Improve robustness
  3. Facilitate future addition of new capabilities.

Considerations in improving the code base

If we are to achieve an improved, more maintainable architecture we must first address two issues:

What practices must we adopt in order to avoid the existing problems in code design? 

We have made several attempts in the past to improve the code design, but we continue to have serious problems.  Several practices in our existing software development process are problematic and we must devise ways of avoiding them.  All of the following have limited the quality of our designs:

  • Evolving interfaces.  The most important APIs in VAPOR have changed repeatedly over the years.  We have usually addressed these changes by modifying the code that accesses the APIs.  The emphasis has always been to get the application functioning correctly and quickly so as to meet release schedules.  As a result there are numerous inconsistent interfaces to the libraries.  Code that was written to deal with one version of an API often does not cleanly implement the next version.  Whenever we change an API we must consider its impact on the entire design, prepare a new design based on use of the new API, and take the time to ensure that the new design is completely implemented prior to releasing the code.  We should not improve an API that is not optimal unless we are also committed to making the design changes that clean up the consequent use of that API.
  • Inadequate commitment to design and documentation completion.  Several of our recent architectural changes have been based on solid design principles; however insufficient time has been allocated to complete them.  The clearest example of this is the extensibility architecture effort that was not continued after its first release, even though the planned changes had only been applied in a few classes.  If we expect to have a clean design, it will need to be consistently applied throughout the application, with a large effort devoted to rewriting code.
  • Last-minute rush to release.  Repeatedly we have applied quick fixes to the code base in order to finish within a planned schedule or to avoid increased testing near release time.  We could avoid this problem by including adequate time in our schedule to fully address architectural changes and not freeze the code until these changes are completely propagated throughout the code base.
  • Lack of centralized systematic documentation.  Developers need to have a way to understand the code design other than by reading the code.  This is currently not possible.  Doxygen is excellent for documenting APIs (as long as it is kept current) but more documentation is needed if programmers are to understand and maintain the code behind the API.
  • Inadequate supervision of inexperienced programmers.  Programmers who are not familiar with the existing code base should not be asked to add features until they are sufficiently familiar with the VAPOR architecture, or should only be asked to work on features that are separable from the core architecture.  We must foster collaboration and discussion with all team members so that the design is shared and commonly understood.  Code reviews should be routine.

What principles must be followed in order to achieve an improved design?

  • Common understanding of, and agreement on, functional units and how they should interface.  VAPOR has several major components: the VDF, Params, Render, and Flow libraries and the vaporgui app, which interface with each other in ways that have changed over time.  We should understand and agree on how these units should best interface with each other, and be prepared to support such functional divisions.  This understanding should be considered in connection with new features that we are preparing.  Some concepts overlap these boundaries, leading to confusion.  These include the mappings among regions, data grids, and scene coordinates, the use of geo-referencing and lat/lon coordinates, the use of data caches, the handling of error and warning messages, setting of dirty bits, transfer functions, etc.
  • Design clean APIs that are well documented and adhered to.  We need to specify the APIs that we will be using as part of this redesign effort, including a specification of how the various classes in the libraries will interface.  This specification may take multiple iterations because the consequences of an API change may not be immediately apparent.  Asserts should be liberally used in new APIs to ensure the inputs are as expected.
  • We must address the timing of proposed API changes.  It would be best to schedule changes one interface at a time.  Big API changes will require lots of recoding and testing immediately if the old API is removed.  Gradual changes are much easier (allowing incremental development); however there must be a commitment to remove the old API and complete the changes needed ASAP, and certainly before the next code release; otherwise cruft will persist in the code. 
  • It is highly desirable to maintain a buildable and runnable source tree throughout this process.
  • APIs and designs must be documented in readable English, and available in a well-known location.
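
As an illustration of two of the principles above (asserts in new APIs, and gradual API changes with a firm commitment to remove the old entry point), the following is a minimal C++ sketch. The names RegionExtents, SetRegionExtents, and SetRegion are hypothetical and are not existing VAPOR functions; the [[deprecated]] attribute assumes a C++14 or later compiler.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical value type: an axis-aligned box given by its min/max corners.
struct RegionExtents {
    std::array<double, 3> min;
    std::array<double, 3> max;
};

// New API (hypothetical): inputs are checked with asserts so that misuse
// is caught early in debug builds.
inline RegionExtents SetRegionExtents(const std::array<double, 3>& min,
                                      const std::array<double, 3>& max) {
    for (std::size_t axis = 0; axis < 3; ++axis) {
        assert(min[axis] <= max[axis] && "region minimum exceeds maximum");
    }
    return RegionExtents{min, max};
}

// Old API (hypothetical), kept only as a thin wrapper over the new one.
// The compiler warns at every remaining call site, which gives a concrete
// list of code to migrate before the wrapper is deleted.
[[deprecated("use SetRegionExtents; to be removed before the next release")]]
inline RegionExtents SetRegion(const double* minmax) {
    assert(minmax != nullptr);
    return SetRegionExtents({minmax[0], minmax[1], minmax[2]},
                            {minmax[3], minmax[4], minmax[5]});
}
```

Keeping the old function as a wrapper lets the source tree stay buildable during the transition, while the deprecation warnings make leftover uses visible rather than letting them persist silently.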

Execution considerations

A number of factors will limit and/or influence our activities:

  1. The anticipated KISTI award will impose constraints on resources and delivery schedules
  2. It is desirable to continue to offer regular releases to our user community
  3. Desired refactoring and usability improvements may touch the same code; such overlap should be avoided when possible.
  4. We will have a new SEII starting later this spring (hopefully), and we'll need to figure out how to map their abilities to our plans

New capabilities

New capabilities will not be given priority for the duration of this plan, with the exception of those either required by KISTI or other external funding sources, or needed in support of, or as a direct result of, our code refactoring and usability efforts.

Planning process

  1. Identify specific goals (wish list) for usability improvements and refactoring
  2. Identify use cases
  3. High-level task breakdown for major components
  4. Evaluate required effort,  impacted components, barriers, and expected value of effort
  5. Prioritize wish list and select areas for work
  6. Develop execution plan
  7. Develop APIs, architectural designs, etc. as appropriate

Goals (Wish List)

Refactoring

More general data

Structured grids: The current VDC makes highly limiting assumptions about the data it is able to store: all data are assumed to be sampled on a single 3D rectilinear grid (a tessellation of Cartesian space by rectangles or parallelepipeds that are not, in general, all congruent to each other). In the most general case the three spatial coordinates of a grid point with indices (i,j,k) are given by the monotonic functions: X(i), Y(j), Z(k). The result is that data that do not conform to this model (such as typically found in geo-science modeling) must be resampled to a single grid, leading to resampling artifacts that manifest themselves during rendering, increased data size, and significant complexity when adding support for a new simulation model output format. Note, the one exception to this is what is essentially a "hack" added to support terrain following grids: the z component coordinate alone may be a function of (i,j,k,l), where l is a time index: z = Z(i,j,k,l).

The VDC should be extended to support generalized structured grids. Mathematically, in the most general case a grid point coordinate would be specifiable by: X(i,j,k,l), Y(i,j,k,l), Z(i,j,k,l). Furthermore, distinctly different spatial coordinate functions should be allowed for each data variable. For example, the X coordinate of the variable temp might be given by X(i,j,k,l), while the X coordinate of the variable pressure might be given by a different function with different dimensions: X'(i',j',k',l'). Thus data sets would no longer be restricted to a single grid, and supported grid types would be generalized to include structured grids.
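
The difference between the two grid models can be sketched in C++ as follows. This is only an illustration under stated assumptions: the types Point, RectilinearGrid, and CurvilinearGrid are hypothetical, the time index l is omitted (one such grid per time step is assumed), and the memory layout (i varying fastest) is an arbitrary choice, not the VDC's.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

// Rectilinear grid: each coordinate depends on a single index,
// X(i), Y(j), Z(k), so three 1D arrays describe the whole grid.
struct RectilinearGrid {
    std::vector<double> xs, ys, zs;

    Point Coord(std::size_t i, std::size_t j, std::size_t k) const {
        assert(i < xs.size() && j < ys.size() && k < zs.size());
        return {xs[i], ys[j], zs[k]};
    }
};

// Generalized structured (curvilinear) grid at a fixed time step:
// every coordinate may depend on all of (i, j, k), so each coordinate
// needs a full 3D array of nx*ny*nz values.
struct CurvilinearGrid {
    std::size_t nx, ny, nz;
    std::vector<double> x, y, z;  // each of length nx*ny*nz, i varies fastest

    std::size_t Index(std::size_t i, std::size_t j, std::size_t k) const {
        assert(i < nx && j < ny && k < nz);
        return (k * ny + j) * nx + i;
    }

    Point Coord(std::size_t i, std::size_t j, std::size_t k) const {
        const std::size_t n = Index(i, j, k);
        assert(n < x.size() && n < y.size() && n < z.size());
        return {x[n], y[n], z[n]};
    }
};
```

Under these assumptions the rectilinear model stores nx + ny + nz coordinate values, while the generalized model stores 3 * nx * ny * nz per grid. Allowing a separate grid per variable, as proposed above, would amount to each variable referring to its own grid object.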

This generalization of the VDC would allow, for example, the direct support of a large subset of CF compliant grids commonly used in ocean and atmospheric modeling.

We should consider the various (horizontal) coordinate systems that would be available from the VDC.  Optimally one should be able to retrieve the data on a lat/lon grid as well as on the native horizontal grid of the data.

Extending the VDC alone would not be sufficient to fully support structured grids. Vaporgui must also be extended to display structured grid data.

Spherical coordinates: Presently spherical coordinates are only supported in the VAPOR environment in a very limited sense: the VDC has some limited provision for representing spherical coordinates, but vaporgui has no ability to display data in a spherical coordinate system. Geo-referenced data are only supported by projecting horizontal coordinates onto a plane. This approach prohibits the display of data on a sphere, introduces complexity when handling geo-referenced data, and perhaps most seriously, introduces error when performing flow integration.

The VDC should be extended to treat spherical coordinates as a first class citizen. vaporgui could also be extended to directly display geo-referenced data on a spherical grid.  Such an extension to the user interface may best be implemented in a separate GUI-based application.

Unstructured grids: The current VDC provides no support whatsoever for unstructured grids: tessellations of Euclidean space by simple shapes, such as tetrahedra, in an irregular pattern. Numerical models employing less structured grids are emerging in the earth sciences and VAPOR should be prepared to handle them. Examples of such models include the MPAS, ICON, and HOMME models. As with general structured grids and spherical grids, changes would be required to the external VDC format, the internal data model, and the visualizers in vaporgui.

Alternate VDC API: The VDC API has limited capability (e.g. lacks support for metadata attributes) and bears little resemblance to the APIs of any of the widely used scientific data formats (e.g. netCDF). Thus the VDC's suitability as a more general container for scientific data is limited. Furthermore, substantial changes may be required to adapt a code currently using a more common data format to use the VDC. It would be desirable to offer an API that more closely resembles the capabilities and appearance of a more widely used file format (e.g. netCDF).

Extensibility

Extending VAPOR, particularly vaporgui, is presently a challenging task for VAPOR developers, and even more difficult, if not intractable, for third parties. The code internals have evolved over time to meet new needs with little overall planning. Major components are neither well defined nor documented. Adding new capability (or extending existing capability) is typically only possible for the original author of the impacted code.

It should be possible for any experienced C++ programmer, with a general understanding of scientific visualization, but without extensive knowledge of the VAPOR internals, to extend key VAPOR components in the following ways:

new data formats: It should be easy to add data readers for gridded data that conform to supported types (see above), but are stored in a foreign (not presently supported) file format. Data not stored in a VDC may be imported directly into vaporgui (read directly from a file), or may first be translated into a VDC using a data translator. The development of both direct data importers, and data translators should be possible.
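
One way such reader extensibility could look is sketched below in C++. Everything here is an assumption made for illustration: DataReader, ReaderRegistry, and ConstantReader are hypothetical names, not existing VAPOR classes, and a real interface would also need to expose grid geometry and metadata.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface that every data reader implements.
class DataReader {
public:
    virtual ~DataReader() = default;
    virtual std::vector<std::string> VariableNames() const = 0;
    virtual std::vector<float> ReadVariable(const std::string& name,
                                            std::size_t timestep) = 0;
};

// Hypothetical registry mapping a format name to a reader factory, so a
// new format can be added without modifying the code that opens data.
class ReaderRegistry {
public:
    using Factory =
        std::function<std::unique_ptr<DataReader>(const std::string& path)>;

    void Register(const std::string& format, Factory factory) {
        assert(factory != nullptr);
        factories_[format] = std::move(factory);
    }

    // Returns nullptr when no reader is registered for the format.
    std::unique_ptr<DataReader> Open(const std::string& format,
                                     const std::string& path) const {
        auto it = factories_.find(format);
        if (it == factories_.end()) return nullptr;
        return it->second(path);
    }

private:
    std::map<std::string, Factory> factories_;
};

// Toy reader used only to exercise the registry: one variable, filled
// with a constant value regardless of the file path.
class ConstantReader : public DataReader {
public:
    std::vector<std::string> VariableNames() const override { return {"temp"}; }
    std::vector<float> ReadVariable(const std::string& name,
                                    std::size_t /*timestep*/) override {
        assert(name == "temp");
        return std::vector<float>(8, 1.0f);
    }
};
```

With this shape, both a direct importer and a translator into a VDC could be written against the same DataReader interface; only the consumer of the data differs.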

new visualizers: Adding new, or extending existing, visualizers should also be facilitated. This would require developing (refactoring) well-defined major architectural components with documented APIs.  This API has been developed but unfortunately it has not been systematically applied and it must be updated to reflect recent changes. Additionally, a toolkit of GUI objects should be provided to facilitate development. Elements of the toolkit might include "panel" objects such as TF editors and color selectors, as well as "scene" objects such as spatial domain selectors (e.g. an axis-aligned box or a line segment).

new shaders: OpenGL vertex and fragment shader programs are an integral component of the current direct volume renderer, and ray caster. Furthermore, in an OpenGL 3.X environment fixed functionality rendering is a deprecated capability: we will likely have to migrate all rendering to programmable shaders in the future. While new shaders can easily be loaded at run time, there is no documented interface that would allow the development of a new shader without digging into the compiled code. The interface between the DVR and Iso, and shader programs should be documented to facilitate development of new shaders.

new configurations & alternative UIs:  The VAPOR architecture should support different configurations of VAPOR components, where different user interfaces (e.g. scripting) may be used to drive a subset of the VAPOR libraries.  For example, a scripting engine could use the vdf, params, etc. libraries but operate renderers that render to files.  Or a spherically-based GUI could link to renderers and other tools that work in spherical coordinates.

Maintainability

As with extensibility, maintainability suffers because the current code base is very messy. Components and interfaces are not well defined, resulting in much legacy code that is neither robust nor easy to fix when something goes wrong. OpenGL entry points, for example, proliferate throughout the code with no mechanisms to guide or enforce their use in a way that prevents state corruption. A common problem is that OpenGL changes made in one segment of the code change behavior in an unrelated code section. Application-wide OpenGL state management and usage policies are required.
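
One possible building block for such a policy is a scoped guard that restores state automatically when a code section finishes. The C++ sketch below is deliberately free of OpenGL calls so that it stands alone: ScopedState and FakeRenderState are hypothetical names, and in an actual renderer the getter and setter would wrap real OpenGL queries and state changes.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a piece of global rendering state.
struct FakeRenderState {
    bool depthTest = true;
};

// Sets a temporary value on construction and restores the previous value
// when the guard leaves scope, including on early return or exception.
template <typename T>
class ScopedState {
public:
    ScopedState(std::function<T()> get,
                std::function<void(const T&)> set,
                const T& temporary)
        : set_(std::move(set)), saved_(get()) {
        assert(set_ != nullptr);
        set_(temporary);
    }

    ~ScopedState() { set_(saved_); }

    // Non-copyable: exactly one guard owns each restore.
    ScopedState(const ScopedState&) = delete;
    ScopedState& operator=(const ScopedState&) = delete;

private:
    std::function<void(const T&)> set_;
    T saved_;
};
```

A policy built on this idea would require that state changes go through a guard, so that a change made in one section cannot leak into an unrelated one.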

Major components of the code should be identified and isolated; specifications for their functionality and their interfaces should be defined, documented, and then implemented.

Task level parallelism is currently employed to provide functionality that could easily be implemented with serial methods (e.g. animation and error handling). The usage of task level parallelism should be evaluated and, if its use is not justified, it should be replaced with more reliable, deterministic, and easily understood serial methods.

Addressing common annoyances

Aspects of the current design are responsible for a number of behaviors that can be frustrating to the user:

Error popups: The error handling mechanism needs to be redesigned to limit the cascade of sometimes never-ending error message popups that can occur in vaporgui when an error condition is encountered.
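
One conceivable approach, shown here only as a sketch, is to collect errors as they occur, merge duplicates, and present a single summary once the current operation finishes. ErrorCollector is a hypothetical class, not part of the existing VAPOR error handling code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Collects error messages instead of showing each one immediately.
// Repeated messages are counted rather than reported again.
class ErrorCollector {
public:
    void Report(const std::string& message) {
        assert(!message.empty());
        if (counts_.find(message) == counts_.end()) {
            order_.push_back(message);  // remember first-seen order
        }
        ++counts_[message];
    }

    bool HasErrors() const { return !order_.empty(); }

    // Builds one summary (suitable for a single popup) and clears the
    // collected messages.
    std::string Flush() {
        std::ostringstream out;
        for (const std::string& message : order_) {
            out << message;
            const std::size_t n = counts_[message];
            if (n > 1) out << " (x" << n << ")";
            out << "\n";
        }
        order_.clear();
        counts_.clear();
        return out.str();
    }

private:
    std::vector<std::string> order_;
    std::map<std::string, std::size_t> counts_;
};
```

The GUI would then call Flush once per user action and show at most one dialog, however many times the underlying error was raised.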

Better rendering control: The current implementation does not provide explicit control over rendering. Spurious events may trigger re-renders at any time. When working with very large data this can be quite frustrating.

Startup delays: VDCs that contain numerous time steps and/or variables can result in significant delays when the data set is selected by vaporgui. Lazy evaluation of VDC (or foreign data import) should be used to speed data loading.  Lazy evaluation of VDC has been implemented in 2.2.  Now when a VDC is opened, only the first and last occurrences of each variable are identified.
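
The general idea of lazy evaluation can be sketched in C++ as follows. This is not the mechanism implemented in 2.2; LazyTimestepInfo and its string-valued loader are hypothetical and stand in for whatever per-time-step metadata scan is expensive.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Defers the per-time-step scan until a time step is first requested,
// then caches the result so the scan runs at most once per time step.
class LazyTimestepInfo {
public:
    using Loader = std::function<std::string(std::size_t timestep)>;

    explicit LazyTimestepInfo(Loader loader) : loader_(std::move(loader)) {
        assert(loader_ != nullptr);
    }

    const std::string& Get(std::size_t timestep) {
        auto it = cache_.find(timestep);
        if (it == cache_.end()) {
            // First request: run the (potentially slow) loader now.
            it = cache_.emplace(timestep, loader_(timestep)).first;
        }
        return it->second;
    }

    std::size_t LoadedCount() const { return cache_.size(); }

private:
    Loader loader_;
    std::map<std::size_t, std::string> cache_;
};
```

Opening a data set then costs nothing per time step up front; the cost is paid only for the time steps the user actually visits.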

Usability Improvement wish list

Update documentation on website.

Provide easily accessible user help on the Web, with an index and search capabilities.

Provide hooks in vaporgui to enable users to quickly navigate to applicable help.

Fix usability bugs in vaporgui, such as the error message annoyances and the transfer function in the flow.

Provide a data translation GUI as per KISTI needs, and also usable with more abstract data, as Pablo Mininni suggested.

Use cases

A partial list of use cases that would be desirable to support is available here.

Wish list evaluation

TBD

Prioritization

TBD

Execution plan

  1. How do we carry out refactoring in parallel with the development of new releases based on old code base?
  2.  

Proposed Design

DataMgr and VDC

VAPORGUI internals
