2018-11-01

The main agenda item for today was a description of the new structure for ioda and how this fits into our longer-term plans.

Before proceeding, Yannick alerted everyone that many changes have recently been merged into the oops, ufo, and ioda repositories, so if you encounter problems, be sure to pull the latest branches. Merges will continue through the rest of this week.

Among the changes that have already been made or that are in the pipeline are:

new 4D Obs implementation
new ioda implementation (discussed below)
new/modified QC filters
OOPS bugfixes
background checks in ufo

Many of these changes are transparent to users: they should not require any changes to model or observation interfaces. A notable exception is the 4D Obs implementation, which requires changes to the model interfaces (see the discussion thread on the JEDI-models GitHub team). A few model representatives then reported their status on implementing these changes:

MPAS team: has nonlinear 4D H(x) working
LFRic team: almost there, but not quite ready; nobs() is returning an incorrect value

Yannick emphasized that we want to merge 4D Obs (into the develop branches of oops, ioda, and ufo) as soon as possible because other work is waiting behind it. If your model isn't quite ready for it, you can always check out a previous commit of develop, or simply not pull.

Question (Steve S): If we get our models working with the 4D Obs branches, is this all we have to do for the model interfaces?

Answer: Yes - the 4D Obs branch already has other changes merged in, such as the new ioda implementation, so if you get these branches working, then you should be prepared for the imminent merges to develop.

Then we proceeded to the main agenda item. Steve presented the new ioda implementation and plans; his slides (prepared together with Xin) are attached. Please consult the slides for an outline of the ensuing discussion. What follows in these notes will be mostly concerned with the questions (Q) that arose during the presentation and the responses/answers (A) provided by the JEDI team.

Q (Chris S), Re: slide 1: Does this imply that the current ioda structure (what we're calling the new one) is temporary? Will it be completely overhauled as we move further toward an SQL-like implementation?

A: We're trying to minimize major changes but we want to be flexible to whatever may arise. We have a good idea of what we want the final interface to look like, as presented in the attached slides. The details of the underlying implementation will likely change but we hope that the overall interfaces will remain relatively robust.

Q (JJ), Re: slide 2: What's the difference between H(x) and GeoVaLs?

A: You can think of GeoVaLs as essentially the x in H(x). As the output of GetVals(), they are the model data interpolated to observation locations, possibly with a variable conversion. These are then passed through ufo to generate an obsvector that will be stored in the ObsSpace structure as illustrated.
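To make this concrete, here is a minimal C++ sketch of the two steps under simplifying assumptions: a one-dimensional model field on a regular grid, linear interpolation standing in for GetVals(), and a trivial unit conversion standing in for a ufo observation operator. ModelState1D, getGeoVaLs() and applyObsOperator() are hypothetical names, not oops or ufo code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D model state: values on a regular grid x0, x0 + dx, ...
struct ModelState1D {
  double x0;
  double dx;
  std::vector<double> values;  // e.g. temperature in K
};

// Sketch of a GetVals()-like step: linearly interpolate the model field to
// each observation location. The result plays the role of the GeoVaLs.
std::vector<double> getGeoVaLs(const ModelState1D& state,
                               const std::vector<double>& obsLocations) {
  const std::size_t n = state.values.size();
  assert(n >= 2 && state.dx > 0.0);
  std::vector<double> geovals;
  geovals.reserve(obsLocations.size());
  for (double loc : obsLocations) {
    // Fractional grid index, clamped so that out-of-range locations take
    // the value at the nearest end point.
    double s = (loc - state.x0) / state.dx;
    s = std::min(std::max(s, 0.0), static_cast<double>(n - 1));
    const std::size_t i = std::min(static_cast<std::size_t>(s), n - 2);
    const double w = s - static_cast<double>(i);  // interpolation weight
    geovals.push_back((1.0 - w) * state.values[i] + w * state.values[i + 1]);
  }
  return geovals;
}

// Sketch of a ufo-like observation operator acting on the GeoVaLs. Here it
// is only a unit conversion (K to deg C), standing in for real operators.
std::vector<double> applyObsOperator(const std::vector<double>& geovals) {
  std::vector<double> hofx;
  hofx.reserve(geovals.size());
  for (double t : geovals) hofx.push_back(t - 273.15);
  return hofx;
}
```

For example, with model values of 280, 290 and 300 K at x = 0, 1, 2, an observation at x = 0.5 gets a GeoVaL of 285 K and an H(x) of about 11.85 deg C.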

Q (Chris S), Re: slide 2: Where is the obs type in this figure, e.g. radiosonde vs aircraft? How is this controlled at the oops level?

A: What is shown here is a representative C++ object that is instantiated from the ObsSpace class. Each obs type is associated with a different ObsSpace object. The objects are created (i.e. instantiated) based on a list that is provided in the configuration (json/yaml) file.
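The one-object-per-obs-type pattern can be sketched roughly as follows in C++, assuming the configuration list has already been parsed into simple entries. ObsTypeConfig, ObsSpaceSketch and makeObsSpaces() are hypothetical; in the actual code the list comes from the JSON/YAML file rather than a hand-built vector. Because nlocs is a member of each object rather than a global, it can differ between obs types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one entry in the observation list of a
// JSON/YAML configuration file.
struct ObsTypeConfig {
  std::string obsType;  // e.g. "Radiosonde", "Aircraft"
  std::size_t nlocs;    // assumed known here; normally set when data are read
};

// Minimal stand-in for an ObsSpace: one object per obs type, each with its
// own nlocs member.
class ObsSpaceSketch {
 public:
  ObsSpaceSketch(std::string obsType, std::size_t nlocs)
      : obsType_(std::move(obsType)), nlocs_(nlocs) {}
  const std::string& obsType() const { return obsType_; }
  std::size_t nlocs() const { return nlocs_; }

 private:
  std::string obsType_;
  std::size_t nlocs_;
};

// Instantiate one ObsSpace-like object per entry in the configured list.
std::vector<ObsSpaceSketch> makeObsSpaces(const std::vector<ObsTypeConfig>& conf) {
  std::vector<ObsSpaceSketch> spaces;
  spaces.reserve(conf.size());
  for (const auto& c : conf) spaces.emplace_back(c.obsType, c.nlocs);
  return spaces;
}
```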

Q, re: slide 2: For a particular obs type, this implies that you have all the data at all the locations. What about missing values?

A: Missing values can go into the table, with appropriate flags. The vector operations (e.g. dot product) need to be smart enough to skip those missing values, or equivalently, values with unacceptable QC ratings. To improve memory efficiency, we may consider compressing these tables in the future, potentially using storage techniques that have been developed for sparse matrices.
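A dot product that skips missing or rejected entries might look like the C++ sketch below. The missing-value sentinel and the convention that a QC flag of 0 means "accepted" are assumptions for illustration, not the ioda conventions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed sentinel for missing values (hypothetical; the actual ioda
// convention may differ).
constexpr double kMissing = -9.9999e9;

// Dot product over two obs vectors that skips entries missing in either
// operand or whose QC flag is nonzero (assumed here to mean "rejected").
double maskedDot(const std::vector<double>& a, const std::vector<double>& b,
                 const std::vector<int>& qcFlags) {
  assert(a.size() == b.size() && a.size() == qcFlags.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] == kMissing || b[i] == kMissing || qcFlags[i] != 0) continue;
    sum += a[i] * b[i];
  }
  return sum;
}
```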

One thing to note is that we are optimizing for satellite data. Because each row in the table represents an individual channel, satellite data tend to fill the tables with few missing values. Satellite data will make up the vast majority of the total obs data, far more than radiosonde and aircraft data (which do include missing values). In the big picture, the sparseness of the radiosonde and aircraft data may therefore become insignificant.

Q: What do we do about satellite data that appears with missing channels?

A: Currently, we have two satellite obs types with full channel sets (15 channels for AMSU-A and 11 for AOD). The IODA system currently reads in all channels and appends the channel number to the row (variable) names. We are planning to enhance this scheme to handle hyperspectral instruments, so that missing channels will simply not be stored in the tables. We would also need to come up with a syntax for selecting channels when they are not contiguously numbered, perhaps something like "1-20, 35" to select channels 1 through 20 plus channel 35.
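The proposed selection syntax could be parsed with something like the following C++ sketch. Since the syntax was only floated in the meeting, the grammar assumed here (comma-separated items, each a single channel or an inclusive range lo-hi) is an assumption, and parseChannels() is a hypothetical helper.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Expand a channel selection such as "1-20, 35" into an explicit list.
std::vector<int> parseChannels(const std::string& spec) {
  std::vector<int> channels;
  std::stringstream ss(spec);
  std::string item;
  while (std::getline(ss, item, ',')) {
    // Trim surrounding whitespace; skip empty items.
    const auto first = item.find_first_not_of(" \t");
    if (first == std::string::npos) continue;
    const auto last = item.find_last_not_of(" \t");
    item = item.substr(first, last - first + 1);

    const auto dash = item.find('-');
    if (dash == std::string::npos) {
      channels.push_back(std::stoi(item));  // single channel
    } else {
      const int lo = std::stoi(item.substr(0, dash));
      const int hi = std::stoi(item.substr(dash + 1));
      if (lo > hi) throw std::invalid_argument("bad channel range: " + item);
      for (int c = lo; c <= hi; ++c) channels.push_back(c);  // inclusive range
    }
  }
  return channels;
}
```

With this grammar, "1-20, 35" expands to 21 channels: 1 through 20, then 35.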

Summary of interfaces from Steve's presentation (see slides)

OOPS ↔ IODA: now entirely in C++
UFO ↔ IODA: still Fortran

Comment (Mark): We should provide some guidance for users on allowable values for the group and variable names in the interfaces.

Response: The database is too generic and customizable for comprehensive documentation: users should be able to add groups and variables as needed to support different applications. However, we should provide a list of core group and variable names that all or most obs types will have (e.g. ObsValue, ObsError, HofX, etc.) to give users some guidance.

Points of emphasis from Steve's presentation (see slides)

When you use the Fortran interfaces get_db() and put_db(), it is the responsibility of the calling routine to allocate the space for the vector
Be consistent in the names used for the bookkeeping quantities nlocs, nvars, nobs
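The actual get_db()/put_db() interfaces are Fortran; the C++ sketch below only mimics the convention described above, where the caller allocates a buffer of length nlocs before requesting one (group, variable) row. ObsDatabaseSketch and its methods are hypothetical. The example stores pressure in the MetaData group, as when it is used as a vertical coordinate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory table: each (group, variable) key holds one row of
// nlocs values, e.g. ("ObsValue", "air_temperature").
class ObsDatabaseSketch {
 public:
  explicit ObsDatabaseSketch(std::size_t nlocs) : nlocs_(nlocs) {}
  std::size_t nlocs() const { return nlocs_; }

  // Store a row; the caller's data must hold exactly nlocs values.
  void putDb(const std::string& group, const std::string& var,
             const std::vector<double>& data) {
    if (data.size() != nlocs_) throw std::length_error("putDb: size != nlocs");
    table_[{group, var}] = data;
  }

  // Fill the caller's buffer, which the caller must already have sized to
  // nlocs (mirroring "the calling routine allocates the space").
  void getDb(const std::string& group, const std::string& var,
             std::vector<double>& out) const {
    if (out.size() != nlocs_) throw std::length_error("getDb: size != nlocs");
    const std::vector<double>& row = table_.at({group, var});
    std::copy(row.begin(), row.end(), out.begin());
  }

 private:
  std::size_t nlocs_;
  std::map<std::pair<std::string, std::string>, std::vector<double>> table_;
};
```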

Q (Mark), re: slide 6: Will nobs always equal nlocs*nvars, or will we sometimes filter out missing/low-QC values from the obsvector?

A: There may be situations where such filtering is done, in which case nobs will be smaller than nlocs*nvars.
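A minimal sketch of this bookkeeping: with nvars rows of nlocs entries each, nobs equals nlocs*nvars only when nothing is filtered out. The per-entry QC table and the "flag 0 means accepted" convention are assumptions for illustration, and countNobs() is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count the entries of an nvars x nlocs QC table that survive filtering.
// qc[v][l] == 0 is assumed here to mean "accepted".
std::size_t countNobs(const std::vector<std::vector<int>>& qc) {
  std::size_t nobs = 0;
  for (const auto& row : qc) {
    assert(row.size() == qc.front().size());  // every row has nlocs entries
    for (int flag : row)
      if (flag == 0) ++nobs;
  }
  return nobs;
}
```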

Q/discussion about what is meant by "unique locations" (slide 6). Is there a test to make sure all observations are unique?

A (Yannick): There is nothing to prevent non-unique locations (e.g. two instruments measuring temperature at the same point and time).

A (Steve H): Some of this organization/bookkeeping can be done during the variable conversion after the data is read in.

Q/discussion (Ming) about how different channels are handled for radiance or other remote sensing obs types

A: Each channel is currently treated essentially as a separate variable, with its own row in the ObsData table. If not handled carefully, this could waste memory when data are not available for all channels. We need to work on that.

Q: Is nlocs different for each obs type or is it a global variable?

A: It can be different for each obs type. Currently it is a member variable in the ObsSpace class, so each ObsSpace object (one per obs type) can have a different value.

Then there was a general discussion about the concept of locations (I have consolidated this discussion with another that occurred near the end of the meeting). The JEDI core team emphasized that this is a very generic concept that can have different meanings for different obs types. A common thread is that it has both space and time information. So, in the simplest case, e.g. for radiosonde measurements, it can be a 4D location in space and time. But other implementations are possible and indeed are used, for example for integrated radiances and GNSSRO measurements that span multiple 4D coordinates. These are currently expressed in terms of a nadir and a scan angle. More generally, the Locations object can be whatever you want it to be. The current implementation is only preliminary; the generic framework allows for much more sophisticated definitions that will be developed as the JEDI project moves forward.

Steve V commented that this generality can be a challenge for reading data: how do you determine the proper value of nlocs when you read obs data from a file? Yannick agreed that this is a challenge and that, to some extent, it is up to the user to define what is meant by a "unique location". The definition adopted in the ObsSpace does not need to reflect the organization of the data in the file.
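To illustrate how nlocs depends on the chosen definition, the C++ sketch below reads hypothetical file records (one value per location and variable) and counts locations as distinct (lat, lon, time) triples. FileRecord and countUniqueLocations() are hypothetical, and exact floating-point matching is only one possible choice; treating every record as its own location would give a different, equally valid nlocs.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical file record: one value per (location, variable), as a file
// might store it.
struct FileRecord {
  double lat;
  double lon;
  double time;           // e.g. seconds relative to the window start
  std::string variable;  // e.g. "T", "U"
  double value;
};

// One possible definition of a "unique location": an exact (lat, lon, time)
// match, so several variables observed at the same point share a location.
std::size_t countUniqueLocations(const std::vector<FileRecord>& records) {
  std::set<std::tuple<double, double, double>> locs;
  for (const auto& r : records) locs.emplace(r.lat, r.lon, r.time);
  return locs.size();
}
```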

Q (Chris S), regarding the limits of how you might define the term "locations": if you wanted to designate two different variables, say T and U, as the same "variable", i.e. row in the data table, but a different "location" (column in the data table), could you do this?

A: Yes, in principle: there is nothing preventing you from doing this, since the structure is designed to be very generic and flexible to meet the needs of different obs types. However, in practice, this is not likely to be very memory-efficient (there will likely be many missing values).

Q (Marek), re: slide 7: Some observations such as GNSSRO can involve multiple spatial locations that may be distributed on different processors, making the definition of a "record" challenging. In other words, the footprint of an observation may span multiple domains. Yannick agreed that this is a difficult problem but noted that it is mainly an interpolation problem, requiring interprocessor communication.

Q: Is there any QC done before or after ioda?

A: The short answer is yes, there may be. QC filtering before ioda is largely up to the user - whether they want to remove questionable data or just flag it in such a way that ioda can understand. It's then the responsibility of ioda to interpret these flags and treat them accordingly. Furthermore, some DA applications may have different QC requirements so there may in general be some QC filtering done after ioda as well, at the oops level.

Q: Have you thought about how you'd handle more complex information such as correlated errors (off-diagonal components of the R matrix)?

A: We can generalize this framework as needed, as new applications are integrated into JEDI. For example, we may include information on correlated errors by defining a new group in the ObsSpace database.

Then there was a question/discussion about the example on slide 5 that shows pressure as the argument of get_db(). In retrospect, it was acknowledged that this was perhaps not a good example, because pressure is an unusually tricky quantity. Some models assimilate it, so it will have an ObsValue, ObsError, HofX, Increment, etc. like other DA variables, while others use it as a vertical coordinate, which is more appropriately included in the database as metadata. Still other models do both! The ioda framework allows for all of these use cases; it just requires care in the initial creation of the ObsSpace object.

Xin added that all groups, with the exception of MetaData, will likely have something equivalent to ObsValue. He also said that more sophisticated interfaces can be implemented in the future that will allow you to retrieve multiple groups and variables from the ObsSpace.

Yannick closed the meeting by encouraging everyone to bring more questions next week!

If other questions occur to you as you familiarize yourself with the current implementation and future of ioda, you are encouraged to bring them up in next week's meeting or any time after that. Yannick added that at some point in the future we would like to organize a more intensive mini-workshop where many developers and users can get together to discuss how we should proceed with the future of ioda.
