Yannick opened the meeting and there were no general announcements, so we proceeded to updates from JCSDA project leads.  For variety, we mixed up the order, starting with the OBS team.


Hui began with this report from OBS 1.

OBS 1 (Hui)


Finished development of SSMIS BC/QC for GDAS. Tested using GSI geovals; results match GSI. Also tested using the FV3 background directly.

Working with EMC on creating obs/reference files for a one-month period. Both the GSI and IODA converters are being updated to add required fields for tests (e.g., launch time for radiosondes, AVHRR -> AVHRR3, etc.). Additional geovals will be generated for reference.

UKMO’s TL/AD for the precipitable water UFO is in a PR.

EMC is working on the ozone limb sounder UFO.

EMC 10-day validation for radiance UFO

Working on GIIRS level 0 calibration

Adding attribute to IODA converter

A PR is almost ready for the composite UFO operator, which allows different operators to be applied to different variables from one obs input file (ewok can't currently handle multiple operators with one obs file); a configuration sketch follows below.
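To make the idea concrete, here is a hedged sketch of what a composite-operator configuration might look like. The key names, component operators, and variable names are illustrative assumptions, not the final UFO syntax; the YAML is embedded in Python only so the sketch is checkable.

```python
import yaml  # requires PyYAML

# Hypothetical composite-operator configuration: one obs file provides
# several variables, each handled by a different operator component.
# Keys and variable names are illustrative, not the final UFO syntax.
config_text = """
obs operator:
  name: Composite
  components:
  - name: VertInterp          # vertically interpolated profile variables
    variables:
    - name: air_temperature
    - name: eastward_wind
  - name: Identity            # surface variables need no vertical interpolation
    variables:
    - name: surface_pressure
"""

config = yaml.safe_load(config_text)
for component in config["obs operator"]["components"]:
    names = [v["name"] for v in component["variables"]]
    print(component["name"], "->", names)
```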


Then Rachel presented the following update on Diagnostics.

OBS 2 (Francois & Rachel)


Obs-space diagnostics:
- Migrating to SageMaker Studio to provide each user with an individual EC2 instance.
- Provided training to a limited number of NOAA users.
- Working with AWS on improving notebook performance.


OBS 3 (Ryan)


OBS3 met for the first time last week. Since it was the first meeting of
a new task and since we had several new JEDI contributors, much of the
discussion was about project logistics. So, instead of mentioning what
we did, let me mention some of what we plan on doing. There are several
milestones coming up in the next two quarters, and we have a heavy focus
on IODA data ingest.

First, we want to establish conventions regarding what makes valid IODA
data. When a user accesses an ObsGroup in JEDI, what organizational
structure should they expect? Last year, we spent significant time
adding support for groups and multidimensional variables into IODA, and
now we want to go a bit further and implement variable conventions. Greg
Thompson has a working spreadsheet with many of the variables that we
want inside of JEDI, and he focused on the variable names. Beyond that,
we want to standardize dimensions, units, and how we can tag data with
global and variable-specific attributes. Discussion will occur through
June, and around the end of Q1 we plan to publish a conventions document
with webpages that users can consult to understand our data layout.
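To give a feel for what such conventions might standardize, here is a hedged sketch of a conventions-style ObsGroup layout written with the netCDF4 Python package. All group, dimension, variable, and attribute names here are placeholders; the actual names, units, and tagging rules are exactly what the conventions document will define.

```python
from netCDF4 import Dataset
import numpy as np

# Hypothetical sketch of a conventions-compliant ObsGroup layout.
# Group/dimension/variable/attribute names are placeholders only.
with Dataset("example_obsgroup.nc", "w") as ds:
    ds.createDimension("nlocs", 4)                   # observation locations
    ds.createDimension("nchans", 2)                  # instrument channels
    ds.setncattr("platform", "example_satellite")    # global attribute

    meta = ds.createGroup("MetaData")
    lat = meta.createVariable("latitude", "f4", ("nlocs",))
    lat.units = "degrees_north"                      # variable-specific attribute
    lat[:] = [10.0, 20.0, 30.0, 40.0]

    obs = ds.createGroup("ObsValue")
    tb = obs.createVariable("brightness_temperature", "f4", ("nlocs", "nchans"))
    tb.units = "K"
    tb[:] = np.full((4, 2), 250.0)
```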

Second, we want to bring ODB and BUFR data into IODA. For ODB, August
and David Davies have made substantial progress in writing an interface
layer that we can incorporate as an engine in IODA. The goal is that
data transformations can occur in-memory and that we can directly query
an ODB data feed. They are at the stage of generating test files for
H(x) runs for several instruments. The BUFR engine will have the same
goal, and we will use this to get rid of our dependency on GSI's ncdiag
files. Currently, Ron McLaren is making significant enhancements to
NCEP's BUFR library to properly query variable-length, repetitive
mnemonic sequences, and I have been told that task will take much longer
than expected. This work is necessary to ensure that we properly read
all relevant data inside of a BUFR file, and EMC's current issue impacts
most of our radiance-based and conventional instruments. While they
solve their issues, we will move forward with prototyping how we can add
BUFR ingest into our applications.
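For a rough feel of the kind of query involved, here is a sketch using NCEP's py-ncepbufr wrapper. This is not the planned IODA BUFR engine, and the file name and mnemonics are placeholder assumptions for whatever a given message type actually carries.

```python
import ncepbufr  # NCEP's py-ncepbufr wrapper around the BUFR library

# Rough sketch of walking a BUFR file; NOT the planned IODA engine.
# "obs.bufr" and the mnemonics below are placeholders.
bufr = ncepbufr.open("obs.bufr")
while bufr.advance() == 0:            # loop over messages
    while bufr.load_subset() == 0:    # loop over subsets in the message
        # returns a masked array, one row per requested mnemonic
        data = bufr.read_subset("POB TOB XOB YOB")
        # ... convert/append to IODA data structures here ...
bufr.close()
```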

Due to an EMC time conflict, we are pushing the OBS3 meetings forward a
week. The next general meeting will be on May 12, right before Mark's
JEDI1 session, and we plan on discussing our roadmap for BUFR then.


After his report, Yannick asked for a standardized list of instrument names as well as variables.  Ryan and Greg agreed that this is a priority.  Hui added that we should get input from stakeholders in setting these standards.

Then Ben posted the following comment in the chat:

We can adopt the same naming convention for the CRTM coefficient files, loop me in on that specific discussion / assign me to the issues. We use the OSCAR satellite and instrument ID numbers internally to ensure consistency, perhaps this is one option.


CRTM (Ben)

CRTM v2.4.1 is under development. It's primarily a bug-fix release, but it adds OpenMP capabilities to thread over both channels and profiles, whereas v2.4.0 only had OpenMP over profiles. This is intended to be the final v2.x release prior to v3.0 development.

We've made progress on the IASI-NG transmittance coefficients (TauCoeff/SpcCoeff); this is the largest instrument (16,923 channels) ever simulated by the CRTM team. The coefficients are being evaluated internally, and we expect to release them with v2.4.1.

A new aerosol coefficient capability is available in v2.4.x, supporting additional aerosol models (the default is GOCART; we now support GOCART-GEOS-5, CMAQ, and NAAPS).

I'll also be the acting lead for the OBS team in Dick's absence.


Guillaume and Andy were not present, so we moved on to JEDI.


JEDI 4 (Yannick)

The team continues to make good progress in running, modifying and testing the 4D Hofx application with ewok on Orion.  They ran into a bug in how fv3 is reading namelists - this may or may not be specific to Orion.  They are working on a fix.  And they identified another problem with some of the filters at large partitions (many MPI tasks).  Some of the filters throw exceptions when there are zero obs assigned to a particular MPI task.  So this needs to be fixed.  Ewok and R2D2 are otherwise ready for 4D, capably handling the large volume of data involved.
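To make the failure mode concrete, here is a minimal hypothetical sketch (not the actual UFO filter code) of the zero-obs guard such filters need:

```python
import numpy as np

def apply_background_check(omb, threshold=3.0):
    """Hypothetical filter step, not the actual UFO code.

    On large MPI partitions some ranks receive zero obs; statistics on
    empty arrays misbehave (min/max raise, mean/std return NaN), so
    filters need an explicit early exit, the kind of fix described above.
    """
    if omb.size == 0:
        return np.zeros(0, dtype=bool)   # nothing to reject on this rank
    return np.abs(omb) > threshold * omb.std()
```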

Yannick also announced that Cylc has a new pre-release.  Previous versions of Cylc required Python 2.7, which ewok is not equipped to handle.  Now that Cylc has been upgraded to Python 3, this paves the way for a smooth integration into ewok, so that can happen soon.

JEDI 3


Here is the JEDI-Algorithms summary for today's meeting:

- The ObsLocalization class was refactored to both find local observations and compute the localization function (a sketch of a typical localization function follows this list). ObsLocalization now has access to the GeometryIterator for the current gridpoint. The local observation search was removed from ObsSpace. (https://github.com/JCSDA-internal/oops/pull/1141, https://github.com/JCSDA-internal/ioda/pull/188)

- Refactoring of CostJo and Observer in oops (https://github.com/JCSDA-internal/oops/pull/1135, https://github.com/JCSDA-internal/oops/pull/1143)
Observation errors are now passed between ObsError and ObsFilters directly. The EffectiveError group is now used only for diagnostic purposes. (https://github.com/JCSDA-internal/oops/pull/1160)

- Added ObsIterator, which can iterate over all observations, with implementations for the toy models (https://github.com/JCSDA-internal/oops/pull/1157 and https://github.com/JCSDA-internal/oops/pull/1172)

- Refactoring of fields and change of variables for the qg toy model (https://github.com/JCSDA-internal/oops/pull/1144)
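For reference, the localization function computed by such a class is typically a compactly supported correlation like Gaspari-Cohn. Below is a minimal numpy sketch of that function, offered as an illustration rather than the oops ObsLocalization implementation:

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Gaspari-Cohn (1999) 5th-order compactly supported function.

    Illustrative only, not the oops code. `dist` holds distances from
    the current gridpoint to each local observation and `c` is the
    localization half-width; weights fall to zero at 2*c.
    """
    r = np.asarray(dist, dtype=float) / c
    f = np.zeros_like(r)
    near = r <= 1.0
    far = (r > 1.0) & (r < 2.0)
    f[near] = (-0.25 * r[near]**5 + 0.5 * r[near]**4 + 0.625 * r[near]**3
               - (5.0 / 3.0) * r[near]**2 + 1.0)
    f[far] = ((1.0 / 12.0) * r[far]**5 - 0.5 * r[far]**4 + 0.625 * r[far]**3
              + (5.0 / 3.0) * r[far]**2 - 5.0 * r[far] + 4.0
              - (2.0 / 3.0) / r[far])
    return f
```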

JEDI 2 (Dan)


Jedi 2.3 - Use of NUOPC driver with FV3-JEDI


Working with ESMF folks to reorganise the GFS fields and make data visible to the NUOPC driver.


Jedi 2.6 - Ensemble DA validation


Work is underway to perform a C96 comparison of the GSI LETKF, JEDI LETKF, and JEDI 3DEnVar. Jeff Whitaker has produced a single analysis run with GSI, which will be the basis for the remaining testing. The next step is to produce YAML configuration files for the testing.


Jedi 2.7 - Background error model validation


Bug fixed that was creating spurious values in liquid water at high altitudes.

Work is complete to allow for varying processor layouts and resolutions. Utilities were added for converting an already generated B-matrix model to other configurations.

Training complete for full resolution partial ensemble. Looking at Dirac tests and assessing the structures.

Met Office are validating the spectral B model in UM-JEDI.

Meeting in 4 weeks will involve some sharing of results and comparison of BUMP and Met Office findings.


Jedi 2.8 - Regional DA


Preparing 3DEnVar system to replicate GSI.

Analysing whether BUMP could be used to provide static B.


Jedi 2.11 - UM general updates


Dirac tests added linking variable transforms and spectral B model.

Will start moving things to SABER as trans is made public and moved to JCSDA-internal.

Working to unify the build system for UM-JEDI, UM-OPS, and LFRic-JEDI.

Improving the use of proper vertical staggering of pressure variables passed to UFO


Jedi 2.15 - VADER


Prototype working with Yannick’s new variable change branches that remove factories and use traits for the Variable Change class.

Working on implementing an example in FV3-JEDI where both VADER and the model are used for the variable transform.

After Dan's presentation, Chris S asked if the UM spectral B work uses a structured mesh in atlas.  Dan believed they were implementing it in a generic way that exploits an Atlas PointCloud function space but does not require a structured mesh.

Hui asked if the static B work was based on GFS ensembles and whether it is ready to be used in an application.  Dan confirmed that the static B training Benjamin has been working on uses the fv3-gfs model, but it is not yet ready for implementation in applications.

JEDI 1


We had a JEDI1 meeting yesterday. The most pressing topic was the status of the JEDI 1.1.0 / IODA 2.0 release. We identified two key issues that need to be addressed in the feature/ioda-v2 branch of ioda. The objective of the first [ioda #116](https://github.com/JCSDA-internal/ioda/issues/116) is to ensure that when this branch is merged into develop there will be no degradation in performance. So this involves several performance fixes that Ryan and Steve have been working on over the past several months. The second key issue [ioda #202](https://github.com/JCSDA-internal/ioda/issues/202) is to fix a bug that has to do with the deallocation of hdf5 objects when an application closes.

When these two issues are addressed, hopefully by next week or the week after, we will be ready to merge the feature/ioda-v2 branches of ioda and ufo into develop. This comes with a switch to the new ioda-v2 obs file format, so models and diagnostics need to be ready.

There is also a third issue that has to do with exception handling in ioda ([ioda #205](https://github.com/JCSDA-internal/ioda/issues/205)). Ryan and Steve can give further details on these three issues if you are interested. These three issues represent the main outstanding development work that is needed for the IODA 2.0 release. When they are done, we will define a release branch and we will work on cleaning up the code and adding documentation in final preparation for the release.
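As an analogy for the handle-lifetime problem behind ioda #202 (the actual fix is in the ioda C++ code), here is a small h5py sketch of scoping HDF5 handles so they are released deterministically rather than at interpreter shutdown; the file and dataset names are hypothetical:

```python
import h5py
import numpy as np

# Analogy only: scoping HDF5 handles in context managers releases them
# deterministically, instead of leaving live objects to be torn down in
# an ill-defined order when the application closes.
with h5py.File("obs.h5", "w") as f:                 # hypothetical file
    f.create_dataset("ObsValue/brightness_temperature",
                     data=np.full((4, 2), 250.0, dtype="f4"))

with h5py.File("obs.h5", "r") as f:
    tb = f["ObsValue/brightness_temperature"][...]  # read while the file is open
# all handles are closed here, before interpreter shutdown
print(tb.shape)
```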

We also had some discussion on atlas dependencies in ioda. Anna has been looking into removing atlas as a dependency for ioda. This is motivated by simplifying the distribution of the diagnostic tools. But it might be beneficial to take advantage of atlas features such as the mesh class when we optimize the parallel distribution of obs in the future. Wojciech has a draft PR in ioda (https://github.com/JCSDA-internal/ioda/pull/233) that takes some first steps in this direction. So, we still need to decide whether or not we want to keep atlas as an ioda dependency.

We are also in the final stages of the migration to ecbuild 3.6 and the latest ECMWF public releases of eckit, fckit, and atlas. The only outstanding issue here is two test failures in fv3-jedi (https://github.com/JCSDA-internal/jedi-stack/issues/76). These are minor, but nevertheless they would break the CI if the feature/ecbuild35 branch were to be merged in today.

So, everyone should be ready for a change in the JEDI environment modules on all HPC systems and in the containers. By the week of May 10, we should be ready to make the new ecbuild/eckit/fckit/atlas modules (currently labelled ecbuild35) the default. You will still be able to use the previous modules but we encourage you to be ready for the ecbuild35 modules soon.

Maryam is making good progress in modifying the CI interface so that results of CodeBuild are published on the cdash website for everyone to see. So, all authors of PRs should soon be able to see their test results, particularly why tests fail. She has an open PR in oops (https://github.com/JCSDA-internal/oops/pull/1054) if anyone wants to have a look.

As we have reported previously, Maryam and Mark O have implemented a new method for comparing test results to reference solutions. The next step is to modify the tests to use this new method. Maryam has begun to do this, starting with the l95 and qg tests in oops. Have a look at her PR (https://github.com/JCSDA-internal/oops/pull/1054) for examples on how to do this with your tests.
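For PR authors adapting their tests, the idea is roughly a tolerance-based comparison of computed values against stored reference values. The following is a hypothetical sketch of that pattern, not the actual oops test code; see Maryam's PR for the real mechanism.

```python
import numpy as np

def matches_reference(value, reference, rtol=1e-11, atol=0.0):
    # Hypothetical illustration of a tolerance-based comparison against a
    # stored reference value; the oops PR has the actual test mechanism.
    return np.allclose(value, reference, rtol=rtol, atol=atol)

# e.g. compare a computed value against its stored reference
assert matches_reference(3.141592653589793, np.pi)
```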

Work is proceeding on getting JEDI to compile with PGI compilers on Summit. This is important for an ongoing JCSDA collaboration with the Air Force and Oak Ridge National Lab. But, more generally, it is also important because NVIDIA now owns/controls the PGI compiler suite, so getting JEDI to compile with PGI compilers will help us implement GPU support in JEDI, which will be beneficial for exploiting future HPC architectures. So far oops, saber, crtm, and ioda are working with PGI compilers, with all tests passing. ufo compiles, but 17 of 335 tests fail, so more work needs to be done there. The goal is to get fv3-bundle to work so the ORNL/USAF team can begin to work with it.

Work is also proceeding on implementing JEDI on the Microsoft Azure cloud. We already got it to work with a new Intel OneAPI "SuperContainer", running the fv3 3denvar benchmark across 12 Azure nodes. Last week we installed a stack natively on Azure in order to assess container overhead. We expect to get an Azure allocation from NOAA in the coming months, but in the meantime it looks like they are going to grant us some startup credits.

Mark M is also working on an overhaul of the AWS setup for single-node JEDI development and multi-node clusters. As before, it is unified, with environment modules installed on an external volume that can be mounted either on single development nodes or on clusters. Enhancements include Intel OneAPI modules, Lmod module support, and configuration changes needed for the new Slurm setup in ParallelCluster.

We are also starting to make plans for the next JEDI Academy to be held, ideally, soon after the release and before the end of Q1 of AOP 2021. This will be another virtual academy that will be held either the week of June 21 or the week of June 28.

And we spoke about rebuilding the Hera modules, which Ryan will probably do next week or the week after. We plan to support gnu modules on /scratch1 and intel 20 modules on /scratch2 to help guard against filesystem instability.

Sergey asked if we could implement a regular nightly check to ensure JEDI compiles on Hera and Orion.  Mark responded that this was worth considering.

Sergey also asked about the Intel OneAPI application containers and whether one could be built with SOCA.  Mark responded that, unlike the development containers, the application containers are best suited to be tagged and deployed with each public release.  So we will create and distribute a new application supercontainer with the JEDI 1.1.0 / IODA 2.0 release.  When there is a public release of SOCA, we can work with the SOCA team to create an application container if there is sufficient interest.  Mark also mentioned that the Intel application containers do not have the compilers inside (only the runtime libraries), so they can be freely distributed.  There are several Intel application containers available now with recent develop versions of fv3-bundle.  These are being used for testing on Azure and AWS.  If anyone would like access to them, ask Mark M for details.

Yannick then invited meeting participants to suggest topics for our focused topic meetings.  If there is any JEDI feature or capability (existing or future) that you would like to discuss in more detail, let us know by emailing a member of the core team or by creating an issue on the jedi-discussions repo.
