Yannick opened the meeting with a brief wrap-up of the ufo code sprint that took place in Boulder over the past two weeks.  He said the code sprint was very active, with many pull requests.  A few of the changes may have broken some of your tests, so let us know if you see any issues.  However, there were fewer broken tests than in previous sprints, so we're getting better at streamlining our workflow.

Then Mark M shared the screen in Boulder to demonstrate a new (experimental) ZenHub board we created for the purpose of soliciting feedback on specific topics to discuss at our weekly JEDI meetings.  As mentioned in previous meetings, our intention for the weekly meetings is to alternate focused topic discussions with general round-table updates.  So, we want your input on what focused topics you'd like to discuss.

To see the board, navigate to ZenHub on the jedi-docs repository.  At the top of the page, near the name of the repository, you should see the default workspace, named simply jedi-docs.  This is a drop-down menu; select the JEDI-Weekly-Meetings workspace from it.  You should then see a board with several columns, including Proposed topic discussions, scheduled topic discussions, and previous topic discussions.  There is also a Closed column, which you can disregard.  Select weekly meetings from the Labels drop-down menu to view only the meeting-related issues.

Please edit the board to share your thoughts.  You can create new issues to describe topic discussions you'd like to see and/or you can edit the existing issues to clarify what points you would like to see addressed in each of these discussions.  One of the advantages of doing this in ZenHub is that we can prioritize proposed topics and then move them to the scheduled column when we add them to our agenda.  After the meetings, we will move these issues into the previous topic discussions column and include a link to the JEDI wiki page that includes the meeting notes and slides.  The ZenHub board is searchable, so you can look for specific topics that have been covered to see the associated meeting notes and slides.

This is experimental - we may choose to discuss topics on a different platform, such as a GitHub Team thread.   Please let us know if you like this format or if you'd rather we organize this a different way.

Then we moved to the tele-participants in the meeting, beginning with Jim R.  He has been looking at floating point exceptions (FPEs) and how to trap them, so that a test that triggers an FPE fails.  He has used this to diagnose and fix a few tests in CRTM and is now working with ufo.  He says two ufo tests trigger FPEs and should fail but currently don't.  He is working on fixing this together with Mark O (who is currently on PTO).
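The idea that a test should fail when it produces an invalid floating point result can be illustrated with a small Python analogue.  This is only a sketch and not Jim's approach: it does not trap hardware exceptions as a compiled CRTM or ufo test would, but instead checks the output for NaN or infinite values after the fact.  The function names and the sample computation are hypothetical.

```python
import math


def assert_all_finite(values, label="result"):
    """Raise FloatingPointError if any value is NaN or infinite."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            raise FloatingPointError(f"{label}[{index}] is not finite: {value}")


def scaled_differences(values, scale):
    """Hypothetical computation that can silently overflow.

    A product that overflows becomes inf, and inf - inf then becomes NaN,
    all without Python raising an error.
    """
    big = [v * scale for v in values]
    return [b - big[0] for b in big]


def run_test(values, scale):
    """Return True if the test passes, False if a non-finite value was caught."""
    try:
        assert_all_finite(scaled_differences(values, scale))
    except FloatingPointError:
        return False
    return True
```

Without the finiteness check, the overflowing case would return a list of NaNs and any later comparison could pass or fail for the wrong reason.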

Then Dan reported on progress discussed during yesterday's (biweekly) JEDI Models meeting.  Two approaches have been followed to create a static B matrix.  

  • BJ has been working with NCL and the fenics python package as different approaches to solve Poisson equations on a sphere to decompose the velocity into a streamfunction/potential representation.  NCL interpolates the MPAS grid onto a Gaussian grid and then solves the Poisson equation through spherical harmonics whereas fenics uses an iterative solution algorithm.
  • Dan has been working with an offline multigrid Poisson solver, refactoring code he received from a colleague.   An attractive feature of this is that it is code that we control, rather than an external package.

Both approaches are now working and performance and accuracy comparisons continue.

Hamideh is working on an emissivity model for microwave frequencies in CRTM, including the implementation of a roughness correction.

Virginie has three PRs to ioda, ufo, and oops that add the GEOS AOD calculation.  These are not public; they depend on code external to JEDI so they cannot be tested by others.

Emily is updating documentation for background checks and is working with model background and diagnostics from H(x) that Anna recently implemented in ufo.  She is considering how to enhance the functionality of the filters.

Then Tom asked a few questions.  First, he asked why the GEOS AOD was not public and Dan responded that the code lives inside of GEOS and in order to run it you'd need to have GEOS built.  They mitigated this issue by creating a new repository in JEDI/JCSDA that serves as a "middle man" between GEOS and ufo.

Tom then announced an upcoming visit from Sarah Liu, who is working on aerosols, and encouraged JEDIs to work with her on generic operators.

Tom also asked about work that Dan was doing with Benjamin on testing fv3-jedi configurations at high resolution.  Dan mentioned a pull request that Benjamin currently has for saber that eliminates the copying of ensemble data.  The motivation for this was to try to reduce memory usage.  

Mark M mentioned that, though he has not studied the memory usage in detail, he did do an experiment that demonstrates that Benjamin's new PR does indeed reduce the memory footprint.  This concerns a test in fv3-bundle called test_fv3jedi_hyb3dvar_gfs_aero.  Using the develop branch of saber, Mark found that this test failed on his Mac (with gnu-openmpi), apparently because it was running out of memory.  When Mark tried the same test on AWS (32 GB memory compared to 16GB on his Mac), the test passed.  Then Mark tried on his Mac with Benjamin's new branch and the test passed.  So, in short, the test was failing before with 16 GB memory but now passes with Benjamin's latest code changes.

Then Travis mentioned that the obs interpolation for SOCA was running slowly because it was repeatedly re-initializing the geometry.  They solved this problem by bypassing excessive calls to bump%setup_online and saving initialization results in a global variable.  Travis identified three areas that have slow performance: the search for redundant points, the tessellation of the grid, and a masking function.
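The caching idea can be sketched in a few lines of Python.  This is an illustration only, not the SOCA code: setup_geometry is a hypothetical stand-in for an expensive setup such as bump%setup_online, and the nearest-cell lookup is a placeholder for the real interpolation.

```python
import functools

# Counts how often the expensive setup actually runs (for illustration only).
SETUP_CALLS = {"count": 0}


@functools.lru_cache(maxsize=None)
def setup_geometry(nx, ny):
    """Hypothetical expensive setup: build the list of grid cell centres."""
    SETUP_CALLS["count"] += 1
    return tuple((i + 0.5, j + 0.5) for i in range(nx) for j in range(ny))


def interpolate_obs(obs_locations, nx, ny):
    """Return the nearest cell centre for each observation location.

    The geometry is computed on the first call for a given (nx, ny) and
    reused from the cache on every later call.
    """
    centres = setup_geometry(nx, ny)
    result = []
    for x, y in obs_locations:
        nearest = min(centres, key=lambda c: (c[0] - x) ** 2 + (c[1] - y) ** 2)
        result.append(nearest)
    return result
```

Calling interpolate_obs repeatedly with the same grid size runs the setup once; the cache plays the role of the global variable mentioned above.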

Then we turned to the Met Office.  Marek reported that he is almost finished with the implementation of the time interpolation; 3 of 4 tests are working and he still needs to mask out missing values.  Yannick agreed to have a look at his code when ready.  It was noted that this work can be split into two parts, one for the non-linear operator and one for the linear operator.  The non-linear time interpolation is functioning properly, so Marek will get that into a PR next (before fixing the linear operator) so the review can get started sooner.

David Simonin reported that he has a full team now and can start providing observations.

Cory has been working on getting JEDI up and running on Hera.  This announcement initiated a discussion from others who were also doing the same.  Several people reported problems with intel 19 and specifically an eckit mpi test that was hanging.  Chris H said the problem was in intel MPI and got it to work by using mvapich instead of impi.  Guillaume mentioned that SOCA is now working with intel 19 and they have JEDI modules set up on Hera.  Chris H responded that the shallow water tests are also fine with intel 19 but there are many failures in oops and saber tests.  He advised against using any intel 19 versions before 19.0.5, which have known problems.  In response to an issue raised by Stylianos, Chris mentioned that a ticket is active for system admins to install intel 17 on Hera.

It was agreed that the multiple people working with Hera need to come together to share their experiences and decide how to proceed. Guillaume agreed to organize that meeting.

Andrew has been adding extra information for GeoVaLs and obs files for new ctests that test the functionality implemented during the code sprint.  Andrew also reported differences in AMSU-A H(x) between GSI and UFO and is investigating them.

Hui has also been continuing work begun at the code sprint.  She is implementing more tests into a PR she submitted for the GPSRO forward operator.  She is also accommodating recent changes in the ordering and grouping of observations that have changed the results slightly.  She had to change the tolerance level for one of the tests and is seeking to determine why this changed.

Ryan gave a well-received conference talk on snow scattering, with implications for CRTM.  He's also working with Anna and Yannick on cleanup after the code sprint and is looking into the possibility of using Eigen to store some data.  He is also interested in participating in the discussions on building and running JEDI on Hera.

NRL - Sarah is testing their implementation of 3DVar with bump.

Guillaume is working on the 3DVar and EnVar global workflow for SOCA, and on moving this workflow to Hera.  Before the move, they achieved a full cycle running on Theia.

JJ mentioned that some of the MPAS tests had been running on a single processor; he is working to remedy this using some wrappers he developed in CMake to cut down on the length of the CMakeLists.txt files.  With these wrappers, multi-core tests can be specified with a single line.  He mentioned that this functionality might be useful for others - he will announce when he's ready for a pull request so others can have a look.  Travis commented that something like this is already implemented in SOCA.

Yannick then expressed some concerns about the current practice, followed by many tests, of splitting a test into separate "run" and "compare" tests.  This may cause problems as we continue to add tests.  It might be better to incorporate these into a single test.

Travis has 1, 0.5, and 0.25 degree SOCA configurations running on Discover.  They are now ready for a 4-5 month experiment and are excited to get some science out of it.

Steve V has been working more on assessing ODC and on satellite DA.  Tom asked him for a sneak peek at ODC and Steve said he was reluctant to say much until they finish their testing.  It is already clear that it is much faster than ODB-API.  It seems to be comparable to netcdf but he wants to do more testing before making a judgement on that.

Maryam now has the AWS Codebuild Continuous Integration testing tool working for both saber and fv3-jedi.  This is an alternative to Travis-CI and is capable of running tests with both gnu and intel compilers.  She's now assessing whether it is worthwhile to extend it to the other repos.  It is much faster than Travis-CI and can run in parallel.  Yannick mentioned that Travis-CI has a limit of 2 concurrent tests and that this posed problems for the ufo code sprint.  CodeBuild does not have that limitation.

Chris H has 3DVar working for his shallow water model.   One of the last hurdles for him was to implement the ObsFilter factories, which he did following the example in fv3-jedi.  But, then Anna added a generic interface in ufo to remove the need for individual models to define these factories.  So now Chris is using that.  Now he is working on the LinearModel for the shallow water model.  One challenge he faces is to develop the adjoint for GetValues.  He did not use bump for his interpolation, opting instead for a hard-coded implementation, so computing the adjoint is not trivial.  He's looking for help from anyone who has experience with this.
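For reference, the adjoint of a hard-coded interpolation and the usual dot-product check can be sketched as follows.  This is a simplified 1-D linear interpolation written for illustration, not the shallow water GetValues code; all function names are hypothetical.  The adjoint applies the same weights as the forward operator but scatters observation-space values back onto the grid, and the test verifies that <Hx, y> equals <x, H^T y>.

```python
import random


def interp_weights(grid, location):
    """Return (left index, weight on right neighbour) for a sorted 1-D grid."""
    for i in range(len(grid) - 1):
        if grid[i] <= location <= grid[i + 1]:
            return i, (location - grid[i]) / (grid[i + 1] - grid[i])
    raise ValueError("location outside grid")


def interp_tl(grid, field, locations):
    """Forward (linear) operator: gather grid values to observation locations."""
    out = []
    for loc in locations:
        i, w = interp_weights(grid, loc)
        out.append((1.0 - w) * field[i] + w * field[i + 1])
    return out


def interp_ad(grid, obs_values, locations):
    """Adjoint operator: scatter observation values back to the grid."""
    field_ad = [0.0] * len(grid)
    for loc, val in zip(locations, obs_values):
        i, w = interp_weights(grid, loc)
        field_ad[i] += (1.0 - w) * val
        field_ad[i + 1] += w * val
    return field_ad


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adjoint_test(grid, locations, seed=0):
    """Dot-product test with random vectors: <H x, y> should equal <x, H^T y>."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in grid]
    y = [rng.uniform(-1.0, 1.0) for _ in locations]
    lhs = dot(interp_tl(grid, x, locations), y)
    rhs = dot(x, interp_ad(grid, y, locations))
    return abs(lhs - rhs) <= 1e-12 * max(1.0, abs(lhs), abs(rhs))
```

The same pattern extends to higher dimensions: every assignment in the forward loop becomes an accumulation with the same weight in the adjoint loop.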

Yannick encouraged Chris to issue a pull request for his working H(x) in the shallow water model.  Chris agreed to do so but first has to merge in the latest changes to ufo, oops, and ioda.

Steve mentioned 2 PRs in ioda-converters that involve accessing the NCEP bufrlibs from Fortran.  He's looking for reviewers who are willing to test them, particularly on Hera.  Cory volunteered.

Xin is working on the Variational Bias correction.  His AD/TL tests now pass and he plans to submit a PR soon.  Next he plans to calculate the predictor online.  Yannick mentioned that, as a result of the code sprint, Xin should now be able to access what he needs.

Clementine is working with the MPI communicators in EDA.  She's now passing the communicators to the geometry objects.  This should solve problems she was having with the definitions of the communicators throughout the EDA run.  She wants to work with the shallow water model when ready.  Chris mentioned that the geometry objects in the shallow water model already have the communicators.  In response to a question on when this should be available, Clementine and Yannick expected it to be done by the end of the month.

Guillaume asked what happens if you don't have the resources to run all the ensemble members and Yannick responded that they could be run sequentially through a queuing system.  Tom added that EDA is less expensive than ensemble forecasts.

Last week Mark M started working on isolating the interpolation functionality in bump and moving it from saber to oops, as a utility.  He has a PR in oops now for the first step in this process, which is to create a home for it in oops.   In order to make it generic, he created a new oops_interpolator class in Fortran and associated abstract classes for grids and fields.  Interpolator objects are instantiated by means of a factory implemented in Fortran.  Though bump is currently the only interpolator in the factory, this allows us to add other interpolation options in the future.  Anyone who wants a say in how these interpolator interfaces are defined, please have a look and comment.
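The factory pattern described here can be sketched in Python for readers unfamiliar with it.  This is not the Fortran oops_interpolator code in the PR; the class names, the registry, and the "nearest" interpolator are hypothetical and serve only to show how new interpolation options could be added without changing calling code.

```python
from abc import ABC, abstractmethod


class Interpolator(ABC):
    """Abstract interface that every interpolator must implement."""

    @abstractmethod
    def apply(self, grid, field, location):
        """Return the field value interpolated to location."""


# Maps a configuration name to the class that implements it.
_REGISTRY = {}


def register(name):
    """Class decorator that adds an interpolator to the factory."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("nearest")
class NearestInterpolator(Interpolator):
    """Hypothetical example: nearest-neighbour lookup on a 1-D grid."""

    def apply(self, grid, field, location):
        index = min(range(len(grid)), key=lambda i: abs(grid[i] - location))
        return field[index]


def create_interpolator(name):
    """Factory: instantiate the interpolator registered under name."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        known = sorted(_REGISTRY)
        raise ValueError(f"unknown interpolator '{name}'; known: {known}") from None
```

Calling code only ever asks the factory for an interpolator by name, so adding another option means registering one more class.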

Tom closed the meeting by reminding developers to break large pull requests into smaller pieces to make reviews more manageable and more useful.  This is a more agile approach.
