Yannick opened the meeting by announcing that this would be a general communication update, since we missed last week's meetings due to code sprints.

Yannick reported that last week's code sprints (SABER merges, getValues merges) went well. We appear to be back on track with the JEDI core repos, and with a number of the models as well.

If anyone has a topic idea for the weeks when we conduct special topic discussions, please send it to the core team or post it on the jedi-docs ZenHub board. Note that if you contribute a topic, you are not necessarily on the hook to present it to the group.

Yannick asked that everyone be mindful about triggering automatic testing on our repos. Testing capacity is limited, and every commit pushed to an open PR triggers the automatic tests. When many commits are pushed, the testing queue backs up, and this can slow down the merging process.

This request opened up a discussion that included the following points:

  • We need the automatic testing feature and want to avoid falling back to a manual testing scheme (e.g., pushing a button to fire off the tests)
  • The real issue is slow test times
    • If testing ran quickly, then it wouldn't matter that many test cycles are being initiated
    • This is where we want to be, but we're not there yet
  • Do debugging/development in your local clone and don't push to GitHub until you expect the tests to pass
    • This is a good methodology to continue with indefinitely, and it is especially needed while we fix the long-running test time issue
  • There appear to be repos where pushing commits to GitHub (regardless of the existence of a PR) triggers a test cycle
    • Reported by several people
    • This needs to be fixed so that automatic testing is limited to commits pushed to open PRs only
  • Draft PRs will trigger automatic testing and there doesn't appear to be a way to shut this off
  • Here are some other methods being used or proposed
    • SOCA has a templated PR form to fill out when submitting a PR, and a message can be added asking people to minimize pushes to the open PR
    • Adding a keyword to the commit message (reported as "No_CI"; exact keyword to be confirmed) will shut off Travis-CI, but CodeBuild will still run
    • Is there a way to recognize a label on an open PR to shut off automatic testing? (one possible approach is sketched after this list)
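
On the label question, one possible approach is a small gate script run at the start of a CI job that asks the GitHub REST API for the labels on the associated PR and exits early when a skip label is present. This is a sketch only; the repo slug, the "skip-ci" label name, and the environment variables are illustrative assumptions, not existing JEDI tooling.

```python
# Hypothetical pre-test gate: skip the CI test suite when the open PR
# carries a "skip-ci" label. Repo name, label name, and environment
# variables are illustrative assumptions, not existing JEDI tooling.
import json
import os
import sys
import urllib.request

REPO = os.environ.get("REPO", "jcsda/oops")   # assumed repo slug
PR_NUMBER = os.environ.get("PR_NUMBER")       # assumed to be set by the CI system
TOKEN = os.environ.get("GITHUB_TOKEN")        # assumed access token

def pr_labels(repo, number, token):
    """Return the set of label names on a pull request via the GitHub REST API."""
    url = f"https://api.github.com/repos/{repo}/issues/{number}/labels"
    req = urllib.request.Request(url, headers={"Authorization": f"token {token}"})
    with urllib.request.urlopen(req) as resp:
        return {label["name"] for label in json.load(resp)}

if PR_NUMBER and TOKEN and "skip-ci" in pr_labels(REPO, PR_NUMBER, TOKEN):
    print("PR is labeled skip-ci; skipping the test suite.")
    sys.exit(0)  # exit success so the CI stage passes without running the tests

print("No skip-ci label found; proceeding with tests.")
```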

We then went around and collected updates from the group.

Guillaume reported that they are working, in collaboration with EMC in-kind contributor Shao Lu (sp?), to introduce biogeochemistry capability into MOM6, starting with a bio model called BLING (sp?). SOCA will handle the DA part of this scheme, which will initially include a one-way coupling to SOCA accounting for the impact of chlorophyll penetration depths on the ocean.

Guillaume also mentioned that work is ramping up on adding marine capability to GEOS and on adding assimilation of SMAP salinity.

Chris H mentioned that last week's code sprints went smoothly. He is now working on an issue where the shallow water (SW) model damps out motion so quickly that it is difficult to do sustained DA runs such as cycling. The basic sequence in the SW model is to initialize with a Gaussian pulse in the middle of the domain; the water sloshes around and damps until the surface is flat again. Chris tried introducing periodic splashes on the surface to keep the wave motion going indefinitely, and asked the group whether this is a reasonable solution. Several ideas were offered:

  • Guillaume suggested adding wind stress at the surface as is done in ocean models (and offered to help if Chris wanted to pursue this idea)
  • Travis suggested forcing a boundary condition on one of the edges
  • Mark M suggested generalizing the initial Gaussian pulse into a time-dependent forcing function, in which randomness could be introduced to prevent the simulation from entering a steady-state oscillation (a minimal sketch appears after this discussion)

Guillaume noted that a forcing term style solution could allow for 4DVar assimilation experiments.
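
As a concrete illustration of Mark M's suggestion, here is a minimal sketch of a randomized, time-dependent Gaussian forcing for the SW height field. All shapes and constants are invented for illustration and are not taken from the actual SW model code.

```python
# Sketch of a time-dependent, randomized Gaussian forcing for the SW
# height field (Mark M's suggestion). All shapes and constants are
# illustrative; they are not taken from the actual shallow water code.
import numpy as np

def forcing(t, nx=64, ny=64, base_amp=0.1, period=100.0, rng=None):
    """Return a 2D forcing field for the height equation at time t.

    A Gaussian bump near the center of the domain, modulated in time,
    with random perturbations to amplitude and center so the solution
    never settles into a steady-state oscillation.
    """
    rng = rng or np.random.default_rng()
    x = np.linspace(-1.0, 1.0, nx)
    y = np.linspace(-1.0, 1.0, ny)
    X, Y = np.meshgrid(x, y)

    # Randomly jitter the amplitude and the pulse center on each call
    amp = base_amp * (1.0 + 0.3 * rng.standard_normal())
    x0, y0 = 0.2 * rng.standard_normal(2)

    envelope = np.exp(-((X - x0) ** 2 + (Y - y0) ** 2) / 0.05)
    return amp * np.sin(2.0 * np.pi * t / period) * envelope

# Inside the time loop, the forcing would be added to the height tendency:
#   h += dt * (tendency(h, u, v) + forcing(t))
```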

JJ reported an issue with MPAS where 4DVar and 3D-FGAT are exhibiting changes in the reference files after last week's merge sprint. He asked whether this is expected or whether something is wrong. Dan did not expect there to be differences in the reference files. He said that he had to change the reference files for FV3, but that was because he added more obs, not because of the merges. Dan added that more attention was given to the adjoint in GetValues when GetValues was separated into its own linear and nonlinear classes. It's possible that a bug in the old implementation of GetValues went unnoticed, and this may be the source of the changes JJ is finding. After further discussion, it still wasn't clear what, if anything, was wrong with MPAS. Dan offered to help sort this out and will get back to JJ when he has something to report.
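
For reference, the standard way to catch the kind of adjoint bug Dan described is the dot-product test: for a linear operator G and its adjoint G^T, <Gx, y> should equal <x, G^T y> to round-off for random x and y. A generic sketch is below; the random matrix is only a stand-in for the linearized GetValues operator, not the actual JEDI interface.

```python
# Generic dot-product (adjoint) test: <G x, y> == <x, G^T y>.
# A random matrix stands in for the linearized GetValues operator;
# in JEDI the TLM and adjoint would be the actual class methods.
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((40, 25))   # stand-in linear operator

def tlm(x):
    return G @ x                     # tangent linear: model -> obs locations

def adj(y):
    return G.T @ y                   # adjoint: obs locations -> model

x = rng.standard_normal(25)
y = rng.standard_normal(40)

lhs = np.dot(tlm(x), y)
rhs = np.dot(x, adj(y))
rel_err = abs(lhs - rhs) / max(abs(lhs), abs(rhs))
print(f"relative error = {rel_err:.2e}")  # should be near machine precision
assert rel_err < 1e-12, "adjoint test failed"
```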

Sarah reported that introducing the SABER work from last week's code sprints into Neptune has resulted in a zero increment, which they are diagnosing. Once this is fixed, they will move on to the getValues updates.

Dan is working on cubed-sphere grid processing in ATLAS and has the mesh and projection functions working. For example, the GMESH package now works with the cubed-sphere grid, which enables diagnostic plotting.

Dan also introduced two new in-kind contributors from GMAO, Bryan Karpowitz and Jianjun Jin. Bryan is working on the assimilation of hyperspectral sensors and ozone measurements. Jianjun is also working on the assimilation of various satellite measurements.

Ming reported that the SABER work from last week's code sprints is working in WRF. He is now working on the getValues updates and requested help from the group.

Mark M announced that he has a Singularity multi-node container working for Intel compilers on S4, Discover, and AWS. He is evaluating performance using an FV3 3DVar test case that assimilates roughly 9 million obs and runs with 864 MPI tasks. On S4 and AWS the performance is the same inside and outside the container; on Discover the performance is considerably slower inside the container than outside. However, overall performance on AWS (~2 hour runtime) is much slower than on the other platforms (10-15 minute runtime). Mark suspects that MPI configuration and/or interconnect hardware is the issue with the AWS runtimes and will try the Elastic Fabric Adapter (EFA) next to see if that helps. Mark O suggested using benchmarks developed by Ohio State University to help flush out the MPI issue. As part of this optimization, the jedicluster CloudFormation AMIs have been upgraded to Ubuntu 18.04 and Intel 19.05. This improved single-node performance but did not solve the MPI problems.
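
For context, the Ohio State benchmarks are the OSU Micro-Benchmarks suite (e.g., osu_latency, osu_bw), which are C/MPI programs. As a rough stand-in to show the idea, the mpi4py ping-pong below measures the same kind of point-to-point latency signal; the message sizes and iteration count are arbitrary illustration values.

```python
# Minimal two-rank ping-pong latency test with mpi4py, a rough stand-in
# for the OSU osu_latency benchmark. Run with: mpirun -np 2 python pingpong.py
# (message sizes and iteration count are illustrative).
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
assert comm.Get_size() == 2, "run with exactly 2 MPI tasks"

for size in (8, 1024, 1048576):   # message sizes in bytes
    buf = np.zeros(size, dtype=np.uint8)
    comm.Barrier()
    iters = 100
    t0 = time.perf_counter()
    for _ in range(iters):
        if rank == 0:
            comm.Send(buf, dest=1)
            comm.Recv(buf, source=1)
        else:
            comm.Recv(buf, source=0)
            comm.Send(buf, dest=0)
    elapsed = time.perf_counter() - t0
    if rank == 0:
        # one-way latency is half the round-trip time per iteration
        print(f"{size:8d} bytes: {elapsed / iters / 2 * 1e6:9.2f} us")
```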

Chris S reported that they are close to having MPAS running 3DVar using SABER-based multivariate B covariances. Once this is working, they will give a short presentation of the results to the group.

Wojciech is working on the Parameter classes he showed to the group at the last meeting. These classes hold parameters specified in YAML files, and Wojciech's ultimate goal is to auto-generate the YAML from a JSON schema. He is doing this work in two steps. First, he is assessing whether the Parameter classes can sufficiently hold all of the YAML configuration being used in JEDI, and filling in gaps where needed. Second, once they can, he will move on to auto-generating the YAML from a JSON schema.
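
As background on the schema direction, the sketch below shows the closely related validation side of the same mechanism: parse a YAML configuration and check it against a JSON schema. The schema contents and keys are invented for illustration and are not JEDI's actual configuration.

```python
# Sketch of validating a YAML configuration against a JSON schema, the
# general mechanism a schema-driven YAML workflow relies on. The schema
# and keys below are invented; they are not JEDI's actual configuration.
import yaml          # PyYAML
import jsonschema

schema = {
    "type": "object",
    "properties": {
        "window length": {"type": "string"},
        "members": {"type": "integer", "minimum": 1},
    },
    "required": ["window length"],
    "additionalProperties": False,
}

config_text = """
window length: PT6H
members: 20
"""

config = yaml.safe_load(config_text)   # YAML parses to plain dicts/lists
jsonschema.validate(config, schema)    # raises ValidationError on mismatch
print("configuration is valid")
```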

There were no more updates after this, so Yannick closed the meeting.

Stay safe and healthy everyone!
