Steve H opened the meeting by announcing that Yannick will not be present because he and others, including Tom, are occupied with a workshop this week.  Following our pattern of alternating focused topic discussions and general round-table discussions, this week will be a round-table discussion.  Next week will be a focused topic discussion, but we have not yet decided on a topic.  So...

Please send us ideas for focused topic discussions: what would you like to see covered?

Then Steve commenced with the round-table discussion, starting in Boulder.

Ming reported that he is still working with Xin on the JEDI WRF interface.  They are focusing on a domain covering half of CONUS and they expect to have a 3DVar application running soon.  Steve asked whether the existing ioda test data sets are sufficient to provide observations for WRF DA.  Ming expects that they will be able to work with the existing data sets, but he made plans to talk with Steve later about how this data can be converted into a format suitable for WRF DA.

Hailing has been working on a feature branch for GNSSRO data that makes use of an MPI reduce operation to find field extrema across processors for vertical profiles.  This will be used to formulate QC filters that meet the requirements of GSI.
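
For illustration only, here is a minimal sketch of this kind of reduce operation, written with mpi4py rather than the compiled code used in the actual feature branch; the field values and variable names are made up:

```python
# Minimal mpi4py sketch (not the actual GNSSRO code): each rank holds part of
# a profile field, and collective reduce operations find the global extrema.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

# Hypothetical local slice of a profile field owned by this rank
local_values = np.random.rand(100)

# Global extrema across all processors via MPI reduce operations
global_min = comm.allreduce(local_values.min(), op=MPI.MIN)
global_max = comm.allreduce(local_values.max(), op=MPI.MAX)

if comm.rank == 0:
    print("profile extrema:", global_min, global_max)
```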

Clementine has been seeing a number of bump test failures and asked if anyone else has been seeing these.  Steve and Mark O commented that some of these failures are well known - there are about 18 tests that fail intermittently on different systems and with different compilers.  Mark O has traced this to eckit and suggests we reconsider how eckit is built.  Benjamin M is aware of this but is currently on PTO, so this is a known issue with no fix yet.  However, this does not explain all of the failures that Clementine was seeing, which numbered over 100.

On a related note, Mark M announced that new JEDI modules are ready for use on the S4 machine in Madison, using the intel 17 compiler suite.  About 90% of the ufo-bundle tests pass, but there are still about 70 test failures, all but 3 of which are in the bump tests.  Mark M is still investigating the cause of these failures.  At first he suspected a problem with netcdf and/or nccmp but has now ruled that out.  Still, the modules should be ready to run some applications.  There is a pull request currently in the jedi-docs repo that will update the JEDI documentation with instructions on how to use the S4 modules.  If you have any further questions, please contact Mark M.

Dan mentioned that nccmp, which is used by many of the bump compare tests, is not part of the jedi-stack currently available on Discover.  So, unless you have your own nccmp installation, you will likely see many of these tests fail on Discover.  We will remedy this in the near future.

Steve H has been working on two things.  First, he has integrated the ODC python interface called Odyssey with several of the JEDI repos and has been building and testing this new functionality.  The Singularity and Charliecloud containers have also been updated with the python dependencies required by Odyssey.  Meanwhile, Steve V has been assessing the performance of ODC relative to other file IO alternatives.  Soon they will update the group on the viability of ODC in JEDI.

The second thing that Steve H has been working on recently is adding a capability to the ioda reader to group observations by vertical profile or other criteria.  In this new feature, the metadata to use for grouping can be specified in the yaml file, and each group will be kept together on a single processor; a rough sketch of the idea is shown below.
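
As a rough illustration of the grouping idea (this is not the actual ioda implementation; the record layout and the grouping key "station_id" are hypothetical), observations that share the same value of the chosen metadata variable are collected into one group, and each group can then be assigned to a single processor:

```python
from collections import defaultdict

# Hypothetical observation records; "station_id" stands in for whatever
# metadata variable the yaml file specifies as the grouping key.
obs = [
    {"station_id": "A", "pressure": 850.0},
    {"station_id": "B", "pressure": 500.0},
    {"station_id": "A", "pressure": 700.0},
]

groups = defaultdict(list)
for record in obs:
    groups[record["station_id"]].append(record)

# Each group (e.g. one vertical profile) is then kept together and can be
# assigned as a whole to a single processor rather than split across ranks.
for key, records in groups.items():
    print(key, len(records))
```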

Maryam then reminded the group that when you have pull requests that span several repositories, you should give those branches the same name.  When you initiate a pull request, the Travis CI testing will automatically check whether branches of the same name exist in other repositories.  If so, it will use those branches instead of develop to build and check the code.  This can make the difference between a pass and a fail (green or red light) on Travis.

Maryam also announced that we have begun implementing a multi-tier test framework.  Under this framework, the lowest tier of tests will be performed upon every pull request (as is done currently) while higher tiers (generally more expensive and/or more extensive) will be done on a daily, weekly, or monthly basis.  She is using the saber repo as a testbed for this new functionality.  Currently all test tiers are still utilizing the Travis servers but the plan is to execute some of the higher-tier tests on Amazon or on a local MMM computer that we have available for this purpose.

JJ reported that the data files used for MPAS testing have been moved to git lfs.  Prior to this move they were provided via a tar file.  This has been done in preparation for a new CMake-based build system for MPAS that will be spearheaded by Mark O.  The new build system is expected to make mpas-bundle easier to use.

Jim R has been investigating the error handling in JEDI.  He has identified two instances in ufo-bundle in which floating point errors were generated but not trapped, so the tests passed when they should have failed.  Jim is working on a portable mechanism to ensure that such tests fail.  He has it implemented with the gnu compilers but is still working on intel.  Ryan asked which tests these were; Jim did not recall offhand but thought that both were in crtm, and JJ mentioned that one is a tlad test.
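
Jim's mechanism lives in the compiled (Fortran/C++) code, but the underlying issue can be illustrated with a small Python/NumPy analogy: by default a floating point overflow just produces an inf (and perhaps a warning), so a test can silently pass, whereas trapping the error forces a visible failure:

```python
import numpy as np

x = np.array([1e308])

# Default behaviour: the overflow produces inf (with at most a warning),
# so a test comparing results could still "pass".
print(x * 10)  # [inf]

# Trapping the error makes the problem impossible to miss: the operation
# now raises, which would cause the surrounding test to fail.
np.seterr(over="raise")
try:
    print(x * 10)
except FloatingPointError as err:
    print("trapped:", err)
```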

Mark O plans to start working on the new CMake build system for MPAS mentioned above.

The discussion then turned to EMC/GMAO.

Dan has been working with Victor on machine learning with TLAD.  He has also been looking into how to change eckit to allow one to overwrite one configuration with another.  He wants to use this to update the B matrix in the outer loop.  He also continues to work on a portable Poisson solver for FV3, for use with variable transforms.
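
As a plain-python sketch of the configuration-override concept Dan described (this is not eckit's API, and the keys shown are hypothetical), values in an overriding configuration replace those in a base configuration, recursing into nested sections:

```python
# Sketch of overriding one configuration with another: entries in "update"
# win over entries in "base", descending into nested sections.
def override(base, update):
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = override(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical keys, just to show the shape of the operation
base_config = {"background error": {"covariance model": "BUMP", "iteration": 0}}
outer_loop_update = {"background error": {"iteration": 1}}

print(override(base_config, outer_loop_update))
```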

Jong has been implementing the TAU software for profiling the soca-bundle on Theia.  He is able to build the code manually using the TAU compiler wrappers but is having some problems getting this to work within ecbuild/cmake.  He did get it to compile that way, but he is still having more success with the manual build.  He asked anyone who has experience using TAU to contact him.  Jim R said that he has not used TAU, but he has some experience profiling JEDI code and offered to help.

Ryan has been maintaining the ufo repo while Anna is away on PTO.  He has also been working on improving the CRTM radiance operators.

Cory has ufo-bundle compiling on WCOSS.  He is only getting 4 test failures, which he is investigating now.

Mark M mentioned that one thing to watch out for is hdf5, though if there were a problem with that, one would expect more than four failures.  While he was working on S4, the sysadmins reported that they were seeing issues with version 1.10.x of hdf5, and Mark M confirmed that this was indeed the case.  The standard version of hdf5 used in the jedi-stack is 1.10.5.  Mark M has not seen this be a problem on any other platform, but on S4 it was definitely causing problems that were resolved when it was replaced with version 1.8.21.

We then heard from Sarah at NRL.  She is now installing lmod and asked if a tcl interpreter is needed in the lmod configuration.  Mark M responded no, as far as he is aware, since all the jedi-stack modules use lua.  He also recommended the use of the luarocks package to facilitate the installation of lmod.  More generally, he invited Sarah to communicate with him offline if she had any further problems with lmod.

Stylianos is working on updates to his python BUFR to netcdf converter in ioda-converters.

We then turned to the Met Office.  Marek said they are having problems updating to eckit 1.1.  It works with intel, but they are seeing failures in the debug build with gnu, associated with a particular destructor in oops.  After some discussion, it was suggested that they try updating to a newer version of CMake: they mentioned they were using version 3.6, whereas we typically build the jedi-stack with version 3.13 or 3.14.

Steve S had a question for Maryam about the multi-tier test framework.  He asked if it was a feature of ctest.  Maryam and Mark O responded that it is implemented within the ctest framework but is controlled by our own flags.  We are still in the early stages of the implementation - there are still some things in development.  For example, we need to decide how to handle the larger data files needed for higher-resolution, higher-tier tests.  We may choose to store these directly on Amazon S3 rather than through git LFS.  After we test this more, we will establish some standards and then inform the community of how to implement a multi-tier test framework in their own repos.

Marek then asked about the status of our intent to adopt an automated procedure to impose a standard Fortran coding style: see the associated ZenHub issue in the jedi-docs repo.  This issue was created well over a year ago but was not deemed to be as high a priority as other things.  In the last few weeks there has been renewed discussion (refer to the issue), suggesting a few possible tools that could be used for checking Fortran code style (we already implement such style checks for C++ and python).  Marek offered to look into a few options and report back on which he thinks might be best.

Before closing the meeting, all were reminded again that we do not have a topic yet for next week, so please let us know if you have an idea for one.  Ming suggested a discussion of how users/developers should prepare data for use with ioda and of future plans for ioda.  But we thought that this might be premature - it might be better to wait until after Steve H and Steve S explore the viability of ODC a bit more, when we will have a better idea of where ioda may be headed.

It was also mentioned that there is an upcoming ufo code sprint in Boulder August 19-30.  It was agreed that a biweekly topic discussion should be devoted to the outcomes of this code sprint.

Another topic that was suggested is the multi-tier testing framework.  But, again, this is premature; it would be better to wait a few weeks until we have sorted out some more of the logistics.

It was agreed by some that a promising topic for next week might be "python in JEDI".   There are several aspects that we might discuss here.  One is the upcoming deprecation of python 2.7 in January 2020.  Should we choose to support only python3 from here on out?  Would this break any existing tools that people use?  Rahul mentioned that a discussion on whether or not to stop supporting python 2.7 should involve a bigger group of representatives from our partners, particularly because of the operational implications: operational applications may choose to retain python 2.7 even after it is officially deprecated (e.g., WCOSS).  Rahul also mentioned that it does not take much work when writing new python code to make it backward compatible.  Stylianos mentioned that dealing with different python implementations (2 and 3) causes headaches that a move to python 3 might alleviate.  Other python-related topics worth discussing include which python tools we might wish to use with JEDI in the future.  Currently our python usage is mainly localized in the ioda-converters repo, but this might change as we develop diagnostic tools to analyze and visualize model output.  Are there decisions that need to be made, such as basemap vs cartopy for map projections?  Should we include miniconda in the container?
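
As an illustration of the low-effort backward compatibility Rahul mentioned (example only), new python scripts can often be written so that they run unchanged under both python 2.7 and python 3:

```python
# Runs under both python 2.7 and python 3: the __future__ imports give
# python 3 semantics for print and division when executed with 2.7.
from __future__ import absolute_import, division, print_function

def summarize(values):
    """Return the mean of a sequence using true division."""
    return sum(values) / len(values)

if __name__ == "__main__":
    print("mean:", summarize([1, 2, 3, 4]))
```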

We will discuss this with Yannick and decide on a topic for next week.  After we decide we will announce the topic on the email list.

Meeting adjourned.


 
