CCSM Scripts Review and Upgrade Project

Goal

Analyze inadequacies in the CCSM3 scripts environment and identify orthogonal problem areas. For each problem area, gather and review potential solutions, choose and clarify the desired solution, and then implement it. The following three areas need to be addressed,
with the following levels of priority:

  • New component set specification (priority 1)
    • The target date for coming up with a new specification is April 27.
    • We will gather recommendations on the wiki for new component set specifications and come up with a recommendation at the CSEG meeting on April 20.
    • Liaisons can get feedback from their respective groups for this recommendation.
    • We will review the feedback at the CSEG meeting on April 27 and decide on a final specification.
    • We will then proceed with the implementation as well as reviewing the impact on the testing scripts (see next item).
  • Rewrite of testing scripts to be compatible with the new component set specification (priority 2)
  • Resolve component-specific Makefile issues (priority 3). Makefile flags for stand-alone components should be the same as those used when the component is run in coupled (CCSM) mode.

Brainstorming of Functionality Issues to Address

New component set specification

  • The old way of using a single letter for a given component specification is rapidly becoming outdated.
  • How do we handle specifying multiple types of active components (e.g. pop1.4 or pop2) and the various modes of running a particular type of active component? As examples:
    • clm can be run with no carbon cycle, the CN carbon cycle, or the CASA carbon cycle (and for each of these modes, dynamic land use can be turned on or off, etc.)
    • datm7 can be run using observed forcing data or using data produced by cam
    • pop will soon be able to run as either pop1.4 or pop2 (in the future, hycom might also be an active ocean model that could be selected)
    • the scripts must easily be able to handle different biogeochemical scenarios and the associated passing of different biogeochemical tracers.

      - Issues
      – Should our compsets be backward compatible, or should we come up with a new scheme?
      – Should the mode be "part" of the compset or separate? Our current testnames suggest we treat mode separately from compset: TER.01a.res.B vs TER.01b.res.B vs TER.01i.res.B are the same compset but different modes.
      - Some Options
      – Stick with one-letter compsets (A-Z), limiting ourselves to 26 supported compsets which might change over time. We recently redefined D and I, for instance. B with pop2 could be V and B with hypop could be W, for example. Mode could probably be partly supported, but we're limited to 26 total.
      – Continue with semi-arbitrary names but allow them to be one or two characters [A-Z][A-Z]. So B with pop2 could be BP and B with hypop could be BH; B with some bgc options could be BA, in other modes BB, etc. This naming convention is highly arbitrary, but it is backward compatible and gives us ~700 potential combinations, which would likely support just about any number of components and modes we could come up with.
      – Come up with a new naming convention. How about 5 letters (one for each component), ordered atm-lnd-ocn-ice-cpl (arbitrary)? The options for atm might be C=cam, D=datm+camhist, E=datm+tn460, G=cam+bgc, M=cam+mozart, X=xatm, etc.; for ocn, O=pop, P=pop2, H=hypop, D=docn, X=xocn, etc. Our compsets might then look like CLOCC for all active, DDDDC for all data, XXXXC for all dead, and GHGLC for some bgc run. This limits any one component to 26 types. (A minimal decoding sketch for this option appears after this list.)
      – Similar to the above, but a bit more explicit, as in AC_LB_OH_IL_CC.
      – Get rid of the notion of compset and, wherever we refer to it now, change it to clear component names. Instead of saying/inputting B to our scripts (-compset B), we set each component uniquely (-atm cam -ocn hypop -ice csim_prescribed -lnd clm_cn -cpl cpl) and update our scripts, docs, web pages, etc. accordingly.
      – Generalize/slightly modify the current naming scheme with the understanding that (e.g.) compset "B" means all active models but does not specify which active models, so that the ocean component might be pop1, pop2, or hycom. After running create_newcase (and selecting compset B), one edits env_conf to select pop, pop2, or hycom. Thus it is not until one runs configure that the choice of pop vs. pop2 vs. hycom is nailed down. Note: this is similar to how one currently selects "modes" in datm7.
      – Something else? HELP!
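
A minimal decoding sketch for the 5-letter option above, in Python. The atm and ocn letter codes are taken from the examples in that item; the lnd, ice, and cpl tables are invented placeholders, and the full per-component mapping would of course need to be agreed on:

    # Sketch of a decoder for the proposed 5-letter compset names,
    # ordered atm-lnd-ocn-ice-cpl. Letter tables are illustrative only.
    COMPONENT_ORDER = ["atm", "lnd", "ocn", "ice", "cpl"]

    LETTER_TABLES = {
        # atm and ocn codes from the example above; the rest are placeholders
        "atm": {"C": "cam", "D": "datm+camhist", "E": "datm+tn460",
                "G": "cam+bgc", "M": "cam+mozart", "X": "xatm"},
        "lnd": {"L": "clm", "D": "dlnd", "X": "xlnd"},
        "ocn": {"O": "pop", "P": "pop2", "H": "hypop", "D": "docn", "X": "xocn"},
        "ice": {"C": "csim", "D": "dice", "X": "xice"},
        "cpl": {"C": "cpl"},
    }

    def decode_compset(name):
        """Map a 5-letter compset name to a dict of component choices."""
        if len(name) != 5:
            raise ValueError("compset name must be exactly 5 letters: %r" % name)
        choices = {}
        for comp, letter in zip(COMPONENT_ORDER, name.upper()):
            if letter not in LETTER_TABLES[comp]:
                raise ValueError("unknown %s code %r in %r" % (comp, letter, name))
            choices[comp] = LETTER_TABLES[comp][letter]
        return choices

    # Example: "XXXXC" is the all-dead configuration under these tables.
    print(decode_compset("XXXXC"))
    # {'atm': 'xatm', 'lnd': 'xlnd', 'ocn': 'xocn', 'ice': 'xice', 'cpl': 'cpl'}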

Determination of CCSM tasks and threads

  • The current way of setting default tasks and threads is rapidly becoming obsolete as new machines are added, and the optimal task and thread count also depends on the modes a given active model is run in (e.g. whether CN mode is turned on for CLM).
  • We need to determine a new way of setting tasks and threads that permits the automated testing scripts to run and yet also permits a possible "database" of recommended settings to be built up for a given set of dependencies (see the sketch after this list).
  • The possible dependencies for determining tasks and threads are
    • machine (for Linux this also encompasses compiler type)
    • resolution
    • component set
    • mode for a given active component (if an active component is used)
    • type of resource utilization requested (small, medium, large) (this implies that we can put together recommendations for large production resources as well as runs that require smaller resources)
      - We need to start putting appropriate default tasks/threads into the new scripts' pes_setups. How far should we go to get the settings right rather than merely close, and can we take component modes into account better?
      - Do we need an automatic email that warns users or CSEG of poor load balance, based on timing output?
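
One way to realize such a "database" is a lookup keyed on the dependencies above, falling back to less specific keys so that the automated tests always get some default while tuned production settings are added incrementally. A minimal Python sketch; the machine and resolution names come from this page, but every number shown is a made-up placeholder, not a recommended setting:

    # Sketch of a task/thread lookup keyed on the dependencies listed above.
    # Entries are searched from most to least specific, so the automated
    # testing scripts always find a usable default.
    PES_DEFAULTS = {
        # (machine, resolution, compset, mode, size) -> (tasks, threads)
        ("bluevista", "1.9x2.5_gx1v3", "B", "CN", "large"): (208, 4),
        ("bluevista", "1.9x2.5_gx1v3", "B", None, "large"): (192, 4),
        ("bluevista", None, "B", None, None): (64, 2),
        ("bluevista", None, None, None, None): (32, 1),
    }

    def lookup_pes(machine, resolution, compset, mode=None, size=None):
        """Return (tasks, threads), falling back to less specific keys."""
        for key in [(machine, resolution, compset, mode, size),
                    (machine, resolution, compset, None, size),
                    (machine, resolution, compset, None, None),
                    (machine, None, compset, None, None),
                    (machine, None, None, None, None)]:
            if key in PES_DEFAULTS:
                return PES_DEFAULTS[key]
        raise KeyError("no task/thread default for machine %r" % machine)

    print(lookup_pes("bluevista", "1.9x2.5_gx1v3", "B", mode="CN", size="large"))
    # (208, 4)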

Makefile, Macros file, and build issues

  • Which pre-processor should our Makefile use, and should it run as a separate pass or as part of the f90 compilation (in which case .f90 files will not be created)? Do all compilers support pre-processing? How does CAM do it?
  • Is it OK to add if-defs in share code if they are required only by stand-alone CAM but play no role in building CCSM? (A recent issue that appeared in the latest csm_share/shr code.)
  • What is our strategy with respect to if-defs, and what is the list of valid if-def variables?
  • Currently we only have if-defs for various OSes; do we need if-defs for compilers too?
  • Until recently, all CCSM components used the same makefile and build procedure; now CAM does not. What issues does this raise? Do we no longer desire a common build method?
  • Does the current coding standard specify f90 only, or is f95 acceptable? (e.g. allocatable components in derived types as opposed to pointers)
  • Are we going to introduce a CCSM requirement that all literal constants must be strongly typed?

New resolutions
    - gx1v4
    - gx3v6
    - 1x1.25+?

Management Issues to Address

Baseline simulations

  • Determine criteria for when new baseline control simulations are done
  • Should this project address this issue?

CCSM Change Review Board

  • Should this project address this issue?
  • Should this be reactivated?
  • What should its mission be?
  • How often should it meet?
  • Who should be represented?
  • Reinstate the CCSM Change Review Board to help determine these criteria

Test Recommendations

Issues
    - Tests at http://www.cgd.ucar.edu/cseg/testing/cases/testcases.html are not completely up to date: some test cases exist that aren't documented, and some test cases are documented that don't exist. Do we want to review the cases to
    – remove docs/scripts for test cases we no longer need
    – verify that some/all test cases actually work properly in the scripts
    - Do we need a performance test?
    - Do we need a few new tests, like TBR.02i or THY.02j? (The test name convention is decoded in the sketch after this list.)
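
The test case names used throughout this page follow a fixed dotted form. A minimal Python sketch that splits a name into fields; the field interpretation (test type, test id, resolution, compset, machine) is inferred from the examples on this page, not from an official definition:

    # Sketch of a parser for test names such as TER.01a.4x5_gx3v5.A.lightning.
    # Resolutions like 1.9x2.5_gx1v3 contain dots themselves, so the middle
    # fields are rejoined after peeling off the fixed outer fields.
    def parse_testname(name):
        parts = name.split(".")
        if len(parts) < 5:
            raise ValueError("unexpected test name: %r" % name)
        return {"type": parts[0],            # e.g. TER, TBR, THY, TDR
                "id": parts[1],              # e.g. 01a, 02a, 01i
                "resolution": ".".join(parts[2:-2]),
                "compset": parts[-2],
                "machine": parts[-1]}

    print(parse_testname("TER.01a.1.9x2.5_gx1v3.B.bluevista"))
    # {'type': 'TER', 'id': '01a', 'resolution': '1.9x2.5_gx1v3',
    #  'compset': 'B', 'machine': 'bluevista'}
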
General Recommendations
    - There are five levels of testing (Levels 1-5, plus a Level 0 with no fixed test list). Each level contains a list of specific tests, as documented below. Depending on the phase of development/testing, different levels of testing are appropriate. The individual/lead always has discretion to execute appropriate tests or substitute tests depending on availability of machines and changes to components.
    - All testing should be done with the compare option if possible and appropriate.
    - Post-tag testing should be done from collections whenever possible; if collections are not available, it should be done from a clean checkout only. All post-tag tests should be done using the generate option to save the results. Results should be saved in /fs/cgd/csm/ccsm_baselines or the local equivalent at NERSC/ORNL/etc. Baseline generation should be done only by designated people. (A sketch of this generate/compare bookkeeping follows this list.)
    - In all situations, additional tests can be run. For instance, if a specific test fails on one machine, it may be appropriate to run the same test on other machines, to run the same test at a different resolution, or to run the same test in a different mode to provide further guidance on problems.
    - Pre-tag testing should NOT be reported on the tag page unless it is highly relevant (i.e. non-standard tests were run). Generally, pre-tag testing by developers is done from sandboxes with components NOT entirely consistent with the tag. If pre-tag testing is reported, it should be clearly indicated, and the CCSM base version plus any differing component tags should be noted.
    - All post-tag, weekly, and monthly test results should be reported on the tag web page for that tag. Always note the date.
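
The generate/compare bookkeeping above could be scripted. A minimal Python sketch, assuming one baseline subdirectory per tag under /fs/cgd/csm/ccsm_baselines and a single comparison file per test; the directory layout and file naming here are hypothetical illustrations, not the actual baseline structure:

    # Sketch of post-tag generate/compare bookkeeping. The layout
    # (one subdirectory per tag, one file per test) is hypothetical.
    import filecmp
    import os
    import shutil

    BASELINE_ROOT = "/fs/cgd/csm/ccsm_baselines"  # or the local equivalent

    def baseline_path(tag, testname):
        return os.path.join(BASELINE_ROOT, tag, testname + ".nc")

    def generate(tag, testname, result_file):
        """Save a result as the baseline for this tag (designated people only)."""
        dest = baseline_path(tag, testname)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy(result_file, dest)

    def compare(tag, testname, result_file):
        """Compare a result against the saved baseline; True if identical."""
        return filecmp.cmp(result_file, baseline_path(tag, testname),
                           shallow=False)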

Testing Roles and Coverage
    - Developers, Pre-Tag Testing from a sandbox, tests depend partly on scope of changes and type of component
    – Level 0 - required
    – Level 1 - run at least 3 tests (3 machines and appropriate compsets)
    – Level 2 - all tests recommended
    – Level 3 - run at least 2 tests (2 machines and appropriate compsets)
    - Tagger, Pre-Tag Testing from sandbox
    – Level 0 - as needed
    – Level 1 - run at least 9 tests (3 machines x all compsets)
    – Level 2 - all tests recommended
    - Tester, Post-Tag Testing from collections
    – Level 0 - as needed
    – Level 1 - all tests required
    – Level 2 - all tests required
    – Level 3 - all tests recommended, at discretion of CSEG lead
    – Level 4 - all tests recommended, at discretion of CSEG lead
    - Weekly Testing from collections if no new tags generated that week
    – Level 1 - all tests required
    – Level 2 - all tests required
    - Monthly Testing from collections
    – Level 1 - all tests required
    – Level 2 - all tests required
    – Level 3 - all tests required
    – Level 4 - all tests required
    – Level 5 - all tests required
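
The coverage matrix above could also be encoded as data, so a test driver can report what a given role owes at each level. A minimal Python sketch; the structure and role keys are illustrative choices, not an existing format:

    # Sketch encoding the roles-and-coverage matrix above as data.
    # The wording of each entry follows the list; the structure is ours.
    COVERAGE = {
        "developer-pretag": {0: "required",
                             1: "at least 3 tests (3 machines, appropriate compsets)",
                             2: "all tests recommended",
                             3: "at least 2 tests (2 machines, appropriate compsets)"},
        "tagger-pretag": {0: "as needed",
                          1: "at least 9 tests (3 machines x all compsets)",
                          2: "all tests recommended"},
        "tester-posttag": {0: "as needed",
                           1: "all tests required",
                           2: "all tests required",
                           3: "recommended, at discretion of CSEG lead",
                           4: "recommended, at discretion of CSEG lead"},
        "weekly": {1: "all tests required", 2: "all tests required"},
        "monthly": {level: "all tests required" for level in range(1, 6)},
    }

    def requirement(role, level):
        """Return the coverage requirement for a role at a level, if any."""
        return COVERAGE.get(role, {}).get(level, "not applicable")

    print(requirement("tagger-pretag", 1))
    # at least 9 tests (3 machines x all compsets)
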
Test Levels and Specific Test Cases
    - Level 0
    – run tests appropriate for model changes
    - Level 1
    – TER.01a.4x5_gx3v5.A.lightning
    – TER.01a.4x5_gx3v5.B.lightning
    – TER.01a.4x5_gx3v5.X.lightning
    – TER.01a.T31_gx3v5.A.bluesky
    – TER.01a.T31_gx3v5.B.bluesky32
    – TER.01a.T31_gx3v5.X.bluesky
    – TER.01a.1.9x2.5_gx1v3.A.bluevista
    – TER.01a.1.9x2.5_gx1v3.B.bluevista
    – TER.01a.1.9x2.5_gx1v3.X.bluevista
    – TER.01a.1.9x2.5_gx1v3.A.phoenix
    – TER.01a.1.9x2.5_gx1v3.B.phoenix
    – TER.01a.1.9x2.5_gx1v3.X.phoenix
    - Level 2
    – TBR.02a.4x5_gx3v5.A.bluevista
    – TBR.02a.4x5_gx3v5.B.bluevista
    – THY.02a.4x5_gx3v5.A.bluevista
    – THY.02a.4x5_gx3v5.B.bluevista
    – TER.01i.4x5_gx3v5.B.bluevista
    – TER.01j.4x5_gx3v5.F.bluevista
    – TER.01k.4x5_gx3v5.F.bluevista
    - Level 3
    – TDR.01a.4x5_gx3v5.A.lightning
    – TDR.01a.4x5_gx3v5.B.lightning
    – TDR.01a.4x5_gx3v5.X.lightning
    – TDR.01a.T31_gx3v5.A.bluesky
    – TDR.01a.T31_gx3v5.B.bluesky32
    – TDR.01a.T31_gx3v5.X.bluesky
    – TDR.01a.1.9x2.5_gx1v3.B.bluevista
    – TDR.01a.1.9x2.5_gx1v3.A.phoenix
    – TDR.01a.1.9x2.5_gx1v3.B.phoenix
    – TDR.01a.1.9x2.5_gx1v3.X.phoenix
    - Level 4
    – TBR.02a.T31_gx3v5.A.lightning
    – TBR.02a.T31_gx3v5.B.lightning
    – THY.02a.T31_gx3v5.A.lightning
    – THY.02a.T31_gx3v5.B.lightning
    – TER.01i.T31_gx3v5.B.lightning
    – TER.01j.T31_gx3v5.F.lightning
    – TER.01k.T31_gx3v5.F.lightning
    – TER.01k.4x5_gx3v5.F.bluesky
    – TER.01a.T42_gx3v5.D.bluesky
    – TER.01a.1.9x2.5_gx1v3.I.bluevista
    – TBR.02a.T85_gx1v3.A.bluevista
    – TER.01b.T85_gx1v3.B.bluevista
    – TBR.02a.1x1.25_gx1v3.A.bluevista
    – TBR.02a.1x1.25_gx1v3.B.bluevista
    – TBR.02a.1x1.25_gx1v3.A.phoenix
    – TBR.02a.1x1.25_gx1v3.B.phoenix
    – TBR.02a.1x1.25_gx1v3.X.phoenix
    – THY.02a.T85_gx1v3.A.phoenix
    – THY.02a.T85_gx1v3.B.phoenix
    – TER.01i.1.9x2.5_gx1v3.B.phoenix
    – TER.01j.1.9x2.5_gx1v3.F.phoenix
    – TER.01k.1.9x2.5_gx1v3.F.phoenix
    - Level 5
    – TBR.02a.T42_gx3v5.A.lightning
    – TBR.02a.T42_gx3v5.B.lightning
    – TBR.02a.T42_gx3v5.X.lightning
    – TER.01i.4x5_gx3v5.B.lightning
    – TER.01j.4x5_gx3v5.F.lightning
    – TBR.02a.T42_gx1v3.A.bluesky
    – TBR.02a.T42_gx1v3.B.bluesky32
    – TBR.02a.T42_gx1v3.X.bluesky
    – TER.01i.4x5_gx3v5.B.bluesky32
    – TER.01j.4x5_gx3v5.F.bluesky32
    – TDR.01a.1x1.25_gx1v3.A.bluevista
    – TDR.01a.1x1.25_gx1v3.B.bluevista
    – TER.01b.T85_gx1v3.B.phoenix
    – TDR.01a.1x1.25_gx1v3.A.phoenix
    – TDR.01a.1x1.25_gx1v3.B.phoenix
    – TBR.02a.4x5_gx3v5.A.seaborg
    – TBR.02a.4x5_gx3v5.B.seaborg
    – TBR.02a.4x5_gx3v5.X.seaborg
    – TBR.02a.1.9x2.5_gx1v3.A.bassi
    – TBR.02a.1.9x2.5_gx1v3.B.bassi
    – TBR.02a.1.9x2.5_gx1v3.X.bassi
Tag Submission Template
    The following is a recommended tag submission template. A filled-in template (or the equivalent information) should be provided as components are tagged, to help document the CCSM tag and summarize developer testing. The Sandbox field should contain the baseline CCSM version of the sandbox plus any updated component tags or indications of non-standard component versions.
    Tag:
    Date:
    Summary of changes:
    Sandbox:
    Test results:
