CCSM Scripts Review and Upgrade Project

Goal

Analyze inadequacies in the CCSM3 scripts environment and identify orthogonal problem areas. For each problem area, gather and review potential solutions, choose and clarify the desired solutions, and then implement them. The following three areas need to be addressed, in the following order of priority:

  • New component set specification (priority 1)
    • Target date for coming up with a new specification is April 27.
    • We will gather recommendations on the wiki for new component set specifications and come up with a recommendation at the CSEG meeting on April 20.
    • Liaisons can get feedback from their respective groups for this recommendation.
    • We will review the feedback at the CSEG meeting on April 27 and decide on a final specification.
    • We will then proceed with the implementation as well as reviewing the impact on the testing scripts (see next item).
  • Rewrite of testing scripts to be compatible with the new component set specification (priority 2)
  • Resolve component-specific Makefile issues (priority 3)
    • Makefile flags for stand-alone components should be the same as those used when the component is run in coupled (ccsm) mode.

Brainstorming of Functionality Issues to Address

New component set specification

  • The old way of using a letter for a given component specification seems to be rapidly becoming outdated.
  • How do we handle specifying multiple types of an active component (e.g. pop1.4 or pop2) and the various modes in which a particular active component can be run? As examples:
    • clm can be run with no carbon cycle, the CN carbon cycle, or the CASA carbon cycle (for each of these modes, dynamic land use can be turned on, etc.)
    • datm7 can be run using observation forcing data or using data produced by cam
    • pop will soon be able to run either using pop1.4 or pop2 (in the future hycom might also be an active ocean model that could be selected)
    • the scripts must easily be able to handle different biogeochemical scenarios and the associated passing of different biogeochemical tracers.

  • Issues
    • Should our compsets be backward compatible, or should we come up with a new scheme?
    • Should the mode be part of the compset or separate? Our current test names suggest we treat mode separately from compset: TER.01a.res.B, TER.01b.res.B, and TER.01i.res.B are the same compset but different modes.
  • Some options
    • (1) Stick with a one-letter compset (A-Z) and limit ourselves to 26 supported compsets, which might change over time. We recently redefined D and I, for instance. B with pop2 could be V and B with hypop could be W, for example. Mode could probably be partly supported, but we are limited to 26 total.
      • (MV) This seems like a solution that will become obsolete in the very near term. We want to implement a longer-term solution that is not restricted to only 26 supported component sets.
    • (2) Continue with semi-arbitrary names but allow them to be one or two characters ([A-Z] or [A-Z][A-Z]). So B with pop2 could be BP and B with hypop could be BH. B with some bgc options could be BA, in other modes BB, etc. This naming convention is highly arbitrary, but it is backward compatible and gives us ~700 potential names (26 + 26*26 = 702), which would likely support just about any number of components and modes we could come up with.
      • (MV) This seems better than (1), but it would be difficult to understand. Someone would always need a translation table handy to decipher the component set.
    • (3) Come up with a new naming convention. How about 5 letters (one for each component), ordered atm-lnd-ocn-ice-cpl (arbitrary)? The options for atm might be C=cam, D=datm+camhist, E=datm+tn460, G=cam+bgc, M=cam+mozart, X=xatm, etc. For ocn: O=pop, P=pop2, H=hypop, D=docn, X=xocn, etc. Our compsets might look like CLOCC for all active, DDDDC for all data, XXXXC for all dead, and GHGLC for some bgc run. This limits any one component to just 26 types.
    • (4) Similar to (3) but a bit more explicit, as in AC_LB_OH_IL_CC.
    • (5) Get rid of the notion of compset and, wherever we refer to it now, change it to clear component names. Instead of passing B to our scripts (-compset B), we set each component individually (-atm cam -ocn hypop -ice csim_prescribed -lnd clm_cn -cpl cpl), and we update our scripts, docs, web pages, etc. to clarify compset.
    • (6) Generalize or slightly modify the current naming scheme with the understanding that (e.g.) compset "B" means all active models but does not specify which active models, so that the ocean component might be pop1, pop2, or hycom. After running create_newcase (and selecting compset B), one needs to edit env_conf to select pop, pop2, or hycom. Thus the choice of pop vs. pop2 vs. hycom is not nailed down until one runs configure. Note: this is similar to how one currently selects "modes" in datm7.
    • (7) Something else? HELP!
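
To make option (3) easier to discuss, here is a minimal Python sketch of how a five-letter name could be decoded. The atm and ocn letter tables are the ones suggested in option (3); the lnd, ice, and cpl tables, and the function names decode_compset and name_count, are hypothetical and exist only so that the all-active, all-data, and all-dead examples decode. It is an illustration of the idea, not a proposed implementation.

```python
# Component order proposed in option (3).
ORDER = ("atm", "lnd", "ocn", "ice", "cpl")

# Letter tables. atm and ocn come from the option (3) text; the lnd, ice
# and cpl entries are ASSUMED placeholders, not an agreed specification.
TABLES = {
    "atm": {"C": "cam", "D": "datm+camhist", "E": "datm+tn460",
            "G": "cam+bgc", "M": "cam+mozart", "X": "xatm"},
    "lnd": {"L": "clm", "D": "dlnd", "X": "xlnd"},   # assumed
    "ocn": {"O": "pop", "P": "pop2", "H": "hypop", "D": "docn", "X": "xocn"},
    "ice": {"C": "csim", "D": "dice", "X": "xice"},  # assumed
    "cpl": {"C": "cpl"},                             # assumed
}


def decode_compset(name):
    """Map a five-letter compset name to one model choice per component."""
    if len(name) != len(ORDER):
        raise ValueError(f"expected {len(ORDER)} letters, got {name!r}")
    result = {}
    for component, letter in zip(ORDER, name.upper()):
        try:
            result[component] = TABLES[component][letter]
        except KeyError:
            raise ValueError(
                f"unknown {component} letter {letter!r} in {name!r}"
            ) from None
    return result


def name_count(max_len, alphabet=26):
    """Number of distinct names of length 1..max_len over the alphabet."""
    return sum(alphabet ** k for k in range(1, max_len + 1))


if __name__ == "__main__":
    print(decode_compset("CLOCC"))   # all active
    print(name_count(1), name_count(2))  # options (1) and (2)
```

The same tables could drive option (5) in reverse: the scripts would accept explicit component names and only use the letters as a shorthand in test names.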


Determination of CCSM tasks and threads

  • The current way of setting default tasks and threads is rapidly becoming obsolete as new machines are added. The optimal task and thread count also depends on the modes in which a given active model is run (e.g. whether CN mode is turned on for CLM).
  • We need a new way of setting tasks and threads that lets the automated testing scripts run and also allows a "database" of recommended settings to be built up for a given set of dependencies.
  • The possible dependencies for determining tasks and threads are:
    • machine (for Linux this also encompasses compiler type)
    • resolution
    • component set
    • mode for a given active component (if an active component is used)
    • type of resource utilization requested (small, medium, large); this implies that we can put together recommendations for large production runs as well as for runs that require smaller resources
  • We need to start putting appropriate default tasks/threads into the new scripts' pes_setups. How far should we go to get it exactly right versus close, and can we take component modes into account better?
  • Do we need to add an automatic email that gives users or CSEG feedback on poor load balance, based on timing output?
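
One way to picture the "database" of recommended settings is a table keyed on the five dependencies listed above, with wildcards so that a sensible default always exists for the testing scripts. The following Python sketch assumes that shape. The machine name, resolution string, mode name, and every task/thread count are made up for illustration, and lookup_pes is a hypothetical helper, not part of the existing scripts.

```python
# Keys are (machine, resolution, compset, mode, size); "*" matches anything.
# Values map each component to (tasks, threads).
# ALL names and numbers here are illustrative assumptions.
RECOMMENDED = {
    ("*", "*", "*", "*", "*"): {
        "atm": (8, 1), "lnd": (4, 1), "ocn": (8, 1), "ice": (4, 1), "cpl": (2, 1),
    },
    ("machine_a", "T42_gx1v4", "B", "*", "medium"): {
        "atm": (32, 2), "lnd": (8, 1), "ocn": (40, 1), "ice": (16, 1), "cpl": (8, 1),
    },
    ("machine_a", "T42_gx1v4", "B", "clm_cn", "medium"): {
        "atm": (32, 2), "lnd": (16, 1), "ocn": (40, 1), "ice": (16, 1), "cpl": (8, 1),
    },
}


def lookup_pes(machine, resolution, compset, mode, size):
    """Return the matching entry with the fewest wildcards."""
    query = (machine, resolution, compset, mode, size)
    best_key, best_score = None, -1
    for key in RECOMMENDED:
        # An entry matches if every field is a wildcard or equals the query.
        if all(k == "*" or k == q for k, q in zip(key, query)):
            score = sum(k != "*" for k in key)  # more specific wins
            if score > best_score:
                best_key, best_score = key, score
    if best_key is None:
        raise KeyError(query)
    return RECOMMENDED[best_key]


if __name__ == "__main__":
    print(lookup_pes("machine_a", "T42_gx1v4", "B", "clm_cn", "medium"))
    print(lookup_pes("machine_b", "T42_gx1v4", "B", "clm_cn", "large"))
```

With this layout, a new machine works immediately through the all-wildcard entry, and tuned settings can be added one row at a time as they are determined.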

Makefile, Macros file, and build issues

  • Which pre-processor should our Makefile use and should it run as a separate pass or as part of the f90 compilation (in which case .f90 files will not be created)? Do all compilers support pre-processing? How does CAM do it?
  • Is it OK to add if-defs in share code if they are required only by stand-alone CAM but play no role in building CCSM? (This is a recent issue that appeared in the latest csm_share/shr code.)
  • What is our strategy with respect to if-defs, and what is the list of valid if-def variables?
  • Currently we only have if-defs for various operating systems; do we need if-defs for compilers too?
  • Until recently, all CCSM components used the same makefile and build procedure; now CAM does not. What issues does this raise? Do we no longer desire a common build method?
  • Does the current coding standard specify f90 only, or are features from later Fortran standards acceptable (e.g. allocatable components in derived types as opposed to pointers)?
  • Are we going to introduce a CCSM requirement that all literal constants must be strongly typed?
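
If we do settle on a list of valid if-def variables, checking source files against it is easy to automate. The Python sketch below assumes a small, hypothetical list of approved symbols (the real list is what the questions above ask us to define) and scans preprocessor conditionals for anything outside it. The function names are illustrative, and trailing comments on a directive line are not handled.

```python
import re

# HYPOTHETICAL approved symbols; the actual list is still to be decided.
VALID_SYMBOLS = {"AIX", "IRIX64", "LINUX", "OSF1"}

# Matches the conditional directives and captures the rest of the line.
_DIRECTIVE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|elif)\b(.*)$")
# Identifiers only; the leading \b keeps suffixes such as the L in 10L out.
_IDENT = re.compile(r"\b[A-Za-z_]\w*")


def ifdef_symbols(source):
    """Collect every symbol tested by a preprocessor conditional."""
    symbols = set()
    for line in source.splitlines():
        match = _DIRECTIVE.match(line)
        if match is None:
            continue
        for ident in _IDENT.findall(match.group(2)):
            if ident != "defined":  # the operator, not a symbol
                symbols.add(ident)
    return symbols


def unknown_symbols(source, valid=VALID_SYMBOLS):
    """Return the tested symbols that are not on the approved list."""
    return sorted(ifdef_symbols(source) - set(valid))


if __name__ == "__main__":
    example = "#if defined(LINUX) || defined(MY_DEBUG)\n#endif\n"
    print(unknown_symbols(example))
```

A check like this could run in the test scripts, so that a new if-def in share code is flagged before it reaches a release.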

New resolutions

  • gx1v4
  • gx3v6
  • 1x1.25+?