Known Bugs in CLM3.5

December 12, 2008

===================================================================================

List of problems:

Problem with restarts in CN mode on Linux/Lahey

Problem using mksurfdata with PGI compiler

Problem with t42half case with PGI or Lahey compiler on Linux

Problem with support of number of soil-colors NOT equal to 8 or 20

Problem with exact restarts on SGI

Problem with create_crop_landunit

Problem compiling on Darwin with XLF compilers:

Leap-year calendar does NOT work:

Gregorian calendar does NOT work - even if you enable it with build-time & run-time changes

365-day calendar is hardcoded into several subroutines:

newcprnc can NOT be used to compare restart or initial-condition (IC) files:

Problem running on Cray-X1

Problem running with Pathscale:

RTM log error checks:

Potential floating point errors in src/biogeophys/SurfaceAlbedoMod.F90:

Problem with compiling using PGI5.1:

Answers are scrambled using OpenMP and PGI6:

===================================================================================

Details on each problem:

===================================================================================

Bug Number: 469

Problem with restarts in CN mode on Linux/Lahey

This problem occurs with CN turned on and the following namelist:

&clm_inparm
 caseid         = 'clmrun'
 ctitle         = 'clmrun'
 finidat        = ' '
 fsurdat        = "$CSMDATA/lnd/clm2/surfdata/surfdata_48x96_c070501.nc"
 fatmgrid       = "$CSMDATA/lnd/clm2/griddata/griddata_48x96_060829.nc"
 fatmlndfrc     = "$CSMDATA/lnd/clm2/griddata/fracdata_48x96_gx3v5_060829.nc"
 fpftcon        = "$CSMDATA/lnd/clm2/pftdata/pft-physiology.c070207"
 fndepdat       = "$CSMDATA/lnd/clm2/ndepdata/ndep_clm_1890_48x96_c060414.nc"
 frivinp_rtm    = "$CSMDATA/lnd/clm2/rtmdata/rdirc.05.061026"
 offline_atmdir = "$CSMDATA/lnd/clm2/NCEPDATA.Qian.T62.c051024"
 nrevsn         = "$nrevsn"
 nsrest         =  $restart_type
 nelapse        =  $run_length
 dtime          =  1800
 rtm_nsteps     =  2
 start_ymd      =  19980101
 start_tod      =  0
 irad           = -1
 wrtdia         = .true.
 mss_irt        =  0
 hist_dov2xy    = .true.
 hist_nhtfrq    =  3
 hist_mfilt     =  1
 hist_ndens     =  1
 hist_crtinic   = 'MONTHLY'
 brnch_retain_casename = .true.
 /
 &prof_inparm
 /

With this configuration, answers differ on restart on Linux with the Lahey compiler.

===================================================================================

Bug Number: 512

mksurfdata does NOT work with PGI compiler!

mksurfdata does not work with the PGI compiler, although a simple fix to the Makefile (given below) allows it to work.
The following error occurs (this is for the singlept test, but the same problem happens with other namelists):

(GETFIL): attempting to find local file mksrf_lai.060929.nc
 (GETFIL): using /fs/cgd/csm/inputdata/lnd/clm2/rawdata/mksrf_lai.060929.nc
 read_domain read lon and lat dims
 read_domain initialized domain
 read_domain read LONGXY and LATIXY fields
 read_domain read EDGE[NESW]
 read_domain read LANDMASK
 read_domain compute lat[ns],lon[we] from edge[nesw]
 celledge, using celledge_regional
 read_domain compute cellarea with edge[nesw]
 cellarea, using cellarea_global
 AREAINI warning: conservation check not valid for
    input  grid of           720  x           360
    output grid of             1  x             1
 AREAINI warning: conservation check not valid for
    input  grid of           720  x           360
    output grid of             1  x             1
 areaini_pft error: w_ovr == nan!
FORTRAN STOP

To fix this, add -Kieee to the Makefile for Linux PGI compiler builds.

===================================================================================

Bug Number: 464

Problem with t42half case with PGI or Lahey compiler on Linux

The t42half test fails with PGI as shown below (it also fails with Lahey).
Here's the log:

Successfully set up atmospheric grid
 (GETFIL): attempting to find local file 1997-12.nc
 (GETFIL): using
 /fs/cgd/csm/inputdata/lnd/clm2/NCEPDATA.Qian.T62.c051024/1997-12.nc
 nstep=             1  date=      19971231  sec=          1200
 ATMDRV: attempting to read data from
 /fs/cgd/csm/inputdata/lnd/clm2/NCEPDATA.Qian.T62.c051024/1997-12.nc
 clm_mapa2l subroutine
 clm_mapa2l mapping complete
 clm_mapa2l downscaling ON
 nstep =          1   TS = 0.268762129015005E+03
 nstep =          2   TS = 0.268874818048030E+03
 rtm delt update from/to    0.000000000000000         1200.000000000000
            0            2
 nstep =          3   TS = 0.268531388645297E+03
Killed
p3_17921:  p4_error: net_recv read:  probable EOF on socket: 1
p2_17905:  p4_error: net_recv read:  probable EOF on socket: 1
rm_l_3_17922: (177.468647) net_send: could not write to fd=5, errno = 32
rm_l_2_17906: (177.606602) net_send: could not write to fd=5, errno = 32
rm_l_1_18809: (177.747336) net_send: could not write to fd=6, errno = 9
rm_l_1_18809:  p4_error: net_send write: -1
    p4_error: latest msg from perror: Bad file descriptor
 HTAPES_WRAPUP: Creating history file ./clmrun.clm2.h0.1997-12-31-03600.nc
  at nstep =             3
 Opening netcdf htape ./clmrun.clm2.h0.1997-12-31-03600.nc
 HTAPE_CREATE: Successfully defined netcdf history file             1
p2_17905: (179.610358) net_send: could not write to fd=5, errno = 32
p3_17921: (179.475951) net_send: could not write to fd=5, errno = 32

Here's the namelist:

&clm_inparm
 caseid         = 'clmrun'
 ctitle         = 'clmrun'
 finidat        = ' '
 fsurdat        =
'/fs/cgd/csm/inputdata/lnd/clm2/surfdata/surfdata_360x720_070122.nc'
 flndtopo       =
'/fs/cgd/csm/inputdata/lnd/clm2/griddata/topodata_360x720_c060528.nc'
 fatmgrid       =
'/fs/cgd/csm/inputdata/lnd/clm2/griddata/griddata_64x128_060829.nc'
 fatmlndfrc     =
'/fs/cgd/csm/inputdata/lnd/clm2/griddata/fracdata_64x128_USGS_070110.nc'
 fatmtopo       =
'/fs/cgd/csm/inputdata/lnd/clm2/griddata/topodata_64x128_060829.nc'
 fpftcon        =
'/fs/cgd/csm/inputdata/lnd/clm2/pftdata/pft-physiology.c070207'
 frivinp_rtm    = "/fs/cgd/csm/inputdata/lnd/clm2/rtmdata/rdirc.05.061026"
 offline_atmdir = "/fs/cgd/csm/inputdata/lnd/clm2/NCEPDATA.Qian.T62.c051024"
 nrevsn         = ""
 nsrest         =  0
 nelapse        =  48
 dtime          =  1200
 rtm_nsteps     =  2
 start_ymd      =  19971231
 start_tod      =  0
 irad           = -1
 wrtdia         = .true.
 mss_irt        =  0
 hist_dov2xy    = .true.
 hist_nhtfrq    =  3
 hist_mfilt     =  1
 hist_ndens     =  1
 hist_crtinic   = 'MONTHLY'
 brnch_retain_casename = .true.
 /
 &prof_inparm
 /

And config line:

/fs/cgd/data0/erik/clm_trunk/test/system/../../bld/configure -spmd -debug -maxpft 4 -rtm on -dust on -voc on -s
===================================================================================

Bug Number: 452

Problem with support of number of soil-colors NOT equal to 8 or 20

The mksurfdata tools file mksoicol.F90 sets nsoicol to the max value found in
the input soilcolor file:

nsoicol = maxval(soil_color_i)

However, the code will fail if nsoicol does not equal 20 or 8 (which it might
in paleo cases). Perhaps the code should be extended to handle a case where
nsoicol is not 20 or 8.
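
As an illustration only, the condition described above can be detected with a
check like the following. This is a minimal sketch, not code from mksurfdata:
the soil_color_i values and the messages are hypothetical, and only the
nsoicol assignment is taken from mksoicol.F90.

```fortran
program check_nsoicol
  implicit none
  ! Hypothetical soil-color indices standing in for the input soilcolor file.
  integer, parameter :: soil_color_i(6) = (/ 1, 4, 7, 12, 3, 9 /)
  integer :: nsoicol

  ! Same assignment as in mksoicol.F90.
  nsoicol = maxval(soil_color_i)

  ! Report whether the value is one of the two supported counts.
  if (nsoicol /= 8 .and. nsoicol /= 20) then
     write(*,*) 'nsoicol = ', nsoicol, ' is not 8 or 20 and is not supported'
  else
     write(*,*) 'nsoicol = ', nsoicol, ' is supported'
  end if
end program check_nsoicol
```

A check of this kind only reports the problem early; it does not add support
for other soil-color counts.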

===================================================================================

Bug Number: 361

Problem with exact restarts on SGI

Restarts do NOT work correctly: they do NOT give the same answers as a simulation that runs continuously.

002 er111 TER.sh 4p_vodsr_dh t31 10+38 ............................FAIL! rc= 11
003 br111 TBR.sh 4p_vodsr_dh t31 24+24 ............................FAIL! rc= 11
005 sm116 TSM.sh 4p_vodsr_o t31 48 ................................FAIL! rc= 4
007 er121 TER.sh 17p_vodsr_dh t31 10+38 ...........................FAIL! rc= 11
008 br121 TBR.sh 17p_vodsr_dh t31 24+24 ...........................FAIL! rc= 11
012 er211 TER.sh 17p_cnn_dh t31_cnall 10+38 .......................FAIL! rc= 11
013 br211 TBR.sh 17p_cnn_dh t31_cnall 24+24 .......................FAIL! rc= 11
016 er311 TER.sh 4p_casa_dh t31_casa 10+38 ........................FAIL! rc= 11
017 br311 TBR.sh 4p_casa_dh t31_casa 24+24 ........................FAIL! rc= 11
020 er411 TER.sh 10p_dgvm_dh t31_dgvm 10+38 .......................FAIL! rc= 11
021 br411 TBR.sh 10p_dgvm_dh t31_dgvm 24+24 .......................FAIL! rc= 11
===================================================================================

Bug Number: 449

Problem with create_crop_landunit

As reported to me by Bill Sacks at UW-Madison:
The crop(:) array in pftvarcon needs to be initialized before surfrd is called
for create_crop_landunit to work properly, but crop(:) is set in pftconrd,
which is called after surfrd. Thus, surfrd sees a crop(:) array that is set to
0 for all PFTs, and so it doesn't put anything on the crop landunit. I have
done a simple work-around to solve this problem, making crop(:) a parameter,
and thus initializing it at compile-time.
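
The work-around described above might look roughly like the sketch below. It
is an assumption-laden illustration, not the actual change: the module name,
numpft_demo, and the array values are hypothetical, and the main program only
stands in for surfrd.

```fortran
module pft_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  integer, parameter :: numpft_demo = 4   ! hypothetical PFT count
  ! Declared as a parameter, so the values are fixed at compile time and
  ! are available before any routine that reads PFT constants is called.
  real(r8), parameter :: crop(0:numpft_demo) = &
       (/ 0._r8, 0._r8, 0._r8, 1._r8, 1._r8 /)
end module pft_sketch

program crop_parameter_sketch
  use pft_sketch
  implicit none
  integer :: p

  ! Stand-in for surfrd: crop(:) can be tested immediately because it
  ! does not depend on a later pftconrd-style read.
  do p = 0, numpft_demo
     if (crop(p) == 1._r8) then
        write(*,*) 'PFT ', p, ' goes on the crop landunit'
     end if
  end do
end program crop_parameter_sketch
```

The trade-off is that the crop flags can no longer be changed through the PFT
physiology file without recompiling.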

===================================================================================

Bug Number: 448

Problem compiling on Darwin with XLF compilers:

There's a problem running the Dependency generator on Darwin OS X 10.4.9 using
XLF compilers.
This is with cam and clm for clm3_expa_98 and as of cam3.4.10.
The file assert.H is mistakenly matched from mct/mpeu so that there is an
explicit dependence on assert.h for gptl.c.
You can get around this problem by editing Depends by hand and removing that
dependence. Or you can modify mkDepends to explicitly remove this file when
creating Depends.

===================================================================================

Bug Number: 402

Leap-year calendar does NOT work:

If you try to turn on a GREGORIAN calendar in your namelist, it will NOT
work. You can get around this by modifying the code to turn off NO_LEAP_CALENDAR
or by linking with the full version of ESMF rather than the wrf time-manager
implementation.
See the required changes in the next bug, 632.

===================================================================================

Bug Number: 632

Gregorian calendar does NOT work - even if you enable it with build-time & run-time changes

You need to make the following changes.

clm_time_manager.F90
Index: clm_time_manager.F90
--- clm_time_manager.F90        (revision 6824)
+++ clm_time_manager.F90        (working copy)
@@ -390,7 +390,7 @@
         cal = to_upper(calendar)
         if ( trim(cal) == 'NO_LEAP' ) then
            rst_caltype = noleap
-        else if ( trim(cal) == 'ESMF_CAL_GREGORIAN' ) then
+        else if ( trim(cal) == 'GREGORIAN' ) then
            rst_caltype = gregorian
         else
            write(iulog,*)sub,': unrecognized calendar specified: ',calendar
@@ -408,7 +408,7 @@
         if ( rst_caltype == noleap ) then
            calendar = 'NO_LEAP'
         else if ( rst_caltype == gregorian ) then
-           calendar = 'ESMF_CAL_GREGORIAN'
+           calendar = 'GREGORIAN'
         else
            write(iulog,*)sub,': unrecognized calendar type in restart file: ',rst_caltype
            call endrun
===================================================================================

Bug Number: 522

365-day calendar is hardcoded into several subroutines:

A number of CLM routines use the hard-coded assumption that there are 365 days
per year rather than using the clock.
Examples include:

biogeochem/CNAllocationMod.F90:      ! ndays_active = 365.  This prevents the continued storage of C and N.
biogeochem/CNPhenologyMod.F90:         ! of days active exceeds 365.
biogeochem/CNPhenologyMod.F90:         lgsf(p) = max(min((days_active(p)-365._r8)/365._r8, 1._r8),0._r8)
biogeochem/STATICEcosysDynMod.F90:    (/31,28,31,30,31,30,31,31,30,31,30,31/) !days per month
main/atmdrvMod.F90:    integer :: ndaypm(12) =      &    !number of days per month
main/ndepFileMod.F90:  real(r8), parameter :: days_per_year = 365._r8
main/ndepFileMod.F90:    wt1 = ((days_per_year + 1._r8) - cday)/days_per_year
main/pftdynMod.F90:  real(r8), parameter :: days_per_year = 365._r8
main/pftdynMod.F90:    wt1 = ((days_per_year + 1._r8) - cday)/days_per_year
main/pftdynMod.F90:    pptr%pepv%days_active(p) = 0._r8
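
One way to avoid the hard-coded value is to ask a calendar-aware function for
the year length. The sketch below is an assumption, not CLM code: days_in_year,
the calendar codes, and the cday value are hypothetical, and a real fix would
query the model's time manager instead. Only the form of the wt1 expression is
taken from the lines listed above.

```fortran
program days_per_year_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  ! Hypothetical calendar codes; the model would take these from its clock.
  integer, parameter :: noleap = 1, gregorian = 2
  real(r8) :: cday, dpy, wt1

  cday = 200.5_r8                         ! hypothetical calendar day
  dpy  = days_in_year(1996, gregorian)    ! replaces a hard-coded 365._r8
  wt1  = ((dpy + 1._r8) - cday)/dpy       ! same form as in ndepFileMod.F90

  write(*,*) 'days in 1996 (NO_LEAP)   = ', days_in_year(1996, noleap)
  write(*,*) 'days in 1996 (GREGORIAN) = ', dpy
  write(*,*) 'wt1 = ', wt1

contains

  ! Number of days in the given year for the given calendar type.
  function days_in_year(year, caltype) result(ndays)
    integer, intent(in) :: year, caltype
    real(r8) :: ndays
    logical :: leap

    leap = .false.
    if (caltype == gregorian) then
       ! Standard Gregorian leap-year rule.
       leap = (mod(year,4) == 0 .and. mod(year,100) /= 0) .or. mod(year,400) == 0
    end if
    if (leap) then
       ndays = 366._r8
    else
       ndays = 365._r8
    end if
  end function days_in_year

end program days_per_year_sketch
```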
===================================================================================

Bug Number: 421

newcprnc can NOT be used to compare restart or initial-condition (IC) files:

Because restart and IC files do NOT contain a time-dimension the utility
newcprnc can NOT be used to compare them.

===================================================================================

Bug Number: 447

Problem running on Cray-X1

Offline CLM will NOT run on Cray-X1 because of calls to shr_sys_flush(6) without
unit 6 being explicitly opened.

===================================================================================

Bug Number: 444

Problem running with Pathscale:

CLM does not run on lightning with the Pathscale compiler for some configurations, although it works with others. The specific configuration below does NOT work.
Here's the config line:

/fis/cgd/home/erik/clm_trunk/test/system/../../bld/configure -fc pathf90 -linker mpif90 -spmd -smp -debug -s

Here's the namelist:

&clm_inparm
 caseid         = 'clmrun'
 ctitle         = 'clmrun'
 finidat        = ' '
 fsurdat        =
"/fs/cgd/csm/inputdata/lnd/clm2/surfdata/surfdata_10x15_USGS_070307.nc"
 fatmgrid       =
"/fs/cgd/csm/inputdata/lnd/clm2/griddata/griddata_10x15_USGS_070110.nc"
 fatmlndfrc     =
"/fs/cgd/csm/inputdata/lnd/clm2/griddata/fracdata_10x15_USGS_070110.nc"
 fpftcon        =
"/fs/cgd/csm/inputdata/lnd/clm2/pftdata/pft-physiology.c070207"
 offline_atmdir =
"/fs/cgd/csm/inputdata/lnd/clm2/NCEPDATA.Qian-etal-JHM06.c051024"
 nrevsn         = ""
 nsrest         =  0
 nelapse        =  48
 dtime          =  3600
 start_ymd      =  19990115
 start_tod      =  0
 irad           = -1
 wrtdia         = .true.
 mss_irt        =  0
 hist_dov2xy    = .true.
 hist_nhtfrq    =  3
 hist_mfilt     =  1
 hist_ndens     =  1
 hist_crtinic   = 'NONE'
 brnch_retain_casename = .true.
 /
 &prof_inparm
 /

Here's the logfile:

tep =  3  for history tim
 e interval beginning at  0.E+0  and ending at  0.125
 HTAPES_WRAPUP: Closing local history file ./clmrun.clm2.h0.1999-01-15-10800.nc
at nstep =  3
 writing restart file ./clmrun.clm2.r.1999-01-15-10800.nc for model date =
1999-01-15-10800
 restFile_open: writing restart dataset at
./clmrun.clm2.r.1999-01-15-10800.ncat nstep =  3
 Successfully wrote local restart file ./clmrun.clm2.r.1999-01-15-10800.nc
 (OPNFIL): Successfully opened file /home/lightning/erik/lnd.clmrun.rpointer on
unit=  1
 Successfully wrote local restart pointer file
 Successfully wrote out restart data at nstep =  3
 (OPNFIL): Successfully opened file ./clmrun.clm2.r.1999-01-15-10800 on unit=
1
 Successfully wrote local restart file ./clmrun.clm2.r.1999-01-15-10800
 nstep=  4  date=  19990115  sec=  14400
 Error: Forcing height is below canopy height for pft index  905
 ENDRUN: called without a message string
 Error: Forcing height is below canopy height for pft index  851
 ENDRUN: called without a message string
[7] MPI Abort by user Aborting program !
[7] Aborting program!
[7] MPI Abort by user Aborting program !
[7] Aborting program!
[7] Aborting program!
[7] Aborting program!
 Error: Forcing height is below canopy height for pft index  632
 ENDRUN: called without a message string
 Error: Forcing height is below canopy height for pft index  688
 ENDRUN: called without a message string
[5] MPI Abort by user Aborting program !
[5] Aborting program!
[5] MPI Abort by user Aborting program !
[5] Aborting program!
[5] Aborting program!
[5] Aborting program!
 Error: Forcing height is below canopy height for pft index  337
 ENDRUN: called without a message string
 Error: Forcing height is below canopy height for pft index  277
 ENDRUN: called without a message string
[2] MPI Abort by user Aborting program !
[2] Aborting program!
[2] MPI Abort by user Aborting program !
[2] Aborting program!
[2] Aborting program!
[2] Aborting program!
===================================================================================

Bug Number: 366

RTM log error checks:

Some of the clm-rtm mapping checks are obsolete, so new checks need to be
implemented (per T. Craig). One example is:

Attempting to initialize RTM
Columns in RTM = 720
Rows in RTM = 360
read river direction data
MAP_SETMAPSAR warning: masks/areas not conserved
global sum output area = 0.5100996991E+09
global sum input area =  0.2071386276E+09
===================================================================================

Bug Number: 251

Potential floating point errors in src/biogeophys/SurfaceAlbedoMod.F90:

There is the potential for a floating point error to occur in the TwoStream
code (in SurfaceAlbedoMod.F90) due to taking the exponential of a large number.

          s1 = exp(-h*vai(p))
          s2 = exp(-twostext(p)*vai(p))

P. Thornton has implemented a fix, but it is enclosed in a CN cpp #if block.
The fix should probably be applied universally in the code, after testing to
determine whether it causes any non-bit-for-bit (non-bfb) changes in non-CN mode.
The code in question is in SurfaceAlbedoMod.F90 in biogeophys:

# if (defined CN)
          ! PET, 3/1/04: added this test to avoid floating point errors in exp()
          t1 = min(h*vai(p), 40._r8)
          s1 = exp(-t1)
          t1 = min(twostext(p)*vai(p), 40._r8)
          s2 = exp(-t1)
# else
          s1 = exp(-h*vai(p))
          s2 = exp(-twostext(p)*vai(p))
# endif
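
If the cap were applied universally, one option would be to move it into a
small helper so both branches share it. The following is only a sketch under
that assumption: exp_neg_clamped and the input values are hypothetical, and
only the cap of 40 comes from the CN branch above.

```fortran
program exp_clamp_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8) :: h, vai

  h   = 2.5_r8     ! hypothetical value
  vai = 500._r8    ! hypothetical, deliberately large

  ! Large argument: the cap keeps the result at exp(-40) instead of
  ! letting exp() run into the floating point limit.
  write(*,*) 'clamped s1           = ', exp_neg_clamped(h*vai)

  ! Small argument: the cap has no effect, so both values agree.
  write(*,*) 'plain vs clamped 0.5 = ', exp(-0.5_r8), exp_neg_clamped(0.5_r8)

contains

  ! Returns exp(-x) with x capped at 40, as in the CN branch.
  function exp_neg_clamped(x) result(s)
    real(r8), intent(in) :: x
    real(r8) :: s
    s = exp(-min(x, 40._r8))
  end function exp_neg_clamped

end program exp_clamp_sketch
```

Whether this changes answers in non-CN mode would still need the bit-for-bit
testing noted above.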
===================================================================================

Bug Number: 620

Problem compiling with PGI5.1

This is a problem that Dagang Wang of Princeton found.
I am also trying CLM3.5 with pgf compiler version 5.1. I got an error message
like this: PGF90-S-0155-Illegal component initialization in derived type
esmf_alarmint
(/home/dwang/CLM3.5/clm3.5/src/utils/esmf_wrf_timemgr/ESMF_AlarmMod.F90: 60)
The problem seems to be that the type definition sets a component of a derived
type (name) to a value (" "). This is apparently F95 behavior and not valid for
F90.
I think you can fix this by just replacing

      type ESMF_AlarmInt
        character(len=256) :: name = " "
        type(ESMF_TimeInterval) :: RingInterval
        type(ESMF_Time)  :: RingTime
        type(ESMF_Time)  :: PrevRingTime
        type(ESMF_Time)  :: StopTime
        integer :: ID
        integer :: AlarmMutex
        logical :: Ringing
        logical :: Enabled
        logical :: RingTimeSet
        logical :: RingIntervalSet
        logical :: StopTimeSet
      end type

with

      type ESMF_AlarmInt
        character(len=256) :: name
        type(ESMF_TimeInterval) :: RingInterval
        type(ESMF_Time)  :: RingTime
        type(ESMF_Time)  :: PrevRingTime
        type(ESMF_Time)  :: StopTime
        integer :: ID
        integer :: AlarmMutex
        logical :: Ringing
        logical :: Enabled
        logical :: RingTimeSet
        logical :: RingIntervalSet
        logical :: StopTimeSet
      end type

I think this should work as long as a name is always assigned to an alarm,
which is the case in our code.
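
The condition "a name is always assigned" can be illustrated with the sketch
below. It is not the ESMF time-manager code: AlarmInt_demo is a reduced,
hypothetical version of the type, and alarm_create_demo is a hypothetical
creation routine that sets every component explicitly.

```fortran
module alarm_sketch
  implicit none
  ! Reduced, hypothetical version of the type, with no default
  ! initialization in the component declarations.
  type AlarmInt_demo
     character(len=256) :: name
     integer :: ID
     logical :: Enabled
  end type AlarmInt_demo

contains

  ! Assigns every component, including name, when the alarm is created,
  ! so no component is read before it has a value.
  subroutine alarm_create_demo(alarm, name, id)
    type(AlarmInt_demo), intent(out) :: alarm
    character(len=*),    intent(in)  :: name
    integer,             intent(in)  :: id
    alarm%name    = name
    alarm%ID      = id
    alarm%Enabled = .true.
  end subroutine alarm_create_demo

end module alarm_sketch

program alarm_name_sketch
  use alarm_sketch
  implicit none
  type(AlarmInt_demo) :: a

  call alarm_create_demo(a, 'restart_alarm', 1)
  write(*,*) 'alarm name: ', trim(a%name)
end program alarm_name_sketch
```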

===================================================================================

Bug Number: 652

Answers are scrambled using OpenMP and PGI6

This is a problem that Adam Schlosser of MIT found.
From Adam: "I've been able to compile and run a single-threaded version of CLM3.5 on our
Linux-based Beowulf cluster (AMD Opteron 64-bit, compiled with PGF90). This has
worked fine for our Integrated Global Systems Model (working on a zonal
framework), but we are beginning our 3D work (in collaboration with Linda
Mearns' group at ISSE), and in doing so I'm now working on 1x1 and 2x2.5 deg.
grids (admittedly, nice to be back on these!).
But I'm having trouble with getting CLM to run with OMP/SMP... which will be
somewhat of a necessity for testing here.
It all compiles just fine (setting the appropriate SMP compile options, etc.),
and it will also run (using the # of threads I set). However, the output is
strange. For a standalone run, using either the NCEP reanalysis forcing that's
provided in the CLM package or a 1x1 deg. data set I've been using, the
atmospheric forcing fields of shortwave, longwave, and precipitation viewed from
the history (h0) files are scrambled.
This scrambling, however, is not seen in air temperature (TBOT), specific
humidity (QBOT), or the winds (WIND). That is, if you compare the outputs
between a single-threaded run and the multi-threaded run, they are identical...
but this is not the case for the radiation and precipitation inputs."
We've verified that this problem occurs with PGI 6.1.6 and 6.2-3 on our Linux
cluster, but did NOT find it on other platforms: Linux with the Lahey or
Pathscale compilers, SGI, or AIX. On the Cray XT4 (jaguarcnl) we also verified
correct behavior using PGI 7.0-7.

===================================================================================


