CGD's data reconstruction effort - Adam Phillips (9/10/08)

In August 2008 CGD started a project to identify and correct errors in the post-processed daily/monthly Columbia run data.
The corrected daily data is located on hurricane: /datalocal/nrcm-cgd/asphilli/Columbia-daily
The corrected monthly data is also located on hurricane: /datalocal/nrcm-cgd/asphilli/Columbia-monthly

 There are a few caveats to keep in mind when using the corrected data:

  1. The data reconstruction effort detailed below corrected the most obvious errors in the post-processed monthly/daily data. There is no way to be sure that all errors have been eliminated from every field. 
  2. When analyzing the CTTOP, CPTOP, and CTAUCLD variables, keep in mind that grid points with no cloud hold -9999 (the _FillValue/missing_value). The effect is compounded when averaging over an entire month, so using the monthly data to analyze these fields is not recommended. Three dates should also be avoided in the daily data for these variables, because the data on them is highly suspect: 2003-11-06, 2005-03-25, and 2005-03-26. For more information about the additional corrections that were needed for these variables, please see below.
  3. The Time variable has been renamed to time for the monthly data, in accordance with netCDF conventions.
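
To illustrate caveat 2, the sketch below shows how the -9999 fill value distorts an average unless it is masked first. It is only an illustration: the array contents, the name cttop, and the helper masked_mean are hypothetical, and the sketch assumes the field has already been read into a NumPy array with time as the first axis.

```python
import numpy as np

FILL_VALUE = -9999.0  # fill value quoted in caveat 2


def masked_mean(values, fill_value=FILL_VALUE):
    """Average over the first (time) axis, ignoring fill-value entries."""
    masked = np.ma.masked_values(np.asarray(values, dtype=float), fill_value)
    return masked.mean(axis=0)


# Hypothetical 3-day series at two grid points (made-up numbers);
# grid point 0 has no cloud on the second day.
cttop = np.array([[250.0, 260.0],
                  [FILL_VALUE, 262.0],
                  [254.0, 264.0]])

naive = cttop.mean(axis=0)   # fill value is averaged in as if it were data
clean = masked_mean(cttop)   # fill value is excluded from the average
```

Even with masking, a monthly mean at a grid point is built only from the days that had cloud, so it describes a different sample at each point. That is one reason the daily data is the safer choice for these variables.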

Data Reconstruction Details

Errors in the original post-processed daily data were initially noticed when looking at global average timeseries. Spikes in the timeseries were evident across most daily variables. In some cases the spikes were small, while in others the values approached 1e20. To automatically identify the dates where there were problems, a two-step script was developed. The first step identified dates where the global average timeseries value was 5-10 times greater (or less) than the timeseries mean. The second step identified dates where the derivative of the timeseries exceeded a realistic value. These two tests yielded a list of 69 dates that had outliers in one or more variables:

2000-06-28, 2000-08-06, 2000-08-31, 2000-09-30, 2001-01-31, 2001-03-31, 2001-10-24, 2003-04-24, 2003-09-08, 2003-11-04, 2003-11-05, 2003-11-25, 2003-12-09, 2003-12-14, 2004-01-14, 2004-02-11, 2004-02-28, 2004-03-22, 2004-03-31, 2004-04-23, 2004-05-01, 2004-05-24, 2004-06-01, 2004-06-21, 2004-06-26, 2004-07-17, 2004-07-31, 2004-08-21, 2004-08-31, 2004-09-16, 2004-09-22, 2004-09-23, 2004-09-30, 2004-10-23, 2004-10-31, 2004-11-19, 2004-12-12, 2005-01-03, 2005-01-09, 2005-01-19, 2005-02-11, 2005-03-02, 2005-03-09, 2005-03-25, 2005-04-01, 2005-04-16, 2005-05-01, 2005-05-15, 2005-05-23, 2005-05-24, 2005-05-26, 2005-05-29, 2005-06-01, 2005-06-24, 2005-07-16, 2005-07-19, 2005-08-01, 2005-08-05, 2005-08-10, 2005-08-23, 2005-08-31, 2005-09-20, 2005-09-30, 2005-10-23, 2005-11-04, 2005-11-17, 2005-11-30, 2005-12-12, 2005-12-13
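
The two tests described above can be sketched roughly as follows. This is not the script that was actually used. The function name, the thresholds (ratio, max_step), and the sample values are all assumptions made for illustration. One deliberate substitution: the reference level here is the median rather than the mean, because a single value near 1e20 would otherwise dominate the mean. The magnitude test as written also assumes a variable whose typical values are well away from zero.

```python
import numpy as np


def flag_suspect_days(series, ratio=5.0, max_step=1.0):
    """Return a boolean mask marking suspect days in a 1-D global-mean series.

    ratio and max_step are illustrative thresholds, not the values
    used in the original effort.
    """
    x = np.asarray(series, dtype=float)
    ref = abs(np.median(x))  # assumption: median used in place of the mean

    # Step 1: magnitude far above or far below the reference level.
    mag = np.abs(x)
    step1 = (mag > ratio * ref) | (mag < ref / ratio)

    # Step 2: finite-difference derivative. A day is flagged only when the
    # change arriving at it and the change leaving it are both too large,
    # so the day after a spike is not flagged as well. The first and last
    # days cannot be tested this way, and neither can spikes on
    # consecutive days.
    jump = np.abs(np.diff(x)) > max_step
    arriving = np.concatenate(([False], jump))
    leaving = np.concatenate((jump, [False]))
    step2 = arriving & leaving

    return step1 | step2


# Made-up global-mean series: a huge spike on day 2, a subtle one on day 5.
series = [288.0, 288.2, 1e20, 288.1, 287.9, 291.5, 288.0]
mask = flag_suspect_days(series, ratio=5.0, max_step=2.0)
suspect_days = np.flatnonzero(mask).tolist()
```

In this example the magnitude test catches only the large spike, while the derivative test also catches the subtle one, which is why both steps are useful. A realistic max_step would have to be chosen separately for each variable.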

The list of days includes all of the days listed by Cindy below, along with 6 additional days. Note that errors were seen across many variables; they were not confined solely to the accumulated fluxes. Spatial plots of the identified days were examined, and the errors ranged from extremely subtle to grids that were totally incorrect. Because it was not possible to determine, for every variable on a given date, whether its values were erroneous, all variables on the 69 dates were replaced using standard linear interpolation.
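
A minimal sketch of replacing flagged days by linear interpolation in time is given below. It is an illustration under stated assumptions, not the code used for the reconstruction: the function name and sample data are hypothetical, the field is assumed to be a NumPy array with time as the first axis, and days are assumed to be evenly spaced.

```python
import numpy as np


def replace_by_interpolation(data, bad):
    """Replace the time steps marked in `bad` by linear interpolation
    between the nearest unflagged time steps, at every grid point.

    data : array with time as the first axis, e.g. (time, lat, lon)
    bad  : boolean array of length time, True where the day is flagged
    """
    data = np.asarray(data, dtype=float)
    bad = np.asarray(bad, dtype=bool)
    t = np.arange(data.shape[0])
    good = ~bad

    # Work on a (time, npoints) copy so any number of spatial axes is handled.
    flat = data.reshape(data.shape[0], -1).copy()
    for j in range(flat.shape[1]):
        # np.interp does not extrapolate: a flagged first or last day
        # simply takes the value of the nearest good day.
        flat[bad, j] = np.interp(t[bad], t[good], flat[good, j])
    return flat.reshape(data.shape)


# Made-up field with 5 days and 2 grid points; day 2 is flagged.
field = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [1e20, -5.0],
                  [4.0, 40.0],
                  [5.0, 50.0]])
bad_days = np.array([False, False, True, False, False])
fixed = replace_by_interpolation(field, bad_days)
```

Because only unflagged days serve as anchors, runs of consecutive flagged dates (for example 2003-11-04 and 2003-11-05) are interpolated across the whole gap rather than from each other.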

Other steps that were taken:


Report on data with problems - Cindy Bruyere

Please send any reports concerning problems with data on mass store to bruyerec@ucar.edu