Blog

wvally4 site visit

16 Jan: Militzer, Oncley

Purpose: TRH died at 2pm yesterday.  Came back to life early this AM, but we don't believe it will stay up.  Might be possible to despike data from this period (0500-0800).

0815: removed probe 19; fan ok

0821: installed 25.   DIN connector had corrosion on it.   Cleaned with ContactRenu as best possible (seems ok) and put some grease on probe pins.   Also put nail polish on sht pins where attached to board.

0823-0832: Rad: Rlw.in,Rsw.in,Rlw.out desiccant all shot; replaced.   Cleaned water off all domes.

Conditions: Above freezing with snow cover on ground.   Some tufts of grass beginning to pop through snow.    Rain gauge has drained some fluid since our last visit.

Even though krypton is reading 0 (presumably water on lens), decided not to klean since more rain forecast for today and was reading 1.4V yesterday.

TRH at wvally4 down

TRH at wvally4 went nuts at about 13:45 MST, Jan 15.

rserial shows:

rs 5
TRH19 542.40 -1735.03 -99 -99\r\n
TRH19 542.40 -1735.03 -99 -99\r\n

Powering it down and up didn't help, it still showed -99 for the raw values.

eio 5 0
eio 5 1
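
A minimal sketch of how this failure could be flagged automatically from the serial messages. It assumes the layout seen above (an ID followed by four numbers, the last two being the raw values) and that -99 marks a bad raw reading; the function name is hypothetical and not part of the data system.

```python
def trh_raw_is_bad(line, sentinel=-99.0):
    """Return True if either of the last two (raw) fields equals the sentinel.

    Assumed layout: ID, two derived values, two raw values.
    """
    # rserial prints the line terminator as the literal characters \r\n
    fields = line.replace("\\r\\n", "").split()
    if len(fields) != 5 or not fields[0].startswith("TRH"):
        raise ValueError("unexpected TRH message: %r" % line)
    raw = [float(f) for f in fields[3:]]
    return any(v == sentinel for v in raw)


sample = r"TRH19 542.40 -1735.03 -99 -99\r\n"
print(trh_raw_is_bad(sample))
```

A check like this would also have caught the River7 case, where the sensor kept reporting (bad) values and so never showed up as missing.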

Plots show a little bit of good data from about 21:50 MST to 22:00, then bad again.

isfs6_20110113_000000.dat is corrupt for some reason.

When one does an rsync, the following is reported in isfs6:/var/log/isfs/messages:

Jan 15 23:15:55 isfs6 rsyncd[854]: rsync: read errors mapping "PCAPS/raw_data/isfs6_20110113_000000.dat" (in data): Input/output error (5) 

The rsync also results in a bunch of these errors in /var/log/isfs/kernel:

Jan 15 23:12:15 isfs6 kernel: attempt to access beyond end of device
Jan 15 23:12:15 isfs6 kernel: sda1: rw=0, want=78103088, limit=15825499
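
The kernel line says the read wanted a block well past the end of the partition (want is much larger than limit). A small sketch for pulling those numbers out of the log, assuming the message format shown above and 512-byte sectors (an assumption, not confirmed here); the function name is hypothetical.

```python
import re


def parse_beyond_end(line):
    """Extract (device, want, limit) from a kernel 'want=, limit=' line."""
    m = re.search(r"kernel: (\w+): rw=\d+, want=(\d+), limit=(\d+)", line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


msg = "Jan 15 23:12:15 isfs6 kernel: sda1: rw=0, want=78103088, limit=15825499"
dev, want, limit = parse_beyond_end(msg)

SECTOR_BYTES = 512  # assumption: want/limit are counted in 512-byte sectors
print(dev, want > limit, round(limit * SECTOR_BYTES / 1e9, 1), "GB")
```

Under that assumption the limit works out to about 8.1 GB, and the requested sector is roughly five times beyond it.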

So that it doesn't get in the way of successive rsyncs, I moved the file to isfs6:/media/usbdisk/projects/PCAPS/bad

data_stats on the rsync'd file reports many bad samples, with incorrect types, ids, length, times, etc.

I was able to clear out most of the junk with these commands on the file at EOL:

sensor_extract -s 6,30 -s 6,100 -s 6,20 -s 6,50 -s 6,0x8000 -s 6,40 -s 6,80 -l 0 isfs6.dat isfs6_20110113_000000.dat 
nidsmerge -i isfs6.dat -o isfs6x.dat -s "2011 01 13 00:00"  -e "2011 01 13 12:00"
sensor_extract -s 6,30 -s 6,100 -s 6,20 -s 6,50 -s 6,0x8000 -s 6,40 -s 6,80 -l 0 isfs6xx.dat isfs6x.dat
bzip2 /scr/tmp/maclean/isfs6xx.dat -c > isfs6_20110113_000000.dat.bz2 

The real-time file for isfs6 for that day looks quite complete, so I don't think we have lost much, if anything.

We'll see if we have any other problems on the usb flash drive at wslope6. I plan to leave that file where it is in case it is covering a bad address on the flash, to avoid writing to that address again.

The glitch doesn't seem to have anything to do with a sudden power down; wslope6 has been up for 23 days.

Rsw

Comparing Rsw last night from the Kipp&Zonen, Quantum, and SPN-1 at 7, I notice that the Quantum reads very close to zero, the K&Z reads an average of -1.5 W/m^2 (range -3 to 0), and the SPN-1 an average of +4 W/m^2 (range +2 to +7).  Apparently, the nighttime error is proportional to the cost of the sensor??

Daily status 15 Jan

Daily status 15 Jan 2011
On site: Oncley, Militzer

No precip last night.  Mid-level overcast.  Noticeably warmer.

Summary:
    - River7 Temperature/RH sensor had been down since 10AM 13 Jan.  Fixed 5PM 14 Jan.  We feel really bad that this was bad for so long.  Our only "excuse" is that the grey color used for River7 caused it to hide on the black axes of the WWW plots.  Also, since it reported (bad) values, cockpit didn't show it as missing...
    - Eslope5 Radiometers were down from 9AM - 2PM 14 Jan.  The radiometer XBee radio had to be reset. 
    - Only Playa1 and ABC2 kryptons read low values last night -- we'll let them recover on their own.
    - Warmer temperatures may give another opportunity to take soil cores.  We'll wait another day or two.

Vdsm: ok, lowest values still at 7.   Min this AM: 12.2-12.5V.
Vmote.rad: ok.  Min this AM: 12.5-12.8V
Vmote.soil(aux): ok. Min this AM: 12.3-12.9.  lowest is eslope main

P: ok 855 to 885 mb, falling.
T: ok -8 to +4 degC, Playa1 lower than the rest
RH: ok 75-101% overnight.  Playa1 highest (note the 101%).
Rainr (3,4,6):  ok. None.
Spd (1,6): ok.
Dir (1,6): ok.

csat diag: ok. Some issue at river7 at midnight.
samples.sonic: ok
spd: ok, River7 has strong drainage not seen at other sites (but consistent with other MesoWest stations)
dir: ok, most stations south, playa1 and wslope4 show some northerly intrusions
u'u': ok, max 1.0 m2/s2 higher variations 6&7
v'v': ok, max 1.2 m2/s2
w'w': ok, max 0.5 m2/s2
u*: ok, max 0.4 m/s
sigma_w/u*: 1&6 have the same excursions even below 1 (perhaps real?)

tc: ok
tc'tc': ok max 0.6 degC; large values at 5,6,7
w'tc': ok, -0.03 to 0.06 m/s degC; large overnight negative values at 5&6.

kh2oV:  max 1.0--2.1 yesterday.  1&2 bad overnight.  Lowest now is 0.3V@2
kh2o'kh2o': daytime very small
w'kh2o': max < 0.01 during day

Rsw.in: ok, max 450-650 W/m2 yesterday.
Rsw.out: ok, max 315-450 W/m2 yesterday.
albedo: ok, 0.7-0.8 yesterday. 6 had decreasing values during day, but back now.  All may get lower if warming continues.
Rsw.dfs (1,7): max 150&250 W/m2 yesterday.  River SPN-1 somewhat higher
Rsw.global (1,7): generally within 10 W/m2 of Rsw.in values -- also true for SPN-1.

Rlw.in: ok, 240-320 W/m2.  Coherent pattern across the network
Rpile.in: ok, -150 to 0 W/m2
Tcase.in: ok, -5 to +14 degC
Tdome.in (1,2,5): ok

Rlw.out: ok, 270-320 W/m2
Rpile.out: ok, -100 to +5 W/m2
Tcase.out: ok
Tdome.out (1,2,5): ok

soil.aux at E & W Slope (5,6)
Tsoil: ok. concave profile at 2.  Frozen (only) at 2, 7, some 5 and 6
Gsoil: ok.
Qsoil: ok. 2,5,6 partly frozen (consistent with Tsoil)
Cvsoil: ok

River7 site visit

14 Jan: Oncley & Militzer

Purpose: Fix TRH which died yesterday about 10AM (and no one caught :( )

1728: remove housing and old sensor (#17)

1730: NESW photos taken

1733: install new sensor (#28) using grease on contacts

1734: housing back on

Fan sounds fine during this process

eslope5 site visit

14 Jan: Militzer & Oncley

Purpose: fix rad.mote which died at 8:30 this morning (with Vmote = 12.5V)

(Huge funeral going on)

1415: Mote blinking normally; voltage okay; correct ?ID; not being received.  Cycled power, which reset XBee and all now okay.  Local data store probably is okay. Would base mote reset (initiate with "eio 6 0/1") have solved this problem?

1420: Took NESW photos

1424: wipe off radiometers (just a few drops on uplooking domes); .outs desiccant looked good -- didn't check .ins. (just replaced 2-3 days ago)

kv = 2.0, so just fine.

TRH fan seems happy.

hiland3 site visit

14 Jan: Militzer, Oncley

Purpose: radiometers were covered earlier, want to check that all is well; also check ETI

Excellent snowball snow (unconfirmed) -- grass clear of snow for part of walk to station.

1346: ETI gauge all liquid; no fluid needed or added

1348-1350: A few drops of water on Rsw.in and water in Rlw.out shield, but otherwise fine.  Wiped off radiometers with residual water.  All desiccants nice and blue.

Daily status 14 Jan

Daily status 14 Jan 2011
On site: Militzer, Oncley (back again)

Another snow last night and low overcast in morning.

Summary:
    - Playa1 developed a memory issue yesterday and crashed at 4pm.  Rebooted this morning, but there are no data
       from last night.
    - "flux" computer in base was sluggish -- dsm_server died yesterday about 2pm.  Rebooted and servers restarted.
    - All kryptons read low values last night (must be because Oncley is now here).
    - 3 batteries at Stn 1,3,5,6;  insulation added to all.

Vdsm: ok, lowest values still at 7.   Overall 12.2-12.4V.
Vmote.rad: ok.
Vmote.soil(aux): ok. lowest is eslope main

Current Conditions:

P: ok 845 to 885 mb, rising.
T: ok -4 to +1 degC
RH: Currently 60-100%. All >90% last night.
Rainr (3,4,6):  Totals 2.8, 1.8, 1.0mm, respectively, last night
Spd (1,6): ok.
Dir (1,6): ok.

csat diag: ok (slightly surprising, given snow, but playa1 sonic was clear of snow this morning)
samples.sonic: ok
spd: ok, Generally <2 m/s overnight.
dir: ok, most stations west
u'u': ok, max 1.0 m2/s2 higher variations 5&6
v'v': ok, max 1.0 m2/s2
w'w': ok, max 0.4 m2/s2
u*: ok, max 0.5 m/s
sigma_w/u*: 6&7 have the same excursions even below 1 (perhaps real?)

tc: ok, -4 to +2 degC.
tc'tc': ok max 2.2 degC; large values at 5&6
w'tc': ok, -0.06 to 0.06 m/s degC; large overnight negative values at 5&6.

kh2oV:  max 1.6--2.8 yesterday.  All bad overnight.  Lowest now is 0.3V@3
kh2o'kh2o': daytime very small, <.01.
w'kh2o': 0 to 0.010 during day

all radiometers could have snow (except 1 that we cleaned this morning)

Rsw.in: ok, max 350-650 W/m2 yesterday.  Now 100-400.
Rsw.out: ok, max 250-450 W/m2 yesterday.
albedo: ok, 0.7-1.0 yesterday.
Rsw.dfs (1,7): max 200-350 W/m2 yesterday.  River SPN-1 pretty close
Rsw.global (1,7): generally agrees with Rsw.in values

Rlw.in: ok, 230-320 W/m2
Rpile.in: ok, -150 to 0 W/m2
Tcase.in: ok, -5 to +14 degC
Tdome.in (1,2,5): ok, -5  to +14 degC

Rlw.out: ok, 285-320 W/m2
Rpile.out: ok, -80 to +5 W/m2
Tcase.out: ok, -5 to +14 degC
Tdome.out (1,2,5): ok, -5 to +13 degC

soil.aux at E & W Slope (5,6)
Tsoil: ok. concave profile at 2.  Frozen (only) at 2, 7, some 5 and 6
Gsoil: ok.
Qsoil: ok. 2,5,6 partly frozen (consistent with Tsoil)
Cvsoil: ok

playa1 data system hung at about 23z, Jan 13. When John/Steve connected to the console port the next morning it was reporting a lack of memory:

ntpd: page allocation failure. order:0, mode:0x20

This is a rare situation that we have not seen previously at PCAPS. So far the stations have only died due to low power. river7 has been up continuously for 63 days.

To monitor memory usage on a station, the free command is useful:

ssh isfs1 free -m
             total       used       free     shared    buffers     cached
Mem:            61         59          1          0          7         43
-/+ buffers/cache:          8         52
Swap:            0          0          0

On the first line, the free value of 1 MByte out of a reported total of 61 looks alarming, but actually Linux is using the free memory for buffers and cache, and is supposed to give it up to processes if they need it. The second line shows the free memory if you remove what is being used by buffers and cache, which is a healthy value of 52 MByte in this example.
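
As a rough check of that second line, here is a sketch that adds free, buffers and cached from the "Mem:" line. It assumes the old-style column layout shown above; the function name is hypothetical.

```python
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:            61         59          1          0          7         43
-/+ buffers/cache:          8         52
Swap:            0          0          0
"""


def reclaimable_free_mb(text):
    """Sum free + buffers + cached from the 'Mem:' line of free output."""
    lines = text.splitlines()
    header = lines[0].split()
    mem = next(l for l in lines if l.startswith("Mem:")).split()[1:]
    row = dict(zip(header, (int(v) for v in mem)))
    return row["free"] + row["buffers"] + row["cached"]


print(reclaimable_free_mb(FREE_OUTPUT))
```

This gives 51 rather than the 52 that free prints, presumably because each column is rounded to whole MBytes separately.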

My wild guess is that in rare circumstances, the kernel exhibits a bug where it doesn't free up the buffer/cache memory for use by processes. Time to upgrade the kernels from 2.6.16...

Or it could be due to a sensor input going bananas. I'll look into that.

Playa1 site visit

14 Jan: Militzer, Oncley

Reason: 3G has been down since yesterday afternoon.

0834: Jiggle router

0835: brush off solar panels

0836: brush off p port and TRH

0838-0840: brush off rads -- not much encrustation, just a little bit of snow on about 30% of uplooking domes, rest clear;  cleaned tops with methanol; desiccant on downlookers fine, uplookers about 1/2.

0840-0905: diagnosing station problem--viper was down since 4pm yesterday and lost data.  Now is working again.

P.S. from Gordon: "It somehow ate all its memory. I'll watch it and see if I can find the culprit."

0916: klean krypton (with new M1 methanol) 0.6V -> 1.9V

Site Visit Riverton7

13Jan11 15:15-17:00 local

conditions: overcast, very light winds

Reason: rlw.in down beginning at ~17z this morning

~21:55z back up

Desiccant shot on both rlw.in and rsw.in.   renewed both.

rlw.in =0x65, sn2

Had to dissect sensor and test w/serial mote in vehicle.  Probable cause: the tiny 2-pin, 1mm connectors inside the case, in particular the power one, had some corrosion from water vapor.   After contact renew on i2c main connector and fiddling with those inside, it resurrected.   Could also be a wire problem with those connectors.    Keep an eye on this one.

router_check.sh script

Jan 13, 12:26 MST

Installed a new version of the router_check.sh script at eslope5.

It does a dns lookup on the destination hosts that are passed to the
script to be sure that the router's DNS isn't returning an address
of 192.168.0.1 for those hosts.

The hope is that the script will be able to catch the problem seen at eslope5 last night,
and power cycle the router.
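
The DNS check described above could look roughly like this. This is only a sketch in Python, not the actual router_check.sh; the function name, the injectable resolver, and the choice to treat a failed lookup as bad are all assumptions made for illustration.

```python
import socket

ROUTER_ADDR = "192.168.0.1"  # the bogus answer described above


def dns_looks_bad(hosts, resolve=socket.gethostbyname, bad_addr=ROUTER_ADDR):
    """Return the hosts that resolve to the router's own address.

    A failed lookup is also counted as bad (a design choice in this sketch).
    """
    bad = []
    for host in hosts:
        try:
            addr = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            bad.append(host)
            continue
        if addr == bad_addr:
            bad.append(host)
    return bad


# Offline demonstration with a fake resolver (hostnames are placeholders)
fake = {"good.example": "203.0.113.7", "bad.example": "192.168.0.1"}
print(dns_looks_bad(fake, resolve=fake.__getitem__))
```

If the returned list is non-empty, the caller would go on to power cycle the router.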

I'll watch it for a day before installing it on the other stations.

Later: Jan 13, 17:28 MST, installed new router_check.sh script on stations 2-7.
playa1 has not been accessible since roughly 16:00.

site eslope5 visit

13 Jan 2011 on site: John and Ling

840am - 910

850am: fix the no-realtime-communication problem:

         open the dsm box, router power on, wan on, usb off

         reset usb modem, it works.

         ssh to isfs5, checked ds and lsu

Rsw.in small amount of frost cleaned from dome as sun came over horizon

site playa1 visit

12 Jan 2011 1600-1630 on site John and Ling

1605 installed insulation and added a third battery. One of the battery terminal lugs was loose when we arrived.  That may have had some effect on charging but minimal if any.    The TRH fan definitely 'noticed' the voltage swings as we paralleled in a 3rd battery onto the one that had the loose terminal.

Notes:

1> TRH fan is noisy and may need to be replaced.   It did sound more steady after adding the 3rd battery.

2> 2 Batteries' caps were/are missing.