Blog from October, 2013

new metcraxii config

Due to the change of T.30m.near to the nearup DSM, it was necessary to change configs.  We are now on metcraxii2.  All seems to be okay so far!

(Thanks, and sorry, Gordon.)

NEAR TRH work

Had noticed that 30m T was about 0.1C high and that Ifan has been bimodal between 32/52 mA.  Decided to replace with a spare (32->30).

After replacement, neither the old nor the new sensor worked, even though the new one tested okay when temporarily connected to the P.near port (about 1145??).  Deduced that the cable must have been damaged.

Since we only had one spare 5m cable (the others having been used to wire the motes at flr), we couldn't replace the entire 30m of cable.  Decided to recable this up to nearup.  From about 1430-1530 recovered a 5m length of cable from the bottom of near port 15, which gave us enough to run down from nearup.  Connected to port 11.  Changed the nearup config and determined that the replacement TRH (ID 310) was working.

Still need to change the configs on flux to use this new sensor placement.

Near mote work

At about 1230, removed near soil mote since we had seen intermittent data transmissions and I noted visually that the XBee radio was not completely seated on the mote board.  Brought it back to the trailer and fixed.  Replaced at about 14:20?

10/18/13 ~17:30Z

Near: Soil Mote xbee reset rate was still at xr=3600 (hourly).   I thought I'd changed all of them to xr=0 on 10/14, but the 'eeupdate' must not have been received by this mote.   The other 2 were at xr=0.   ID3(Soil) has been having outages of 1 hour and in the raw data files the comments show that every hour this mote was resetting the radio.   Clearly some of those resets were causing lost sync.   Meanwhile, the 'hb' command is still going out every 4 hours.   This is a bit upside down in a sense: with xr=0 the 'hb' won't matter.   Otherwise if we retain the 'hb' every 4 hours the xr should probably be xr=21600 (6hours).   For now since they seem to be working with the resets turned off, let's leave them that way.

Far: Both motes have been working well.   xr=7200 (2hours) and in the comments the radios are indeed being reset every 2 hours.   Apparently these resets aren't causing lost sync.   However, the 'hb' command is not only set at 4 hours (i.e., longer than the xr interval, when it should be shorter), but it is also being sent to the wrong serial port: ttyS7, which is the power mote, not ttyS2, which is the soil/rad port.   Since they have been working well I changed the xbee reset rate to xr=0, disabling it.
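
The relationship between the two timers in this entry can be sketched as a small check.  This is only an illustration, assuming (as the notes imply) that xr is a radio reset interval in seconds with 0 meaning disabled, and that the 'hb' interval should be shorter than xr to be useful.  The function name and the returned strings are hypothetical and do not come from the mote firmware.

```python
def check_reset_config(xr_s: int, hb_s: int) -> str:
    """Classify an xbee reset (xr) / heartbeat (hb) pairing.

    Assumptions (not confirmed mote firmware behavior): both values are
    in seconds, xr=0 disables resets, and hb should be shorter than xr.
    """
    if xr_s == 0:
        return "resets disabled; hb has no effect"
    if hb_s < xr_s:
        return "ok: hb interval is shorter than xr"
    return "inconsistent: hb interval is not shorter than xr"


HOUR = 3600
# Settings reported in this entry (hb every 4 hours at both sites).
print("near soil (before):", check_reset_config(1 * HOUR, 4 * HOUR))
print("far (before):", check_reset_config(2 * HOUR, 4 * HOUR))
print("suggested xr=21600:", check_reset_config(21600, 4 * HOUR))
print("current xr=0:", check_reset_config(0, 4 * HOUR))
```

Under these assumptions, both of the earlier settings come out as inconsistent, while xr=21600 with a 4-hour 'hb' and xr=0 do not.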

glitch on kh2o far

Yesterday, we see an odd event on kh2o.3m.far from 12:28-12:45, with 2 large level shifts and resultant high variance.  Our only explanation is that something (a bug) partially blocked the path (twice).  Plotting with RH, this definitely was not a humidity event.

Daily status, October 18

10/18/13
Summary:
Fixed Verizon modem (CradlePoint) yesterday morning.
Tsoil.4.4cm.near flat-lined

T/RH: RH.40m.near and RH.35m.rim perhaps a bit high. Ifan suspicious at near.30m and T a bit high both day and night.
P: P.nne looks okay, although SPO noticed sporadic data in the raw time series yesterday
csat u,v: ok
csat ldiag: ok
csat w, tc: ok
kh2o: ok
motes: FLR OK!
           FAR ok
           NEAR soil out 10/17 03:00-04:00, 09:00-10:00, 14:00-15:00, 20:00-21:00; 10/18 05:00-06:00
Wetness: ok
radiation: ok
Tsoil: Tsoil.4.4cm.near flat-lined at 327.7 degC after around 11:00 10/17
Gsoil: ok
Qsoil: ok
Cvsoil: ok
2D sonic: ok

near rads cleaned

Went to near to get some insight (pun intended) into why Rsw.in and Rsw.global.in don't agree nearly as well as in SOAS.  Didn't solve this problem, but did find that the shield on Rlw.in was cocked -- perhaps even in the field of view of the sensor -- I took a photo.  Tom thinks he may have loosened it when he last changed the desiccant (more than a week ago), though he didn't notice it being off when he cleaned 2 days ago.  The logical conclusion is that it was loose and recent winds skewed it more (though winds haven't been <that> high in the last 2 days...)

Also decided to clean the radiometers while I was there, since I could imagine a film of dust on the SPN1, though the tissue didn't come away dirty.  There were spider webs around.  At most, the change of either Rsw.in or Rsw.global was 3 W/m2 -- not enough to explain the large differences we've seen.

Cleaning done about 15:30, though I was standing near the radiometers starting about 15:15.

external network outage

The network to the outside world wasn't working from about 0800-1200 this morning.  The LAN was all fine, so we didn't lose any data.  Somehow the CradlePoint on the top of rim died.  Power cycling it from rimup with "vio 7 0; vio 7 1" brought it back.  The "router_check" script on rimup should have brought it back up, but apparently ran once at 0800 and not again.  Gordon is investigating why this didn't work.

None of our data were affected by this issue.  Hopefully it was late enough that it didn't impact the just-finished IOP operations.

Comment by Gordon: The above was due to a bug in the crontab entry that checks the internet connection:

*/20 * * * *   net_check.sh eth0 192.168.0 192.168.0.5 && router_check.sh 7 www.google.com

For some reason it appears that the ethernet interface on the router died this morning, such that it didn't respond to pings from the DSM. The above crontab entry does not power cycle the router if the DSM can't ping it. The idea is not to power cycle the router and modems if the problem is at our end.

Changed it to the following, which will do a net_check.sh and router_check.sh every 10 minutes:

*/10 * * * *   net_check.sh eth0 192.168.0 192.168.0.5; router_check.sh 7 www.google.com
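
The difference between the two entries comes down to shell semantics: "a && b" runs b only if a exits with status 0, while "a; b" runs b regardless.  A minimal Python model of that logic follows.  The boolean argument is a hypothetical stand-in for the exit status of net_check.sh, and nothing here reflects what either script does internally.

```python
def router_check_runs(net_check_ok: bool, separator: str) -> bool:
    """Return True if the second command would be started by the shell.

    separator is "&&" (run only after success) or ";" (always run).
    """
    if separator == "&&":
        return net_check_ok
    if separator == ";":
        return True
    raise ValueError(f"unsupported separator: {separator!r}")


for sep in ("&&", ";"):
    for ok in (True, False):
        print(f"net_check ok={ok!s:5} sep={sep:2} -> router_check runs: "
              f"{router_check_runs(ok, sep)}")
```

With "&&", a failed net_check.sh (as when the router stopped answering pings) means router_check.sh never gets a chance to power cycle anything, which matches the outage described.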

P.nne down yesterday

10/17, twh

I am concerned that P.nne.flr went down yesterday from 9:45 until 12:10.

SPO: 10/17 PM

We're noticing that data from this Bluetooth mote is unique in having more jitter in data reporting timing.  All of the other Bluetooth motes have an average sample rate of 1.00 s, with a min/max dt of about 0.9/1.1s.  P.nne is coming in with the same average (1.00), but min/max dt are 0.6/1.5s.  This indicates that its radio isn't performing as well?  Could this be suggestive of a problem that could cause an outage?
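
The mean/min/max dt figures quoted here can be reproduced from a list of sample arrival times.  The sketch below is a minimal version of that calculation; the function name is hypothetical and the timestamps are made up for illustration, not taken from P.nne.

```python
def dt_stats(timestamps):
    """Return (mean, min, max) of successive time differences in seconds."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    dts = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return sum(dts) / len(dts), min(dts), max(dts)


# Made-up arrival times (seconds), not real P.nne data: nominal 1 s
# sampling with one late and one early sample.
times = [0.0, 1.0, 2.5, 3.1, 4.0]
mean_dt, min_dt, max_dt = dt_stats(times)
print(f"mean={mean_dt:.2f} s  min={min_dt:.2f} s  max={max_dt:.2f} s")
```

Note that the mean stays at the nominal sample period as long as no samples are lost, so only the min/max spread reveals the jitter.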

Daily status, October 17

10/17/13

Summary:
Hardwired floor radiation and soil to the DSM yesterday.
IOP 3 last night.

T/RH: RH.40m.near and RH.35m.rim perhaps a bit high.
         Ifan suspicious at near.30m and T a bit high both day and night.
P: P.nne down 10/16 09:45 - 12:10
csat u,v: ok
csat ldiag: ok
csat w, tc: ok
fluxes: ok
kh2o: FLR variance high at night, but also tc'tc'
motes: FLR OK! after 14:00 10/16
           FAR soil mote stopped 10/16 13:45 - 15:40.
           NEAR ok
Wetness: ok
radiation: ok
Tsoil: Tsoil.2.5cm.flr & far (linear avg) do not fit profile
Gsoil: ok
Qsoil: ok
Cvsoil: ok
2D sonic: ok

flr network outages

Dave reports several outages in the crater during the IOP tonight.  The following script reports eantf at 54dB (good?), and says it has been up for 12 hours.  On the other hand, far has only been up 3 min and near only 2 hours, even though their signal levels are now 37 and 39 dB, respectively (okay values).

As expected, data_stats shows that the raw_data archive is fine at flr.

On a subsequent run of check_ap24, eanth@sodar came back, with a signal level of 49dB.

All of the above comments are consistent with Gordon's message early in the project.

[aster@flux ~]$ check_ap24.sh

local  remote      (         mac-addr)    uptime  ccq  txrate  rxrate  rxsig  snr #txpkts  #rxpkts  txretry  rxretry txB/sec rxB/sec

ap24   ap24-3@near (00:15:6D:20:01:90)   1:56:06  98% 54.0Mbs 48.0Mbs -61dBm 39dB 9851146 9320039       0%       0%  103070  222934

ap24   ap24-2@rim  (00:15:6D:10:1C:0D)  92:33:31  98% 48.0Mbs 54.0Mbs -60dBm 36dB 9528375 9851681       0%       0%    4445   12784

ap24   eanti@far   (00:20:F6:05:24:56)   0:03:14 100% 11.0Mbs  2.0Mbs -63dBm 37dB     719     723       0%       0%     223    1186

awk: cmd. line:95: (FILENAME=- FNR=51) fatal: division by zero attempted

ap24-2             (00:15:6D:10:29:B2)  92:33:34 100% 54.0Mbs 48.0Mbs -51dBm 44dB 9929362 9450666       0%       0%   12784    4445

ap24-2 eantf@flr   (00:20:F6:05:24:5A)  12:07:31 100% 11.0Mbs 11.0Mbs -43dBm 54dB  982727  898352       0%       0%   17755    8838

awk: cmd. line:93: (FILENAME=- FNR=39) fatal: division by zero attempted

ap24-3 ap24@base   (00:15:6D:10:3C:BC)   1:56:09  97% 54.0Mbs 54.0Mbs -63dBm 32dB 9449388 9721786       0%       0%  222842  103027
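
To compare the uptimes in the listing above more directly, they can be converted to seconds and checked against a threshold.  This is a rough sketch that assumes the uptime column is H:MM:SS with hours allowed to exceed 24.  The 3-hour threshold and the function name are hypothetical choices for illustration.

```python
def uptime_seconds(text: str) -> int:
    """Convert an H:MM:SS uptime string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Remote name -> uptime, copied from the check_ap24.sh output above.
uptimes = {
    "ap24-3@near": "1:56:06",
    "ap24-2@rim": "92:33:31",
    "eanti@far": "0:03:14",
    "eantf@flr": "12:07:31",
}

# Hypothetical threshold: flag links that came up within the last 3 hours.
RECENT_S = 3 * 3600
for name, text in uptimes.items():
    secs = uptime_seconds(text)
    flag = "RECENT" if secs < RECENT_S else "stable"
    print(f"{name:12} {secs:7d} s  {flag}")
```
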

10/16/13 ~21:50Z

Changed the Power Mote ID on 'up' from ID10 to ID110. (rs18)

This was done because the spn1 mote on this station also has ID10, so the change should make things a bit clearer.

In prep for the swap from Xbee to serial operation, I've now set pp=0 on all motes (ids 1,2,17).  I didn't "reboot" the motes, in order to get the maximum amount of data before the swap.  When Sebastian and Eric swap the cables shortly, the motes obviously will reboot.

The plan is for them to disconnect the receive mote on port 2, then plug cables from each of the 3 motes to ports 2, 9, 10.  Gordon updated the config earlier this morning in anticipation of this change.

14:00 The swap is done.  All data coming in as expected!  Thanks Sebastian and Eric!

10/16/13 ~18:30Z

Flr,Near,Far; Changed all xbee status message rates from 1/2hourly (sx=360 at 5sec datarate) to 0 to disable them.

The reason is that 'wisardMessageDecoder' on the flux computer, which scans the raw files for mote COMMENTS, was not showing any messages of this type coming in, which is abnormal.   The procedure itself does work: I tried it via the operator command (xs) and got these mote responses at flr:

ID1:\0x01\0x02Xbee:  CH=15 ID=6 DL=40625DF1 SP=3E8 ST=7D0 SM=0 SO=0 NP=49 PL=4 U\0x03\0x04\r

ID2:\0x01\0x02Xbee:  CH=15 ID=6 DL=40625DF1 SP=3E8 ST=7D0 SM=0 SO=0 NP=49 PL=3 Q\0x03\0x04\r

ID17:\0x01\0x02Xbee:  CH=15 ID=6 DL=40625DF1 SP=3E8 ST=7D0 SM=0 SO=0 NP=49 PL=4 m\0x03\0x04\r

However, we have been having outages, especially at flr, and it is conceivable that the motes' attempts to grab status values from the xbee were causing problems.   The motes perform this task by switching the xbee radios into 'command-mode' and then issuing specific commands to the xbee to grab these parameters.   Timing is important, so if the radios were to remain in command mode even though the mote had issued the 'go back to data mode' command, they would be unable to send any data messages.   Testing in Boulder in the past showed that the method was working, but perhaps in a noisy rf environment the interaction becomes more dicey.   Timing also relies upon the 'guard-time' needed for the xbee i/o, the mote rtcc interrupts, etc.   Maybe this will help eliminate radio outages.
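
For anyone wanting to look at the status responses shown above programmatically, the KEY=VALUE pairs can be pulled out with a regular expression.  This is a sketch only: it parses the text as printed in this entry (with control characters shown as literal \0x01 etc.), and it converts values from hexadecimal on the assumption that they follow the usual XBee AT-parameter convention.  The function name and the returned dictionary layout are hypothetical.

```python
import re

STATUS_RE = re.compile(r"\b([A-Z]{2})=([0-9A-Fa-f]+)\b")


def parse_xbee_status(line: str) -> dict:
    """Parse 'KEY=VALUE' pairs from one status line as printed above.

    Values are converted from hexadecimal, which is how XBee AT
    parameters are normally reported; treat that as an assumption here.
    """
    mote = re.match(r"ID(\d+):", line)
    if mote is None:
        raise ValueError("line does not start with a mote ID")
    params = {key: int(value, 16) for key, value in STATUS_RE.findall(line)}
    return {"mote": int(mote.group(1)), "params": params}


line = (r"ID1:\0x01\0x02Xbee:  CH=15 ID=6 DL=40625DF1 SP=3E8 ST=7D0 "
        r"SM=0 SO=0 NP=49 PL=4 U\0x03\0x04\r")
status = parse_xbee_status(line)
print(status["mote"], status["params"])
```
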

I tweaked the sonic azimuths on the profile towers and Gordon reran covars.

Then I tweaked the sonic tilt angles for the period 9/29 00:00 to 10/15 00:00.

Finally, I reran the covars again this morning.