Gordon, Sep 30, 12:15
Built a new kernel for titans, with a two-line fix that I hope will fix the issue where pc104 interrupts cease to be serviced.
Installed it on Aph3 from Boulder and rebooted it. It came up, and data from the Emerald serial ports 5 (TRH24), 6 (TRH51) and 7 (Handar), as well as the sonic on port 2, is coming in. Previously the issue was seen within an hour or two. Wait and see...
Didn't work. Emerald data quit after about 1/2 hour. So, then made another small change to the interrupt handler, installed the new kernel and rebooted Aph3 around 12:58 pm. It's been up for about 6 hours and still getting Emerald data, which is longer than it had ever run before, so things might be fixed.
Note that the barometer on port 1 is not reporting, needs a cable.
Gordon, Oct 2
The above attempts didn't fix the problem. Emerald data still died after several hours. It took several more iterations, but I think it has finally been fixed.
Gordon, Sep 29
One of the Etherants being used by OSU was knocked out by the lightning strike. Chris says the power light does not come on, and the status light on the injector goes out when the Etherant is plugged in.
We have pulled EantA from the main tower. Its status is unknown, likely dead.
By my record, OSU was using EantD, 00:20:F6:05:16:CD, at the Sodar, and EantJ, 00:20:F6:05:24:4F, at the tower.
I presume it was the one at the OSU tower that died, but as of now, Sep 29, these Etherants are reporting:
    interface wireless registration-table print
    # INTERFACE RADIO-NAME MAC-ADDRESS AP SIGNAL... TX-RATE UPTIME
    0 wlan1-Int-Ant 00:20:F6:05:1E:D5 no -26dBm... 11Mbps 4d23h2m11s
    1 wlan1-Int-Ant 00:20:F6:05:24:85 no -32dBm... 11Mbps 1d3h51m6s
    2 wlan1-Int-Ant 00:20:F6:05:24:4F no -95dBm... 11Mbps 1h33m13s
    3 wlan1-Int-Ant 00:20:F6:05:24:56 no -95dBm... 11Mbps 3m4s
Unfortunately the AP24 doesn't show the radio names. From my earlier post I thought EantJ was on the OSU tower, but I guess not, since EantJ is currently reporting, so it must have been on the Sodar at that time. From the registration table printout above, here is what is currently reporting:
| MAC | name | site | IP |
|---|---|---|---|
| 1E:D5 | EantE | ISFS C | 192.168.0.137 |
| 24:85 | EantH | ISS Sodar | 192.168.0.140 |
| 24:56 | EantI | OSU Tower? | 192.168.0.141 |
| 24:4F | EantJ | OSU Sodar? | 192.168.0.142 |
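The matching of registration-table rows to antenna names can also be done mechanically. The following is a minimal Python sketch, assuming the rows keep the layout shown in the printout above. The `KNOWN` dictionary only restates the table in this entry, and the names `identify`, `KNOWN`, `TABLE` and `ROW` are hypothetical, chosen for illustration.

```python
import re

# Last two MAC octets -> (name, site, IP), restating the table in this entry.
KNOWN = {
    "1E:D5": ("EantE", "ISFS C", "192.168.0.137"),
    "24:85": ("EantH", "ISS Sodar", "192.168.0.140"),
    "24:56": ("EantI", "OSU Tower?", "192.168.0.141"),
    "24:4F": ("EantJ", "OSU Sodar?", "192.168.0.142"),
}

# Rows copied from the registration-table printout in this entry.
TABLE = """\
0 wlan1-Int-Ant 00:20:F6:05:1E:D5 no -26dBm... 11Mbps 4d23h2m11s
1 wlan1-Int-Ant 00:20:F6:05:24:85 no -32dBm... 11Mbps 1d3h51m6s
2 wlan1-Int-Ant 00:20:F6:05:24:4F no -95dBm... 11Mbps 1h33m13s
3 wlan1-Int-Ant 00:20:F6:05:24:56 no -95dBm... 11Mbps 3m4s
"""

# A full MAC address, the AP column, then the signal strength in dBm.
ROW = re.compile(r"((?:[0-9A-F]{2}:){5}[0-9A-F]{2})\s+\S+\s+(-?\d+)dBm")

def identify(table_text):
    """Return a list of (name, signal_dBm), one per registered radio."""
    found = []
    for line in table_text.splitlines():
        m = ROW.search(line)
        if not m:
            continue
        mac, signal = m.group(1), int(m.group(2))
        suffix = mac[-5:]  # last two octets, e.g. "1E:D5"
        name = KNOWN.get(suffix, ("unknown",))[0]
        found.append((name, signal))
    return found

for name, signal in identify(TABLE):
    print(name, signal, "dBm")
```

Run on the printout above, this lists EantE and EantH with strong signals and EantJ and EantI at -95 dBm.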
Gordon, Sep 29.
The DSMs on the main tower stopped reporting on Sep 27, 21:03:05 UTC (15:03:05 MDT). This was probably the moment just before the lightning strike.
Two of the sensors at C also ceased reporting at this time:
    data_stats isfs_20120928_000000.dat.bz2
    C20:/var/tmp/gps_pty0 20 30 28952 2012 09 27 20:00:00.613 09 27 23:59:59.735 2.01 0.085 6.849 51 73
    C20:/dev/ttyS1 20 50 14341 2012 09 27 20:00:00.446 09 27 23:59:59.296 1.00 0.842 7.001 20 20
    C20:/dev/ttyS2 20 60 11172 2012 09 27 20:00:00.646 09 27 23:59:59.726 0.78 0.949 6.421 36 36
    C20:/dev/ttyS5 20 100 286914 2012 09 27 20:00:00.020 09 27 23:59:59.940 19.92 0.001 5.850 12 12
    C20:/dev/ttyS6 20 110 11171 2012 09 27 20:00:00.306 09 27 23:59:59.826 0.78 0.917 7.211 38 38
    C20:/dev/ttyS7 20 150 2925 2012 09 27 20:00:00.306 09 27 21:03:05.006 0.77 1.278 2.581 37 37
    C20:/dev/ttyS8 20 200 286910 2012 09 27 20:00:00.021 09 27 23:59:59.900 19.92 0.003 5.850 12 12
    C20:/dev/ttyS9 20 250 11176 2012 09 27 20:00:01.126 09 27 23:59:58.626 0.78 1.073 6.722 36 38
    C20:/dev/ttyS10 20 300 3772 2012 09 27 20:00:00.986 09 27 21:03:04.576 1.00 0.952 2.023 20 20
Id 20,150 is the 1.5m TRH. 20,300 is the Vaisala PTB.
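Spotting the sensors that stopped early can be automated by comparing the last sample time of each row. The sketch below is a rough Python illustration, assuming each data_stats row has the field order seen in the printout above (device, DSM id, sensor id, sample count, start time, end time, then rate and length statistics). That column reading is inferred from the printout, not from NIDAS documentation, and the function names are hypothetical.

```python
from datetime import datetime

# A few rows copied from the data_stats printout in this entry.
SAMPLE = """\
C20:/var/tmp/gps_pty0 20 30 28952 2012 09 27 20:00:00.613 09 27 23:59:59.735 2.01 0.085 6.849 51 73
C20:/dev/ttyS7 20 150 2925 2012 09 27 20:00:00.306 09 27 21:03:05.006 0.77 1.278 2.581 37 37
C20:/dev/ttyS8 20 200 286910 2012 09 27 20:00:00.021 09 27 23:59:59.900 19.92 0.003 5.850 12 12
C20:/dev/ttyS10 20 300 3772 2012 09 27 20:00:00.986 09 27 21:03:04.576 1.00 0.952 2.023 20 20
"""

def parse_line(line):
    """Split one row into its ids and the time of the last sample."""
    f = line.split()
    # The end time carries no year of its own; reuse the start year (f[4]).
    last = datetime.strptime(f"{f[4]} {f[8]} {f[9]} {f[10]}",
                             "%Y %m %d %H:%M:%S.%f")
    return {"device": f[0], "dsm": int(f[1]), "sensor": int(f[2]),
            "nsamples": int(f[3]), "last": last}

def early_stoppers(text, tolerance_s=60.0):
    """Return (dsm, sensor) ids whose last sample is well before the latest one."""
    rows = [parse_line(l) for l in text.splitlines() if l.strip()]
    latest = max(r["last"] for r in rows)
    return [(r["dsm"], r["sensor"]) for r in rows
            if (latest - r["last"]).total_seconds() > tolerance_s]

print(early_stoppers(SAMPLE))
```

On these rows it reports ids 20,150 and 20,300, the TRH and the barometer noted above. The 60 second tolerance is an arbitrary choice that allows for slow sensors.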
The 1.5m TRH (TRH34) resumed reporting 7.5 hours later on Sep 28 04:29 UTC (Sep 27 22:29 MDT). I believe this was a miraculous resurrection, with no human intervention.
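The UTC and MDT times quoted in these entries can be cross-checked with a few lines of Python. This is only a sketch, assuming a fixed UTC-6 offset for MDT (daylight time was in effect on these dates); the helper name `to_mdt` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Fixed offset for Mountain Daylight Time (assumption: no DST change in the period).
MDT = timezone(timedelta(hours=-6), "MDT")

quit_utc = datetime(2012, 9, 27, 21, 3, 5, tzinfo=UTC)   # sensors stopped
resume_utc = datetime(2012, 9, 28, 4, 29, tzinfo=UTC)    # TRH34 resumed

def to_mdt(t):
    """Format a timezone-aware time in MDT."""
    return t.astimezone(MDT).strftime("%Y-%m-%d %H:%M:%S %Z")

print(to_mdt(quit_utc))       # 2012-09-27 15:03:05 MDT
print(to_mdt(resume_utc))     # 2012-09-27 22:29:00 MDT
print(resume_utc - quit_utc)  # 7:25:55
```

The gap comes out at about 7 h 26 min, consistent with the "7.5 hours" quoted above.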
This morning three TRHs are reporting at C:
    data_dump -i 20,-1 | fgrep TRH
    2012 09 29 16:02:03.8064 0.01448 20, 110 35 TRH19 18.57 58.28 0 0 1460 114 0\r\n
    2012 09 29 16:02:04.1965 0.005246 20, 250 38 TRH28 17.02 63.57 33 0 1420 125 104\r\n
    2012 09 29 16:02:04.2564 0.01515 20, 150 35 TRH34 18.84 56.44 0 0 1468 110 0\r\n
These are the 1m, 1.5m and 2.5m TRHs. The 0.5m is not reporting. From a run of data_stats on the C20_20120928_120000.dat archive file, it quit on Sep 28, 14:54 MDT.
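The check for which TRH levels are reporting can be sketched in Python by pulling the sensor id and serial number out of each data_dump line. This assumes the line layout shown above, with the id printed as "dsm, sensor" and the serial number embedded in the `TRHnn` token. The `HEIGHTS` mapping restates the C tower ids listed in the TRH serial number table elsewhere in this log, and all names in the code are hypothetical.

```python
import re

# C tower TRH ids -> height, as listed in the TRH serial number table in this log.
HEIGHTS = {(20, 60): "0.5m", (20, 110): "1m", (20, 150): "1.5m", (20, 250): "2.5m"}

# Lines copied from the data_dump output in this entry (raw string keeps the literal \r\n).
DUMP = r"""
2012 09 29 16:02:03.8064 0.01448 20, 110 35 TRH19 18.57 58.28 0 0 1460 114 0\r\n
2012 09 29 16:02:04.1965 0.005246 20, 250 38 TRH28 17.02 63.57 33 0 1420 125 104\r\n
2012 09 29 16:02:04.2564 0.01515 20, 150 35 TRH34 18.84 56.44 0 0 1468 110 0\r\n
"""

# "dsm, sensor", a length field, then the TRHnn token.
REC = re.compile(r"\s(\d+),\s*(\d+)\s+\d+\s+TRH(\d+)\s")

def serials(dump_text):
    """Map (dsm, sensor) id -> TRH serial number seen in the dump."""
    return {(int(m.group(1)), int(m.group(2))): int(m.group(3))
            for m in REC.finditer(dump_text)}

found = serials(DUMP)
missing = [h for key, h in HEIGHTS.items() if key not in found]
print(found)
print("not reporting:", missing)
```

With the three lines above, the only level left in `missing` is 0.5m.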
Sept 29
A lightning strike on the afternoon of Sept 27 knocked out power to the base trailer and damaged equipment on the 20m main tower.
In the real-time archive, the DSMs at the main tower ceased reporting on Sep 27 21:03:05 UTC (15:03:05 MDT).
Station 12 also quit reporting at that time. A TRH and barometer at C20 and a TRH at Ah6 also quit at that time.
Kurt restored power mid-morning Sept 28 and we assessed damage to the main tower. Our best guess is that the lightning struck near the base trailer and a current surge flowed down the power cable to DSM M21, trashing the charge controller, the PC104 stack, and the power interface panel. Many of the sensors associated with M21 were found inoperative, but M21 appears to have acted as an expensive 'fuse', limiting damage to M22 and its associated sensors. Gordon wrote:
"The most 'striking' damage was in the battery box at the main NCAR tower, the power interface panel on the lower DSM at the tower, and to systems at the OSU tower. In general, the sensors and DSM higher on the main NCAR tower suffered less damage.
"The breaker in the conference center for power to the trailer was tripped, as was the breaker in the trailer which (I believe) is the circuit for power to the transformers at 'C' and 'M'. The breaker that was tripped in the trailer is the lowest one on the panel, and is labelled 'class' Kurt saw burn marks on one lug of a power cable connector at the trailer. I believe he said it was the socket at the end of the power cable from the conference center.
"All sensors except for one at the C site are reporting. Sodars are OK, I hear. The Picarro was not connected to AC. It apparently suffered damage via the serial cable from the main tower DSM. So the damage was on the systems connected to the AC circuit to the main and OSU tower. I think Tom and Kurt agree that it seems the lightning didn't strike either tower but the surge came in from the AC circuit.
"I like Kurt's suggestion that if a lightning storm threatens, that the power cables to the transformers be disconnected at the trailer. (Probabaly a good idea to throw the "class" breaker first). That will shut down the Sodars and Piccarro. The ISFS systems and sensors at C and M should still run from their batteries. I believe everything on the OSU tower is on battery?"
The sensor damage on the main tower was determined as in the following tables. Sensors marked NG (Not Good) were removed and transported to Boulder. We also found that one of the four sensors at each of the two soil sites was bad and brought down its mote if connected. I recall that the soil temperature probe was bad at Grass, but the connector labels at Cactus were inadequate to determine the bad sensor (not the TP01 in my recollection). The bad soil sensors were disconnected from the motes but not disinterred.
M21
| ht (m) | serial port | sensor | status |
|---|---|---|---|
| 0.5 | s1 | CSAT | ok |
| 0.5 | s2 | TRH | NG, fan running, high current (>1A), no data |
| | s3 | gps | ok |
| 1 | s5 | CSAT | ok |
| 1 | s6 | Licor | NG, blows fuses |
| 1 | s7 | PTB220 p | ok |
| 1.5 | s8 | TRH | NG, fan speed unsteady, no data |
| 2 | s9 | CSAT | NG, blows fuses |
| 2 | s10 | Licor | NG. No data |
| 2 | s11 | TRH | NG, 1.2A, no data |
| 3 | s12 | CSAT | ok |
| 3 | s13 | TRH | NG, 1.3A |
| 4 | s14 | CSAT | ok |
| 4 | s15 | TRH | NG. 0.1 A but no data, no fan |
| 5 | s16 | CSAT | NG 0.3 A, no data |
| 5 | " | kh2o/serializer | ? |
| 5 | s17 | Paroscientific p | NG. 0.05 A, no data, tried *0100P4/r/n init |
| | s18 | Picarro PC | NG |
M22
| ht (m) | Serial Port | Sensor | Status |
|---|---|---|---|
| 6 | S1 | TRH | NG 0.1 A, no fan, no data |
| 6 | S2 | Handar | ok |
| | S3 | gps | ok |
| 8 | S5 | TRH (replaced SHT) | bad data (-40C), replacement OK |
| 8 | S6 | Handar | ok |
| 10 | S7 | CSAT | ok |
| 10 | " | KH2O/serializer | NG. Raw data: 0xffff |
| 15 | S8 | TRH (replaced SHT) | bad data (-40C), replacement OK |
| 15 | S9 | Handar | ok |
| 20 | S10 | CSAT | ok |
| 20 | S11 | PTB 220 p | ok |
data_stats isfs_20120927_200000.dat.bz2 indicates that sensors at other sites quit reporting at 21:03:05, including two sensors at C: the 1.5m TRH and the barometer. The TRH resumed reporting on Sep 28 04:29 UTC.
Station 12 quit reporting at 21:03:05, but is now working after replacing the charging controller. However data from the TRHs at 12 via the bluetooth mote is not coming in.
0.5m TRH at Ah6 also stopped.
| Site | Serial Port | Sensor | Status |
|---|---|---|---|
| C20 | 7 | 1.5m TRH | quit, but resumed on Sep 28 04:29 UTC |
| C20 | 10 | PTB | NG no data |
| Ah6 | 1 | 0.5m TRH | NG, not reporting |
| Ap12 | btmote12 | 0.5 m TRH, 2m TRH | unknown status, no data from Bluetooth mote |
Other Damaged Equipment
- WIFI antenna (EantA) on main tower. Unknown status, probably dead.
- WIFI antenna (EantD) on OSU tower. Unknown status.
- USB disk drive (pocketec) on M21 DSM is not recognized by host systems. This contained 21 hours of data for Sep 27, but we have the same data that was received in real-time over the network.
- 5-port network switch in M21 is toast
Pocketec USB disk drive in Mu22, upper DSM on NCAR tower, is OK.
Gordon, Sep 29.
Ap12 had not worked well since it was deployed. It died early every morning due to low voltage. During the night the measured station voltage, Vdsm, would dive down to a cutoff at 11.3 V, from an apparently healthy 14 V when being charged during the day.
Replaced the battery on Sep 26. Also checked that the lugs on the controller were tight.
The station was dead the evening of Sep 27, so it was feared it had been damaged in the lightning storm. The measured voltage at the interface panel was about 3 V when the system was turned on, and 12 V when off.
Brought the system to the trailer and could not find anything wrong. Voltages all OK. Decided that the problem must be that the charge controller could not deliver sufficient power. Kurt replaced the controller and the system powered up at 16:48 MDT (per the times in the archive). It has run through the night.
Gordon, later on Sep 29:
Turns out that 12 was affected by the lightning. It quit reporting, both via network and to its local storage, right at the time of the strike, Sep 27, 21:03:05 UTC. After the charge controller was replaced, all sensors except the bluetooth mote serving the TRHs resumed reporting.
Sept 27
The CSAT serial numbers are found in the log files on each DSM.
| Station | 0.5m Handar | 1m CSAT | CSAT cal |
|---|---|---|---|
| 1 | ? | 0923 | 15aug12 |
| 2 | ? | 0833 | 29aug12 |
| 3 | ? | 0743 | 05jul12 |
| 4 | x | 1120 | 19jul12 |
| 5 | ? | 0732 | 09jul12 |
| 6 | ? | 0800 | 08dec11 |
| 7 | x | 0673 | 18jul12 |
| 8 | x | 0176 | 22aug12 |
| 9 | x | 1121 | 08dec11 |
| 10 | x | 0677 | 17jul12 |
| 11 | x | 0674 | 13aug12 |
| 12 | x | 0855 | 12sep12 |
| 13 | x | 0745 | 09jul12 |
| 14 | x | 1124 | 19jul12 |
| 15 | x | 1122 | 08dec11 |
| 16 | x | 0740 | 28aug12 |
| 17 | x | 0856 | 31jan12 |
| 18 | x | 0744 | 06jul12 |
| 19 | x | 0672 | 17jul12 |
C Tower
| Ht | Handar | CSAT | CSAT cal |
|---|---|---|---|
| 0.5 m | ? | x | x |
| 1 m | x | 0200 | 27jun12 |
| 2 m | x | 0197 | 03jul12 |
Main Tower
| Ht | Handar | CSAT | CSAT cal |
|---|---|---|---|
| 0.5 | x | | |
| 1 | x | | |
| 2 | x | | |
| 3 | x | | |
| 4 | x | | |
| 5 | x | | |
| 6 | ? | x | x |
| 8 | ? | x | x |
| 10 | x | | |
| 15 | ? | | |
| 20 | x | | |
Sept 27
Kurt and I installed NCAR barometer 0001 from the Manitou Beachon tower at station A3 today. We did not have a serial cable at the time, but Kurt will take one to the site this afternoon.
Sept 27:
Kurt and I installed CSAT 0800 from the Manitou Beachon tower at station 6. Unfortunately we do not have a CSAT cable, but Jielun will bring one from Boulder on Friday.
Serial TRHs at the Ah (1,2,3,5,6), C and M towers report their serial numbers in every data record, and so they can be displayed in real-time with rserial, or from the data archive with data_dump. At the Ah sites, sensor id N,40 is the 0.5m TRH and N,50 is the 2m TRH, where N is the station number:
    data_dump -i 1,40 -A isfs_20120927_080000.dat.bz2
    data_dump -i 1,50 -A isfs_20120927_080000.dat.bz2
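To check every serial TRH at the Ah sites, the same pair of commands can be generated for each station. The following Python sketch only builds the command lines, using the `-i` and `-A` options exactly as they appear above; the archive file name is the one from this entry, and the function name `trh_dump_commands` is hypothetical.

```python
ARCHIVE = "isfs_20120927_080000.dat.bz2"  # archive file named in this entry
AH_STATIONS = [1, 2, 3, 5, 6]             # the Ah sites with serial TRHs
TRH_IDS = {40: "0.5m", 50: "2m"}          # sensor id -> TRH height

def trh_dump_commands(archive=ARCHIVE):
    """Build one data_dump argument list per Ah station and TRH height."""
    cmds = []
    for n in AH_STATIONS:
        for sid in TRH_IDS:
            cmds.append(["data_dump", "-i", f"{n},{sid}", "-A", archive])
    return cmds

for cmd in trh_dump_commands():
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run` on a system where data_dump is installed; here the commands are only printed.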
TRHs that are sampled by Wisard motes report their serial numbers periodically in the Wisard message block. NIDAS processing, such as statsproc, logs the Wisard sensor serial numbers that it finds in the data archive. One can grep the output of statsproc for the string TRH. The 0.5m TRH reports as sensorType 0x10, the 2 m as 0x11.
    2012-09-27,09:16:36|INFO|A4:/dev/ttyS1: 2012 09 20 21:34:05.932, mote=4, sensorType=0x10 SN=52, typeName=TRH
    2012-09-27,09:16:36|INFO|A4:/dev/ttyS1: 2012 09 20 21:34:05.932, mote=4, sensorType=0x11 SN=59, typeName=TRH
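As an alternative to grep, the serial numbers can be extracted from such log lines with a short Python script. This is a sketch that assumes the message format matches the two lines shown above; the sensorType-to-height mapping is the one stated in this entry, and the names `trh_serials`, `LEVEL` and `PAT` are hypothetical.

```python
import re

# Log lines copied from the statsproc output in this entry.
LOG = """\
2012-09-27,09:16:36|INFO|A4:/dev/ttyS1: 2012 09 20 21:34:05.932, mote=4, sensorType=0x10 SN=52, typeName=TRH
2012-09-27,09:16:36|INFO|A4:/dev/ttyS1: 2012 09 20 21:34:05.932, mote=4, sensorType=0x11 SN=59, typeName=TRH
"""

# sensorType -> TRH height, as described in this entry.
LEVEL = {0x10: "0.5m", 0x11: "2m"}

PAT = re.compile(
    r"\|(\w+):/dev/\S+ .*?mote=(\d+), sensorType=(0x[0-9a-fA-F]+) SN=(\d+), typeName=TRH"
)

def trh_serials(log_text):
    """Map (dsm name, height) -> TRH serial number found in the log."""
    found = {}
    for line in log_text.splitlines():
        m = PAT.search(line)
        if m:
            dsm = m.group(1)
            stype = int(m.group(3), 16)
            # Unknown sensor types are kept under their hex code.
            found[(dsm, LEVEL.get(stype, hex(stype)))] = int(m.group(4))
    return found

print(trh_serials(LOG))
```

For the two lines above this yields serial 52 at 0.5m and 59 at 2m for A4.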
The TRH serial numbers of the initial deployment on Sep 20/21 are as follows. Where a unit was later swapped, the UTC date and time of the swap and the new serial number follow the original one, as in "date time=SN". The ids of the TRHs on the C and M towers are also shown.
| site | SN at 0.5m | SN at 2m |
|---|---|---|
| 1 | 2 | 9 |
| 2 | 14 | 17 |
| 3 | 24, Nov 7 20:27=48, Nov 12 20:00=52 | 51 |
| 4 | 52, Oct 6 18:57=56 | 59 |
| 5 | 15 | 11 |
| 6 | 21 | 16 |
| 7 | 58 | 47 |
| 8 | 50 | 39 |
| 9 | 26 | 49 |
| 10 | 8, Oct 6 19:02=38 | 33 |
| 11 | 64 | 63 |
| 12 | 66 | 23 |
| 13 | 32 | 27 |
| 14 | 40 | 31 |
| 15 | 62 | 41 |
| 16 | 54 | 57 |
| 17 | 68, Oct 6 19:47=60 | 43 |
| 18 | 42 | 55 |
| 19 | 46 | 65, Oct 9 16:04=7 |
| tower | id | SN |
|---|---|---|
| C 0.5m | 20,60 | 7, Oct 3 17:33=25 |
| C 1m | 20,110 | 19 |
| C 1.5m | 20,150 | 34 |
| C 2.5m | 20,250 | 28 |
| M 0.5m | 21,60 | 13, Oct 3 00:49=5 |
| M 1.5m | 21,150 | 37, Oct 3 00:49=22 |
| M 2.5m | 21,250 | 12, Oct 3 00:49=44 |
| M 3m | 21,310 | 3, Oct 3 00:49=18 |
| M 4m | 21,410 | 6, Oct 3 00:49=1 |
| M 6m | 22,600 | 4, Oct 3 14:44=35 |
| M 8m | 22,800 | 20, Oct 3 14:44=53 |
| M 15m | 22,1500 | 10, Oct 3 14:44=29 |
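The swap notation in the SN columns ("initial SN, date time=new SN, ...") can be turned into a lookup of which unit was installed at a given time. The Python sketch below assumes the year is 2012 for every entry, that times are UTC, and that swaps are listed in chronological order, as they are in the tables above. The function names are hypothetical, and `%b` relies on English month abbreviations.

```python
from datetime import datetime

def parse_cell(cell, year=2012):
    """Parse 'SN, Mon D HH:MM=SN, ...' into a list of (start_time, serial).

    The first entry has start_time None, meaning 'from initial deployment'.
    """
    parts = [p.strip() for p in cell.split(",")]
    history = [(None, int(parts[0]))]
    for part in parts[1:]:
        when, sn = part.split("=")
        t = datetime.strptime(f"{year} {when}", "%Y %b %d %H:%M")
        history.append((t, int(sn)))
    return history

def serial_at(cell, when, year=2012):
    """Return the serial number in service at time 'when' (naive UTC)."""
    current = None
    for t, sn in parse_cell(cell, year):
        if t is None or t <= when:
            current = sn
    return current

site3 = "24, Nov 7 20:27=48, Nov 12 20:00=52"
print(serial_at(site3, datetime(2012, 9, 25)))   # 24
print(serial_at(site3, datetime(2012, 11, 10)))  # 48
print(serial_at(site3, datetime(2012, 11, 13)))  # 52
```

A cell with no swaps, such as "51", parses to a single entry and always returns that serial number.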
The 0.5m TRH on 19, serial number 46, has failed. It worked initially after deployment from 9/20 16:00 MDT til 9/22 06:30 MDT, and then didn't report for 2 hours, then ran for 16 hours, then didn't report for 8.5 hours. It was removed from A19 yesterday, 9/26.
Sept 26, 2012
Gordon noticed a power dropout at stn 12 during the night. He replaced the 12 V battery.
Sept 26, 2012
Gordon and I investigated three stations with a bad TRH:
Stn 5: The TRH on serial port 5 at 2.5m was dead on arrival; no fan running; it had blown the 12V power fuse. We replaced it with a spare unit but used the original SHT sensor. Returned bad TRH to the base.
Stn 19: The TRH with mote id 10 at 0.5m was bad (but fan running). Tried replacing SHT but did not help. Returned bad TRH to base with original SHT.
Tower C: The TRH on serial port 6 at 1m had a bad cable. Replaced cable.
Today:
- Sonics 672 and 1123 were FedEx'd from Boulder to Campbell, after determining (through swapping head/electronics) that 1123's head was bad (and we knew that 672's head was loose). Ed thought it was possible that these could be returned by mid next week.
- Chris and Lisa removed sonics from the 8m and 30m levels at MFO (sorry, Ned) and brought them back to Boulder. Kurt will take them to SCP tomorrow to (finally) install A6 and replace A17. We'll look at the data to see if sending A17 back to Campbell is justified.
- While at MFO, Chris and Lisa also replaced a PTB220 barometer with the MPL solid-state barometer. Kurt also will take the PTB220 to SCP tomorrow to install at A3. I verified that data were coming in from the MPL. Note that I deleted the reference to a qc_file P.dat in manitou.xml (only on this DSM) since the default $QC didn't point to it and thus produced NANs.
SCP status 9/25 & 9/26 (rad & soil)
I cycled through the DSMs and used rserial to examine each serial port.
Summary:
Known problem with Emerald card on A3.
Missing barometer at A3
TRH not reporting: s5 on A5 (no dc power on s5) & S6 on C20.
TRH data is nan, s1 on A19.
CSAT missing on A6
Ah1 s1 trh ok
s2 csat ok (esc-h)
s5 trh ok
s6 handar ok
s7 power ok (mote_dump 1,35)
Ah2 s1 trh ok
s2 csat ok
s5 trh ok
s6 handar ok
s7 power ok
Aph3 s1 p ** no data **
s2 csat ok
s5 trh ** emerald problem **
s6 trh ** emerald problem **
s7 handar ** emerald problem **
A4 s1 trh 10 ok (mote_dump 4,4)
s1 trh 11 ok
s1 power ok
s2 csat ok
Ah5 s1 trh ok
s2 csat ok
s5 trh ** not reporting **
s6 handar ok
s7 power ok
Ah6 s1 trh ok
s2 ** csat missing **
s5 trh ok
s6 handar ok
s7 power ok
A7 s1 trh 10 ok (mote_dump 4,4)
s1 trh 11 ok
s1 power 13.8V
s2 csat ok
Ap8 s1 p ok
s2 csat ok
bt trh 10 ok
bt trh 11 ok
Ap9 s1 p ok
s2 csat ok
bt trh 10 ok
bt trh 11 ok
Ap10 s1 p ok
s2 csat ok
bt trh 10 ok
bt trh 11 ok
bt power 13.8V
Ars11 s2 csat ok
bt trh 10 ok
bt trh 11 ok
bt power 13.2V
bt radiation ok (40)
bt wetness ok (40)
bt soil grass ok (41)
bt soil cactus ok (42)
Ap12 s1 p ok
s2 csat ok
bt trh 10 ok
bt trh 11 ok
bt power 13.5V
A13 s1 trh 10 ok
s1 trh 11 ok
s1 power 14.0V
s2 csat ok
Ap14 s1 p ok
s2 csat ok
bt trh 10 ok
bt trh 11 ok
bt power 13.8V
A15 s1 trh 10 ok
s1 trh 11 ok
s1 power 14.0V
s2 csat ok
A16 s1 trh 10 ok
s1 trh 11 ok
s1 power 14.0V
s2 csat ok
A17 s1 trh 10 ok
s1 trh 11 ok
s1 power 13.5V
s2 csat ok
A18 s1 trh 10 ok
s1 trh 11 ok
s1 power 13.8V
s2 csat ok
A19 s1 trh 10 ** nan **
s1 trh 11 ok
s1 power 13.9V
s2 csat ok
C20 s1 handar ok
s2 trh ok
s5 csat ok
s6 trh ** not reporting **
s7 trh ok
s8 csat ok
s9 trh ok
s10 p ok
M21 s1 csat ok
s2 trh ok
s5 csat ok
s6 licor ok
s7 p ok
s8 trh ok
s9 csat ok
s10 licor ok
s11 trh ok
s12 csat ok
s13 trh ok
s14 csat ok
s15 trh ok
s16 csat ok
s17 p ok
s18 Picarro ** not reporting **
M22 s1 trh ok
s2 handar ok
s5 trh ok
s6 handar ok
s7 csat ok
s8 trh ok
s9 handar ok
s10 csat ok
s11 p ok
Sensor I.D.'s for the main tower:
20m CSAT 0741
PRES B4
15m Handar 678
TRH 10
10m CSAT 1119
KH2O 1393
8m Handar 0370003
TRH 20
6m Handar 1528
TRH 4
5m CSAT 0671
KH2O 1389
Micro Baro ?
4m CSAT 0738
TRH 6
3m CSAT 0540
TRH ?
2.5m TRH 12
2m CSAT 0538
Licor 1163
1.5m TRH 37
1m CSAT 1117
Licor 1167
PRES B7
.5m CSAT 1455
TRH 13
Sept 24:
Five sites had bad/missing/no-serial-cable sonics:
A6 - missing sonic: no change
A12 - bad sonic: replaced S/N 1123 with S/N 0855
A17 - (formerly) bad sonic: seems to have fixed itself for the moment, no action taken
A19 - missing sonic: installed S/N 0178
M, serial port 12: missing cable: installed new cable
Kurt is taking 1123 back to Boulder for Chris to test and likely ship to Campbell
Thus we need only one good sonic to install at A6