The PTB210 at s1 was working up until 2020-04-12,15:23:03, except of course for periods when s1 was probably dead. There are a few messages after that, until 2020-04-16,01:29:11, and then they stop. Since the s9 DSM swap, nothing has been received from the PTB.
The PTB was set for 7E1, so I've tried connecting to it on s9 with minicom in 7E1, but there is still no response. Maybe a blown fuse, or maybe the PTB itself is dead. The data messages received in 7E1 mode are recoverable with some code changes to NIDAS.
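For reference, here is a minimal Python/pyserial sketch of the same 7E1 check that minicom does; the device path and 9600 baud are assumptions, not confirmed settings for this PTB:

```python
# Minimal 7E1 read test (sketch). /dev/ttyUSB0 and 9600 baud are
# placeholders; use whatever port and rate this PTB is actually on.
import serial  # pyserial

port = serial.Serial(
    "/dev/ttyUSB0",
    baudrate=9600,
    bytesize=serial.SEVENBITS,
    parity=serial.PARITY_EVEN,
    stopbits=serial.STOPBITS_ONE,
    timeout=5,
)
try:
    while True:
        line = port.readline()
        if not line:
            print("no response within timeout")
            break
        print(line.decode("ascii", errors="replace").strip())
finally:
    port.close()
```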
When Leila swapped the s9 DSM in at s1, I discovered that the disdrometer messages were broken. The quick summary is that the eeprom got erased, leading to these questions (details follow below).
- Can we take up this problem with Ott Hydromet?
- Any clues as to what could be causing the Parsivel2 to lose its memory?
- Is voltage or current supply borderline for reliable operation?
It looks like the first data file at setup for s1 is s1_20200325_120000.dat. The disdrometer was working as of 2020-03-25,17:49:13. I'm guessing the site was set up that day, and for whatever reason the disdrometer data messages start there, without the boot messages. Probably the time was not synchronized until then.
The messages appeared to be fine, including reporting the serial number 450620:
It booted up again 2020-03-26,01:35:11, for reasons unknown. It reported one good message, then started rebooting and reporting "POWERSUPPLY TEST FAILED !!!".
Eventually it starts repeating the messages "ERROR: No Valid Serial Number found !!!" and "ERROR: No Valid Hardware info found !!!".
It keeps reporting the "No Valid Hardware info" messages until 2020-03-27,01:50, then some noise, then nothing until 20:13, when it starts reporting the default messages with the serial number of XXXXXXXX:
There are still some reboot messages and more error messages later on, so it's not as if the disdrometer has stabilized again and is merely missing its eeprom. Either way it's in a broken state, and I don't think this is the only one to have had this kind of problem.
For the moment, I have modified the NIDAS config to parse the messages but skip the serial number field. However, that is not a fix since the whole configuration of the data messages has been lost, and we don't know if losing the hardware info and any other eeprom settings makes the data useless.
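For what it's worth, the workaround amounts to positional parsing that tolerates a bad serial number. A rough Python sketch of the idea, assuming a semicolon-delimited telegram and a made-up position for the serial-number field:

```python
# Sketch: parse a semicolon-delimited Parsivel2 telegram while tolerating
# a lost serial number. SN_FIELD = 2 is a placeholder index, not the
# actual position in our configured telegram.
SN_FIELD = 2
EXPECTED_SN = "450620"

def parse_telegram(msg: str) -> list[str]:
    fields = msg.strip().split(";")
    if fields[SN_FIELD] != EXPECTED_SN:
        # eeprom loss shows up as the default "XXXXXXXX" serial number;
        # flag it but keep the rest of the record.
        print(f"warning: unexpected serial number {fields[SN_FIELD]!r}")
    return fields[:SN_FIELD] + fields[SN_FIELD + 1:]
```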
A week ago, Leila and Charles visited Site 1 to find the DSM inop. For simplicity's sake, this past Sunday (4/26) I had them replace the s1 DSM with the unused s9 DSM. Nothing was swapped between the two, so s9 has its original cell modem and SD card.
eth0 - 192.168.1.209 (DSM address)
eth1 - 126.96.36.199 (cell modem address)
Just a quick note that s8 has routinely seen winds >20 m/s, with the highest 5-min average of 27 m/s on 17 Mar.
s15 is the next windiest, up to 17 m/s, though I suspect that s17 would have been high as well if it were being recorded.
The pattern is not surprising, though the magnitudes are rather high for what ISFS normally sees.
It seems that the s8 CSAT sometimes misses these events – is this the high-wind-speed error that Sean saw several years ago?
Still in idling mode after being pulled from set-up last month for COVID.
s1 - Barometer not reporting (configuration?). Station only comes up when batteries are fully charged. (Had worked ok initially after setup.) Sometimes files are not opened. Most of the time the time stamp is bad.
P.S. The site was visited by Leila and Charles on 18 Apr. They replaced the batteries and Victron, which brought power up to the DSM and sensors, but the DSM still hasn't come up. This suggests either a fault or a wrong setting (mode 4 instead of mode 3?) of the old Victron. Dan thinks the next step is a DSM change, but the DSM was swaged closed so the PIs couldn't get into the box. A DSM issue is odd, though, since it was working last week.
s3 - TRH died in rain on 6 Apr. Otherwise ok
s4 - ok
s8 - EC150 never installed. GPS not receiving most messages as of 14 Mar(!). (At about 13 Mar 23:30, nsat dropped from 10 to 7, then dropped further to 0 at about 14 Mar 02:30.) Otherwise ok
s10 - ok
s14 - Mote data never worked properly; last data on 3 Apr. The cable from the DSM to the mote was found to have water in it during the site visit by the PIs on 11 Apr, but they didn't have a spare. Barometer highly intermittent (but Pirga ok). Otherwise ok.
s15 - ok
s17 - Site pretty much never worked. Just a few hours of data early on. Last data 26 Mar. From a log snapshot taken when the station was last up, it seemed to be a DSM USB issue.
s1: not reporting. Last message (18 Mar) missing P, Vbatt was okay.
s3: all working
s4: all working
s8: ec150 not installed
s10: all working
s14: P, TRH, mote all down (mote has a lot of 0x00 characters before message)
s15: Qsoil needed power cycle
s17: not reporting, suspect DSM usb issue. Last message (13 Mar) had bad TRH fan, RH questionable, missing Ott, missing TP01 (might just have been timing, since prior message was okay), Vbatt was okay
So... we want to offer a dataset to the PIs in geo coordinates. Speaking with Kurt, he is confident that the tripods at each site were oriented with a compass to make the csat point out from the mast at an angle of 315 deg (NW), to within about 2 degrees. I have thus entered Vazimuth = 315 - 180 - 90 = 45 into the cal files for s1, s3, s4, s8, s10, s14, s15, and s17.
Dan told me that the orientation of the Gill 2D could be any multiple of 90 degrees from the csat orientation. By creating a scatterplot of each site's csat vs Gill directions, I verified this and entered the appropriate multiple + 45 into the cal files as well (see the sketch below). Running statsproc with noqc_geo produces dir=Dir now, so I think we're close enough for an unsupported project.
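For the record, here is a Python sketch of that logic; the function and its inputs are invented for illustration, not code we actually ran:

```python
import numpy as np

# Boom points 315 deg (NW), so Vazimuth = 315 - 180 - 90 = 45 deg.
VAZIMUTH = 315 - 180 - 90

def gill_offset(dir_csat_geo, dir_gill_raw):
    """Pick the multiple of 90 deg that best aligns the Gill 2D
    directions with the csat directions (both in degrees)."""
    csat = np.asarray(dir_csat_geo)
    gill = np.asarray(dir_gill_raw)
    def mean_misfit(off):
        # wrap the angular difference into [-180, 180) before averaging
        return np.abs((gill + off - csat + 180.0) % 360.0 - 180.0).mean()
    return min(range(0, 360, 90), key=mean_misfit)
```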
If the teardown crew has nothing better to do, it would be nice to actually measure these angles...
I guess we can't leave this blog up in perpetuity without some explanation of what has happened in the last week!
Due to the world-wide Covid-19 coronavirus pandemic, all staff were recalled from the field. On 3/12, s13, which had been partially assembled but had never transmitted data, was removed, and the field crew started securing the base and Pod. On 3/13, Dan left, and Kurt and Clayton serviced TRHs at s8 and s10. On 3/14, Kurt and Clayton left the site as well.
This left s1, s3, s4, s8, s10, s14, s15, and s17 installed. The EC150 was never installed at s8. The barometer at s1 seems to be flaky. s17 connects very intermittently, presumably due to a USB issue in the DSM that is rebooting it frequently – the last data came through 13 Mar.
We will continue to let these run, perhaps with a bit of servicing by UCSB, until we are next cleared for travel. At that point, we will send out a tear-down crew to pull everything and wait for SWEX2021...
Logged in to see what's up. Steve fixed the udev rules, so now the pwrmon is reporting. I noticed in the dsm logs that statsproc from RELAMPAGO was still trying to run, so I disabled it and turned off the service. Steve rsynced the data files to barolo.
Looked at logs to see if I could figure out why it's been off the net so much. Looks like it's rebooting frequently due to USB problems:
Lots of these reboots appear in the logs. Interestingly, when the system reboots it seems to always come up with a time right around 01:17:05 of the current day, even if that means jumping back in time by minutes or hours.
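One way to quantify this would be to pull the apparent boot times out of the syslog, e.g. with a sketch like the following (the path and the Debian-style syslog format are assumptions):

```python
# Sketch: list apparent boot times by finding the kernel's time-zero
# line in a Debian-style syslog. Path and format are assumptions.
import re

BOOT = re.compile(r"^(\w{3}\s+\d+\s+[\d:]{8}).*kernel:\s*\[\s*0\.000000\]")

with open("/var/log/syslog") as log:
    for line in log:
        m = BOOT.match(line)
        if m:
            print("boot at", m.group(1))
```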
There were some other usb messages in the logs that didn't seem to trigger a reboot, but were still notable:
I saw this message for both port 1 and port 2 of usb 1-1.5.
Do we think this is the fault of one bad USB device, like the USB stick or cell modem? Is usb 1-1 the external hub? If Kurt and Dan paid a visit to try to get s17 online more reliably, would it be better to swap out the whole DSM so we can troubleshoot this one back in Boulder, or are we fairly confident that swapping out one component would fix it?
s17 is back down now, so I can't keep looking. Steve copied some of /var/log into /scr/tmp/oncley, but it didn't seem to get very far before the connection went down.
Also, Steve noted that ssh to isfs17.dyndns.org connects to s3 right now because the dyndns names haven't been updated, which is confusing.
The field crew waited out the rain this morning and then rushed to install s1 in the afternoon. Everything appears to be running except the barometer, no doubt due to the 7E1 problem.
Thus, we now have s1, s4, s8, s10, s14, and s15 reporting.
Also, s17 briefly came in this afternoon (22:29 - 22:37 UTC). Vmote values were generally reasonably high, indicating that the station has power.
Ott: all okay
TRH: s10 died today, s8 reporting fan not working, others okay
P: s1 not reporting – probably 7E1 issue (but can't log in to fix), others okay
CSAT: all okay
EC150: s8 bad, others okay
Gill 2D: all okay (but need to add to qctables)
Rad: all okay
Soils: all okay
Victron: s17 not reporting – probably usb rules setting (but can't log in to fix), others okay
mote: all okay, changed "sn" setting on s4 to report serial numbers
Isabel, Jacquie, and I have all worked to get the R/json-based webplots and qctables working. I just added the usual link to these in the top wiki page. Note the different qctables colors, which are hopefully easier to read!
Some things that still need work:
- labels for the 2D plot panels
- winds in geo coordinates
- reordering of plots and qctable data to get the station sequence 1-18 DONE (plots and qctable)
- placeholders for totally missing data in qctables DONE
- add Spd, Rfan to qctables DONE
Today, the crew appears to have installed s14. From the data:
Ott: working, but for some reason data aren't being parsed by barolo. The first character seems to be 0x00, sometimes followed by 0xff, before the good message. Other Otts don't have this.
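If we end up working around this in software rather than finding the hardware cause, stripping the leading junk bytes before parsing would be enough; a sketch (the message text is a placeholder):

```python
# Sketch: drop leading 0x00/0xff junk before handing an Ott message
# to the parser.
def strip_leading_junk(raw: bytes) -> bytes:
    return raw.lstrip(b"\x00\xff")

# placeholder message, not real Ott telegram content
assert strip_leading_junk(b"\x00\xffGOOD;MESSAGE\r\n") == b"GOOD;MESSAGE\r\n"
```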
Gill 2D: okay
Victron: not reporting
Kurt tells me that the bottom panel in the solar panel rack here has corrosion damage (presumably from being submerged during either CHEESEHEAD or VERTEX) and is presently unusable. Thus, this station is running on only one panel. The power estimate spreadsheet says that it will now take 11 days, rather than 3, to fully charge a dead station – clearly longer than we want. So we can expect this station to lose power in cloudy conditions. Obviously we can replace this panel or rack if there is a spare; I don't know from Kurt's description whether it will be possible to fix this on-site.
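For context, the charge-time estimate is simple arithmetic; the numbers below are illustrative placeholders chosen to roughly reproduce the spreadsheet's 3-day and 11-day figures, not its actual inputs:

```python
# Illustrative charge-time arithmetic; capacity, panel current, and
# load are placeholders, not the spreadsheet's actual values.
CAPACITY_AH = 200.0   # battery bank capacity
PANEL_A = 2.0         # average charging current per panel
LOAD_A = 1.25         # average station load

def days_to_charge(n_panels: int) -> float:
    net_a = n_panels * PANEL_A - LOAD_A
    return CAPACITY_AH / net_a / 24.0

print(days_to_charge(2))  # ~3 days with both panels
print(days_to_charge(1))  # ~11 days with one panel
```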