Blog

s1 PTB is down

The PTB210 at s1 was working up until 2020-04-12,15:23:03, except of course for periods when s1 was probably dead. There are a few messages after that, until 2020-04-16,01:29:11, and then they stop. Since the s9 DSM swap, nothing has been received from the PTB.

The PTB was set for 7E1, so I've tried connecting to it on s9 with minicom in 7E1, but there is still no response. Maybe a blown fuse, or maybe the PTB itself is dead.  The data messages that were received in 7E1 mode are recoverable with some code changes to NIDAS (see the sketch after the excerpt below).

2020 04 12 15:22:59.8795  0.7426  1,  22      10 \xb1\xb1\xb1\xb2.\xb26\x8d\n
2020 04 12 15:23:00.5341  0.6546  1,  22      10 \xb1\xb1\xb1\xb2.\xb26\x8d\n
2020 04 12 15:23:01.3770  0.8429  1,  22      10 \xb1\xb1\xb1\xb2.\xb26\x8d\n
2020 04 12 15:23:02.0969  0.7199  1,  22      10 \xb1\xb1\xb1\xb2.\xb2\xb7\x8d\n
2020 04 12 15:23:02.7773  0.6803  1,  22      10 \xb1\xb1\xb1\xb2.\xb2\xb7\x8d\n
2020 04 12 15:23:03.6169  0.8396  1,  22      10 \xb1\xb1\xb1\xb2.\xb2\xb7\x8d\n
2020-04-29,10:42:57|INFO|opening: s1_20200415_144143.dat
2020 04 15 14:41:44.7493 2.567e+05  1,  22      10 \xb1\xb1\xb13.0\xb2\x8d\n
2020-04-29,10:42:57|INFO|opening: s1_20200415_144322.dat
2020-04-29,10:43:03|INFO|opening: s1_20200416_012833.dat
2020 04 16 01:29:10.0350 3.885e+04  1,  22      10 \xb1\xb109.5\xb2\x8d\n
2020 04 16 01:29:10.5671  0.5322  1,  22      10 \xb1\xb109.5\xb2\x8d\n
2020 04 16 01:29:11.2509  0.6837  1,  22      10 \xb1\xb109.53\x8d\n
2020-04-29,10:43:03|INFO|opening: s1_20200416_012957.dat
2020-04-29,10:43:03|INFO|opening: s1_20200416_013538.dat
2020-04-29,10:43:03|INFO|opening: s1_20200416_013838.dat
2020-04-29,10:43:03|INFO|opening: s1_20200416_120000.dat
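
For reference, here is a minimal sketch of how such samples could be recovered in post-processing, assuming each raw byte is the 7-bit ASCII character with the even-parity bit left in bit 7 (strip_parity is just an illustrative helper; the real fix would go into NIDAS itself):

def strip_parity(raw: bytes) -> str:
    # Mask off bit 7 of every byte and decode the remaining 7-bit ASCII.
    return bytes(b & 0x7F for b in raw).decode("ascii", errors="replace")

# One of the samples above:
print(repr(strip_parity(b"\xb1\xb1\xb1\xb2.\xb26\x8d\n")))   # '1112.26\r\n'
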
s1 disdrometer is broken

When Leila swapped the s9 DSM in for s1, I discovered that the disdrometer messages were broken. The quick summary is that the EEPROM got erased, which raises these questions (details follow):

  • Can we take up this problem with Ott Hydromet?
  • Any clues as to what could be causing the Parsivel2 to lose its memory?
  • Is voltage or current supply borderline for reliable operation?

It looks like the first data file at setup for s1 is s1_20200325_120000.dat. The disdrometer was working as of 2020-03-25,17:49:13. I'm guessing the site was set up that day, and for whatever reason the disdrometer data messages start there, without the boot messages. Probably the time was not synchronized until then.

[isfs@barolo raw_data]$ data_stats -a -i 1,-1 s1_20200325_120000.dat 
2020-04-29,09:39:54|NOTICE|parsing: /h/eol/isfs/isfs/projects/SWEX/ISFS/config/swex.xml
Exception: EOFException: s1_20200325_120000.dat: open: EOF
sensor                           dsm sampid    nsamps |------- start -------|  |------ end -----|    rate        minMaxDT(sec) minMaxLen
s1:/dev/ttyDSM1                    1      8       371 2020 03 25 17:49:13.254  03 25 23:59:10.090    0.02      57.007   60.021 4142 4142
s1:/dev/gps_pty0                   1     10     44573 2020 03 25 17:48:16.892  03 25 23:59:59.024    2.00      -2.133    1.253   59  147
s1:/var/log/chrony/tracking.log    1     18      2654 2020 03 25 17:48:16.958  03 25 23:59:59.913    0.12       0.000 2578.788  100  100
s1:/dev/ttyDSM2                    1     20     22436 2010 02 01 00:00:57.185  03 25 23:59:59.275    0.00-1946506.446    1.342   42   44
s1:/dev/ttyDSM3                    1     22         0 ***********************  ******************     nan         nan      nan  nan  nan
s1:/dev/ttyDSM4                    1     40    446122 2010 02 01 00:00:57.159  03 25 23:59:59.974    0.00-1946507.325    0.493   60   60
s1:/dev/ttyPWRMONV                 1     60    445799 2010 02 01 00:00:56.308  03 25 23:59:59.422    0.00-1946506.527    1.172    3   90
s1:/dev/ttyDSM5                    1    100    223047 2020 03 25 17:48:17.032  03 25 23:59:59.976   10.00      -2.889    0.428   32   32
s1:/dev/ttyDSM7                    1 0x8000      4467 2020 03 25 17:48:20.498  03 25 23:59:57.411    0.20       1.616    5.689   26   60

The messages appeared to be fine, including reporting the serial number 450620:

2020 03 25 17:49:13.2549       0  1,   8    4142 450620;0000.000;00;20000;024;27025;00000;0;000;...

It booted up again at 2020-03-26,01:35:11, for reasons unknown. It reported one good message, then started rebooting and reporting "POWERSUPPLY TEST FAILED !!!".

2020-04-29,09:13:52|INFO|opening: s1_20200326_013511.dat
2020 03 26 01:35:11.6336   121.6  1,   8       3 \r\n
2020 03 26 01:35:11.6346 0.001042  1,   8      22 BOOTLOADER PARSIVEL\r\n
2020 03 26 01:35:12.9390   1.304  1,   8       3 \r\n
2020 03 26 01:35:12.9400 0.001042  1,   8      21 *** PARSIVEL 2 ***\r\n
2020 03 26 01:35:12.9518 0.01182  1,   8      20 OTT HYDROMET GMBH\r\n
2020 03 26 01:35:12.9631 0.01129  1,   8      22 COPYRIGHT (C) 2019 \r\n
2020 03 26 01:35:12.9755 0.01238  1,   8      18 VERSION: 2.11.6\r\n
2020 03 26 01:35:12.9843 0.008857  1,   8      17 BUILD: 2112151\r\n
2020 03 26 01:35:13.0045 0.02015  1,   8       3 \r\n
2020 03 26 01:35:13.3090  0.3045  1,   9      40          0          0      20000         15          0          0          3          0          0          0 
2020 03 26 01:35:13.3090       0  1,   8    4142 450620;0000.000;00;.....;\r\n
2020 03 26 01:35:28.0125    14.7  1,   8       4 \x80\r\n
2020 03 26 01:35:29.6423    1.63  1,   8      22 BOOTLOADER PARSIVEL\r\n
2020 03 26 01:35:30.9507   1.308  1,   8       3 \r\n
2020 03 26 01:35:30.9517 0.001042  1,   8      21 *** PARSIVEL 2 ***\r\n
2020 03 26 01:35:30.9635 0.01181  1,   8      20 OTT HYDROMET GMBH\r\n
2020 03 26 01:35:30.9748 0.01131  1,   8      22 COPYRIGHT (C) 2019 \r\n
2020 03 26 01:35:30.9858 0.01094  1,   8      18 VERSION: 2.11.6\r\n
2020 03 26 01:35:30.9961 0.01028  1,   8      17 BUILD: 2112151\r\n
2020 03 26 01:35:31.0044 0.008336  1,   8       3 \r\n
2020 03 26 01:35:35.0523   4.048  1,   8      31 \xf8POWERSUPPLY TEST FAILED !!!\r\n

Eventually it starts repeating the messages "ERROR: No Valid Serial Number found !!!" and "ERROR: No Valid Hardware info found !!!".

2020 03 26 01:38:57.7429 0.001042  1,   8      22 BOOTLOADER PARSIVEL\r\n
2020 03 26 01:38:59.0512   1.308  1,   8       3 \r\n
2020 03 26 01:38:59.0522 0.001042  1,   8      21 *** PARSIVEL 2 ***\r\n
2020 03 26 01:38:59.0641 0.01184  1,   8      20 OTT HYDROMET GMBH\r\n
2020 03 26 01:38:59.0740 0.009899  1,   8      22 COPYRIGHT (C) 2019 \r\n
2020 03 26 01:38:59.0879 0.01393  1,   8      40 VERSION: \xf8POWERSUPPLY TEST FAILED !!!\r\n
2020-04-29,09:13:52|INFO|opening: s1_20200326_120000.dat
2020 03 26 19:49:41.7501 6.544e+04  1,   8      42 ERROR: No Valid Serial Number found !!!\r\n
2020 03 26 19:49:41.7597 0.009589  1,   8      43 ERROR: No Valid Hardware info found !!!\r\r\n
2020 03 26 19:49:58.6922   16.93  1,   8      22 BOOTLOADER PARSIVEL\r\n
2020 03 26 19:50:00.0337   1.341  1,   8      42 ERROR: No Valid Serial Number found !!!\r\n
2020 03 26 19:50:00.0577 0.02407  1,   8      43 ERROR: No Valid Hardware info found !!!\r\r\n

It keeps reporting the "No Valid Hardware info" messages until 2020-03-27,01:50, then some noise, then nothing until 20:13, when it starts reporting the default messages with a serial number of XXXXXXXX:

2020 03 27 01:50:38.3070 0.001042  1,   8      22 BOOTLOADER PARSIVEL\r\n
2020 03 27 01:50:39.0419  0.7349  1,   8      42 \x00\xe0\x00\x00\x00\x00\x00\xff\x00\x00\xfe\x00POWERSUPPLY TEST FAILED !!!\r\n
2020 03 27 01:50:42.5119    3.47  1,   8      33 \x00\x00\x00POWERSUPPLY TEST FAILED !!!\r\n
2020 03 27 01:50:44.8753   2.363  1,   8      44 \x00\x00\x00\x00\x00\x00\x00\x0e\x00\xff\x00\xfe\x00\x00POWERSUPPLY TEST FAILED !!!\r\n
2020 03 27 01:50:47.5648    2.69  1,   8      63 \x00\x00\xff\x02\x00\xfe\x00\xff\x00\x00\x00\x00\x9d(\x80\x00\xff\x00\xff\x00\xfd\xff\x00\x00\xfe\x00\xff\x00\x00\xff\xff\x00\x00POWERSUPPLY TEST FAILED !!!\r\n
2020 03 27 01:50:51.8081   4.243  1,   8      49 \x00\x00\x00\x00\x00@\x14\x00\x00\x00\xfe\x00\x00\x00\x0e\x00\x00\xff\x00POWERSUPPLY TEST FAILED !!!\r\n
2020-04-29,09:13:56|INFO|opening: s1_20200327_120000.dat
2020 03 27 20:13:06.5367       0  1,   8      71 XXXXXXXX;0000.000;0000.00;00;-9.999;20000;0000.00;025;27028;00000;0;\r\n
2020 03 27 20:14:06.5447       0  1,   8      71 XXXXXXXX;0000.000;0000.00;00;-9.999;19320;0000.00;025;27018;00000;0;\r\n

There are still some reboot messages and more error messages later on, so it's not as if the disdrometer has stabilized and is merely missing its EEPROM contents. Either way it is in a broken state, and I don't think this is the only unit to have had this kind of problem.

For the moment, I have modified the NIDAS config to parse the messages but skip the serial number field. However, that is not a fix, since the whole configuration of the data messages has been lost, and we don't know whether losing the hardware info and other EEPROM settings makes the data useless.
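
As an illustration only (not the actual NIDAS mechanism), the workaround amounts to splitting the ';'-delimited telegram and ignoring the leading serial-number field, so records from the unit that now reports XXXXXXXX can still be scanned; the remaining fields are left unlabeled here, since the telegram configuration itself may have been lost along with the EEPROM:

def parsivel_fields(msg: str) -> list:
    # Split a ';'-delimited Parsivel2 telegram and drop the leading
    # serial-number field, keeping the rest for later scanning.
    return msg.strip().rstrip(";").split(";")[1:]

print(parsivel_fields("450620;0000.000;00;20000;024;27025;00000;0;000"))
print(parsivel_fields("XXXXXXXX;0000.000;0000.00;00;-9.999;20000;0000.00;025;27028;00000;0;"))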


A week ago, Leila and Charles visited Site 1 to find the DSM inop. For simplicity's sake, this past Sunday (4/26) I had them replace the S1 DSM with the unused S9 DSM. Nothing was swapped between the two, so S9 has its original cell modem and SD card.

S9 Comms

m59
eth0: 192.168.1.209 (DSM address)
eth1: 166.255.144.36 (cell modem address)


--

Dan

High winds

Just a quick note that s8 has routinely seen winds >20 m/s, with the highest 5-min average of 27 m/s on 17 Mar.  

s15 is the next windiest, up to 17 m/s, though I suspect that s17 would have been high as well if it were being recorded.

The pattern is not surprising, though the magnitudes are rather high for what ISFS normally sees.


It seems that the s8 CSAT sometimes misses these events – is this the high-wind-speed error that Sean saw several years ago?


Quickie status - 20 Apr

Still in idling mode after being pulled from set-up last month for COVID.

Sites deployed:

s1 - Barometer not reporting (configuration?).  Station only comes up when the batteries are fully charged.  (Had worked ok initially after setup.)  Sometimes files are not opened.  Most of the time the time stamp is bad.

P.S. The site was visited by Leila and Charles on 18 Apr.  They replaced the batteries and Victron, which brought power up to the DSM and sensors, but the DSM still hasn't come up.  This suggests either a fault or a wrong setting (mode 4 instead of mode 3?) of the old Victron.  Dan thinks the next step is a DSM change, but the DSM was swaged closed so the PIs couldn't get into the box.  A DSM issue is odd, though, since it was working last week.

s3 - TRH died in rain on 6 Apr.  Otherwise ok

s4 - ok

s8 - EC150 never installed.  GPS not receiving most messages as of 14 Mar(!).  (At that time, about 13 Mar 23:30, nsat dropped from 10 to 7, then further to 0 about 14 Mar 02:30.)  Otherwise ok

s10 - ok

s14 - Mote data never worked properly; last data on 3 Apr.  The cable from the DSM to the mote was found to have water in it during the site visit by the PIs on 11 Apr, but they didn't have a spare.  Barometer highly intermittent (but Pirga ok).  Otherwise, ok.

s15  - ok

s17 - Site pretty much never worked.  Just a few hours of data early on.  Last data 26 Mar.  From a log snapshot taken when the station was last up, it seemed to be a DSM USB issue.

Status - Mar 20

s1: not reporting.  Last message (18 Mar) missing P, Vbatt was okay.

s3: all working

s4: all working

s8: ec150 not installed

s10: all working

s14: P, TRH, mote all down (mote has a lot of 0x00 characters before message)

s15: Qsoil needed power cycle

s17: not reporting, suspect a DSM USB issue.  Last message (13 Mar) had a bad TRH fan, questionable RH, missing Ott, missing TP01 (might just have been timing, since the prior message was okay); Vbatt was okay


Wind directions

So... we want to offer a dataset to the PIs in geo coordinates.  Speaking with Kurt, I learned that he is confident the tripods at each site were oriented with a compass to make the csat point out from the mast at an angle of 315 deg (NW), to within about 2 degrees.  I have thus entered Vazimuth = 315 - 180 - 90 = 45 into the cal files for s1, s3, s4, s8, s10, s14, s15, and s17.

Dan told me that the orientation of the Gill 2D could be any multiple of 90 degrees from the csat orientation.  By creating a scatterplot of each site's csat vs Gill winds, I verified this and entered the appropriate multiple plus 45 into the cal files as well.  Running statsproc with noqc_geo produces dir=Dir now, so I think we're close enough for an unsupported project.
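
For the record, here is a minimal sketch of the rotation implied by those cal-file entries, under the assumption that Vazimuth is the compass azimuth of the sonic +v axis and that +u is 90 degrees clockwise from +v when viewed from above (the rotation statsproc applies from the cal files is the authoritative one):

import math

def geo_dir(u, v, vazimuth_deg):
    # Wind speed and meteorological direction (degrees the wind comes FROM)
    # from instrument-frame u, v and the cal-file Vazimuth.
    spd = math.hypot(u, v)
    to_dir = math.degrees(math.atan2(u, v)) + vazimuth_deg   # azimuth the wind blows toward
    return spd, (to_dir + 180.0) % 360.0

# Example with the value entered above, Vazimuth = 315 - 180 - 90 = 45:
print(geo_dir(1.0, 0.0, 45.0))   # (1.0, 315.0): under these assumptions, wind along +u comes from the NW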

IF the teardown crew has nothing better to do, it would be nice to actually measure these angles...

Status 19 Mar

I guess we can't leave this blog up in perpetuity without some explanation of what has happened in the last week!

Due to the worldwide COVID-19 pandemic, all staff were recalled from the field.  On 3/12, s13, which had been partially assembled but had never transmitted data, was removed, and the field crew started securing the base and Pod.  On 3/13, Dan left, and Kurt and Clayton serviced TRHs at s8 and s10.  On 3/14, Kurt and Clayton left the site as well.

This left s1, s3, s4, s8, s10, s14, s15, and s17 installed.  The EC150 was never installed at s8.   The barometer at s1 seems to be flaky.  s17 connects very intermittently, presumably due to a USB issue in the DSM that is rebooting it frequently –  the last data came through 13 Mar.

We will continue to let these run, perhaps with a bit of servicing by UCSB, until we are next cleared for travel.  At that point, we will send out a tear-down crew to pull everything and wait for SWEX2021...

s17 temporarily up

Logged in to see what's up. Steve fixed the udev rules so now the pwrmon is reporting. I noticed in the dsm logs that dsm statsproc from relampago was still trying to run, so I disabled it and turned off the service. Steve rsynced the data files to barolo.

Looked at logs to see if I could figure out why it's been off the net so much. Looks like it's rebooting frequently due to USB problems:

Mar  8 15:15:10 s17 kernel: [   42.553988] usb 1-1.5-port1: cannot disable (err = -71)
Mar  8 15:15:10 s17 kernel: [   42.555830] usb 1-1.5: Failed to suspend device, error -71
Mar  8 15:15:10 s17 kernel: [   42.562454] usb 1-1.5: USB disconnect, device number 118
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Mar  8 01:17:06 s17 kernel: [    0.000000] Linux version 4.9.35-v7+ (dc4@dc4-XPS13-9333) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611) ) #1014 SMP Fri Jun 30 14:47:43 BST 2017
Mar  8 01:17:06 s17 kernel: [    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
Mar  8 01:17:06 s17 kernel: [    0.000000] CPU: div instructions available: patching division code
Mar  8 01:17:06 s17 kernel: [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
Mar  8 01:17:06 s17 kernel: [    0.000000] OF: fdt:Machine model: Raspberry Pi 2 Model B Rev 1.1
Mar  8 01:17:06 s17 kernel: [    0.000000] cma: Reserved 8 MiB at 0x3a800000
Mar  8 01:17:06 s17 kernel: [    0.000000] Memory policy: Data cache writealloc
Mar  8 01:17:06 s17 kernel: [    0.000000] On node 0 totalpages: 241664
Mar  9 01:17:14 s17 kernel: [   16.952991] usb 1-1.5: Product: USB 2.0 Hub
Mar  9 01:17:14 s17 kernel: [   16.954917] hub 1-1.5:1.0: USB hub found
Mar  9 01:17:14 s17 kernel: [   16.955420] hub 1-1.5:1.0: 4 ports detected
Mar  9 01:17:14 s17 kernel: [   17.171205] hub 1-1.5:1.0: hub_ext_port_status failed (err = -71)
Mar  9 01:17:14 s17 kernel: [   17.172380] usb 1-1.5: Failed to suspend device, error -71
Mar  9 01:17:14 s17 kernel: [   17.226561] usb 1-1.5: USB disconnect, device number 28
Mar  9 01:17:14 s17 kernel: [   17.520553] usb 1-1.5: new full-speed USB device number 29 using dwc_otg
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^
Mar  9 01:17:06 s17 kernel: [    0.000000] Linux version 4.9.35-v7+ (dc4@dc4-XPS13-9333) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611) ) #1014 SMP Fri Jun 30 14:47:43 BST 2017
Mar  9 01:17:06 s17 kernel: [    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
Mar  9 01:17:06 s17 kernel: [    0.000000] CPU: div instructions available: patching division code
Mar  9 01:17:06 s17 kernel: [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
Mar  9 01:17:06 s17 kernel: [    0.000000] OF: fdt:Machine model: Raspberry Pi 2 Model B Rev 1.1
Mar  9 01:17:06 s17 kernel: [    0.000000] cma: Reserved 8 MiB at 0x3a800000
Mar  9 01:17:06 s17 kernel: [    0.000000] Memory policy: Data cache writealloc
Mar  8 15:19:10 s17 kernel: [   42.911045] 
Mar  8 15:19:10 s17 kernel: [   42.911078] ERROR::handle_hc_chhltd_intr_dma:2215: handle_hc_chhltd_intr_dma: Channel 3, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x06000021
Mar  8 15:19:10 s17 kernel: [   42.911078] 
Mar  8 15:19:10 s17 kernel: [   42.911112] ERROR::handle_hc_chhltd_intr_dma:2215: handle_hc_chhltd_intr_dma: Channel 4, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x06000021
Mar  8 15:19:10 s17 kernel: [   42.911112] 
Mar  8 15:19:10 s17 kernel: [   42.911181] ERROR::handle_hc_chhltd_intr_dma:2215: handle_hc_chhltd_intr_dma: Channel 7, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x06000021
Mar  8 15:19:10 s17 kernel: [   42.911181] 
Mar  8 15:19:10 s17 kernel: [   42.911214] ERROR::handle_hc_chhltd_intr_dma:2215: handle_hc_chhltd_intr_dma: Channel 5, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x06000021
Mar  8 15:19:10 s17 kernel: [   42.911214] 
Mar  8 15:19:10 s17 kernel: [   42.911247] ERROR::handle_hc_chhltd_intr_dma:2215: handle_hc_chhltd_intr_dma: Channel 2, DMA Mode -- ChHltd set, but reason for hMar  8 01:17:06 s17 kernel: [    0.000000] Booting Linux on physical CPU 0xf00
Mar  8 01:17:06 s17 kernel: [    0.000000] Linux version 4.9.35-v7+ (dc4@dc4-XPS13-9333) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611) ) #1014 SMP Fri Jun 30 14:47:43 BST 2017
Mar  8 01:17:06 s17 kernel: [    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
Mar  8 01:17:06 s17 kernel: [    0.000000] CPU: div instructions available: patching division code
Mar  8 01:17:06 s17 kernel: [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
Mar  8 01:17:06 s17 kernel: [    0.000000] OF: fdt:Machine model: Raspberry Pi 2 Model B Rev 1.1

Lots of these reboots in the logs. Interestingly, when the system reboots, it seems to always come up with a time right around 01:17:05 of the current day, even if that means jumping back in time by minutes or hours.

There were some other USB messages in the logs that didn't seem to trigger a reboot, but were still notable:

Mar  8 01:17:10 s17 kernel: [   13.663002] usb 1-1.5: Product: USB 2.0 Hub
Mar  8 01:17:10 s17 kernel: [   13.664314] hub 1-1.5:1.0: USB hub found
Mar  8 01:17:10 s17 kernel: [   13.664799] hub 1-1.5:1.0: 4 ports detected
Mar  8 01:17:11 s17 kernel: [   13.980630] usb 1-1.5.1: new full-speed USB device number 35 using dwc_otg
Mar  8 01:17:11 s17 kernel: [   13.982863] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.983375] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.983959] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.984481] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.984980] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.984988] usb 1-1.5-port1: Cannot enable. Maybe the USB cable is bad?
Mar  8 01:17:11 s17 kernel: [   13.985569] usb 1-1.5-port1: cannot disable (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.986115] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.986665] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.987144] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.987727] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.988208] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.988215] usb 1-1.5-port1: Cannot enable. Maybe the USB cable is bad?
Mar  8 01:17:11 s17 kernel: [   13.988765] usb 1-1.5-port1: cannot disable (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.989280] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.989862] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.990342] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.990997] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.991627] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.991637] usb 1-1.5-port1: Cannot enable. Maybe the USB cable is bad?
Mar  8 01:17:11 s17 kernel: [   13.992182] usb 1-1.5-port1: cannot disable (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.992779] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.993295] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.993872] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.994386] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.994945] usb 1-1.5-port1: cannot reset (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.994952] usb 1-1.5-port1: Cannot enable. Maybe the USB cable is bad?
Mar  8 01:17:11 s17 kernel: [   13.995477] usb 1-1.5-port1: cannot disable (err = -71)
Mar  8 01:17:11 s17 kernel: [   13.995515] usb 1-1.5-port1: unable to enumerate USB device
Mar  8 01:17:11 s17 kernel: [   13.996019] usb 1-1.5-port1: cannot disable (err = -71)

I saw this message for both port 1 and port 2 of usb 1-1.5.

Mar  8 01:17:40 s17 kernel: [   43.600792] ERROR::handle_hc_chhltd_intr_dma:2215: handle_hc_chhltd_intr_dma: Channel 0, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x06000001
Mar  8 01:17:40 s17 kernel: [   43.600792] 
Mar  8 01:17:40 s17 kernel: [   43.600848] ERROR::handle_hc_chhltd_intr_dma:2215: handle_hc_chhltd_intr_dma: Channel 7, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x06000001
Mar  8 01:17:40 s17 kernel: [   43.600848] 
Mar  8 01:17:40 s17 kernel: [   43.600907] ERROR::handle_hc_chhltd_intr_dma:2215: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x06000021
Mar  8 01:17:40 s17 kernel: [   43.600907] 
Mar  8 01:17:40 s17 kernel: [   43.600983] hub 1-1:1.0: hub_ext_port_status failed (err = -71)
Mar  8 01:17:40 s17 kernel: [   43.601058] ERROR::handle_hc_chhltd_intr_dma:2215: handle_hc_chhltd_intr_dma: Channel 4, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x06000001
Mar  8 01:17:40 s17 kernel: [   43.601058] 
Mar  8 01:17:40 s17 kernel: [   43.601100] ERROR::handle_hc_chhltd_intr_dma:2215: handle_hc_chhltd_intr_dma: Channel 6, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x06000001
Mar  8 01:17:40 s17 kernel: [   43.601100] 
Mar  8 01:17:40 s17 kernel: [   43.601144] ERROR::handle_hc_chhltd_intr_dma:2215: handle_hc_chhltd_intr_dma: Channel 0, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x06000001
Mar  8 01:17:40 s17 kernel: [   43.601144] 
Mar  8 01:17:40 s17 kernel: [   43.601192] usb 1-1-port5: cannot reset (err = -71)
Mar  8 01:17:40 s17 kernel: [   43.601254] ERROR::handle_hc_chhltd_intr_dma:2215: handle_hc_chhltd_intr_dma: Channel 7, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x06000001
Mar  8 01:17:40 s17 kernel: [   43.601254] 
Mar  8 01:17:40 s17 kernel: [   43.601294] ERROR::handle_hc_chhltd_intr_dma:2215: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x06000001
Mar  8 01:17:40 s17 kernel: [   43.601294] 
Mar  8 01:17:40 s17 kernel: [   43.601336] ERROR::handle_hc_chhltd_intr_dma:2215: handle_hc_chhltd_intr_dma: Channel 4, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x06000001
Mar  8 01:17:40 s17 kernel: [   43.601336] 
Mar  8 01:17:40 s17 kernel: [   43.601379] usb 1-1-port5: cannot reset (err = -71)


Do we think this is the fault of one bad USB device, like the USB stick or cell modem? Is usb 1-1 the external hub? If Kurt and Dan paid a visit to try to get s17 more reliably online, would it be better to swap out the whole DSM so we can troubleshoot this one back in Boulder, or are we fairly confident that swapping out one component would fix it?
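
To help answer the topology question the next time s17 is reachable, here is a minimal sketch (assuming a standard Linux sysfs layout on the Pi) that lists each USB device path with its vendor/product IDs and product string, which should show whether 1-1 and 1-1.5 are the hubs or one of the attached devices:

import glob, os

def read_attr(dev, name):
    # Read a sysfs attribute, returning "?" if it doesn't exist.
    try:
        with open(os.path.join(dev, name)) as f:
            return f.read().strip()
    except OSError:
        return "?"

for dev in sorted(glob.glob("/sys/bus/usb/devices/[0-9]*")):
    if ":" in os.path.basename(dev):
        continue   # skip interface entries like 1-1.5:1.0
    print(os.path.basename(dev),
          read_attr(dev, "idVendor"), read_attr(dev, "idProduct"),
          read_attr(dev, "product"))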

s17 is back down now, so I can't keep looking. Steve copied some of /var/log into /scr/tmp/oncley, but it didn't seem to get very far before the connection went down.

Also, Steve noted that ssh to isfs17.dyndns.org connects to s3 right now because the dyndns names haven't been updated, which is confusing.


Quickie status - Mar 11

s3 now up.  All sensors okay.

Quickie status - 10 Mar

The field crew waited out the rain this morning and then rushed to install s1 in the afternoon.  Everything appears to be running except the barometer, no doubt due to the 7E1 problem.

Thus, we now have s1, s4, s8, s10, s14, and s15 reporting.

Also, s17 briefly came in this afternoon (22:29 - 22:37 UTC). Vmote values were generally reasonably high, indicating that the station has power.

Ott: all okay

TRH: s10 died today, s8 reporting fan not working, others okay

P: s1 not reporting – probably 7E1 issue (but can't log in to fix), others okay

CSAT: all okay

EC150: s8 bad, others okay

Gill 2D: all okay (but need to add to qctables)

Rad: all okay 

Soils: all okay

Victron: s17 not reporting – probably a udev rules setting (but can't log in to fix), others okay

mote: all okay, changed "sn" setting on s4 to report serial numbers

Isabel, Jacquie, and I have all worked to get the R/json-based webplots and qctables working.  I just added the usual link to these in the top wiki page.  Note the different qctables colors, which are hopefully easier to read!


Some things that still need work:

  • labels for the 2D plot panels 
  • winds in geo coordinates
  • reordering of plots and qctable data to get the station sequence 1-18 DONE (plots and qctable)
  • placeholders for totally missing data in qctables DONE
  • add Spd, Rfan to qctables DONE


...to pick up my removal of ".tip" in the name of Rainr...

s14 status

Today, the crew appears to have installed s14.  From the data:

Ott: working, but for some reason data aren't being parsed by barolo.  The first character seems to be 0x00, sometimes followed by 0xff, before the good message.  Other Otts don't have this.

TRH: okay

P: okay

CSAT/EC150: okay

Gill 2D: okay

Rad: okay

Soils: okay

Victron: not reporting


s8 solar panel

Kurt tells me that the bottom panel in the solar panel rack here has corrosion damage (presumably from being submerged in either CHEESEHEAD or VERTEX) and is presently unusable.  Thus, this station is running on only one panel.  The power estimate spreadsheet says that it will now take 11 days, rather than 3, to fully charge a dead station – clearly longer than we want.  So we can expect this station to lose power in cloudy conditions.  Obviously, we can replace this panel or rack if there is a spare.  I don't know from Kurt's description whether it will be possible to fix on-site.