The iss3 DSM stopped reporting data around 17:15 UTC on 3/15. I had assumed it was turned off before ISS left Santa Barbara, so I didn't look into it, but that was not the case. Leila visited iss3 and power cycled the DSM on 3/21, and it is now reporting again. I don't know whether the DSM was unresponsive or whether nidas just stopped recording data.

Looking at the logs, there's our old friend the USB disconnect at 15:43:40 on 3/15:

Mar 15 15:43:40 localhost kernel: [157315.468311] usb 1-1.1: USB disconnect, device number 3
Mar 15 15:43:40 localhost kernel: [157315.468971] smsc95xx 1-1.1:1.0 eth0: unregister 'smsc95xx' usb-3f980000.usb-1.1, smsc95xx USB 2.0 Ethernet
Mar 15 15:43:40 localhost kernel: [157315.469180] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup

Everything looks like it usually does when this happens, except this message is new:

Mar 15 15:45:10 localhost rsyslogd-2007: action 'action 2' suspended, next retry is Sun Mar 15 15:46:40 2020 [try http://www.rsyslog.com/e/2007 ]
Mar 15 15:46:41 localhost rsyslogd-2007: action 'action 2' suspended, next retry is Sun Mar 15 15:48:11 2020 [try http://www.rsyslog.com/e/2007 ]
Mar 15 15:48:11 localhost rsyslogd-2007: action 'action 2' suspended, next retry is Sun Mar 15 15:49:41 2020 [try http://www.rsyslog.com/e/2007 ]
Mar 15 15:49:41 localhost rsyslogd-2007: action 'action 2' suspended, next retry is Sun Mar 15 15:51:11 2020 [try http://www.rsyslog.com/e/2007 ]

It repeats about every 90 seconds until the DSM was rebooted on 3/21.
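To check the retry spacing over the whole outage rather than by eye, something like the following Python sketch could be run over the syslog lines. The function name `retry_intervals` is made up for illustration, and the year has to be passed in because syslog timestamps omit it (2020 here, going by the retry time printed in the message itself).

```python
from datetime import datetime


def retry_intervals(lines, year=2020):
    """Return the seconds between consecutive rsyslogd 'suspended' messages.

    Assumes traditional syslog lines that start with a 15-character
    timestamp such as 'Mar 15 15:45:10'.
    """
    stamps = []
    for line in lines:
        if "rsyslogd" in line and "suspended" in line:
            # The timestamp has no year, so prepend the one supplied by the caller.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
            stamps.append(stamp)
    # Pair each timestamp with its successor and take the difference.
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
```

Fed the four messages quoted above, it returns intervals of 91, 90 and 90 seconds.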

Judging by the isfs log, the USB stick was read-only and the ethernet was also not working:

Mar 15 15:43:40 localhost dsm[1787]: EMERGENCY|SampleOutput: FileSet: /media/usbdisk/projects/SWEX/raw_data/dsm-iss3_%Y%m%d_%H%M%S.dat: IOException: /media/usbdisk/projects/SWEX/raw_data/dsm-iss3_20200315_120000.dat: write: Read-only file system
...
Mar 15 16:01:56 localhost dsm[1787]: WARNING|McSocketMulticaster: inet:0.0.0.0:0: IOException: inet:192.168.0.56:30002: send: Network is unreachable

So I'm guessing the DSM wouldn't have been reachable over ssh during that period, since eth0 was unregistered along with the USB disconnect, but I'm not sure.
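If someone can get a console on a DSM in this state, one quick check is whether the USB stick really is mounted read-only. Below is a minimal Python sketch that scans text in the `/proc/mounts` format for mount points carrying the `ro` option. The helper name `read_only_mounts` is hypothetical, and the device names in the usage comment are placeholders, not taken from iss3.

```python
def read_only_mounts(mounts_text):
    """Return mount points whose options include 'ro'.

    Expects text in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


# Usage on a live system (Linux only):
#   with open("/proc/mounts") as f:
#       print(read_only_mounts(f.read()))
```

If `/media/usbdisk` shows up in the result, the file system has been remounted read-only, which would match the "Read-only file system" error from the dsm process.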
