It has been pretty obvious from the plots that the sonic diagnostic flag is worse during the periods when the data are being "rsync"d back to Boulder. This is only for the CSAT3, not the CSAT3A/EC150. When I plot the high-rate data, I even find data gaps: 12 s for one period I looked at. Even during non-rsync times, the ldiag values are continuously non-zero. These should be pretty close to zero all the time.
I suspect that the CSAT3s are picking up RF interference from the cell phones. If it hasn't been done already, we should play with grounding the heads and boxes (after checking to see if the mechanical connections already do this). Andy knows the drill all too well.
If this persists, it isn't a huge problem, since bad samples should be rejected by software, but it would be nice to fix if we can.
4 Comments
Gordon Maclean
Sep 16, 2016
Since CSAT3_CKCNTR is true (set in isfs_env.sh and datasets.xml on barolo, where statsproc is running), missing UDP samples will cause the diag to be non-zero. So it could be due to network congestion, not RF interference. We can check this by running statsproc on the rsync'd data files and seeing whether ldiag has the same signature (or by setting CSAT3_CKCNTR to false).
I've wondered if the dropouts are worse on Vipers rather than Titans (forgot which stations were which).
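The counter check described above can be illustrated with a minimal Python sketch. It is not the NIDAS code: the function name, the modulus of 64 (assuming a 6-bit sample counter) and the way the flag is OR-ed into the existing diag value are all assumptions for illustration. The flag value 16 is the diag value reported later in this thread.

```python
def flag_counter_skips(counters, diags, modulus=64, skip_flag=16):
    """Return a copy of diags with skip_flag set wherever the counter jumps.

    counters: per-sample sequence counter values (assumed to wrap at modulus)
    diags:    per-sample diagnostic values before the counter check
    """
    out = list(diags)
    for i in range(1, len(counters)):
        # The counter should advance by exactly one, wrapping at modulus.
        expected = (counters[i - 1] + 1) % modulus
        if counters[i] != expected:
            out[i] |= skip_flag
    return out


# Hypothetical counter sequence: a clean wrap from 63 to 0, then three
# samples lost between counter values 1 and 5.
counters = [61, 62, 63, 0, 1, 5, 6]
diags = [0] * len(counters)
flagged = flag_counter_skips(counters, diags)
print(flagged)  # [0, 0, 0, 0, 0, 16, 0]
```

With a check like this, a sample dropped anywhere between the sensor and the processing host raises the diag value even though the sonic itself reported nothing wrong, which is why network loss and RF interference can look alike in the plots.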
Steve Oncley AUTHOR
Sep 16, 2016
Yeah, I wondered about that explanation as well. In particular, I did a diff(tspar(<rawdata>)) and saw dropouts. s14 had one of 12 seconds; s12 and s13 only up to 1 s. s15 doesn't have this issue. I didn't have time to check out the sequence counter, since "diagbits" seems to strip it.
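The dropout search above amounts to differencing the sample times and looking for steps much larger than the nominal sample interval. A rough Python equivalent, as a sketch only: the function name, the integer-millisecond timestamps, the 50 ms interval (20 Hz) and the 1.5x tolerance are assumptions, not values taken from the processing code.

```python
def find_gaps(times_ms, nominal_dt_ms, tolerance=1.5):
    """Return (time_before_gap, gap_length) pairs for oversized time steps.

    times_ms:      sorted sample times in integer milliseconds
    nominal_dt_ms: expected spacing between consecutive samples
    tolerance:     a step longer than tolerance * nominal_dt_ms is a gap
    """
    gaps = []
    for prev, cur in zip(times_ms, times_ms[1:]):
        dt = cur - prev
        if dt > tolerance * nominal_dt_ms:
            gaps.append((prev, dt))
    return gaps


# Hypothetical 20 Hz series with a single 12 s hole after t = 100 ms.
times_ms = [0, 50, 100, 12100, 12150]
gaps = find_gaps(times_ms, nominal_dt_ms=50)
print(gaps)  # [(100, 12000)]
```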
Steve Oncley AUTHOR
Sep 16, 2016
OK. Confirmed that this is the old UDP dropped messages issue, not RF. Forget about grounding. I see, for example on s14 at 20160915 12:00:28.51, that several sonic samples in a row are missing from the UDP file but exist in the rsync file, and that diag is set to 16 because of this. However, in the process of looking at this, I see errors in the data files, even on the station itself. This happened more than once during this time period, and only on s2 and s14 (both Vipers). Usually, our raw_data files are rock solid, I thought?
daq@s14:/media/usbdisk/projects/VERTEX/raw_data$ data_stats s14_20160915_120000.dat
2016-09-17,03:38:21|NOTICE|parsing: /home/daq/isfs/projects/VERTEX/ISFS/config/vertex.xml
2016-09-17,03:42:14|WARNING|SampleInputStream: s14_20160915_120000.dat: bad sample header: #bad=1,filepos=42778267,id=(0,0),type=0,len=0
Exception: EOFException: s14_20160915_120000.dat: open: EOF
sensor dsm sampid nsamps |------- start -------| |------ end -----| rate minMaxDT(sec) minMaxLen
s14:/dev/ttyS1 14 6 21623 2016 09 15 12:00:01.267 09 15 23:59:58.760 0.50 0.726 33.934 43 43
s14:/dev/gps_pty0 14 10 87058 2016 09 15 12:00:00.880 09 15 23:59:59.622 2.02 -0.123 31.847 51 73
s14:/var/log/chrony/tracking.log 14 15 2944 2016 09 15 12:00:11.719 09 15 23:59:49.335 0.07 0.000 48.096 100 100
s14:/dev/ttyS2 14 20 42946 2016 09 15 12:00:01.241 09 15 23:59:59.903 0.99 0.039 32.161 37 38
s14:/dev/ttyS5 14 50 863378 2016 09 15 12:00:00.041 09 15 23:59:59.954 19.99 -0.187 31.490 12 12
s14:/dev/ttyS4 14 100 431660 2016 09 15 12:00:00.889 09 15 23:59:59.886 9.99 -0.212 31.503 32 32
512 2560 1 1950 05 11 06:28:52.197 05 11 06:28:52.197 nan 0.000 0.000 17920 17920
512 12800 2 1950 05 11 05:52:50.824 05 11 07:28:50.777 0.00 5759.953 5759.953 3072 3072
512 25600 2 1950 05 11 06:03:02.844 05 11 07:38:11.599 0.00 5708.754 5708.754 8192 8192
Gordon Maclean
Sep 17, 2016
I usually like to blame these corruptions on sudden power downs, but I don't see any indication of such a thing happening at s14 on Sep 15th.
My guess is that it is due to some sort of glitch on the USB, which is shared between the cell modem and the flash drive. Perhaps in some instances the DSM can't keep up with the USB interrupt load. Perhaps it happens more with Vipers?
The system logs (/var/log/kern.log, messages) should have some info. It would be nice to grab those.
Looks like a loss of about 30 seconds of data. Hope that the UDP dataset will fill it in, though if the USB was having issues one might expect the cell connection was problematic too.
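Filling the hole from the UDP dataset amounts to merging the two sample streams by time and keeping one copy of each sample. A minimal Python sketch of that idea, assuming (hypothetically) that both streams carry identical timestamps for the same sample and that samples can be represented as (time, value) pairs. The function name and the example data are illustrative only.

```python
def fill_gaps(primary, backup):
    """Merge two lists of (time_ms, value) samples, preferring primary.

    Samples from backup are added only where primary has no sample with
    the same timestamp. The result is sorted by time.
    """
    seen = {t for t, _ in primary}
    merged = list(primary) + [(t, v) for t, v in backup if t not in seen]
    merged.sort(key=lambda sample: sample[0])
    return merged


# Hypothetical streams: the local file is missing t = 100 and t = 150,
# while the UDP stream is missing t = 50.
local_file = [(0, "a"), (50, "b"), (200, "e")]
udp_stream = [(0, "a"), (100, "c"), (150, "d"), (200, "e")]
merged = fill_gaps(local_file, udp_stream)
print(merged)
```

Any interval missing from both streams stays missing after the merge, which is the concern raised above if the USB and the cell connection were having trouble at the same time.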