It has been pretty obvious from the plots that the sonic diagnostic flag is worse during the periods when the data are being rsync'd back to Boulder.  This affects only the CSAT3, not the CSAT3A/EC150.  When I plot the high-rate data, I even find data gaps (12 s for one period I looked at).  Even during non-rsync times, there are continuous non-zero ldiag values.  These should be pretty close to zero all the time.

I suspect that the CSAT3s are picking up RF interference from the cell phones.  If it hasn't been done already, we should play with grounding the heads and boxes (after checking to see if the mechanical connections already do this).  Andy knows the drill all too well.

If this persists, it isn't a huge problem, since bad samples should be rejected by software, but it would be nice to fix if we can.
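
For anyone who wants to repeat the gap check on the high-rate data, here is a minimal sketch of the idea: difference consecutive sample times and report any interval much longer than the nominal sample period. The function name `find_gaps`, the 20 Hz rate, and the 1.5x tolerance are assumptions made for illustration; reading timestamps out of the raw files is not shown.

```python
from typing import List, Tuple


def find_gaps(times: List[float], nominal_dt: float,
              tolerance: float = 1.5) -> List[Tuple[float, float]]:
    """Return (time_before_gap, gap_seconds) for each oversized interval.

    An interval counts as a gap when it exceeds tolerance * nominal_dt.
    Both parameters are illustrative choices, not values from the project.
    """
    gaps = []
    for earlier, later in zip(times, times[1:]):
        dt = later - earlier
        if dt > tolerance * nominal_dt:
            gaps.append((earlier, dt))
    return gaps


# Synthetic 20 Hz timestamps (seconds) with a 12 s hole in the middle.
times = [i * 0.05 for i in range(100)]
last = times[-1]
times += [last + 12.0 + i * 0.05 for i in range(100)]

gaps = find_gaps(times, nominal_dt=0.05)
for start, length in gaps:
    print(f"gap of {length:.2f} s after t={start:.2f} s")
```

The timestamps here are synthetic; with real data the same differencing would be applied to the sample times of a single sensor.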

 


4 Comments

  1. Since CSAT3_CKCNTR is true (set in isfs_env.sh and datasets.xml on barolo, where statsproc is running), missing UDP samples will cause the diag to be non-zero.  So the cause could be network congestion rather than RF interference.  We can check this by running statsproc on the rsync'd data files and seeing whether ldiag has the same signature (or by setting CSAT3_CKCNTR to false).

    I've wondered whether the dropouts are worse on Vipers than on Titans (I forget which stations are which).

  2. Steve Oncley AUTHOR

    Yeah, I wondered about that explanation as well.  In particular, I did a diff(tspar(<rawdata>)) and saw dropouts.  s14 had one of 12 seconds; s12 and s13 only had dropouts of up to 1 s.  s15 doesn't have this issue.  I didn't have time to check out the sequence counter; "diagbits" seems to strip it.

  3. Steve Oncley AUTHOR

    OK. Confirmed that this is the old UDP dropped messages issue, not RF.  Forget about grounding.  For example, on s14 at 20160915 12:00:28.51, several consecutive sonic samples are missing from the UDP file but present in the rsync file, and diag is set to 16 because of this.  However, while looking at this, I found errors in the data files, even on the station itself.  This happened more than once during this time period, and only on s2 and s14 (both Vipers).  I thought our raw_data files were usually rock solid?

    daq@s14:/media/usbdisk/projects/VERTEX/raw_data$ data_stats s14_20160915_120000.dat 
    2016-09-17,03:38:21|NOTICE|parsing: /home/daq/isfs/projects/VERTEX/ISFS/config/vertex.xml
    2016-09-17,03:42:14|WARNING|SampleInputStream: s14_20160915_120000.dat: bad sample header: #bad=1,filepos=42778267,id=(0,0),type=0,len=0
    Exception: EOFException: s14_20160915_120000.dat: open: EOF
    sensor                            dsm sampid    nsamps |------- start -------|  |------ end -----|    rate     minMaxDT(sec)   minMaxLen
    s14:/dev/ttyS1                     14      6     21623 2016 09 15 12:00:01.267  09 15 23:59:58.760    0.50    0.726   33.934    43    43
    s14:/dev/gps_pty0                  14     10     87058 2016 09 15 12:00:00.880  09 15 23:59:59.622    2.02   -0.123   31.847    51    73
    s14:/var/log/chrony/tracking.log   14     15      2944 2016 09 15 12:00:11.719  09 15 23:59:49.335    0.07    0.000   48.096   100   100
    s14:/dev/ttyS2                     14     20     42946 2016 09 15 12:00:01.241  09 15 23:59:59.903    0.99    0.039   32.161    37    38
    s14:/dev/ttyS5                     14     50    863378 2016 09 15 12:00:00.041  09 15 23:59:59.954   19.99   -0.187   31.490    12    12
    s14:/dev/ttyS4                     14    100    431660 2016 09 15 12:00:00.889  09 15 23:59:59.886    9.99   -0.212   31.503    32    32
                                      512   2560         1 1950 05 11 06:28:52.197  05 11 06:28:52.197     nan    0.000    0.000 17920 17920
                                      512  12800         2 1950 05 11 05:52:50.824  05 11 07:28:50.777    0.00 5759.953 5759.953  3072  3072
                                      512  25600         2 1950 05 11 06:03:02.844  05 11 07:38:11.599    0.00 5708.754 5708.754  8192  8192
  4. I usually like to blame these corruptions on sudden power downs, but I don't see any indication of such a thing happening at s14 on Sep 15th.

    My guess is that it is due to some sort of glitch on the USB bus, which is shared between the cell modem and the flash drive.  Perhaps in some instances the DSM can't keep up with the USB interrupt load.  Perhaps it happens more with Vipers?

    The system logs (/var/log/kern.log, messages) should have some info. It would be nice to grab those.

    Looks like a loss of about 30 seconds of data. Hope that the UDP dataset will fill it in, though if the USB was having issues one might expect the cell connection was problematic too.
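
The sequence-counter check discussed in the comments can be sketched roughly as follows. This is only an illustration of the idea, not the NIDAS implementation behind CSAT3_CKCNTR: it assumes the sonic's counter increments by one per sample and wraps at 64 (a 6-bit counter), and the function name `find_counter_skips` is hypothetical. A drop of exactly 64 samples, or any multiple of 64, would go unnoticed by this kind of check.

```python
from typing import List, Tuple


def find_counter_skips(counters: List[int],
                       modulus: int = 64) -> List[Tuple[int, int]]:
    """Return (index, n_missing) wherever the counter does not advance by one.

    The counter is assumed to wrap at `modulus` (64 is an assumption here).
    A repeated value is reported as modulus - 1 missing samples, because a
    duplicate cannot be told apart from a nearly full wrap.
    """
    skips = []
    for i in range(1, len(counters)):
        missing = (counters[i] - counters[i - 1] - 1) % modulus
        if missing != 0:
            skips.append((i, missing))
    return skips


# Synthetic counter sequence: a normal wrap 63 -> 0, then samples 2, 3, 4 lost.
counters = [60, 61, 62, 63, 0, 1, 5, 6]
skips = find_counter_skips(counters)
print(skips)
```

Running the same check on the UDP stream and on the rsync'd file for the same period would show whether the samples were lost in transit or never recorded.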