Blog from March, 2017

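tnw07b responds to pings, but ssh connections are reset. Oops, I made a mistake: nmap does show it looking like a DSM. (Port 22 is ssh, 8888 is tinyproxy, and 30000 is the DSM; "sun-answerbook" and "ndmps" are just nmap's default labels for those port numbers. The xinetd check-mk port, 6556, is not listed, I suspect because nmap scans only the 1000 most common ports by default, which fits the "997 closed ports" line plus the 3 open ones.)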

[daq@ustar raw_data]$ nmap  tnw07b
Starting Nmap 7.40 ( https://nmap.org ) at 2017-03-10 17:40 WET
Nmap scan report for tnw07b (192.168.1.146)
Host is up (0.055s latency).
Not shown: 997 closed ports
PORT      STATE SERVICE
22/tcp    open  ssh
8888/tcp  open  sun-answerbook
30000/tcp open  ndmps

And data are coming in:

tnw07b:/dev/gps_pty0                   7      2        32 2017 03 10 17:41:25.874  03 10 17:41:41.036    2.04  0.142  0.925   69   80
tnw07b:/dev/ttyUSB0                    7     22        16 2017 03 10 17:41:25.828  03 10 17:41:40.958    0.99  1.008  1.009   39   39
tnw07b:/dev/ttyUSB4                    7    102        16 2017 03 10 17:41:26.368  03 10 17:41:41.434    1.00  1.004  1.005   38   38
tnw07b:/dev/ttyUSB7                    7  32768         4 2017 03 10 17:41:26.049  03 10 17:41:41.049    0.20  4.772  5.228   17   30

The data connection to dsm_server is in fact from the right IP address, so tnw07b appears to be configured correctly:

[root@ustar daq]# netstat -ap | grep tnw07b
tcp        0      0 ustar:51968             tnw07b:43666            ESTABLISHED 9440/dsm_server     

According to Nagios, tnw07b was responding to check-mk requests until 2017-03-09 10:52 UTC, so something happened at that time that now causes incoming connections to be reset. This system probably needs to be rebooted.
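To tell "host is up but connections get reset" apart from refused or timed-out ports without running nmap, a small TCP probe helps. This is a minimal Python sketch using only the standard socket module; `probe` is a hypothetical helper, not an existing ISFS tool. Because I don't know whether tnw07b resets during the handshake or just after it, the sketch also tries to read the first bytes the server sends (an ssh server announces itself with an `SSH-2.0-...` banner).

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP port as 'open', 'refused', 'reset', 'timeout' or 'error'.

    Hypothetical helper. 'open' means the handshake completed and the server
    did not reset the connection while we waited for a banner.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            try:
                s.recv(64)  # ssh sends its banner first; a reset shows up here
            except socket.timeout:
                pass  # quiet services (e.g. tinyproxy) send nothing unprompted
            except ConnectionResetError:
                return "reset"
            return "open"
    except ConnectionRefusedError:
        return "refused"  # host up, nothing listening on the port
    except ConnectionResetError:
        return "reset"
    except socket.timeout:
        return "timeout"  # packets dropped, or host down
    except OSError:
        return "error"  # e.g. name resolution failure
```

From ustar, something like `for port in (22, 6556, 8888, 30000): print(port, probe("tnw07b", port))` would show which services still accept connections.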

 

Steve noticed that ttyS11 and ttyS12 on rsw04 are no longer reporting any data. Yesterday we updated rsw04, cleared off its USB storage and restarted DSM, but those ports are still not reporting. They were working until Feb 25. ttyS10 was also out for a while, but it came back this morning at 2017 03 10 12:16:47.339, before the reboot.

[daq@ustar raw_data]$ data_stats rsw04_20170[23]*.dat
2017-03-10,15:58:41|NOTICE|parsing: /home/daq/isfs/projects/Perdigao/ISFS/config/perdigao.xml
Exception: EOFException: rsw04_20170310_155028.dat: open: EOF
sensor                              dsm sampid    nsamps |------- start -------|  |------ end -----|    rate      minMaxDT(sec) minMaxLen
rsw04:/dev/gps_pty0                  35     10   3944084 2017 02 03 09:44:08.995  03 10 15:58:31.569    1.29  0.015 1090606.000   51   73
rsw04:/var/log/chrony/tracking.log   35     15    133438 2017 02 03 09:44:53.133  03 10 15:58:25.024    0.04  0.000 1090616.750  100  100
rsw04:/dev/ttyS11                    35    100  38021353 2017 02 03 09:44:08.517  02 25 09:48:48.136   20.00 -0.016       0.992   60   77
rsw04:/dev/ttyS12                    35    102  38021782 2017 02 03 09:44:12.831  02 25 09:48:48.206   20.00 -0.107       1.390   40  125
rsw04:/dev/dmmat_a2d0                35    208  39114544 2017 02 03 09:44:08.570  03 10 15:58:31.363   12.84  0.034 1090604.875    4    4
rsw04:/dev/ttyS10                    35  32768    767733 2017 02 03 09:44:13.130  03 10 15:58:31.137    0.25 -0.031 1132080.875   12  104

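Ports that quietly stop, like ttyS11 and ttyS12 above, are easier to catch with a script than by eye. Below is a Python sketch that reads data_stats output and lists sensors whose end time trails the newest sensor by more than an hour. The column layout is inferred from the listing above, the one-hour threshold and the `stale_sensors` name are my own choices, and the end time is assumed to fall in the same year as the start time, since data_stats prints only month and day there.

```python
from datetime import datetime, timedelta


def parse_end(fields):
    """End time of one data_stats row; year is ASSUMED equal to the start year."""
    year = fields[4]
    return datetime.strptime(
        f"{year} {fields[8]} {fields[9]} {fields[10]}", "%Y %m %d %H:%M:%S.%f"
    )


def stale_sensors(lines, max_lag=timedelta(hours=1)):
    """Return sensors whose last sample is more than max_lag older than the newest."""
    ends = {}
    for line in lines:
        f = line.split()
        # Data rows look like "dsm:/dev/port  dsmid sampid nsamps YYYY mm dd ...";
        # skip the header, NOTICE and Exception lines.
        if len(f) < 11 or ":" not in f[0] or not f[1].isdigit():
            continue
        ends[f[0]] = parse_end(f)
    if not ends:
        return []
    newest = max(ends.values())
    return sorted(s for s, t in ends.items() if newest - t > max_lag)
```

Feeding it the rsw04 listing above should flag exactly the two serial ports that stopped on Feb 25.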
Steve tried connecting to the ports directly yesterday and did not see anything. After the reboot, I don't see anything either. This is a Viper, so I think ports 11 and 12 are on the second Emerald serial card, and these log messages are relevant:

[   41.641774] emerald: NOTICE: version: v1.2-522
[   41.842945] emerald: INFO: /dev/emerald0 at ioport 0x200 is an EMM=8
[   41.871947] emerald: WARNING: /dev/emerald1: Emerald not responding at ioports[1]=0x240, val=0x8f
[   41.881346] emerald: WARNING: /dev/emerald1: Emerald not responding at ioports[2]=0x2c0, val=0x8f
[   41.890757] emerald: WARNING: /dev/emerald1: Emerald not responding at ioports[3]=0x300, val=0x8f
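To check other Vipers for the same symptom, the emerald driver's probe messages can be pulled out of dmesg. This Python sketch assumes the message formats are exactly those in the five lines above (I have not checked other driver versions), and `emerald_summary` is a hypothetical name, not part of any existing tool. It returns the ioports where a card was found and the ioports that did not respond.

```python
import re

# Formats copied from the rsw04 dmesg lines; other driver versions may differ.
FOUND = re.compile(
    r"emerald: INFO: /dev/emerald\d+ at ioport (0x[0-9a-fA-F]+) is an (\S+)"
)
MISSING = re.compile(
    r"emerald: WARNING: /dev/emerald\d+: Emerald not responding "
    r"at ioports\[\d+\]=(0x[0-9a-fA-F]+)"
)


def emerald_summary(dmesg_text):
    """Return ({ioport: model} for cards found, [ioports that did not respond])."""
    found, missing = {}, []
    for line in dmesg_text.splitlines():
        m = FOUND.search(line)
        if m:
            found[m.group(1)] = m.group(2)
            continue
        m = MISSING.search(line)
        if m:
            missing.append(m.group(1))
    return found, missing
```

On rsw04 this reports a single card at 0x200 and no response at 0x240, 0x2c0 or 0x300, consistent with the second card not being detected.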

 

Per's work

(I assume Preban was also there...)

Working very late (until at least 10:30pm), the DTU crew got all of our installed masts on the network, though a few DSMs didn't come up.  We're very grateful!

From several of Per's emails:

5 Mar

The fiber for the internet is still not working, but José C. has promised that someone will come on Tuesday to have a look at it. I can see that the media converter reports an error on one of the fibers.

We have brought a Litebeam 5 AC 23dBi with us and placed it on the antenna pole of the ops center. That has significantly improved the performance and stability of the link to the ridges, so I don’t think it’ll be necessary for you to manufacture any special brackets.

We then placed the “old” Litebeam from the ops center at rNE_06, according to Ted’s plan. We have also placed the 19 dBi spare NanoBeam on RiNE_07 and reconfigured Tower 10 to match the new NanoBeam. So now the only thing left is to replace the last of the 3 Prisms, which I noticed is now mounted on tower 37. Could the Litebeam that Ted has ordered replace that one?

We have gained some more bandwidth from the ops center to tower 29 by moving the frequencies further away from the ones used by the two sector antennas at tower 29. It seemed that these three antennas, mounted close to each other, were interfering.

8 Mar

As you have already discovered, the fiber was fixed today. It turned out that there were two problems with the connection out of here. The Rx fiber was broken close to their first junction box, apparently a couple of kilometers from here. The Tx fiber also had too sharp a bend at the very first electricity pole outside the building. The latter could explain the varying line performance we were seeing.

The last 100m tower was successfully instrumented today, and with a little luck your DSMs should be visible on the network.

 

9 Mar

We found the fault on tse04 top: the uSD card had been ejected. It should be visible now.

We have changed the Ubiquiti config in the 4 army alu towers behind riNE07. They should now be online.
...and later

A few of the Ubiquiti units on the towers were not set up with the proper wireless security settings: some were locked to the MAC address of the old AP we replaced (the Prism), and the last one was set to the wrong network mode.

We have moved a few towers from their planned access point to another one where the signal quality was higher. I still need to update the spreadsheet; I’ll do that asap.

The ARL Ubiquiti units all had the wrong PSK. José C. forwarded me an email from Sean saying there is an IP conflict in one of his units, but they all seemed to have the IP address listed in the far-right column of the spreadsheet, not the .110 to .113 stated in the email. I was not able to access the web config page as described in his email either, but since the IPs matched Ted’s spreadsheet, I put them on the network.

rne06.20m csat3a

This was reporting all NAs. pio got it to work. I'm actually surprised, since I thought we had seen this problem in Jan and had even sent people up the tower to check the sonic head connection, without success then...

TRH resets

Now that the network is up and we can look at things, I'm finding lots of TRHs with ifan=0:

tse06.2m: #67, no response to ^r, responded to pio (after power cycle, responds to ^r)

tse06.10m: #43, no response to ^r, pio didn't restart fan (after power cycle, responds to ^r)

tse06.60m: #8, responds to ^r and pio, but didn't restart fan

tse09b.2m: #103, ^r worked

tse11.2m: #120, no response to ^r, responded to pio

tse11.20m: #116, responds to ^r and pio, but didn't restart fan

tse11.40m: #110, responds to ^r and pio, but didn't restart fan

tse11.60m: #121, was in weird cal land, no response to ^r, responded to pio

tse13.2m: #119, no response to ^r, pio didn't restart fan (after power cycle, responds to ^r)

tse13.100m: #111, no response to ^r, pio didn't restart fan, reset CUR 200, now running at 167mA (and T dropped by 0.2 C). WATCH THIS! (has been running all day today)

tnw07.10m: #42, no response to ^r, responded to pio

tnw07.60m: #125, ^r killed it totally! pio doesn't bring it back. Dead.