We are using Bluetooth PAN (personal area network) technology for network connections to stations 1-19 at SCP. Each DSM at a station has a Bluegiga WT41 radio in a small white box, which is connected to the DSM via a USB cable.

Mounted on a pole on the deck railing of building 2 are 3 bluetooth radios which establish network connections to the 19 stations. In the terminology of bluetooth, the stations are operating as PANU (PAN users) devices and the radios at the base building are NAPs (network access points).

  • NAP AP1 serves stations 1-7
  • NAP AP2 serves stations 8-13
  • NAP btap3 serves stations 14-19

AP1 and AP2 are Bluegiga WT41 radios, identical to the radios on the stations. They are interfaced to the "gully" base laptop via USB cables that lead across the deck and through the window to the powered USB hub on the desk. Software on the "gully" system acts as a network bridge, which integrates the ethernet and bluetooth networks into one virtual network, with all systems having 192.168.0.X IP addresses.

btap3 is a Bluegiga AP3241 access point, also in a white box on the pole. It is a small Linux system running its own bridging software. It is connected to the local network via an ethernet cable across the deck. It also requires 12VDC power, which is provided by the power supply on the desk.

Quick Check of System Status

A simple script ping_check.sh on the gully or flux laptops does a ping of the 19 bluetooth DSMs and the 3 DSMs that are connected via WIFI. DSMs 1-22 are IP addresses 192.168.0.101-122. ping_check.sh then sleeps a minute and pings them again. You will notice from time to time that a station does not respond to pings, which is typically due to radio or network congestion and is only a problem if it persists.
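
The loop described above can be sketched as follows. This is only an illustration, not the actual ping_check.sh: the function names dsm_ips and ping_once are made up here, and the one-ping, 2 second timeout settings are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of a ping check; not the real ping_check.sh.

# Print the IP addresses of DSMs 1-22 (192.168.0.101-122), one per line.
dsm_ips() {
    n=101
    while [ "$n" -le 122 ]; do
        echo "192.168.0.$n"
        n=$((n + 1))
    done
}

# Ping each DSM once (2 second timeout, an assumed value) and
# report the ones that do not answer.
ping_once() {
    for ip in $(dsm_ips); do
        if ! ping -c 1 -W 2 "$ip" > /dev/null 2>&1; then
            echo "$(date -u +%H:%M:%S) no response from $ip"
        fi
    done
}

# Usage: repeat the check once a minute until interrupted.
#   while true; do ping_once; sleep 60; done
```

A single missed ping from a station is not a concern; look for a station that shows up in several successive passes.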

Gary has also enabled nagios checks of the stations. See Systems Monitoring.

ssh to a DSM

The aster account on "gully" and "flux" has a $HOME/.ssh/config file which provides shortcuts in order to ssh to a given DSM:

  • ssh aN for N=1-19 to reach the bluetooth DSMs

Also:

  • ssh c20 DSM at C via WIFI
  • ssh m21 lower DSM on main tower via WIFI
  • ssh m22 upper DSM on main via WIFI
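
For reference, shortcuts of this kind are normally written as Host entries in the ssh config file. The sketch below prints entries for a1-a19, assuming that aN maps to 192.168.0.(100+N). The function name is hypothetical, and the real config file on gully and flux may set other options (user name, keys) that are not shown here.

```shell
#!/bin/sh
# Hypothetical sketch: print ssh config stanzas for the bluetooth DSMs.
# Assumes shortcut aN corresponds to IP address 192.168.0.(100+N).
ssh_config_stanzas() {
    n=1
    while [ "$n" -le 19 ]; do
        printf 'Host a%d\n    HostName 192.168.0.%d\n' "$n" $((100 + n))
        n=$((n + 1))
    done
}

# Usage: review the output before merging it into $HOME/.ssh/config.
#   ssh_config_stanzas
```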

Archive File Rsyncs

Every 4 hours the rsync_stns.sh script runs on the gully laptop to rsync any of the previous day's NIDAS archive files from the flash drives on the DSMs. Usually the script only has something to do at 00:00 UTC. Four rsyncs run simultaneously: one for each of the three groups of DSMs served by the NAP base radios, and a fourth for the WIFI DSMs C20, M21 and Mu22. While those rsyncs are in progress, a station may not respond to every ping, so a ping failure does not necessarily indicate a broken connection.

The check_rsync.sh script checks the status of the rsync file copies by listing the hidden files in /media/isfs0/projects/SCP/raw_data and the contents of the .rsync-partial directory:

[aster@gully scripts]$ check_rsync.sh 
Files being currently copied to /media/isfs0/projects/SCP/raw_data:
-rw-------. 1 aster eol 28311552 Oct 16 19:41 .A16_20121016_000000.dat.u7tB8S
-rw-------. 1 aster eol  1835008 Oct 16 19:41 .Ah2_20121016_120000.dat.oxLl0V
-rw-------. 1 aster eol  2359296 Oct 16 19:41 .Ap10_20121016_120000.dat.xlpsC0
-rw-------. 1 aster eol 29622272 Oct 16 19:41 .Mu22_20121016_120000.dat.HnBmVR
Partial copies on .rsync-partial that failed and will be resumed:
-r--r--r--. 1 aster eol 10891630 Oct 16 19:12 A16_20121016_000000.dat
-r--r--r--. 1 aster eol  4108273 Oct 16 18:15 Ap14_20121016_000000.dat

An archive file with a leading '.' is a file currently being copied. Any files in the .rsync-partial directory indicate that an rsync failed and will be resumed later. This happens on one or two stations every night, and is not a problem.
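
A listing like the one above can be produced with a few lines of shell. The following is a minimal sketch, not the actual check_rsync.sh. It assumes that .rsync-partial is a subdirectory of the destination directory (as it would be with rsync's --partial-dir=.rsync-partial option), and the function name list_rsync_state is made up.

```shell
#!/bin/sh
# Hypothetical sketch of an rsync status listing; not the real check_rsync.sh.
# Usage: list_rsync_state /media/isfs0/projects/SCP/raw_data
list_rsync_state() {
    dir=$1
    echo "Files being currently copied to $dir:"
    # rsync writes to a temporary name: a leading '.' plus a random suffix.
    find "$dir" -maxdepth 1 -type f -name '.*.dat.*' -exec ls -l {} +
    echo "Partial copies in $dir/.rsync-partial:"
    if [ -d "$dir/.rsync-partial" ]; then
        find "$dir/.rsync-partial" -maxdepth 1 -type f -exec ls -l {} +
    fi
}
```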

Configuration on "gully" System

At bootup, the "gully" base laptop starts the pand.service by running /etc/init.d/pand. The status of the pand service can be viewed with the systemctl status pand.service command:

[aster@gully]$ systemctl status pand.service
pand.service - LSB: Bluetooth Personal Area Networking Daemon.
          Loaded: loaded (/etc/rc.d/init.d/pand)
          Active: active (running) since Thu, 11 Oct 2012 08:20:43 -0600; 4 days ago
         Process: 946 ExecStart=/etc/rc.d/init.d/pand start (code=exited, status=0/SUCCESS)
          CGroup: name=systemd:/system/pand.service
                  ├ 1282 pand -i hci0 -e bnep1 --role NAP --master --service PANU --connect 00:07:80:4f:...
                  ├ 1284 pand -i hci0 -e bnep2 --role NAP --master --service PANU --connect 00:07:80:4f:...
                  ├ 1286 pand -i hci0 -e bnep3 --role NAP --master --service PANU --connect 00:07:80:4f:...
                  ├ 1288 pand -i hci0 -e bnep4 --role NAP --master --service PANU --connect 00:07:80:4f:...
                  ├ 1290 pand -i hci0 -e bnep5 --role NAP --master --service PANU --connect 00:07:80:4f:...
                  ├ 1292 pand -i hci0 -e bnep6 --role NAP --master --service PANU --connect 00:07:80:4f:...
                  ├ 1294 pand -i hci0 -e bnep7 --role NAP --master --service PANU --connect 00:07:80:4f:...
                  ├ 1296 pand -i hci1 -e bnep8 --role NAP --master --service PANU --connect 00:07:80:4f:...
                  ├ 1298 pand -i hci1 -e bnep9 --role NAP --master --service PANU --connect 00:07:80:4f:...
                  ├ 1300 pand -i hci1 -e bnep10 --role NAP --master --service PANU --connect 00:07:80:4f...
                  ├ 1302 pand -i hci1 -e bnep11 --role NAP --master --service PANU --connect 00:07:80:4f...
                  ├ 1304 pand -i hci1 -e bnep12 --role NAP --master --service PANU --connect 00:07:80:4f...
                  └ 1306 pand -i hci1 -e bnep13 --role NAP --master --service PANU --connect 00:07:80:4f...

The version of /etc/init.d/pand which is distributed with Fedora has been edited to comment out starting pand in listening mode.

Instead /etc/init.d/pand executes the script /etc/bluetooth/pan/system-up, which creates the virtual bridge interface called pan0, and then starts the 13 pand processes shown above to connect to the stations. The hex MAC addresses shown above are the hard-coded bluetooth addresses of the radios at stations 1-13.
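
To make the pattern of the 13 pand command lines explicit, here is a small sketch that builds the command for a given station. It is not taken from the system-up script. The function names are hypothetical, the station-to-radio split (1-7 on hci0, 8-13 on hci1) is read off the status output above, and the placement of --persist at the end is a guess, since the status output truncates the command lines.

```shell
#!/bin/sh
# Hypothetical sketch; the real system-up script may be organized differently.

# Map a station number (1-13) to the base radio that serves it.
hci_for_station() {
    if [ "$1" -le 7 ]; then echo hci0; else echo hci1; fi
}

# Print the pand command line for one station.
# $1 = station number, $2 = bluetooth address of the station radio.
pand_command() {
    station=$1 bdaddr=$2
    echo "pand -i $(hci_for_station "$station") -e bnep$station" \
         "--role NAP --master --service PANU --connect $bdaddr --persist"
}

# Usage: print (not run) the command for station 8.
#   pand_command 8 00:07:80:4F:D4:E4
```

Printing the command rather than running it makes it easy to check the address and interface name before starting a process by hand.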

The USB interfaces of the two WT41 radios at the base system are known as hci0 (AP1) and hci1 (AP2). As shown above, hci0 serves stations 1-7 and hci1 serves 8-13. The commands hciconfig and hcitool on the gully linux system can be used to control and query the radios. query_radios.sh, in the SCP scripts directory, uses hcitool to query the RSSI, link quality and other parameters from hci0 and hci1:

[aster@gully ~]$ query_radios.sh 
Do man hcitool for info on radio parameters

We need to learn whether the following are useful values,
and if so, what are acceptable ranges.

rssi=received signal strength information. Furthest station a1 is usually -9 or -10
lq=link quality. Appears that higher is better. Usually correlated with rssi?
tpl=transmit power level. Always seems to be 19.
afh=adaptive frequency hopping channel map

ip             bdaddr            rssi   lq  tpl afh
192.168.0.101  00:07:80:4F:D4:F1  -10   67   19 0xff03f01f0000fcffff7f
192.168.0.102  00:07:80:4F:D4:EF   -3  124   19 0xff03f01f0000fcffff7f
192.168.0.103  00:07:80:4F:D4:EE   -3   88   19 0xff03f01f0000fcffff7f
192.168.0.104  00:07:80:4F:D4:F7   -2   99   19 0xff03f01f0000fcffff7f
192.168.0.105  00:07:80:4F:D4:D9   -2  110   19 0xff03f01f0000fcffff7f
192.168.0.106  00:07:80:4F:D4:F8   -4   88   19 0xff03f01f0000fcffff7f
192.168.0.107  00:07:80:4F:D4:EB   -4   91   19 0xff03f01f0000fcffff7f
192.168.0.108  00:07:80:4F:D4:E4   -2   93   19 0xff000000ffffff00fc7f
192.168.0.109  00:07:80:4F:D4:ED    0  137   19 0xff000000ffffff00fc7f
192.168.0.110  00:07:80:4F:D4:E9   -5   88   19 0xff000000ffffff00fc7f
192.168.0.111  00:07:80:4F:D4:FB   -1  154   19 0xff000000ffffff00fc7f
192.168.0.112  00:07:80:4F:D4:E7   -7   88   19 0xff000000ffffff00fc7f
192.168.0.113  00:07:80:4F:D4:F6   -2  127   19 0xff000000ffffff00fc7f
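
A single row of the table above can be gathered with the hcitool rssi, lq, tpl and afh subcommands. The sketch below is not the actual query_radios.sh. The function names are made up, and it assumes that each subcommand prints one line whose last field is the value of interest (for example "RSSI return value: -10"). These queries only work for a currently connected station and generally need root.

```shell
#!/bin/sh
# Hypothetical sketch of a per-link query; not the real query_radios.sh.

# Print the last whitespace-separated field of the last non-empty input line.
last_field() { awk 'NF { v = $NF } END { print v }'; }

# $1 = base radio (hci0 or hci1), $2 = bluetooth address of a connected station.
query_link() {
    hci=$1 bdaddr=$2
    rssi=$(hcitool -i "$hci" rssi "$bdaddr" | last_field)
    lq=$(hcitool -i "$hci" lq "$bdaddr" | last_field)
    tpl=$(hcitool -i "$hci" tpl "$bdaddr" | last_field)
    afh=$(hcitool -i "$hci" afh "$bdaddr" | last_field)
    printf '%s %5s %4s %4s %s\n' "$bdaddr" "$rssi" "$lq" "$tpl" "$afh"
}

# Usage:
#   query_link hci0 00:07:80:4F:D4:F1
```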

Troubleshooting Bluetooth on gully

The pand processes on gully are started with a --persist option, which means that if a connection to a station goes down, the corresponding pand process continues to run, and tries to reconnect. In general the bluetooth networking has worked very well, and little intervention has been needed.

If a radio will not connect, check the cabling, especially the USB connection at the radio, which may work loose, despite attempts to secure it with cable ties.

Twice in the first 4 weeks, a NAP radio would drop all connections and would not reconnect. If it happens again, I'd like to see if the pand_check.sh script works. It is in the SCP scripts directory.

pand_check.sh hciN

where N is 0 for AP1 and 1 for AP2. pand_check.sh kills all the pand processes for a radio, sends the radio the hciconfig down, reset, up and piscan commands, and restarts the pand processes.
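
The reset sequence can be sketched as below. This is not the pand_check.sh script itself: the function name is hypothetical, the pkill pattern is an assumption about how the pand processes for one radio could be selected, and the final step of restarting the pand processes is left out because it depends on how system-up starts them. The function only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical sketch of the radio reset steps; not the real pand_check.sh.

# Print the recovery commands for one base radio ($1 = hci0 or hci1).
reset_radio_commands() {
    hci=$1
    # Stop the pand processes bound to this radio (pattern is an assumption).
    echo "pkill -f 'pand -i $hci '"
    # Cycle the radio and make it connectable and discoverable again.
    for action in down reset up piscan; do
        echo "hciconfig $hci $action"
    done
}

# Usage: review the output, then run it as root if it looks right.
#   reset_radio_commands hci0
#   reset_radio_commands hci0 | sh
```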

If that script does not work, one has to do a cold restart of the gully system and the radio. It does not help to unplug and replug the USB connections to the radios. If that is done, Linux gets confused and will usually crash. Instead, shut down Linux on "gully" with the poweroff command. Then unplug and replug power to the USB hub on the desk, to completely power cycle the radios. Boot up the gully laptop again and things should work.

We have spare WT41 radios in white boxes. If a base NAP radio fails, simply replace it with a spare. After attaching the new radio, you will probably have to reboot gully as described above.

Configuration on btap3, the Bluegiga AP3241

The 3241 radio on the pole acts as an all-in-one NAP radio and Linux bridge host for stations 14-19. It has a hardcoded address of 192.168.0.13 on the local network. One can point a browser at http://192.168.0.13, or log in via "ssh root@192.168.0.13".

On bootup, the 3241 executes /home/ISFS/scripts/up_check.sh from /etc/rc2.d/rc.local. Every 10 minutes, that script reads the contents of /home/ISFS/scripts/ipbt, which contains the IP and bluetooth addresses of stations 14-19:

192.168.0.114 00:07:80:4f:d4:fc
192.168.0.115 00:07:80:4f:d4:f2
192.168.0.116 00:07:80:4f:d4:f5
192.168.0.117 00:07:80:4f:d4:f0
192.168.0.118 00:07:80:4f:d5:00
192.168.0.119 00:07:80:4f:d4:ff

If a station does not respond to ping, the up_check.sh script on the 3241 executes an iWRAP command to establish a PAN connection to the station, in a similar way to the pand processes on the gully laptop:

btcli CALL $btaddr PAN-PANU PAN-NAP

$btaddr is the hex bluetooth address of a station. The up_check.sh script logs messages to /var/log/messages on the 3241:

[root@AP3 /root]$ fgrep up_check /var/log/messages
Oct 15 04:38:14 (none) daemon.warn up_check: PING 192.168.0.118 (192.168.0.118): 56 data bytes
Oct 15 04:38:14 (none) daemon.warn up_check: --- 192.168.0.118 ping statistics ---
Oct 15 04:38:14 (none) daemon.warn up_check: 1 packets transmitted, 1 packets received, 0% packet loss
Oct 15 04:38:14 (none) daemon.warn up_check: round-trip min/avg/max = 2479.272/2479.272/2479.272 ms
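
The ping-then-call logic described above amounts to a short loop over the ipbt file. The following is a minimal sketch, not the actual up_check.sh. The function name check_stations is made up, the logging and the 10 minute repeat are omitted, and a single ping per station is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of the reconnect loop; not the real up_check.sh.

# $1 = file with one "ip btaddr" pair per line, such as
# /home/ISFS/scripts/ipbt.
check_stations() {
    while read -r ip btaddr; do
        [ -n "$ip" ] || continue
        # stdin is redirected so the commands cannot consume the file.
        if ! ping -c 1 "$ip" > /dev/null 2>&1 < /dev/null; then
            # Station is silent: ask iWRAP to open a PAN connection to it.
            btcli CALL "$btaddr" PAN-PANU PAN-NAP < /dev/null
        fi
    done < "$1"
}

# Usage:
#   check_stations /home/ISFS/scripts/ipbt
```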

If a problem arises with the 3241, the likely culprit is the ethernet connection or power. /var/log/messages may contain useful diagnostics. We have a spare 3241 in the trailer if the first one fails. I believe it is all set up to go.

If one logs into the 3241 via ssh root@192.168.0.13, the inq command can be used to see the RSSI and BER (bit error rate) values for the current connections:

[root@AP3 /root]$ inq
iWRAP Link State     Type    MTU  In         Out       Time    MSC MSC Bdaddr            Ch Direction Power  Role   Crypt Pid HCI RSSI BER     Friendly Name
===== ==== =====     ====    ===  ==         ===       ====    === === ======            == ========= =====  ====   ===== === === ==== ===     =============
10101 0    CONNECTED PAN-NAP 1690 93MiB      5088KiB   17h     8d  00  00:07:80:4f:d4:fc 0  OUTGOING  ACTIVE MASTER PLAIN 0   2a  -65  12.0200 Ap14
10101 2    CONNECTED PAN-NAP 1690 1534MiB    92MiB     11d     8d  00  00:07:80:4f:d4:ff 0  OUTGOING  ACTIVE MASTER PLAIN 0   39  -60  8.7400  A19
10101 3    CONNECTED PAN-NAP 1690 1921MiB    135MiB    18d     8d  00  00:07:80:4f:d5:00 0  OUTGOING  ACTIVE MASTER PLAIN 0   33  -62  9.3000  A18
10101 5    CONNECTED PAN-NAP 1690 1162MiB    88MiB     11d     8d  00  00:07:80:4f:d4:f0 0  OUTGOING  ACTIVE MASTER PLAIN 0   36  -65  12.0200 A17
10101 13   CONNECTED PAN-NAP 1690 186MiB     13MiB     2d      8d  00  00:07:80:4f:d4:f5 0  OUTGOING  ACTIVE MASTER PLAIN 0   2d  -70  21.6200 A16
10101 14   CONNECTED PAN-NAP 1690 1217MiB    91MiB     11d     8d  00  00:07:80:4f:d4:f2 0  OUTGOING  ACTIVE MASTER PLAIN 0   30  -62  6.5800  A15

RSSI values between -60 and -70, as above, are good signal levels.

Bluetooth Configuration on the DSMs

On bootup, the DSMs execute /etc/rc2.d/S25bluetooth start. This script sets environment variables from /etc/default/bluetooth. If PAND_ENABLE is true, then pand is run, with the options specified in PAND_OPTIONS:

tail /etc/default/bluetooth
# Start pand (allowed values are "true" and "false")
PAND_ENABLE=true
# Arguments to pand
PAND_OPTIONS="--role PANU --service NAP --listen --nosdp --autozap --persist"

In this way, pand is started on the stations, listening for incoming connections from the base NAP radios. With the "--persist" option, pand is able to reconnect on a dropped connection.

Once a DSM has established a bluetooth connection, an IP address is assigned to the bnep0 interface, as specified by /etc/network/interfaces on each DSM:

iface bnep0 inet static
        address 192.168.0.101
        netmask 255.255.255.0
        broadcast 192.168.0.255
        gateway 192.168.0.1

The IP addresses for DSMs 1-19 are set to 192.168.0.101-192.168.0.119.

To be sure that the connection stays up, a blue_check.sh script is run from crontab every 10 minutes:

crontab -l
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/scripts:/usr/local/bin
5 * * * *       /etc/cron.daily/logrotate
*/10 * * * * /usr/local/scripts/blue_check.sh 192.168.0.1

This script pings the given address, and if it is not reachable, does an hciconfig reset and a restart of the pand process. The default handling of the pand process seems to handle most of the connection issues. I don't know how frequently this script has been necessary to bring the connection back up. One can grep blue_check /var/log/isfs/messages to see its log messages.
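
For illustration, the check described above might look like the sketch below. It is not the actual blue_check.sh. It assumes that the station radio is hci0, that three pings are tried, and that the S25bluetooth init script accepts a stop argument as well as start. The BT_INIT variable and the logger message are made up.

```shell
#!/bin/sh
# Hypothetical sketch of a connection watchdog; not the real blue_check.sh.

# $1 = address to ping (192.168.0.1 in the crontab entry above).
blue_check() {
    addr=$1
    # BT_INIT is a hypothetical override for the init script path.
    init=${BT_INIT:-/etc/rc2.d/S25bluetooth}
    if ping -c 3 "$addr" > /dev/null 2>&1; then
        return 0            # connection is up, nothing to do
    fi
    logger -t blue_check "$addr unreachable, resetting bluetooth"
    hciconfig hci0 reset    # assumes the station radio is hci0
    "$init" stop            # assumes the init script supports stop
    "$init" start
}

# Usage:
#   blue_check 192.168.0.1
```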

Again, in general the bluetooth connections to the stations have been very dependable.

Swapping a Bluetooth Radio on a Station

If a station radio fails, we have several spares, which can be swapped in. No reconfiguration is needed on the DSM.

If the station is served by AP1 or AP2 on gully, change the bluetooth address of the radio in /etc/bluetooth/pan/system.up and in $ISFF/projects/SCP/ISFF/config/mac_ip_hci. Then do:

systemctl restart pand.service

If the station is served by the AP3241, change the bluetooth address in /home/ISFS/scripts/ipbt. On the 3241, the connection should be automatically re-established when the up_check.sh script runs every 10 minutes.