Gordon Oct 25

Cockpit data was not coming in.

Traced this down to a system clock problem on flux. Its clock gradually lost time, and then the dsm_server process started throwing away samples, with this message in /var/log/isfs/isfs.log:

Oct 25 14:54:54 flux dsm_server[30058]: WARNING|sample with timetag in future by 2.305084 secs. time: 2012 Oct 25 20:54:56.861 id=10,20 total future samples=39039001
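To see how far the clock has drifted, the reported offset can be pulled out of such log lines. A minimal sketch, assuming the message format is exactly as in the single line quoted above; the function name future_offset_secs is hypothetical and not part of dsm_server:

```python
import re

# Pattern assumed from the one log line quoted above; other dsm_server
# message variants are not handled.
FUTURE_RE = re.compile(r"sample with timetag in future by ([0-9.]+) secs")

def future_offset_secs(line):
    """Return the reported offset in seconds, or None if the line does not match."""
    m = FUTURE_RE.search(line)
    return float(m.group(1)) if m else None

line = ("Oct 25 14:54:54 flux dsm_server[30058]: WARNING|sample with timetag "
        "in future by 2.305084 secs. time: 2012 Oct 25 20:54:56.861 id=10,20 "
        "total future samples=39039001")
print(future_offset_secs(line))
```

Run over the whole of isfs.log, a steadily growing offset would be consistent with the flux clock falling further behind.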

The problem was in /etc/chrony.conf on flux and gully. The chrony processes were configured to use keys, and the keys on the two systems were different, so flux was not able to get time information from gully. Commented out the

# keyfile /etc/chrony.keys

statement in both files, and restarted chronyd:

systemctl restart chronyd.service
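The same edit could be scripted rather than done by hand. A rough sketch, assuming the config is read in as plain text; the helper name comment_out_directive and the two-line sample config are illustrative only, and the result would still need to be written back and chronyd restarted as above:

```python
def comment_out_directive(conf_text, directive):
    """Prefix '# ' to every active line whose first word is `directive`."""
    out = []
    for line in conf_text.splitlines():
        words = line.split()
        # Lines that are already commented start with '#', so they are left alone.
        if words and words[0] == directive:
            line = "# " + line
        out.append(line)
    return "\n".join(out) + "\n"

# Illustrative sample, not the full /etc/chrony.conf from flux.
sample = "server 192.168.0.12\nkeyfile /etc/chrony.keys\n"
print(comment_out_directive(sample, "keyfile"), end="")
```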

chronyd on flux is configured to get time from gully, via the "server 192.168.0.12" directive in /etc/chrony.conf on flux.

The server directives in /etc/chrony.conf on gully now look like so:

# server 0.fedora.pool.ntp.org iburst
# server 1.fedora.pool.ntp.org iburst
# server 2.fedora.pool.ntp.org iburst
# server 3.fedora.pool.ntp.org iburst
server ntp.colostate.edu
server c20
server m21
server m22

chrony was never able to connect to any of the fedora.pool.ntp.org servers. The chronyc sources command always returned lines like this for the pool servers:

MS Name/IP address           Stratum Poll LastRx Last sample
============================================================================
^? 64.73.32.134                  0    6    10y     +0ns[   +0ns] +/-    0ns
^? ns1.your-site.com             0    6    10y     +0ns[   +0ns] +/-    0ns
^? 64.73.32.135                  0    6    10y     +0ns[   +0ns] +/-    0ns
^? cheezum.mattnordhoff.net      0    6    10y     +0ns[   +0ns] +/-    0ns

The traffic is probably blocked by a firewall somewhere.
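Unreachable sources like these can be picked out of chronyc sources output automatically. A minimal sketch, assuming the column layout shown above and that the second character of the first field is the state, with "?" marking a source that cannot be reached; the function name unreachable_sources is hypothetical:

```python
def unreachable_sources(chronyc_output):
    """Return the names of sources whose state character is '?'."""
    names = []
    for line in chronyc_output.splitlines():
        fields = line.split()
        # Source lines start with a two-character mode/state field such as
        # "^?" or "^*"; the header and separator lines do not.
        if len(fields) >= 2 and len(fields[0]) == 2 and fields[0][0] in "^=#":
            if fields[0][1] == "?":
                names.append(fields[1])
    return names

sample = """\
MS Name/IP address           Stratum Poll LastRx Last sample
============================================================================
^? 64.73.32.134                  0    6    10y     +0ns[   +0ns] +/-    0ns
^? ns1.your-site.com             0    6    10y     +0ns[   +0ns] +/-    0ns
^? 64.73.32.135                  0    6    10y     +0ns[   +0ns] +/-    0ns
^? cheezum.mattnordhoff.net      0    6    10y     +0ns[   +0ns] +/-    0ns
"""
print(unreachable_sources(sample))
```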

To have a reference check for our DSMs, I added ntp.colostate.edu as a chrony server for gully. That works, which means our router is not blocking NTP traffic.

With the above server configuration, chronyc sources on gully looks like so. The DSMs (with PPS from a GPS) show error bounds of a few milliseconds, versus 173 ms for ntp.colostate.edu, which indicates that our DSMs have better clocks:

# chronyc sources
210 Number of sources = 4
MS Name/IP address           Stratum Poll LastRx Last sample
============================================================================
^+ yuma.acns.colostate.edu       2    6      5    +75ms[  +75ms] +/-  173ms
^+ C20                           3    6      5  -1281us[-1287us] +/- 3162us
^+ M21                           3    6      5   +470us[ +464us] +/- 2506us
^* Mu22                          3    6      5    -30us[  -36us] +/- 1799us
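The comparison can be made explicit by converting the mixed units in the last column to a common one and sorting by error bound. A sketch under the assumption that offsets and bounds are always integers followed by ns, us, ms, or s, as in the output above; parse_sources and UNIT_NS are hypothetical names:

```python
import re

# Nanoseconds per unit, so every value stays an exact integer.
UNIT_NS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}

# Matches e.g. "+75ms[  +75ms] +/-  173ms": first offset, bracketed offset, bound.
SAMPLE_RE = re.compile(
    r"([+-]\d+)(ns|us|ms|s)\[\s*[+-]\d+(?:ns|us|ms|s)\]\s*\+/-\s*(\d+)(ns|us|ms|s)"
)

def parse_sources(chronyc_output):
    """Return a list of (name, offset_ns, error_bound_ns) tuples."""
    rows = []
    for line in chronyc_output.splitlines():
        m = SAMPLE_RE.search(line)
        if m is None:
            continue  # header, separator, or status line
        name = line.split()[1]
        offset_ns = int(m.group(1)) * UNIT_NS[m.group(2)]
        bound_ns = int(m.group(3)) * UNIT_NS[m.group(4)]
        rows.append((name, offset_ns, bound_ns))
    return rows

sample = """\
^+ yuma.acns.colostate.edu       2    6      5    +75ms[  +75ms] +/-  173ms
^+ C20                           3    6      5  -1281us[-1287us] +/- 3162us
^+ M21                           3    6      5   +470us[ +464us] +/- 2506us
^* Mu22                          3    6      5    -30us[  -36us] +/- 1799us
"""

# Smallest error bound first.
for name, offset_ns, bound_ns in sorted(parse_sources(sample), key=lambda r: r[2]):
    print(f"{name:26s} offset {offset_ns / 1e6:+9.3f} ms  bound {bound_ns / 1e6:8.3f} ms")
```

Sorted this way, the three DSMs come out ahead of the ntp.colostate.edu host.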

chronyd