Gordon Oct 25
Cockpit data was not coming in.
Traced this down to a system clock problem on flux. Its clock gradually lost time, and then the dsm_server process started throwing away samples, with this message in /var/log/isfs/isfs.log:
Oct 25 14:54:54 flux dsm_server[30058]: WARNING|sample with timetag in future by 2.305084 secs. time: 2012 Oct 25 20:54:56.861 id=10,20 total future samples=39039001
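A quick way to see how far off the clock has drifted is to pull the reported offset out of each of these warnings. This is a minimal sketch, assuming the message format shown above; the function name future_offsets is made up for illustration and is not part of dsm_server or any ISFS tool.

```shell
# future_offsets (hypothetical helper): read log lines on stdin and print the
# "in future by N secs" value from each dsm_server warning, one per line.
# Lines that do not contain the warning are ignored.
future_offsets() {
    sed -n 's/.*timetag in future by \([0-9.]*\) secs.*/\1/p'
}
```

Usage, e.g. to look at the most recent offsets: future_offsets < /var/log/isfs/isfs.log | tail -n 5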
The problem was in /etc/chrony.conf on flux and gully. Both chrony processes were configured to use keys, but the keys on the two systems were different, so flux was not able to get time information from gully. Commented out the keyfile statement in both files:
# keyfile /etc/chrony.keys
and restarted chronyd:
systemctl restart chronyd.service
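To double-check that no keyfile statement is still active, something like the following can be run against the config on each system. It is only a sketch: active_keyfile is a made-up helper name, and it assumes the directive is written at the start of a line as in the stock config.

```shell
# active_keyfile (hypothetical helper): read a chrony.conf on stdin and print
# any keyfile directive that is NOT commented out. Prints nothing when the
# directive is absent or commented out.
active_keyfile() {
    grep -E '^[[:space:]]*keyfile[[:space:]]' || true
}
```

Usage: active_keyfile < /etc/chrony.conf
No output means the directive is commented out or missing.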
chronyd on flux is configured to get time from gully, via the "server 192.168.0.12" directive in /etc/chrony.conf on flux.
The server directives in /etc/chrony.conf on gully now look like so:
# server 0.fedora.pool.ntp.org iburst
# server 1.fedora.pool.ntp.org iburst
# server 2.fedora.pool.ntp.org iburst
# server 3.fedora.pool.ntp.org iburst
server ntp.colostate.edu
server c20
server m21
server m22
chrony was never able to connect to any of the fedora.pool.ntp.org servers. The chronyc sources command always returned lines like this for the pool servers:
MS Name/IP address           Stratum Poll LastRx Last sample
============================================================================
^? 64.73.32.134                    0    6   10y     +0ns[   +0ns] +/-    0ns
^? ns1.your-site.com               0    6   10y     +0ns[   +0ns] +/-    0ns
^? 64.73.32.135                    0    6   10y     +0ns[   +0ns] +/-    0ns
^? cheezum.mattnordhoff.net        0    6   10y     +0ns[   +0ns] +/-    0ns
The traffic is probably blocked by a firewall somewhere.
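The "?" in the second column of the first field is chronyc's marker for a source it cannot reach or whose packets fail its tests. A small filter like the one below lists just those sources. It is a sketch that assumes the chronyc sources layout shown above; unreachable_sources is a made-up name.

```shell
# unreachable_sources (hypothetical helper): read `chronyc sources` output on
# stdin and print the name/address of every source whose state character
# (second character of the two-character first field) is "?".
# Header and separator lines never match, so they are skipped.
unreachable_sources() {
    awk 'length($1) == 2 && substr($1, 2, 1) == "?" { print $2 }'
}
```

Usage: chronyc sources | unreachable_sources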
To have a reference check for our DSMs, I added ntp.colostate.edu as a chrony server for gully. That works, so our router is not blocking NTP traffic.
chronyc sources on gully with the above server configuration looks like so, which indicates that our DSMs (with PPS from a GPS) have better clocks than ntp.colostate.edu:
# chronyc sources
210 Number of sources = 4
MS Name/IP address           Stratum Poll LastRx Last sample
============================================================================
^+ yuma.acns.colostate.edu         2    6     5    +75ms[  +75ms] +/-  173ms
^+ C20                             3    6     5  -1281us[-1287us] +/- 3162us
^+ M21                             3    6     5   +470us[ +464us] +/- 2506us
^* Mu22                            3    6     5    -30us[  -36us] +/- 1799us