Gordon Oct 25
Cockpit data was not coming in.
Traced this down to a system clock problem on flux. Its clock gradually lost time, and then the dsm_server process started throwing away samples, with this message in /var/log/isfs/isfs.log:
    Oct 25 14:54:54 flux dsm_server[30058]: WARNING|sample with timetag in future by 2.305084 secs. time: 2012 Oct 25 20:54:56.861 id=10,20 total future samples=39039001
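To see how far the clock has drifted, the offsets can be pulled out of these warnings. This is a minimal sketch that assumes the message format shown above; the function name future_offsets is made up for illustration.

```shell
# Print the "in future by N secs" value from each matching dsm_server warning.
# Reads log lines on stdin; lines that do not match are ignored.
# Assumes the message format shown in the log excerpt above.
future_offsets() {
    sed -n 's/.*sample with timetag in future by \([0-9.]*\) secs.*/\1/p'
}

# Example (log path from the entry above): largest offset logged so far.
# future_offsets < /var/log/isfs/isfs.log | sort -n | tail -1
```

A steadily growing value would be consistent with the clock on flux falling further behind.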
The problem was in /etc/chrony.conf on flux and gully. The chrony processes were configured to use keys, and the keys on the two hosts were different, so flux was not able to get time information from gully. Commented out the keyfile statement in both files, so that it now reads
    # keyfile /etc/chrony.keys
and restarted chronyd:
    systemctl restart chronyd.service
chronyd on flux is configured to get time from gully, via the "server 192.168.0.12" directive in /etc/chrony.conf on flux.
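After a restart, one way to confirm that chronyd has actually picked a source is to look for a line marked with "*" in the chronyc sources output. The helper below is a sketch, with has_selected_source as a made-up name; it only inspects text on stdin and assumes the usual two-character mode/state prefix at the start of each source line.

```shell
# Succeeds (exit status 0) if chronyc-sources-style output on stdin contains
# a selected source, i.e. a line whose second character is "*".
has_selected_source() {
    grep -q '^.\*'
}

# Example: run on flux a minute or so after restarting chronyd.
# chronyc sources | has_selected_source && echo "a source is selected"
```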
The server directives in /etc/chrony.conf on gully now look like so:
    # server 0.fedora.pool.ntp.org iburst
    # server 1.fedora.pool.ntp.org iburst
    # server 2.fedora.pool.ntp.org iburst
    # server 3.fedora.pool.ntp.org iburst
    server ntp.colostate.edu
    server c20
    server m21
    server m22
chrony was never able to connect to any of the fedora.pool.ntp.org servers. The chronyc sources command always returned lines like this for the pool servers:
    MS Name/IP address           Stratum Poll LastRx Last sample
    ============================================================================
    ^? 64.73.32.134                    0    6    10y     +0ns[   +0ns] +/-    0ns
    ^? ns1.your-site.com               0    6    10y     +0ns[   +0ns] +/-    0ns
    ^? 64.73.32.135                    0    6    10y     +0ns[   +0ns] +/-    0ns
    ^? cheezum.mattnordhoff.net        0    6    10y     +0ns[   +0ns] +/-    0ns
The traffic is probably blocked in a firewall somewhere.
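The "?" in the second column of those lines is the state chronyc reports for a source it has lost contact with or whose packets fail its tests. A small filter can list such sources; this is a sketch, and unreachable_sources is a made-up name.

```shell
# Print the name of every source whose state character (the second character
# of the line) is "?". Reads `chronyc sources` output on stdin; header and
# separator lines are skipped because their second character is not "?".
unreachable_sources() {
    awk 'substr($1, 2, 1) == "?" { print $2 }'
}

# Example: chronyc sources | unreachable_sources
```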
In order to have a reference check for our DSMs, I added ntp.colostate.edu as a chrony server for gully. With the above server configuration, chronyc sources on gully shows error bounds of about 1.8 to 3.2 ms for the DSMs versus 173 ms for ntp.colostate.edu, which indicates that our DSMs have better clocks than ntp.colostate.edu:
    # chronyc sources
    210 Number of sources = 4
    MS Name/IP address           Stratum Poll LastRx Last sample
    ============================================================================
    ^+ yuma.acns.colostate.edu         2    6     5    +75ms[  +75ms] +/-  173ms
    ^+ C20                             3    6     5  -1281us[-1287us] +/- 3162us
    ^+ M21                             3    6     5   +470us[ +464us] +/- 2506us
    ^* Mu22                            3    6     5    -30us[  -36us] +/- 1799us
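The comparison can be made less error-prone by converting the "+/-" column to a single unit, since chronyc mixes ms and us in the same listing. The following is a sketch only; error_bounds_us is a made-up name, and it assumes the error bound is the last field of each server line.

```shell
# Print "name error_bound_in_microseconds" for each server line (mode "^")
# of `chronyc sources` output read on stdin. The error bound is taken from
# the last field on the line, i.e. the value after "+/-".
error_bounds_us() {
    awk 'substr($1, 1, 1) == "^" {
        value = $NF; unit = $NF
        sub(/[a-z]+$/, "", value)      # numeric part, e.g. "173"
        sub(/^[0-9.]+/, "", unit)      # unit suffix, e.g. "ms"
        if      (unit == "ns") scale = 0.001
        else if (unit == "us") scale = 1
        else if (unit == "ms") scale = 1000
        else if (unit == "s")  scale = 1000000
        else next                      # unknown unit: skip the line
        printf "%s %.0f\n", $2, value * scale
    }'
}

# Example: sources ordered from smallest to largest error bound.
# chronyc sources | error_bounds_us | sort -k2 -n
```

For the listing above, this puts the three DSMs (1799, 2506 and 3162 us) ahead of yuma.acns.colostate.edu (173000 us).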