Gordon Oct 23

I believe the issue Chris saw this morning on 6 is similar to what I've seen from time to time on other systems. When I've logged into a system and it is slooooow, I've run top and seen that ntpd is taking 100% of the cpu. I believe that is because the tee_tty process has died. This message was in the system log for 6:

Oct 23 07:21:35 Ah6 tee_tty: 2012-10-23,07:21:35|NOTICE|received signal Interrupt(2), si_signo=2, si_errno=0, si_code=128

This happens about once every week on 1 or 2 of the 22 systems, so it is hard to diagnose.

tee_tty reads from the physical serial port that the GPS is connected to and sends that data to pseudo-terminals, one read by ntpd and one read by the dsm process. It appears that if tee_tty dies, then ntpd doesn't correctly detect the error on the input port and then likely attempts reads at a high rate.
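A rough way to see what a reader is left with when the other end of a pseudo-terminal goes away is sketched below in Python. It assumes (this is not confirmed above) that tee_tty holds the master ends and ntpd reads a slave end; the function name probe_orphaned_slave is made up for the illustration. On Linux the orphaned slave is normally reported as readable right away and the read returns end-of-file or an error, so a reader that keeps polling without handling that result would spin, which would fit the 100% cpu symptom.

```python
import os
import pty
import select


def probe_orphaned_slave(timeout=1.0):
    """Close the master end of a pty and report what a slave reader sees.

    Returns (ready, status): ready is True if select() reported the slave
    readable within the timeout; status is "eof", "error", "data" or
    "blocked".
    """
    master, slave = pty.openpty()
    # Stand-in for tee_tty dying (assumption: it owns the master end).
    os.close(master)
    try:
        readable, _, _ = select.select([slave], [], [], timeout)
        if not readable:
            return False, "blocked"
        try:
            data = os.read(slave, 64)
        except OSError:
            # Some systems report the hangup as a read error such as EIO.
            return True, "error"
        status = "eof" if data == b"" else "data"
        return True, status
    finally:
        os.close(slave)
```

Calling probe_orphaned_slave() on one of the DSMs would show which of the two cases (end-of-file or error) a reader there actually gets.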

Previously I tried, without success, to change ntpd so that it would catch the error and exit rather than hang.

Now I might know why the tee_tty is being sent the SIGINT signal. It turns out the serial port is opened in "cooked" mode, which means that if it somehow receives a ctrl-C on that port then the process is sent a SIGINT. I guess a ctrl-C could also be received on any pseudo-terminal that is opened for reading. A BREAK condition can also result in a SIGINT, but IGNBRK is set on the serial port, and I don't believe BREAKs can be sent over pseudo-terminals.
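The termios flag behind this is ISIG: while it is set, the tty driver treats the interrupt character (0x03, ctrl-C) as a request for SIGINT instead of handing it to the reader. The Python sketch below is only an illustration on a scratch pseudo-terminal, not the tee_tty code; make_raw and ctrl_c_passes_through are made-up names. It clears ISIG and the usual cooked-mode flags and then checks that a 0x03 byte arrives as ordinary data.

```python
import os
import pty
import termios


def make_raw(fd):
    """Put the tty on fd into raw mode: no signals, no line editing."""
    attrs = termios.tcgetattr(fd)
    # attrs = [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attrs[0] &= ~(termios.BRKINT | termios.ICRNL | termios.INPCK
                  | termios.ISTRIP | termios.IXON)
    attrs[1] &= ~termios.OPOST
    # Clearing ISIG is what stops ctrl-C from being turned into SIGINT.
    attrs[3] &= ~(termios.ECHO | termios.ICANON | termios.IEXTEN
                  | termios.ISIG)
    attrs[6][termios.VMIN] = 1   # read() returns once 1 byte is available
    attrs[6][termios.VTIME] = 0  # no inter-byte timeout
    termios.tcsetattr(fd, termios.TCSANOW, attrs)


def isig_enabled(fd):
    """True if the tty on fd still maps ctrl-C and friends to signals."""
    return bool(termios.tcgetattr(fd)[3] & termios.ISIG)


def ctrl_c_passes_through():
    """Write 0x03 into a raw pty and return what the reader receives."""
    master, slave = pty.openpty()
    try:
        make_raw(slave)
        os.write(master, b"\x03")
        return os.read(slave, 1)
    finally:
        os.close(master)
        os.close(slave)
```

The same check, isig_enabled(fd), could be run against an open descriptor for the GPS port or a pseudo-terminal to confirm which mode it is really in.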

Today around 3 pm I logged into all the stations and updated the file /etc/gps.conf to change the "c" in GPS_TEE_OPTS to "r", so that the serial port is opened in "raw" mode:

GPS_TEE_OPTS="4800n81lnrxx -p 60 -l pps"

Next I'm rebuilding tee_tty so that if the real serial port is opened in raw mode, the pseudo-terminals are also opened in raw mode. I don't think the ctrl-C could be coming from the dsm process. rserial traps the ctrl-C and terminates.

As of 4:33 pm, the tee_tty app has been updated on all DSMs, and their tee_tty and ntpd processes restarted.