11:45 Low lost communication with its Emerald boards around 11:00.
Rebooted low and everything is back up.
...
Those were the only messages before the reboot, and they occurred at least 23 hours earlier, which means the problem is not due to a kernel oops or any other atypical event that the kernel could detect. It is just the good ol' situation where there seems to be a very small but finite possibility that a PC104 interrupt can be missed and not retriggered, even though the PC104 IRQ line is high, so that the interrupt handler is never called again.
...
Seems that I need to install a PC104 interrupt watchdog module. There is some indication this has happened on the aircraft, also quite infrequently. A test is being set up out at RAF.
When the PC104 interrupts are being handled, the irqs listing looks like so, showing 275 interrupts/sec from the Emerald cards:
    root@low root# irqs
    Counting interrupts over 5 seconds ...
    IRQ  Interrupt Type              Total Int  Int/sec
    ------------------------------------------------------
      3: ISA     serial:                  1376    275.2
     24: GPIO-l  eth0:                      62     12.4
     25: GPIO-l  GPIO1-PC104:             1376    275.2
     36: SC      serial:                    15      3
     37: SC      serial:                   101     20.2
     42: SC      ost0:                     509    101.8
    114: GPIO    isp116x-hcd:usb1:          90     18
    115: GPIO    serial:                   228     45.6
    116: GPIO    serial:                   102     20.4
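The Int/sec column is the change in the interrupt count divided by the sampling window, e.g. 1376 / 5 s = 275.2 for IRQ 25. The same arithmetic can flag the failure described above from user space: if the count for the PC104 IRQ does not advance between two samples, the handler is no longer being called. The following is only a minimal sketch of that check, not the irqs script and not the kernel watchdog module mentioned earlier. It assumes `/proc/interrupts`-style text in which each line starts with an IRQ label and a colon, followed by one or more count columns; the function names are hypothetical.

```python
import re


def parse_interrupts(text):
    """Return {irq_label: total_count} from /proc/interrupts-style text.

    Assumed layout: "<label>: <count> [<count> ...] <type> <name>".
    Count columns (one per CPU) are summed; lines without a count are skipped.
    """
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+):\s+(.*)", line)
        if not match:
            continue  # e.g. the "CPU0" header line, which has no colon
        label, rest = match.groups()
        total = 0
        seen_count = False
        for token in rest.split():
            if not token.isdigit():
                break  # first non-numeric token ends the count columns
            total += int(token)
            seen_count = True
        if seen_count:
            counts[label] = total
    return counts


def interrupt_rates(before, after, seconds):
    """Interrupts per second for every IRQ present in both snapshots."""
    return {
        irq: (after[irq] - before[irq]) / seconds
        for irq in after
        if irq in before
    }


def stalled_irqs(before, after, watched):
    """Watched IRQs whose count did not change between the two snapshots."""
    return [
        irq
        for irq in watched
        if irq in before and irq in after and after[irq] == before[irq]
    ]
```

In use, one would read `/proc/interrupts` twice, a few seconds apart, pass both snapshots through `parse_interrupts`, and call `stalled_irqs` with the labels of interest (here `"25"` for GPIO1-PC104 and `"3"` for the ISA serial interrupt). A non-empty result while the Emerald boards should be delivering data would indicate the stuck-interrupt condition.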