The playa1 data system hung at about 23Z on Jan 13. When John/Steve connected to the console port the next morning, it was reporting a lack of memory:

ntpd: page allocation failure. order:0, mode:0x20

This is a rare situation that we have not seen previously at PCAPS. So far the stations have only died due to low power. river7 has been up continuously for 63 days.

To monitor memory usage on a station, the free command is useful:

ssh isfs1 free -m
             total       used       free     shared    buffers     cached
Mem:            61         59          1          0          7         43
-/+ buffers/cache:          8         52
Swap:            0          0          0

On the first line, the free value of 1 MByte out of a reported total of 61 looks alarming, but Linux is actually using the otherwise idle memory for buffers and cache, and is supposed to give it up to processes when they need it. The second line shows the free memory once buffers and cache are excluded, which is a healthy 52 MByte in this example.
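
To make that arithmetic concrete, here is a minimal sketch in Python that takes the text printed by free -m and adds up free + buffers + cached from the Mem: line. The function name available_mb and the SAMPLE string are just made up for illustration; it assumes the older six-column layout of free shown above, and newer versions of free print different columns.

```python
# Sketch: estimate memory available to processes from `free -m` output.
# Assumes the old procps layout: total used free shared buffers cached.

SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:            61         59          1          0          7         43
-/+ buffers/cache:          8         52
Swap:            0          0          0
"""


def available_mb(free_output: str) -> int:
    """Return free + buffers + cached (in MB) from the 'Mem:' line."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            # Unpack the six numeric columns that follow the "Mem:" label.
            total, used, free, shared, buffers, cached = map(int, fields[1:7])
            return free + buffers + cached
    raise ValueError("no 'Mem:' line found in free output")


if __name__ == "__main__":
    print(available_mb(SAMPLE))  # 1 + 7 + 43 = 51
```

The sum from the whole-MByte columns is 51 rather than the 52 that free reports on its second line, presumably because free does its arithmetic on finer-grained values before rounding each column to MBytes. Either way, it is this number, not the 1 MByte in the free column, that should be watched on a station.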

My wild guess is that in rare circumstances, the kernel exhibits a bug where it doesn't free up the buffer/cache memory for use by processes. Time to upgrade the kernels from 2.6.16...

Or it could be due to a sensor input going bananas. I'll look into that.