There were many failing nagios checks on iss3, but I think the underlying problem was that the catalog database was not updating. My theory is that the ceilometer PC changed the mtime property on some (many) of the camera images, rsync copied those mtime changes to the data manager, and the catalog then treated all of those images as changed and needing to be rescanned. There are about 32,000 camera images, and it looks like about 20,000 of them had to be rescanned.
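A minimal sketch of the theory above, assuming the catalog decides whether to rescan a file by comparing its current mtime against the mtime it recorded at the last scan. The real catalog's logic isn't shown here, and `find_changed` and `update_catalog` are hypothetical names. Under that assumption, a bogus mtime copied over by rsync is enough to mark a file as changed even though its contents are identical.

```python
import os
import tempfile
from typing import Dict, List


def find_changed(paths: List[str], catalog: Dict[str, float]) -> List[str]:
    """Return paths whose current mtime differs from the one recorded in catalog.

    Hypothetical model: catalog maps path -> mtime seen at the last scan.
    A path that is not in the catalog yet also counts as changed.
    """
    changed = []
    for path in paths:
        if catalog.get(path) != os.stat(path).st_mtime:
            changed.append(path)
    return changed


def update_catalog(paths: List[str], catalog: Dict[str, float]) -> None:
    """Record the current mtime of each path, as a completed rescan would."""
    for path in paths:
        catalog[path] = os.stat(path).st_mtime


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for name in ("20220503_0311_raw.jpg", "20220503_0312_raw.jpg"):
            p = os.path.join(d, name)
            with open(p, "wb") as f:
                f.write(b"image bytes")
            paths.append(p)

        catalog: Dict[str, float] = {}
        update_catalog(paths, catalog)
        print(find_changed(paths, catalog))  # [] : nothing changed since the scan

        # Simulate rsync propagating a Dec 1 2006 (UTC) mtime onto one image;
        # the contents are untouched, but the file now looks "changed".
        os.utime(paths[0], (1164931200, 1164931200))
        print([os.path.basename(p) for p in find_changed(paths, catalog)])
```

Because only metadata is compared, this kind of check is cheap when nothing has changed, which would also fit the observation that rescans got fast again once the catalog had recorded the new mtimes.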

Now that the rescan has finally completed, subsequent rescans are much faster again, and the only checks still failing are for the ceilometer and the allsky camera. So there is still an issue with the ceilometer PC: right now, rsync connections to it are failing.



6 Comments

  1. When did the errors show up in nagios for ISS 3? I was checking it all day and didn't see any errors as of last evening.   Am I not checking the right nagios thing?

  2. Gary Granger AUTHOR

    Yeah, good question, I should have noted that. It looks like the camera image downloads started getting flaky around 8 pm PT, but nagios was flipping back and forth between green and red until about 9:40 PT, when it went solid red. The other camera image categories and the ceilometer data coming from the ceilometer PC look similar. I'm sure you're checking the right nagios (http://iss3-field.dyndns.org/nagios/); it's probably just that the checks were going in and out. I noticed because I was looking at the water vapor problem on iss2 and decided to check the other sites as well.

  3. Gary Granger AUTHOR

    I wish I had some idea what was going on with this ceilometer PC and rsync, but all I can do is guess. The mtime on the images is now in 2006, which seems very peculiar. Maybe the system time just got out of whack and that has messed up the software or the rsync server. Or maybe there are issues with the disk or filesystem. If the Internet PDU is out there somewhere, maybe we should plug the ceilometer PC into it so we can at least power cycle it remotely. But then we'd still need to get VNC working on it to be able to connect and manually restart the ceilometer software after the reboot... so maybe it's still easiest to just keep manually rebooting it occasionally. Have I painted a subtle enough picture of how obnoxious this PC is, or should I be more blunt?

    35 iss3.field.eol.ucar.edu:/iss/ds/cameras/allsky/2022/05/03> ls -laF | tail
    -rw-rw-r-- 1 iss iss 108333 Dec  1  2006 20220503_0311_reproj.jpg
    -rw-rw-r-- 1 iss iss  60112 Dec  1  2006 20220503_0312_pano.jpg
    -rw-rw-r-- 1 iss iss 162939 Dec  1  2006 20220503_0312_raw.jpg
    -rw-rw-r-- 1 iss iss 108331 Dec  1  2006 20220503_0312_reproj.jpg
    -rw-rw-r-- 1 iss iss  60393 Dec  1  2006 20220503_0313_pano.jpg
    -rw-rw-r-- 1 iss iss 164361 Dec  1  2006 20220503_0313_raw.jpg
    -rw-rw-r-- 1 iss iss 108557 Dec  1  2006 20220503_0313_reproj.jpg
    -rw-rw-r-- 1 iss iss  60671 Dec  1  2006 20220503_0314_pano.jpg
    -rw-rw-r-- 1 iss iss 165866 Dec  1  2006 20220503_0314_raw.jpg
    -rw-rw-r-- 1 iss iss 108346 Dec  1  2006 20220503_0314_reproj.jpg
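    To find images like the ones above, one could compare each file's mtime against the timestamp encoded in its name. This is a hypothetical sketch: it assumes names follow the `YYYYMMDD_HHMM_*.jpg` pattern shown in the listing, that the encoded time is UTC, and an arbitrary one-day tolerance. `suspicious_mtimes` and `name_time` are illustrative names, not part of any existing tool.

```python
import os
import re
from datetime import datetime, timezone
from typing import List, Optional

# Assumption: names look like 20220503_0311_reproj.jpg (date_time_kind.jpg),
# and the encoded time is UTC.
NAME_RE = re.compile(r"^(\d{8})_(\d{4})_")


def name_time(filename: str) -> Optional[datetime]:
    """Parse the timestamp embedded in an image name, or None if it has none."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    naive = datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M")
    return naive.replace(tzinfo=timezone.utc)


def suspicious_mtimes(directory: str, tolerance_s: float = 86400.0) -> List[str]:
    """List image names whose mtime is more than tolerance_s away from their name time."""
    flagged = []
    with os.scandir(directory) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if not entry.is_file():
            continue
        expected = name_time(entry.name)
        if expected is None:
            continue  # not a timestamped image name; skip it
        if abs(entry.stat().st_mtime - expected.timestamp()) > tolerance_s:
            flagged.append(entry.name)
    return flagged
```

    Run against the directory in the listing, e.g. `suspicious_mtimes("/iss/ds/cameras/allsky/2022/05/03")`, this would report every image whose mtime reads Dec 1 2006, which gives a rough count of how many files were touched.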
  4. Do you need me to go out there and manually reboot the ceilometer computer?  I can head out there at 1 PM PDT, after a telecon I have.


  5. Laura was out at ISS3 today and upon arrival saw this message on the profiler computer screen:

    It appeared that the hard drive had filled up, or something similar. Bill had Laura empty out the C:/temp directory, reboot the machine, and restart the ceilometer software. The errors in Nagios cleared and stayed clear for the rest of the day.

  6. To my surprise, I found that I can connect to the ceilometer computer via vncviewer, so if it reboots, I can hopefully connect and restart the processes remotely.