Blog from October, 2012

Took amperage readings for the East transformer (readings are rounded up).

C Tower - 0.50A

OSU SODAR - 2.00A

NCAR SODAR - 2.50A

I did not take readings on the West transformer because the power lines are not easy to access. I may try after the storm (I'm worried about not being able to re-cover them well enough).

Weather:

Cloudy and cold today. Very slight winds. Rain came in later in the day, and I noticed the station 6 sonic become unhappy, the first of all the stations so far.

Status & Ops:

All stations reporting.  Gully is saving data.  Cockpit was updated and all plots are working.  If anyone is interested in my set-up you can load ChrisCockpit.  Be aware that it is Halloween colors.  Don't be scared.

To Dos:

Yet some more...clean the base trailer.  Will take Amperage readings for all branches of AC power.  This is a task Kurt wanted me to do.  

I have parked the four wheeler under the base trailer for shelter from rain and snow. I organized some of Christoph's PVC and things under the front part of the trailer so I could park and lock the ATV up. This may not be a permanent spot for parking the ATV, but for now it will work.

10m Sonic on Main

October 23, 2012

Afternoon-ish.

I tried to tighten the bolt on the head of the sonic at 10m on Main Tower, but it was already snug. I'm presuming this is something similar to A17? If this does not help, I will put it on the To Do list that it needs replacing.

Gordon Oct 23

I believe the issue Chris saw this morning on 6 is similar to what I've seen from time to time on other systems. When I've logged into a system and it is slooooow, I've run top and seen that ntpd is taking 100% of the CPU. I believe that is because the tee_tty process has died. This message was in the system log for 6:

Oct 23 07:21:35 Ah6 tee_tty: 2012-10-23,07:21:35|NOTICE|received signal Interrupt(2), si_signo=2, si_errno=0, si_code=128

This happens about once every week on 1 or 2 of the 22 systems, so it is hard to diagnose.

tee_tty reads from the physical serial port that the GPS is connected to and sends that data to pseudo-terminals, one read by ntpd and one read by the dsm process. It appears that if tee_tty dies, then ntpd doesn't correctly detect the error on the input port and then likely attempts reads at a high rate.
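The fan-out described above can be illustrated with a minimal Python sketch. This is not the tee_tty source: `open_raw_pty` and `tee_chunk` are hypothetical names chosen for illustration, and the real program's options, reconnect logic, and error handling are not represented.

```python
import os
import pty
import tty


def open_raw_pty():
    """Create a pseudo-terminal pair and put the slave side in raw mode.

    Returns (master_fd, slave_fd). A reader such as ntpd would open the
    slave device; the tee process writes to the master.
    """
    master_fd, slave_fd = pty.openpty()
    tty.setraw(slave_fd)  # no line editing, echo, or signal characters
    return master_fd, slave_fd


def tee_chunk(src_fd, sink_fds, bufsize=1024):
    """Read one chunk from src_fd and copy it to every sink descriptor.

    Returns the bytes read; an empty result means the source was closed.
    """
    data = os.read(src_fd, bufsize)
    for fd in sink_fds:
        os.write(fd, data)
    return data
```

In such a design the process would call `tee_chunk(gps_fd, [ntp_master, dsm_master])` in a loop. If that loop stops, the readers on the slave sides get no new data, which is the situation in which ntpd appears to misbehave.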

Previously I tried to change ntpd so that it might catch the error and exit without hanging, without success.

Now I might know why the tee_tty is being sent the SIGINT signal. It turns out the serial port is opened in "cooked" mode, which means that if it somehow receives a ctrl-C on that port then the process is sent a SIGINT. I guess a ctrl-C could also be received on any pseudo-terminal that is opened for reading. A BREAK condition can also result in a SIGINT, but IGNBRK is set on the serial port, and I don't believe BREAKs can be sent over pseudo-terminals.
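For reference, the terminal setting that turns an incoming ctrl-C byte (0x03) into a SIGINT is the ISIG local-mode flag. The sketch below shows, in Python's standard termios module, how that flag can be cleared on an open port. It is only an illustration of the mechanism, not what tee_tty does internally; the function names are hypothetical.

```python
import termios

LFLAG = 3  # index of the local-mode flags in the list returned by tcgetattr()


def without_signal_chars(attrs):
    """Return a copy of tcgetattr()-style attrs with ISIG, ICANON and ECHO cleared.

    With ISIG cleared, the interrupt character is delivered as ordinary data
    instead of generating SIGINT.
    """
    new_attrs = list(attrs)
    new_attrs[LFLAG] &= ~(termios.ISIG | termios.ICANON | termios.ECHO)
    return new_attrs


def make_port_ignore_ctrl_c(fd):
    """Apply the change to an open terminal or serial-port descriptor."""
    attrs = termios.tcgetattr(fd)
    termios.tcsetattr(fd, termios.TCSANOW, without_signal_chars(attrs))
```

A full "raw" mode clears more than these three flags (input and output processing as well); Python's `tty.setraw(fd)` does the complete job in one call.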

Today around 3 pm I logged into all the stations and updated the file /etc/gps.conf to change the "c" in GPS_TEE_OPTS to "r", so that the serial port is opened in "raw" mode:

GPS_TEE_OPTS="4800n81lnrxx -p 60 -l pps"

Next I'm rebuilding tee_tty so that if the real serial port is opened in raw mode, the pseudo-terminals are also opened in raw mode. I don't think the ctrl-C could be coming from the dsm process. rserial traps the ctrl-C and terminates.

As of 4:33 pm, the tee_tty app has been updated on all DSMs, and their tee_tty and ntpd processes restarted.

Weather:

Nice crisp and cool morning.  Temps hovering around 8C with humidities in the upper 60%.  

Status:

All stations are communicating via bluetooth and wifi.

Station 6 was down on arrival according to Cockpit. Had trouble logging into the dsm but it finally logged me in. I did an 'adn' then 'aup' to see if I could get the system talking again. Extremely slloooooowwwwww communications with Ah6. The 'aup' did not work. I do see sensors alive in Minicom though. Giving Ah6 a reboot to see if things begin to work.

I did see if the head of the M21, 10m sonic was loose but it was not.  

Gave a tour to EOL Admins today.  

I do want to add that the base trailer has been neglected this project and I am very disappointed that it got to this condition. We all need to do a better job of getting organized and vacuuming the carpet. This will make the trailer last a lot longer and make attitudes a lot better.

Sensors and Issues:

All sensors look good except Ah6, which has no data coming down the pipeline.

To Dos:

Ah6 needs help. Fixed trh's in base. Try to figure out which of the probes found lying around are good. Organize base for show-and-tell today and tomorrow.

Weather:

Light winds during the night. A few clouds this morning with calm conditions. High humidity.

Status:

TRH at 2m on A17 went down last night. Used TIO command to get it going again.

The cockpit for Lambdasoil shows bad data; however, cockpit A11rs shows good data. A data_dump confirms the data is coming in. Killed lambdasoil cockpit and started another one. Data coming in.

Also noticed A18 0.5m TRH went out.  We tried to use the DIO lines but nothing happened.  Went out to tower and rebooted mote.  Nothing.  Noticed the cable to the TRH was getting pulled out.  Fixed tightness of cable and all good.  

To Do:

Finish potting TRH units (Done in the AM). Shoot boom angles and clean Kryptons (Done in the PM).

WEATHER & OPS

On arrival the weather was sunny with comfortable temperatures (~15C). Very calm winds (compared to last week) and traffic on Hwy 85 was light. Can hear prairie dogs singing to the sodars.

Steve S. and Chris G. have taken over ops for the next couple of weeks. We got the new Flux system and plugged everything in, but nothing happened. We found out that we needed to shut off wifi and tell it to talk ethernet all the time, and then it all worked. We brought the theodolites down to see if the Main Tower is still plumb after tensioning the guy-wires in the winds last week. It was half a tube out to the West and an eighth of a tube out to the South. So it wasn't too far off, but we did some tweaking and it is now an eighth of a tube out to the South. New tensions were taken and logged.

SENSOR STATUS AND ISSUES

No issues were found with any of the towers or sensors.

TO DO

Will repair TRHs tomorrow and finish potting TRH wagon wheels. While doing the potting we will both shoot boom angles. Trying to get the tasks that require two people done, since we will try one-person Ops starting Tuesday.

Gordon, Oct 21

We don't have TRH or Handar data from Aph3 last night and today from Oct 20 20:29 to Oct 21 11:51 MDT, due to the old problem of PC104 interrupts stopping.

Installed a new kernel, and rebooted. I don't think the new kernel will solve this issue. Turns out the fix for "pending" interrupts was also in the previous kernel.

Linux Aph3 2.6.35.9-ael1-2-titan #1 PREEMPT Sat Oct 13 12:45:33 MDT 2012 armv5tel unknown

20-Oct-12

Weather & Ops

Clear and pretty calm in the am. Wspd at .5m was <2 and 1m ~2.5 with
the coolest site being 11 in the bottom of the confluence. However
temps have been staying up around 10 ± 4ish so maybe still too warm
for preferred event.

Communications are all up and files are growing. Everyone is reachable.
Gully is humming along, disks show /media/isfs0 75% and pocketec was at
40% but swapped out to bring hr data back to Boulder. New pocketec at 1%

Sensors & Issues

Thank goodness for the qc table and plots. Everything appears up.
Very few diags on sonics. Krypton voltages still up. Battery
voltages good, and they should be after all the sun the last few days.

To Dos

Bring Flux computer back on line
Crew change: jm leaving this morning, SteveS/ChrisG arriving tomorrow
morning.
Separate list also on wiki
Mini-List for ChrisG: shoot another set of boom angles, finish
protective greasing trh wagon-wheels for a14-a18. Put high-q
conductive grease on all din connector pins. Keep eye on battery
charging but it seems ok. Clean rad, licors and eventually kh2o. Do
we want a soil sample? One TRH-i2c cable and pc-console cable in base trailer to repair.

A2 CSAT Swapped

19-Oct-12 09:55-10:15 MST (15:55-16:15Z)

CSAT Removed: s/n 833. This is the unit that was glitching in higher winds. Its data was ok in light winds, but better safe than sorry. Could be used as a last resort if nothing else is available.

CSAT Installed: s/n 671   Gordon confirmed it was looking good from Boulder.

19-Oct-12

personnel: JohnM, TomH.

Weather / Ops:

Winds diminished last night and remain low in the morning, possibly a good night/day for the science.  Clear throughout the day with seasonally warm temps.

Ingest:   Everything remained up throughout the day.

Flux Computer:

The Flux computer was still running cockpit ok this morning, allowing a status check. Tom Horst retrieved it by 7:30 and took it to Boulder to be rebuilt before the next crew arrives.

Sensor Status / Issues:

The sensors continue to perform well and are coming in.   Stations remain up and running.   The issues noticed:

a2: 1m CSAT was not having any diag errors because the winds have been low.    It was swapped by 10:15MST (16:15Z): s/n 833 removed, s/n 671 installed.

a8: TRH mote data now looking good.  Since code update have not seen any \r\n hits.

Mote \r\n Issues: I don't see any/many of these in the data from other stations and now ID8 is running well.   A more thorough examination confirmed that really only ID8 was a culprit.   That particular mote was a replacement/repair brought up from Boulder after the site had some trh issues.    The problem appears resolved.

To Do:

  1. Flux computer getting rebuilt in Boulder

18-Oct-12

AP2 serving Stns A8-A13 lost communications last night.   

Power at the interface board was ok, and so were the connections.    Per Gordon's instructions I did not simply unplug/re-plug the usb to AP2, instead shutdown Gully via "poweroff" and rebooted it.   That worked to re-establish the links with all bluetooths/ap24, etc for all stations.

Another method to check the AP1/AP2 is: pand_check.sh hci# where #=0 for AP1 and 1 for AP2. This kills the pand processes serving those bluetooth interfaced radios and sends the hciconfig command to go down, reset, go up and do a piscan. This may or may not work, but it's worth a try before doing the full 'poweroff' noted above.
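The hciconfig part of that sequence can be sketched as follows. This is not the contents of pand_check.sh, which are not reproduced here; the function names are hypothetical, and the step that kills the pand processes is deliberately left out because how the script selects them is not described.

```python
import subprocess

AP_TO_HCI = {"AP1": 0, "AP2": 1}  # mapping given in the log entry


def hci_reset_commands(ap_name):
    """Build the hciconfig calls for one radio: down, reset, up, piscan."""
    dev = f"hci{AP_TO_HCI[ap_name]}"
    return [["hciconfig", dev, action] for action in ("down", "reset", "up", "piscan")]


def reset_radio(ap_name, dry_run=True):
    """Print the commands (dry run) or execute them in order, stopping on failure."""
    commands = hci_reset_commands(ap_name)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Running `reset_radio("AP2")` with the default dry run only prints the four commands, which makes it easy to check the device name before touching a live link.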

However during the Gully reboot: /media/usbdisk and archiving did not startup automatically but the dsm_server and rsync processes did.

Gully Fix:

The 'adam_env.sh' script in /home/aster/projects/SCP/ISFS/scripts is run during bootup and had the incorrect DATAMNT device listed.   Gordon fixed this to correctly point to /media/isfs0, so now Gully is good to be rebooted when needed.

18-Oct-12

personnel: JohnM, TomH.,  KurtK/ChrisG came up today to retension the main tower.

Weather / Ops:

Winds diminished over last night but are probably still more than 'science preferred'. Clear throughout the day with brisk winds and seasonally warm temps. The winds have dropped down by evening (0z for the 19th), so maybe it'll be a good night tonight.

Ingest:   BlueTooth AP2 serving Stns 8-13 died last night.    Gully is ingesting/archiving AP1, AP3241 for the others ok.  See other log entry on Gully/AP comms.

Chadi Sayde came and worked on Christoph's fiber.   He installed the new transformer with Kurt's help.   The kink in his fiber cable remains but is on the 'turn-around-tower' so they're still getting the lower temp readings.   Kurt/Chris checked and retensioned the main tower guys.

Flux Computer Crashes / Sort-of-Fixes:

The Flux computer's Fedora/window manager died and flux had to be rebooted this morning. This occurred later in the day as well. Gordon walked us through 'fixing it up' by removing a 2nd disk, fsck checking the boot disk, updating 'grub' to boot the proper Fedora Linux, and making startup more automatic (no 'debug manager', and dsm_server/cockpit autostart). However, it still appears that the single disk system is sick and will need to be rebuilt in the near future. We are still able to run with it for diagnostics pretty well, though.

Sensor Status / Issues:

Overall the sensors continue to perform well and are coming in.   Stations remain up and running.   The issues noticed:

a17: The comms/dsm BlueTooth has been working pretty well since last night. It only shows dropouts when the Gully system/AP2 was restarted.

a2: 1m CSAT continues to have a diag bit error.

a8: TRH mote data loss from ~10:00-16:30MST (16:00-22:30Z).   The trh mote was reprogrammed this morning, however, I set it for the incorrect destination port so TRH data was lost until ~16:30MST when it was switched to the serial port.

To Do:

  1. A2: Swap CSAT
  2. A8: Monitor TRH data to determine if the new software fixed the \r\n bug.   
  3. Look for 'trh mote' software crc glitches to determine if others need the updated software as well
  4. Nurse Flux and have it rebuilt soon.

A17 communication failure

17-Oct-12

We've had ups and downs on a17 today, suggesting that the intermittent data dropouts are a bluetooth communication issue. ping_test.sh does show it in sometimes and then some long outages, and of course so does RTdata.

The station itself did reboot this afternoon and, based on the /var/log/isfs/messages files, has done so a few times in the last few days. Looking at the messages does show that the blue_check is finding lots of hci/down, restarts so for sure the BTRadio is dropping in and out. Since it's been very windy today, the thought was that perhaps the usb cabling was loose; however, upon checking at the station that did not appear to be the case. I did put some cable ties on it and re-positioned the mote box slightly with a bigger tie as well.

Conclusion is we'll keep watching it and perhaps swap the radio if this continues.

It looks like the diagbits for the CSAT at A2 have a value of 4, indicating poor signal lock, possibly caused by an obstruction in the path.

John checked A2 for a physical obstruction to the sonic path around 11 am and found none. When he was able to improvise a zero-wind enclosure with his sweatshirt, the diagbits dropped to zero. Thus the sonic data around this time will be contaminated by John's testing.
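A small decoder makes values like the 4 noted above easier to read at a glance. Only the meaning of 4 (poor signal lock) comes from this log; the other three bit meanings are an assumption based on the usual CSAT3 diagnostic flag layout and should be checked against the sonic's manual. The names `CSAT_DIAG_FLAGS` and `decode_diagbits` are hypothetical.

```python
# Bit value -> meaning. Only 4 is confirmed by the log entry above;
# 8, 2 and 1 are assumed from the usual CSAT3 diagnostic flag layout.
CSAT_DIAG_FLAGS = {
    8: "speed-of-sound difference between paths too large",  # assumed
    4: "poor signal lock",
    2: "signal amplitude too high",  # assumed
    1: "signal amplitude too low",  # assumed
}


def decode_diagbits(value):
    """Return the list of flag descriptions set in a 4-bit diagbits value."""
    return [name for bit, name in CSAT_DIAG_FLAGS.items() if value & bit]
```

For the A2 case, `decode_diagbits(4)` gives a single entry for poor signal lock, and a value of 0 (as seen inside the improvised zero-wind enclosure) gives an empty list.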