Closed Bug 1111389 Opened 10 years ago Closed 10 years ago

sp-processor03.phx1.mozilla.com:Socorro Processors - procs is CRITICAL: PROCS CRITICAL: 0 processes with regex args processor_app

Categories

(Infrastructure & Operations :: MOC: Problems, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: achavez, Unassigned)

Details

I followed the troubleshooting steps in the runbook here: https://mana.mozilla.org/wiki/display/NAGIOS/Socorro+Processors+-+procs

Here's what happened:

/var/log/socorro/socorro-processor.log
/var/log/socorro/socorro-processor.log: line 1: /data/socorro/socorro-virtualenv/lib/python2.6/site-packages/configman/config_manager.py:747:: No such file or directory
/var/log/socorro/socorro-processor.log: line 2: syntax error near unexpected token `('
/var/log/socorro/socorro-processor.log: line 2: ` 'Invalid options: %s' % ', '.join(unmatched_keys)'
[achavez@sp-processor03.phx1 ~]$ /etc/init.d/socorro-processor restart
Stopping socorro-processor: [FAILED]
Starting socorro-processor: Can't create lock file "/var/run/socorro-processor.pid": Permission denied

I'm going to cc lars@mozilla.com to see if this can get fixed.
Flags: needinfo?(lars)
Also received this alert:

sp-processor03.phx1.mozilla.com:Socorro Processors - log file age is CRITICAL: FILE_AGE CRITICAL: /var/log/socorro/socorro-processor.log is 808 seconds old and 10221153 bytes

I tried running the commands in the runbook (https://mana.mozilla.org/wiki/display/NAGIOS/Socorro+Processors+-+log+file+age) and got permission denied.
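For readers unfamiliar with this kind of alert, here is a minimal Python sketch of a log-file-age check that produces a message of the same shape. It is not the actual Nagios plugin: the function name file_age_status and the warning and critical thresholds are assumptions for illustration, since the real thresholds are not stated in this bug.

```python
import os
import time


def file_age_status(path, warn_seconds, crit_seconds, now=None):
    """Classify a file by the age of its last modification.

    warn_seconds and crit_seconds are hypothetical thresholds; the values
    used by the real check are not given in this bug.
    """
    st = os.stat(path)
    current = time.time() if now is None else now
    age = int(current - st.st_mtime)

    if age >= crit_seconds:
        level = "CRITICAL"
    elif age >= warn_seconds:
        level = "WARNING"
    else:
        level = "OK"

    message = "FILE_AGE %s: %s is %d seconds old and %d bytes" % (
        level, path, age, st.st_size)
    return level, message
```

A log that has stopped growing, as here, usually means the process writing it has died or hung, which is why this alert fired alongside the procs alert.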
It looks like the processor was summarily killed at 2014-12-14T13:28:55 without having a chance to log anything. The last thing it logged was:

2014-12-14 13:28:54,747 DEBUG - Thread-4 - BotoBenchmarkWrite save_raw_and_processed 0:00:00.562493

Further investigation shows this in /var/log/messages at that instant:

Dec 14 13:28:55 sp-processor03.phx1.mozilla.com kernel: python[19139] general protection ip:38a287a83f sp:7fec097d9230 error:0 in libc-2.12.so[38a2800000+197000]

So something went seriously wrong within Python itself. About ten minutes later, it appears that Puppet restarted the processor, and it has continued to work normally since. I put this down as an anomaly, as we've not seen this behavior before and none of the other processors have experienced it. If it happens again, it will warrant a more serious investigation.
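If this does recur, a useful first step is to work out where inside libc the fault happened. The sketch below parses the kernel line quoted above and subtracts the library's load address from the instruction pointer. The regular expression and the function name fault_offset are illustrative assumptions, written only to match the format of this one log line.

```python
import re

# Kernel log line copied from the comment above.
LINE = (
    "Dec 14 13:28:55 sp-processor03.phx1.mozilla.com kernel: python[19139] "
    "general protection ip:38a287a83f sp:7fec097d9230 error:0 "
    "in libc-2.12.so[38a2800000+197000]"
)

# Assumed pattern for this message format: process[pid], then the
# instruction pointer, the stack pointer, an error code, and
# library[base+size], with all addresses in hexadecimal.
PATTERN = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\] general protection "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return details of the fault, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    offset = ip - base
    return {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "library": m.group("lib"),
        "offset": offset,
        # True when the instruction pointer lies inside the mapped library.
        "in_mapping": 0 <= offset < size,
    }


info = fault_offset(LINE)
print(info["library"], hex(info["offset"]), info["in_mapping"])
# prints: libc-2.12.so 0x7a83f True
```

The resulting offset (0x7a83f) could then be resolved to a function name with a tool such as addr2line, assuming a copy of the same libc build and its debug symbols is available.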
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(lars)
Resolution: --- → WORKSFORME