sp-processor03.phx1.mozilla.com:Socorro Processors - procs is CRITICAL: PROCS CRITICAL: 0 processes with regex args processor_app

RESOLVED WORKSFORME

Reporter: achavez; Assignee: Unassigned

Description (reporter), 4 years ago
I followed the troubleshooting steps in the run book here:
https://mana.mozilla.org/wiki/display/NAGIOS/Socorro+Processors+-+procs

Here's what happened:

/var/log/socorro/socorro-processor.log
/var/log/socorro/socorro-processor.log: line 1: /data/socorro/socorro-virtualenv/lib/python2.6/site-packages/configman/config_manager.py:747:: No such file or directory
/var/log/socorro/socorro-processor.log: line 2: syntax error near unexpected token `('
/var/log/socorro/socorro-processor.log: line 2: `  'Invalid options: %s' % ', '.join(unmatched_keys)'
[achavez@sp-processor03.phx1 ~]$ /etc/init.d/socorro-processor restart
Stopping socorro-processor:                                [FAILED]
Starting socorro-processor: Can't create lock file "/var/run/socorro-processor.pid": Permission denied

I'm going to cc lars@mozilla.com to see if this can get fixed.
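The PROCS alert in the bug title fires when no running process has arguments matching the regex `processor_app`. Below is a rough, Linux-only Python sketch of that kind of count, reading `/proc/<pid>/cmdline` directly. It illustrates the idea and is not the Nagios `check_procs` plugin itself. The function names and the sample argument lists in the usage note are hypothetical.

```python
import os
import re


def count_matching(cmdlines, pattern):
    """Count command lines (lists of args) whose space-joined
    arguments match the regex `pattern`."""
    regex = re.compile(pattern)
    return sum(1 for args in cmdlines if regex.search(" ".join(args)))


def read_cmdlines(proc_root="/proc"):
    """Yield the argument list of every process visible under /proc
    (Linux only). Processes that exit mid-scan are skipped."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue
        if raw:  # kernel threads have an empty cmdline
            yield [a.decode(errors="replace") for a in raw.rstrip(b"\0").split(b"\0")]
```

On the processor host, `count_matching(read_cmdlines(), "processor_app")` returning 0 would correspond to the "0 processes" in the alert. With hypothetical input, `count_matching([["python", "processor_app.py"], ["bash"]], "processor_app")` gives 1.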

Updated (reporter), 4 years ago
Flags: needinfo?(lars)

Comment 1 (reporter), 4 years ago
Also received this alert:  sp-processor03.phx1.mozilla.com:Socorro Processors - log file age is CRITICAL: FILE_AGE CRITICAL: /var/log/socorro/socorro-processor.log is 808 seconds old and 10221153 bytes 
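A FILE_AGE check measures how long ago the log file was last modified. A processor that is running and logging presumably keeps the file's modification time fresh, so a stale file suggests the processor has stopped. Below is a minimal Python sketch of that logic. The warning and critical thresholds of 300 and 600 seconds are hypothetical, because the limits the real check uses are not given in this bug.

```python
import os
import time

# Hypothetical thresholds: the limits the real check uses for this log
# are not stated in the bug.
WARN_SECONDS = 300
CRIT_SECONDS = 600


def file_age_status(path, warn=WARN_SECONDS, crit=CRIT_SECONDS, now=None):
    """Return (status, age_seconds, size_bytes) for `path`, where age is
    the time since the file was last modified (its mtime)."""
    st = os.stat(path)
    now = time.time() if now is None else now
    age = int(now - st.st_mtime)
    if age >= crit:
        status = "CRITICAL"
    elif age >= warn:
        status = "WARNING"
    else:
        status = "OK"
    return status, age, st.st_size
```

On the processor host this would be called as `file_age_status("/var/log/socorro/socorro-processor.log")`. Under the assumed thresholds, an age of 808 seconds falls in CRITICAL, matching the alert above.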

Tried running the commands in this run book and got "Permission denied":
https://mana.mozilla.org/wiki/display/NAGIOS/Socorro+Processors+-+log+file+age

It looks like at 2014-12-14T13:28:55 the processor was summarily killed without having a chance to log anything. The last thing it logged was:

2014-12-14 13:28:54,747 DEBUG  - Thread-4 - BotoBenchmarkWrite save_raw_and_processed 0:00:00.562493

Further investigation shows this in /var/log/messages at that instant:

Dec 14 13:28:55 sp-processor03.phx1.mozilla.com kernel: python[19139] general protection ip:38a287a83f sp:7fec097d9230 error:0 in libc-2.12.so[38a2800000+197000]

So something went seriously wrong within the Python process itself (the fault is in libc-2.12.so). About ten minutes later, Puppet appears to have restarted the processor, and it has continued to work normally since.
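The kernel line shows where the crash happened. `ip` is the faulting instruction address, and `libc-2.12.so[38a2800000+197000]` gives the start address and size of the mapping that contains it. Subtracting the base from `ip` gives the offset inside libc: 0x38a287a83f - 0x38a2800000 = 0x7a83f, which is within the 0x197000-byte mapping. This confirms that the fault occurred in the C library, inside the `python` process. Below is a small Python sketch that does this check. The regex and the `locate_fault` helper are hypothetical and written for this one message format only; they are not a general parser for kernel logs.

```python
import re

# Matches the kernel's "general protection" message; the trailing
# "name[base+size]" identifies the mapping the faulting ip fell in.
GP_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\] general protection "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def locate_fault(line):
    """Return (process, pid, object, offset) for a kernel general
    protection message, checking that ip lies inside the mapping."""
    m = GP_RE.search(line)
    if m is None:
        raise ValueError("not a general protection line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    offset = ip - base
    if not 0 <= offset < size:
        raise ValueError("ip is outside the reported mapping")
    return m["comm"], int(m["pid"]), m["obj"], offset
```

Applied to the /var/log/messages line above, this returns `("python", 19139, "libc-2.12.so", 0x7a83f)`.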

I'm putting this down as an anomaly: we haven't seen this behavior before, and none of the other processors have experienced it. If it happens again, it will warrant a more serious investigation.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Flags: needinfo?(lars)
Resolution: --- → WORKSFORME