Processors and monitor can get into a non-working state, while still being active

RESOLVED WONTFIX

Status

Socorro
Backend
5 years ago
4 years ago

People

(Reporter: laura, Assigned: lars)

Description

5 years ago
This morning, we experienced a problem with Zeus, which led to processor and monitor failure. Ashish restarted everything, but things did not come back to life.

Investigation of logs showed the following items:
- processor07 was wedged and not doing anything. The last log lines were:
2012-08-24 04:47:49,078 INFO - MainThread - registering with 'processors' table
2012-08-24 04:47:49,079 DEBUG - MainThread - looking for a dead processor for host sp-processor07.phx1.mozilla.com
2012-08-24 04:47:49,081 INFO - MainThread - will step in for processor 3168
2012-08-24 04:47:49,082 DEBUG - MainThread - taking over a dead processor

- The other processors reported no jobs to do and were sleeping for six seconds at a time.

- The monitor was running, but the only things showing up in its logs were the cleanup job and the priority job thread. The MainThread was not logging anything. Priority jobs were being processed normally. No other jobs were being processed.

- The contents of the processors table were as follows:
http://ashish.pastebin.mozilla.org/1774060

- The contents of the jobs table were as follows:
http://ashish.pastebin.mozilla.org/1774061
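The log observations above (a thread that is alive but has stopped logging) can be checked mechanically. Here is a minimal sketch, assuming log lines follow the "timestamp LEVEL - ThreadName - message" shape of the processor07 excerpt. The function names, the thread names other than MainThread, and the silence threshold are hypothetical and not part of Socorro.

```python
import re
from datetime import datetime

# Matches lines shaped like the processor07 excerpt, e.g.
# "2012-08-24 04:47:49,078 INFO - MainThread - registering with 'processors' table"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) - (?P<thread>\S+) - (?P<msg>.*)$"
)


def last_seen_by_thread(lines):
    """Return {thread name: timestamp of its most recent log line}."""
    last_seen = {}
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected format
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        thread = match.group("thread")
        if thread not in last_seen or ts > last_seen[thread]:
            last_seen[thread] = ts
    return last_seen


def silent_threads(lines, expected, now, max_silence_seconds):
    """List expected threads that have logged nothing recently (or ever)."""
    last_seen = last_seen_by_thread(lines)
    quiet = []
    for name in expected:
        seen = last_seen.get(name)
        if seen is None or (now - seen).total_seconds() > max_silence_seconds:
            quiet.append(name)
    return quiet
```

Run against the monitor log with the list of threads that ought to be active, a check like this would flag the MainThread as silent even though the process itself is still running.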

I had Ashish stop all processors and monitors, and run the following SQL:
delete from processors;
delete from jobs;
and then restart the processors and monitor. Everything came back to life normally.
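For anyone scripting the same recovery, here is a minimal sketch of the two deletes wrapped in a single transaction, so that a failure partway through leaves both tables untouched. It uses Python's built-in sqlite3 module as a stand-in for the production database. The function name and the table columns in the usage note are hypothetical; only the table names processors and jobs come from this report. As above, all processors and monitors should be stopped first.

```python
import sqlite3


def reset_job_state(conn):
    """Delete every row from processors and jobs in one transaction."""
    # Used as a context manager, the connection commits on success
    # and rolls back if either statement raises.
    with conn:
        conn.execute("DELETE FROM processors")
        conn.execute("DELETE FROM jobs")
```

A usage example against an in-memory database with made-up columns:

```python
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, owner INTEGER)")
conn.execute("INSERT INTO processors VALUES (3168, 'hypothetical-host')")
conn.execute("INSERT INTO jobs VALUES (1, 3168)")
reset_job_state(conn)
```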

Lars, can you explain what might have happened, and decide whether there is something we might change in the code to avoid similar problems in future?
(Reporter)

Comment 1

5 years ago
Excerpt from monitor log during the failure:
http://laura.pastebin.mozilla.org/1774089
(Assignee)

Comment 2

4 years ago
The monitor is deprecated, and that made this problem go away.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WONTFIX