Bug 1066338
Opened 11 years ago
Closed 11 years ago
[PulseGuardian] Gracefully handle queues with no message information
Categories
(Webtools :: Pulse, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: cliang, Unassigned)
Attachments (1 file): 34.11 KB, text/plain
The PulseGuardian log looks like this:
INFO:__main__:PulseGuardian started
Traceback (most recent call last):
  File "guardian.py", line 253, in <module>
    pulse_guardian.guard()
  File "guardian.py", line 223, in guard
    self.monitor_queues(queues)
  File "guardian.py", line 119, in monitor_queues
    queue = self.update_queue_information(queue_data)
  File "guardian.py", line 70, in update_queue_information
    q_size, q_name = (queue_data['messages'],
KeyError: 'messages'
If I look at the queues via the web interface, I see that there are two queues that report "?" for messages:
- bugzfeed-dev-1.ateam.aws
- queue/mcote/b528fa1c-352e-11e4-b6d9-14109fd2a13f
When I query the RabbitMQ API for queue information, these queues show no message_stats.
Comment 1•11 years ago
Yeah, these queues are weird. I'm not sure how they get into this state, since they are set to autodelete, but there's no associated connection. Afaik that shouldn't be possible.
I'd like to dig in a bit more as to why/how this happens, but for now, I've just modified pulseguardian to ignore queues in this state. I'm going to leave this bug open (and rename it) for further investigation. We should probably be at least alerting the owner and/or admin in this case.
Here's the commit that at least prevents pulseguardian from crashing:
https://github.com/mozilla/pulseguardian/commit/aac05d9c6468dabb1c3d149834e4113ebc86671c
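For illustration only, here is a minimal sketch of the kind of guard described above. It is not the code from the linked commit. It assumes each queue is a dict as returned by the RabbitMQ management API, and the function name split_queues_by_message_info is hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def split_queues_by_message_info(queues_data):
    """Partition queue dicts into those that report a 'messages' count
    and those that do not (hypothetical helper, not PulseGuardian code)."""
    usable, skipped = [], []
    for queue_data in queues_data:
        # Queues in the odd state described in this bug lack 'messages',
        # so indexing queue_data['messages'] directly raises KeyError.
        if 'messages' in queue_data and 'name' in queue_data:
            usable.append(queue_data)
        else:
            logger.warning("Queue %s has no message information; skipping",
                           queue_data.get('name', '<unnamed>'))
            skipped.append(queue_data)
    return usable, skipped
```

The skipped list is returned rather than dropped so that a caller could later alert the owner and/or admin, as suggested above.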
OS: Mac OS X → All
Hardware: x86 → All
Summary: PulseGuardian fails when no message information is returned for a queue → [PulseGuardian] Gracefully handle queues with no message information
Comment 2•11 years ago
Change pushed out to both dev and production environments. The production Pulse Guardian process seems to have successfully dealt with the two queues that are in a zombie state.
$ cd /data/pulse/src/pulse/pulseguardian
$ sudo git pull
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 4 (delta 3), reused 1 (delta 0)
Unpacking objects: 100% (4/4), done.
From https://github.com/mozilla/pulseguardian
6c49143..aac05d9 master -> origin/master
Updating 6c49143..aac05d9
Fast-forward
pulseguardian/guardian.py | 8 ++++++++
1 file changed, 8 insertions(+)
$ sudo /data/pulse/deploy pulse
[2014-09-12 14:50:57] Running rsync_project
[2014-09-12 14:50:57] [localhost] running: /usr/bin/rsync -aq --include '.gitkeep' --exclude '.git*' --exclude '.hg*' --exclude '.svn*' --exclude 'CVS' --exclude '.bzr*' --delete /data/pulse/src/pulse/ /data/pulse/www/pulse/
[2014-09-12 14:50:57] [localhost] finished: /usr/bin/rsync -aq --include '.gitkeep' --exclude '.git*' --exclude '.hg*' --exclude '.svn*' --exclude 'CVS' --exclude '.bzr*' --delete /data/pulse/src/pulse/ /data/pulse/www/pulse/ (0.051s)
[2014-09-12 14:50:57] Finished rsync_project (0.052s)
[2014-09-12 14:50:57] Running commit_www
[2014-09-12 14:50:57] [localhost] running: cd /data/pulse/www && /usr/bin/git add .; /usr/bin/git commit -a -m 'deploy ['pulse']'
[2014-09-12 14:50:57] [localhost] finished: cd /data/pulse/www && /usr/bin/git add .; /usr/bin/git commit -a -m 'deploy ['pulse']' (0.013s)
[2014-09-12 14:50:57] Finished commit_www (0.013s)
[2014-09-12 14:50:57] Running push_www
[2014-09-12 14:50:57] [pulse-app1.dmz.phx1.mozilla.com] running: /data/bin/update-www.sh pulse
[2014-09-12 14:51:01] [pulse-app1.dmz.phx1.mozilla.com] finished: /data/bin/update-www.sh pulse (3.430s)
[2014-09-12 14:51:01] Finished push_www (3.431s)
[2014-09-12 14:51:01] Starting new HTTPS connection (1): changelog.paas.allizom.org
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 3•8 years ago
By the way, I saw that new queues are also missing many of their fields. A few seconds after creation, the fields appear. So perhaps this is a symptom of an unsynchronized queue?
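If that is what is happening, one option would be to re-check a queue a few times before treating it as broken. The sketch below is only an assumption about how that could look. fetch_queue is a hypothetical callable that returns the queue's info dict (for example, by querying the RabbitMQ management API), and the attempt count and delay are arbitrary.

```python
import time


def wait_for_message_info(fetch_queue, name, attempts=3, delay=2.0,
                          sleep=time.sleep):
    """Re-fetch a queue until its 'messages' field appears.

    Returns the queue dict once 'messages' is present, or None if it is
    still missing after the given number of attempts.
    """
    for attempt in range(attempts):
        queue_data = fetch_queue(name)
        if 'messages' in queue_data:
            return queue_data
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

A queue for which this returns None could then be handled as one of the stuck queues rather than as a freshly created one.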