Closed Bug 1066338 Opened 11 years ago Closed 11 years ago

[PulseGuardian] Gracefully handle queues with no message information

Categories

(Webtools :: Pulse, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cliang, Unassigned)

Details

Attachments

(1 file)

Attached file api_queue.json_pretty
The PulseGuardian log looks like this:

INFO:__main__:PulseGuardian started
Traceback (most recent call last):
  File "guardian.py", line 253, in <module>
    pulse_guardian.guard()
  File "guardian.py", line 223, in guard
    self.monitor_queues(queues)
  File "guardian.py", line 119, in monitor_queues
    queue = self.update_queue_information(queue_data)
  File "guardian.py", line 70, in update_queue_information
    q_size, q_name = (queue_data['messages'],
KeyError: 'messages'

If I look at the queues via the web interface, I see that there are two queues that report "?" for messages:
- bugzfeed-dev-1.ateam.aws
- queue/mcote/b528fa1c-352e-11e4-b6d9-14109fd2a13f

When I query the RabbitMQ API for queue information, these queues show no message_stats.
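For anyone trying to spot such queues in a dump of the queue listing (like the attached file), here is a minimal Python sketch. It assumes the listing has already been parsed into a list of dicts, one per queue, each normally carrying a "name" key. The function name `queues_missing_fields`, the `required` parameter, and the sample data are hypothetical and not taken from PulseGuardian.

```python
import json


def queues_missing_fields(queues, required=("messages",)):
    """Return the names of queue records that lack any of the required keys."""
    missing = []
    for queue in queues:
        if any(key not in queue for key in required):
            # Fall back to a placeholder if even the name is absent.
            missing.append(queue.get("name", "<unnamed>"))
    return missing


# Hypothetical sample, shaped like a queue listing; not the real attachment.
sample = json.loads("""
[
  {"name": "healthy-queue", "messages": 3, "message_stats": {}},
  {"name": "bugzfeed-dev-1.ateam.aws", "auto_delete": true}
]
""")

print(queues_missing_fields(sample))
```

Passing `required=("messages", "message_stats")` would also flag records that have a message count but no statistics.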
Yeah, these queues are weird. I'm not sure how they get into this state: they are set to auto-delete, but there's no associated connection. As far as I know, that shouldn't be possible.

I'd like to dig in a bit more as to why and how this happens, but for now I've just modified pulseguardian to ignore queues in this state. I'm going to leave this bug open (and rename it) for further investigation. We should probably at least be alerting the owner and/or admin in this case.

Here's the commit that at least prevents pulseguardian from crashing: https://github.com/mozilla/pulseguardian/commit/aac05d9c6468dabb1c3d149834e4113ebc86671c
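The general shape of an "ignore queues in this state" guard might look like the sketch below. This is not the code from the linked commit. The helper names `queue_size_and_name` and `summarize_queues` are hypothetical stand-ins, and the only assumption taken from the traceback is that each queue record is a dict that normally has 'messages' and 'name' keys.

```python
import logging

logger = logging.getLogger(__name__)


def queue_size_and_name(queue_data):
    """Return (size, name), or None if the record has no message information."""
    if "messages" not in queue_data:
        logger.warning(
            "Queue %s has no message information; skipping.",
            queue_data.get("name", "<unknown>"),
        )
        return None
    return queue_data["messages"], queue_data["name"]


def summarize_queues(queues):
    """Collect (size, name) pairs, skipping records that cannot be read."""
    summary = []
    for queue_data in queues:
        info = queue_size_and_name(queue_data)
        if info is None:
            # Skip the incomplete record instead of raising KeyError.
            continue
        summary.append(info)
    return summary
```

The warning is the natural place to hook in an alert to the owner or admin later, since it fires only for queues that were skipped.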
OS: Mac OS X → All
Hardware: x86 → All
Summary: PulseGuardian fails when no message information is returned for a queue → [PulseGuardian] Gracefully handle queues with no message information
Change pushed out to both dev and production environments. The production Pulse Guardian process seems to have successfully dealt with the two queues that are in a zombie state.

$ cd /data/pulse/src/pulse/pulseguardian
$ sudo git pull
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 4 (delta 3), reused 1 (delta 0)
Unpacking objects: 100% (4/4), done.
From https://github.com/mozilla/pulseguardian
   6c49143..aac05d9  master     -> origin/master
Updating 6c49143..aac05d9
Fast-forward
 pulseguardian/guardian.py | 8 ++++++++
 1 file changed, 8 insertions(+)
$ sudo /data/pulse/deploy pulse
[2014-09-12 14:50:57] Running rsync_project
[2014-09-12 14:50:57] [localhost] running: /usr/bin/rsync -aq --include '.gitkeep' --exclude '.git*' --exclude '.hg*' --exclude '.svn*' --exclude 'CVS' --exclude '.bzr*' --delete /data/pulse/src/pulse/ /data/pulse/www/pulse/
[2014-09-12 14:50:57] [localhost] finished: /usr/bin/rsync -aq --include '.gitkeep' --exclude '.git*' --exclude '.hg*' --exclude '.svn*' --exclude 'CVS' --exclude '.bzr*' --delete /data/pulse/src/pulse/ /data/pulse/www/pulse/ (0.051s)
[2014-09-12 14:50:57] Finished rsync_project (0.052s)
[2014-09-12 14:50:57] Running commit_www
[2014-09-12 14:50:57] [localhost] running: cd /data/pulse/www && /usr/bin/git add .; /usr/bin/git commit -a -m 'deploy ['pulse']'
[2014-09-12 14:50:57] [localhost] finished: cd /data/pulse/www && /usr/bin/git add .; /usr/bin/git commit -a -m 'deploy ['pulse']' (0.013s)
[2014-09-12 14:50:57] Finished commit_www (0.013s)
[2014-09-12 14:50:57] Running push_www
[2014-09-12 14:50:57] [pulse-app1.dmz.phx1.mozilla.com] running: /data/bin/update-www.sh pulse
[2014-09-12 14:51:01] [pulse-app1.dmz.phx1.mozilla.com] finished: /data/bin/update-www.sh pulse (3.430s)
[2014-09-12 14:51:01] Finished push_www (3.431s)
[2014-09-12 14:51:01] Starting new HTTPS connection (1): changelog.paas.allizom.org
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
By the way, I saw that new queues are also missing many of their fields. A few seconds after creation, the fields appear. So perhaps this is a symptom of an unsynchronized queue?