Open Bug 1609989 Opened 5 years ago Updated 5 years ago

pulseguardian cannot delete exclusive queues, doesn't log about it

Categories

(Webtools :: Pulse, defect)

defect
Not set
normal

Tracking

(Not tracked)

People

(Reporter: dustin, Assigned: asmaknikar, Mentored)

Details

(Keywords: good-first-bug, Whiteboard: [lang=python])

Jan 17 12:59:53 pulseguardian app/worker.1: [Other] New queue. {"queuename": "queue/mrgigglesdev/tc-builds-dev", "queuesize": 113774, "queuedurable": false, "valid": true, "ownername": "mrgigglesdev", "newowner": false} ["queue"]
Jan 17 12:59:53 pulseguardian app/worker.1: [Other] Deleting queue. {"queuename": "queue/mrgigglesdev/tc-builds-dev", "queuesize": 113774, "warningthreshold": 4000, "deletionthreshold": 20000} ["queue"]

This was an exclusive queue. pulseguardian kept trying to delete it but failed each time, and didn't log anything about the failure. Trying to delete it in the management console gave:

405 RESOURCE_LOCKED - cannot obtain exclusive access to locked queue 'queue/mrgigglesdev/tc-builds-dev' in vhost '/'

I'd like to have at least seen that in the logs.
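As a rough illustration of the logging I have in mind, here is a minimal sketch. It assumes the deletion goes through some callable that raises on failure; `delete_fn` and `delete_queue_logged` are hypothetical names for this example, not pulseguardian's actual code.

```python
import logging

logger = logging.getLogger("pulseguardian.sketch")


def delete_queue_logged(delete_fn, queue_name, log=logger):
    """Call delete_fn(queue_name) and log any failure instead of hiding it.

    delete_fn is a hypothetical stand-in for whatever call is used to
    delete a queue through the RabbitMQ management API.
    Returns True if the deletion succeeded, False otherwise.
    """
    try:
        delete_fn(queue_name)
    except Exception as exc:
        # Record the broker's reason, e.g. "405 RESOURCE_LOCKED ...".
        log.error("Failed to delete queue %s: %s", queue_name, exc)
        return False
    log.info("Deleted queue %s", queue_name)
    return True
```

With something like this, the RESOURCE_LOCKED message would have shown up next to the "Deleting queue." lines above.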

CloudAMQP's alerting did let us know about this queue, which is how we knew to look.

The fix was to manually terminate the connection, which automatically deleted the queue. If this happens again we should probably try to automate that.
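A sketch of what automating that could look like, under stated assumptions: `client` is a hypothetical wrapper around the RabbitMQ management API with the three methods named in the docstring. None of these names come from pulseguardian, and how to find the owning connection of an exclusive queue is left to that wrapper.

```python
def force_delete_exclusive_queue(client, vhost, queue_name):
    """Delete a queue; if it is locked, close the connection that owns it.

    `client` is a hypothetical object providing:
      delete_queue(vhost, name)            raises on failure
      queue_owner_connection(vhost, name)  returns a connection name or None
      close_connection(name)               closes that connection
    Returns a short status string describing what was done.
    """
    try:
        client.delete_queue(vhost, queue_name)
        return "deleted"
    except Exception as exc:
        # Only handle the exclusive-queue case; re-raise anything else.
        if "RESOURCE_LOCKED" not in str(exc):
            raise
    conn = client.queue_owner_connection(vhost, queue_name)
    if conn is None:
        return "locked-no-owner"
    # Closing the owning connection lets the broker drop the exclusive queue.
    client.close_connection(conn)
    return "connection-closed"
```

Matching on the error text is a simplification for the sketch; a real fix would more likely check the HTTP status or a specific exception type.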

Yeah, I saw a huge pile of email from this when I started working today, and immediately went in to shut the client down. (It's a development instance of the mrgiggles IRC bot. The main bot's pulse connection still seems happy.) I haven't investigated to see why it was behind. I had assumed that the queue was being deleted and that the bot then retried and re-created it, but it sounds like that's wrong?

I did get multiple emails per minute for a few hours out of it!

It looks like the connection was stuck, not consuming any messages but holding the queue open. It's something pulseguardian should be resilient to.

Keywords: good-first-bug
Whiteboard: [lang=python]
Assignee: dustin → nobody
Mentor: dustin

Is someone assigned or can I attempt to work on it?

(In reply to Ashish Maknikar from comment #3)

Is someone assigned or can I attempt to work on it?

i.e. can I be assigned? It is my first bug.

Sure! Have you gotten pulseguardian development set up?

Assignee: nobody → asmaknikar

Can you guide me? Is there a git repo, or is it included somewhere in the mozilla source repo?

The repository is here and has getting-started directions in the README. I'd recommend getting that cloned and getting the existing tests running, then looking into how you might reproduce the situation described above, then thinking about how to fix it.

Did you have any issues with Python absolute paths while running the tests? The test/runtests file does not seem to be able to find the pulseguardian module (it says the module does not exist). I have temporarily added it to the PYTHONPATH environment variable.

I think you need to do these steps as well, before running the tests (this isn't clear from the README -- feel free to make a PR to clarify!)

Within the chosen environment, install and configure PulseGuardian:

Install the requirements:

  pip install -r requirements.txt

Install the package. This will ensure you have access to the pulseguardian package from anywhere in your virtualenv.

  python setup.py develop
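To confirm that the install step worked before running the tests, a quick check along these lines may help. This is only a sketch; `is_importable` is a helper name made up for this example.

```python
import importlib.util


def is_importable(package_name):
    """Return True if a top-level package can be found on the current sys.path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Run inside the virtualenv after the install steps above.
    print("pulseguardian importable:", is_importable("pulseguardian"))
```

If this reports False, the PYTHONPATH workaround is probably still needed.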