Bug 791909
Opened 12 years ago
Closed 12 years ago
pulsebuildmonitor timed out and never reconnected
Categories
(Webtools :: Pulse, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mcote, Assigned: jgriffin)
Attachments
(1 file)
1.38 KB, patch | jgriffin: review+
I'm not sure if this is a bug or a feature request, but both the autophone production server (Mountain View) and staging server (Montreal) died this weekend with this exception:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 477, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/mozauto/pulsebuildmonitor/pulsebuildmonitor/pulsebuildmonitor.py", line 95, in listen
    self.pulse.listen()
  File "/Library/Python/2.6/site-packages/mozillapulse/consumers.py", line 136, in listen
    self.consumer.wait()
  File "/Library/Python/2.6/site-packages/carrot/messaging.py", line 446, in wait
    it.next()
  File "/Library/Python/2.6/site-packages/carrot/backends/pyamqplib.py", line 300, in consume
    self.channel.wait()
  File "/Library/Python/2.6/site-packages/amqplib/client_0_8/abstract_channel.py", line 95, in wait
    self.channel_id, allowed_methods)
  File "/Library/Python/2.6/site-packages/amqplib/client_0_8/connection.py", line 202, in _wait_method
    self.method_reader.read_method()
  File "/Library/Python/2.6/site-packages/amqplib/client_0_8/method_framing.py", line 221, in read_method
    raise m
error: [Errno 60] Operation timed out

The staging server last found a build on Sept 14 at 21:34 EDT, and the production server on Sept 15 at 20:38 PDT.

I'm not sure if the pulsebuildmonitor is *supposed* to continue to reconnect or not, but it would be nice if it did. :)
Assignee
Comment 1•12 years ago
We do want it to reconnect. I've updated pulsebuildmonitor on pypi (v0.64) to include the fix from bug 788580. Hopefully if you update your pulsebuildmonitor, this will stop happening.
Reporter
Comment 2•12 years ago
I was using the latest code from the repo when I saw this exception, though.
Assignee
Comment 3•12 years ago
Oh, right. That fix requires the caller to wrap listen in try/except, and call listen again in case of failure, if desired. I can fix this so that it happens automatically...I guess there is no reason we'd want to propagate the exception to the caller.
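For reference, the caller-side workaround described above might look roughly like the sketch below. This is only an illustration under assumptions: `listen_forever`, `retry_delay`, `max_retries`, and the injectable `sleep` are hypothetical names that are not part of pulsebuildmonitor, and `monitor` is assumed to be any object with a blocking `listen()` method that raises `socket.error` when the connection drops, as in the traceback.

```python
import socket
import time


def listen_forever(monitor, retry_delay=5.0, max_retries=None, sleep=time.sleep):
    """Call monitor.listen() again whenever it fails with a socket error.

    Hypothetical helper, not part of pulsebuildmonitor. Returns the number
    of failed attempts once listen() returns normally. If max_retries is
    set and exceeded, the last socket error is re-raised.
    """
    attempts = 0
    while True:
        try:
            monitor.listen()
            return attempts
        except socket.error:
            attempts += 1
            if max_retries is not None and attempts > max_retries:
                raise
            # Back off briefly so a dead broker is not hammered in a tight loop.
            sleep(retry_delay)
```

Leaving `max_retries` as `None` retries indefinitely, which matches the "never have to restart my listeners" case; a finite value covers a caller who would rather give up after a while.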
Reporter
Comment 4•12 years ago
Just got the same traceback, though with "[Errno 54] Connection reset by peer" this time. Yeah, I think it makes sense to do this in the pulsebuildmonitor. Even if pulse goes down for a day or two, ideally I wouldn't have to restart my listeners. I can't think of a good reason for making the user shut down and start up either the program or the listener thread, unless maybe I have configured a timeout.
Reporter
Comment 5•12 years ago
What do you think about something like this?
Attachment #664944 - Flags: review?(jgriffin)
Assignee
Comment 6•12 years ago
Comment on attachment 664944
Relaunch listener if exception detected

Review of attachment 664944:
-----------------------------------------------------------------

Thanks for the fix! Looks good with the fix below.

::: pulsebuildmonitor/pulsebuildmonitor.py
@@ +95,4 @@
>          self.make_pulse_consumer()
> +        while True:
> +            try:
> +                self.pulse.listen()

You should pull self.make_pulse_consumer() into the try clause, so that it gets called before self.pulse.listen(). Otherwise, the amqp lib may attempt to re-use an existing dead connection, and it will not be successful in reconnecting. Creating a new pulse consumer works around this problem.
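A minimal sketch of the shape this review asks for, with the consumer rebuilt inside the try clause on every pass through the loop. It is not the patch that landed: `ReconnectingListener`, `consumer_factory`, `retry_delay`, and `sleep` are hypothetical stand-ins, only the `make_pulse_consumer` / `self.pulse.listen()` names come from the quoted diff, and catching `socket.error` and returning when `listen()` exits cleanly are assumptions.

```python
import socket
import time


class ReconnectingListener(object):
    """Hypothetical stand-in for the monitor class, for illustration only."""

    def __init__(self, consumer_factory, retry_delay=1.0, sleep=time.sleep):
        self.consumer_factory = consumer_factory  # assumed: builds a fresh consumer
        self.retry_delay = retry_delay
        self.sleep = sleep
        self.pulse = None

    def make_pulse_consumer(self):
        # Always build a brand-new consumer so a dead connection is never reused.
        self.pulse = self.consumer_factory()

    def listen(self):
        while True:
            try:
                # Recreate the consumer before every listen() call, as the
                # review suggests, rather than once outside the loop.
                self.make_pulse_consumer()
                self.pulse.listen()
                return  # assumption: a clean return means we are done
            except socket.error:
                # Connection timed out or was reset; wait, then reconnect.
                self.sleep(self.retry_delay)
```

Because the consumer is created inside the try clause, a failure while connecting is retried in the same way as a failure while listening.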
Attachment #664944 - Flags: review?(jgriffin) → review+
Reporter
Comment 7•12 years ago
Cool, fixed and pushed: http://hg.mozilla.org/automation/pulsebuildmonitor/rev/6e94fe6db44c
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED