Bug 791909 (closed). Opened 12 years ago, closed 12 years ago.

pulsebuildmonitor timed out and never reconnected

Tracking: (Not tracked)

Status: RESOLVED FIXED

People: (Reporter: mcote, Assigned: jgriffin)

Attachments (1 file):

Relaunch listener if exception detected (12 years ago, Mark Côté [:mcote], 1.38 KB, patch; jgriffin: review+)

Mark Côté [:mcote]

Reporter

Description

12 years ago

I'm not sure if this is a bug or a feature request, but both the autophone production server (Mountain View) and staging server (Montreal) died this weekend with this exception:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 477, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/mozauto/pulsebuildmonitor/pulsebuildmonitor/pulsebuildmonitor.py", line 95, in listen
    self.pulse.listen()
  File "/Library/Python/2.6/site-packages/mozillapulse/consumers.py", line 136, in listen
    self.consumer.wait()
  File "/Library/Python/2.6/site-packages/carrot/messaging.py", line 446, in wait   
    it.next()
  File "/Library/Python/2.6/site-packages/carrot/backends/pyamqplib.py", line 300, in consume
    self.channel.wait()
  File "/Library/Python/2.6/site-packages/amqplib/client_0_8/abstract_channel.py", line 95, in wait
    self.channel_id, allowed_methods)
  File "/Library/Python/2.6/site-packages/amqplib/client_0_8/connection.py", line 202, in _wait_method
    self.method_reader.read_method()
  File "/Library/Python/2.6/site-packages/amqplib/client_0_8/method_framing.py", line 221, in read_method
    raise m
error: [Errno 60] Operation timed out

The staging server last found a build on Sept 14 at 21:34 EDT, and the production server on Sept 15 at 20:38 PDT. 

I'm not sure if the pulsebuildmonitor is *supposed* to continue to reconnect or not, but it would be nice if it did. :)

Jonathan Griffin (:jgriffin)

Assignee

Comment 1

12 years ago

We do want it to reconnect. I've updated pulsebuildmonitor on PyPI (v0.64) to include the fix from bug 788580. Hopefully, if you update your pulsebuildmonitor, this will stop happening.

Mark Côté [:mcote]

Reporter

Comment 2

12 years ago

I was using the latest code from the repo when I saw this exception, though.

Jonathan Griffin (:jgriffin)

Assignee

Comment 3

12 years ago

Oh, right. That fix requires the caller to wrap listen() in try/except and to call listen() again on failure, if desired. I can fix this so that it happens automatically; I guess there is no reason we'd want to propagate the exception to the caller.
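For illustration only, a minimal sketch of the caller-side workaround described above, written for Python 3. FlakyMonitor and listen_with_retry are hypothetical names, not part of pulsebuildmonitor; the stub stands in for a monitor whose listen() raises a socket error like the one in the traceback.

```python
import errno
import socket
import time


class FlakyMonitor:
    """Hypothetical stand-in for a monitor whose listen() can raise."""

    def __init__(self, failures):
        self.failures = failures  # how many calls should fail first
        self.calls = 0

    def listen(self):
        self.calls += 1
        if self.calls <= self.failures:
            # In Python 3, socket.error is an alias of OSError.
            raise socket.error(errno.ETIMEDOUT, "Operation timed out")
        # A real listen() would block here; the stub returns so the demo ends.


def listen_with_retry(monitor, delay=0.0, max_attempts=None):
    """Call monitor.listen() again whenever it raises a socket error.

    Returns the number of attempts made.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            monitor.listen()
            return attempts
        except socket.error:
            time.sleep(delay)  # brief pause before retrying
    return attempts


monitor = FlakyMonitor(failures=2)
attempts = listen_with_retry(monitor)
```

With two simulated failures, the third call to listen() succeeds. A real caller would probably pick a non-zero delay so that it does not spin while the server is down.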

Mark Côté [:mcote]

Reporter

Comment 4

12 years ago

Just got the same traceback, though with "[Errno 54] Connection reset by peer" this time.

Yeah, I think it makes sense to do this in pulsebuildmonitor. Even if pulse goes down for a day or two, ideally I wouldn't have to restart my listeners. I can't think of a good reason to make the user shut down and restart either the program or the listener thread, unless maybe I have configured a timeout.

Mark Côté [:mcote]

Reporter

Comment 5

12 years ago

Attached patch: Relaunch listener if exception detected

What do you think about something like this?

Attachment #664944 - Flags: review?(jgriffin)

Jonathan Griffin (:jgriffin)

Assignee

Comment 6

12 years ago

Comment on attachment 664944: Relaunch listener if exception detected

Review of attachment 664944:
-----------------------------------------------------------------

Thanks for the fix!  Looks good with the fix below.

::: pulsebuildmonitor/pulsebuildmonitor.py
@@ +95,4 @@
>      self.make_pulse_consumer()
> +    while True:
> +      try: 
> +        self.pulse.listen()

You should pull self.make_pulse_consumer() into the try clause, so that it gets called before self.pulse.listen().

Otherwise, the amqp lib may attempt to re-use an existing dead connection, and it will not be successful in reconnecting.  Creating a new pulse consumer works around this problem.
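A rough sketch of the loop shape suggested here, for illustration only; it is not the patch that landed. StubConsumer and Monitor are hypothetical stand-ins written for Python 3, and the return after listen() exists only so the demo terminates (a real listen() would block until the connection fails).

```python
import socket


class StubConsumer:
    """Hypothetical consumer; each instance models one connection."""

    def __init__(self, fail):
        self.fail = fail

    def listen(self):
        if self.fail:
            raise socket.error("Connection reset by peer")


class Monitor:
    """Hypothetical monitor that rebuilds its consumer before each listen."""

    def __init__(self, failures):
        self.failures = failures  # how many consumers should fail first
        self.consumers_made = 0
        self.pulse = None

    def make_pulse_consumer(self):
        fail = self.failures > 0
        if fail:
            self.failures -= 1
        self.consumers_made += 1
        self.pulse = StubConsumer(fail)

    def listen(self):
        while True:
            try:
                # Inside the try clause, so every retry gets a fresh
                # consumer instead of reusing a dead connection.
                self.make_pulse_consumer()
                self.pulse.listen()
                return  # demo only: the stub listen() does not block
            except socket.error:
                continue


monitor = Monitor(failures=2)
monitor.listen()
```

Each failed attempt discards the old consumer, so the two simulated failures lead to three consumers being created in total.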

Attachment #664944 - Flags: review?(jgriffin) → review+

Mark Côté [:mcote]

Reporter

Comment 7

•

12 years ago

Cool, fixed and pushed: http://hg.mozilla.org/automation/pulsebuildmonitor/rev/6e94fe6db44c

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Categories

(Webtools :: Pulse, defect)
