Closed
Bug 628720
Opened 14 years ago
Closed 14 years ago
twistd.log exceptions from amqp-related (pulse) failures
Categories
(Release Engineering :: General, defect, P2)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Assigned: catlee)
References
Details
(Whiteboard: [pulse])
Attachments
(2 files)
16.54 KB,
patch
|
dustin
:
review+
catlee
:
checked-in+
|
Details | Diff | Splinter Review |
1.07 KB,
patch
|
dustin
:
review+
catlee
:
checked-in+
|
Details | Diff | Splinter Review |
We're getting a lot of mail with tracebacks in amqplib or pulse code. I *suspect* that these are harmless and the messages are safely re-transmitted later. If that's the case, it would be great to handle the exceptions smoothly so we don't get emails about them.
Exception in /builds/buildbot/build_master3/master/twistd.log:
2011-01-25 08:12:04-0800 [-] Unhandled Error
Traceback (most recent call last):
File "/tools/python-2.6.5/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/twisted/python/threadpool.py", line 207, in _worker
result = context.call(ctx, function, *args, **kwargs)
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/twisted/python/context.py", line 59, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/twisted/python/context.py", line 37, in callWithContext
return func(*args,**kw)
--- <exception caught here> ---
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/buildbotcustom/status/pulse.py", line 95, in threadedPush
self.publisher.publish(msg)
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/mozillapulse/publishers.py", line 66, in publish
routing_key=message.routing_key)
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/carrot/messaging.py", line 657, in __init__
self.declare()
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/carrot/messaging.py", line 669, in declare
auto_delete=self.auto_delete)
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/carrot/backends/pyamqplib.py", line 253, in exchange_declare
return self.channel.exchange_declare(exchange=exchange,
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/carrot/backends/pyamqplib.py", line 188, in channel
self._channel_ref = weakref.ref(connection.channel())
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/amqplib/client_0_8/connection.py", line 230, in channel
return Channel(self, channel_id)
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/amqplib/client_0_8/channel.py", line 82, in __init__
self._x_open()
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/amqplib/client_0_8/channel.py", line 471, in _x_open
(20, 11), # Channel.open_ok
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/amqplib/client_0_8/abstract_channel.py", line 89, in wait
self.channel_id, allowed_methods)
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/amqplib/client_0_8/connection.py", line 198, in _wait_method
self.method_reader.read_method()
File "/builds/buildbot/build_master3/lib/python2.6/site-packages/amqplib/client_0_8/method_framing.py", line 215, in read_method
raise m
socket.error: [Errno 110] Connection timed out
Exception in /builds/buildbot/try_master1/master/twistd.log:
2011-01-25 01:14:06-0800 [-] Unhandled Error
Traceback (most recent call last):
File "/tools/python-2.6.5/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/builds/buildbot/try_master1/lib/python2.6/site-packages/twisted/python/threadpool.py", line 207, in _worker
result = context.call(ctx, function, *args, **kwargs)
File "/builds/buildbot/try_master1/lib/python2.6/site-packages/twisted/python/context.py", line 59, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/builds/buildbot/try_master1/lib/python2.6/site-packages/twisted/python/context.py", line 37, in callWithContext
return func(*args,**kw)
--- <exception caught here> ---
File "/builds/buildbot/try_master1/buildbotcustom/status/pulse.py", line 95, in threadedPush
self.publisher.publish(msg)
File "build/bdist.linux-i686/egg/mozillapulse/publishers.py", line 67, in publish
Reporter | ||
Updated•14 years ago
|
Assignee: nobody → catlee
Priority: -- → P2
Whiteboard: [pulse]
Assignee | ||
Updated•14 years ago
|
Summary: twistd.log exceptions from amqp-related failures → twistd.log exceptions from amqp-related (pulse) failures
Assignee | ||
Comment 2•14 years ago
|
||
A lot has changed. It may be easier to look at the whole file here: http://hg.mozilla.org/users/catlee_mozilla.com/buildbotcustom/file/b2f71b37bba8/status/pulse.py
These errors are due to the firewall shutting down our connection to pulse after some maximum amount of time (used to be 30 minutes, now 12h). To work around this, we need to be proactive about shutting our connection down.
Two timers have been added:
* _disconnect_timer, which fires after a connection has been active more than max_connect_time seconds
* _idle_timer, which fires after a connection has been idle more than max_idle_time seconds
When either of these fire, they force us to disconnect. Subsequent messages will automatically reconnect.
I've also added a heartbeat message, sent every 15 minutes by default.
Attachment #509890 -
Flags: review?(dustin)
Reporter | ||
Comment 3•14 years ago
|
||
Comment on attachment 509890 [details] [diff] [review]
pimped out pulse
From IRC, this is still hanging reconfigs, so no review:
09:41 < catlee> dustin: so ignore my pulse patch
09:41 < catlee> I'm still getting hangs sometimes on reconfig
Attachment #509890 -
Flags: review?(dustin)
Assignee | ||
Comment 4•14 years ago
|
||
Comment on attachment 509890 [details] [diff] [review]
pimped out pulse
can't reproduce the problem I was having before in staging. it's most likely due to an older instance of the pulse plugin that was still running.
Attachment #509890 -
Flags: review?(dustin)
Reporter | ||
Comment 5•14 years ago
|
||
Comment on attachment 509890 [details] [diff] [review]
pimped out pulse
+ a connection to the broken open.
r^
Attachment #509890 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 6•14 years ago
|
||
Comment on attachment 509890 [details] [diff] [review]
pimped out pulse
http://hg.mozilla.org/build/buildbotcustom/rev/aacca5aff8eb
Attachment #509890 -
Flags: checked-in+
Assignee | ||
Comment 7•14 years ago
|
||
Some fallout resulting in this exception:
2011-03-08 11:32:30-0800 [-] Pulse <0x23dfeb6c>: Disconnecting after 600 seconds
2011-03-08 11:32:30-0800 [-] Unhandled Error
Traceback (most recent call last):
File "/builds/buildbot/tests_master3/lib/python2.6/site-packages/buildbotcustom/status/pulse.py", line 277, in _finished
File "/builds/buildbot/tests_master3/lib/python2.6/site-packages/twisted/internet/defer.py", line 1122, in release
d.callback(self)
File "/builds/buildbot/tests_master3/lib/python2.6/site-packages/twisted/internet/defer.py", line 318, in callback
self._startRunCallbacks(result)
File "/builds/buildbot/tests_master3/lib/python2.6/site-packages/twisted/internet/defer.py", line 424, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "/builds/buildbot/tests_master3/lib/python2.6/site-packages/twisted/internet/defer.py", line 441, in _runCallbacks
self.result = callback(self.result, *args, **kw)
File "/builds/buildbot/tests_master3/lib/python2.6/site-packages/buildbotcustom/status/pulse.py", line 193, in set_timers
return self
File "/builds/buildbot/tests_master3/lib/python2.6/site-packages/twisted/internet/base.py", line 89, in cancel
raise error.AlreadyCalled
twisted.internet.error.AlreadyCalled: Tried to cancel an already-called event.
will try and repro in staging and come up with a fix.
Assignee | ||
Comment 8•14 years ago
|
||
in cases where the network is dropping packets, we can get blocked elsewhere trying to send events or disconnect. if the idle timer fires during this period we won't handle it due to the thread lock in place, and it's possible that the next pending call on the thread lock is another pushEvents call which will try and cancel the (already called, but not handled) idle timer.
Attachment #517992 -
Flags: review?(dustin)
Reporter | ||
Updated•14 years ago
|
Attachment #517992 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 9•14 years ago
|
||
Comment on attachment 517992 [details] [diff] [review]
guard against idle timer firing
http://hg.mozilla.org/build/buildbotcustom/rev/12fcaba68965
Attachment #517992 -
Flags: checked-in+
Assignee | ||
Updated•14 years ago
|
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Product: mozilla.org → Release Engineering
You need to log in
before you can comment on or make changes to this bug.
Description
•