Bug 665254
Opened 13 years ago
Closed 13 years ago
idleizer: call loseConnection when master does not support slave-initiated graceful shutdown
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Assigned: dustin)
Attachments
(1 file)
1.84 KB, patch | catlee: review+
We're seeing
2011-06-16 18:22:23-0700 [Broker,client] Master does not support slave initiated shutdown. Upgrade master to 0.8.3 or later to use this feature.
because masters are still 0.8.2.
In this case, when the slave *is* eventually gracefully shut down, it's already primed to reboot. I suspect that adding a simple loseConnection() call there would give us success without any significant risk of burning a build that happened to start just as the slave was rebooting itself.
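A minimal sketch of the proposed fallback, not the actual patch. Only loseConnection() and the log message come from this bug; `FakeTransport`, `NoSuchMethod`, `old_master_shutdown`, and `request_graceful_shutdown` are hypothetical names used for illustration, and dropping the connection before invoking the reboot is an assumption of the sketch.

```python
class FakeTransport:
    """Hypothetical stand-in for the slave's connection; only tracks state."""

    def __init__(self):
        self.connected = True

    def loseConnection(self):
        self.connected = False


class NoSuchMethod(Exception):
    """Hypothetical error for a master that lacks the shutdown call."""


def old_master_shutdown():
    # Simulates a 0.8.2 master, which cannot honor a slave-initiated shutdown.
    raise NoSuchMethod("shutdown")


def request_graceful_shutdown(ask_master, transport, reboot, log):
    """Ask the master for a graceful shutdown; fall back if it cannot comply."""
    try:
        ask_master()
    except NoSuchMethod:
        log("Master does not support slave initiated shutdown. "
            "Upgrade master to 0.8.3 or later to use this feature.")
        # Assumption: drop the connection first so the master stops
        # handing this slave new builds, then reboot.
        transport.loseConnection()
        reboot()
        return "rebooted"
    # A newer master accepted the request; wait for it to say it's OK.
    return "waiting"
```

With an old master the sketch disconnects and reboots; with a master that accepts the request it just waits, as before.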
Comment 1 • 13 years ago (Assignee)
From bug 665765: gracefulShutdown's immediate shutdown when not connected is also problematic; I saw this on three hosts over the weekend. It turns out that the fix is easiest to make in the same patch as for this bug, so I'm merging them here.
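A rough sketch of the disconnected case, assuming the desired behavior is the one shown in the test log in comment 3 (reboot right away when there is no active connection). `on_idle` and its parameters are hypothetical names, not the idleizer's real interface; only the log messages are taken from this bug.

```python
def on_idle(connected, request_shutdown, reboot, log):
    """Decide what an idle slave does, depending on its connection state."""
    log("I feel very idle and was thinking of rebooting "
        "as soon as the buildmaster says it's OK")
    if not connected:
        # Nobody to ask, so no build can be in flight from this master.
        log("No active connection, rebooting NOW")
        reboot()
        return "rebooted"
    # Connected: ask the master first so running builds can finish.
    log("Telling the master we want to shutdown "
        "after any running builds are finished")
    request_shutdown()
    return "asked-master"
```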
Comment 3 • 13 years ago (Assignee)
With the fix, a *connected* idle slave looks like:
2011-06-20 16:42:34-0700 [-] I feel very idle and was thinking of rebooting as soon as the buildmaster says it's OK
2011-06-20 16:42:34-0700 [-] Telling the master we want to shutdown after any running builds are finished
2011-06-20 16:42:34-0700 [Broker,client] Master does not support slave initiated shutdown. Upgrade master to 0.8.3 or later to use this feature.
2011-06-20 16:42:34-0700 [Broker,client] rebooting NOW, since the master won't talk to us
2011-06-20 16:42:34-0700 [Broker,client] Invoking platform-specific reboot command
2011-06-20 16:42:34-0700 [Broker,client] lost remote
2011-06-20 16:42:34-0700 [Broker,client] lost remote
2011-06-20 16:42:34-0700 [Broker,client] lost remote
(and rebooted)
and a disconnected slave looks like:
2011-06-20 16:48:18-0700 [-] Connecting to preproduction-master.build.sjc1.mozilla.comm:9010
2011-06-20 16:48:18-0700 [-] Connection to preproduction-master.build.sjc1.mozilla.comm:9010 failed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.DNSLookupError'>: DNS lookup failed: address 'preproduction-master.build.sjc1.mozilla.comm' not found: [Errno 8] nodename nor servname provided, or not known.
]
2011-06-20 16:48:18-0700 [-] <twisted.internet.tcp.Connector instance at 0x1012721b8> will retry in 15 seconds
2011-06-20 16:48:18-0700 [-] Stopping factory <buildslave.bot.BotFactory instance at 0x101879518>
2011-06-20 16:48:21-0700 [-] I feel very idle and was thinking of rebooting as soon as the buildmaster says it's OK
2011-06-20 16:48:21-0700 [-] No active connection, rebooting NOW
2011-06-20 16:48:21-0700 [-] Invoking platform-specific reboot command
2011-06-20 16:48:23-0700 [-] Main loop terminated.
2011-06-20 16:48:23-0700 [-] Server Shut Down.
(and rebooted)
(This was tested on moz2-darwin10-slave01. No builds were burned in the making of this patch. Does not contain BPA.)
Attachment #540625 - Flags: review?(catlee)
Updated • 13 years ago
Attachment #540625 - Flags: review?(catlee) → review+
Comment 4 • 13 years ago (Assignee)
Committed to the 'slaves' branch.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated • 11 years ago
Product: mozilla.org → Release Engineering