Bug 816345
Opened 12 years ago
Closed 12 years ago
solution to stop buildslaves on foopies for b2g pandas
Categories: Release Engineering :: General, defect
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: mozilla, Assigned: mozilla
We stop buildslaves on talos hardware by rebooting and checking slavealloc before starting buildbot.
We stop buildslaves on Android tegra foopies through a combination of clientproxy, flag files, and cron jobs, none of which know whether we actually consider the tegra/buildslave to be in use at that moment.
We want a non-clientproxy solution for the b2g panda foopies, but we can't simply reboot, because each foopy runs multiple buildslaves.
Comment 1 • 12 years ago (Assignee)
Solution 1:
Set reboot_command to a buildslave_stop.sh script, e.g.:
'reboot_command': ['scripts/external_tools/buildslave_stop.sh']
In that example, we're creating a mozharness/external_tools/buildslave_stop.sh script.
In its bare-bones form, it is:
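#!/bin/sh
# The reboot step runs in the builder directory; the buildslave's basedir is its parent.
cd ..
/tools/buildbot-0.8.4-pre-moz2/bin/buildslave stop .
# Keep the step from finishing normally while the buildslave shuts down.
sleep 600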
However, if we run it as-is, it will stop the buildslave after every test run, and we'll run out of buildslaves very quickly.
Adding a grep for a marker would make this conditional, e.g.:
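#!/bin/sh
# egrep -q exits 0 only when some property file contains a bad_slave: line,
# so the buildslave is stopped only when the job has flagged this slave as bad.
if egrep -q '^bad_slave:' properties/* ; then
    cd ..
    /tools/buildbot-0.8.4-pre-moz2/bin/buildslave stop .
    sleep 600
fi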
To trigger it, the job would call self.set_buildbot_property("bad_slave", "somevalue", write_to_file=True). Ideally it would also call self.buildbot_status(TBPL_RETRY) so buildbot retries the job.
We could also do a similar grep in logs/log_error.log for a special string.
This makes the buildslave disconnect without any confusing error message, since the reboot step already expects that the slave may go away.
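The trigger check for Solution 1 can be sketched in Python so it is easy to exercise locally. This is a minimal sketch, assuming set_buildbot_property(..., write_to_file=True) writes a file named after the property under properties/ containing a name:value line (the format the egrep above relies on). The helper names and the BAD_SLAVE log marker are hypothetical, not existing mozharness conventions.

```python
import os
import re
import tempfile

# Hypothetical marker for the logs/log_error.log variant; not an existing convention.
LOG_MARKER = "BAD_SLAVE"

# Same pattern as the shell check: a line starting with "bad_slave:".
BAD_SLAVE_RE = re.compile(r"^bad_slave:", re.MULTILINE)


def write_property_file(work_dir, name, value):
    """Write properties/<name> as a single 'name:value' line (assumed format)."""
    prop_dir = os.path.join(work_dir, "properties")
    os.makedirs(prop_dir, exist_ok=True)
    with open(os.path.join(prop_dir, name), "w") as f:
        f.write("%s:%s\n" % (name, value))


def bad_slave_requested(work_dir):
    """Python equivalent of egrep -q '^bad_slave:' properties/*,
    plus the optional search of logs/log_error.log for LOG_MARKER."""
    prop_dir = os.path.join(work_dir, "properties")
    if os.path.isdir(prop_dir):
        for name in sorted(os.listdir(prop_dir)):
            path = os.path.join(prop_dir, name)
            if os.path.isfile(path):
                with open(path) as f:
                    if BAD_SLAVE_RE.search(f.read()):
                        return True
    log_path = os.path.join(work_dir, "logs", "log_error.log")
    if os.path.isfile(log_path):
        with open(log_path) as f:
            if LOG_MARKER in f.read():
                return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        write_property_file(work_dir, "buildid", "20121203")
        print(bad_slave_requested(work_dir))  # False: no bad_slave property yet
        write_property_file(work_dir, "bad_slave", "somevalue")
        print(bad_slave_requested(work_dir))  # True
```

Keeping the marker in a dedicated property file means an ordinary test failure never stops the slave; only an explicit set_buildbot_property call does.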
Comment 2 • 12 years ago (Assignee)
Solution 2:
We can call buildslave stop . directly from the script, e.g.:
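import time
from mozharness.mozilla.buildbot import TBPL_RETRY
# Method on the mozharness test script class (the method name is hypothetical).
def stop_bad_slave(self):
    dirs = self.query_abs_dirs()
    self.warning("Looks like a bad slave; setting buildbot RETRY and stopping the buildslave.")
    self.buildbot_status(TBPL_RETRY)
    buildslave = self.query_exe('buildslave', return_type="list")
    # Presumably the buildslave's basedir is the parent of base_work_dir.
    self.run_command(buildslave + ["stop", "."], cwd="%s/.." % dirs['base_work_dir'])
    # Keep the step from finishing normally while the buildslave shuts down.
    time.sleep(600)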
This is a bit messier in the logs, since a connection-reset message will show up. We also won't process any properties (though maybe we don't want to).
On the other hand, we don't have to rely on parsing property files or logs to trigger the buildslave stop.
In either approach, we have to somehow mark the device as bad, and make sure we know that the buildslave was stopped on purpose (so we don't log into a foopy, wonder why 5 buildslaves are stopped, and just start them up again).
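One way to record that a buildslave was stopped on purpose is to leave a note beside it when it is stopped. This is a minimal sketch, assuming each buildslave has its own directory on the foopy; the file name stopped_on_purpose.json, its fields, and the helper names are all hypothetical and nothing in buildbot or mozharness reads them.

```python
import json
import os
import time

# Hypothetical note file; buildbot and mozharness know nothing about it.
STOP_NOTE = "stopped_on_purpose.json"


def record_intentional_stop(slave_dir, device, reason):
    """Leave a note next to the buildslave explaining why it was stopped."""
    note = {"device": device, "reason": reason, "stopped_at": int(time.time())}
    with open(os.path.join(slave_dir, STOP_NOTE), "w") as f:
        json.dump(note, f)
    return note


def intentional_stop_reason(slave_dir):
    """Return the recorded reason, or None if no stop was recorded."""
    path = os.path.join(slave_dir, STOP_NOTE)
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        return json.load(f)["reason"]


def clear_intentional_stop(slave_dir):
    """Remove the note once the device is fixed and the slave may be restarted."""
    path = os.path.join(slave_dir, STOP_NOTE)
    if os.path.exists(path):
        os.remove(path)
```

A restart helper (or a person logging into the foopy) would check intentional_stop_reason() first and leave the buildslave down while it returns a reason, which also serves as the "device is bad" marking.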
Comment 3 • 12 years ago
I think we could output something into a file or property and then, in the reboot function, add a case for "property/file based disconnect requested".
What do you think? The disconnection would happen on the reboot step rather than on the test step.
BTW, let's remind ourselves that this is a nice-to-have, since the tegras don't have this feature.
From looking at the code here:
http://hg.mozilla.org/build/buildbotcustom/file/default/process/factory.py#l6273
hg.mozilla.org/build/buildbotcustom/file/default/steps/misc.py#l446
"If force_disconnect is True, then the slave will always be disconnected after the command completes".
def reboot(self):
    def do_disconnect(cmd):
        # Disconnect only if the reboot command printed 'SCHEDULED REBOOT'
        # to its stdio log.
        try:
            if 'SCHEDULED REBOOT' in cmd.logs['stdio'].getText():
                return True
        except:
            pass
        return False
    if self.reboot_command:
        # force_disconnect is given a callable (do_disconnect) here, not True.
        self.addStep(DisconnectStep(
            name='reboot',
            flunkOnFailure=False,
            warnOnFailure=False,
            alwaysRun=True,
            workdir='.',
            description="reboot",
            command=self.reboot_command,
            force_disconnect=do_disconnect,
            env=self.env,
        ))
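The proposal above amounts to teaching do_disconnect to recognize a second marker. This is a sketch, assuming the stop script (for example a conditional buildslave_stop.sh) echoes a hypothetical "DISCONNECT REQUESTED" line before stopping; only "SCHEDULED REBOOT" comes from the factory code. The Fake classes are stand-ins that mimic only the cmd.logs['stdio'].getText() access, so the function can be exercised outside buildbot.

```python
# "DISCONNECT REQUESTED" is a hypothetical marker; "SCHEDULED REBOOT" is the existing one.
DISCONNECT_MARKERS = ("SCHEDULED REBOOT", "DISCONNECT REQUESTED")


def do_disconnect(cmd):
    """Return True if the reboot step's stdio contains any disconnect marker."""
    try:
        text = cmd.logs['stdio'].getText()
    except (AttributeError, KeyError):
        # No stdio log available: don't force a disconnect.
        return False
    return any(marker in text for marker in DISCONNECT_MARKERS)


# Minimal stand-ins for buildbot's command/log objects, for local testing only.
class FakeLog(object):
    def __init__(self, text):
        self._text = text

    def getText(self):
        return self._text


class FakeCmd(object):
    def __init__(self, text):
        self.logs = {'stdio': FakeLog(text)}


if __name__ == "__main__":
    print(do_disconnect(FakeCmd("bad_slave set; DISCONNECT REQUESTED\n")))  # True
    print(do_disconnect(FakeCmd("all tests passed\n")))  # False
```

Catching only AttributeError and KeyError (instead of a bare except) keeps real bugs visible while still treating a missing log as "don't disconnect".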
Comment 4 • 12 years ago (Assignee)
Sure.
The Tegras solve this by creating an error.flg file, after which clientproxy shuts down the processes. I think this approach is better.
Comment 5 • 12 years ago
I have tested that I can call this from buildbot:
/builds/manage_buildslave.sh stop WithProperties('%(slavename)s')
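The command above relies on buildbot rendering WithProperties('%(slavename)s') into the name of the slave running the build. The sketch below mimics that rendering with a plain dict standing in for the build properties, so the resulting argument list can be checked; it models only the simple %(name)s form, and the slave name panda-0001 and helper names are made up for illustration.

```python
def render_with_properties(fmt, properties):
    """Render a WithProperties-style format string by %-interpolating
    build properties (only the simple '%(name)s' form is modeled)."""
    return fmt % properties


def manage_buildslave_stop_command(properties):
    """Build the tested stop command with slavename filled in."""
    return ["/builds/manage_buildslave.sh", "stop",
            render_with_properties("%(slavename)s", properties)]


if __name__ == "__main__":
    # 'panda-0001' is a made-up slave name.
    print(manage_buildslave_stop_command({"slavename": "panda-0001"}))
    # In a buildbot 0.8 factory the step would presumably look like:
    #   ShellCommand(command=["/builds/manage_buildslave.sh", "stop",
    #                         WithProperties("%(slavename)s")])
```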
Updated • 12 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated • 11 years ago
Product: mozilla.org → Release Engineering
Updated • 7 years ago
Component: General Automation → General