Closed
Bug 1042358
Opened 10 years ago
Closed 10 years ago
Make runner responsible for buildbot startup on Ubuntu test
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ianconnolly, Assigned: bhearsum)
Attachments
(2 files, 1 obsolete file)
8.62 KB, patch | dustin: review+ | bhearsum: checked-in+
523 bytes, patch | catlee: review+ | bhearsum: checked-in+
No description provided.
Comment 1 • 10 years ago (Assignee)
I still need to test this more, but I _think_ this has the bases covered as far as getting runner running at all. I need to make sure the other tasks still work, but this gets as far as running Buildbot and connecting to a master.
Comment 2 • 10 years ago
Comment on attachment 8480813
run runner with upstart on ubuntu

Review of attachment 8480813:
-----------------------------------------------------------------

::: modules/runner/templates/runner.upstart.conf.erb
@@ +10,5 @@
> +
> + # We sleep a bit here because even though Xvfb has completed, we want to
> + # make sure that the DE has launched. Some sort of check of the process
> + # list would be better, but this is probably good enough.
> + sleep 10

So, this is a pretty substantial change in buildbot startup: from running in a gnome terminal after DE startup, to running via "su -c cltbld 'python runslave.py'". It looks like the latter doesn't even take care to set up DISPLAY, actually. And I know at least __GL_YIELD=NOTHING is required (modules/gui/manifests/init.pp), and possibly others.
Attachment #8480813 - Flags: feedback?(dustin) → feedback+
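The "check of the process list" idea from the template comment could look roughly like the sketch below. It is written in Python rather than upstart shell, and it is only an illustration: the helper names, the timeout values, and the `gnome-session` process name used in the usage note are assumptions, not something taken from the patch.

```python
import subprocess
import time


def process_running(name):
    """Return True if pgrep finds a process whose name matches exactly.

    Assumes pgrep is installed on the machine.
    """
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def wait_for(check, timeout=60.0, interval=1.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or the timeout expires.

    Returns True as soon as check() succeeds, False once the deadline
    has passed. sleep and clock are injectable to keep this testable.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller could then replace the fixed `sleep 10` with something like `wait_for(lambda: process_running("gnome-session"))` and only start buildbot when it returns True. Which process actually signals that the DE is ready would need to be confirmed on a real slave.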
Comment 3 • 10 years ago (Assignee)
(In reply to Dustin J. Mitchell [:dustin] from comment #2)
> Comment on attachment 8480813
> run runner with upstart on ubuntu
>
> Review of attachment 8480813:
> -----------------------------------------------------------------
>
> ::: modules/runner/templates/runner.upstart.conf.erb
> @@ +10,5 @@
> > +
> > + # We sleep a bit here because even though Xvfb has completed, we want to
> > + # make sure that the DE has launched. Some sort of check of the process
> > + # list would be better, but this is probably good enough.
> > + sleep 10
>
> So, this is a pretty substantial change in buildbot startup: from running in
> a gnome terminal after DE startup, to running via "su -c cltbld 'python
> runslave.py'". It looks like the latter doesn't even take care to set up
> DISPLAY, actually. And I know at least __GL_YIELD=NOTHING is required
> (modules/gui/manifests/init.pp), and possibly others.

Yeah, this is something I'm still testing for. DISPLAY is already set by buildbot, but I'm concerned about XDG/GNOME/DBUS stuff (and the __GL_YIELD one you just mentioned). So far, all of the desktop tests appear to pass. I still need to do some checking on other machine types, too.
Comment 4 • 10 years ago (Assignee)
(In reply to Ben Hearsum [:bhearsum] from comment #3)
> (In reply to Dustin J. Mitchell [:dustin] from comment #2)
> > Comment on attachment 8480813
> > run runner with upstart on ubuntu
> >
> > Review of attachment 8480813:
> > -----------------------------------------------------------------
> >
> > ::: modules/runner/templates/runner.upstart.conf.erb
> > @@ +10,5 @@
> > > +
> > > + # We sleep a bit here because even though Xvfb has completed, we want to
> > > + # make sure that the DE has launched. Some sort of check of the process
> > > + # list would be better, but this is probably good enough.
> > > + sleep 10
> >
> > So, this is a pretty substantial change in buildbot startup: from running in
> > a gnome terminal after DE startup, to running via "su -c cltbld 'python
> > runslave.py'". It looks like the latter doesn't even take care to set up
> > DISPLAY, actually. And I know at least __GL_YIELD=NOTHING is required
> > (modules/gui/manifests/init.pp), and possibly others.
>
> Yeah, this is something I'm still testing for. DISPLAY is already set by
> buildbot, but I'm concerned about XDG/GNOME/DBUS stuff (and the __GL_YIELD
> one you just mentioned). So far, all of the desktop tests appear to pass. I
> still need to do some checking on other machine types, too.

Somewhat surprisingly, no tests have failed due to not having these variables. I've grepped over the logs to make sure that tests actually ran, and spot checked a bunch of logs. If anyone else wants to look, they'll be available here for a while:
http://dev-master1.srv.releng.scl3.mozilla.com:8118/one_line_per_build?numbuilds=150

I'm going to ask around to try and get better confirmation about these variables, but unless I find something suggesting they *are* important, I'm planning to proceed here.

Catlee suggested doing some sort of staged rollout, and I think that would be prudent here. E.g., 5-10 regular AWS machines, 5-10 large ones (for emulator tests), and a few in-house machines. I still need to figure out how to make this happen in Puppet.
Comment 5 • 10 years ago (Assignee)
Per IRC, I'd like to roll this out on a few production slaves pointing at my puppet environment. Seems like I should have r+ before doing that, though.
Attachment #8480813 - Attachment is obsolete: true
Attachment #8482741 - Flags: review?(dustin)
Comment 6 • 10 years ago (Assignee)
I spoke with Rail about how to set aside some AWS machines to do this. It looks like we should be able to just bring up some on-demand machines and pin them to my environment. Emulator test machines don't have any entries in buildbot-configs for on-demand machines yet, so I'm adding some here. I'll be fiddling with moz-state to make sure that stop idle doesn't shut these down (otherwise it's very unlikely that they'll get picked over spot machines).
Attachment #8482758 - Flags: review?(catlee)
Updated • 10 years ago
Attachment #8482758 - Flags: review?(catlee) → review+
Updated • 10 years ago (Assignee)
Attachment #8482758 - Flags: checked-in+
Comment 7 • 10 years ago
Comment on attachment 8482741
fully tested patch to get buildbot started with runner

Review of attachment 8482741:
-----------------------------------------------------------------

::: modules/toplevel/manifests/slave/releng/test.pp
@@ +14,5 @@
> include dirs::builds::hg_shared
> include dirs::builds::git_shared
> include dirs::builds::tooltool_cache
>
> + case $::operatingsystem {

Can you add a comment here explaining that this conditional is temporary until runner is set up on every platform?
Attachment #8482741 - Flags: review?(dustin) → review+
Comment 8 • 10 years ago
Merged to production, and deployed.
Comment 9 • 10 years ago (Assignee)
I pinned talos-linux64-ix-001, 002, 005, and 006 to my user environment. Sheriffs are aware, and I've added a note in Slavealloc. I'll be doing the same for a few slaves from the tst-linux64 and tst-emulator64 AWS pools shortly, too.
Comment 10 • 10 years ago (Assignee)
The ec2 machines are up now too:
tst-linux64-ec2-001, 002, 003, and 004
tst-emulator64-ec2-001 and 002

I've flipped their moz-state tags to testing-bug1042358 to avoid them getting shut down. That should be changed back when testing is done.
Comment 11 • 10 years ago (Assignee)
So far things are looking mostly fine. One build failed with DISPLAY not being set, but I'm extremely confused as to why. This:
http://buildbot-master103.srv.releng.scl3.mozilla.com:8201/builders/Ubuntu%20HW%2012.04%20x64%20mozilla-central%20pgo%20talos%20other_l64/builds/160

HOME=/home/cltbld
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LOGNAME=cltbld
MAIL=/var/mail/cltbld
NODE_PATH=/usr/lib/nodejs:/usr/lib/node_modules:/usr/share/javascript
PATH=/usr/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
PROPERTIES_FILE=/builds/slave/talos-slave/test-pgo/buildprops.json
PWD=/builds/slave/talos-slave/test-pgo
SHELL=/bin/bash
SHLVL=1
TERM=linux
TMOUT=86400
USER=cltbld
XDG_SESSION_COOKIE=dd26bb57dc7379c38bda76df000001a9-1409930523.515999-565090523
_=/tools/buildbot/bin/python

In addition to not having DISPLAY set, it's also missing other variables defined in the same place (http://mxr.mozilla.org/build-central/source/buildbotcustom/env.py#186). I'm tempted to write this off as a freak occurrence because other jobs that are configured in the exact same way have the right variables set:

DISPLAY=:0
HOME=/home/cltbld
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LOGNAME=cltbld
MAIL=/var/mail/cltbld
MOZ_CRASHREPORTER_NO_REPORT=1
MOZ_NO_REMOTE=1
NODE_PATH=/usr/lib/nodejs:/usr/lib/node_modules:/usr/share/javascript
NO_EM_RESTART=1
PATH=/usr/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
PROPERTIES_FILE=/builds/slave/talos-slave/test/buildprops.json
PWD=/builds/slave/talos-slave/test
SHELL=/bin/bash
SHLVL=1
TERM=linux
TMOUT=86400
USER=cltbld
XDG_SESSION_COOKIE=dd26bb57dc7379c38bda76df000001a9-1409925928.329783-534482962
XPCOM_DEBUG_BREAK=warn
_=/tools/buildbot/bin/python

Still, going to look into this more, but I'm not going to disable anything -- I'd like them to run over the weekend.
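The two environment dumps in this comment can be compared mechanically rather than by eye. The following is a small sketch in Python; the function names are made up for illustration, and it assumes each dump is a whitespace-separated run of KEY=VALUE pairs whose values contain no spaces (true for the dumps pasted here).

```python
def parse_env(dump):
    """Parse a whitespace-separated KEY=VALUE dump into a dict."""
    env = {}
    for token in dump.split():
        key, sep, value = token.partition("=")
        if sep:  # skip anything that is not KEY=VALUE
            env[key] = value
    return env


def missing_keys(failed, passed):
    """Return the variable names present in `passed` but absent in `failed`."""
    return sorted(set(passed) - set(failed))


# Environment of the failing PGO talos job, copied from the comment.
FAILED = """
HOME=/home/cltbld LANG=en_US.UTF-8 LANGUAGE=en_US:en LOGNAME=cltbld
MAIL=/var/mail/cltbld
NODE_PATH=/usr/lib/nodejs:/usr/lib/node_modules:/usr/share/javascript
PATH=/usr/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
PROPERTIES_FILE=/builds/slave/talos-slave/test-pgo/buildprops.json
PWD=/builds/slave/talos-slave/test-pgo SHELL=/bin/bash SHLVL=1 TERM=linux
TMOUT=86400 USER=cltbld
XDG_SESSION_COOKIE=dd26bb57dc7379c38bda76df000001a9-1409930523.515999-565090523
_=/tools/buildbot/bin/python
"""

# Environment of a passing non-PGO talos job, copied from the comment.
PASSED = """
DISPLAY=:0 HOME=/home/cltbld LANG=en_US.UTF-8 LANGUAGE=en_US:en LOGNAME=cltbld
MAIL=/var/mail/cltbld MOZ_CRASHREPORTER_NO_REPORT=1 MOZ_NO_REMOTE=1
NODE_PATH=/usr/lib/nodejs:/usr/lib/node_modules:/usr/share/javascript
NO_EM_RESTART=1
PATH=/usr/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
PROPERTIES_FILE=/builds/slave/talos-slave/test/buildprops.json
PWD=/builds/slave/talos-slave/test SHELL=/bin/bash SHLVL=1 TERM=linux
TMOUT=86400 USER=cltbld
XDG_SESSION_COOKIE=dd26bb57dc7379c38bda76df000001a9-1409925928.329783-534482962
XPCOM_DEBUG_BREAK=warn _=/tools/buildbot/bin/python
"""

MISSING = missing_keys(parse_env(FAILED), parse_env(PASSED))
print(MISSING)
# ['DISPLAY', 'MOZ_CRASHREPORTER_NO_REPORT', 'MOZ_NO_REMOTE',
#  'NO_EM_RESTART', 'XPCOM_DEBUG_BREAK']
```

All five missing names are variables the master is expected to inject, which is consistent with the failing job simply not receiving the buildbot-side env at all.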
Comment 12 • 10 years ago (Assignee)
Turns out that we don't set the env in buildbot for PGO talos jobs, but we do for non-PGO talos jobs. I'm fixing this in bug 1063739. I'm not going to disable the 4 slaves locked to my puppet env because there's only a small set of jobs that will fail because of this, and there shouldn't be more than a few that happen over the weekend.
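One general way to avoid this kind of PGO/non-PGO drift is to build every job's env from a single shared base. The sketch below is only an illustration of that pattern in Python: it is not the buildbotcustom code and not the fix from bug 1063739, and `TALOS_ENV` and `talos_job_env` are hypothetical names. The variable values are the ones seen in the passing job's environment in comment 11.

```python
# Hypothetical shared base env; values taken from the passing job's dump.
TALOS_ENV = {
    "DISPLAY": ":0",
    "MOZ_CRASHREPORTER_NO_REPORT": "1",
    "MOZ_NO_REMOTE": "1",
    "NO_EM_RESTART": "1",
    "XPCOM_DEBUG_BREAK": "warn",
}


def talos_job_env(extra=None):
    """Return a fresh env dict: the shared base plus per-job overrides."""
    env = dict(TALOS_ENV)      # copy so callers cannot mutate the base
    env.update(extra or {})
    return env


# Both job flavours go through the same helper, so neither can end up
# without DISPLAY and friends.
non_pgo_env = talos_job_env()
pgo_env = talos_job_env()
```

Because the helper copies the base dict, a per-job override never leaks into other jobs.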
Comment 13 • 10 years ago (Assignee)
These jobs have looked fine on the pinned machines for a while. I plan to check in the puppet change to production tonight, so that the spot AMIs will pick up the changes tomorrow morning. In-house Ubuntu machines (such as talos-linux64-ix) will pick up the changes tonight - I'll hang around to watch them in case of bustage.
Comment 14 • 10 years ago (Assignee)
I've moved all machines back to the production environment, and reset moz-state on the ec2 machines. In other words, they're back to how they were before I started testing this. I'll land the puppet patch later this evening.
Comment 15 • 10 years ago (Assignee)
Comment on attachment 8482741
fully tested patch to get buildbot started with runner

Landed on default+production.
Attachment #8482741 - Flags: checked-in+
Comment 16 • 10 years ago (Assignee)
I forgot to add a new file when I first landed. This worked fine after I fixed that, though.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated • 6 years ago
Component: General Automation → General