Closed
Bug 1094293
Opened 10 years ago
Closed 9 years ago
Add initial Yosemite(10.10) slaves to releng CI
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: coop, Assigned: kmoir)
References
Details
(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/4223] [10.10][mac])
Attachments
(4 files, 4 obsolete files)
838 bytes,
patch
|
jlund
:
review+
coop
:
checked-in+
|
Details | Diff | Splinter Review |
4.26 KB,
patch
|
dustin
:
review-
|
Details | Diff | Splinter Review |
4.63 KB,
patch
|
kmoir
:
checked-in+
|
Details | Diff | Splinter Review |
2.44 KB,
patch
|
dustin
:
review+
kmoir
:
checked-in+
|
Details | Diff | Splinter Review |
We'll have 3 Yosemite slaves available soon (bug 1094279) that we can use to start greening up testing on 10.10. They'll need to be added to releng CI in the standard places: * slavealloc * buildbot-configs * buildbotcustom? * mozharness? * tools * slaveapi * relengapi * slave heath * graphserver * treeherder
Reporter | ||
Comment 1•10 years ago
|
||
Machine entries exist in graphserver (staging and prod) and in slavealloc. Puppet work was covered by bug 1078882.
Assignee: nobody → coop
Status: NEW → ASSIGNED
Priority: -- → P2
Reporter | ||
Comment 2•10 years ago
|
||
Most of the work is happening over in bug 1094288 because it's essentially a s/mavericks/yosemite/g operation.
See Also: → 1094288
Reporter | ||
Comment 3•10 years ago
|
||
Added to slave health: https://hg.mozilla.org/build/slave_health/rev/01ce02fc5954
Reporter | ||
Comment 4•10 years ago
|
||
The problem I'm trying to solve here is that /usr/local/bin is not in the default PATH on Yosemite, while it is in PATH on Snow Leopard/Lion/Mountain Lion/Mavericks. hg, wget, and the screenresolution script are in /usr/local/bin, so having it in the PATH is kinda important. In this patch, I add a standard PATH for all Mac machines. I then explicitly set the env of steps that need access to hg, wget, or screenresolution. I haven't done exhaustive testing, but I have tested Yosemite, w864, and talos-linux32 in staging to make sure the change works and doesn't affect other builds. Essentially, if a platform doesn't have an env already set, nothing changes. The change to clone_buildtools in MozillaBuildFactory has a small chance of breaking things, but again, if the env wasn't set before, nothing changes now.
Attachment #8529884 -
Flags: review?(jlund)
Comment 5•10 years ago
|
||
Comment on attachment 8529884 [details] [diff] [review] Pass env to ScriptFactory steps Review of attachment 8529884 [details] [diff] [review]: ----------------------------------------------------------------- ::: process/factory.py @@ +492,5 @@ > self.addStep(MercurialCloneCommand( > name='clone_buildtools', > command=['hg', 'clone', self.buildToolsRepo, 'tools'], > description=['clone', 'build tools'], > log_eval_func=rc_eval_func({0: SUCCESS, None: RETRY}), you mentioned this step may fail. After today's disaster, we should probably change this line so we don't blanket RETRY like we did for mozharness :)
Comment 6•10 years ago
|
||
Comment on attachment 8529884 [details] [diff] [review] Pass env to ScriptFactory steps Review of attachment 8529884 [details] [diff] [review]: ----------------------------------------------------------------- looks good. IIUC, env.py will only affect test jobs. However, it looks like you are adding self.env to an hg MozillaBuildFactory step (I'm guessing cause MozillaTestFactory won't die). So this will affect build jobs too. I *think* that is okay since it looks like we define platforms[pf]['env'] everywhere correctly in mozilla/config.py. I think we should block on waiting for a https://bugzilla.mozilla.org/show_bug.cgi?id=1094293#c5 solution before landing this though ::: process/factory.py @@ +4872,5 @@ > )) > self.addPeriodicRebootSteps() > > > +def resolution_step(env={}): I think this is safe in this situation but I try to never use mutable default arguments: http://docs.python-guide.org/en/latest/writing/gotchas/
Attachment #8529884 -
Flags: review?(jlund) → review+
Reporter | ||
Comment 7•10 years ago
|
||
Comment on attachment 8529884 [details] [diff] [review] Pass env to ScriptFactory steps Review of attachment 8529884 [details] [diff] [review]: ----------------------------------------------------------------- (In reply to Jordan Lund (:jlund) from comment #6) > I think we should block on waiting for a > https://bugzilla.mozilla.org/show_bug.cgi?id=1094293#c5 solution before > landing this though I agree it should be fixed, but blocking my patch on it doesn't make sense. My patch isn't changing the existing behavior here, and catlee's work in bug 733663 should fix this globally. > I think this is safe in this situation but I try to never use mutable > default arguments: http://docs.python-guide.org/en/latest/writing/gotchas/ Huh, that's very useful. /me commits those to memory. https://hg.mozilla.org/build/buildbotcustom/rev/7eef3d9293d2
Attachment #8529884 -
Flags: checked-in+
Reporter | ||
Comment 8•10 years ago
|
||
In production: https://hg.mozilla.org/build/buildbotcustom/rev/7eef3d9293d2
Reporter | ||
Comment 9•10 years ago
|
||
Comment on attachment 8529884 [details] [diff] [review] Pass env to ScriptFactory steps Review of attachment 8529884 [details] [diff] [review]: ----------------------------------------------------------------- Backed out for build bustage. :(
Attachment #8529884 -
Flags: checked-in+ → checked-in-
Reporter | ||
Comment 10•10 years ago
|
||
Here I thought the puppet part was done (see comment #1). This patch gets cltbld setup with a proper shell on 10.10, complete with PATH. I just ran some tests in staging and they ran fine, properly inheriting the PATH from the OS.
Attachment #8529884 -
Attachment is obsolete: true
Attachment #8530394 -
Flags: review?(jlund)
Comment 11•10 years ago
|
||
Comment on attachment 8530394 [details] [diff] [review] [puppet] Setup user properly on 10.10 Review of attachment 8530394 [details] [diff] [review]: ----------------------------------------------------------------- Oh cool! much simpler indeed. I think I understand what is going on and this seems much safer as it should only affect yosemite. Though, since I'm not a puppet expert, just want to verify the testing. By 'ran some tests in staging', does that mean you pinned your puppet environment to a yosemite slave? I poked around macosx_productversion_major and I noticed a few places where we haven't defined 10.10 case yet. I think I can understand why we haven't with most but just wanted to sanity check this one: http://mxr.mozilla.org/build/source/puppet/modules/disableservices/manifests/common.pp#58
Attachment #8530394 -
Flags: review?(jlund) → review+
Reporter | ||
Comment 12•10 years ago
|
||
(In reply to Jordan Lund (:jlund) from comment #11) > I poked around macosx_productversion_major and I noticed a few places where > we haven't defined 10.10 case yet. I think I can understand why we haven't > with most but just wanted to sanity check this one: > http://mxr.mozilla.org/build/source/puppet/modules/disableservices/manifests/ > common.pp#58 It's quite possible we'll need a block like http://mxr.mozilla.org/build/source/puppet/modules/disableservices/manifests/common.pp#71 soon. What Kim discovered when she was setting up 10.9 was that Apple had changed many of the services which is why the block is so different than for 10.8 and earlier. I'll start digging into the services part. This initial patch gets the tests themselves unblocked.
Reporter | ||
Comment 13•10 years ago
|
||
Comment on attachment 8530394 [details] [diff] [review] [puppet] Setup user properly on 10.10 Review of attachment 8530394 [details] [diff] [review]: ----------------------------------------------------------------- http://hg.mozilla.org/build/puppet/rev/c13ab9c78a85
Attachment #8530394 -
Flags: checked-in+
Comment 14•10 years ago
|
||
(In reply to Chris Cooper [:coop] from comment #13) > Comment on attachment 8530394 [details] [diff] [review] > [puppet] Setup user properly on 10.10 > > Review of attachment 8530394 [details] [diff] [review]: > ----------------------------------------------------------------- > > http://hg.mozilla.org/build/puppet/rev/c13ab9c78a85 something is not right: https://tbpl.mozilla.org/?tree=Cedar&rev=44e61e86879c had ~4k jobs spun up from cloning mozharness again between our 2 yosemite slaves. Not sure why it failed but after the week we have had, we should address this issue of RETRY when hg fails in all places. IIUC philor stopped the bleeding by disabling yosemite slaves via slavealloc.
Blocks: t-yosemite-r5-0002, t-yosemite-r5-0003
Reporter | ||
Comment 15•10 years ago
|
||
A couple of things happened here. I reimaged the two yosemite nodes before returning them to production. The puppet change got merged to production, but there is still a random element to when that goes live, as determined by: /bin/sh -c sleep $((($RANDOM/(32768/200)))); /etc/puppet/update.sh I should have forced an update by hand, but did not. :/ Obviously the auto-retry is a big issue too. Because the build never manages to clone tools, it can't even run count_and_reboot.py to pull in the puppet changes that would have stopped the bleeding. This is one of the reasons why we *need* that shared tools checkout on each slave.
Reporter | ||
Comment 16•10 years ago
|
||
OK, there's more failing here than I think I can debug on my own. I'm re-opening bug 1078882 to get help with things that should be working.
Assignee | ||
Updated•10 years ago
|
Assignee: coop → kmoir
Comment 17•10 years ago
|
||
seeing Upon execvpe hg ['hg', 'clone', 'https://hg.mozilla.org/build/mozharness', 'scripts'] in environment id 140423192525424 :Traceback (most recent call last): File "/tools/buildbot-0.8.4-pre-moz4/lib/python2.7/site-packages/twisted/internet/process.py", line 414, in _fork executable, args, environment) File "/tools/buildbot-0.8.4-pre-moz4/lib/python2.7/site-packages/twisted/internet/process.py", line 460, in _execChild os.execvpe(executable, args, environment) File "/tools/buildbot-0.8.4-pre-moz4/lib/python2.7/os.py", line 353, in execvpe _execvpe(file, args, env) File "/tools/buildbot-0.8.4-pre-moz4/lib/python2.7/os.py", line 380, in _execvpe func(fullname, *argrest) OSError: [Errno 2] No such file or directory program finished with exit code 1 I wonder if this may have just been a problem with the old hg we tried to install.. Note that it seems to work fine with the version I built and deployed: [cltbld@t-yosemite-r5-0002.test.releng.scl3.mozilla.com ~]$ /tools/python27-mercurial/bin/hg --version Mercurial Distributed SCM (version 3.2.1) (see http://mercurial.selenic.com for more information) Copyright (C) 2005-2014 Matt Mackall and others This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. [cltbld@t-yosemite-r5-0002.test.releng.scl3.mozilla.com ~]$ /tools/python27_mercurial/bin/hg --version Mercurial Distributed SCM (version 3.2.1) (see http://mercurial.selenic.com for more information) Copyright (C) 2005-2014 Matt Mackall and others This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. [cltbld@t-yosemite-r5-0002.test.releng.scl3.mozilla.com ~]$ ls Applications Desktop Downloads Library Movies Music Pictures [cltbld@t-yosemite-r5-0002.test.releng.scl3.mozilla.com ~]$ /tools/python27_mercurial/bin/hg clone https://hg.mozilla.org/build/ mozharness warning: hg.mozilla.org certificate with fingerprint af:27:b9:34:47:4e:e5:98:01:f6:83:2b:51:c9:aa:d8:df:fb:1a:27 not verified (ch eck hostfingerprints or web.cacerts config setting) destination directory: mozharness requesting all changes adding changesets adding manifests adding file changes added 3460 changesets with 6124 changes to 528 files (+1 heads) updating to branch default 332 files updated, 0 files merged, 0 files removed, 0 files unresolved
Updated•10 years ago
|
Whiteboard: [10.10][mac] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/4223] [10.10][mac]
Assignee | ||
Comment 18•10 years ago
|
||
I have green results for some opt results on my master but the talos tests are timing out, investigating
Reporter | ||
Comment 19•10 years ago
|
||
(In reply to Kim Moir [:kmoir] from comment #18) > I have green results for some opt results on my master but the talos tests > are timing out, investigating Is the addition "Require all granted" to talos.conf in place? See https://bugzilla.mozilla.org/show_bug.cgi?id=1078882#c30.
Assignee | ||
Comment 20•10 years ago
|
||
Thanks coop. I fixed that and it addressed the issue. I have a patch to fix this in puppet.
Assignee | ||
Comment 21•10 years ago
|
||
Add "Require all granted" to apache config on 10.10 to talos tests run
Attachment #8536643 -
Flags: review?(dustin)
Comment 22•10 years ago
|
||
Comment on attachment 8536643 [details] [diff] [review] bug1094293apache.patch Just an observational question, Is there a reason we don't just use template if on osx10.10 in the .conf rather than fork the .conf wholesale? (if this is the only change I just imagine it will be lest mental complexity)
Assignee | ||
Updated•10 years ago
|
Attachment #8536643 -
Flags: review?(dustin)
Comment 23•10 years ago
|
||
Comment on attachment 8536643 [details] [diff] [review] bug1094293apache.patch Agree with Callek - these should be the same template, just with conditionals on the OS.
Attachment #8536643 -
Flags: review-
Assignee | ||
Comment 24•10 years ago
|
||
Attachment #8536643 -
Attachment is obsolete: true
Attachment #8536684 -
Flags: review?(dustin)
Assignee | ||
Comment 25•10 years ago
|
||
This patch fixes *the apache issue for talos tests *disables several services that were already disabled on 10.9 so update notifications etc, don't appear * adds softlink to /usr/local/bin/hg because it was missing * installs xcode coop or dustin: either one of you are welcome to review, would be nice to get 10.10 tests running on cedar before the holidays I ran tests in staging with these changes overnight and the tests aren't all green. However, the orange seems to be valid test failures, not infrastructure related.
Attachment #8538668 -
Flags: review?(dustin)
Attachment #8538668 -
Flags: review?(coop)
Assignee | ||
Updated•10 years ago
|
Attachment #8536684 -
Attachment is obsolete: true
Attachment #8536684 -
Flags: review?(dustin)
Comment 26•10 years ago
|
||
Comment on attachment 8538668 [details] [diff] [review] bug1094293puppet-2.patch Review of attachment 8538668 [details] [diff] [review]: ----------------------------------------------------------------- The 'hg' symlink still feels gross to me - like we're papering over something we don't understand. Are the PATH differences going to cause other errors? Will this change (which isn't conditionalized on OS version) cause issues on other OS X versions? I don't have a better idea.. just a feeling :/ r- for the really minor non-touchy-feely stuff below. ::: modules/disableservices/manifests/common.pp @@ +67,5 @@ > enable => false, > ensure => stopped, > } > } > + 10.9, 10.10: { Oops!! ::: modules/packages/manifests/mozilla/py27_mercurial.pp @@ +58,5 @@ > + ["/usr/bin/hg"]: > + ensure => link, > + owner => "root", > + replace => "no", > + group => "$users::root::group", No quotes here ::: modules/talos/manifests/init.pp @@ +55,2 @@ > include packages::javadeveloper_for_os_x > # not sure why this is required, but it appears to be Can you update this comment with the reason? ::: modules/talos/manifests/settings.pp @@ +5,5 @@ > > class talos::settings { > # public variables used by talos module > $apachedocumentroot = "/builds/slave/talos-slave/talos-data/talos" > + $requireall = $::macosx_productversion_major ? { This is a good fix, but worth a comment.
Attachment #8538668 -
Flags: review?(dustin) → review-
Reporter | ||
Comment 27•10 years ago
|
||
Comment on attachment 8538668 [details] [diff] [review] bug1094293puppet-2.patch Review of attachment 8538668 [details] [diff] [review]: ----------------------------------------------------------------- Nothing to add from Dustin's comments.
Attachment #8538668 -
Flags: review?(coop)
Assignee | ||
Comment 28•10 years ago
|
||
I don't understand why this is an oops
> enable => false,
> ensure => stopped,
> }
> }
> + 10.9, 10.10: {
I left the 10.9 stuff there because my understanding is that some of the signing servers are 10.9
I don't like the hg fix either, but have changed it to be limited to os and point directly at the binary. Running some tests again now to make sure nothing broke.
Assignee | ||
Updated•10 years ago
|
Attachment #8538763 -
Flags: checked-in+
Assignee | ||
Comment 29•10 years ago
|
||
Again, links to binaries in /usr/local/bin because the plist isn't loading the path to /usr/local/bin in the environment. I hope to have a better fix for but I don't have one right now. Current status of tests in staging on cedar * all talos tests are green * opt tests have a lot of orange that developers need to look at once we enabled them * Builder Rev5 MacOSX Yosemite 10.10 cedar opt test webapprt-chrome is failing with an error I'm investigating * Builder Rev5 MacOSX Yosemite 10.10 cedar opt test jetpack is failing but is doing the same on production branches today
Attachment #8539310 -
Flags: review?(dustin)
Comment 30•10 years ago
|
||
Comment on attachment 8539310 [details] [diff] [review] bug1094293puppet-4.patch Can't plist files add to PATH? (For the stuff they run)
Assignee | ||
Comment 31•10 years ago
|
||
That is the problem. There is a bug in 10.10 with plists adding to the path.
Assignee | ||
Comment 32•10 years ago
|
||
Failure message for webapprt-chrome seems to be something with the test path. Apparently this test suite is only enabled on cedar anyways so I'm not going to worry about it that much. So I think the steps forward after this last patch is approved and landed are *reimage yosemite slaves *attach them to a mac test master *try to run tests on cedar *ask developers to look at all the orange I don't think anything needs to be done on the buildbot configs side to enable them on cedar since this worked out of the box on my test dev master.
Comment 33•10 years ago
|
||
Comment on attachment 8539310 [details] [diff] [review] bug1094293puppet-4.patch If it was just hg, I was OK with a temporary workaround until the PATH issue could be resolved, but this is getting pretty bad :( At the very least, these hacks should all be in the package::mozilla::xx class (here, the screenresolution hack isn't), and flagged with comments referencing to this bug indicating that they're a temporary workaround. Morgan, is there a way we could set the path in runner, instead of in the plist? Kim, do you have links or anything like that about the PATH bug?
Flags: needinfo?(winter2718)
Attachment #8539310 -
Flags: review?(dustin) → review-
Comment 34•10 years ago
|
||
Yes, you can modify the environment of runner tasks by adding an entry to runner.cfg under the [env] heading. See here: http://hg.mozilla.org/build/puppet/file/111e26506b74/modules/runner/templates/runner.cfg.erb#l12 Just add an entry like: [env] PATH=${PATH}:/home/yadda/yadda/bin
Flags: needinfo?(winter2718)
Assignee | ||
Comment 35•9 years ago
|
||
patch to add /usr/local/bin to path in runner and remove the symlink to hg Morgan: thanks for the advice Dustin: Here's a link regarding the 10.10 issue http://apple.stackexchange.com/questions/106355/setting-the-system-wide-path-environment-variable-in-mavericks I tested this patch in staging and the results are green so far
Attachment #8539310 -
Attachment is obsolete: true
Attachment #8544718 -
Flags: review?(dustin)
Comment 36•9 years ago
|
||
Comment on attachment 8544718 [details] [diff] [review] bug1094293-5.patch Review of attachment 8544718 [details] [diff] [review]: ----------------------------------------------------------------- Interesting -- that's talking about /etc/launchd.conf, not the EnvironmentVariables plist key, which is still in the launchd.plist(5) manpage on 10.10 (side-note, apparently Apple quit updating their online manpages, as they're still for 10.9) EnvironmentVariables <dictionary of strings> This optional key is used to specify additional environmental variables to be set before running the job. Each value in the dictionary is the value of an environment vari- able, with the corresponding value being a string representing the desired value. NOTE: Values other than strings will be ignored. Anyway, this fix is better than relying on launchd to set the path for RUNNER regardless of what's wrong with launchd. Thanks for revising from the earlier fixes!
Attachment #8544718 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 37•9 years ago
|
||
Comment on attachment 8544718 [details] [diff] [review] bug1094293-5.patch and merged to production
Attachment #8544718 -
Flags: checked-in+
Assignee | ||
Comment 38•9 years ago
|
||
So there are 10.10 jobs running on cedar but they aren't showing up on treeherder yet due to bug 1119412. The jobs yesterday were all orange because cedar was in a bad state due to bug 1118902. Today, I manually started some new mac jobs on cedar via the buildapi, I'll look at the results once they are available.
Assignee | ||
Comment 39•9 years ago
|
||
I ran some more jobs again today and you can see the results here https://treeherder.mozilla.org/#/jobs?repo=cedar&revision=7c03738e7a95&filter-searchStr=10.10 They are taking a while because only two macs are servicing them.
Assignee | ||
Comment 40•9 years ago
|
||
I think this can be closed. We have three 10.10 slaves running on try and working is going well to green up the test results. I'll open other bugs when we need to enable a larger pool of machines on for other branches.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•