Closed
Bug 1141628
Opened 9 years ago
Closed 9 years ago
Create an OS X 10.7 build host for DR
Categories: Infrastructure & Operations :: RelOps: General, task
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: arich; Assignee: dividehex
Attachments (1 file):
bug1141628.patch (patch, 672 bytes): dividehex: review+; kmoir: checked-in+
We need to create an OS X 10.7 build client host for disaster recovery (DR).
Reporter
Comment 1 • 9 years ago
coop/jordan, can you specify an r4 machine to take out of service and use for this?
Flags: needinfo?(jlund)
Flags: needinfo?(coop)
Updated • 9 years ago
Depends on: t-snow-r4-0165
Flags: needinfo?(coop)
Assignee
Comment 3 • 9 years ago
Nagios checks removed:
svn commit -m 'Bug 1141626 & Bug 1141628: remove nagios checks for DR minis'
Sending releng/scl3.pp
Transmitting file data .
Committed revision 102585.
Assignee
Comment 4 • 9 years ago
This host was renamed to bld-lion-r4-001.test.releng.scl3.mozilla.com for reimaging purposes. I copied over the Lion image from install.build and set up a new workflow (Restore lion-r4), but the host failed to come back from a reimage. I suspect it didn't like the Apple Core Storage setup, so I've removed it and filed a bug to have DCOps netboot it. Hopefully it will recover after that.
Assignee
Comment 5 • 9 years ago
DCOps got this to netboot and it took the image correctly. It has been puppetized to include toplevel::slave::releng::build::standard. I'm going to have it moved to the build VLAN so it can be tested against a dev master.
Comment 6 • 9 years ago
Test build is running here: http://dev-master2.bb.releng.use1.mozilla.com:8039/builders/OS%20X%2010.7%20mozilla-central%20nightly/builds/1
Comment 7 • 9 years ago
I had to change the hostname because it still reflected the test network:
scutil --set HostName bld-lion-r4-001.build.releng.scl3.mozilla.com
The build failed because the host is missing some SSH keys. I think it couldn't puppetize properly because the incorrect hostname was in the configs.
Attachment #8588579 - Flags: review?(jwatkins)
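A stale hostname like this one can be caught before a build with a small pre-flight check. The sketch below is not part of the releng tooling, and both function names are hypothetical. It assumes the VLAN is the label right after the short hostname, as in bld-lion-r4-001.build.releng.scl3.mozilla.com versus bld-lion-r4-001.test.releng.scl3.mozilla.com.

```python
import socket
from typing import Optional


def vlan_of(fqdn: str) -> str:
    """Return the label right after the short hostname, e.g. 'build' or 'test'.

    Assumption (hypothetical convention): <host>.<vlan>.releng.<datacenter>.mozilla.com
    """
    labels = fqdn.lower().rstrip(".").split(".")
    if len(labels) < 3:
        raise ValueError(f"not a fully qualified name: {fqdn!r}")
    return labels[1]


def check_hostname(expected_vlan: str, fqdn: Optional[str] = None) -> bool:
    """True if the given FQDN (or this machine's) is on the expected VLAN."""
    # With no argument, fall back to the local name as Python resolves it.
    name = fqdn if fqdn is not None else socket.getfqdn()
    return vlan_of(name) == expected_vlan


if __name__ == "__main__":
    # The stale test-network name fails; the corrected build-network name passes.
    print(check_hostname("build", "bld-lion-r4-001.test.releng.scl3.mozilla.com"))
    print(check_hostname("build", "bld-lion-r4-001.build.releng.scl3.mozilla.com"))
```

Run as a script, this prints False and then True. On the host itself, calling check_hostname("build") with no name checks whatever socket.getfqdn() reports.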
Assignee
Comment 8 • 9 years ago
Comment on attachment 8588579 (bug1141628.patch)
Hmm... It was imaged and puppetized before it was moved to the build VLAN. In theory, it should have gotten all the same keys and configuration regardless of the FQDN.
Attachment #8588579 - Flags: review?(jwatkins) → review+
Comment 9 • 9 years ago
Comment on attachment 8588579 (bug1141628.patch)
and merged to production
Attachment #8588579 - Flags: checked-in+
Comment 10 • 9 years ago
I ran some more tests and I think it is okay. The warnings are from missing Taskcluster credentials. A DMG was created and I was able to run the nightly build. How will it continue to be updated in TOR? I assume it will continue to be updated via Puppet?
Reporter
Comment 11 • 9 years ago
It won't get any updates. When we test the DR, it will be updated by hand (possibly by attaching it to a puppet master in AWS, but not necessarily during a test). The machine will be powered off in a closet until a regular test (see the resiliency/DR docs I added to).
Comment 12 • 9 years ago
Okay, I think the credentials were missing because the machine was not in slavealloc. I've fixed this and am running another test.
Comment 13 • 9 years ago
Still failing with this error message, even though the file exists on the filesystem:
http://dev-master2.bb.releng.use1.mozilla.com:8039/builders/OS%20X%2010.7%20mozilla-central%20build/builds/28/steps/run_script/logs/stdio

10:03:39 INFO - Error: /builds/slave/m-cen-m64-00000000000000000000/build/src/obj-firefox/i386/browser/installer/package-manifest:462: File missing in ../../dist: Nightly.app/Contents/Resources/modules/services/datareporting/sessions.jsm
10:03:55 INFO - Traceback (most recent call last):
10:03:55 INFO -   File "/builds/slave/m-cen-m64-00000000000000000000/build/src/toolkit/mozapps/installer/packager.py", line 404, in <module>
10:03:55 INFO -     main()
10:03:55 INFO -   File "/builds/slave/m-cen-m64-00000000000000000000/build/src/toolkit/mozapps/installer/packager.py", line 353, in main
10:03:55 INFO -     copier.add(mozpath.join(respath, 'removed-files'), removals)
10:03:55 INFO -   File "/tools/python/lib/python2.7/contextlib.py", line 24, in __exit__
10:03:55 INFO -     self.gen.next()
10:03:55 INFO -   File "/builds/slave/m-cen-m64-00000000000000000000/build/src/python/mozbuild/mozpack/errors.py", line 129, in accumulate
10:03:55 INFO -     raise AccumulatedErrors()
10:03:55 INFO - mozpack.errors.AccumulatedErrors
10:03:55 INFO - make[3]: *** [stage-package] Error 1
10:03:55 INFO - make[2]: *** [postflight_all] Error 2
10:03:55 INFO - make[1]: *** [realbuild] Error 2
10:03:55 INFO - make: *** [build] Error 2
10:03:55 INFO - 153 compiler warnings present.

:mshal, do you have any ideas? I know you have more experience with the Mac build side of things than I do.
Flags: needinfo?(mshal)
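For a packaging failure like this, the useful facts are the manifest line and the path the packager expected under dist. The sketch below pulls those out of a saved log. Its regular expression is an assumption based only on the single "File missing" line quoted above. parse_missing_files and MissingFile are hypothetical helpers, not part of mozpack.

```python
import re
from typing import List, NamedTuple


class MissingFile(NamedTuple):
    manifest: str  # package-manifest that referenced the file
    line: int      # line number within that manifest
    base: str      # directory the packager searched (e.g. ../../dist)
    path: str      # file the packager expected to find there


# Assumed format, inferred from the one log line in this comment:
#   Error: <manifest>:<line>: File missing in <base>: <path>
_MISSING_RE = re.compile(
    r"Error: (?P<manifest>\S+):(?P<line>\d+): "
    r"File missing in (?P<base>\S+): (?P<path>\S+)"
)


def parse_missing_files(log_text: str) -> List[MissingFile]:
    """Collect every 'File missing' error found anywhere in a build log."""
    return [
        MissingFile(m["manifest"], int(m["line"]), m["base"], m["path"])
        for m in _MISSING_RE.finditer(log_text)
    ]
```

Each result can then be checked against the objdir by hand. This helps confirm whether the file really is absent from dist at packaging time, as opposed to existing somewhere else on the filesystem.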
Comment 14 • 9 years ago
We chatted on IRC - this particular problem was gone on the most recent build, so I didn't investigate further. I think the only remaining issue is the credentials used for 'make upload'.
Flags: needinfo?(mshal)
Assignee
Comment 15 • 9 years ago
(In reply to Michael Shal [:mshal] from comment #14)
> We chatted on IRC - this particular problem was gone on the most recent
> build, so I didn't investigate further. I think the only remaining issue is
> the credentials used for 'make upload'.

What is the status of this host? Is there a reason the credentials are missing? Could they be added manually for DR purposes?
Comment 16 • 9 years ago
The current status is:
* the correct keys are on the machine
* I can run the ssh command that fails during the build (connecting to dev-stage01.srv.releng.scl3.mozilla.com) from the command line, and it works
* I adjusted the staging server name, clobbered the builder directory on the machine, and am rerunning the build

I think the machine is fine; I'm just being careful to ensure it's in a valid state.
Comment 17 • 9 years ago
So the subsequent build failed, and I don't understand why. If I run the command ['ssh', '-o', 'IdentityFile=/Users/cltbld/.ssh/ffxbld_rsa', 'ffxbld@dev-stage01.srv.releng.scl3.mozilla.com', 'mktemp -d'] from the command line of the slave, it works fine. Any suggestions would be appreciated.

2:09:17 INFO - 2015-04-15 12:09:17,378 - Copying ../../dist//firefox-40.0a1.en-US.mac.checksums.asc to cache /builds/slave/m-cen-m64-00000000000000000000/build/signing_cache/gpg/f4de22cdae03cd6036d153e4c82571d98b17c546
12:09:17 INFO - /builds/slave/m-cen-m64-00000000000000000000/build/src/obj-firefox/i386/_virtualenv/bin/python /builds/slave/m-cen-m64-00000000000000000000/build/src/build/upload.py --base-path ../../dist \
12:09:17 INFO - '../../dist/firefox-40.0a1.en-US.mac.dmg' '../../dist/mac/xpi/firefox-40.0a1.en-US.langpack.xpi' '../../dist/firefox-40.0a1.en-US.mac.tests.zip' '../../dist/firefox-40.0a1.en-US.mac.crashreporter-symbols.zip' '../../dist//firefox-40.0a1.en-US.mac.txt' '../../dist//firefox-40.0a1.en-US.mac.json' '../../dist//firefox-40.0a1.en-US.mac.mozinfo.json' '../../dist/jsshell-mac.zip' ../../dist/host/bin/mar ../../dist/host/bin/mbsdiff \
12:09:17 INFO - '../../dist//firefox-40.0a1.en-US.mac.checksums' '../../dist//firefox-40.0a1.en-US.mac.checksums'.asc
12:09:17 INFO - Permission denied (publickey,gssapi-with-mic,password).
12:09:28 INFO - Permission denied (publickey,gssapi-with-mic,password).
12:09:44 INFO - Permission denied (publickey,gssapi-with-mic,password).
12:10:08 INFO - Permission denied (publickey,gssapi-with-mic,password).
12:10:45 INFO - Permission denied (publickey,gssapi-with-mic,password).
12:10:45 INFO - Command '['ssh', '-o', 'IdentityFile=/Users/cltbld/.ssh/ffxbld_rsa', 'ffxbld@dev-stage01.srv.releng.scl3.mozilla.com', 'mktemp -d']' returned non-zero exit status 255
Comment 18 • 9 years ago
-oIdentityFile won't disable use of an SSH agent. If ssh-add -l lists your key in your interactive shell, that might be why the command works there but not in the build. Try running it with 'env -i ssh -oIdentityFile...'
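ssh reaches a running agent through the SSH_AUTH_SOCK environment variable. An interactive shell that has an agent can therefore authenticate with keys the build never sees. The sketch below reproduces the build's invocation with an empty environment, the Python equivalent of env -i. The helper names clean_ssh_invocation and run_clean are hypothetical. The arguments mirror the failing command from the log above.

```python
import subprocess
from typing import Dict, List, Tuple


def clean_ssh_invocation(
    identity_file: str, target: str, remote_cmd: str
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for ssh with an empty environment, like `env -i ssh ...`.

    With no SSH_AUTH_SOCK in the environment, ssh cannot reach an agent,
    so keys held only by an agent are not offered.
    """
    argv = ["ssh", "-o", f"IdentityFile={identity_file}", target, remote_cmd]
    env: Dict[str, str] = {}  # deliberately empty: no SSH_AUTH_SOCK, no PATH
    return argv, env


def run_clean(identity_file: str, target: str, remote_cmd: str) -> int:
    """Run ssh with the scrubbed environment and return its exit status."""
    argv, env = clean_ssh_invocation(identity_file, target, remote_cmd)
    # env has no PATH, so Python's default search path is used to find "ssh".
    return subprocess.run(argv, env=env).returncode
```

For example, run_clean('/Users/cltbld/.ssh/ffxbld_rsa', 'ffxbld@dev-stage01.srv.releng.scl3.mozilla.com', 'mktemp -d') might return 255 even though the same command succeeds from your usual shell. In that case, the agent is what made the manual run work.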
Comment 19 • 9 years ago
Dustin: Yes, this explains it; thanks for the pointer. The problem was that I had the prod environment specified in slavealloc, so the machine had the prod keys rather than the staging keys. Once I fixed this, I was able to run an m-c nightly build that ran green.
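The root cause was a configuration mismatch: the slave was marked as prod while it uploaded to a dev staging host. A hedged sketch of a sanity check that could flag this is below. It assumes, based only on the hostnames in this bug (dev-master2, dev-stage01), that staging hosts carry a dev- prefix. The labels "prod"/"dev" and both function names are hypothetical, not slavealloc's actual schema.

```python
def env_for_host(fqdn: str) -> str:
    """Guess a host's environment from its short name (assumed 'dev-' convention)."""
    short_name = fqdn.split(".", 1)[0]
    return "dev" if short_name.startswith("dev-") else "prod"


def env_mismatch(slave_env: str, upload_host: str) -> bool:
    """True if the slave's configured environment differs from the upload host's."""
    return slave_env != env_for_host(upload_host)
```

With this convention, env_mismatch("prod", "dev-stage01.srv.releng.scl3.mozilla.com") is True, which is the situation described above.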
Comment 20 • 9 years ago
The machine ran several m-c builds last night, all were green so I think the machine is good.
Assignee
Comment 21 • 9 years ago
(In reply to Kim Moir [:kmoir] from comment #20)
> The machine ran several m-c builds last night, all were green so I think the
> machine is good.

Thanks Kim!
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED