Closed
Bug 1116210
Opened 9 years ago
Closed 9 years ago
[mig agent] MacOS DMG packaging
Categories
(Enterprise Information Security Graveyard :: MIG, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jvehent, Assigned: dustin)
References
Details
Attachments
(3 files, 2 obsolete files)
MIG Agent needs proper packaging for MacOS. After some testing, it seems that the best approach is to build the agent on MacOS directly. It would be nice to cross-compile from Linux, but FPM PKG generation and hdiutil for DMG creation are only available on darwin. (In the future, we should investigate alternatives, but that's not a priority.)

The following method works:
1. Compile the agent with OS=darwin
2. Using FPM, create a PKG file
3. Using hdiutil, create a DMG file that contains only the PKG file
4. To install, mount the DMG file, execute the installer command on the PKG file, and unmount the DMG file

Any install script contained in the PKG file will be executed by the installer.

Sample run:

$ hdiutil mount /tmp/mig-agent-20141229+da443e6.dev-x86_64.dmg
/dev/disk1      Apple_partition_scheme
/dev/disk1s1    Apple_partition_map
/dev/disk1s2    Apple_HFS    /Volumes/Mozilla InvestiGator Agent

$ ls /Volumes/Mozilla\ InvestiGator\ Agent/
mig-agent-20141229+da443e6.dev-x86_64.pkg

$ sudo installer -package /Volumes/Mozilla\ InvestiGator\ Agent/mig-agent-20141229+da443e6.dev-x86_64.pkg -target /
Password:
installer: Package name is mig-agent-20141229+da443e6.dev-x86_64
installer: Upgrading at base path /
installer: The upgrade was successful.

$ hdiutil unmount /Volumes/Mozilla\ InvestiGator\ Agent/
"/Volumes/Mozilla InvestiGator Agent/" unmounted successfully.

$ sudo /sbin/mig-agent -q=pid
6161

Makefile entry is in attachment for feedback.
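For reference, the manual install steps (mount, install, unmount) could be wrapped in a small script along these lines. This is only a sketch: the function names `pkg_path_for_dmg` and `install_mig_dmg` are made up for illustration, and it assumes the volume name and the PKG naming (same base name as the DMG) seen in the sample run. Only `install_mig_dmg` needs a darwin host, since it calls hdiutil and installer.

```shell
#!/bin/sh
# Sketch only, not the Makefile entry from the attachment.
# Assumption: the DMG mounts at this volume, as in the sample run.
VOLUME="/Volumes/Mozilla InvestiGator Agent"

# Derive the PKG path inside the mounted volume from the DMG file name.
# Assumption: the PKG has the same base name as the DMG.
pkg_path_for_dmg() {
    printf '%s/%s.pkg\n' "$VOLUME" "$(basename "$1" .dmg)"
}

# Mount the DMG, run the installer on the PKG, then always unmount.
# Returns the installer's exit status.
install_mig_dmg() {
    dmg="$1"
    hdiutil mount "$dmg" || return 1
    sudo installer -package "$(pkg_path_for_dmg "$dmg")" -target /
    rc=$?
    hdiutil unmount "$VOLUME"
    return "$rc"
}

# Usage (on a darwin host):
#   install_mig_dmg /tmp/mig-agent-20141229+da443e6.dev-x86_64.dmg
```

Unmounting even when the installer fails keeps a failed run from leaving the volume attached.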
Reporter | ||
Comment 1•9 years ago
|
||
Attachment #8542217 -
Flags: feedback?(dustin)
Assignee | ||
Updated•9 years ago
|
Attachment #8542217 -
Flags: feedback?(dustin) → feedback+
Reporter | ||
Updated•9 years ago
|
Component: Operations Security (OpSec): General → Operations Security (OpSec): MIG
Reporter | ||
Comment 2•9 years ago
|
||
Retested the procedure today with latest mig-agent, manual install works fine.

$ hdiutil mount ~/Code/build_mig/packages/opsec/mig-agent-20150402+b05e40b.prod-x86_64.dmg
/dev/disk1      Apple_partition_scheme
/dev/disk1s1    Apple_partition_map
/dev/disk1s2    Apple_HFS    /Volumes/Mozilla InvestiGator Agent

$ sudo installer -package /Volumes/Mozilla\ InvestiGator\ Agent/mig-agent-20150402+b05e40b.prod-x86_64.pkg -target /
installer: Package name is mig-agent-20150402+b05e40b.prod-x86_64
installer: Upgrading at base path /
installer: The upgrade was successful.

$ hdiutil unmount /Volumes/Mozilla\ InvestiGator\ Agent/
"/Volumes/Mozilla InvestiGator Agent/" unmounted successfully.

$ sudo /sbin/mig-agent
[info] using builtin conf

$ sudo /sbin/mig-agent -q=pid
8759

$ sudo /sbin/mig-agent -V
20150402+b05e40b.prod

The configuration will be identical to the one already puppeted for linux, so the only change needed is to deploy & install the DMG. How should this be done? I see some dmg install scripts in puppet that curl packages from various places. Should I host the DMG on an S3 bucket operated by opsec, or is there a more standard way to deploy it?
Assignee | ||
Comment 3•9 years ago
|
||
Sweet!! The DMGs end up in /data/repos/DMGs, and are installed with the packages::pkgdmg define, e.g.,

class packages::mozilla::mig_agent {
    case $::operatingsystem {
        ...
        Darwin: {
            packages::pkgdmg {
                mig-agent: # must match the DMG base name
                    version => "20150402+b05e40b.prod-x86_64",
                    os_version_specific => false; # same binary works on multiple versions
            }
        }
        ...
    }
}
Reporter | ||
Comment 4•9 years ago
|
||
Ok, I scp-ed the package into /data/repos/DMGs/mig-agent-20150402+b05e40b.prod-x86_64.dmg Do I need to run a command to update the repository? I'll send a patch, but it will need the config file change from bug 1149639 to be merged first.
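A quick way to double-check the file name before relying on packages::pkgdmg is sketched below. It assumes the DMG is looked up as `<name>-<version>.dmg` under /data/repos/DMGs, which is inferred from the path above and the pkgdmg example in comment 3, not from the define's source. `DMG_REPO`, `dmg_path` and `check_dmg` are hypothetical names.

```shell
#!/bin/sh
# Sketch only: build the expected DMG path from the pkgdmg title and version.
# Assumption: files are named <name>-<version>.dmg inside the DMG repo.
DMG_REPO="/data/repos/DMGs"

# $1: package name (the pkgdmg resource title), $2: version string
dmg_path() {
    printf '%s/%s-%s.dmg\n' "$DMG_REPO" "$1" "$2"
}

# Report whether the expected file is present on this host.
check_dmg() {
    path="$(dmg_path "$1" "$2")"
    if [ -f "$path" ]; then
        echo "found: $path"
    else
        echo "missing: $path"
        return 1
    fi
}

# Usage:
#   check_dmg mig-agent '20150402+b05e40b.prod-x86_64'
```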
Assignee | ||
Comment 5•9 years ago
|
||
Nope, OS X doesn't have "repositories" :( So you should be good to go.
Reporter | ||
Comment 6•9 years ago
|
||
The attached patch provides the base support for darwin packages in the mig modules. It requires merging https://bug1149639.bugzilla.mozilla.org/attachment.cgi?id=8587388 first because of the dependency on the `api` config parameter. I have not assigned the class mig::agent::daemon to any hosts yet. I'm assuming we want to test it on a single host first and see if it works as expected. Dustin: can you perform the test deploy, and I'll help verify it?
Attachment #8587412 -
Flags: review?(dustin)
Reporter | ||
Comment 7•9 years ago
|
||
Attachment #8587412 -
Attachment is obsolete: true
Attachment #8587412 -
Flags: review?(dustin)
Assignee | ||
Updated•9 years ago
|
Attachment #8587424 -
Flags: review+
Assignee | ||
Comment 8•9 years ago
|
||
Comment on attachment 8587424 [details] [diff] [review] deploy macos agent 20150402+1c880e7.prod-x86_64 remote: https://hg.mozilla.org/build/puppet/rev/43752e966b5c remote: https://hg.mozilla.org/build/puppet/rev/be45292de713
Assignee | ||
Comment 9•9 years ago
|
||
This actually deploys the patch. I tried this on a buildslave and a server with no errors, although the buildslave hasn't rebooted yet (bld-lion-r5-007.try.releng.scl3.mozilla.com). I had to add anchors here so that the restart could depend on pkgdmg on OS X. Arr, given the risk of outage, is it OK to deploy this?
Attachment #8587470 -
Flags: review?(jvehent)
Attachment #8587470 -
Flags: feedback?(arich)
Comment 10•9 years ago
|
||
Comment on attachment 8587470 [details] [diff] [review] bug1116210-deploy.patch I'd install this on one or two clients of each type (10.6, 10.7, and 10.10) and make sure they go through a reboot cycle to make sure we aren't going to run into issues where machines get hung before rolling it out site-wide.
Attachment #8587470 -
Flags: feedback?(arich) → feedback+
Assignee | ||
Updated•9 years ago
|
Assignee: jvehent → dustin
Assignee | ||
Comment 11•9 years ago
|
||
bld-lion-r5-007.try.releng.scl3.mozilla.com mac-signing2.srv.releng.scl3.mozilla.com t-yosemite-r5-0009.test.releng.scl3.mozilla.com
Reporter | ||
Comment 12•9 years ago
|
||
I can't see bld-lion-r5 or t-yosemite-r5. The other two are showing, plus my own:

mig=> select name from agents where status='online' and environment->>'os'='darwin';
                   name
------------------------------------------
 install.build.releng.scl3.mozilla.com
 mac-signing2.srv.releng.scl3.mozilla.com
 Juliens-Mac-mini.local
(3 rows)
Assignee | ||
Comment 13•9 years ago
|
||
Morgan, are there dashboards and things I can look at to see what's going on with the mig runner task on those platforms? Also, how cool is it that this (almost) just runs on OS X without any additional futzing with startup scripts?
Flags: needinfo?(winter2718)
Comment 14•9 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] from comment #13)
> Morgan, are there dashboards and things I can look at to see what's going on
> with the mig runner task on those platforms?
>
> Also, how cool is it that this (almost) just runs on OS X without any
> additional futzing with startup scripts?

Quite cool :) You can use the runner dashboards. If you need another retry dashboard that's separated by platform I can make one in a jiffy.

https://stats.taskcluster.net/grafana/#/dashboard/db/runner
Flags: needinfo?(winter2718)
Assignee | ||
Comment 15•9 years ago
|
||
So, running by hand:

[root@bld-lion-r5-007.try.releng.scl3.mozilla.com ~]# cat /opt/runner/tasks.d/1-mig_agent
#!/bin/bash
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.

# run mig-agent in checkin mode
/sbin/mig-agent -m agent-checkin || true

[root@bld-lion-r5-007.try.releng.scl3.mozilla.com ~]# /sbin/mig-agent -m agent-checkin
[info] Using external conf from /etc/mig/mig-agent.cfg
[root@bld-lion-r5-007.try.releng.scl3.mozilla.com ~]# echo $?
0

So it looks like it's running. Julien, do you see this host now?
Reporter | ||
Comment 16•9 years ago
|
||
Yep. bld-lion-r5-007.try.releng.scl3.mozilla.com just showed up.
Assignee | ||
Comment 17•9 years ago
|
||
Apr 06 06:52:22 bld-lion-r5-007 1-mig_agent: starting (max time 600s)
Apr 06 06:52:23 bld-lion-r5-007 1-mig_agent: OK

Can you see if it re-checked-in at that time? If so, what's the latest status of t-yosemite?
Assignee | ||
Comment 18•9 years ago
|
||
Yosemite appears not to be logging to papertrail. In its system.log I only see this about runner:

[root@t-yosemite-r5-0009.test.releng.scl3.mozilla.com ~]# grep runner /var/log/system.log
Apr 6 06:44:15 t-yosemite-r5-0009 com.apple.xpc.launchd[1] (com.mozilla.runner): This key does not do anything: OnDemand

Yet Buildbot is running, so I'm guessing runner ran.
Reporter | ||
Comment 19•9 years ago
|
||
Negative. The only hit I have is at 13:33 UTC, so 06:33 PST. Can you get syslog from that box? The agent sends INFO logs into the DAEMON facility.
Assignee | ||
Comment 20•9 years ago
|
||
I think there's something wrong with logging on OS X. There's nothing for runner or mig in /var/log/system.log. On papertrail, there's nothing whatsoever for t-yosemite-*, and for bld-lion-r5-007, the lines matching 'mig' are all from runner:

Apr 06 06:52:15 bld-lion-r5-007 tasks: ['0-darwin_clean_buildbot', '1-cleanslate', '1-mig_agent', '4-buildbot.py', '99-post_flight']
Apr 06 06:52:21 bld-lion-r5-007 running: pre-task hook: /opt/runner/task_hook.py {"try_num": 1, "max_retries": 5, "task": "1-mig_agent", "result": "RUNNING"}
Apr 06 06:52:22 bld-lion-r5-007 1-mig_agent: starting (max time 600s)
Apr 06 06:52:23 bld-lion-r5-007 1-mig_agent: OK
Apr 06 06:52:23 bld-lion-r5-007 running: post-task hook: /opt/runner/task_hook.py {"try_num": 1, "max_retries": 5, "task": "1-mig_agent", "result": "OK"}
Assignee | ||
Comment 21•9 years ago
|
||
OK, I was wrong about yosemite, but all I see is logs of runner running the task.
Reporter | ||
Comment 22•9 years ago
|
||
I'd suggest changing the config in /etc/mig/agent.conf to log to a file instead of syslog:

[logging]
    mode = "file"
    level = "debug"
    file = "/tmp/mig_agent.log"
Assignee | ||
Comment 23•9 years ago
|
||
Morgan reminded me that on OS X, runner runs as cltbld, not root, and mig needs to be run as root. Julien suggested something like

if [ "$UID" != 0 ]; then PREFIX="sudo"; fi; $PREFIX /sbin/mig-agent -m agent-checkin

with /sbin/mig-agent added to sudoers. That's probably the easiest course to making this work.
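Spelled out as a runner task, that suggestion might look roughly like the sketch below. It is not the patch that landed: `checkin_prefix` and `run_checkin` are made-up names, and `id -u` is used instead of `$UID` so the script does not depend on bash. It still relies on /sbin/mig-agent being allowed in sudoers for the runner user.

```shell
#!/bin/sh
# Sketch only: run the mig-agent check-in as root, via sudo when needed.

# Print the command prefix for a given numeric uid:
# nothing for root (uid 0), "sudo" for everyone else.
checkin_prefix() {
    if [ "$1" != 0 ]; then
        echo "sudo"
    fi
}

# Run the check-in. As in the existing task script, a failed check-in
# is ignored so it never blocks the rest of the runner tasks.
run_checkin() {
    prefix="$(checkin_prefix "$(id -u)")"
    # $prefix is intentionally unquoted so an empty prefix disappears.
    $prefix /sbin/mig-agent -m agent-checkin || true
}

# Usage, as the last line of /opt/runner/tasks.d/1-mig_agent:
#   run_checkin
```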
Assignee | ||
Comment 24•9 years ago
|
||
OK, this worked in our testing. r? for this patch and attachment 8587470 [details] [diff] [review]?
Attachment #8588759 -
Flags: review?(jvehent)
Reporter | ||
Comment 25•9 years ago
|
||
Comment on attachment 8587470 [details] [diff] [review] bug1116210-deploy.patch Review of attachment 8587470 [details] [diff] [review]: ----------------------------------------------------------------- Looks good to me.
Attachment #8587470 -
Flags: review?(jvehent) → review+
Assignee | ||
Comment 26•9 years ago
|
||
Comment on attachment 8588759 [details] [diff] [review] bug1116210-sudo.patch I think the r+ on the previous was meant for this patch.
Attachment #8588759 -
Flags: review?(jvehent) → review+
Assignee | ||
Updated•9 years ago
|
Attachment #8587424 -
Attachment is obsolete: true
Assignee | ||
Comment 27•9 years ago
|
||
Comment on attachment 8587470 [details] [diff] [review] bug1116210-deploy.patch https://hg.mozilla.org/build/puppet/rev/072b4b8631f5 remote: https://hg.mozilla.org/build/puppet/rev/6027577b8bd7
Assignee | ||
Comment 28•9 years ago
|
||
Comment on attachment 8588759 [details] [diff] [review] bug1116210-sudo.patch https://hg.mozilla.org/build/puppet/rev/7d6a75968ed0 https://hg.mozilla.org/build/puppet/rev/6027577b8bd7
Assignee | ||
Updated•9 years ago
|
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 29•9 years ago
|
||
While I now have ~250 MacOS hosts checking in, it seems that bld-lion-r5-007 didn't rejoin the pool. Reopening to investigate.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 30•9 years ago
|
||
There are only two bld-lion-r5 hosts that have checked in, in fact.
Assignee | ||
Comment 31•9 years ago
|
||
Apr 07 10:26:40 bld-lion-r5-008 running: pre-task hook: /opt/runner/task_hook.py {"try_num": 1, "max_retries": 5, "task": "1-mig_agent", "result": "RUNNING"}
Apr 07 10:26:41 bld-lion-r5-008 1-mig_agent: starting (max time 600s)
Apr 07 10:26:42 bld-lion-r5-008 sudo: cltbld : TTY=unknown ; PWD=/Users ; USER=root ; COMMAND=/sbin/mig-agent -m agent-checkin
Apr 07 10:26:42 bld-lion-r5-008 kernel: nstat_lookup_entry failed: 2
Apr 07 10:26:47 bld-lion-r5-008 kernel: nstat_lookup_entry failed: 2
Apr 07 10:26:48 bld-lion-r5-008 kernel: nstat_lookup_entry failed: 2
Apr 07 10:26:49 bld-lion-r5-008 1-mig_agent: OK
Apr 07 10:26:49 bld-lion-r5-008 running: post-task hook: /opt/runner/task_hook.py {"try_num": 1, "max_retries": 5, "task": "1-mig_agent", "result": "OK"}

So, it's running mig. I also ran mig on this host by hand a few minutes ago, which I bet checked it in. I'm assuming something is failing in mig when run from runner on this platform. Do the 'nstat' messages mean anything to you, Julien?
Assignee | ||
Comment 32•9 years ago
|
||
dustin@euclid ~/tmp $ ./mig-agent-search "environment->>'os'='darwin'" | grep bld-lion-r5 | cut -d\; -f 1 | sort
"bld-lion-r5-007.try.releng.scl3.mozilla.com"; "2015-04-08T17:28:45Z"
"bld-lion-r5-008.try.releng.scl3.mozilla.com"; "2015-04-08T16:47:31Z"
"bld-lion-r5-015.try.releng.scl3.mozilla.com"; "2015-04-08T16:44:17Z"
"bld-lion-r5-050.build.releng.scl3.mozilla.com"; "2015-04-08T16:09:25Z"
"bld-lion-r5-051.build.releng.scl3.mozilla.com"; "2015-04-08T16:57:42Z"
"bld-lion-r5-053.build.releng.scl3.mozilla.com"; "2015-04-08T16:42:28Z"
"bld-lion-r5-055.build.releng.scl3.mozilla.com"; "2015-04-08T16:21:06Z"
"bld-lion-r5-057.build.releng.scl3.mozilla.com"; "2015-04-08T16:29:00Z"
"bld-lion-r5-061.build.releng.scl3.mozilla.com"; "2015-04-08T16:46:05Z"
"bld-lion-r5-065.build.releng.scl3.mozilla.com"; "2015-04-08T17:09:50Z"
"bld-lion-r5-068.build.releng.scl3.mozilla.com"; "2015-04-08T17:20:42Z"
"bld-lion-r5-070.build.releng.scl3.mozilla.com"; "2015-04-08T17:31:15Z"
"bld-lion-r5-071.build.releng.scl3.mozilla.com"; "2015-04-08T17:42:34Z"
"bld-lion-r5-072.build.releng.scl3.mozilla.com"; "2015-04-08T17:14:24Z"
"bld-lion-r5-076.build.releng.scl3.mozilla.com"; "2015-04-08T17:14:28Z"
"bld-lion-r5-080.build.releng.scl3.mozilla.com"; "2015-04-08T17:08:28Z"
"bld-lion-r5-082.build.releng.scl3.mozilla.com"; "2015-04-08T16:58:50Z"
"bld-lion-r5-083.build.releng.scl3.mozilla.com"; "2015-04-08T17:29:21Z"
"bld-lion-r5-085.build.releng.scl3.mozilla.com"; "2015-04-08T17:31:22Z"
"bld-lion-r5-092.build.releng.scl3.mozilla.com"; "2015-04-08T17:03:59Z"

So for about an hour yesterday, lion hosts could talk to mig. ???!?
Assignee | ||
Comment 33•9 years ago
|
||
Sorry, today, about an hour ago. I didn't change anything related during that time.
Assignee | ||
Comment 34•9 years ago
|
||
Just to pick one that's not on the list above:

[root@bld-lion-r5-095.try.releng.scl3.mozilla.com ~]# uptime
11:18 up 40 mins, 2 users, load averages: 7.81 6.13 3.29

Apr 08 10:41:17 bld-lion-r5-095 running: pre-task hook: /opt/runner/task_hook.py {"try_num": 1, "max_retries": 5, "task": "1-mig_agent", "result": "RUNNING"}
Apr 08 10:41:18 bld-lion-r5-095 1-mig_agent: starting (max time 600s)
Apr 08 10:41:18 bld-lion-r5-095 sudo: cltbld : TTY=unknown ; PWD=/Users ; USER=root ; COMMAND=/sbin/mig-agent -m agent-checkin
Apr 08 10:41:23 bld-lion-r5-095 kernel: nstat_lookup_entry failed: 2
Apr 08 10:41:24 bld-lion-r5-095 kernel: nstat_lookup_entry failed: 2
Apr 08 10:41:25 bld-lion-r5-095 1-mig_agent: OK
Apr 08 10:41:25 bld-lion-r5-095 running: post-task hook: /opt/runner/task_hook.py {"try_num": 1, "max_retries": 5, "task": "1-mig_agent", "result": "OK"}

So, somehow that mig agent didn't check in -- but others have??
Reporter | ||
Comment 35•9 years ago
|
||
I just approved 35 bld-lion hosts that had checked into the scheduler. Now that they are approved, they will show up in the mig-agent-search next time they check in.
Reporter | ||
Comment 36•9 years ago
|
||
MIG currently sees 75 bld-lion hosts. Most of them seem to check in every hour or so, and then there's this:

                     name                      |         checkin time
-----------------------------------------------+-------------------------------
 bld-lion-r5-088.build.releng.scl3.mozilla.com | 2015-04-09 03:51:17.226116+00
 bld-lion-r5-088.build.releng.scl3.mozilla.com | 2015-04-09 10:56:11.958589+00

That host only checked in twice today, about 7 hours apart. I don't know if we run build jobs that last for 7 hours, or if this is a potential issue with MIG/Runner missing checkins. Morgan, any thoughts?
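To put a number on that gap, a small helper like the one below can compute the difference between two check-in times. It is just a sketch: `secs_of_day` and `gap_seconds` are made-up names, it takes plain HH:MM:SS values (fractional seconds and time zone dropped), and it assumes both check-ins fall on the same day.

```shell
#!/bin/sh
# Sketch only: seconds between two same-day HH:MM:SS timestamps.

# Convert HH:MM:SS to seconds since midnight.
# A single leading zero is stripped from each field so that values
# like "08" are not misread as octal by shell arithmetic.
secs_of_day() {
    h=${1%%:*}
    rest=${1#*:}
    m=${rest%%:*}
    s=${rest#*:}
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( h * 3600 + m * 60 + s ))
}

# $1: earlier time, $2: later time (same day)
gap_seconds() {
    echo $(( $(secs_of_day "$2") - $(secs_of_day "$1") ))
}

# Usage, with the two check-ins of bld-lion-r5-088:
#   gap_seconds 03:51:17 10:56:11    # 25494 seconds, just over 7 hours
```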
Flags: needinfo?(winter2718)
Comment 37•9 years ago
|
||
In the logs on the machine I'm seeing that mig only ran twice. Instead of running on a set schedule, mig only runs before a buildbot job starts, so if it takes a long time to pick up a job this sort of thing will happen.

from /var/tmp/runner.err:

2015-04-09 03:56:03,358 - INFO - iteration 1
2015-04-09 03:56:03,377 - DEBUG - tasks: ['0-darwin_clean_buildbot', '1-cleanslate', '1-mig_agent', '4-buildbot.py', '99-post_flight']
2015-04-09 03:56:03,378 - DEBUG - Updating env with {'HG_SHARE_BASE_DIR': '/builds/hg-shared', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11', 'RUNNER_CONFIG_CMD': '/opt/runner/bin/python2.7 /opt/runner/bin/runner -c /opt/runner/runner.cfg', 'TWISTD_LOG_PATH': '/builds/slave/twistd.log', 'GIT_SHARE_BASE_DIR': '/builds/git-shared'}
2015-04-09 03:56:03,378 - DEBUG - running pre-task hook: /opt/runner/task_hook.py {"try_num": 1, "max_retries": 5, "task": "0-darwin_clean_buildbot", "result": "RUNNING"}
2015-04-09 03:56:04,381 - DEBUG - 0-darwin_clean_buildbot: starting (max time 600s)
2015-04-09 03:56:05,385 - DEBUG - 0-darwin_clean_buildbot: OK
2015-04-09 03:56:05,386 - DEBUG - running post-task hook: /opt/runner/task_hook.py {"try_num": 1, "max_retries": 5, "task": "0-darwin_clean_buildbot", "result": "OK"}
2015-04-09 03:56:06,389 - DEBUG - running pre-task hook: /opt/runner/task_hook.py {"try_num": 1, "max_retries": 5, "task": "1-cleanslate", "result": "RUNNING"}
2015-04-09 03:56:07,393 - DEBUG - 1-cleanslate: starting (max time 600s)
2015-04-09 03:56:07,441 - DEBUG - No saved process list found, creating one at /var/tmp/cleanslate
2015-04-09 03:56:08,396 - DEBUG - 1-cleanslate: OK
2015-04-09 03:56:08,397 - DEBUG - running post-task hook: /opt/runner/task_hook.py {"try_num": 1, "max_retries": 5, "task": "1-cleanslate", "result": "OK"}
2015-04-09 03:56:09,400 - DEBUG - running pre-task hook: /opt/runner/task_hook.py {"try_num": 1, "max_retries": 5, "task": "1-mig_agent", "result": "RUNNING"}
2015-04-09 03:56:10,403 - DEBUG - 1-mig_agent: starting (max time 600s)
[info] Using external conf from /etc/mig/mig-agent.cfg
2015-04-09 03:56:17,413 - DEBUG - 1-mig_agent: OK
2015-04-09 03:56:17,414 - DEBUG - running post-task hook: /opt/runner/task_hook.py {"try_num": 1, "max_retries": 5, "task": "1-mig_agent", "result": "OK"}
2015-04-09 03:56:19,418 - DEBUG - running pre-task hook: /opt/runner/task_hook.py {"try_num": 1, "max_retries": 5, "task": "4-buildbot.py", "result": "RUNNING"}
2015-04-09 03:56:20,422 - DEBUG - 4-buildbot.py: starting (max time 600s)
Error sending notice to nagios (ignored)
Flags: needinfo?(winter2718)
Reporter | ||
Comment 38•9 years ago
|
||
Ha, that's interesting. I somehow assumed that all buildbots are busy 100% of the time, and missed the case where an unused buildbot would just not be running mig. MacOS support is working as expected, so I'm going to resolve this bug. Thanks for the help.

As a somewhat unrelated note, could we run MIG in daemon mode on those hosts, and use a pre-job hook to shut down the agent, then restart it with a post-job hook?
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Updated•9 years ago
|
Component: Operations Security (OpSec): MIG → MIG
Product: mozilla.org → Enterprise Information Security
Updated•4 years ago
|
Product: Enterprise Information Security → Enterprise Information Security Graveyard