Support OS X signing in puppetagain

RESOLVED FIXED

Status

Infrastructure & Operations
RelOps: Puppet
5 years ago
5 years ago

People

(Reporter: dustin, Assigned: dustin)

Attachments

(5 attachments)

Component: Server Operations: RelEng → RelOps: Puppet
Product: mozilla.org → Infrastructure & Operations
QA Contact: arich → dustin
Signmar is built correctly, hooray.  Just need to find GnuPG and build the test-files DMG.
Ben, FYI, I've got this basically finished - all of the packages that I know are required are ready to roll, so I'm just wrapping up the puppet configs.

How should we go about testing this?
Created attachment 779435
bug891561.patch

In fact, let's just get that reviewed now.
Attachment #779435 - Flags: review?(bhearsum)
(In reply to Dustin J. Mitchell [:dustin] from comment #2)
> Ben, FYI, I've got this basically finished - all of the packages that I know
> are required are ready to roll, so I'm just wrapping up the puppet configs.
> 
> How should we go about testing this?

Unless we've got spare minis to test on, I guess we'll have to pull one of the existing masters and repuppetize it.
Comment on attachment 779435
bug891561.patch

Review of attachment 779435:
-----------------------------------------------------------------

This seems ok other than the typo noted below. Needs real testing though, of course! I noticed you've got bld-lion-r5-003 still reserved, so maybe we can test against it?

::: modules/packages/manifests/libevent-dmg.sh
@@ +29,5 @@
> +
> +# -- create-dmg.sh
> +
> +DIR_TO_PACKAGE=$ROOT
> +PACKAGE_BASENAME=p7zip-9.20.1

typo here?
Attachment #779435 - Flags: review?(bhearsum) → review+
Yes, I'll land with that typo fixed, and with the package regenerated using the correct script.

Sure, we can use bld-lion-r5-003.  I'll make the necessary node changes when I land the patch, and then re-image it to be sure.
It occurs to me that we may not have the right flows for this.  Can you check out bld-lion-r5-003, and if that looks reasonable without actual testing, we can reimage one of the existing mac signing hosts?  That will be a lot easier than adding and later removing temporary flows.
(In reply to Dustin J. Mitchell [:dustin] from comment #7)
> It occurs to me that we may not have the right flows for this.  Can you
> check out bld-lion-r5-003, and if that looks reasonable without actual
> testing, we can reimage one of the existing mac signing hosts?  That will be
> a lot easier than adding and later removing temporary flows.

Unfortunately it doesn't look like we can use these to test with. I'm not sure about the flows, but we need to have VNC access to work around OS X security contexts. This doesn't work due to bug 733534. This also reminded me that our existing signing servers are 10.6 based, not 10.7. IIRC, we can't sign on 10.7 because the signatures don't validate on some older (but supported) versions of OS X.
OK, well, this will have to wait until we've got 10.6 support up and running in PuppetAgain, then.

VNC works fine on Lion, btw.  It's just ML that's broken (in more ways than one!)
Depends on: 898007
Created attachment 782723
bug891561-snow-leopard.patch

Minor mods, many of which also appear in bug 891881.  The bulk of the signing stuff is the same as on 10.7.
Attachment #782723 - Flags: review?(bhearsum)
This is waiting on bug 894988 to be able to reimage 10.6 hosts into puppetagain.
Depends on: 894988
Comment on attachment 782723
bug891561-snow-leopard.patch

Review of attachment 782723:
-----------------------------------------------------------------

r=me assuming the Python/Mercurial versions are still correct.

::: modules/packages/manifests/gnupg.pp
@@ +21,5 @@
> +                    os_version_specific => false,
> +                    before => File['/Library/LaunchAgents/org.gpgtools.macgpg2.gpg-agent.plist.plist'];
> +            }
> +
> +            # and, this file is bogus and causes warnings:

lol

::: modules/packages/manifests/mozilla/py27_mercurial.pp
@@ +42,5 @@
>              packages::pkgdmg {
>                  python27-mercurial:
> +                    version => $macosx_productversion_major ? {
> +                        10.6 => "2.5.4-2",
> +                        default => "2.5.4-1", # pending bug 895995

This bug is fixed, does this comment/version still apply?

::: modules/packages/manifests/mozilla/python27.pp
@@ +59,5 @@
>                  python27:
> +                    version => $macosx_productversion_major ? {
> +                        10.6 => "2.7.3-1",
> +                        10.7 => "2.7.2-1", # pending bug 602908
> +                        10.8 => "2.7.2-1"  # pending bug 602908

Same here.
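The quoted selectors choose a package version from the OS X major release, falling back to `default` when one is given. Below is a minimal Python sketch of that lookup, for illustration only: Puppet evaluates selectors itself, and `select_version` is a hypothetical name. It also shows why the `python27` excerpt deserves a look. As quoted, it has no `default`, and a Puppet selector with no matching case and no default fails catalog compilation. A newer release would therefore error out rather than pick a version.

```python
from typing import Mapping, Optional


def select_version(os_major: str, cases: Mapping[str, str],
                   default: Optional[str] = None) -> str:
    """Return the version for os_major, mimicking a Puppet selector.

    A matching case wins; otherwise the default is used. With no match and
    no default, Puppet fails catalog compilation, so this raises instead.
    """
    if os_major in cases:
        return cases[os_major]
    if default is not None:
        return default
    raise ValueError(f"no matching value for selector param {os_major!r}")


# Case tables copied from the quoted review excerpts (not necessarily what landed).
MERCURIAL_CASES = {"10.6": "2.5.4-2"}
PYTHON27_CASES = {"10.6": "2.7.3-1", "10.7": "2.7.2-1", "10.8": "2.7.2-1"}

print(select_version("10.7", MERCURIAL_CASES, default="2.5.4-1"))  # 2.5.4-1
print(select_version("10.6", PYTHON27_CASES))                       # 2.7.3-1
```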
Attachment #782723 - Flags: review?(bhearsum) → review+
Comment on attachment 782723
bug891561-snow-leopard.patch

Yes, both of those are now outdated.  I landed with those changes (and the alternate hg location) removed.
Attachment #782723 - Flags: checked-in+
Ben, how should we handle deploying this?
(In reply to Dustin J. Mitchell [:dustin] from comment #14)
> Ben, how should we handle deploying this?

Coordinating with buildduty should be good enough I think...this should be mostly a no-op change, right?
In theory, yes, but we haven't tested it for signing at all.  Should we reimage one signing server and see what happens?
(In reply to Dustin J. Mitchell [:dustin] from comment #16)
> In theory, yes, but we haven't tested it for signing at all.  Should we
> reimage one signing server and see what happens?

Sorry, my PTO-addled brain mixed up what deploying meant. +1 to this, can we plan to do it on Wednesday or Thursday? I can pull one out of the pool and double check it in staging before we do the rest.
Sure thing!
Ben, do you want to reimage one of these today or tomorrow?  I'm offline starting Wednesday afternoon.
Created attachment 797846
remove mac signing 4 from the list of active mac signing servers

Once we get the masters reconfiged with this we can re-image this machine.
Attachment #797846 - Flags: checked-in+
mac-signing4 can be reimaged at any time.
Imaging is complete.
This had some trouble with $mercurial pointing to the wrong place (in reopened bug 895995).  I added a symlink to fix it temporarily.
Depends on: 911290
Hit a few issues bringing up an instance on mac-signing4:
1) Couldn't log in through VNC. Dustin suggested launching through /usr/libexec/StartupItemContext. Still trying to see if this will pan out.
2) DMG test signing doesn't work because it tries to create a lockfile in the test file dir, which is now shared and not writable per instance. This is a code issue that we should fix, tracking it in bug 911290.
3) The signmar version we're using appears to be broken. Dustin is currently getting tip of it built. If that doesn't work we'll revert to an older version (looks like Gecko 15 would be a good choice). I filed bug 911289 on signmar.
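Issue 2 comes down to where the lock lives. The minimal sketch below assumes that each signing instance has its own writable directory (`instance_dir` here, a hypothetical parameter). It keeps the lock there instead of beside the shared test file. Creation uses `O_CREAT | O_EXCL`, so only one process can hold the lock at a time. This is not the fix tracked in bug 911290, just an illustration of the idea.

```python
import os
import tempfile


def lock_path(test_file: str, instance_dir: str) -> str:
    """Place the lock in the per-instance directory, keyed by the test file name.

    The shared test-file directory is not writable per instance, so the lock
    must not live next to the file it protects.
    """
    return os.path.join(instance_dir, os.path.basename(test_file) + ".lock")


def acquire(path: str) -> int:
    """Create the lockfile atomically; raises FileExistsError if it is held."""
    return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)


def release(fd: int, path: str) -> None:
    """Close and remove a lock obtained from acquire()."""
    os.close(fd)
    os.unlink(path)


# Demo: a temp dir stands in for one instance's directory; test.dmg is a hypothetical name.
instance_dir = tempfile.mkdtemp()
path = lock_path("/tools/signing-test-files/test.dmg", instance_dir)
fd = acquire(path)
print(os.path.dirname(path) == instance_dir)  # True
release(fd, path)
```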
Ben, signmar-23.0 is at
  http://puppetagain.pub.build.mozilla.org/data/repos/DMGs/signmar-23.0.dmg
if you want to install it and see if it fixes things.  Signs point to "no", but maybe it's worth a try.
Depends on: 912695, 911289
(In reply to Dustin J. Mitchell [:dustin] from comment #25)
> Ben, signmar-23.0 is at
>   http://puppetagain.pub.build.mozilla.org/data/repos/DMGs/signmar-23.0.dmg
> if you want to install it and see if it fixes things.  Signs point to "no",
> but maybe it's worth a try.

This one doesn't work because of a missing library :(

[cltsign@mac-signing4.build.scl1.mozilla.com dep-key-signing-server]$ ls /tools/signmar/bin/
libnspr4.dylib		libnss3.dylib		libnssutil3.dylib	libplc4.dylib		libplds4.dylib		signmar
[cltsign@mac-signing4.build.scl1.mozilla.com dep-key-signing-server]$ /tools/signmar/bin/signmar -d secrets/mar -n dep1 -s /tools/signing-test-files/test.mar /tmp/test.mar.tmp
dyld: Library not loaded: @executable_path/libmozglue.dylib
  Referenced from: /tools/signmar/bin/libnss3.dylib
  Reason: image not found
Trace/BPT trap
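The dyld error means that `libnss3.dylib` asks for `libmozglue.dylib` via `@executable_path`. dyld resolves that prefix against the directory of the main executable (`/tools/signmar/bin`), and the `ls` above shows no such file there. Below is a small Python sketch for spotting this kind of gap in `otool -L` output. The sample output is illustrative, not captured from the real build, and `missing_exec_path_deps` is a hypothetical helper.

```python
import os
import re
import tempfile
from typing import List

# Dependency lines in `otool -L` output look like:
#   <tab><install name> (compatibility version X, current version Y)
_DEP_LINE = re.compile(r"^\s+(\S+) \(compatibility version")
_PREFIX = "@executable_path/"


def missing_exec_path_deps(otool_output: str, exe_dir: str) -> List[str]:
    """List @executable_path/ dependencies that do not exist in exe_dir.

    exe_dir should be the main executable's directory (signmar's here),
    since @executable_path is resolved against it even for nested dylibs.
    """
    missing = []
    for line in otool_output.splitlines()[1:]:  # line 0 is "<file>:"
        match = _DEP_LINE.match(line)
        if match and match.group(1).startswith(_PREFIX):
            name = match.group(1)
            if not os.path.exists(os.path.join(exe_dir, name[len(_PREFIX):])):
                missing.append(name)
    return missing


# Illustrative output only, not captured from the real libnss3.dylib.
SAMPLE = """/tools/signmar/bin/libnss3.dylib:
\t@executable_path/libnss3.dylib (compatibility version 1.0.0, current version 1.0.0)
\t@executable_path/libmozglue.dylib (compatibility version 1.0.0, current version 1.0.0)
\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.2.0)
"""

bin_dir = tempfile.mkdtemp()  # stands in for /tools/signmar/bin
open(os.path.join(bin_dir, "libnss3.dylib"), "w").close()
print(missing_exec_path_deps(SAMPLE, bin_dir))  # ['@executable_path/libmozglue.dylib']
```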
Status: NEW → ASSIGNED
That got fixed, and I'm now reimaging the host to make sure everything works smoothly.
darnit:

Sep  6 12:47:35 mac-signing4 puppet-agent[121]: Could not find command '/tools/python27-mercurial/bin/hg'
Sep  6 12:47:35 mac-signing4 puppet-agent[121]: (/Stage[main]/Toplevel::Server::Signing/Signingserver::Instance[rel-key-signing-server]/Mercurial::Repo[signing-rel-key-signing-server-tools]/Exec[clone-/builds/signing/rel-key-signing-server/tools]/returns) change from notrun to 0 failed: Could not find command '/tools/python27-mercurial/bin/hg'

(bug 895995)
From nagios:
 ntp_time:		NRPE: Command 'check_ntp_time' not defined
 signing-server:	NRPE: Command 'check_child_procs_regex' not defined
Duration 9 days.
Everything looked fine on mac-signing4 over the weekend. I'm planning to put this machine in the production rotation today. As long as we don't hit any issues by tomorrow morning we can start re-imaging the other machines then.

It looks like there's one nrpe.cfg tweak that needs to be made though:
08:39 < nagios-releng> Mon 05:39:30 PDT [4443] mac-signing4.build.scl1.mozilla.com:ntp time is CRITICAL: NRPE: Command check_ntp_time not defined 
                       (http://m.allizom.org/ntp+time)
Just noticed this one failing too:
09:15 < nagios-releng> Mon 06:15:30 PDT [4455] mac-signing4.build.scl1.mozilla.com:signing-server is CRITICAL: NRPE: Command check_child_procs_regex not defined 
                       (http://m.allizom.org/signing-server)
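Both alerts mean that the Nagios server asked NRPE on the host for a command name with no `command[...]=` entry in the host's nrpe.cfg. The rough Python sketch below lists which required check names a given nrpe.cfg text leaves undefined. The sample config and plugin paths are placeholders, and the sketch does not follow `include`/`include_dir` directives.

```python
import re
from typing import Iterable, List, Set

# NRPE command definitions have the form: command[<name>]=<command line>
_COMMAND = re.compile(r"^\s*command\[([^\]]+)\]\s*=")


def defined_commands(nrpe_cfg: str) -> Set[str]:
    """Names of commands defined in one nrpe.cfg text (commented-out lines ignored)."""
    names = set()
    for line in nrpe_cfg.splitlines():
        match = _COMMAND.match(line)
        if match:
            names.add(match.group(1))
    return names


def undefined_commands(required: Iterable[str], nrpe_cfg: str) -> List[str]:
    """Required check names that the config does not define, sorted."""
    have = defined_commands(nrpe_cfg)
    return sorted(name for name in required if name not in have)


# Hypothetical config: plugin paths are placeholders.
SAMPLE_CFG = """
command[check_users]=/usr/local/libexec/nagios/check_users -w 5 -c 10
#command[check_ntp_time]=/usr/local/libexec/nagios/check_ntp_time -H pool.ntp.org
"""

print(undefined_commands(["check_ntp_time", "check_child_procs_regex"], SAMPLE_CFG))
# ['check_child_procs_regex', 'check_ntp_time']
```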
Interesting - the linux servers aren't monitored at all.  I added bug 914139 for that.
Created attachment 801560
bug891561-nagios.patch

Clean up the nagios checks.  This tested just fine, and its landing shouldn't block further deployment.

Note that this affects the Linux servers, too - they have the same checks configured.
Attachment #801560 - Flags: review?(bhearsum)
Attachment #801560 - Flags: review?(bhearsum) → review+
When trying to start up a signing server instance with a developer ID cert I hit this old friend of an error:
2013-09-09 07:17:54,117 - Test.app: CSSMERR_TP_NOT_TRUSTED

Which indicates that it can't trace the developer ID back to a trusted root. I tried adding the developer ID root cert with "security add-trusted-cert", but that didn't seem to make a difference.

Trying to debug this further still.
Attachment #801560 - Flags: checked-in+
Created attachment 801618
add developer id intermediary cert to system store

(In reply to Ben Hearsum [:bhearsum] from comment #34)
> When trying to start up a signing server instance with a developer ID cert I
> hit this old friend of an error:
> 2013-09-09 07:17:54,117 - Test.app: CSSMERR_TP_NOT_TRUSTED
> 
> Which indicates that it can't trace the developer ID back to a trusted root.
> I tried adding the developer ID root cert with "security add-trusted-cert",
> but that didn't seem to make a difference.
> 
> Trying to debug this further still.

I figured out how to get this added with Puppet.

It's not clear to me why "security" returns an error from the install command, but it certainly adds the certificate despite that.
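When an install command's exit status can't be trusted, one option is to ignore it and check the end state instead. The minimal Python sketch below shows that pattern. It assumes the install step is `security add-certificates` and that the certificate's common name is "Developer ID Certification Authority". Both are assumptions, since the actual patch isn't quoted here. `security find-certificate -c` exits non-zero when nothing matches, so it serves as the success test. The runner is injectable so the logic can be exercised off a Mac.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]

SYSTEM_KEYCHAIN = "/Library/Keychains/System.keychain"


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status without raising."""
    return subprocess.run(list(cmd), check=False).returncode


def install_cert(cert_path: str, common_name: str,
                 keychain: str = SYSTEM_KEYCHAIN, runner: Runner = run) -> bool:
    """Add a certificate, then report success based on a lookup.

    The install step's exit status is ignored on purpose, because here it
    reported an error even though the certificate was added.
    """
    runner(["security", "add-certificates", "-k", keychain, cert_path])
    # find-certificate exits non-zero when no matching certificate exists.
    return runner(["security", "find-certificate", "-c", common_name, keychain]) == 0


# Demo with a fake runner: the install "fails", the lookup succeeds.
calls: List[List[str]] = []


def fake_runner(cmd: Sequence[str]) -> int:
    calls.append(list(cmd))
    return 1 if cmd[1] == "add-certificates" else 0


print(install_cert("/tmp/developer-id-ca.cer", "Developer ID Certification Authority",
                   runner=fake_runner))  # True
```

In Puppet, the same lookup fits naturally as an `exec`'s `unless` check.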

Do you want to re-image mac-signing4 one more time after this lands?
Attachment #801618 - Flags: review?(dustin)
Attachment #801618 - Flags: review?(dustin) → review+
And yes, let's re-image - it's as simple as
  bless --netboot --server bsdp://10.12.48.8; reboot
Attachment #801618 - Flags: checked-in+
I re-imaged the machine and it's now back in the production pool. I'm watching the logs closely.
mac-signing4 looked fine overnight. We're starting the process of re-imaging mac-signing1-3, starting with 3.
mac-signing3 is back in production after the re-image. mac-signing1 and 2 haven't been started yet due to complications with re-imaging this type of machine in scl3.
Depends on: 914660
Comment on attachment 801560
bug891561-nagios.patch

I only just landed this now - there must have been a push conflict yesterday that I didn't notice.
Depends on: 915194
Depends on: 915386
Current status is that we are testing a 10.6 deploy in scl3, as we haven't done one before.  Once that's known to work, we'll reimage mac-signing{1,2}.
Verification complete.  We can reimage mac-signing{1,2} (one at a time) whenever y'all would prefer.
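The rollout here follows a one-host-at-a-time pattern: mac-signing4 first, then 3, then 1 and 2 one at a time. Below is a sketch of that loop in Python. Hypothetical callables stand in for the real steps: pulling the host from the masters' list, netbooting, checking it, and returning it to the pool. A host that fails its check stays out of rotation and the loop stops.

```python
from typing import Callable, Iterable, List

Step = Callable[[str], None]


def rolling_reimage(hosts: Iterable[str], take_out: Step, reimage: Step,
                    healthy: Callable[[str], bool], put_back: Step) -> List[str]:
    """Reimage hosts one at a time, stopping at the first host that fails checks.

    Each host is returned to the pool before the next is pulled, so at most
    one server is out of production at any time.
    """
    done = []
    for host in hosts:
        take_out(host)   # e.g. remove from the masters' list and reconfig
        reimage(host)    # e.g. netboot and let puppet converge
        if not healthy(host):
            raise RuntimeError(f"{host} failed verification; rollout stopped")
        put_back(host)
        done.append(host)
    return done


# Demo with logging stand-ins for the real steps.
log: List[str] = []
order = ["mac-signing3", "mac-signing1", "mac-signing2"]
result = rolling_reimage(order,
                         take_out=lambda h: log.append(f"out {h}"),
                         reimage=lambda h: log.append(f"reimage {h}"),
                         healthy=lambda h: True,
                         put_back=lambda h: log.append(f"in {h}"))
print(result)  # ['mac-signing3', 'mac-signing1', 'mac-signing2']
```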
mac-signing1 and mac-signing2 both got re-imaged this morning and are back in production. I think we're finally done here?
I think so too!
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED