Closed
Bug 1472860
Opened 6 years ago
Closed 6 years ago
enable mdc1+mdc2 signing servers
Categories
(Release Engineering :: Release Automation: Signing, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mozilla, Assigned: mozilla)
References
Details
(Whiteboard: [stockwell disable-recommended])
Attachments
(3 files)
7.70 KB, patch | catlee: review+ | mozilla: checked-in+
55 bytes, patch | catlee: review+
55 bytes, patch | catlee: review+
We have these in DNS and I believe spun up / moved. We need to make sure they work, and enable them.
Comment 1 (Assignee) • 6 years ago
Related: bug 1374787 - dep signing servers. Not a blocker, but if we're doing work here, we could also do this.
See Also: → 1374787
Comment 2 (Assignee) • 6 years ago
I was able to get depsigning to work from depsigning-worker1 to signing7 :) The only real snag was failing to get widevine signing to work (bailed on "bad passphrase") until I updated the tools clone. Should be easy enough to get the various linux instances up in mdc1 and added to the signing passwords list. I still need to do that, plus get the various mac signing servers up (bug 1403674). Dave, is mdc2 ready yet, or should I hold off?
Flags: needinfo?(dhouse)
Comment 3 • 6 years ago
Aki, please go ahead with mdc2. The mdc2 linux signing servers were set up through bug 1443291.

mdc2:
signing10 10.51.48.29
signing11 10.51.48.30
signing12 10.51.48.31
Flags: needinfo?(dhouse)
Comment 4 (Assignee) • 6 years ago
Looks like we need a new ssl cert for the mdc2 servers:

aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host signing10.srv.releng.mdc2.mozilla.com:9110 ssl:True [CertificateError: ("hostname 'signing10.srv.releng.mdc2.mozilla.com' doesn't match either of 'signing4.srv.releng.scl3.mozilla.com', 'signing5.srv.releng.scl3.mozilla.com', 'signing6.srv.releng.scl3.mozilla.com', 'mac-v2-signing1.srv.releng.scl3.mozilla.com', 'mac-v2-signing2.srv.releng.scl3.mozilla.com', 'mac-v2-signing3.srv.releng.scl3.mozilla.com', 'mac-v2-signing4.srv.releng.scl3.mozilla.com', 'mac-v2-signing6.srv.releng.scl3.mozilla.com', 'mac-v2-signing7.srv.releng.scl3.mozilla.com'",)]
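The error above is a plain hostname mismatch against the names listed in the cert, and that can be checked offline. A minimal sketch, assuming only exact names plus a single leftmost `*.` wildcard; `hostname_in_cert` is a hypothetical helper for illustration, not the check that aiohttp or Python's `ssl` module actually performs.

```python
def hostname_in_cert(hostname, san_names):
    """Return True if hostname matches any name in san_names.

    Simplified illustration: exact match, or a leftmost '*.' wildcard
    that covers exactly one label. Not a full RFC 6125 implementation.
    """
    host = hostname.lower().rstrip(".")
    for name in san_names:
        name = name.lower().rstrip(".")
        if name == host:
            return True
        if name.startswith("*."):
            # '*.example.com' covers 'a.example.com' but not 'a.b.example.com'
            _, _, parent = host.partition(".")
            if parent and parent == name[2:]:
                return True
    return False


# Subset of the names copied from the CertificateError above.
OLD_CERT_NAMES = [
    "signing4.srv.releng.scl3.mozilla.com",
    "signing5.srv.releng.scl3.mozilla.com",
    "signing6.srv.releng.scl3.mozilla.com",
]

print(hostname_in_cert("signing10.srv.releng.mdc2.mozilla.com", OLD_CERT_NAMES))
print(hostname_in_cert("signing4.srv.releng.scl3.mozilla.com", OLD_CERT_NAMES))
```

The mdc2 name is not in the old list, so the first check fails while the scl3 name passes, which matches the need for a cert that lists the new hosts.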
Comment 5 (Assignee) • 6 years ago
Also, mac-v2-signing13 probably needs puppetizing; I got a password prompt when trying to ssh.
Comment 6 (Assignee) • 6 years ago
I have my environment set up on releng-puppet2 in mdc1, but it seems to be reading the system hiera file instead of the one in my env? Once I clear that up I can proceed with testing.
Comment 7 (Assignee) • 6 years ago
Tools equivalent to https://github.com/mozilla-releng/signingscript/pull/54 .
- removes the old mdc1 cert (currently not used in production)
- adds a new scl3 cert. This will last past Aug 30; we can remove it when it's no longer needed. Even if we retire the scl3 signing servers earlier, this should avoid the 60 day expiration warning.
- adds a new non-scl3 cert. We can keep this one until summer next year.
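For reference, the 60 day expiration warning comes down to simple date arithmetic. A rough sketch, assuming the warning fires when fewer than 60 days remain; the function names are hypothetical and the dates are illustrative only, not the real validity periods of these certs.

```python
from datetime import datetime, timezone

WARN_DAYS = 60  # threshold taken from the comment above


def days_until_expiry(not_after, now):
    """Whole days from now until the cert's notAfter timestamp."""
    return (not_after - now).days


def needs_warning(not_after, now, warn_days=WARN_DAYS):
    """True if the cert expires within warn_days of now."""
    return days_until_expiry(not_after, now) < warn_days


# Illustrative dates only; not the real cert validity periods.
now = datetime(2018, 7, 25, tzinfo=timezone.utc)
old_cert = datetime(2018, 8, 30, tzinfo=timezone.utc)
new_cert = datetime(2019, 7, 1, tzinfo=timezone.utc)

print(days_until_expiry(old_cert, now), needs_warning(old_cert, now))
print(days_until_expiry(new_cert, now), needs_warning(new_cert, now))
```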
Assignee: nobody → aki
Attachment #8994299 - Flags: review?(catlee)
Comment 8 (Assignee) • 6 years ago
Attachment #8994302 - Flags: review?(catlee)
Comment 9 (Assignee) • 6 years ago
We should be able to enable mdc2 once we test, spin up, and resolve bug 1477139 -- the new cert includes the mdc2 hosts.
Attachment #8994303 - Flags: review?(catlee)
Updated • 6 years ago
Attachment #8994299 - Flags: review?(catlee) → review+
Updated • 6 years ago
Attachment #8994302 - Attachment is patch: true
Attachment #8994302 - Attachment mime type: text/x-github-pull-request → text/plain
Attachment #8994302 - Flags: review?(catlee) → review+
Updated • 6 years ago
Attachment #8994303 - Attachment is patch: true
Attachment #8994303 - Attachment mime type: text/x-github-pull-request → text/plain
Attachment #8994303 - Flags: review?(catlee) → review+
Comment 10 (Assignee) • 6 years ago
Comment on attachment 8994299: tools-ssl
https://hg.mozilla.org/build/tools/rev/57f463082cdb1d4fd6c625b0fe928433d58e3ef3
Attachment #8994299 - Flags: checked-in+
Comment 11 (Assignee) • 6 years ago
Fixed mac-depsigning2 by running `export SDKROOT=macosx10.10` before puppetizing.
Comment 12 (Assignee) • 6 years ago
New ssl certs are rolled out, and trees appear green.

To do:
- spin up mdc2 signing servers
- test mdc2 signing servers
- enable mdc2 signing servers
- verify there are no lingering issues
Comment 13 (Assignee) • 6 years ago
I had to disable the mdc1 servers [1] because attempts to connect to request a token were hanging.
My current theory is that the routes are wrong: we may be going from use1/usw2 through scl3, which would make the request look like the scl3 exit node, which might not be allowlisted. (That's a lot of guesses =\ )

I'm not sure why my one-off tests worked, but we need to resolve this before we can re-enable. (I'm thinking about going through all the failures to try to see a pattern -- maybe only one of the two AWS regions was failing? or something?)
Dave, do you have a better idea what's going on, or know who would? I'm going to keep poking at this, but I'd welcome another pair of eyes here.

[1] https://github.com/mozilla-releng/build-puppet/commit/397edbc9f6a88ab6244f24aa213aaa8385690bb7
Flags: needinfo?(dhouse)
Comment 14 (Assignee) • 6 years ago
(In reply to Aki Sasaki [:aki] from comment #13)
> I'm not sure why my one-off tests worked, but we need to resolve this before
> we can re-enable. (I'm thinking about going through all the failures to try
> to see a pattern -- maybe only one of the two AWS regions was failing? or
> something?)

I see both use1 and usw2, both depsigning and prod signing in the failures. I do see some token requests in signing9's depsigning log, as well as some signing requests, so some requests got through. Not seeing a pattern yet.

I'm going to look at the various routing rules in AWS and compare the scl3 vs mdc1 rules.
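One way to look for a pattern is to tally the failing workers by pool and AWS region. A small sketch, assuming worker FQDNs shaped like `<name>.srv.releng.<region>.mozilla.com`; the `signing-linux-*` names and the sample list are made up for illustration, and real input would come from the failed task logs.

```python
from collections import Counter


def classify(worker_fqdn):
    """Split a worker FQDN into (pool, region).

    Assumes names shaped like
    'depsigning-worker15.srv.releng.use1.mozilla.com'.
    """
    labels = worker_fqdn.split(".")
    short_name, region = labels[0], labels[3]
    pool = "depsigning" if short_name.startswith("depsigning") else "prod signing"
    return pool, region


def tally(failed_workers):
    """Count failures per (pool, region) pair."""
    return Counter(classify(w) for w in failed_workers)


# Made-up sample; the prod worker names here are hypothetical.
sample = [
    "depsigning-worker15.srv.releng.use1.mozilla.com",
    "depsigning-worker3.srv.releng.usw2.mozilla.com",
    "signing-linux-2.srv.releng.use1.mozilla.com",
    "signing-linux-7.srv.releng.usw2.mozilla.com",
]
for key, count in sorted(tally(sample).items()):
    print(key, count)
```

If every (pool, region) bucket shows failures, as in this sample, the region-specific theory is unlikely to explain the hangs.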
Comment 15 (Assignee) • 6 years ago
(In reply to Aki Sasaki [:aki] from comment #14)
> (In reply to Aki Sasaki [:aki] from comment #13)
> > I'm not sure why my one-off tests worked, but we need to resolve this before
> > we can re-enable. (I'm thinking about going through all the failures to try
> > to see a pattern -- maybe only one of the two AWS regions was failing? or
> > something?)
>
> I see both use1 and usw2, both depsigning and prod signing in the failures.
> I do see some token requests in signing9's depsigning log, as well as some
> signing requests, so some requests got through. Not seeing a pattern yet.

It looks like at least one job succeeded by a) getting its token from a non-mdc1 signing server, and then b) successfully getting a signature from an mdc1 server. The problem is directly related to the token request hanging.
Comment 16 (Assignee) • 6 years ago
Wondering if we need an mdc1 -> us{e1,w2} VPC?
Comment 17 (Assignee) • 6 years ago
On depsigning-worker15.srv.releng.use1.mozilla.com,
- `sudo mtr signing7.srv.releng.mdc1.mozilla.com` gives:

Keys: Help Display mode Restart statistics Order of fields quit
                                          Packets          Pings
 Host                                    Loss%  Snt  Last   Avg  Best  Wrst StDev
 1. 169.254.255.109                       0.0%    5   0.5   1.5   0.5   5.7   2.3
 2. 169.254.255.1                         0.0%    4  13.0   4.6   1.7  13.0   5.6
 3. 169.254.255.2                         0.0%    4  77.6  77.6  77.6  77.7   0.1
 4. signing7.srv.releng.mdc1.mozilla.com  0.0%    4  77.8  78.1  77.8  78.9   0.5

- `nc -vz signing7.srv.releng.mdc1.mozilla.com 9110` hangs
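The same TCP check can be scripted so that a hang is reported as a timeout rather than blocking forever. A minimal sketch using only the standard library; `probe` is a hypothetical helper, and the 5 second default timeout is an arbitrary choice.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout', or 'error'."""
    try:
        # create_connection resolves the host and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all within the timeout: packets are probably being dropped.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        # DNS failure, unreachable network, and similar.
        return "error"
```

Calling `probe("signing7.srv.releng.mdc1.mozilla.com", 9110)` from a worker would be expected to return `"timeout"` in the situation described above. A timeout while mtr still gets ICMP replies usually points at filtering on the path or on the host rather than at a dead machine, though that is an inference and not something this bug confirms.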
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment 20 • 6 years ago
(In reply to Aki Sasaki [:aki] from comment #13)
> I had to disable the mdc1 servers [1] because attempts to connect to request
> a token were hanging.
> My current theory is that the routes are wrong: we may be going from
> use1/usw2 through scl3, which would make the request look like the scl3 exit
> node, which might not be allowlisted. (That's a lot of guesses =\ )
>
> I'm not sure why my one-off tests worked, but we need to resolve this before
> we can re-enable. (I'm thinking about going through all the failures to try
> to see a pattern -- maybe only one of the two AWS regions was failing? or
> something?)
> Dave, do you have a better idea what's going on, or know who would? I'm
> going to keep poking at this, but I'd welcome another pair of eyes here.
>
> [1]
> https://github.com/mozilla-releng/build-puppet/commit/397edbc9f6a88ab6244f24aa213aaa8385690bb7

Aki, I'm sorry I didn't check into this earlier. I missed the NI. I'll check on the status for bug 1478868 to help out a little.
Flags: needinfo?(dhouse)
Comment 21 (Assignee) • 6 years ago
Dave, I think we're good now, thanks :) Sorry, I should have cleared the ni earlier.
Comment 22 (Assignee) • 6 years ago
mdc1 servers are live. I'm going to do a little more testing with mdc2 and then bring those live, skipping mac-v2-signing13 until bug 1477139 is resolved.
Comment 23 • 6 years ago
I've requested QTS reimage mac-depsigning6.srv.releng.mdc2.mozilla.com through REQ0240274 (using recovery console, and doing a bless and reboot: `/usr/sbin/bless --netboot --server bsdp://10.51.56.16; reboot`).
Comment 24 • 6 years ago
(In reply to Dave House [:dhouse] from comment #23)
> I've requested QTS reimage mac-depsigning6.srv.releng.mdc2.mozilla.com
> through REQ0240274 (using recovery console, and doing a bless and reboot:
> `/usr/sbin/bless --netboot --server bsdp://10.51.56.16; reboot`).

QTS closed my request as completed. I'll follow up in bug 1480512.
Comment 25 (Assignee) • 6 years ago
Currently:
- linux signing servers in mdc1 are live
- mdc2 is not live
- no mac mdc? servers are live
- there's a signingscript bug that makes token requests more fragile than they need to be (we don't catch network issues in requesting a token, so if we choose a broken server we'll fail out rather than try another server). I have a fix in the autograph PR, but we can and should land that fix earlier.

See bug 1480512 for the mdc macs that still need help (9+13).

Ben figured out how to fix the mac signing servers without vnc! I've spun up mac{8,10,11,12}, with testing of all 3 ports. So they're up, but nothing's pointing at them yet.

To do:
- I will update the docs.
- I'll submit a PR to catch the token request network exceptions.
- Depending on how long 9 and 13 take to fix, we can either bring up the 4 mac signing servers listed above, or all of them with the signingscript token fix, with the knowledge that we'll retry whenever we hit a disabled server.
- I should also test the linux mdc2 signing servers, and bring those up. I had dep signing working in prod on a single depsigning worker, but spot testing the nightly and release ports may be wise with all the bustage we've seen rolling these out.

Once the above is done and the prod signing servers in mdc{1,2} are up, we can resolve this bug.
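The token fix described above amounts to catching the network error and moving on to another server. A synchronous sketch of that idea, not the actual signingscript change (signingscript is async and uses aiohttp); `get_token_with_fallback`, `TokenRequestError`, and `fake_request` are hypothetical names, and the random ordering is an assumption.

```python
import random


class TokenRequestError(Exception):
    """Raised when no signing server returned a token."""


def get_token_with_fallback(servers, request_token, rng=random):
    """Try servers in random order until one returns a token.

    request_token(server) is a caller-supplied callable (hypothetical
    here) that returns a token string or raises OSError on network
    trouble such as a hung or refused connection.
    """
    candidates = list(servers)
    rng.shuffle(candidates)
    errors = []
    for server in candidates:
        try:
            return server, request_token(server)
        except OSError as exc:
            # Remember the failure and try the next server instead of
            # failing the whole task.
            errors.append((server, exc))
    raise TokenRequestError(f"all servers failed: {errors!r}")


def fake_request(server):
    # Stand-in for the real HTTP call: pretend mdc1 hosts time out.
    if ".mdc1." in server:
        raise TimeoutError(f"connect to {server} timed out")
    return "token-from-" + server


servers = [
    "signing7.srv.releng.mdc1.mozilla.com",
    "signing4.srv.releng.scl3.mozilla.com",
]
server, token = get_token_with_fallback(servers, fake_request)
print(server, token)
```

With this shape, a disabled or unreachable server costs one failed attempt instead of failing the task, which is what makes it safe to bring servers up before every one of them is healthy.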
Comment 26 (Assignee) • 6 years ago
Documented at https://mana.mozilla.org/wiki/display/RelEng/Signing#Signing-Copyingtherelease/nightlykeytoanewmacsigningserver .
Comment 28 (Assignee) • 6 years ago
https://github.com/mozilla-releng/build-puppet/pull/159
I think we're done here.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED