Closed
Bug 1039977
(mac-v2-signing1)
Opened 11 years ago
Closed 10 years ago
mac-v2-signing1 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Unassigned)
Attachments
(1 file)
3.38 KB, patch | bhearsum: review+ | nthomas: checked-in+
Spontaneously combusted:
Wed 23:25:33 PDT [4961] mac-signing1.srv.releng.scl3.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
Reporter
Comment 1•11 years ago
Rebooted via PDU. No indication of what went wrong in signing logs, /var/log/system.log or kernel.log. Restarted signing servers.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Reporter
Comment 2•10 years ago
Locked up again, nagios started with
Sun 01:41:43 PDT [4407] mac-signing1.srv.releng.scl3.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
arr rebooted via pdu, bug 1041305.
Ben, can we send this off for diagnostics as-is ? At the moment, the slaves figure out immediately that it's not responsive, presumably OS X refuses the connection because none of the signing scripts are running.
We probably need to remove mac-signing1 from puppet/modules/buildmaster/templates/passwords.py.erb if it's going to go offline, because I don't see a timeout set in tools/lib/python/signing/client.py (and the default in the socket library is no timeout).
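To illustrate the timeout concern, here is a minimal Python sketch. It assumes a plain TCP client; `connect_with_timeout` and the 30-second value are hypothetical and are not taken from tools/lib/python/signing/client.py. The socket module's default timeout is `None`, so unless one is set explicitly, a read from a hung peer can block indefinitely.

```python
import socket

# Hypothetical default; a real client may want a different value.
DEFAULT_TIMEOUT_SECONDS = 30.0


def connect_with_timeout(host, port, timeout=DEFAULT_TIMEOUT_SECONDS):
    """Open a TCP connection whose connect and later reads/writes time out."""
    # create_connection applies the timeout to the connect attempt and leaves
    # it set on the returned socket, so later recv/send calls are bounded too.
    return socket.create_connection((host, port), timeout=timeout)
```

With a timeout in place, a dead server surfaces as a `socket.timeout` exception that the caller can catch and use to move on to another server.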
Status: RESOLVED → REOPENED
Flags: needinfo?(bhearsum)
Resolution: FIXED → ---
Comment 3•10 years ago
(In reply to Nick Thomas [:nthomas] from comment #2)
> Locked up again, nagios started with
> Sun 01:41:43 PDT [4407] mac-signing1.srv.releng.scl3.mozilla.com is DOWN
> :PING CRITICAL - Packet loss = 100%
>
> arr rebooted via pdu, bug 1041305.
>
> Ben, can we send this off for diagnostics as-is ?
I'd be more comfortable if we could wipe the disk first. Everything secret is passphrase protected, but it's still better to make sure people can't get their hands on the files.
> We probably need to remove mac-signing1 from
> puppet/modules/buildmaster/templates/passwords.py.erb if it's going to go
> offline, because I don't see a timeout set in
> tools/lib/python/signing/client.py (and the default in the socket library is
> no timeout).
Yeah, agreed.
Flags: needinfo?(bhearsum)
Reporter
Comment 4•10 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #3)
> I'd be more comfortable if we could wipe the disk first. Everything secret
> is passphrase protected, but it's still better to make sure people can't get
> their hands on the files.
We could do some srm on selected key chains and certs instead of a full wipe, but I'm not sure where they all are placed.
Reporter
Comment 5•10 years ago
Reporter
Updated•10 years ago
Attachment #8460128 -
Flags: review?(bhearsum)
Comment 6•10 years ago
Comment on attachment 8460128 [details] [diff] [review]
[puppet] Remove mac-signing1 while testing
Review of attachment 8460128 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks!
Attachment #8460128 -
Flags: review?(bhearsum) → review+
Reporter
Comment 7•10 years ago
Comment on attachment 8460128 [details] [diff] [review]
[puppet] Remove mac-signing1 while testing
Landed in puppet, will need a reconfig too:
https://hg.mozilla.org/build/puppet/rev/fd51ccdb93e8
https://hg.mozilla.org/build/puppet/rev/38d60c42ecc1
Attachment #8460128 -
Flags: checked-in+
Reporter
Comment 8•10 years ago
Cleaned up secrets.
Comment 9•10 years ago
Back online after a RAM replacement and a new name (bug 1049546).
Alias: mac-signing1 → mac-v2-signing1
Summary: mac-signing1 problem tracking → mac-v2-signing1 problem tracking
Updated•10 years ago
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Comment 10•10 years ago
The dep-key signing server went down at 2014-09-03 06:28:07, for an (as yet) unknown reason:
pmoore@Elisandra:~ $ ssh root@mac-v2-signing1.srv.releng.scl3.mozilla.com tail -10 /builds/signing/dep-key-signing-server/signing.log
2014-09-03 06:26:12,399 - DEBUG - Cleaning up...
2014-09-03 06:26:12,638 - INFO - Deleting /builds/signing/dep-key-signing-server/unsigned-files/0d92f62009f95636e4dccef61a728a2ff979f424 (too old)
2014-09-03 06:26:12,723 - INFO - Deleting /builds/signing/dep-key-signing-server/unsigned-files/0d92f62009f95636e4dccef61a728a2ff979f424.fn (too old)
2014-09-03 06:26:12,728 - INFO - Deleting /builds/signing/dep-key-signing-server/unsigned-files/cf00cf13318ed9dd0bd5a83121b118986e25c46f (too old)
2014-09-03 06:26:12,734 - INFO - Deleting /builds/signing/dep-key-signing-server/unsigned-files/cf00cf13318ed9dd0bd5a83121b118986e25c46f.fn (too old)
2014-09-03 06:26:12,855 - INFO - Deleting /builds/signing/dep-key-signing-server/signed-files/gpg/0c29eacf41ad6bc3172aac1412bc5fe09cd6aa60.out with no unsigned file
2014-09-03 06:26:12,864 - INFO - Deleting /builds/signing/dep-key-signing-server/signed-files/gpg/0d92f62009f95636e4dccef61a728a2ff979f424 with no unsigned file
2014-09-03 06:26:12,868 - INFO - Deleting /builds/signing/dep-key-signing-server/signed-files/gpg/cf00cf13318ed9dd0bd5a83121b118986e25c46f with no unsigned file
2014-09-03 06:28:07,382 - INFO - pid 99383 exiting normally
2014-09-03 06:28:07,684 - INFO - exiting
pmoore@Elisandra:~ $
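A small Python sketch for locating the shutdown in a log like the one above. It assumes every line follows the `timestamp - LEVEL - message` layout seen in the excerpt; `find_shutdown` is a hypothetical helper, not part of the signing tools.

```python
import re
from datetime import datetime

# Matches lines such as
# "2014-09-03 06:28:07,382 - INFO - pid 99383 exiting normally".
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} - (\w+) - (.*)$"
)


def find_shutdown(lines):
    """Return (timestamp, message) for the first 'exiting' message, or None."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "exiting" in match.group(3):
            # Milliseconds are dropped; second resolution is enough here.
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            return stamp, match.group(3)
    return None
```

The returned timestamp can then be compared against nagios alerts or other logs from the same window.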
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 11•10 years ago
I can start it up, but first I'd like to check if I can find any reason this might have intentionally been brought down...
Comment 12•10 years ago
top -o cpu
doesn't show any obvious cpu eaters etc - so I'm going to restart it now
Comment 13•10 years ago
Started back up. However, when restarting, I was expecting to be prompted for passphrases for:
* gpg
* signcode
* mar
* jar
* b2gmar
* dmg
However, I was only prompted for passphrases for:
* gpg
* dmg
* mar
In other words, the following types of signing are probably not available on this dep-key signing server instance on this server (since I was not prompted for their passphrases):
* signcode
* jar
* b2gmar
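The same comparison can be written as a Python set difference. The two sets are copied from the lists above; the variable names are only illustrative.

```python
# Formats a passphrase prompt was expected for, versus those actually prompted.
expected = {"gpg", "signcode", "mar", "jar", "b2gmar", "dmg"}
prompted = {"gpg", "dmg", "mar"}

# Formats with no prompt, hence probably not enabled on this instance.
missing = sorted(expected - prompted)
print(missing)  # ['b2gmar', 'jar', 'signcode']
```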
This is in contrast to bug 1062302 which says gpg and mar should be disabled on mac v2 signing servers (whereas they seem to comprise 2/3rds of the available services for dep-key signing).
Ben, what are your thoughts?
Pete
Flags: needinfo?(bhearsum)
Comment 14•10 years ago
(In reply to Pete Moore [:pete][:pmoore] from comment #13)
> Started back up. However, when restarting, I was expecting to be prompted
> for passphrases for:
>
> * gpg
> * signcode
> * mar
> * jar
> * b2gmar
> * dmg
>
> However, I was only prompted for passphrases for:
>
> * gpg
> * dmg
> * mar
Mac signing machines have only ever done gpg, dmg, and mar signing. It's not surprising at all that you were prompted for only these three. You may have been confused by the list of passphrases in the private repo, but that's simply a list of all possible ones - it doesn't imply that they're all enabled on all signing servers.
> This is in contrast to bug 1062302 which says gpg and mar should be disabled
> on mac v2 signing servers (whereas they seem to comprise 2/3rds of the
> available services for dep-key signing).
This part is confusing. We changed Buildbot to stop looking at mac-v2 signing servers for gpg and mar signing, because they were dying under the load. We didn't change the formats that were enabled for the servers themselves because of time constraints. It's likely that we *will* disable those formats, but more investigation is needed first.
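The distinction between formats a server has enabled and formats clients are pointed at can be sketched in Python. Everything here is hypothetical: the server names, the two dictionaries, and both helpers are made up for illustration and do not reflect the real Buildbot or puppet configuration.

```python
# Hypothetical data: formats each server has enabled (server side).
ENABLED = {
    "linux-signing-example": {"gpg", "mar", "signcode", "jar"},
    "mac-v2-signing-example": {"gpg", "mar", "dmg"},
}

# Hypothetical data: formats clients are told to request from each server.
ROUTED = {
    "linux-signing-example": {"gpg", "mar", "signcode", "jar"},
    "mac-v2-signing-example": {"dmg"},
}


def servers_for(fmt, routing):
    """Return the sorted server names that clients would use for one format."""
    return sorted(name for name, formats in routing.items() if fmt in formats)


def enabled_but_unrouted(enabled, routing):
    """Map each server to the formats it still accepts but is no longer sent."""
    return {
        name: sorted(enabled[name] - routing.get(name, set()))
        for name in enabled
    }
```

In this toy setup the mac-v2 entry still has gpg and mar enabled but receives no such requests, which mirrors the state described above.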
I suspect what happened here is that after a very long time, my request for a stop of that instance (made about 24h ago) finally went through. The machine was so heavily loaded at the time that I wasn't sure if python even launched to try to shut it down, and I didn't think to check later. Sorry =(
Flags: needinfo?(bhearsum)
Comment 15•10 years ago
This is lingering in the buildduty queue. It's unclear to me what is left to do, if anything. Can one of you summarize or move/close, please?
Flags: needinfo?(pmoore)
Flags: needinfo?(bhearsum)
Comment 16•10 years ago
I'll look at this soon.
Assignee: nobody → bhearsum
Flags: needinfo?(pmoore)
Flags: needinfo?(bhearsum)
Comment 17•10 years ago
There's nothing left to do here - the critical issue is fixed. We should disable gpg and mar signing on all of the mac signing servers, but that's not a critical issue. I filed bug 1065871 for that.
Assignee: bhearsum → nobody
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard