Closed
Bug 1424241
Opened 7 years ago
Closed 7 years ago
Chain of Trust verification error during Nightly builds
Categories
(Release Engineering :: Release Automation: Other, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: NarcisB, Unassigned)
Details
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=a08e1277507b4a72049c758b09a23f51ddc51a14
https://public-artifacts.taskcluster.net/cCVFLBH4QqaEb9WkkA6S9Q/0/public/logs/chain_of_trust.log
2017-12-08T11:35:02 CRITICAL - Chain of Trust verification error!
Traceback (most recent call last):
  File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 805, in verify_cot_signatures
    verify_sig=chain.context.config['verify_cot_signature']
  File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/gpg.py", line 546, in get_body
    verify_signature(gpg, signed_data)
  File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/gpg.py", line 520, in verify_signature
    raise ScriptWorkerGPGException("Signature could not be verified!")
scriptworker.exceptions.ScriptWorkerGPGException: Signature could not be verified!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 1485, in verify_chain_of_trust
    verify_cot_signatures(chain)
  File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 808, in verify_cot_signatures
    raise CoTError("GPG Error verifying chain of trust for {}: {}!".format(path, str(exc)))
scriptworker.exceptions.CoTError: 'GPG Error verifying chain of trust for /builds/scriptworker/work/cot/Wt0rDWWCSiy-rfS2FOkW_w/public/chainOfTrust.json.asc: Signature could not be verified!!'
Comment 1•7 years ago
This is happening in staging releases as well. We're debugging to find the problem.
Component: Buildduty → Release Automation
Priority: -- → P1
Comment 2•7 years ago
The weirdest thing is that I can actually verify CoT against both tasks locally, and they both seem fine. Now I'm confused.
verify_cot --task-type signing A4Qnt3RFTp64h3yETi5Pzw
verify_cot --task-type build EJnPf70mQy2Fe7ZP_d_47g
There's something I'm missing for sure here.
Comment 3•7 years ago
Comment 4•7 years ago
Something is not right. The CoT verification works on my local machine. I'm rerunning the upstream task, EJnPf70mQy2Fe7ZP_d_47g (e.g. for nightly l10n macosx), to see if it gets picked up by a different instance. Maybe I'm missing something obvious.
Comment 5•7 years ago
verify_cot doesn't check sigs. I'm currently guessing we have a bad workerType somewhere.
Comment 6•7 years ago
09:20 <&jonasfj> aki: that's my fault
09:20 <&jonasfj> aki: gps and I rolled a new AMI on gecko-3-b-linux
Comment 7•7 years ago
09:28 <aki> once we have the key, we need to make sure that the level 3 build, decision, and docker-image workerTypes have valid cot gpg keys. the other workerTypes don't
09:28 <&jonasfj> hmm..
09:29 <&jonasfj> I'm rolling back and will ask wcosta to update docs on this next week..
09:29 <aki> in the meantime, if we can revert to the previous ami, that would help stop the burning
09:29 <aki> thank you!
Comment 8•7 years ago
09:33 <&jonasfj> ah,
09:33 <&jonasfj> I figured it out...
09:33 <&jonasfj> we updated to untrusted AMIs instead of the trusted ones...
09:33 <&jonasfj> problem fixed...
09:33 <&jonasfj> aki: should we kill the other old workers?
09:33 <aki> jonasfj: please
09:34 <aki> just the ones on the bad ami
09:35 <gps> this all needs to be turn key. if you need to type in a secret as part of a deploy for anything other than unlocking your secrets vault, you are doing it wrong
09:36 <aki> it's a goal to avoid having the private key in human hands, so i think you're saying the same thing
09:36 <aki> somehow guaranteeing the important workerTypes get the trusted ami would be a fine check to add
09:37 <gps> this may have been a one-off - since we (really me) updated the gecko-3-b-linux worker definitions manually
09:37 <gps> it is possible that if the automated deployment mechanism were used, it would have used the trusted AMIs
09:37 <aki> yeah, and i think garndt had the most working knowledge of trusted docker-worker ami deployment
09:37 <&jonasfj> gps: this was our fault, had we used the upgrade workertypes script this would all have worked out
09:38 <aki> aha
Then some discussion followed about moving to certs rather than GPG keys, which is something we'll have to address in 2018.
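The check aki suggests at 09:36 (making sure the important workerTypes get the trusted AMI) could look roughly like the sketch below. This is only an illustration under stated assumptions: `find_untrusted`, its parameters, and the AMI ids are hypothetical and are not part of scriptworker or Taskcluster; only the workerType name `gecko-3-b-linux` comes from this bug.

```python
from typing import Dict, List, Set


def find_untrusted(
    worker_amis: Dict[str, List[str]],
    trusted_amis: Set[str],
    cot_worker_types: Set[str],
) -> Dict[str, List[str]]:
    """Return {workerType: [ami, ...]} for CoT workerTypes using AMIs outside the trusted set.

    All three inputs are hypothetical: how the real deployment stores
    workerType definitions and the list of trusted AMIs is not described here.
    """
    problems: Dict[str, List[str]] = {}
    for worker_type in sorted(cot_worker_types):
        # A workerType with no listed AMIs simply yields no findings.
        bad = [ami for ami in worker_amis.get(worker_type, []) if ami not in trusted_amis]
        if bad:
            problems[worker_type] = bad
    return problems


if __name__ == "__main__":
    # Placeholder AMI ids, invented for the example.
    definitions = {
        "gecko-3-b-linux": ["ami-trusted-1", "ami-untrusted-9"],
        "gecko-1-b-linux": ["ami-untrusted-9"],  # not a CoT workerType, so ignored
    }
    for worker_type, amis in find_untrusted(
        definitions, {"ami-trusted-1"}, {"gecko-3-b-linux"}
    ).items():
        print("{} uses untrusted AMIs: {}".format(worker_type, ", ".join(amis)))
```

Running something like this before a workerType update is applied would flag a mix-up like the one in this bug before any task fails signature verification.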
Comment 9•7 years ago
Taskcluster AMI issue: gecko-3-b-linux was updated to untrusted AMIs instead of the trusted ones. Should be fixed.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED