Closed Bug 1424241 Opened 7 years ago Closed 7 years ago

Chain of Trust verification error during Nightly builds

Categories

(Release Engineering :: Release Automation: Other, defect, P1)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: NarcisB, Unassigned)

Details

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=a08e1277507b4a72049c758b09a23f51ddc51a14
https://public-artifacts.taskcluster.net/cCVFLBH4QqaEb9WkkA6S9Q/0/public/logs/chain_of_trust.log

2017-12-08T11:35:02 CRITICAL - Chain of Trust verification error!
Traceback (most recent call last):
  File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 805, in verify_cot_signatures
    verify_sig=chain.context.config['verify_cot_signature']
  File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/gpg.py", line 546, in get_body
    verify_signature(gpg, signed_data)
  File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/gpg.py", line 520, in verify_signature
    raise ScriptWorkerGPGException("Signature could not be verified!")
scriptworker.exceptions.ScriptWorkerGPGException: Signature could not be verified!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 1485, in verify_chain_of_trust
    verify_cot_signatures(chain)
  File "/builds/scriptworker/lib/python3.5/site-packages/scriptworker/cot/verify.py", line 808, in verify_cot_signatures
    raise CoTError("GPG Error verifying chain of trust for {}: {}!".format(path, str(exc)))
scriptworker.exceptions.CoTError: 'GPG Error verifying chain of trust for /builds/scriptworker/work/cot/Wt0rDWWCSiy-rfS2FOkW_w/public/chainOfTrust.json.asc: Signature could not be verified!!'
This is happening in staging releases as well. We're debugging to find the problem.
Component: Buildduty → Release Automation
Priority: -- → P1
Weirdest thing is I can actually verify cot against them locally, and they both seem fine. Now I'm confused.

verify_cot --task-type signing A4Qnt3RFTp64h3yETi5Pzw
verify_cot --task-type build EJnPf70mQy2Fe7ZP_d_47g

There's something I'm missing for sure here.
Something is not right. The cot verification works on my local machine. I'm rerunning the upstream task, EJnPf70mQy2Fe7ZP_d_47g (e.g. for nightly l10n macosx), to see if it gets picked up by a different instance. Maybe I'm missing something obvious.
verify_cot doesn't check sigs. I'm currently guessing we have a bad workerType somewhere.
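To illustrate why the local runs passed while production failed, here is a minimal sketch. It assumes the local tool runs with the `verify_cot_signature` setting (seen in the traceback) turned off. The classes and functions below are simplified stand-ins, not scriptworker's real code, and the key names are hypothetical.

```python
class SignatureError(Exception):
    """Hypothetical stand-in for scriptworker's ScriptWorkerGPGException."""


def verify_signature(signed, trusted_keys):
    # Reject any artifact signed by a key we do not trust.
    if signed["key_id"] not in trusted_keys:
        raise SignatureError("Signature could not be verified!")


def get_body(signed, trusted_keys, verify_sig=True):
    # With verify_sig=False the payload is returned without checking the signer.
    if verify_sig:
        verify_signature(signed, trusted_keys)
    return signed["body"]


# Hypothetical artifact produced by a worker on an AMI without a valid cot key.
artifact = {"key_id": "untrusted-ami-key", "body": '{"taskId": "example"}'}
trusted = {"trusted-ami-key"}

# Local-style run: signature checking off, so the body is accepted.
local_body = get_body(artifact, trusted, verify_sig=False)

# Production-style run: signature checking on, so verification fails.
try:
    get_body(artifact, trusted, verify_sig=True)
    production_ok = True
except SignatureError:
    production_ok = False
```

With checking off, a bad signing key goes unnoticed, which would be consistent with both tasks looking fine locally.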
09:20 <&jonasfj> aki: that's my fault
09:20 <&jonasfj> aki: gps and I rolled a new AMI on gecko-3-b-linux
09:28 <aki> once we have the key, we need to make sure that the level 3 build, decision, and docker-image workerTypes have valid cot gpg keys. the other workerTypes don't
09:28 <&jonasfj> hmm..
09:29 <&jonasfj> I'm rolling back and will ask wcosta to update docs on this next week..
09:29 <aki> in the meantime, if we can revert to the previous ami, that would help stop the burning
09:29 <aki> thank you!
09:33 <&jonasfj> ah,
09:33 <&jonasfj> I figured it out...
09:33 <&jonasfj> we updated to untrusted AMIs instead of the trusted ones...
09:33 <&jonasfj> problem fixed...
09:33 → Aryx joined (Archaeopter@moz-7q7fi0.sbh3.pu07.2450.2a02.IP)
09:33 <&jonasfj> aki: should we kill the other old workers?
09:33 <aki> jonasfj: please
09:34 <aki> just the ones on the bad ami
09:35 <gps> this all needs to be turn key. if you need to type in a secret as part of a deploy for anything other than unlocking your secrets vault, you are doing it wrong
09:36 <aki> it's a goal to avoid having the private key in human hands, so i think you're saying the same thing
09:36 <aki> somehow guaranteeing the important workerTypes get the trusted ami would be a fine check to add
09:37 <gps> this may have been a one-off - since we (really me) updated the gecko-3-b-linux worker definitions manually
09:37 cristian_brindusan|sheriffduty → cristian_brindusan
09:37 <gps> it is possible that if the automated deployment mechanism were used, it would have used the trusted AMIs
09:37 <aki> yeah, and i think garndt had the most working knowledge of trusted docker-worker ami deployment
09:37 <&jonasfj> gps: this was our fault, had we used the upgrade workertypes script this would all have worked out
09:38 <aki> aha

then some discussion about going to certs rather than gpg keys, which is something we'll have to address in 2018.
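The check aki suggests (making sure the important workerTypes get a trusted AMI) could look roughly like the sketch below. It is only an illustration: apart from gecko-3-b-linux, the workerType names are hypothetical, the AMI IDs are made up, and the input mappings would have to come from whatever the deployment tooling actually exposes.

```python
def find_untrusted_deployments(worker_type_amis, cot_worker_types, trusted_amis):
    """Return (workerType, ami) pairs where a cot-signing workerType
    is missing or is configured with an AMI outside the trusted set."""
    problems = []
    for worker_type in sorted(cot_worker_types):
        ami = worker_type_amis.get(worker_type)  # None if not deployed at all
        if ami not in trusted_amis:
            problems.append((worker_type, ami))
    return problems


# Hypothetical inputs: every name and ID except gecko-3-b-linux is invented.
cot_worker_types = {"gecko-3-b-linux", "gecko-3-decision", "gecko-3-images"}
trusted_amis = {"ami-trusted-1", "ami-trusted-2"}
worker_type_amis = {
    "gecko-3-b-linux": "ami-untrusted-9",  # the kind of mistake seen in this bug
    "gecko-3-decision": "ami-trusted-1",
    "gecko-3-images": "ami-trusted-2",
    "gecko-1-b-linux": "ami-untrusted-9",  # not checked: no cot key needed
}

problems = find_untrusted_deployments(worker_type_amis, cot_worker_types, trusted_amis)
for worker_type, ami in problems:
    print("workerType {} uses non-trusted AMI {}".format(worker_type, ami))
```

Running something like this before a manual update goes live would flag the bad workerType without anyone handling the private key.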
TC AMI issue. Should be fixed.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED