Closed
Bug 1436712
Opened 7 years ago
Closed 7 years ago
Linux and OS X nightlies failed with error: Chain of Trust verification error!
Categories
(Release Engineering :: Release Automation: Other, defect)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 1420482
People
(Reporter: aryx, Assigned: mozilla)
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=0ac953fcddf10132eaecdb753d72b2ba5a43c32a&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Log: https://tools.taskcluster.net/groups/HMQt9pRvSXOfagdouqHNqg/tasks/Sz26vcTyQkqH-FmxZYtEnw/runs/0/logs/public%2Flogs%2Fchain_of_trust.log
signing:build:docker-image G-nqPXUNR2K--jN0UDp4Ng workerType differs!
graph: "gecko-t-linux-xlarge"
task: "gecko-3-images"
signing:build:docker-image G-nqPXUNR2K--jN0UDp4Ng routes differs!
graph: [
"tc-treeherder.v2.autoland.2ebcb42faee25c8eb2dd34d01c062bebcbc7b1ea.59266",
"coalesce.v1.autoland.79de2402eaedaded334a"
]
task: [
"index.gecko.cache.level-3.docker-images.v1.debian7-amd64-build.hash.24b13257acc45f20f04f8ad54180c53623000735232b2d7a81f914aca9c0f8fc",
"index.gecko.cache.level-3.docker-images.v1.debian7-amd64-build.latest",
"index.gecko.cache.level-3.docker-images.v1.debian7-amd64-build.pushdate.2018.02.07.20180207214729",
"tc-treeherder.v2.autoland.2ebcb42faee25c8eb2dd34d01c062bebcbc7b1ea.59266"
]
signing:build:docker-image G-nqPXUNR2K--jN0UDp4Ng scopes differs!
graph: [
"secrets:get:project/taskcluster/gecko/hgfingerprint",
"docker-worker:feature:allowPtrace",
"docker-worker:cache:level-3-autoland-test-workspace-v2-bc7e1a7ad01a345394f1",
"docker-worker:cache:level-3-checkouts-v2-bc7e1a7ad01a345394f1"
]
task: [
"secrets:get:project/taskcluster/gecko/hgfingerprint",
"docker-worker:cache:level-3-imagebuilder-sparse-bc7e1a7ad01a345394f1-v2-bc7e1a7ad01a345394f1"
]
signing:build:docker-image G-nqPXUNR2K--jN0UDp4Ng tags differs!
graph: {
"createdForUser": "mbanner@mozilla.com",
"kind": "test",
"label": "test-linux32-stylo-disabled/debug-web-platform-tests-e10s-12",
"os": "linux",
"worker-implementation": "docker-worker"
}
task: {
"createdForUser": "mbanner@mozilla.com",
"kind": "docker-image",
"label": "build-docker-image-debian7-amd64-build"
}
2018-02-08T11:48:27 CRITICAL - Can't find task signing:build:docker-image G-nqPXUNR2K--jN0UDp4Ng in signing:build:docker-image:parent TkH1qBmxTvaFAgZqDal4Pw task-graph.json!
2018-02-08T11:48:27 CRITICAL - Chain of Trust verification error!
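The "differs!" entries in the log above read like a field-by-field comparison between the docker-image task's runtime definition and the entry stored under the same taskId in the decision task's task-graph.json. The following is a minimal sketch of that kind of comparison, not the actual verifier code. The function name, the list of compared fields, and the assumed graph layout (taskId mapped to an object with a "task" key) are illustrative assumptions.

```python
import json

# Assumption: only the fields reported in the log above are compared here.
# The real verifier may compare a different set.
COMPARED_FIELDS = ("workerType", "routes", "scopes", "tags")


def diff_task_against_graph(task_id, task_defn, graph):
    """Return a list of mismatch messages for task_id.

    task_defn is the task's runtime definition. graph is assumed to map
    taskId -> {"task": {...definition...}}, loosely modelled on
    task-graph.json (hypothetical layout).
    """
    entry = graph.get(task_id)
    if entry is None:
        return [f"Can't find task {task_id} in task graph"]
    graph_defn = entry["task"]
    errors = []
    for field in COMPARED_FIELDS:
        graph_value = graph_defn.get(field)
        task_value = task_defn.get(field)
        if graph_value != task_value:
            errors.append(
                f"{task_id} {field} differs!\n"
                f" graph: {json.dumps(graph_value, sort_keys=True)}\n"
                f" task: {json.dumps(task_value, sort_keys=True)}"
            )
    return errors


# Example with values taken from the log above.
example_graph = {
    "G-nqPXUNR2K--jN0UDp4Ng": {
        "task": {"workerType": "gecko-t-linux-xlarge", "tags": {"kind": "test"}}
    }
}
example_task = {"workerType": "gecko-3-images", "tags": {"kind": "docker-image"}}

for message in diff_task_against_graph(
    "G-nqPXUNR2K--jN0UDp4Ng", example_task, example_graph
):
    print(message)
```

Any non-empty result would be treated as a verification failure in this sketch, which matches the way the log ends in a CRITICAL error after listing the differences.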
Updated•7 years ago
Component: General → Release Automation
Product: Taskcluster → Release Engineering
QA Contact: catlee
Comment 1•7 years ago
Looks like these pushes are in the regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=0ac953fcddf10132eaecdb753d72b2ba5a43c32a
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=06b5d7476ebd6dd611f1d22c15f3be2d812fa51b
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=8cc2427a322caa1e2c09ca3957335f88e573dc7a
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=fc3d9de1f56c35341db3d903dbbd0be5931ca343
Comment 2•7 years ago
I looked through the taskcluster/ changes in the regression range and there's not much.
https://bugzilla.mozilla.org/show_bug.cgi?id=1436283 added a new package to taskcluster/ci/docker-image/kind.yml, and https://bugzilla.mozilla.org/show_bug.cgi?id=1398799 made some changes to a release-only docker image. I don't see how either could cause CoT errors, but I only have limited understanding of this.
cc'ing glandium for the first bug, and I'll keep looking into the second bug (which is mine).
Comment 3•7 years ago
The fact that Windows is not broken makes me wonder if something on the workers has changed. I think both of these platforms build on similar Linux images, whereas Windows builds on Windows?
Comment 4•7 years ago
15:03 < aki-away> guessing https://hg.mozilla.org/mozilla-central/rev/7eb47d8f194d and the next commit triggered a rebuild
15:04 < aki-away> we probably need to back that out until we kill cotv1
Assignee | ||
Comment 5•7 years ago
07:10 <aki-away> sure. the short explanation is docker-image is one of the two highly sensitive docker-worker task types, along with the decision task, so we have extra checks around both. the above changes break the assumptions in those checks
Assignee | ||
Comment 6•7 years ago
If our current theory (autoland retriggers broke docker-image cot verifiability) holds true, bug 1420482 would help us prevent old-style retriggers.
Depends on: 1420482
Assignee | ||
Comment 7•7 years ago
09:26 <aki> i *think* the retriggers fixed the docker-image bustage
09:27 <aki> `verify_cot --task-type build --cleanup -- Z1QpdX6IQym4fLWmubDSkA` breaks, for https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=161110628
09:27 <aki> `verify_cot --task-type build --cleanup Cv31u_kTQAaBsEXpVjm16w` works for https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=161121336
I essentially reran the gecko-3-images tasks listed in TkH1qBmxTvaFAgZqDal4Pw's label-to-taskid.json, which required a few passes to get right. Thanks Tom and Dustin!
Assignee: nobody → aki
Assignee | ||
Comment 8•7 years ago
(In reply to Aki Sasaki [:aki] from comment #6)
> If our current theory (autoland retriggers broke docker-image cot
> verifiability) holds true, bug 1420482 would help us prevent old-style
> retriggers.
I think comment #7 verifies that the retriggers broke cot.
Assignee | ||
Comment 9•7 years ago
10:15 <aki> crap, repackage-signing cot issues on birch after a merge. hoping it's a one-off
10:16 <bhearsum> looks like we have it on maple too, after i merged central
10:16 <bhearsum> mozilla-beta seems fine
10:18 <aki> yeah. cot assumes docker-image tasks have an allowlisted dockerhub image sha. i think something changed docker-image to be built from another docker-image
10:19 <aki> guessing we still need to back something out or make an emergency cot fix
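The assumption aki describes (docker-image tasks must run on an allowlisted Docker Hub image sha) can be illustrated with a small check. This is a hedged sketch only: the allowlist contents, the placeholder digest, and the function name are hypothetical, and the real check may be structured differently. It relies on the docker-worker convention that `payload.image` is either a string or an object that references another task's artifact.

```python
# Hypothetical allowlist. The digest is a placeholder, not a real image sha.
ALLOWED_IMAGE_BUILDER_DIGESTS = frozenset({"sha256:" + "0" * 64})


def image_is_allowlisted(payload_image):
    """Return True only if payload_image is a string pinned to an
    allowlisted digest, e.g. "name@sha256:...".

    An image given as an object (for example, a reference to an artifact
    built by another docker-image task) is rejected. That is the situation
    suspected in the discussion above.
    """
    if not isinstance(payload_image, str):
        return False
    _name, separator, digest = payload_image.partition("@")
    return bool(separator) and digest in ALLOWED_IMAGE_BUILDER_DIGESTS


# A pinned, allowlisted image passes.
print(image_is_allowlisted("example/image_builder@sha256:" + "0" * 64))

# An image built by another task fails under this assumption.
print(
    image_is_allowlisted(
        {"type": "task-image", "taskId": "example", "path": "public/image.tar.zst"}
    )
)
```

Under this model, changing a docker-image task so that it is built from another docker-image would fail verification until the check is updated, which is consistent with aki's "back something out or make an emergency cot fix" comment.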
Assignee | ||
Comment 10•7 years ago
New theory: the reruns in comment #7 mean we have new shas for the docker-images. These finished before the graphs started on birch and maple, but a delay (somewhere?) could have meant that the builds used the old sha and verification used the new sha, breaking verification.
Trying a new push to birch. If the new theory holds up, the next round of m-c nightlies should be green.
Assignee | ||
Comment 11•7 years ago
11:37 <aki> rerunning the repackage-signing task on birch fixed it
That suggests we could be in good shape, but we need to verify with a full nightly graph.
Comment 12•7 years ago
is bug 1436960 related?
Assignee | ||
Updated•7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Assignee | ||
Comment 14•7 years ago
(In reply to Chris AtLee [:catlee] from comment #12)
> is bug 1436960 related?
Yup. Also an old-style retrigger, caused by not fixing bug 1420482.
Reporter | ||
Comment 15•7 years ago
Can you explain why this is a duplicate of bug 1420482, please? Thank you in advance.
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=0ac953fcddf10132eaecdb753d72b2ba5a43c32a&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=success&selectedJob=161078000&filter-searchStr=linux%20opt on Linux 32 opt:
- has one build B which is green (no retriggers)
- nightly signing Ns task run 0 already had the CoT failure: https://tools.taskcluster.net/groups/HMQt9pRvSXOfagdouqHNqg/tasks/QinyJ3j8QYCRQSOJXuLoOg/runs/0/logs/public%2Flogs%2Fchain_of_trust.log
Flags: needinfo?(aki)
Assignee | ||
Comment 16•7 years ago
This bug was caused by Glandium old-style retriggering the docker-image tasks in https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=2ebcb42faee25c8eb2dd34d01c062bebcbc7b1ea&filter-searchStr=docker-image , which broke Chain of Trust verification across all trees with that revision merged in.
The fix was to rerun those tasks via the taskcluster `rerun` command (taskcluster cli or tc-filter.py).
I'm duping against bug 1420482 because, although the short-term fix was to overwrite the index with properly run (read: not old-style retriggered) docker-image tasks, the deeper issue is that the treeherder retrigger defaults to old-style retriggers.
Flags: needinfo?(aki)
Assignee | ||
Comment 17•7 years ago
Also, the old graphs pointing at the broken [old-style retriggered] docker-image tasks had broken task definitions, so we had to launch new nightly graphs to pick up the fix. I waited for the afternoon nightlies.
(In reply to Aki Sasaki [:aki] from comment #16)
> The fix was to rerun those tasks via the taskcluster `rerun` command
> (taskcluster cli or tc-filter.py).
Alternatively, using the new-style custom-action retriggers works here.
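The difference between a rerun and an old-style retrigger, as discussed in comments 16 and 17, can be modelled in a few lines. This is a deliberately simplified sketch. The bug does not spell out the exact mechanics of old-style retriggers, so the assumption here is that a rerun keeps the original taskId while an old-style retrigger copies the definition to a new taskId that the original decision task never recorded. All function names and the data layout are hypothetical.

```python
import copy
import uuid


def rerun(tasks, task_id):
    """Model of a rerun: the same taskId gets an additional run."""
    tasks[task_id]["runs"] += 1
    return task_id


def old_style_retrigger(tasks, task_id):
    """Model of an old-style retrigger (assumed behaviour): the definition
    is copied to a brand-new taskId outside the original task graph."""
    new_id = uuid.uuid4().hex
    tasks[new_id] = copy.deepcopy(tasks[task_id])
    tasks[new_id]["runs"] = 1
    return new_id


def in_decision_graph(graph_task_ids, task_id):
    """Stand-in for the lookup that fails with "Can't find task ... in
    ... task-graph.json" in the log at the top of this bug."""
    return task_id in graph_task_ids


tasks = {"original": {"label": "build-docker-image-debian7-amd64-build", "runs": 1}}
graph_task_ids = {"original"}

rerun_id = rerun(tasks, "original")
retrigger_id = old_style_retrigger(tasks, "original")

print(in_decision_graph(graph_task_ids, rerun_id))
print(in_decision_graph(graph_task_ids, retrigger_id))
```

In this model only the rerun stays verifiable against the decision task's graph, which matches the fix described above: rerunning the existing docker-image tasks rather than retriggering them.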
Comment hidden (Intermittent Failures Robot) |