Closed
Bug 1328719
(cot-v2)
Opened 8 years ago
Closed 7 years ago
[tracking] rebuild the decision task definition for chain of trust verification
Categories
(Taskcluster :: General, defect)
Taskcluster
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mozilla, Assigned: mozilla)
References
Details
Attachments
(10 files)
857 bytes,
patch
|
dustin
:
review+
|
Details | Diff | Splinter Review |
1.63 KB,
patch
|
bstack
:
review+
dustin
:
review+
|
Details | Diff | Splinter Review |
55 bytes,
text/x-github-pull-request
|
jlorenzo
:
review+
|
Details | Review |
61 bytes,
text/plain
|
jlorenzo
:
review+
|
Details |
3.03 KB,
patch
|
mtabara
:
review+
|
Details | Diff | Splinter Review |
2.81 KB,
patch
|
mtabara
:
review+
|
Details | Diff | Splinter Review |
2.88 KB,
patch
|
mtabara
:
review+
|
Details | Diff | Splinter Review |
3.74 KB,
patch
|
jlorenzo
:
review+
|
Details | Diff | Splinter Review |
2.94 KB,
patch
|
nthomas
:
review+
|
Details | Diff | Splinter Review |
1017 bytes,
text/plain
|
Details |
In the chain of trust, the decision task is key. It spawns all of the other tasks in the graph, andtheir task definitions can be verified against the decision task's task-graph.json. However, the decision task itself could be any arbitrary task; it takes a considerable amount of code to try to determine whether the decision task appears legitimate and tied to a revision in-tree. During the chain of trust tech talk, :jonasfj and I think :dustin suggested we rebuild the decision task's task definition from the in-tree info (most likely .taskcluster.yml). If the decision task's task definition matches, and the other tasks match the decision task's task-graph.json, we can definitively say that the graph is generated from that revision in-tree. We could then remove, or mark as optional, a lot of the other checks in chain of trust verification. When I looked into this, I noted that this isn't a quick fix. We will likely need to change how mozilla-taskcluster works for on-push decision task generation, and we definitely need to change how ourhooks-triggered nightlies work. Here's my current list, which may grow. These will turn into blocking bugs. - decision task must be tied to a revision in-tree - pre-decision task for hooks? since the hook doesn't know which revision to use at trigger-time - we need to be able to see that revision at verify-time, either in the chain of trust artifact, orthe task definition. - mozilla-taskcluster seems to hold a lot of information that isn't in .taskcluster.yml; ideally that information moves to .taskcluster.yml - this can be read as, ".taskcluster.yml doesn't have enough info to rebuild the decision task taskdefinition" - we could also clone mozilla-taskcluster and run against .taskcluster.yml at chain of trust verification time, but if mozilla-taskcluster changes independent of mozilla-central, previously good decision task definitions may suddenly become invalid. probably best to have all information in-tree. - the hooks decision task must also be verifiable; it should have a similar or same definition as anon-push decision task - decision task verification must then be able to handle nightly + on-push cases, and most likely more in the future. - we have an allowlist for decision task docker image shas. If we were somehow able to pin that shain-tree, we could get rid of the allowlist. If the docker-image sha were similarly pinned in-tree, we could remove both allowlists. This is likely a fragile part of chain of trust verification, so this would be a big win.
Assignee | ||
Comment 1•8 years ago
|
||
(In reply to Aki Sasaki [:aki] from comment #0) > - we have an allowlist for decision task docker image shas. If we were > somehow able to pin that shain-tree, we could get rid of the allowlist. If > the docker-image sha were similarly pinned in-tree, we could remove both > allowlists. This is likely a fragile part of chain of trust verification, > so this would be a big win. It looks like we have this here: https://hg.mozilla.org/mozilla-central/file/a2741dd43eea/.taskcluster.yml#l82 I'm not sure we do the same thing with the docker-image sha. Adding that, and fixing bug 1326436, would allow me to remove the docker image allowlists for decision and docker-image tasks, once this bug is resolved.
Depends on: 1326436
Comment 2•8 years ago
|
||
> the hooks decision task must also be verifiable; it should have a similar or same definition as anon-push decision task
I'm not sure this is necessary -- if the decision task it produces is legitimate, then it doesn't matter what produced it.
BTW, this is called a "cron task"
Assignee: nobody → aki
Assignee | ||
Comment 3•8 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] from comment #2) > > the hooks decision task must also be verifiable; it should have a similar or same definition as anon-push decision task > > I'm not sure this is necessary -- if the decision task it produces is > legitimate, then it doesn't matter what produced it. > > BTW, this is called a "cron task" In my rough wording, the "pre-decision task" maps to the cron task, and the "hooks decision task" maps to the decision task that the cron task generates. Agreed that the cron task doesn't need to match anything or need to be verified. Also agreed that it's not necessary that the cron-generated decision task look exactly like the mozilla-taskcluster-generated decision task, as long as we can verify both. However, the more similar they are, the more we can reuse the verification logic.
Assignee | ||
Updated•8 years ago
|
Alias: cot-v2
Updated•8 years ago
|
Blocks: tcmigration-cleanup
Assignee | ||
Comment 5•7 years ago
|
||
Dustin: I think the dep is backwards. We need json-e to proceed with this bug, so bug 1372600 should block this bug. Agree?
Flags: needinfo?(dustin)
Comment 6•7 years ago
|
||
There's already a dep on the JSON-e implementation. Deploying JSON-e depends on this bug being complete (this being one of the places to deploy it). I'm using that tracker as a way of organizing all of the things required to call JSON-e "done".
Flags: needinfo?(dustin)
Assignee | ||
Comment 7•7 years ago
|
||
Ok. In https://github.com/escapewindow/scriptworker/commits/cotv2 , I have enough checks + hacked hardcodes to get verify_cot --task-type decision --cleanup -- A3klZsqPTieZ6q8cUOH2Iw to only break on datestrings. I have plans for the hardcodes -- mainly I'm missing the hg pushlog and production-branches.json references. The main question is, how do I override $fromNow to be based upon the task.created datestring? I tried https://github.com/escapewindow/scriptworker/commit/9352bf8f0cc7f25b657b6592d445b009a46a91da , but it appears the builtin takes precedence over the context.
Flags: needinfo?(dustin)
Assignee | ||
Comment 8•7 years ago
|
||
I suppose I can mock datetime.datetime.utcnow() during the call, but that seems pretty hacky.
Assignee | ||
Comment 9•7 years ago
|
||
I tried that in https://github.com/escapewindow/scriptworker/commit/53a95c1df4e6c6a5bf55c5cffb053ba8dae5d107 . It looks like that won't work, because we calculate utcnow() for every datestring, so things like the expires datestring is off by a couple milliseconds: AssertionError: [('change', 'payload.artifacts.public.expires', ('2018-09-04T17:23:33.110Z', '2018-09-04T17:23:33.108Z'))] We can proceed in a few ways: - CoT can completely ignore datestrings - CoT can fuzzy match datestrings - JSON-e can allow for $fromNow to have an overridden `now`. The decision task should set this when it specifies `created` time, and build the others from that date, rather than keep calling utcnow(). Then CoT can rebuild the datestrings from the `created` time.
Comment 10•7 years ago
|
||
(In reply to Aki Sasaki [:aki] from comment #9) > - JSON-e can allow for $fromNow to have an overridden `now`. The decision > task should set this when it specifies `created` time, and build the others > from that date, rather than keep calling utcnow(). Then CoT can rebuild the > datestrings from the `created` time. This should already be the case, for exactly this reason -- Brian, can you take a look?
Flags: needinfo?(dustin) → needinfo?(bstack)
Comment 11•7 years ago
|
||
We've added that to json-e as of a few days ago, but tools/th need to be updated to make use of it afaict. I'll add this to my to-do for my second round of updates to those sites.
Flags: needinfo?(bstack)
See Also: → 1396969
Comment 12•7 years ago
|
||
Ah, so if I understand, Aki's regeneration of the action task is correct (basing the times on the same "now") but the original generation of the action task is not?
Comment 13•7 years ago
|
||
Oh, I misunderstood in that case. I think we need a new `now` for action tasks since the `now` will be when they were triggered? I admit I don't totally understand what's going on here.
Assignee | ||
Comment 14•7 years ago
|
||
I'm trying to regenerate an on-push decision task atm. So, I'm guessing mozilla-taskcluster isn't pinning 'now', so `task.created` and `task.payload.artifacts.public.expires` are built on different datetime.datetime.utcnow() calls. What I would like to have happen is for a) mozilla-taskcluster and all other things that create these json-e template tasks to pin `now`, then base all $fromNow calculations from `now`. Then CoT can use `task.created` to regenerate all datestrings deterministically. b) for json-e to allow for overriding `now` in general. I couldn't figure out how, so all my $fromNow calls called `datetime.datetime.utcnow()`, rather than letting me specify what `now` means. I suppose I could tmpl = yaml.safe_load(path_to_taskcluster_yml) # recursively search tmpl for $fromNow, and change that to $hackedFromNow, so I can override the default `now`, or # use mock or something ugly, or I could jsone.render(tmpl, context, now=previous_datestring_now) Or override `now` in env or context. The latter seems much nicer than the mock / $hackedFromNow solution.
Assignee | ||
Comment 15•7 years ago
|
||
Now that we [will soon] have the ability to rebuild action tasks from their task.extra.action + the decision task's actions.json, we should add that work to the json-e work for cotv2.
Updated•7 years ago
|
Attachment #8939707 -
Flags: review?(dustin) → review+
Assignee | ||
Updated•7 years ago
|
Keywords: leave-open
Comment 17•7 years ago
|
||
Pushed by asasaki@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/5ce40a4ffe44 add task.extra.tasks_for to help rebuild the decision task. r=dustin
Assignee | ||
Comment 19•7 years ago
|
||
Dustin, Brian, I'm able to now verify an hg-push decision task via json-e. For actions, I'm hitting a couple issues. I'm hoping this is clear, but I'm happy to clarify. First, `task.payload.env.ACTION_TASK_GROUP_ID` doesn't match `context['action']['taskGroupId']`. We use `action.taskGroupId` in 4 places in `.taskcluster.yml` [1], and this the env var is the only place that differs. I'm guessing this is because the action task's taskGroupId is the decision task's taskId, but the graph that the action task spawns has a taskGroupId of the action task's taskId. Here's an example action task that shows the different taskGroupIds: [2]. ACTION_TASK_ID is null, ACTION_TASK_GROUP_ID is TVwCEBYSQ3Wzsdpjht2_Zw, but extra.action.context.taskGroupId is B2WpjpnRSiqomo3VgChfLg . Do you have a recommendation on how to fix? Maybe set this to `ownTaskId`? My current fix [3] is pretty ugly: render the json-e, then if it's an action task, munge `task.payload.env.ACTION_TASK_GROUP_ID`. [1] https://hg.mozilla.org/mozilla-central/file/9099a6ed993f/.taskcluster.yml#l104 [2] https://queue.taskcluster.net/v1/task/TVwCEBYSQ3Wzsdpjht2_Zw [3] https://github.com/escapewindow/scriptworker/blob/a3a9bb9eba2706579c3457926d13b5b3513531c7/scriptworker/cot/verify.py#L1200-L1205 Second, `{$json: {$eval: 'DICTIONARY_NAME'}}` results in an unsorted string. We do this twice that I've noticed so far [4][5]. Is there some way we can get sorted output so it's reproducible? Otherwise I may have to special case these env vars, `json.loads()` them, compare the loaded dicts, and ignore them in the comparison. The less I need to special case the better; this is currently looking fairly fragile against future changes. [6] [4] https://hg.mozilla.org/mozilla-central/file/9099a6ed993f/.taskcluster.yml#l106 [5] https://hg.mozilla.org/mozilla-central/file/9099a6ed993f/.taskcluster.yml#l104 [6] https://github.com/escapewindow/scriptworker/blob/a3a9bb9eba2706579c3457926d13b5b3513531c7/scriptworker/cot/verify.py#L1167-L1196
Flags: needinfo?(dustin)
Flags: needinfo?(bstack)
Assignee | ||
Comment 20•7 years ago
|
||
(In reply to Aki Sasaki [:aki] from comment #19) > Second, `{$json: {$eval: 'DICTIONARY_NAME'}}` results in an unsorted string. Attempt at a fix: https://github.com/taskcluster/json-e/pull/221
Assignee | ||
Comment 21•7 years ago
|
||
Looks like I answered my own questions. Thank you rubber ducky! :) As of e81a019eb8b5d0218e352ef191e12d2ed35694c0 in https://github.com/escapewindow/scriptworker/commits/cotv2 , I'm able to verify decision, action, and cron tasks. I removed the ACTION_TASK_GROUP_ID hack and added a temporary workaround until https://github.com/taskcluster/json-e/pull/221 lands and rolls out. I did hit another issue: when rebuilding a cron task definition, task['tags'] is empty, so the rebuilt definition doesn't contain a 'tags' key. The actual task has a task['tags'] of {}. Any recommendations on how to deal with this? [1] [1] https://github.com/escapewindow/scriptworker/blob/e81a019eb8b5d0218e352ef191e12d2ed35694c0/scriptworker/cot/verify.py#L1217-L1220
Flags: needinfo?(dustin)
Flags: needinfo?(bstack)
Assignee | ||
Comment 22•7 years ago
|
||
Landed these on maple to test. It resulted in https://tools.taskcluster.net/groups/cecmymuWTM6Xsr9GWvrE5w/tasks/N2QNWi40Qku05WKbLjHsMQ/details (action) and https://tools.taskcluster.net/groups/K0ZOy9D-QvSf9e_Hu25ssA/tasks/K0ZOy9D-QvSf9e_Hu25ssA/details (cron); these are cotv2-verifiable as of https://github.com/escapewindow/scriptworker/commit/e81a019eb8b5d0218e352ef191e12d2ed35694c0 .
Attachment #8940495 -
Flags: review?(dustin)
Attachment #8940495 -
Flags: review?(bstack)
Assignee | ||
Comment 23•7 years ago
|
||
Working, with 100% test coverage: https://github.com/mozilla-releng/scriptworker/compare/master...escapewindow:cotv2?expand=1 CoT v1 still works: verify_cot --task-type signing --cleanup -- ah37hplCR1GcwuxUPPl7VQ (m-b nightly signing task) TODO: - land https://bugzilla.mozilla.org/attachment.cgi?id=8940495&action=edit - wait for treeherder to update json-e https://whatsdeployed.io/?owner=mozilla&repo=treeherder&name[]=Stage&url[]=https://treeherder.allizom.org/revision.txt&name[]=Prod&url[]=https://treeherder.mozilla.org/revision.txt - rebase branch to be more reviewable - get review
Comment 24•7 years ago
|
||
Comment on attachment 8940495 [details] [diff] [review] task.extra.cron + task.payload.env.ACTION_TASK_GROUP_ID Sorry I didn't reply to the NI till now. I was confused about what you were asking until I saw this patch. It's been a while since I looked at actions, but won't this change the semantics of what task_group_id is in an action task[0]? In [1] it is defined as the task group the action was triggered for. Mostly I'm trying to remember why we have them as possibly different values but I _think_ there was a reason. [0] https://dxr.mozilla.org/mozilla-central/source/taskcluster/mach_commands.py#217 [1] https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/actions/registry.py#118
Comment 25•7 years ago
|
||
Comment on attachment 8940495 [details] [diff] [review] task.extra.cron + task.payload.env.ACTION_TASK_GROUP_ID After talking in irc, I think this makes sense!
Attachment #8940495 -
Flags: review?(bstack) → review+
Assignee | ||
Comment 26•7 years ago
|
||
Interesting. I think the confusion here is because: - release promotion actions create a new taskGroupId based on the action taskId - other actions use the existing taskGroupId based on the action taskGroupId (the decision task's taskId). https://tools.taskcluster.net/groups/e5cO0K3sSoSMl0bH3JfHsQ/tasks/L-d1CNGUQGuOs_-wckujtw/details is an 'Add new jobs' custom action task. It sets "ACTION_TASK_GROUP_ID": "L-d1CNGUQGuOs_-wckujtw", or `ownTaskId`, per the above patch. But https://tools.taskcluster.net/groups/L-d1CNGUQGuOs_-wckujtw doesn't exist; it created https://tools.taskcluster.net/groups/e5cO0K3sSoSMl0bH3JfHsQ/tasks/KL6JMqr8Rc2TuBENqi-RvQ/details in the same taskGroupId as the other tasks. (This is the desired behavior). My reading: - setting `ACTION_TASK_GROUP_ID` to `ownTaskId` doesn't seem to affect what taskGroupId the spawned tasks use... the release promotion tasks use the action task's taskId; other action tasks use the decision task's taskId. - setting `ACTION_TASK_GROUP_ID` to `ownTaskId` does seem to help with cotv2 verification. Which means the patch is either harmless or beneficial. Testing other actions did surface a couple of cotv2 verification holes. Working on those.
Updated•7 years ago
|
Attachment #8940495 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 28•7 years ago
|
||
Comment on attachment 8940495 [details] [diff] [review] task.extra.cron + task.payload.env.ACTION_TASK_GROUP_ID https://hg.mozilla.org/integration/mozilla-inbound/rev/4936907a3a91da2ca0908c31c66d6297bcde472a
Comment 30•7 years ago
|
||
Pushed by asasaki@mozilla.com https://hg.mozilla.org/integration/mozilla-inbound/rev/4936907a3a91da2ca0908c31c66d6297bcde472a task.extra.cron + task.payload.env.ACTION_TASK_GROUP_ID. r=bstack r=dustin
Flags: needinfo?(aki)
Updated•7 years ago
|
Flags: needinfo?(aki)
Updated•7 years ago
|
Attachment #8942416 -
Flags: review?(jlorenzo) → review+
Updated•7 years ago
|
Attachment #8944125 -
Flags: review?(mtabara) → review+
Updated•7 years ago
|
Attachment #8944126 -
Flags: review?(mtabara) → review+
Updated•7 years ago
|
Attachment #8944127 -
Flags: review?(mtabara) → review+
Updated•7 years ago
|
Attachment #8944129 -
Flags: review?(nthomas) → review+
Updated•7 years ago
|
Attachment #8943856 -
Flags: review?(jlorenzo) → review+
Comment 37•7 years ago
|
||
Comment on attachment 8944128 [details] [diff] [review] bump pushapk scriptworker to scriptworker 8.0.0 Review of attachment 8944128 [details] [diff] [review]: ----------------------------------------------------------------- r+ modulo the missing pushapkscript. Feel free to update the deps pointed out, if you're sure mozapkpublisher can make a TLS connection to Google Play. Otherwise, let's change them in a follow up. ::: modules/pushapk_scriptworker/manifests/init.pp @@ +32,2 @@ > 'chardet==3.0.4', > + 'cryptography==2.1.4', Nit: I haven't tested if such a version bump would break mozapkpublisher. On the other had, I would love to update the deps. They are about a year old now. If you tested that change, let's do it now. Otherwise, I'd prefer a follow up. @@ -41,5 @@ > - 'multidict==2.1.6', > - 'oauth2client==4.1.1', > - 'pexpect==4.2.1', > - 'ptyprocess==0.5.1', > - 'pushapkscript==0.4.1', Error: pushapkscript disappeared from the list :) @@ +45,5 @@ > + 'oauth2client==4.1.2', > + 'pexpect==4.3.1', > + 'ptyprocess==0.5.2', > + 'pyOpenSSL==17.5.0', > + 'pyasn1==0.4.2', Same comment as cryptography. Please see pyasn1-module too.
Attachment #8944128 -
Flags: review?(jlorenzo) → review+
Assignee | ||
Comment 38•7 years ago
|
||
(In reply to Johan Lorenzo [:jlorenzo] from comment #37) > Comment on attachment 8944128 [details] [diff] [review] > bump pushapk scriptworker to scriptworker 8.0.0 > > Review of attachment 8944128 [details] [diff] [review]: > ----------------------------------------------------------------- > > r+ modulo the missing pushapkscript. Feel free to update the deps pointed > out, if you're sure mozapkpublisher can make a TLS connection to Google > Play. Otherwise, let's change them in a follow up. Ok. And oops good catch re: pushapkscript =\
Assignee | ||
Comment 39•7 years ago
|
||
Rolled out.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 40•7 years ago
|
||
I'm sorry for the review. It turns out it broke today's push-apk job[1]. I fixed it up by upgrading the deps to the version you initially put, on my puppet environment. The job passed with them[2]. I then pushed to: * default https://hg.mozilla.org/build/puppet/rev/042aa21c38096dce60c1d78784ca8bb653424da1 * production https://hg.mozilla.org/build/puppet/rev/576d129d0a9ff2bb44ee3309b3144b187a74e928 [1] https://queue.taskcluster.net/v1/task/FHtuk9B0QseKw9wPLiwVew/runs/0/artifacts/public/logs/live_backing.log [2] https://public-artifacts.taskcluster.net/FHtuk9B0QseKw9wPLiwVew/1/public/logs/live_backing.log
Comment 41•7 years ago
|
||
It looks like the fuzzy task matching isn't quite smart enough to handle the packages (and presumably toolchains) that get passed via an environment variable. If a package or toolchain is retriggered via treeherder, the downstream tasks will have an environment the differs in MOZ_TOOLCHAINS or DOCKER_IMAGE_PACKAGES environment variables.
Assignee | ||
Comment 42•7 years ago
|
||
(In reply to Tom Prince [:tomprince] from comment #41) > It looks like the fuzzy task matching isn't quite smart enough to handle the > packages (and presumably toolchains) that get passed via an environment > variable. If a package or toolchain is retriggered via treeherder, the > downstream tasks will have an environment the differs in MOZ_TOOLCHAINS or > DOCKER_IMAGE_PACKAGES environment variables. I think that means we need to retrigger via the custom action rather than treeherder's retrigger (related: bug 1420482). Fuzzy task matching is pretty limited and noisy, and I will be happy to see it go away in bug 1432364.
Comment 43•7 years ago
|
||
Pushed by mozilla@hocat.ca: https://hg.mozilla.org/comm-central/rev/57678a6eaade Add extra details to help Chain-of-Trust rebuild the decision task.
Comment 44•7 years ago
|
||
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in
before you can comment on or make changes to this bug.
Description
•