Closed Bug 1526979 Opened 10 months ago Closed 9 months ago

Configure on-push hooks with ci-admin

Categories

(Firefox Build System :: Task Configuration, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Attachments

(4 files, 1 obsolete file)

..protected behind some feature so that we can turn them on one branch at a time.

Per discussion with Tom, I'm going to try to implement this similar to how we implement crons; that is, a short hook definition creates a quick-running task which interprets .taskcluster.yml and creates the decision task.

Depends on: 1527416

I have had some preliminary success:

  • hook template in ci-configuration
  • installed as a hook by ci-admin, with some values (repo URL, level, etc.) supplied at that time
  • hook payload runs an embedded python script in the python:3 image
  • script downloads .taskcluster.yml, renders, and creates task
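The download step in the last bullet can be sketched roughly as follows. This is a minimal sketch, not the script from the attachment: the function names are hypothetical, and it assumes the file is fetched through hgweb's `raw-file` URL form. Rendering the template and calling the queue are left out.

```python
from urllib.parse import quote
from urllib.request import urlopen


def taskcluster_yml_url(repo_url: str, revision: str) -> str:
    """Build the hgweb raw-file URL for .taskcluster.yml at a given revision.

    Assumption: the file lives at the repository root and is served via
    hgweb's /raw-file/<rev>/<path> endpoint.
    """
    return f"{repo_url.rstrip('/')}/raw-file/{quote(revision)}/.taskcluster.yml"


def fetch_taskcluster_yml(repo_url: str, revision: str) -> str:
    """Download .taskcluster.yml as text (hypothetical helper, performs network I/O)."""
    with urlopen(taskcluster_yml_url(repo_url, revision)) as response:
        return response.read().decode("utf-8")
```

The repo URL is one of the values ci-admin supplies when it installs the hook; the revision would come from the push that fired the hook.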

To Do:

  • build a docker image encapsulating this Python script (maybe this should be part of ci-taskgraph??)
  • support repos that pull .taskcluster.yml from another repo (try, ci-configuration)
  • double-check scopes are restricted appropriately
  • support for re-running with a triggerHook call

(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #2)

To Do:

  • build a docker image encapsulating this Python script (maybe this should be part of ci-taskgraph??)

I think either ci-admin or ci-config would be a better home (probably ci-admin for now).

  • support repos that pull .taskcluster.yml from another repo (try, ci-configuration)

I'm not sure what you mean about try, but supporting out-of-repo .taskcluster.yml (such as for ci-configuration) can probably be postponed till after we get the rest in production.

Writing try was an error.

I'm building the docker image in ci-configuration (with the usual shell-script + Dockerfile directory combination).

All push tasks will run on an hg-push workerType. Only the hooks have permission to create tasks there -- no repo:hg.mozilla.org:* roles should have that permission. The tasks do not execute arbitrary code and take a very constrained input, so I think this sharing is OK.

Still to do:

  • support re-running with a triggerHook call (set the triggerSchema correctly)
  • check that only expected things can create tasks on hg-push
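One way to do the second check is sketched below. It is an illustration only: it assumes each role's scopes have already been fully expanded, the provisioner name `example-provisioner` and the exact shape of the create-task scope are placeholders, and the only scope rule it implements is that a held scope ending in `*` satisfies any scope with that prefix.

```python
from typing import Dict, List


def satisfies(held: str, required: str) -> bool:
    """Return True if a held scope satisfies the required scope.

    A trailing '*' on the held scope matches any suffix; otherwise the
    two scopes must be identical.
    """
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


def roles_allowing(roles: Dict[str, List[str]], required: str) -> List[str]:
    """List the roles (sorted) whose expanded scopes satisfy `required`."""
    return sorted(
        role
        for role, scopes in roles.items()
        if any(satisfies(scope, required) for scope in scopes)
    )


# Hypothetical data: role names and scopes are made up for the example.
roles = {
    "hook-id:hg-push/mozilla-central": ["queue:create-task:example-provisioner/hg-push"],
    "repo:hg.mozilla.org/try:*": ["queue:create-task:example-provisioner/gecko-1-*"],
    "repo:hg.mozilla.org/bad:*": ["queue:create-task:example-provisioner/*"],
}
allowed = roles_allowing(roles, "queue:create-task:example-provisioner/hg-push")
unexpected = [role for role in allowed if not role.startswith("hook-id:hg-push/")]
print(unexpected)
```

Anything left in `unexpected` is a role other than the push hooks that could create tasks on hg-push.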

This includes a hook task template, as well as a small Python script embedded
in a Docker image that creates decision tasks based on .taskcluster.yml

This is currently gated behind a temporary feature in projects.yml. We can
slowly add this feature to various projects as we disable them in
mozilla-taskcluster, until everything is moved over.
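A rough sketch of how such a gate might be evaluated. The data layout and the feature name `hg-push` are assumptions for illustration, not the actual schema of projects.yml.

```python
def projects_with_feature(projects, feature):
    """Return the sorted aliases of projects that have `feature` switched on.

    `projects` maps alias -> config dict; a missing "features" key or a
    missing feature entry is treated as "off".
    """
    return sorted(
        alias
        for alias, config in projects.items()
        if config.get("features", {}).get(feature, False)
    )


# Hypothetical stand-in for parsed projects.yml content.
projects = {
    "ci-admin": {"features": {"hg-push": True}},
    "jamun": {"features": {"hg-push": False}},
    "try": {},
}
print(projects_with_feature(projects, "hg-push"))
```

Moving a project over then amounts to flipping one flag, and reverting means flipping it back.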

This adds support for "bindings" in the Hooks API, and uses it to support hooks
that run when a push is generated, with the hook template based on a file in
ci-configuration.

If you have time for a 30% review of this, I'd appreciate it. There are a few XXX where things are not yet done, but they're minor.

Flags: needinfo?(mozilla)

Looks good!

Flags: needinfo?(mozilla)

Once this lands and sticks, I think we should roll it out as follows:

  1. Enable it on all hg.m.o/ci repos
  2. Determine how to roll back changes. I think this is:
    a) close the tree
    b) re-enable in ci-config
    c) reset push-id in mozilla-taskcluster db (this may depend on whether the database gets updated for disabled repos)
    d) disable hooks
    e) open the tree
  3. Enable it on a project repo (jamun?) and test hooks and rolling back.
    • test CoT on the pushes
  4. On a Monday or Tuesday morning (probably Tue to avoid releases), enable hooks for autoland
    • coordinate with sheriffs to let in a few pushes and then close the tree again
    • verify decision tasks run as expected and test CoT
    • re-open trees and monitor
  5. After a day or two, enable hooks for try.
  6. The following week, enable hooks for remaining gecko/comm repositories
  7. Enable hooks for remaining non-gecko trees.

Sounds like a good plan! I can land something in mozilla-taskcluster that will cause it to continue to track the repo while not actually starting tasks for projects with this feature. That will avoid the dangerous work of resetting push-ids.

I'm not worried about the rest, but a phased roll-out is a good idea nonetheless.

Step 1 is complete, including restarting mozilla-taskcluster.

I'll check out step 2 next.

ci-admin's config is:

{
    _id: ObjectId("5c5cd5210ef8e6f9b5902f8e"),
    id: "3673abd8bee52ae6d10bc8aa3936e6cb",
    alias: "ci-admin",
    url: "https://hg.mozilla.org/ci/ci-admin/",
    lastPushId: 53,
    lastChangeset: "c78adb7fd28d2f0d7c3f3c9bc1af773944ee7809"
}

and those last* properties are up-to-date. The only decision task showing in treeherder for that push is
https://tools.taskcluster.net/groups/VAtyDHBTRCGKHgD7CbCOJw/tasks/VAtyDHBTRCGKHgD7CbCOJw/details
which was created by
https://tools.taskcluster.net/groups/Yor_hlLcQkW9j368G-uBbw/tasks/Yor_hlLcQkW9j368G-uBbw/details
so I think there's no need for step 2c. Looking at the source, it appears to poll everything in the "repositories" Mongo table, regardless of configuration. Which is great for purposes of rolling back!

So, I'm going to roll back ci-taskgraph-try. I'll skip the tree closure / opening.

a) close tree (skipped)
b) land change to revert config in projects.yml
c) ci-admin apply
d) restart dynos for mozilla-taskcluster
e) open tree

I did that for ci-taskgraph-try, and mozilla-taskcluster helpfully created
https://tools.taskcluster.net/tasks/Ufiap35oQNGb-3Z-VB9_mw

so I think this is verified. If I can find a few minutes to rub together today, I'll work on the next step.

https://hg.mozilla.org/projects/jamun/rev/d09dcd97b14ee71c16bd45068157e865766877f4
Bug 1526979: import another trivial change from mozilla-central to test pushes

https://treeherder.mozilla.org/#/jobs?repo=jamun&revision=d09dcd97b14ee71c16bd45068157e865766877f4 worked to run the decision task, but the necessary taskcluster/* changes aren't present there, so no ability to check scriptworker. Tom, does it make sense for me to just bring jamun up to date with esr60?

(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #18)

https://treeherder.mozilla.org/#/jobs?repo=jamun&revision=d09dcd97b14ee71c16bd45068157e865766877f4 worked to run the decision task, but the necessary taskcluster/* changes aren't present there, so no ability to check scriptworker. Tom, does it make sense for me to just bring jamun up to date with esr60?

Sure, though you don't need to actually run scriptworker, just have a successful decision task.

verify_cot --cot-product firefox --task-type decision <task-id> --cleanup will tell you if the decision task passes Chain-of-Trust.

Thanks for that command -- I was trying to figure out where to look for it :)

$ verify_cot --cot-product firefox --task-type decision Vme5-8mKQECUWTVDpcO5Bg --cleanup
..
INFO:scriptworker.cot.verify:Good.
..
$ echo $?
0

so I think we're in good shape.
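For repeating that check across several pushes, a thin wrapper like the one below could help. It is only a sketch: the function names are hypothetical, it assumes `verify_cot` is on the PATH, and it uses exactly the flags quoted in the comments above, treating exit status 0 as a pass.

```python
import subprocess


def verify_cot_argv(task_id, product="firefox", task_type="decision"):
    """Build the verify_cot command line as an argument list."""
    return [
        "verify_cot",
        "--cot-product", product,
        "--task-type", task_type,
        task_id,
        "--cleanup",
    ]


def decision_task_passes_cot(task_id):
    """Run verify_cot for one decision task; True when it exits with status 0."""
    return subprocess.run(verify_cot_argv(task_id)).returncode == 0
```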

I sent an email announcing a deployment to autoland / inbound next Tuesday.

(that deployment is moved up to today)

Attachment #9049188 - Attachment is obsolete: true

Thanks Tom :)

In general, if the decision task runs at all, then the issue is outside the scope of impact from this bug.

Creating Hook=hg-push/oak
Creating Hook=hg-push/ash
Creating Hook=hg-push/elm
Creating Hook=hg-push/mozilla-inbound
Creating Hook=hg-push/maple
Creating Hook=hg-push/birch
Creating Hook=hg-push/cedar
Creating Hook=hg-push/pine
Creating Hook=hg-push/larch
Creating Hook=hg-push/mozilla-central
Creating Role=hook-id:hg-push/mozilla-inbound
Creating Role=hook-id:hg-push/mozilla-central
Creating Role=hook-id:hg-push/maple
Creating Role=hook-id:hg-push/pine
Creating Role=hook-id:hg-push/ash
Creating Role=hook-id:hg-push/cedar
Creating Role=hook-id:hg-push/elm
Creating Role=hook-id:hg-push/larch
Creating Role=hook-id:hg-push/oak
Creating Role=hook-id:hg-push/birch
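The naming pattern in that output is one hook and one matching role per project alias. A small sketch that derives the expected names from a list of aliases (the function name is hypothetical, and this is not ci-admin's code):

```python
def hook_resources(aliases):
    """Return the Hook=... and Role=... names expected for each project alias."""
    hooks = [f"Hook=hg-push/{alias}" for alias in aliases]
    roles = [f"Role=hook-id:hg-push/{alias}" for alias in aliases]
    return hooks + roles


for name in hook_resources(["oak", "ash"]):
    print("Creating", name)
```

Comparing this list against what ci-admin reports is a quick way to confirm that no project was missed.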

Depends on: 1534283

I'll do the release branches tomorrow.

Scheduled to land for try on Thursday.

Then we just need to wrap up comm and nss, I think.

  • version-control-tools

The remaining list is:

  • mozilla-release, mozilla-esr60, mozilla-beta -- should be OK to go
  • comm-central -- should be OK to go
  • comm-esr60, comm-beta -- waiting for uplift in bug 1525072
  • try-comm-central -- waiting on bug 1534204
  • try -- scheduled for thurs per dev.platform post
  • stylo-try, stylo -- turn off pushes for these (2+ years!)
  • nss, nss-try -- ready to go per bug 1525946
  • version-control-tools -- ready to go per bug 1525950

I'll email folks for the non-Firefox repos as I turn things on. And as we've seen, reverting is easy (revert change in ci-configuration, run ci-admin to remove hooks, restart mozilla-taskcluster to load new config).

[edit: comm-esr60, comm-beta not ready to go]

Landed:

  • mozilla-release, mozilla-esr60, mozilla-beta -- should be OK to go
  • comm-central -- should be OK to go

Landed:

  • try
  • nss, nss-try
  • version-control-tools (several days ago)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4b57a3fde1db3d9c36f1a77d270815f498034b1c came in during the moments when nothing was watching, so I triggered it manually with payload

payload:
  data:
    source: serve
    pushlog_pushes:
    - time: 1552578644
      push_full_json_url: https://hg.mozilla.org/try/json-pushes?version=2&full=1&startID=341610&endID=341611
      pushid: 341611
      push_json_url: https://hg.mozilla.org/try/json-pushes?version=2&startID=341610&endID=341611
      user: ytausky@mozilla.com
    heads:
    - 4b57a3fde1db3d9c36f1a77d270815f498034b1c
    repo_url: https://hg.mozilla.org/try
  type: changegroup.1

based on what I saw in pulse inspector.
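A payload of that shape can be assembled for any single push. The sketch below mirrors the fields shown above; the function name is hypothetical, and it assumes (as in the example) that the pushlog URLs cover the range from pushid - 1 to pushid.

```python
def push_payload(repo_url, pushid, head, user, time, source="serve"):
    """Build a hook trigger payload describing a single push.

    The field names and the `source` default are copied from the manually
    triggered example; nothing here is validated against the hook's schema.
    """
    repo_url = repo_url.rstrip("/")
    id_range = f"startID={pushid - 1}&endID={pushid}"
    return {
        "payload": {
            "data": {
                "source": source,
                "pushlog_pushes": [
                    {
                        "time": time,
                        "push_full_json_url": f"{repo_url}/json-pushes?version=2&full=1&{id_range}",
                        "pushid": pushid,
                        "push_json_url": f"{repo_url}/json-pushes?version=2&{id_range}",
                        "user": user,
                    }
                ],
                "heads": [head],
                "repo_url": repo_url,
            },
            "type": "changegroup.1",
        }
    }
```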

nss-try failed with "does not have sufficient scopes and are missing the following scopes:\n\n\nqueue:scheduler-id:nss-level-1\n"

So, the remainder is

  • comm-esr60, comm-beta -- waiting for uplift in bug ..??
  • try-comm-central -- waiting on bug 1534204

Jorg, is there anything I can do to help those things along?

Flags: needinfo?(jorgk)

I'm sure you can ;-) - This is the first time I see this bug. In comment #32 we have "Landed: comm-central -- should be OK to go". Umm, what has landed where? I usually handle uplifts for c-* repositories unless they need a heavy rebase, but I'd need to see the C-C landing first.

Flags: needinfo?(jorgk)

The in-repo changes were landed in bug 1525072. The "landed" to which I referred in comment 32 was enabling the functionality that required those changes (an out-of-tree change).

I thought bug 1525072 comment 20 was part of an uplift, and it looks from bug 1525072 comment 28 like that's done. A look at https://hg.mozilla.org/releases/comm-beta/file/tip/.taskcluster.yml suggests it's uplifted to comm-beta, too. So perhaps that's ready to go! I will try it and we'll see what happens.

For try-comm-central, I suppose the thing I can help with is to make a patch for my suggestion :)

Landed for comm-beta and comm-esr60.

Yes, bug 1525072 was landed on c-c, c-b and c-esr60.

Sorry, I'm not a releng guy, so I don't understand what would be needed for try-comm-central.

This isn't actually blocking -- the try issues in that bug were already present (and hopefully now fixed). So I've landed the change for this repo too.

Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Blocks: 1535635