Configure on-push hooks with ci-admin
Categories
(Firefox Build System :: Task Configuration, task)
Tracking
(Not tracked)
People
(Reporter: dustin, Assigned: dustin)
References
Details
Attachments
(4 files, 1 obsolete file)
..protected behind some feature so that we can turn them on one branch at a time.
Comment 1•7 years ago (Assignee)
Per discussion with Tom, I'm going to try to implement this similar to how we implement crons; that is, a short hook definition creates a quick-running task which interprets .taskcluster.yml and creates the decision task.
Comment 2•7 years ago (Assignee)
I have had some preliminary success:
- hook template in ci-configuration
- installed as a hook by ci-admin, with some values (repo URL, level, etc.) supplied at that time
- hook payload runs an embedded python script in the python:3 image
- script downloads .taskcluster.yml, renders, and creates task
To Do:
- build a docker image encapsulating this Python script (maybe this should be part of ci-taskgraph??)
- support repos that pull .taskcluster.yml from another repo (try, ci-configuration)
- double-check scopes are restricted appropriately
- support for re-running with a triggerHook call
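The preliminary flow in this comment (download .taskcluster.yml, render it, create the task) could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual script: `fetch`, `render`, and `create_task` are hypothetical callables standing in for an HTTP download, a template renderer, and a queue call, and the context keys handed to the renderer are made up for the example. The `raw-file` URL form is the usual hgweb one.

```python
from typing import Any, Callable, Dict, List


def taskcluster_yml_url(repo_url: str, revision: str) -> str:
    """Build the hgweb raw-file URL for .taskcluster.yml at a given revision."""
    return f"{repo_url.rstrip('/')}/raw-file/{revision}/.taskcluster.yml"


def handle_push(
    repo_url: str,
    revision: str,
    level: int,
    fetch: Callable[[str], str],
    render: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    create_task: Callable[[Dict[str, Any]], str],
) -> List[str]:
    """Fetch the template for the pushed revision, render it, create each task.

    All three callables are hypothetical stand-ins supplied by the caller.
    Returns whatever identifiers create_task hands back.
    """
    template_text = fetch(taskcluster_yml_url(repo_url, revision))
    # Assumed context keys; the real template may expect different ones.
    context = {"repo_url": repo_url, "revision": revision, "level": level}
    rendered = render(template_text, context)
    return [create_task(task) for task in rendered.get("tasks", [])]
```

Passing the callables in keeps the sketch free of network access, so the control flow can be exercised with fakes before any real client is wired in.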
Comment 3•7 years ago
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #2)
To Do:
- build a docker image encapsulating this Python script (maybe this should be part of ci-taskgraph??)
I think either ci-admin or ci-config would be a better home (probably ci-admin for now).
- support repos that pull .taskcluster.yml from another repo (try, ci-configuration)
I'm not sure what you mean about try, but supporting out-of-repo .taskcluster.yml (such as for ci-configuration) can probably be postponed till after we get the rest in production.
Comment 4•7 years ago (Assignee)
Writing try was an error.
I'm building the docker image in ci-configuration (with the usual shell-script + Dockerfile directory combination).
All push tasks will run on an hg-push workerType. Only the hooks have permission to create tasks there -- no repo:hg.mozilla.org:* roles should have that permission. The tasks do not execute arbitrary code and take a very constrained input, so I think this sharing is OK.
Still to do:
- support re-running with a triggerHook call (set the triggerSchema correctly)
- check that only expected things can create tasks on hg-push
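One way to approach the "only expected things can create tasks on hg-push" check is to scan role definitions for scopes that would satisfy the create-task scope. The sketch below assumes roles are available as a plain mapping of role name to directly granted scopes, ignores role expansion through `assume:` scopes, and uses a made-up scope string in its usage; the trailing-`*` prefix rule is the standard Taskcluster scope-matching rule.

```python
from typing import Dict, Iterable, List


def satisfies(held: str, required: str) -> bool:
    """Return True if a held scope satisfies the required scope.

    A scope ending in '*' matches any scope that starts with the part
    before the star; otherwise the two must be identical.
    """
    if held.endswith("*"):
        return required.startswith(held[:-1])
    return held == required


def roles_granting(roles: Dict[str, Iterable[str]], required: str) -> List[str]:
    """List role names whose directly granted scopes satisfy `required`.

    Simplification: scopes reached indirectly via assume:... are not expanded.
    """
    return sorted(
        name
        for name, scopes in roles.items()
        if any(satisfies(scope, required) for scope in scopes)
    )
```

Any role returned that is not one of the hg-push hook roles would be worth a closer look.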
Comment 5•7 years ago (Assignee)
Here's a successful hg-push task -- https://taskcluster-web.netlify.com/tasks/RKlll4q8TVq1l62T1ot7pQ
Comment 6•7 years ago (Assignee)
This includes a hook task template, as well as a small Python script embedded
in a Docker image that creates decision tasks based on .taskcluster.yml
This is currently gated behind a temporary feature in projects.yml. We can
slowly add this feature to various projects as we disable them in
mozilla-taskcluster, until everything is moved over.
Comment 7•7 years ago (Assignee)
This adds support for "bindings" in the Hooks API, and uses it to support hooks
that run when a push is generated, with the hook template based on a file in
ci-configuration.
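As a rough picture of what a binding-based hook might look like, the sketch below fills a hook template with a pulse binding for one repository. The `exchange` / `routingKeyPattern` pair is the shape of a Hooks API binding; the exchange name, the use of the repository path as routing key, and the helper itself are assumptions for illustration and are not taken from the patch.

```python
import copy
from typing import Any, Dict

# Assumption: the pulse exchange on which hg.mozilla.org publishes push messages.
HG_PUSH_EXCHANGE = "exchange/hgpushes/v2"


def hook_for_project(alias: str, repo_path: str, template: Dict[str, Any]) -> Dict[str, Any]:
    """Return a hook definition for one project, derived from a shared template.

    `alias` becomes the hookId under the hg-push group; `repo_path` (for
    example "integration/autoland") is assumed to be the routing key of
    push messages for that repository. The template is not modified.
    """
    hook = copy.deepcopy(template)
    hook["hookGroupId"] = "hg-push"
    hook["hookId"] = alias
    hook["bindings"] = [
        {"exchange": HG_PUSH_EXCHANGE, "routingKeyPattern": repo_path}
    ]
    return hook
```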
Comment 8•7 years ago (Assignee)
If you have time for a 30% review of this, I'd appreciate it. There are a few XXX where things are not yet done, but they're minor.
Comment 10•7 years ago
Once this lands and sticks, I think we should roll it out as follows:
1. Enable it on all hg.m.o/ci repos
2. Determine how to roll back changes. I think this is:
   a) close the tree
   b) re-enable in ci-config
   c) reset push-id in mozilla-taskcluster db (this may depend on whether the database gets updated for disabled repos)
   d) disable hooks
   e) open the tree
3. Enable it on a project repo (jamun?) and test hooks and rolling back.
   - test CoT on the pushes
4. On a Monday or Tuesday morning (probably Tue to avoid releases), enable hooks for autoland
   - coordinate with sheriffs to let in a few pushes and then close the tree again
   - verify decision tasks run as expected and test cot
   - re-open trees and monitor
5. After a day or two, enable hooks for try.
6. The following week, enable hooks for remaining gecko/comm repositories
7. Enable hooks for remaining non-gecko trees.
Comment 11•7 years ago (Assignee)
Sounds like a good plan! I can land something in mozilla-taskcluster that will cause it to continue to track the repo while not actually starting tasks for projects with this feature. That will avoid the dangerous work of resetting push-ids.
I'm not worried about the rest, but a phased roll-out is a good idea nonetheless.
Comment 12•7 years ago (Assignee)
Step 1 is complete, including restarting mozilla-taskcluster.
I'll check out step 2 next.
Comment 13•7 years ago (Assignee)
ci-admin's config is:
{
_id: ObjectId("5c5cd5210ef8e6f9b5902f8e"),
id: "3673abd8bee52ae6d10bc8aa3936e6cb",
alias: "ci-admin",
url: "https://hg.mozilla.org/ci/ci-admin/",
lastPushId: 53,
lastChangeset: "c78adb7fd28d2f0d7c3f3c9bc1af773944ee7809"
}
and those last* properties are up-to-date. The only decision task showing in treeherder for that push is
https://tools.taskcluster.net/groups/VAtyDHBTRCGKHgD7CbCOJw/tasks/VAtyDHBTRCGKHgD7CbCOJw/details
which was created by
https://tools.taskcluster.net/groups/Yor_hlLcQkW9j368G-uBbw/tasks/Yor_hlLcQkW9j368G-uBbw/details
so I think there's no need for step 2c. Looking at the source, it appears to poll everything in the "repositories" Mongo table, regardless of configuration. Which is great for purposes of rolling back!
So, I'm going to roll back ci-taskgraph-try. I'll skip the tree closure / opening.
Comment 14•7 years ago (Assignee)
a) close tree (skipped)
b) land change to revert config in projects.yml
c) ci-admin apply
d) restart dynos for mozilla-taskcluster
e) open tree
I did that for ci-taskgraph-try, and mozilla-taskcluster helpfully created
https://tools.taskcluster.net/tasks/Ufiap35oQNGb-3Z-VB9_mw
so I think this is verified. If I can find a few minutes to rub together today, I'll work on the next step.
Comment 15•7 years ago (Assignee)
Comment 16•7 years ago (Assignee)
Comment 17•7 years ago (Assignee)
Comment 18•7 years ago (Assignee)
https://treeherder.mozilla.org/#/jobs?repo=jamun&revision=d09dcd97b14ee71c16bd45068157e865766877f4 worked to run the decision task, but the necessary taskcluster/* changes aren't present there, so no ability to check scriptworker. Tom, does it make sense for me to just bring jamun up to date with esr60?
Comment 19•7 years ago
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #18)
https://treeherder.mozilla.org/#/jobs?repo=jamun&revision=d09dcd97b14ee71c16bd45068157e865766877f4 worked to run the decision task, but the necessary taskcluster/* changes aren't present there, so no ability to check scriptworker. Tom, does it make sense for me to just bring jamun up to date with esr60?
Sure, though you don't need to actually run scriptworker, just have a success decision task.
verify_cot --cot-product firefox --task-type decision <task-id> --cleanup will tell you if the decision task passes Chain-of-Trust.
Comment 20•7 years ago (Assignee)
Thanks for that command -- I was trying to figure out where to look for it :)
$ verify_cot --cot-product firefox --task-type decision Vme5-8mKQECUWTVDpcO5Bg --cleanup
..
INFO:scriptworker.cot.verify:Good.
..
$ echo $?
0
so I think we're in good shape.
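For repeated checks, the same command could be wrapped in a small helper that relies on the exit status, as in the transcript above. This is only a sketch: it uses exactly the flags quoted in this bug, assumes `verify_cot` is on the PATH, and the injectable `run` parameter is a convenience added here for testing.

```python
import subprocess
from typing import Callable, List


def verify_cot_argv(task_id: str, product: str = "firefox") -> List[str]:
    """Build the verify_cot command line quoted in this bug for a decision task."""
    return [
        "verify_cot",
        "--cot-product", product,
        "--task-type", "decision",
        task_id,
        "--cleanup",
    ]


def decision_passes_cot(task_id: str, run: Callable = subprocess.run) -> bool:
    """Run verify_cot and report success based on a zero exit status.

    `run` defaults to subprocess.run; a fake can be injected for tests.
    """
    result = run(verify_cot_argv(task_id), check=False)
    return result.returncode == 0
```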
Comment 21•7 years ago (Assignee)
I sent an email announcing a deployment to autoland / inbound next Tuesday.
Comment 22•7 years ago (Assignee)
Comment 23•7 years ago (Assignee)
(that deployment is moved up to today)
Comment 24•7 years ago (Assignee)
Deployed just to autoland, and saw a successful decision task
https://tools.taskcluster.net/groups/LPxk9rVjTYqbrcj8a5MQow/tasks/LPxk9rVjTYqbrcj8a5MQow/runs/0/logs/public%2Flogs%2Flive.log
Comment 25•7 years ago
Hi Dustin,
We've had two failed decision tasks on Autoland for two different pushes; rerunning the jobs was unsuccessful:
https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=pending%2Crunning%2Ctestfailed%2Cbusted%2Cexception&revision=904bfb7c423ff05ad8ca6036c6abf90c277bb609&selectedJob=232592511
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=232592511&repo=autoland&lineNumber=5428
and
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=232513606&repo=autoland&lineNumber=1058
Might this be related to this change? Can you please take a look?
Comment 26•7 years ago
(In reply to Cristian Brindusan [:cbrindusan] from comment #25)
Hi Dustin,
We've had two failed decision tasks on Autoland for two different pushes; rerunning the jobs was unsuccessful:
https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=pending%2Crunning%2Ctestfailed%2Cbusted%2Cexception&revision=904bfb7c423ff05ad8ca6036c6abf90c277bb609&selectedJob=232592511
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=232592511&repo=autoland&lineNumber=5428
This is due to a typo in Bug 1532783.
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=232513606&repo=autoland&lineNumber=1058
This looks like an intermittent failure (the URL that failed is now accessible). The retries failed because they don't handle decision tasks gracefully.
Comment 27•7 years ago (Assignee)
Thanks Tom :)
In general, if the decision task runs at all, then the issue is outside the scope of impact from this bug.
Comment 28•7 years ago (Assignee)
Creating Hook=hg-push/oak
Creating Hook=hg-push/ash
Creating Hook=hg-push/elm
Creating Hook=hg-push/mozilla-inbound
Creating Hook=hg-push/maple
Creating Hook=hg-push/birch
Creating Hook=hg-push/cedar
Creating Hook=hg-push/pine
Creating Hook=hg-push/larch
Creating Hook=hg-push/mozilla-central
Creating Role=hook-id:hg-push/mozilla-inbound
Creating Role=hook-id:hg-push/mozilla-central
Creating Role=hook-id:hg-push/maple
Creating Role=hook-id:hg-push/pine
Creating Role=hook-id:hg-push/ash
Creating Role=hook-id:hg-push/cedar
Creating Role=hook-id:hg-push/elm
Creating Role=hook-id:hg-push/larch
Creating Role=hook-id:hg-push/oak
Creating Role=hook-id:hg-push/birch
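The output above follows a simple naming pattern: one hook `hg-push/<alias>` and one role `hook-id:hg-push/<alias>` per project. A sketch of deriving the expected resource names from a list of project aliases, useful for comparing against such output, might look like this (the helper is hypothetical and is not ci-admin's code):

```python
from typing import Iterable, List

HOOK_GROUP = "hg-push"


def expected_resources(aliases: Iterable[str]) -> List[str]:
    """List the hook and role names expected for each project alias.

    Mirrors the 'Hook=...' / 'Role=...' naming seen in the output above.
    """
    names: List[str] = []
    for alias in sorted(aliases):
        names.append(f"Hook={HOOK_GROUP}/{alias}")
        names.append(f"Role=hook-id:{HOOK_GROUP}/{alias}")
    return names
```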
Comment 29•7 years ago (Assignee)
I'll do the release branches tomorrow.
Scheduled to land for try on Thursday.
Then we just need to wrap up comm and nss, I think.
Comment 30•7 years ago
- version-control-tools
Comment 31•7 years ago (Assignee)
The remaining list is:
- mozilla-release, mozilla-esr60, mozilla-beta -- should be OK to go
- comm-central -- should be OK to go
- comm-esr60, comm-beta -- waiting for uplift in bug 1525072
- try-comm-central -- waiting on bug 1534204
- try -- scheduled for thurs per dev.platform post
- stylo-try, stylo -- turn off pushes for these (2+ years!)
- nss, nss-try -- ready to go per bug 1525946
- version-control-tools -- ready to go per bug 1525950
I'll email folks for the non-Firefox repos as I turn things on. And as we've seen, reverting is easy (revert change in ci-configuration, run ci-admin to remove hooks, restart mozilla-taskcluster to load new config).
[edit: comm-esr60, comm-beta not ready to go]
Comment 32•7 years ago (Assignee)
Landed:
- mozilla-release, mozilla-esr60, mozilla-beta -- should be OK to go
- comm-central -- should be OK to go
Comment 33•7 years ago (Assignee)
Comment 34•7 years ago (Assignee)
Landed:
- try
- nss, nss-try
- version-control-tools (several days ago)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4b57a3fde1db3d9c36f1a77d270815f498034b1c came in during the moments when nothing was watching, so I triggered it manually with payload
payload:
  data:
    source: serve
    pushlog_pushes:
      - time: 1552578644
        push_full_json_url: https://hg.mozilla.org/try/json-pushes?version=2&full=1&startID=341610&endID=341611
        pushid: 341611
        push_json_url: https://hg.mozilla.org/try/json-pushes?version=2&startID=341610&endID=341611
        user: ytausky@mozilla.com
    heads:
      - 4b57a3fde1db3d9c36f1a77d270815f498034b1c
    repo_url: https://hg.mozilla.org/try
  type: changegroup.1
based on what I saw in pulse inspector.
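For future missed pushes, a payload of this shape could be assembled from the push details rather than typed by hand. The sketch below reproduces the field names and URL forms from the payload above; the nesting (`type` beside `data` under `payload`) is an assumption about the message layout, the start ID is assumed to be the push ID minus one as in the example, and the helper name is made up.

```python
from typing import Any, Dict


def manual_push_payload(
    repo_url: str, push_id: int, head: str, user: str, time: int
) -> Dict[str, Any]:
    """Build a trigger payload mimicking a single-push pulse message."""
    base = f"{repo_url}/json-pushes?version=2"
    # Assumption: a one-push range, startID being the previous push ID.
    id_range = f"startID={push_id - 1}&endID={push_id}"
    return {
        "payload": {
            "type": "changegroup.1",
            "data": {
                "source": "serve",
                "repo_url": repo_url,
                "heads": [head],
                "pushlog_pushes": [
                    {
                        "pushid": push_id,
                        "time": time,
                        "user": user,
                        "push_json_url": f"{base}&{id_range}",
                        "push_full_json_url": f"{base}&full=1&{id_range}",
                    }
                ],
            },
        }
    }
```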
nss-try failed with "does not have sufficient scopes and are missing the following scopes:\n\n\nqueue:scheduler-id:nss-level-1\n"
Comment 35•7 years ago (Assignee)
So, the remainder is
- comm-esr60, comm-beta -- waiting for uplift in bug ..??
- try-comm-central -- waiting on bug 1534204
Jorg, is there anything I can do to help those things along?
Comment 36•7 years ago
I'm sure you can ;-) - This is the first time I see this bug. In comment #32 we have "Landed: comm-central -- should be OK to go". Umm, what has landed where? I usually handle uplifts for c-* repositories unless they need a heavy rebase, but I'd need to see the C-C landing first.
Comment 37•7 years ago (Assignee)
The in-repo changes were landed in bug 1525072. The "landed" to which I referred in comment 32 was enabling the functionality that required those changes (an out-of-tree change).
I thought bug 1525072 comment 20 was part of an uplift, and it looks from bug 1525072 comment 28 like that's done. A look at https://hg.mozilla.org/releases/comm-beta/file/tip/.taskcluster.yml suggests it's uplifted to comm-beta, too. So perhaps that's ready to go! I will try it and we'll see what happens.
For try-comm-central, I suppose the thing I can help with is to make a patch for my suggestion :)
Comment 38•7 years ago (Assignee)
Landed for comm-beta and comm-esr60.
Comment 39•7 years ago
Yes, bug 1525072 was landed on c-c, c-b and c-esr60.
Sorry, I'm not a releng guy, so I don't understand what would be needed for try-comm-central.
Comment 40•7 years ago (Assignee)
- try-comm-central -- waiting on bug 1534204
this isn't actually blocking -- the try issues in that bug were already present (and hopefully now fixed). So I've landed the change for this repo too.