Closed Bug 1443974 Opened 2 years ago Closed 2 years ago

Add ability to run chemspill-related jobs at a higher priority

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set

Tracking

(firefox61 fixed)

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: coop, Assigned: dustin)

References

Details

Attachments

(1 file)

Release tasks (including chemspills) already run at the highest priority, but the Try pushes used to fix those initial chemspill bugs do not. This could extend the time it takes us to get a chemspill release out the door.

Bug 1271677 suggests we already have the scopes to allow these jobs to run at an elevated priority, it's a matter of:

1. Handing out those scopes to a subset of useful people. I'd suggest relman + sheriffs.

2. Possibly having a way to post-facto adjust priorities for jobs that are already in the system but haven't been scheduled yet. e.g. Developer pushes a chemspill fix to Try as soon as it's ready and then sheriffs can go increase the priority on the resultant task graph. This avoids needing to figure out in advance which developers should get this scope.
I think the plan we developed in Austin was to grant higher-priority access to try (access is granted to projects, not to people, so we can't distinguish).  Then we'll add a tryselect option that will schedule the try jobs at that higher priority, and encourage people not to abuse it for non-chemspill issues.

Tasks are immutable by design, so we can't adjust priority after they're created.  We could potentially cancel the task and re-create it with a higher priority, but that seems quite a bit more complicated.

The only issue I see is, much like an hg push with "Bug XYZ: .." where XYZ is not public, this is a big red flag that the try push is "interesting" to Mal.  Mercurial makes a pulse message for every push, and it's not hard to set up a listener that looks at the tryselect config for each one.  I don't think there's any simple fix for this, since we operate in the open.  In past similar discussions we've reluctantly settled on that conclusion.

So, does this sound like a suitable fix?  I still need to look up the priorities available to each branch and figure out what to add to try -- should such try jobs outstrip just level-2, or level-3, or releases?
Assignee: nobody → dustin
Component: General → Task Configuration
Product: Taskcluster → Firefox Build System
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #1) 
> So, does this sound like a suitable fix?  I still need to look up the
> priorities available to each branch and figure out what to add to try --
> should such try jobs outstrip just level-2, or level-3, or releases?

The security issue is longstanding and I don't think we're going to solve it here. Getting fixes out the door faster helps more than a new architecture for now.

If we need to enable higher-priority access to Try to make this work, I think that's OK. If we can also restrict it to a subset of people, that would be better. Since we have true 24/7 sheriff coverage now, sheriffs could be in the loop for chemspills and could be the ones who push patches to Try for developers at highest priority.

My concern here is abuse. If we can't limit priority bumps to a subset of people, we need some sort of logging/tracking of priority escalation so I can have words with the cheaters.
(In reply to Chris Cooper [:coop] from comment #2)
> (In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #1) 
> > should such try jobs outstrip just level-2, or level-3, or releases?

...and to explicitly answer this question, I think priority escalation on Try should outstrip everything else *except* releases. 

If we're in a chemspill situation, we shouldn't be doing other releases simultaneously anyway, and this will prevent the aforementioned abuse scenario from interfering with non-chemspill releases.
All pushes to try are treated alike, so there's no mechanism to give one person's push different scopes than another.  We can certainly look for abuse in the pushlog, but I'm not especially worried -- everyone with push permissions already has 1,000 ways to abuse, and for the most part don't, so I think we'll be OK with #1,001.
OK, here's the current allowed levels for each workerType:

workerType                                 | v-low  . low  . medium  . high  . v-high  . highest 
aws-provisioner-v1/ami-test*               |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/android-api-*           |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/b2gbuild*               |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/b2gtest*                |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/balrog                  |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/dbg-*                   |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/desktop-test*           |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/flame-kk*               |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/gecko-1-*               |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/gecko-2-*               |  23    .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/gecko-3-*               |  3     .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/gecko-decision          |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/gecko-images            |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/gecko-misc              |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/gecko-symbol-upload     |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/gecko-t-*               |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/loan-1-*                |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/loan-t-*                |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/mulet-debug             |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/mulet-opt               |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/opt-*                   |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/rustbuild               |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/spidermonkey            |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/symbol-upload           |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/taskcluster-images      |  123   .  3   .  3      .  3    .  3      .  3      
aws-provisioner-v1/testdroid-device        |  23    .  3   .  3      .  3    .  3      .  3      
buildbot-bridge/buildbot-bridge            |  123   .  3   .  3      .  3    .  3      .  3      
dummy-test-provisioner/dummy-test-type     |  123   .  3   .  3      .  3    .  3      .  3      
gecko-t-tc-worker/*                        |  123   .  3   .  3      .  3    .  3      .  3      
localprovisioner/*                         |  123   .  3   .  3      .  3    .  3      .  3      
manual-packet/tc-worker-qemu-v1            |  123   .      .         .       .         .         
null-provisioner/buildbot                  |  123   .  3   .  3      .  3    .  3      .  3      
null-provisioner/buildbot-try              |  123   .  3   .  3      .  3    .  3      .  3      
null-provisioner/human-breakpoint          |  3     .  3   .  3      .  3    .  3      .  3      
packetnet/*                                |  123   .  3   .  3      .  3    .  3      .  3      
proj-autophone/*                           |  123   .  123 .  123    .       .         .         
proj-awfy/*                                |  123   .  123 .  123    .       .         .         
releng-hardware/gecko-b-*                  |  123   .  123 .  123    .  123  .  123    .         
releng-hardware/gecko-t-*                  |  123   .  3   .  3      .  3    .  3      .  3      
releng-hardware/my-talos                   |  123   .      .         .       .         .         
scl3-puppet/os-x-10-10-gw                  |  123   .  3   .  3      .  3    .  3      .  3      
scl3-puppet/os-x-build-gw                  |  123   .  3   .  3      .  3    .  3      .  3      
scriptworker-prov-v1/balrog-dev            |  123   .  123 .         .       .         .         
scriptworker-prov-v1/beetmoverworker-dev   |  123   .  123 .         .       .         .         
scriptworker-prov-v1/dep-pushapk           |  123   .  3   .  3      .  3    .  3      .  3      
scriptworker-prov-v1/depsigning            |  123   .  3   .  3      .  3    .  3      .  3      
scriptworker-prov-v1/dummy-worker-transpar |  3     .  3   .  3      .  3    .  3      .  3      
scriptworker-prov-v1/pushapk-v1            |  3     .  3   .  3      .  3    .  3      .  3      
scriptworker-prov-v1/shipit-dev            |  123   .      .         .       .         .         
scriptworker-prov-v1/signing-linux-dev     |  3     .  3   .  3      .  3    .  3      .  3      
tc-worker-provisioner/*                    |  123   .  3   .  3      .  3    .  3      .  3      
test-dummy-provisioner/*                   |  123   .  3   .  3      .  3    .  3      .  3
releng-hardware/gecko-b-*                  |  123   .  123 .  123    .  123  .  123    .         
 ^^ no such thing.. I removed this (it was also unnecessarily specified at every priority)
So, looking at that chart, I think what we could do is to replace everywhere that try is only allowed at "very-low" with allowing try at "low", but continue to create tasks at "very-low" by default.  The result would be

workerType                                 | v-low  . low  . medium  . high  . v-high  . highest 
aws-provisioner-v1/ami-test*               |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/android-api-*           |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/b2gbuild*               |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/b2gtest*                |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/balrog                  |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/dbg-*                   |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/desktop-test*           |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/flame-kk*               |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/gecko-1-*               |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/gecko-2-*               |  23    .  23  .  3      .  3    .  3      .  3          
aws-provisioner-v1/gecko-3-*               |  3     .  3   .  3      .  3    .  3      .  3          
aws-provisioner-v1/gecko-decision          |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/gecko-images            |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/gecko-misc              |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/gecko-symbol-upload     |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/gecko-t-*               |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/loan-1-*                |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/loan-t-*                |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/mulet-debug             |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/mulet-opt               |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/opt-*                   |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/rustbuild               |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/spidermonkey            |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/symbol-upload           |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/taskcluster-images      |  123   .  123 .  3      .  3    .  3      .  3          
aws-provisioner-v1/testdroid-device        |  23    .  23  .  3      .  3    .  3      .  3          
buildbot-bridge/buildbot-bridge            |  123   .  123 .  3      .  3    .  3      .  3          
dummy-test-provisioner/dummy-test-type     |  123   .  123 .  3      .  3    .  3      .  3          
gecko-t-tc-worker/*                        |  123   .  123 .  3      .  3    .  3      .  3          
localprovisioner/*                         |  123   .  123 .  3      .  3    .  3      .  3          
manual-packet/tc-worker-qemu-v1            |  123   .  123 .         .       .         .             
null-provisioner/buildbot                  |  123   .  123 .  3      .  3    .  3      .  3          
null-provisioner/buildbot-try              |  123   .  123 .  3      .  3    .  3      .  3          
null-provisioner/human-breakpoint          |  3     .  3   .  3      .  3    .  3      .  3          
packetnet/*                                |  123   .  123 .  3      .  3    .  3      .  3          
proj-autophone/*                           |  123   .  123 .  123    .       .         .             
proj-awfy/*                                |  123   .  123 .  123    .       .         .             
releng-hardware/gecko-t-*                  |  123   .  123 .  3      .  3    .  3      .  3          
releng-hardware/my-talos                   |  123   .      .         .       .         .             
scl3-puppet/os-x-10-10-gw                  |  123   .  3   .  3      .  3    .  3      .  3          
scl3-puppet/os-x-build-gw                  |  123   .  3   .  3      .  3    .  3      .  3          
scriptworker-prov-v1/balrog-dev            |  123   .  123 .         .       .         .             
scriptworker-prov-v1/beetmoverworker-dev   |  123   .  123 .         .       .         .             
scriptworker-prov-v1/dep-pushapk           |  123   .  123 .  3      .  3    .  3      .  3          
scriptworker-prov-v1/depsigning            |  123   .  123 .  3      .  3    .  3      .  3          
scriptworker-prov-v1/dummy-worker-transpar |  3     .  3   .  3      .  3    .  3      .  3          
scriptworker-prov-v1/pushapk-v1            |  3     .  3   .  3      .  3    .  3      .  3          
scriptworker-prov-v1/shipit-dev            |  123   .      .         .       .         .             
scriptworker-prov-v1/signing-linux-dev     |  3     .  3   .  3      .  3    .  3      .  3                                                                                              
tc-worker-provisioner/*                    |  123   .  123 .  3      .  3    .  3      .  3          
test-dummy-provisioner/*                   |  123   .  123 .  3      .  3    .  3      .  3

Once that's done, we can land some tryselect support for running try jobs at "low" instead of "very-low", and just not publicize it very widely.

Coop, does this sound good?
Flags: needinfo?(coop)
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #7)
> Once that's done, we can land some tryselect support for running try jobs at
> "low" instead of "very-low", and just not publicize it very widely.
> 
> Coop, does this sound good?

I am fine with this.
Flags: needinfo?(coop)
Duplicate of this bug: 1329235
OK, according to my calculations, these are the role changes necessary:

level 1
+ aws-provisioner-v1/ami-test*:low
- aws-provisioner-v1/ami-test*:very-low
+ aws-provisioner-v1/android-api-*:low
- aws-provisioner-v1/android-api-*:very-low
+ aws-provisioner-v1/b2gbuild*:low
- aws-provisioner-v1/b2gbuild*:very-low
+ aws-provisioner-v1/b2gtest*:low
- aws-provisioner-v1/b2gtest*:very-low
+ aws-provisioner-v1/balrog:low
- aws-provisioner-v1/balrog:very-low
+ aws-provisioner-v1/dbg-*:low
- aws-provisioner-v1/dbg-*:very-low
+ aws-provisioner-v1/desktop-test*:low
- aws-provisioner-v1/desktop-test*:very-low
+ aws-provisioner-v1/flame-kk*:low
- aws-provisioner-v1/flame-kk*:very-low
+ aws-provisioner-v1/gecko-1-*:low
- aws-provisioner-v1/gecko-1-*:very-low
+ aws-provisioner-v1/gecko-decision:low
- aws-provisioner-v1/gecko-decision:very-low
+ aws-provisioner-v1/gecko-images:low
- aws-provisioner-v1/gecko-images:very-low
+ aws-provisioner-v1/gecko-misc:low
- aws-provisioner-v1/gecko-misc:very-low
+ aws-provisioner-v1/gecko-symbol-upload:low
- aws-provisioner-v1/gecko-symbol-upload:very-low
+ aws-provisioner-v1/gecko-t-*:low
- aws-provisioner-v1/gecko-t-*:very-low
+ aws-provisioner-v1/loan-1-*:low
- aws-provisioner-v1/loan-1-*:very-low
+ aws-provisioner-v1/loan-t-*:low
- aws-provisioner-v1/loan-t-*:very-low
+ aws-provisioner-v1/mulet-debug:low
- aws-provisioner-v1/mulet-debug:very-low
+ aws-provisioner-v1/mulet-opt:low
- aws-provisioner-v1/mulet-opt:very-low
+ aws-provisioner-v1/opt-*:low
- aws-provisioner-v1/opt-*:very-low
+ aws-provisioner-v1/rustbuild:low
- aws-provisioner-v1/rustbuild:very-low
+ aws-provisioner-v1/spidermonkey:low
- aws-provisioner-v1/spidermonkey:very-low
+ aws-provisioner-v1/symbol-upload:low
- aws-provisioner-v1/symbol-upload:very-low
+ aws-provisioner-v1/taskcluster-images:low
- aws-provisioner-v1/taskcluster-images:very-low
+ buildbot-bridge/buildbot-bridge:low
- buildbot-bridge/buildbot-bridge:very-low
+ dummy-test-provisioner/dummy-test-type:low
- dummy-test-provisioner/dummy-test-type:very-low
+ gecko-t-tc-worker/*:low
- gecko-t-tc-worker/*:very-low
+ localprovisioner/*:low
- localprovisioner/*:very-low
+ manual-packet/tc-worker-qemu-v1:low
- manual-packet/tc-worker-qemu-v1:very-low
+ null-provisioner/buildbot-try:low
- null-provisioner/buildbot-try:very-low
+ null-provisioner/buildbot:low
- null-provisioner/buildbot:very-low
+ packetnet/*:low
- packetnet/*:very-low
- releng-hardware/gecko-b-*:high
- releng-hardware/gecko-b-*:low
- releng-hardware/gecko-b-*:medium
- releng-hardware/gecko-b-*:very-low
+ releng-hardware/gecko-t-*:low
- releng-hardware/gecko-t-*:very-low
+ releng-hardware/my-talos:low
- releng-hardware/my-talos:very-low
+ scl3-puppet/os-x-10-10-gw:low
- scl3-puppet/os-x-10-10-gw:very-low
+ scl3-puppet/os-x-build-gw:low
- scl3-puppet/os-x-build-gw:very-low
+ scriptworker-prov-v1/dep-pushapk:low
- scriptworker-prov-v1/dep-pushapk:very-low
+ scriptworker-prov-v1/depsigning:low
- scriptworker-prov-v1/depsigning:very-low
+ scriptworker-prov-v1/shipit-dev:low
- scriptworker-prov-v1/shipit-dev:very-low
+ tc-worker-provisioner/*:low
- tc-worker-provisioner/*:very-low
+ test-dummy-provisioner/*:low
- test-dummy-provisioner/*:very-low
level 2
- aws-provisioner-v1/ami-test*:very-low
- aws-provisioner-v1/android-api-*:very-low
- aws-provisioner-v1/b2gbuild*:very-low
- aws-provisioner-v1/b2gtest*:very-low
- aws-provisioner-v1/balrog:very-low
- aws-provisioner-v1/dbg-*:very-low
- aws-provisioner-v1/desktop-test*:very-low
- aws-provisioner-v1/flame-kk*:very-low
- aws-provisioner-v1/gecko-1-*:very-low
+ aws-provisioner-v1/gecko-2-*:low
- aws-provisioner-v1/gecko-2-*:very-low
- aws-provisioner-v1/gecko-decision:very-low
- aws-provisioner-v1/gecko-images:very-low
- aws-provisioner-v1/gecko-misc:very-low
- aws-provisioner-v1/gecko-symbol-upload:very-low
- aws-provisioner-v1/gecko-t-*:very-low
- aws-provisioner-v1/loan-1-*:very-low
- aws-provisioner-v1/loan-t-*:very-low
- aws-provisioner-v1/mulet-debug:very-low
- aws-provisioner-v1/mulet-opt:very-low
- aws-provisioner-v1/opt-*:very-low
- aws-provisioner-v1/rustbuild:very-low
- aws-provisioner-v1/spidermonkey:very-low
- aws-provisioner-v1/symbol-upload:very-low
- aws-provisioner-v1/taskcluster-images:very-low
+ aws-provisioner-v1/testdroid-device:low
- aws-provisioner-v1/testdroid-device:very-low
- buildbot-bridge/buildbot-bridge:very-low
- dummy-test-provisioner/dummy-test-type:very-low
- gecko-t-tc-worker/*:very-low
- localprovisioner/*:very-low
- null-provisioner/buildbot-try:very-low
- null-provisioner/buildbot:very-low
- packetnet/*:very-low
- releng-hardware/gecko-t-*:very-low
- scl3-puppet/os-x-10-10-gw:very-low
- scl3-puppet/os-x-build-gw:very-low
- tc-worker-provisioner/*:very-low
- test-dummy-provisioner/*:very-low

The level-1 changes are largely changing very-low to low.

The level-2 role currently has some redundant scopes, since it inherits level 1.  So those will get removed, with the exception of gecko-2-* and testdroid-device (which level 1 doesn't have).

I won't land this change today, though.
It would probably be good to have a way to track usage of this so we can sanity-check that it's not being abused. I suspect that it will be fine, but we do grant L1 access very freely, so I would certainly feel better if we had a way to notice usage of this feature increasing. It would not be the end of the world to add additional restrictions here in the future like, "only people with L3 access can use this feature".
End of the world, no.. difficult, yes :)

I think that interested parties could pretty easily write something against the TH API to see how often this is used.  Likely "security researchers" will be writing such a tool, so we could just borrow theirs :)
landed for level 1
Landed for level 2.  All that remains is to build the in-tree stuff.
Comment on attachment 8964386 [details]
Bug 1443974: add a template to run try jobs at 'low' priority;

https://reviewboard.mozilla.org/r/233106/#review238780

Lgtm!

::: tools/tryselect/selectors/fuzzy.py:94
(Diff revision 1)
>            'default': False,
>            'help': "Update fzf before running.",
>            }],
>      ]
>      common_groups = ['push', 'task', 'preset']
> -    templates = ['artifact', 'path', 'env', 'rebuild']
> +    templates = ['artifact', 'path', 'env', 'rebuild', 'chemspill-prio']

Just throwing out ideas, but what if we made `chemspill` a top-level subcommand?

    $ mach try chemspill
    
It could set this template as well as do things like turn off artifact builds, run all tasks by default (and dispatch to `fuzzy` if a subset is desired), and anything else that might be useful. I don't know how people normally test chemspills, so these are just examples but you get the idea.

This would definitely be follow-up fodder.
Attachment #8964386 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8964386 [details]
Bug 1443974: add a template to run try jobs at 'low' priority;

https://reviewboard.mozilla.org/r/233106/#review238780

> Just throwing out ideas, but what if we made `chemspill` a top-level subcommand?
> 
>     $ mach try chemspill
>     
> It could set this template as well as do things like turn off artifact builds, run all tasks by default (and dispatch to `fuzzy` if a subset is desired), and anything else that might be useful. I don't know how people normally test chemspills, so these are just examples but you get the idea.
> 
> This would definitely be follow-up fodder.

We could also hardcode a set of tasks that should *always* be run when testing chemspills (assuming such tasks exist).
If those are useful to people who test chemspills, that sounds great.  I'm not one of those people, so I have no idea.  I would expect that every chemspill is a little different, and there's no "chemspill" test suite.
Pushed by dmitchell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cf8ffb2a11f2
add a template to run try jobs at 'low' priority; r=ahal
https://hg.mozilla.org/mozilla-central/rev/cf8ffb2a11f2
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in before you can comment on or make changes to this bug.