Closed Bug 1666809 Opened 5 years ago Closed 5 years ago

Prevent multiple backstops in a row

Categories

(Firefox Build System :: Task Configuration, task, P1)

task

Tracking

(firefox83 fixed)

RESOLVED FIXED
83 Branch
Tracking Status
firefox83 --- fixed

People

(Reporter: ahal, Assigned: ahal)

Details

Attachments

(4 files)

The decision task updates the taskcluster index to find backstop pushes after the time interval has been surpassed, though the index is updated via an "index-task" that runs after the decision task completes.

This means if we've reached the time interval, and two pushes come in at roughly the same time (such that push 2's Decision task starts before push 1's finishes), then they will both become backstops.

This seems to happen fairly often in practice.

One idea to solve this is to call this taskcluster API directly from the decision task, immediately after the parameters have been resolved:
https://docs.taskcluster.net/docs/reference/core/index/api#insertTask

Then the odds of this happening should be very very small (though still theoretically possible). Tolerating double backstops once in a blue moon is reasonable.

Assignee: nobody → ahal
Severity: -- → S2
Status: NEW → ASSIGNED
Priority: -- → P1

:bstack, are there any gotchas with calling taskcluster API for updating the index? I cannot think of any, but wanted to ask.

Flags: needinfo?(bstack)

Ah, this sounds like it could be related to our recent backlog of b-linux and b-win2012 worker pools?

Possibly, would definitely be a contributing factor at least. I should hopefully have a patch soon.

As a short term work around, we could bump the backstop interval up by a couple hours to avoid this happening more frequently. Let me know if you think that is necessary, otherwise I'll aim to have this fixed this week.

We started using the "backstop" index added by bug 1660506 to determine whether
a push should be a backstop based on a time interval. The problem is that this
index gets added by an index-task that runs after the decision task has
completed. Therefore, if two pushes land at roughly the same time (i.e, the
second decision task starts before the first completes), then they can both
determine themselves as backstops.

This patch gets around the problem by inserting the "backstop" index as early
as possible (immediately after resolving parameters), so the chances of this
happening become very low. It's still theoretically possible that it could
happen again, but we don't need this to be 100% perfect. As long as it is rare,
it's good enough.

Depends on D91191

I think there shouldn't be any additional gotchas to calling it directly. iiuc what is being described here it makes sense!

Flags: needinfo?(bstack)

This appears to work, but we may not want to take it if there are security implications around the additional scope to the decision task, or using the proxy from the decision task. For this approach:
PROS:

  • Makes no assumptions about what can / can't be a backstop, gives us flexibility
  • Simpler logic for finding backstop and fewer requests made
  • Already implemented :p

CONS:

  • Race condition still exists (though should be rare)
  • Possible security implications?

Aki proposed an alternate approach where we first find the last backstop from the index, and then use the pushlog to determine if any of the pushes in-between that and this should be a backstop (but just haven't been added to the index yet).

PROS:

  • Race condition eliminated
  • Definitely safe

CONS:

  • Limits backstop determination to data contained in pushlog
  • More complicated / adds an extra request to pushlog

If it makes a difference, the decision task also uses the proxy service to submit the tasks:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/create.py#123

So this wouldn't be the first use of it in the decision task.

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8a5cfdf692af Allow HTTP methods other than GET and POST in taskgraph/util/taskcluster.py, r=taskgraph-reviewers,aki https://hg.mozilla.org/integration/autoland/rev/8289b6083a61 Insert decision task indexes directly via taskluster API rather than index-task, r=taskgraph-reviewers,aki https://hg.mozilla.org/integration/autoland/rev/a17a40e40177 [taskgraph] Refactor test_util_backstop.py to make use of parametrization, r=taskgraph-reviewers,aki https://hg.mozilla.org/integration/autoland/rev/569d61169121 [taskgraph] Consider pushes a backstop if the last backstop had a broken decision task, r=taskgraph-reviewers,aki
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: