Closed Bug 1658174 Opened 4 years ago Closed 2 years ago

Backfills of tasks with not-yet-built dependencies build those dependencies multiple times when multiple runs per push are requested; only 1 dependency build is needed

Categories

(Firefox Build System :: Task Configuration, defect)


Tracking

(firefox103 fixed)

RESOLVED FIXED
103 Branch

People

(Reporter: aryx, Assigned: ahal)

Attachments

(1 file)

See the Windows instr tasks in this Treeherder view, taken after a backfill of the Windows AArch R1 reftest with 3 runs per push had been requested.

The dependencies need to be built only once, and their artifacts can be used by all reftest runs of the push. The current behavior wastes money.
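To put rough numbers on the waste, assume a test with D dependency tasks is backfilled with N runs per push, and that each run rebuilds its dependencies as the bug title describes. D is a hypothetical placeholder, since the real count depends on the platform and the build chain:

```latex
\begin{aligned}
\text{each run rebuilds its dependencies:} \quad & N\,(D + 1) \text{ tasks} \\
\text{dependencies built once, test run } N \text{ times:} \quad & D + N \text{ tasks} \\
\text{redundant dependency tasks:} \quad & N(D + 1) - (D + N) = (N - 1)\,D
\end{aligned}
```

With the 3 runs per push requested here, that is 2D redundant dependency tasks. With 10 runs per push it becomes 9D.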

Is this a Taskcluster issue, since it looks like a dependency graph problem?

This can lead to some pretty bad backlogs on resource-constrained hardware pools. The current logic seems incredibly wasteful when perf sheriffs are doing backfills. Can we find a way to prioritize fixing this?

Flags: needinfo?(cbond)

Ryan, do you know of a recent example of one of these backlogs happening?

I'm wondering if the new backfill action the perftest team set up might be the cause here, assuming you've noticed it more frequently in recent weeks.

(Also, the tasks in comment 0 have expired, so it would help to have a concrete task to use for debugging.)

Flags: needinfo?(cbond) → needinfo?(ryanvm)

There was discussion about it a couple times in #sheriffs last week. I don't know how to trace that back to a specific backfill job triggered by the perf sheriffs.

Flags: needinfo?(ryanvm)

Maybe Aryx can help?

Flags: needinfo?(aryx.bugmail)

In this case the backfills were for pushes from 2 weeks ago and had been requested by performance sheriff afinder (task). The issue only causes a noticeable backlog when worker pools with a very restricted size (like macOS or Android) are used; see the multiple runs for Linux and Windows.

Subjectively, there has been no increase in backlogs.

Flags: needinfo?(aryx.bugmail)

Thanks, I found this push which I think illustrates the problem most clearly:
https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&revision=5f7167a8686ef3c397b9df72eca7528fdd8acadb&searchStr=osx

Looks like this happens with retriggers too, and only for perf (or at least talos) tasks.

Retriggers should not be affected because the dependencies are already available. These are backfills which get requested to run 10 times.

Ah, yeah I see the problem. Got confused because they didn't have -bk in the symbol.

Assignee: nobody → ahal
Status: NEW → ASSIGNED

Previously we were looping over 'times' and submitting N separate graphs, one
per iteration. This is inefficient because it means that we also rerun
dependencies that many times.

This patch fixes this by instead setting the task_duplicates attribute on
every task we are trying to backfill, which ensures the dependencies only
run once.
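The difference between the two approaches can be sketched with a toy model. It assumes a plain label-to-dependencies mapping. `GRAPH`, its task labels, and the helper functions are hypothetical and are not taskgraph's actual API. Raising the target's count in the single graph stands in for the task_duplicates attribute the patch sets.

```python
from collections import Counter

# Hypothetical dependency graph: task label -> labels it depends on.
# The labels are illustrative only, not real Firefox CI task labels.
GRAPH = {
    "toolchain": [],
    "build-instr": ["toolchain"],
    "build": ["build-instr"],
    "reftest-r1": ["build"],
}


def with_dependencies(target, graph):
    """Return the target label plus all of its transitive dependencies."""
    seen, stack = set(), [target]
    while stack:
        label = stack.pop()
        if label not in seen:
            seen.add(label)
            stack.extend(graph[label])
    return seen


def backfill_old(target, times, graph):
    """Old behaviour: loop over `times` and submit a separate graph on each
    pass, so every dependency is scheduled again each time."""
    scheduled = Counter()
    for _ in range(times):
        scheduled.update(with_dependencies(target, graph))
    return scheduled


def backfill_new(target, times, graph):
    """Fixed behaviour (sketch): submit one graph in which only the target
    is duplicated `times` times, so each dependency is scheduled once."""
    scheduled = Counter(with_dependencies(target, graph))
    scheduled[target] = times  # stands in for the task_duplicates attribute
    return scheduled


if __name__ == "__main__":
    old = backfill_old("reftest-r1", 3, GRAPH)
    new = backfill_new("reftest-r1", 3, GRAPH)
    print("old total:", sum(old.values()))
    print("new total:", sum(new.values()))
```

With three runs per push, the old loop schedules 12 tasks because each of the three dependencies runs three times. The single-graph version schedules 6: each dependency once and the reftest three times.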

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0abfb85ba5b4
[taskgraph] Ensure backfills with 'times' only apply to desired tasks, r=gabriel,taskgraph-reviewers
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 103 Branch