Closed Bug 1652086 Opened 4 years ago Closed 2 years ago

task generation performance has significantly regressed

Categories

(Firefox Build System :: Task Configuration, defect)

Tracking

(firefox-esr68 unaffected, firefox-esr78 unaffected, firefox78 unaffected, firefox79 unaffected, firefox80 wontfix)

RESOLVED FIXED

People

(Reporter: froydnj, Assigned: tomprince)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(5 files, 1 obsolete file)

Somewhere between 5766d99b88f379d3eb631085387cc9cbae438b6a and 4e9d6619c9d5a7306f66ca8d6f9d97579ba77b4d, task generation performance (e.g. generating the list of tasks for mach try fuzzy) has gone from "mildly annoying" to "borderline unusable".

Component: General → Task Configuration
Product: Taskcluster → Firefox Build System

I was looking at py-spy output while generating the taskgraph, and found that a
bunch of time was spent in taskgraph.transforms.job.use_fetches[1]. Using a
dictionary there instead saves about 20-30s on my machine.

[1] https://searchfox.org/mozilla-central/rev/622dbd3409610ad3f71b56c9a6a92da905dab0aa/taskcluster/taskgraph/transforms/job/__init__.py#243-247
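
To illustrate the kind of change described above, here is a minimal sketch of replacing a repeated linear scan with a dictionary keyed on the task label. It is not the actual use_fetches code: the Task tuple, find_by_scan, and index_by_label are hypothetical names chosen for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for a taskgraph task; only `label` and `kind` are modelled.
Task = namedtuple("Task", ["label", "kind"])


def find_by_scan(tasks, label):
    """Linear scan: every lookup walks the list until it finds a match."""
    for task in tasks:
        if task.label == label:
            return task
    return None


def index_by_label(tasks):
    """Build the label -> task mapping once; each later lookup is a hash probe."""
    return {task.label: task for task in tasks}


tasks = [Task("fetch-{}".format(i), "fetch") for i in range(1000)]
by_label = index_by_label(tasks)

# Both approaches return the same object; only the cost per lookup differs.
same = find_by_scan(tasks, "fetch-999") is by_label["fetch-999"]
```

When every task resolves several dependencies by label, the scan costs grow with the product of the two counts, whereas the dictionary is built once and each lookup is constant time on average.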

Assignee: nobody → mozilla
Status: NEW → ASSIGNED
Keywords: leave-open

I just ran this through hg bisect and:

The first bad revision is:
changeset: 617179:c90d36eecc9e
user: Gregory Mierzwinski <gmierz2@outlook.com>
date: Wed Jul 08 07:48:16 2020 +0000
summary: Bug 1650871 - Add all browsertime desktop tests. r=perftest-reviewers,AlexandruIonescu

Regressed by: 1650871
Has Regression Range: --- → yes
Depends on: 1652123

Decision tasks are now taking > 10 minutes instead of 3~4. Just for this, until this is sorted out, I think the revision listed in comment 2 should be backed out.

Huh... so bug 1650871 only made decision tasks take 6~7 minutes... what bumped to double that is... https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=5cff164097d9d5eb441d7908ceaf370b865258c1 (a backout!) which makes no sense.
The log for a slow decision task shows something like:

[task 2020-07-10T20:41:47.997Z] optimize: system-symbols-win-upload-symbols kept because of 'never' strategy
[task 2020-07-10T20:49:50.154Z] Retrieving low-value jobs list from SETA

So the bad state of decision tasks would be unrelated.

Python's copy.deepcopy does a bunch of work to handle recursive and otherwise
non-graph-like structures. The way we use it most of the time in taskgraph, we
only care about nested dictionaries and lists, so we can use a simpler
implementation. On my machine, this saves about 15s.
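
A simpler copy along these lines might look like the sketch below. The name simple_deepcopy is hypothetical and this is not necessarily the code that landed; it assumes the data is a tree of dicts and lists whose leaves are immutable values such as strings and numbers.

```python
import copy


def simple_deepcopy(obj):
    """Copy nested dicts and lists; any other value is returned as-is.

    Unlike copy.deepcopy there is no memo table, so a self-referential
    structure would recurse without end, and an object reachable through
    two paths is copied twice instead of staying shared.
    """
    if isinstance(obj, dict):
        return {key: simple_deepcopy(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [simple_deepcopy(value) for value in obj]
    return obj


original = {"worker": {"env": {"A": "1"}, "artifacts": [{"name": "x"}]}}
clone = simple_deepcopy(original)

# Mutating the clone leaves the original untouched.
clone["worker"]["artifacts"][0]["name"] = "y"
```

The saving comes from skipping the bookkeeping copy.deepcopy needs for cycles and arbitrary object types, which is only safe where the inputs are known to have this restricted shape.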

Set release status flags based on info from the regressing bug 1650871

Pushed by mozilla@hocat.ca:
https://hg.mozilla.org/integration/autoland/rev/ab433cee6227
Make `kind_dependencies_tasks` a dictionary based on the label; r=Callek

The docstring for merge claims it returns a new object, without modifying
any of the arguments, but it will (depending on the shape of the inputs)
return an object that shares subobjects with them in some cases.
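
The sharing problem can be reproduced with a toy merge. The functions below are hypothetical and are not taskgraph's merge; they only show how a result built from a fresh top-level dict can still hold references to nested objects from the arguments, and how copying each argument first avoids that.

```python
import copy


def merge_into(dest, source):
    """Recursively merge `source` into `dest` in place; values are stored by reference."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(dest.get(key), dict):
            merge_into(dest[key], value)
        else:
            dest[key] = value
    return dest


def naive_merge(*objects):
    """Builds a new top-level dict, but nested values may be the very objects passed in."""
    result = {}
    for obj in objects:
        merge_into(result, obj)
    return result


def copying_merge(*objects):
    """Copies each argument before merging, so the result shares nothing with its inputs."""
    result = {}
    for obj in objects:
        merge_into(result, copy.deepcopy(obj))
    return result


a = {"env": {"A": "1"}}
b = {"treeherder": {"tier": 1}}

shared = naive_merge(a, b)
safe = copying_merge(a, b)

# The naive result aliases a["env"]; the copying result does not.
aliased = shared["env"] is a["env"]
isolated = safe["env"] is not a["env"]
```

With the naive version, a later in-place change to the merged result (or a further merge into it) can silently alter one of the original arguments.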

I'm planning on replacing voluptuous with something faster in Bug 1652123, but
in the meantime, compiling the schemas is slow, so there is no reason to do so
if we aren't going to check them anyway.
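
One way to avoid paying for compilation when checking is disabled is to defer it until the first real validation. The sketch below uses a toy compiler instead of voluptuous; LazySchema, toy_compile, and the check flag are all hypothetical names and do not reflect the actual taskgraph API.

```python
class LazySchema:
    """Defers an expensive compile step until validation is actually requested."""

    def __init__(self, definition, compile_fn, check=True):
        self._definition = definition
        self._compile_fn = compile_fn
        self._check = check
        self._compiled = None

    def __call__(self, data):
        if not self._check:
            # Checking is disabled, so the schema is never compiled.
            return data
        if self._compiled is None:
            # Compile on first use and cache the validator.
            self._compiled = self._compile_fn(self._definition)
        return self._compiled(data)


compile_calls = []


def toy_compile(definition):
    """Stand-in for a costly schema compiler; records each invocation."""
    compile_calls.append(definition)
    required = set(definition)

    def validate(data):
        missing = required - set(data)
        if missing:
            raise ValueError("missing keys: {}".format(sorted(missing)))
        return data

    return validate


unchecked = LazySchema(("label", "kind"), toy_compile, check=False)
unchecked({"label": "only-a-label"})  # passes through without compiling
calls_when_unchecked = len(compile_calls)

checked = LazySchema(("label", "kind"), toy_compile, check=True)
checked({"label": "a", "kind": "b"})
checked({"label": "c", "kind": "d"})  # reuses the cached validator
calls_when_checked = len(compile_calls)
```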

Pushed by mozilla@hocat.ca:
https://hg.mozilla.org/integration/autoland/rev/e1c5c97858c8
[taskgraph] Don't compile schemas if we aren't going to check them; r=Callek

The leave-open keyword is set and there has been no activity for 6 months.
:ahal, maybe it's time to close this bug?

Flags: needinfo?(ahal)

We should file bugs blocking bug 1617598 for future performance work.

Blocks: 1617598
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(ahal)
Resolution: --- → FIXED
Attachment #9163550 - Attachment is obsolete: true