Closed Bug 1555032 Opened 5 years ago Closed 5 years ago

Support prototyping external scheduling algorithms (aka optimizers)

Tracking

(firefox69 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla69

Tracking Flags:

Tracking

Status

firefox69

---

fixed

People

(Reporter: ahal, Assigned: ahal)

Details

Attachments

(1 file, 2 obsolete files)

Bug 1555032 - [taskgraph] Split strategies out of taskgraph.optimize 5 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1555032 - [taskgraph] Allow defining optimization strategies in an environment variable, r?tomprince 5 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1555032 - [taskgraph] Add ability to pass external optimize strategies via env, r?tomprince 5 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request		Details \| Review

Andrew Halberstadt [:ahal]

Assignee

Description

•

5 years ago

Problem Statement

Our current scheduling algorithm SETA has served us well over the years, but it is very naive and we believe we can improve the ratio of regressions caught / hours spent running tasks. We believe things like code coverage, inspecting task queues and even simple tweaks to SETA's parameters can help.

The problem is that we have no way of testing our theories. The best we can do is:

Gather a bunch of data with the current algorithm.
Make a change to the algorithm.
Wait a month or two.
Gather a different bunch of data.
Compare how each did.

This is problematic because:

A. The turnaround time for testing changes is months.
B. We are using different data sets to compare, so the comparison might not be fair.

We need an established way to test novel optimizers.

Possible Solution #1 - Shadow Scheduling

This is the simplest method. We could create one or more dummy decision task(s) that run(s) on every push (but doesn't schedule anything). These would emit artifacts that say which tasks would have been scheduled, had the associated scheduling algorithm been used. We can then download these artifacts and compare them to what actually got scheduled.

The downside here is that this only fixes problem B. We would still have month+ long turn-around times for testing out new algorithms. It also introduces a new problem:

C. If an algorithm is tweaked, any data that spans before+after that change will be invalid. If we have multiple algorithms being updated at various times, it might become easy to lose track of which ranges are valid for which algorithms.

Possible Solution #2 - Retroactive Analysis

I think the better approach is to build a mechanism that lets us plug in "external" algorithms and see how they would have performed on past data. To avoid bit-rot, these "test" algorithms would need to live outside of the tree, but still be able to get "plugged in" to the taskgraph.

Running an analysis this way would be a bit more complicated, but I think it will be worth it. I envision creating a script which:

Accepts one or more test algorithms as input.
Uses ActiveData to get push information (like parent revision).
Physically updates a clone of mozilla-central to each push head in the desired range.
Runs |mach taskgraph| (with the experimental schedulers applied) to get list of tasks that would be scheduled.
Does the comparison.

Such an analysis might take a few hours to complete (depending on the range), but it could be run in an on demand try-only task only when necessary. This would give us a turn around of ~half a day for testing new scheduling hypotheses.

I already have a patch that allows me to support option #2 (which I'll upload shortly). This comment is meant to provide some context around why I'm proposing it.

Marco Castelluccio [:marco]

Comment 1

•

5 years ago

I think in order for 2 to be effective, we also need easy ways (maybe more recipes?) to get past data from ActiveData. Some algorithms may use data from past failures and code changes in order to schedule tasks.

Andrew Halberstadt [:ahal]

Assignee

Comment 2

•

5 years ago

Attached file Bug 1555032 - [taskgraph] Split strategies out of taskgraph.optimize (obsolete) — Details

Andrew Halberstadt [:ahal]

Assignee

Comment 3

•

5 years ago

Attached file Bug 1555032 - [taskgraph] Allow defining optimization strategies in an environment variable, r?tomprince (obsolete) — Details

Depends on D32853

Andrew Halberstadt [:ahal]

Assignee

Comment 4

•

5 years ago

(In reply to Marco Castelluccio [:marco] from comment #1)

I think in order for 2 to be effective, we also need easy ways (maybe more recipes?) to get past data from ActiveData. Some algorithms may use data from past failures and code changes in order to schedule tasks.

Yep, I have a prototype that (mostly) solves this. See:
https://github.com/ahal/ci-recipes/blob/master/recipes/push_health.py

My plan is to set up a Redis server that caches the AD results so we don't need to hit it over and over again for every single push.

Andrew Halberstadt [:ahal]

Assignee

Comment 5

•

5 years ago

Hey Dustin, Tom.

I've attached my attempt at option #2. I kind of expect you to not be thrilled about it (see comment 0 for context). I could potentially accomplish this via monkeypatching.. but would prefer something more properly supported.

Feedback greatly appreciated, I'm open to any and all suggestions!

Flags: needinfo?(mozilla)

Flags: needinfo?(dustin)

Andrew Halberstadt [:ahal]

Assignee

Comment 6

•

5 years ago

With the above patches, here's how I would test out my custom scheduling algorithm:

Save the following file somewhere on $PYTHONPATH:

custom_scheduler.py

import random

from taskgraph.optimize.base import Either, OptimizationStrategy
from taskgraph.optimize.strategies import (
    IndexSearch,
    SETA,
    SkipUnlessChanged,
    SkipUnlessSchedules,
)


class RandomOptimizer(OptimizationStrategy):
    probability = 0.5  # probability task is optimized away

    def should_remove_task(self, task, params, _):
        if random.random() < self.probability:
            return True
        return False


CUSTOM_STRATEGIES = {
    'never': OptimizationStrategy(),  # "never" is the default behavior
    'index-search': IndexSearch(),
    'seta': RandomOptimizer(),
    'skip-unless-changed': SkipUnlessChanged(),
    'skip-unless-schedules': SkipUnlessSchedules(),
    'skip-unless-schedules-or-seta': Either(SkipUnlessSchedules(), RandomOptimizer()),
}

Run:

$ ./mach taskgraph optimized --strategies "custom_scheduler:CUSTOM_STRATEGIES"

Dustin J. Mitchell [:dustin] (he/him)

Comment 7

•

5 years ago

It looks like this is mainly intended for testing replacements for SETA: skip-unless-changed isn't really a name that suggests multiple competing implementations.

And SETA brings its own testing challenges, as it has its own external state that isn't included in Parameters -- it might give different answers today than it did a month ago (at least, that's my understanding). I'm not sure how to solve that except by "snapshotting" the SETA data.

Anyway, an alternative place to "splice" the alternatives in would be in taskcluster/taskgraph/util/seta.py, perhaps after applying an extra layer of abstraction by renaming the SETA/"seta" optimization strategy to only-if-important or something similarly generic, and likewise seta.py -> importance.py. Then importance.py can have a way to apply a variety of approaches to defining "importance".

For after-the-fact analysis, it's probably easier to just feed an environment variable in, as it avoids threading through a new parameter and command-line arg through a bunch of files. Something like:

FORCE_IMPORTANCE_APPROACH=random ./mach taskgraph optimized

Flags: needinfo?(dustin)

Andrew Halberstadt [:ahal]

Assignee

Comment 8

•

5 years ago

(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #7)

It looks like this is mainly intended for testing replacements for SETA: skip-unless-changed isn't really a name that suggests multiple competing implementations.

Yes, one needs to shadow the same strategy names to avoid needing to update all the configs/transforms. Maybe we can come up with some more generic labels for these, though it's not strictly necessary.

And SETA brings its own testing challenges, as it has its own external state that isn't included in Parameters -- it might give different answers today than it did a month ago (at least, that's my understanding). I'm not sure how to solve that except by "snapshotting" the SETA data.

The SETA results are already frozen in the task-labels.json artifact on every decision task. Though in general, you are correct that any non-deterministic algorithm will have this problem (think about a scheduler that inspects task queues). Using a shadow scheduler would solve this.. maybe there's room for both approaches in the future. For the immediate case, code coverage should be deterministic though.

Anyway, an alternative place to "splice" the alternatives in would be in taskcluster/taskgraph/util/seta.py, perhaps after applying an extra layer of abstraction by renaming the SETA/"seta" optimization strategy to only-if-important or something similarly generic, and likewise seta.py -> importance.py. Then importance.py can have a way to apply a variety of approaches to defining "importance".

SETA is definitely the initial focus, but I'd prefer to prove this out a bit more generically. For example, a question that might come up once we have a robust code coverage scheduler is whether skip-unless-schedules is still necessary or not. I think we'll want to be in a habit of continually testing our scheduling to see what performs best going into the future. I.e this isn't intended to be a one-off analysis with the only goal of replacing SETA.

For after-the-fact analysis, it's probably easier to just feed an environment variable in, as it avoids threading through a new parameter and command-line arg through a bunch of files. Something like:
FORCE_IMPORTANCE_APPROACH=random ./mach taskgraph optimized

Yes, this is simpler. Good idea!

Dustin J. Mitchell [:dustin] (he/him)

Comment 9

•

5 years ago

I don't see a task-labels.json in https://tools.taskcluster.net/groups/YJnsHDOhQWGiav8GfGXIcA/tasks/YJnsHDOhQWGiav8GfGXIcA/runs/0/artifacts for example. But it's good to know SETA data is captured somewhere.

Andrew Halberstadt [:ahal]

Assignee

Comment 10

•

5 years ago

Attached file Bug 1555032 - [taskgraph] Add ability to pass external optimize strategies via env, r?tomprince — Details

This opens the door to swap in external strategies at runtime. This will be
used for back testing experimental strategies.

Phabricator Automation

Updated

•

5 years ago

Attachment #9068079 - Attachment description: Bug 1555032 - [taskgraph] Allow defining optimization strategies in parameters → Bug 1555032 - [taskgraph] Allow defining optimization strategies in an environment variable, r?tomprince

Phabricator Automation

Updated

•

5 years ago

Attachment #9068078 - Attachment is obsolete: true

Andrew Halberstadt [:ahal]

Assignee

Comment 11

•

5 years ago

We can handle feedback in review moving forward.

Flags: needinfo?(mozilla)

Phabricator Automation

Updated

•

5 years ago

Attachment #9068816 - Attachment description: Bug 1555032 - [taskgraph] Allow optimize strategies to be python_path.find_object strings, r?tomprince → Bug 1555032 - [taskgraph] A ability to pass external optimize strategies via env, r?tomprince

Phabricator Automation

Updated

•

5 years ago

Attachment #9068079 - Attachment is obsolete: true

Phabricator Automation

Updated

•

5 years ago

Attachment #9068816 - Attachment description: Bug 1555032 - [taskgraph] A ability to pass external optimize strategies via env, r?tomprince → Bug 1555032 - [taskgraph] Add ability to pass external optimize strategies via env, r?tomprince

Pulsebot

Comment 12

•

5 years ago

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3a5ec7efe864
[taskgraph] Add ability to pass external optimize strategies via env, r=tomprince

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 13

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/3a5ec7efe864

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox69: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla69

You need to log in before you can comment on or make changes to this bug.