Closed Bug 848589 Opened 11 years ago Closed 8 years ago

Fair tryserver scheduling

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: justin.lebar+bug, Unassigned)

Details

Tryserver turnaround times are approaching an unacceptable level.  This is hurting the ability of developers to write code and turn it around in a timely fashion.

We are quickly approaching an inflection point at which some jobs will never hit zero pending builds overnight; at this point, wait times for those jobs will increase dramatically.

I understand that there is work to increase our capacity, and I totally support this work.  But given past experience, I do not expect these capacity increases to materialize in the next few weeks and solve all our problems.  We need something now.

My suggestion is that we implement scheduling on tryserver to give users an incentive not to use too many resources on try.  I think we should use as a rough model the way operating systems schedule CPU time.  What follows is a proposal for how we could implement this.  I'm sorry it's so long.

For every job j (e.g. "linux 64 debug mochitest-1") let T(j) be the average runtime in days (*) of all green runs of j in the past week.  Let M(j, t) be the number of machines we own that are capable of running j and are idle at time t.

(*) Using the somewhat awkward unit of days for time will help a bit later on, but the units don't really matter.

Define j's "cost" at time t, C(j, t), as

  C(j, t) = T(j) / (M(j, t) + 1).

(Perhaps we should instead define M(j) to be the number of machines we have that are capable of running job j and ignore how many machines are idle at time t.  This would penalize people for running jobs when we're under minimal load, but would get rid of the fact that C(j, t) increases dramatically if you get "sniped" and someone snatches all of our available capacity for job j right before you push to try.)

Notice that if we have arbitrarily many machines which can run j (e.g., we can run j in the cloud), then M(j, t) = infinity and C(j, t) = 0.

Each user u has an associated debt D(u) which is an exponentially-weighted sum of the cost of all the jobs she has run.  I'll explain exactly how to compute D(u) later, but let's skip to the interesting bit, which is how to schedule jobs.

Each time a job comes in that can't be satisfied using an idle machine, tryserver adds the job to a priority queue Q of pending jobs.  Each time a new machine becomes available, tryserver scans the queue and runs the first job in the queue that this machine is capable of running.  It then recomputes all the priorities in the queue (in practice, this just means recomputing the priorities of the jobs associated with the user whose job was just scheduled) and waits for the next machine to become available.

The priority queue Q is ordered in increasing order of P(j, u, t), which is computed as

  P(j, u) = D(u) + C(j, time_now())

Note that because this job is currently pending, M(j, time_now()) == 0, so in this case P(j, u) simplifies to

  P(j, u) = D(u) + T(j)  (if j is pending).

The only question remaining is, how do we compute D(u)?

Essentially if a user ran a job with cost 2 on day 1.1 and a job with cost 3 on day 1.2, we want

  D(u) = log(2 * exp(1.1y) + 3 * exp(1.2y))    (1)

for magic constant y.  Essentially instead of exponential decline in old debts over time, we have exponential growth of new debts over time (in the log scale).  These two are equivalent, except using exponential growth means that we don't have to constantly go through all the D(u)'s and make them smaller.

Anyway, given a user's current debt D(u), suppose we want to add on a job of cost 4 on day 1.3.  We can compute the new debt, D'(u), as:

  D'(u) = log(exp(D(u) - log(4) - 1.3y) + 1) + log(4) + 1.3y.

A bit of algebra shows that if D(u) was computed as in (1) above, then D'(u) is equal to

  D'(u) = log(2exp(1.1y) + 3exp(1.2y) + 4exp(1.3y)),

as desired.

See http://mathb.in/713 for all the steps written out.  (It's for a slightly different setup; you want to replace e^(4l) with 4exp(1.3y).  I'm happy to write it out more concretely if that's helpful.)  You can also prove this to yourself by picking an arbitrary value for y and typing it into a calculator.  For y=log(2), I got

  D(u) = log(2 * exp(1.1y) + 3 * exp(1.2y))          = 2.41406
  log(exp(D(u) - log(4) - 1.3y) + 1) + log(4) + 1.3y = 3.04587
  log(2exp(1.1y) + 3exp(1.2y) + 4exp(1.3y))          = 3.04588

The only remaining piece is picking our constant y.  It represents how quickly we "forgive" debt, or equivalently, how much bigger a debt tomorrow is than a debt today.  If we pick y=log(2) then a debt tomorrow is worth twice as much as a debt today.  That's probably a reasonable place to start.

I understand that I'm asking for a large change to the tryserver software here.  But our current tack is unsustainable.  We either need to invest in our software to encourage people to use fewer resources, or we need to add more hardware.  Given that we've been aware for months now that we don't have sufficient hardware and we're still approaching an inflection point, I think it's time to make a software change.

I'm totally in favor of new tryserver features such as "push to the branch that has the most available machines and, if that's green, push -a."  But I think people first need an incentive to use such a system.
Would it make sense to bring mozilla-inbound back to the same priority as try?
I vote for taking more coalescing on m-i rather than developers starting to stop trying patches on try before landing on m-i.
I can already see the increase of pushes on m-i and decrease on try.
We could do it as a stopgap until we can implement jlebar's proposal or have more capacity.
> Would it make sense to bring mozilla-inbound back to the same priority as try?

I'm not immediately in favor of this.  It just means more, longer tree closures when people bust inbound.  But we should ask the sheriffs what they think.
(In reply to Justin Lebar [:jlebar] (PTO 3/11-3/15, then GMT+9 until 3/24) from comment #2)
> > Would it make sense to bring mozilla-inbound back to the same priority as try?
> 
> I'm not immediately in favor of this.  It just means more, longer tree
> closures when people bust inbound.  But we should ask the sheriffs what they
> think.

Fair point!
FTR, before we had this problem the priorities (if I got this right) was [1]:

Before:
m-i = 2
try = 4
project branches = 5

Now:
m-i = 4
try & p.b =5

Proposed:
m-i, try & p.b. = 5
OR
m-i, try = 4
p.b. = 5

Also to point out something that I said implicitly (for the sake of clarity) is that people starting to land on m-i and skipping landing on try (change of behavior due to long delays on try) would also cause longer tree closures on m-i (increase chances that patches will break due to lack of try testing).

As I said, this would be a stopgap to prevent the change of behavior (skip landing on try first).

[1] http://hg.mozilla.org/build/buildbot-configs/file/ea56c3600cd4/mozilla/master_common.py#l93
I would be very much against inbound being an equal priority to try -- we'd then be allowing abusers of Try to affect load on critical trees, rather than just Try, which would mean we'd still need the proposal in comment 0 after all.

I like the sound of Justin's idea - thank you for filing this :-)
(In reply to Ed Morley [:edmorley UTC+0] from comment #4)
> I would be very much against inbound being an equal priority to try -- we'd
> then be allowing abusers of Try to affect load on critical trees, rather
> than just Try, which would mean we'd still need the proposal in comment 0
> after all.
> 
> I like the sound of Justin's idea - thank you for filing this :-)

Thanks edmorley! I've got nothing else to add (consider my proposal officially revoked :D).
Very interesting idea, thanks!

I've been thinking about scheduling as well, and have been wondering if an auction system could work.

On try, each developer would get some number of credits per day.

Each job type would have a "minimum bid" which represents its base cost. I'd like to make this correspond to the actual costs of the machine (e.g. per-hour cost in EC2, and some approximation of HW + colo + staff costs for in-house capacity), but that doesn't matter as much. The base cost could be 0.

If there's contention for slaves, bids get automatically increased up to a maximum bid that the developer can specify, or until you have no more credits left. Think eBay. The job with the highest bid runs next.

I think this has some nice properties:
- it makes it expensive to run try jobs in peak times
- penalizes users who push a lot
- makes it possible to push stuff at a low priority by setting a low maximum bit, or a high priority by setting a high maximum bid
I think the system I proposed and the system in comment 6 are pretty similar at a high level, but I want to focus on the differences.

The advantages I see to the proposal in comment 6 are:

1. It's easier to understand, because we all know how money works.

2. I can prioritize between my own jobs.  The system I proposed essentially assumes that all of your tryserver jobs are your highest priority.

If prioritization of one user's jobs is important, I think we could probably extend the system from comment 0 to extend that.

There are some disadvantages to the proposal in comment 6 as well.

1. The main difference I see is that with the system in comment 6, we have to figure out how many credits users should get ahead of time.  Figuring out how much credit to dole out isn't easy; see the current glut of carbon credits in the EU.  And I'd contend that in this system it's even harder to pick the right number of credits to dole out, because we don't allow users to trade credits.

Once two users are both broke, their priorities are equal.  Similarly, if two users are both flush, their priorities are effectively equal.  So the scheme from comment 6 requires us to carefully manage the number of credits users have so that users are always just above poverty.  This requires managing not only the number of credits we dole out each day, but also how quickly those credits expire. [1]

In contrast, under the scheme in comment 0, we can always make a fair decision about any two users' job priorities.

2. Another disadvantage of the system from comment 6 is that there are fiddly knobs for devs to turn.  How much do I bid for my jobs?  I have to make a decision based on the current ask/bid prices every time I push to try.  And I potentially may want to revise that decision as I get more cash.  The purpose here is to make developers' lives easier, not to let us play ebay with infra.

3. Under the scheme from comment 6, users have an incentive to get more credits, perhaps by asking their manager ("hey, I'm about to run a lot of jobs, I need them").  But under the scheme from comment 0, I think fewer users would be asking for their debts to be forgiven at a faster rate than others' (which would be the equivalent), perhaps because that sounds more unfair than asking for additional cash.

4. The last disadvantage I see is that the scheme from comment 0 can reflect the fact that some jobs take longer than others.  The scheme from comment 6 can factor this into the jobs' cost, but only if we make jobs' costs nonzero.  But if we did that, we'd have the problem that users might be broke and not be able to any tryserver jobs!  It's not clear to me how to design a system which both reflects how long jobs take and allows broke users to push low-priority jobs to try.

[1] The following show might be enlightening.  In part of it, an economist gives his daughter an allowance in an attempt to teach her to live within limited means, and uses a combination of taxes and subsidies in an attempt to influence her behavior.  But she ends up building up a large war chest of funds and circumventing his plans.  http://www.npr.org/blogs/money/2012/07/06/156391538/episode-205-allowance-taxes-and-potty-training
> 4. The last disadvantage I see is that the scheme from comment 0 can reflect
> the fact that some jobs take longer than others.  The scheme from comment 6
> can factor this into the jobs' cost, but only if we make jobs' costs
> nonzero.  But if we did that, we'd have the problem that users might be
> broke and not be able to any tryserver jobs!  It's not clear to me how to
> design a system which both reflects how long jobs take and allows broke
> users to push low-priority jobs to try.

Just wanted to clarify this point first - my idea was to have the bids be a per-hour price. So you're also incented to make your jobs run faster.
> my idea was to have the bids be a per-hour price. So you're also incented to make your jobs run 
> faster.

Aha!  That's fair enough.
Product: mozilla.org → Release Engineering
We're not going to fix this in buildbot. Maybe we can come up with a fairer scheduler in Taskcluster.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.