772458 - Try is extremely backed up

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Reporter

Description

•

12 years ago

Wait times for tests are closing in on 24 hrs.

Seems that Android and Windows are more backed up than the other platforms, but that's just anecdotal.

Ed Morley [:emorley]

Updated

•

12 years ago

Severity: normal → major

Aki Sasaki (not active)

Updated

•

12 years ago

Depends on: 771421, 771756

Mounir Lamouri (:mounir)

Comment 1

•

12 years ago

Do we have any idea of what is happening? Do people push more to try these days than before? Do we simply need more slaves?

It seems to be a critical issue for engineering: pushing to try takes such a ridiculous amount of time that either it will reduce productivity or people will just push to m-i without waiting for full results.

Ed Morley [:emorley]

Comment 2

•

12 years ago

(In reply to Mounir Lamouri (:mounir) from comment #1)
> or people
> will just push to m-i without waiting for full results.

Which has already happened several times this week, with the ensuing layers of bustage made worse by high infra load on non-try trees and the coalescing that brings :-(

Chris Cooper [:coop] (he/him)

Comment 3

•

12 years ago

(In reply to Mounir Lamouri (:mounir) from comment #1)
> Do we have any idea of what is happening? Do people push more to try these
> days than before? Do we simply need more slaves?
> 
> It seems to be a critical issue for engineering: pushing to try takes such a
> ridiculous amount of time that either it will reduce productivity or people
> will just push to m-i without waiting for full results.

One of the issues with test pool capacity is that *all* current tests run on Mac minis of various vintages. This is due to a historical notion that we wanted to be able to compare test results between different platforms/OSes on the same hardware. Apple's hardware rev cycle is aggressive, so we simply can't buy these older-rev minis any more. The existing pool capacity is static, modulo attrition via hardware failure. We can create extra capacity on one platform only at the expense of another, e.g. by stopping tests on 10.5 (bug 773120).

We no longer think these inter-platform comparisons are meaningful. Releng is extremely resource-constrained at present for setting up new hardware. We have an effort underway to refresh our test pool to newer non-Mac hardware (for non-Mac OSes), but this is blocked on getting test coverage set up for Win8 (bug 731280) and Mountain Lion (10.8) (bug 731278), platforms where we currently have no coverage at all.

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 764713, 775149, 773120, 750285

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 773331, 602949

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 725362

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 771508

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 775744

Ed Morley [:emorley]

Comment 4

•

12 years ago

We've just had 38 consecutive pushes (81 changesets) of bustage on inbound, since Try results are taking so long to come back that people are pushing regardless.

Are there any other quick wins available here? E.g. disabling platforms/tests on twigs that could do without them, or re-balancing the try vs non-try buildpool?

Looking at http://build.mozilla.org/builds/pending/pending.html shows that the Try linux compile pending count is always an order of magnitude higher than the others. Can we spare some more non-try linux builds for Try? The graph for non-try would imply there is capacity going unused that could be switched over perhaps?

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 774799

Ed Morley [:emorley]

Updated

•

12 years ago

No longer depends on: 725362

Chris Cooper [:coop] (he/him)

Updated

•

12 years ago

Component: Release Engineering → Release Engineering: Developer Tools

QA Contact: lsblakk

Whiteboard: [tryserver][buildduty][capacity]

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 5

•

12 years ago

bug#750285 and bug#777037 track disabling a bunch of android builds/unittest/talos jobs, which will help reduce android load. This is an interim solution while we wait for additional tegras to come online.

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 774424, 777037

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 777273

Patrick McManus [:mcmanus]

Comment 6

•

12 years ago

If this bug and its dependents were resolved as fixed, what would the expected turnaround time for try be?

Try used to be a tremendously useful development tool. Now it's pretty much just a pain.

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 767456

Ed Morley [:emorley]

Updated

•

12 years ago

Whiteboard: [tryserver][buildduty][capacity] → [tryserver][buildduty][capacity][sheriff-want]

Hal Wine [:hwine] use NI!

Updated

•

12 years ago

Depends on: 777521

Ed Morley [:emorley]

Updated

•

12 years ago

No longer depends on: 777521

Chris AtLee [:catlee]

Updated

•

12 years ago

Depends on: 765830

Ed Morley [:emorley]

Updated

•

12 years ago

No longer depends on: 765830

Jared Wein [:jaws] (please needinfo? me)

Comment 7

•

12 years ago

We can turn off tests for the UX branch (https://tbpl.mozilla.org/?tree=UX). I've been maintaining the branch for the past N months (doing daily merges between m-c and ux).

Devs sending their patches to the UX branch can just run them through try server first. In total that will save some build resources, since we have vastly more merges from m-c to ux than we have checkins to ux.

It should also be noted that ux is a dead-end branch, which doesn't feed anywhere but is used for functional testing of new ux features.

Justin Wood (:Callek)

Updated

•

12 years ago

Depends on: 779419

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 737661

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 779784

Ed Morley [:emorley]

Updated

•

12 years ago

No longer depends on: 779784

bhearsum@mozilla.com (:bhearsum)

Updated

•

12 years ago

No longer depends on: 602949

Armen [:armenzg]

Updated

•

12 years ago

Depends on: 779921

Hal Wine [:hwine] use NI!

Updated

•

12 years ago

QA Contact: lsblakk → hwine

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 782627

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Updated

•

12 years ago

Depends on: 784681

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 8

•

12 years ago

Now that we're running android/b2g/nativefennec builds over on AWS, we're freeing up cycles on our linux32/linux64 machines. 

bug#784891 tracks converting a bunch of existing linux32/linux64/win32 physical ix builders into win64 builders. This will improve wait times for windows builds in both the production build pool and try build pool.

Depends on: 784891

Hal Wine [:hwine] use NI!

Updated

•

12 years ago

Depends on: 785056

Ed Morley [:emorley]

Comment 9

•

12 years ago

(In reply to Jared Wein [:jaws] from comment #7)
> We can turn off tests for the UX branch (https://tbpl.mozilla.org/?tree=UX).

Done in bug 779419.

bhearsum@mozilla.com (:bhearsum)

Comment 10

•

12 years ago

This isn't an acute issue, and thus not a buildduty concern.

Whiteboard: [tryserver][buildduty][capacity][sheriff-want] → [tryserver][capacity][sheriff-want]

Ed Morley [:emorley]

Updated

•

12 years ago

Keywords: sheriffing-P1

Whiteboard: [tryserver][capacity][sheriff-want] → [tryserver][capacity]

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: toodamnhigh!

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 691177

Ed Morley [:emorley]

Updated

•

12 years ago

Depends on: 847868

Chris AtLee [:catlee]

Comment 11

•

11 years ago

How are try turn around times these days?

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 12

•

11 years ago

:khuey: per dev.tree-management, we've been hitting consistently great wait times on builds and tests across the board, including Try. This is thanks to moving more jobs to AWS, reshuffling existing hardware in-house, and turning off broken builds/tests.

Any objections to closing this as FIXED?

Flags: needinfo?(khuey)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Reporter

Comment 13

•

11 years ago

Yeah, I think we're doing pretty well these days. Someone else can file a new bug if they have current issues.

Good job folks.

Status: NEW → RESOLVED

Closed: 11 years ago

Flags: needinfo?(khuey)

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Assignee

Updated

•

11 years ago

Product: mozilla.org → Release Engineering

Nobody; OK to take it and work on it

Assignee

Updated

•

7 years ago

Component: Tools → General