Closed Bug 798945 Opened 10 years ago Closed 8 years ago

Split up test suites that take longer than N minutes to run

Categories

(Release Engineering :: General, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Unassigned)

References

Details

Long suite runtimes cause the following issues for sheriffs:
1) Turnaround time for retriggers is painful when bisecting (and wasteful, since most of the time we don't need to retrigger everything).
2) Log size for some of the large suites (e.g. m-oth) is excessive, causing TBPL to time out more frequently when fetching/parsing logs.
3) Even outside of failures and bisection, long runtimes increase end-to-end time whenever the infrastructure is not at maximum load.

Obviously we have to balance this against the extra setup time caused by splitting suites up, but for suites taking up to 2 hours (as with Win debug m-oth), the issues above outweigh it.

It makes sense to tackle bug 791389 first, but after that, a rough glance at TBPL shows these other candidates for splitting up:
* xpcshell - takes up to 85 mins on Win debug
* Android reftests - even with 4 chunks, they can take up to 70 mins each, so we could bump this to 5-6 chunks
* Android talos ts - takes up to 80 mins
Moved from bug 636546:

(In reply to Joel Maher (:jmaher) from comment #5)
> talos ts is down to 23 minutes once we roll out a new talos.
> 
> we should not bump up our count of chunks for android, I would like to
> decrease them if our stability increases.  Right now we have about 15
> minutes setup time for android, and reftests add 10 minutes of extra time
> per tegra that it uses.  We could reduce our time if we push to get bug
> 737961 fixed.  By fixing that, we would not need to change the resolution. 
> This saves two resolution changes/reboots per reftest job.

The talos changes sound great :-D

Good point about Android setup times; I was just sifting through some of the logs and was starting to think the same. We still need to bear in mind end-to-end times when trying to bisect bustage, since long runs do hurt sheriffs (and we do have more tegra capacity than some other platforms at the moment), but it's not quite as obvious a win.
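The setup-overhead tradeoff discussed above can be sketched with simple arithmetic. This is a minimal illustration, not a measurement: it assumes the test work splits evenly across chunks, that every chunk pays the same fixed setup cost, and that the two overheads Joel mentions (~15 mins Android setup plus ~10 mins extra per reftest tegra) simply add up to ~25 mins per chunk. Treating the "up to 70 mins" worst-case chunk as typical is also an assumption, and `chunk_times` is a hypothetical helper.

```python
def chunk_times(test_minutes: float, setup_minutes: float, chunks: int) -> tuple[float, float]:
    """Hypothetical helper: return (wall-clock minutes per chunk, total machine minutes).

    Assumes the test work splits evenly and every chunk pays the same fixed setup cost.
    """
    if chunks < 1:
        raise ValueError("chunks must be >= 1")
    per_chunk = setup_minutes + test_minutes / chunks
    return per_chunk, per_chunk * chunks


# Assumed inputs: ~25 min setup per reftest chunk (15 min Android setup + 10 min
# reftest overhead) and 4 chunks of ~70 min each, which leaves roughly
# 4 * (70 - 25) = 180 minutes of actual test work.
SETUP = 15 + 10
TEST_WORK = 4 * (70 - SETUP)

for n in (4, 5, 6):
    per_chunk, total = chunk_times(TEST_WORK, SETUP, n)
    print(f"{n} chunks: {per_chunk:.0f} min per chunk, {total:.0f} machine-minutes")
```

Under these assumptions, going from 4 to 6 chunks trims each chunk from about 70 to 55 mins but raises total machine time from 280 to 330 mins, which is why a large fixed setup cost makes extra chunks a weaker win.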
Summary: Split up test suites that take longer than ~40 mins to run → Split up test suites that take longer than N minutes to run
Depends on: 814526
Whiteboard: [sheriff-want]
Depends on: 819963
This looks like a tracking bug. Ed, do you want to own this?
Priority: -- → P3
Is there additional work to do on this bug? We have recently cleaned up a lot of overall run times.
I'd say we're pretty close. The new Windows iX machines have helped a lot with the slow Win debug tests (e.g. xpcshell 75 mins -> 35 mins!), splitting out mochitest-browser-chrome made mochitest-other more bearable, and fixes for a few causes of runtime regressions have landed recently.

The only outstanding item IMO is bug 819963, given that browser-chrome is pretty much the long pole (e.g. 50-60 mins) on many runs since the switch to Windows iX machines.
(In reply to Ed Morley [:edmorley UTC+1] from comment #4)
> pretty much the long pole (eg 50-60 mins)

make that 50-100 mins for debug
Product: mozilla.org → Release Engineering
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: General Automation → General