Closed Bug 659328 Opened 9 years ago Closed 6 years ago

Merge talos suites that finish in less than 10 minutes to improve wait times

Categories

(Release Engineering :: General, defect, P4)

x86
All
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: armenzg, Unassigned)

Details

(Whiteboard: [buildfaster:p1]merged a11y & scroll into chrome - in production since Aug. 16th)

Attachments

(1 file, 2 obsolete files)

Some test suites take only a few minutes to run and then the machine reboots, which means it is not running jobs while it is rebooting.

We should determine which suites could be merged together to strike the right balance between wait times and end times.
Priority: -- → P2
Bug 586418 already did some analysis here, but got blocked on bug 594415.
How much time does it take from "reboot now" to "start next test suite"?
Reboot time varies a bit between platforms: the Fedora and Windows boxes take around 4 minutes to reboot, and the OS X boxes are a bit faster.
I have taken the times for each test run and I have done some analysis.

My goal is to avoid jobs that take less than 10 minutes to run, and instead group them so that each job runs for between 10 and 30 minutes.

I have added conditional formatting to mark test suites taking less than 10 minutes ("Average Time spent running Tests") in purple.

There is a sheet showing a summary of which suites take a short time (purple) and which ones have a bad ratio (setup time VS test run time). The latter I would like to investigate in a separate bug (bug 661585).

What do we gain when joining suites?
We save a "setup time" + "reboot time" per push for every suite we fold into another job. This is not a substantial improvement but it's worth doing.
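
To make the saving concrete, here is a rough sketch of the arithmetic. The function name and the 5-minute setup figure are assumptions for illustration only; the 4-minute reboot is the Fedora/Windows figure quoted earlier in this bug.

```python
def minutes_saved_per_push(num_suites_merged, setup_minutes, reboot_minutes):
    """Machine time saved per push when several suites run as one job.

    Merging N suites into one job removes N - 1 setup phases and
    N - 1 reboots; the test run time itself is unchanged.
    """
    if num_suites_merged < 1:
        raise ValueError("need at least one suite")
    jobs_removed = num_suites_merged - 1
    return jobs_removed * (setup_minutes + reboot_minutes)


# Hypothetical inputs: 3 suites merged into one job, 5 min setup (assumed),
# 4 min reboot (approximate Fedora/Windows figure from this bug).
saved = minutes_saved_per_push(3, setup_minutes=5, reboot_minutes=4)
print(saved)  # 18
```

This is per platform and per push, so the real figure depends on the measured setup times in the spreadsheet.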
										
On one side (from catlee's suggestion), we would like to move a11y and scroll into 'chrome', remove ts, twinopen from 'chrome', and add tp5.		

From my analysis, these are some changes I propose doing to begin with:
* merge crashtest and jsreftest
* merge a11y & scroll into 'chrome'
* merge svg and nochrome (or paint)
* suites m3, m4 & m5 finish significantly earlier - could we just have m3 & m4? (merge m5 into the other two)
** btw debug mochitests can take up to 10 times more time than optimized; is that expected?

How does all of this sound?

I would like to start with just tackling anything below 10 minutes and then do a second pass once we have new data and discover how much work is involved for these few changes (I am thinking of tbpl).
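
One way to explore candidate groupings is a simple greedy packing of suites into jobs capped at 30 minutes. This is only a sketch under assumptions: the function name is hypothetical, the run times are made up rather than taken from the spreadsheet, and the heuristic caps job length but does not guarantee that every job ends up above 10 minutes.

```python
def group_suites(runtimes, max_minutes=30):
    """Greedy first-fit-decreasing grouping of suites into jobs.

    runtimes: dict mapping suite name -> average run time in minutes.
    Returns a list of (total_minutes, [suite names]) jobs. Each job is at
    most max_minutes long unless a single suite already exceeds the cap.
    """
    jobs = []  # each job is [total_minutes, [names]]
    # Longest suites first; ties broken by name so the result is stable.
    for name, minutes in sorted(runtimes.items(), key=lambda kv: (-kv[1], kv[0])):
        for job in jobs:
            if job[0] + minutes <= max_minutes:
                job[0] += minutes
                job[1].append(name)
                break
        else:
            # No existing job has room, so start a new one.
            jobs.append([minutes, [name]])
    return [(total, names) for total, names in jobs]


# Made-up averages, for illustration only (not the measured numbers).
example = {"chrome": 14, "nochrome": 8, "svg": 7, "a11y": 6, "scroll": 5}
for total, names in group_suites(example):
    print(total, names)
```

Any grouping it suggests would still need checking against what is practical to report on tbpl.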

This is from the spreadsheet's summary. These are suites that take on average less than 10mins to run.
This normally means that the ratio of setup time VS test run is also low
							
	    fed	f64	leo	sno	w7	xp					
a11	     x	 x	n/a	n/a	x	x
scroll	     x	 x	x	x	x	x
crashtest    x	 x	x	x	x	x
m5	     x	 x	x	x	x	x
jsreftest    x	 x	x	x	x	x					
nochrome     x	 x	x	x	x	x
m4	     x	 x			x	x					
m3	     x	 x	x	x		x					
debug_crash  x	 x	x	x	x					
m2	     x	 x	x	x	x	x					
svg	     x	 x	x	x	x	x					
paint	     x	 x	x	x	x	x					
reftest	     		x								
xpcshell			x
Putting on the side for now.
Bug 661585 has higher priority as it will give more noticeable improvements.
Assignee: armenzg → nobody
Blocks: 661585
Priority: P2 → P4
(In reply to comment #4)
> From my analysis, these are some changes I propose doing to begin with:
> * merge crashtest and jsreftest
> * merge a11y & scroll into 'chrome'

Let's start with these 2 items first.
Assignee: nobody → armenzg
Priority: P4 → P3
Summary: Determine suites to merge into one job → Merge test and talos suites that finish in less than 10 minutes to improve wait times
Whiteboard: [waittimes]
Also to disable ipcplugin in m-o.
You do mean "disable ipcplugins on 10.5 only," right?
Yes indeed! (mind -> bug dump fail!)
I'm in favour of combining short-running suites, but we still need to sort out the reporting side of that on TBPL.
Depends on: 594415
Is there actually a *third* bug, besides just this and bug 586418 (of which this is a no-question straight-up duplicate)? I know that somewhere I typed a comment explaining that if you want to get unblocked and gain some machine time, you just need to do the Talos combining separately, because absolutely nobody cares where you put a particular Talos test, because absolutely nobody knows where they are now. Even TryChooser just shrugs and says "go look at SUITES in config.py if you have to know." No need to post to newsgroups asking for permission, no need to patch tbpl, just push-and-reconfig and post that it's already done. You could even claim no need for review, since there's already a reviewed patch to do it in bug 586418, if not for the way that review's 9 months old.
To confirm what philor says: for talos suites, no extra work is required unless it is a new suite (e.g. tp5).

Perhaps some TryChooser website/syntax needs to be adjusted.

You are right, it is a straight dupe of bug 586418, but let me take care of duping it myself when I start working on it.
To get this detangled I will take care of just doing talos suite merges on this bug.

Merging unit test suites on bug 586418 will require a lot of work on bug 594415.

Formatting sucks in comment 4. Let me adjust it.
(In reply to comment #4)
>
>            fed  f64  leo  sno  w7   xp
> a11y       x    x    n/a  n/a  x    x
> scroll     x    x    x    x    x    x
> nochrome   x    x    x    x    x    x
>
> crashtest  x    x    x    x    x    x
> m5         x    x    x    x    x    x
> jsreftest  x    x    x    x    x    x
> m4         x    x    ok   ok   x    x
> m3         x    x    x    x    ok   x
> d_crash    ok   x    x    x    x    x
> m2         x    x    x    x    x    x
> svg        x    x    x    x    x    x
> paint      x    x    x    x    x    x
> reftest    ok   ok   x    ok   ok   ok
> xpcshell   ok   ok   ok   x    ok   ok
No longer depends on: 594415
Priority: P3 → P2
Summary: Merge test and talos suites that finish in less than 10 minutes to improve wait times → Merge talos suites that finish in less than 10 minutes to improve wait times
This is an unbitrotten version of attachment 473040 [details] [diff] [review].

This has not yet been tested. I had to do some adjustments with OLD_BRANCH support for 1.9.2.
Whiteboard: [waittimes] → [waittimes][buildfaster:p1]
Attachment #544315 - Attachment is obsolete: true
Attachment #552199 - Flags: review?(catlee)
I have decided to fold the work from bug 660124 into this patch as well.
Attachment #552199 - Attachment is obsolete: true
Attachment #552199 - Flags: review?(catlee)
Attachment #552468 - Flags: review?(jmaher)
Attachment #552468 - Flags: review?(catlee)
Duplicate of this bug: 660124
Comment on attachment 552468 [details] [diff] [review]
merge a11y and scroll into chrome suite & create chrome_mac for mac only & remove twinopen except 1.9.2

Review of attachment 552468 [details] [diff] [review]:
-----------------------------------------------------------------

This looks pretty good. Just a simple question below (probably due to my lack of understanding of all the scripts). Also, will there be other patches on this bug to remove ts and txul (twinopen)?

::: mozilla/project_branches.py
@@ +12,5 @@
>              'tp': 0,
>              'chrome': 0,
>              'nochrome': 0,
>              'dromaeo': 0,
>              'svg': 0,

Do we need to add chrome_twinopen and chrome_mac here?
Attachment #552468 - Flags: review?(jmaher) → review+
Comment on attachment 552468 [details] [diff] [review]
merge a11y and scroll into chrome suite & create chrome_mac for mac only & remove twinopen except 1.9.2

Review of attachment 552468 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla-tests/config.py
@@ +137,5 @@
>  
>  SUITES = {
>      'chrome': {
>          'enable_by_default': True,
> +        'suites': GRAPH_CONFIG + ['--activeTests', 'tsscroll:a11y:ts:tdhtml:tsspider'],

This should be 'tscroll', I believe, not 'tsscroll'. Same below.
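
A typo like this could be caught before a reconfig with a small check over the --activeTests value. The sketch below is hypothetical: the helper name and the KNOWN_TESTS set are assumptions for illustration, and a real check would need the authoritative list of talos test names.

```python
def unknown_tests(suite_args, known_tests):
    """Return names in the --activeTests value that are not in known_tests."""
    idx = suite_args.index('--activeTests')
    # The value is a colon-separated list of test names.
    active = suite_args[idx + 1].split(':')
    return [name for name in active if name not in known_tests]


# Hypothetical set of valid names, assumed for this example only.
KNOWN_TESTS = {'tscroll', 'a11y', 'ts', 'tdhtml', 'tsspider'}

args = ['--activeTests', 'tsscroll:a11y:ts:tdhtml:tsspider']
print(unknown_tests(args, KNOWN_TESTS))  # ['tsscroll']
```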
Attachment #552468 - Flags: review?(catlee) → review+
Comment on attachment 552468 [details] [diff] [review]
merge a11y and scroll into chrome suite & create chrome_mac for mac only & remove twinopen except 1.9.2

Landed on default:
http://hg.mozilla.org/build/buildbot-configs/rev/dba3f0b5e54e

Addressed all issues, including not disabling "chrome" for the accessibility branch, which needs to run the "a11y" suite.
Attachment #552468 - Flags: checked-in+
Landed in production yesterday:
http://hg.mozilla.org/build/buildbot-configs/rev/3f14385485dc

I will get new numbers tomorrow and measure what's next.
Whiteboard: [waittimes][buildfaster:p1] → [buildfaster:p1]merged a11y & scroll into chrome - in production since Aug. 16th
Depends on: 682686
Depends on: 682601
Priority: P2 → P3
Priority: P3 → P4
Assignee: armenzg → nobody
Product: mozilla.org → Release Engineering
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME