Closed Bug 1398193 Opened 3 years ago Closed 3 years ago

Port base vs reference style-perf-tests into talos as subtests

Categories

(Testing :: Talos, enhancement)

Version 3
enhancement
Not set
normal

Tracking

(firefox58 fixed)

RESOLVED FIXED
mozilla58
Tracking Status
firefox58 --- fixed

People

(Reporter: rwood, Assigned: rwood)

References

Details

(Whiteboard: [PI:September])

Attachments

(3 files)

Now that stylo is merged, port over the style-perf-tests [1] into talos, and have them run as base vs reference i.e. bloom-basic vs bloom-basic-ref. This list [2] indicates which test page is compared with which reference page.

Currently talos can run base vs ref tests one at a time (i.e. the existing perf-reftest suite), but we want to land these as subtests - where each subtest is actually a base vs ref test. Support for that kind of subtest is being added in Bug 1397438.

Note: The style-perf-tests running stand-alone (i.e. not being compared with reference runs) are landing in the talos 'perf-reftest-singletons' test in Bug 1374750.

[1]https://github.com/heycam/style-perf-tests

[2]https://github.com/heycam/style-perf-tests/blob/master/perf-reftest/perf-reftest.list
Whiteboard: [PI:September]
The tests need to be split into multiple jobs as they run too long altogether. Here's the first set (12) on try [1]:

& http://localhost/tests/perf-reftest/bloom-basic.html, http://localhost/tests/perf-reftest/bloom-basic-ref.html
& http://localhost/tests/perf-reftest/bloom-basic-2.html, http://localhost/tests/perf-reftest/bloom-basic-ref.html
& http://localhost/tests/perf-reftest/coalesce-1.html, http://localhost/tests/perf-reftest/coalesce-ref.html
& http://localhost/tests/perf-reftest/coalesce-2.html, http://localhost/tests/perf-reftest/coalesce-ref.html
& http://localhost/tests/perf-reftest/style-sharing.html, http://localhost/tests/perf-reftest/style-sharing-ref.html
& http://localhost/tests/perf-reftest/style-sharing-style-attr.html, http://localhost/tests/perf-reftest/style-sharing-ref.html
& http://localhost/tests/perf-reftest/display-none-1.html, http://localhost/tests/perf-reftest/display-none-1-ref.html
& http://localhost/tests/perf-reftest/only-children-1.html, http://localhost/tests/perf-reftest/only-children-ref.html
& http://localhost/tests/perf-reftest/dep-check-1.html, http://localhost/tests/perf-reftest/dep-check-1-ref.html
& http://localhost/tests/perf-reftest/slow-selector-1.html, http://localhost/tests/perf-reftest/slow-selector-1-ref.html
& http://localhost/tests/perf-reftest/slow-selector-2.html, http://localhost/tests/perf-reftest/slow-selector-2-ref.html
& http://localhost/tests/perf-reftest/style-attr-1.html, http://localhost/tests/perf-reftest/style-attr-1-ref.html

Approx. duration to run this set/job:

Linux x64: 7 min
OSX 10.10: 7 min
Win 10: 8 min
Win 7: 12 min

:bholley, unless any objections I'm going to land this set as the 'perf-reftest' job 'p'. I'll need to add at least one more job (i.e.'p2') after for the remaining tests.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=73c83bd55ba4f8ed178e888141f6424bbd085748
Flags: needinfo?(bobbyholley)
FYI :bholley, looks like the stop-cascade-* tests crash on Win 7:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a78e82648cf76b6bb09c229d822a5bd233c49519&selectedJob=133848772

Trying to run as many tests as possible just to have more data re: individual test durations (re: comment 11)
It looks like the coalesce tests dominate the total time. I think it should be fine to drop the iterations there from 50000 to 5000, which should greatly speed up total test time. Can you PR that against cameron's repo, which will give him the opportunity to speak up if there was any particular reason he made it large?

Similarly, I think the stop-cascade tests could be tweaked. Right now setup time dominates, because we create an array of 1000 subtrees in order to make the effect noticeable. Instead, we could make a smaller array (say, 200), and then do more iterations within the measured section to toggle the class on/off, which should have the same effect of making the numbers large enough, while vastly saving on machine time.
Flags: needinfo?(bobbyholley)
Basically, if any of these tests are taking more than a hundred milliseconds or so per iteration, we can probably tweak the test to be more efficient, which will save us a lot of machine time over the long run.
Reduce coalesce iterations (as :bholley suggested in comment 15, thanks!).
Attachment #8913371 - Flags: review?(cam)
Reduce the duration of cascade 1 and 2
Attachment #8913391 - Flags: review?(cam)
(In reply to Robert Wood [:rwood] from comment #19)

> https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffeff79f676cd44ca8767cb3cad85dde99cfb472

This is the full test suite on try with the modifications I made to the coalesce and cascade tests (comment 17 and comment 18). This full suite is now running at:

Linux x64: 16 min
OS 10.10: 15 min
Win 10: 17 min
Win 7: doesn't finish, looks like a crash with stop-cascade-3

Still not an ideal duration for this job. Here's how long the tests take on Win 10 (tppagecycles 10):

bloom-basic.html 06:37:52, 06:38:00 => 0:08
bloom-basic-ref.html 06:38:00, 06:38:07 => 0:07
bloom-basic-2.html 06:38:08, 06:38:16 => 0:08
coalesce-1.html 06:38:25, 06:38:46 => 0:21
coalesce-ref.html 06:38:47, 06:39:07 => 0:20
coalesce-2.html 06:39:08, 06:39:29 => 0:21
style-sharing.html 06:39:51, 06:39:58 => 0:07
style-sharing-ref.html 06:39:58, 06:40:04 => 0:06
style-sharing-style-attr.html 06:40:04, 06:40:10 => 0:06
style-sharing-ref.html 06:40:11, 06:40:17 => 0:06
display-none-1.html 06:40:17, 06:40:26 => 0:09
display-none-1-ref.html 06:40:28, 06:40:33 => 0:05
nth-index-1.html 06:40:34, 06:40:58 => 0:24
nth-index-ref.html 06:40:59, 06:41:23 => 0:24
nth-index-2.html 06:41:24, 06:41:49 => 0:25
some-descendants-1.html 06:42:15, 06:42:34 => 0:19
some-descendants-1-ref.html 06:42:35, 06:42:55 => 0:20
only-children-1.html 06:42:56, 06:43:27 => 0:31
only-children-ref.html 06:43:27, 06:43:33 => 0:06
dep-check-1.html 06:43:36, 06:44:01 => 0:25
dep-check-1-ref.html 06:44:05, 06:44:13 => 0:08
slow-selector-1.html 06:44:14, 06:44:33 => 0:19
slow-selector-1-ref.html 06:44:33, 06:44:39 => 0:06
slow-selector-2.html 06:44:39, 06:44:58 => 0:19
slow-selector-2-ref.html 06:44:58, 06:45:04 => 0:06
style-attr-1.html 06:45:06, 06:45:28 => 0:22
style-attr-1-ref.html 06:45:28, 06:45:34 => 0:06
stop-cascade-1.html 06:45:36, 06:46:26 => 0:50
stop-cascade-ref.html 06:46:27, 06:46:33 => 0:06
stop-cascade-2.html 06:46:35, 06:47:29 => 0:54
stop-cascade-3.html 06:47:46, 06:51:54 => 4:08
stop-cascade-3-ref.html 06:51:54, 06:52:02 => 0:08

:bholley, I'm going to remove stop-cascade-3 since it crashes on win7, I'll file a bug for that issue. That will take about 4 minutes off the test jobs, that *might* make it doable as a single test job for this suite.
Flags: needinfo?(bobbyholley)
(In reply to Robert Wood [:rwood] from comment #20)
> :bholley, I'm going to remove stop-cascade-3 since it crashes on win7, I'll
> file a bug for that issue. That will take about 4 minutes off the test jobs,
> that *might* make it doable as a single test job for this suite.

Did you try making the modifications I proposed in comment 15 to the stop-cascade tests?
Flags: needinfo?(bobbyholley)
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #23)
> (In reply to Robert Wood [:rwood] from comment #20)
> > :bholley, I'm going to remove stop-cascade-3 since it crashes on win7, I'll
> > file a bug for that issue. That will take about 4 minutes off the test jobs,
> > that *might* make it doable as a single test job for this suite.
> 
> Did you try making the modifications I proposed in comment 15 to the
> stop-cascade tests?

Yessir, my pull request is in comment 18. I didn't change stop-cascade-3 though because it's different than the others and just has one 'root' and no loop to cycle through in the measurement part. Also it crashes on win7 anyway. Without it, the test suite is running at about 16 min on win7 so it looks good.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb6edd48a5962ec6a681f157edb1dd69bfe7453b
I further reduced cascade 1 and 2, removed stop-cascade-3* since it fails on win7, delinted and updated the PR above. Looks good on try.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb6edd48a5962ec6a681f157edb1dd69bfe7453b

Linux x64: 10 min
OSX 10.10: 11 min
Win 10: 13 min
Win 7: 16 min

I'd like to land this (if my perf-style-reftest updates in comment 17 and comment 18 are approved)... and leave the addition of stop-cascades-3* and any further potential optimizations/additions to possible future work.
(In reply to Robert Wood [:rwood] from comment #25)
> I'd like to land this (if my perf-style-reftest updates in comment 17 and
> comment 18 are approved)... and leave the addition of stop-cascades-3* and
> any further potential optimizations/additions to possible future work.

sgtm.
Comment on attachment 8913391 [details] [review]
https://github.com/heycam/style-perf-tests/pull/9

Comments to address on this one, in the PR.  Please cancel NI when it's been updated (since I don't always seem to get GitHub notifications).
Flags: needinfo?(rwood)
(In reply to Cameron McCormack (:heycam) (away 4 Oct) from comment #28)
> Comment on attachment 8913391 [details] [review]
> https://github.com/heycam/style-perf-tests/pull/9
> 
> Comments to address on this one, in the PR.  Please cancel NI when it's been
> updated (since I don't always seem to get GitHub notifications).

Thanks for the comments, I'll update the PR accordingly. I really don't understand these tests and prefer to leave future optimizations up to the test author, after this PR, thanks!
Flags: needinfo?(rwood)
(In reply to Robert Wood [:rwood] from comment #31)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=eba7a238f8a48e727dacbbc90992ff65937bfdce

Try run with latest test updates. Green, and job durations not bad:

Linux x64: 10 min, stylo-disabled 14 min
OS 10.10: 11 min, stylo-disabled 13 min
Win 7: 18 min, stylo-disabled 22 min
Win 10: 13 min, stylo-disabled 17 min
Comment on attachment 8914303 [details]
Bug 1398193 - Port base vs reference style-perf-tests into talos as subtests;

https://reviewboard.mozilla.org/r/185594/#review190782

awesome to see more tests
Attachment #8914303 - Flags: review?(jmaher) → review+
Attachment #8913391 - Flags: review?(cam) → review+
Thanks guys, landing this now
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/943ad1c9a59a
Port base vs reference style-perf-tests into talos as subtests; r=jmaher
I see an improvement in the summary number:
== Change summary for alert #9796 (as of October 03 2017 12:37 UTC) ==

Improvements:

 63%  perf_reftest summary windows7-32 opt e10s     10.57 -> 3.89

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=9796
https://hg.mozilla.org/mozilla-central/rev/943ad1c9a59a
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
You need to log in before you can comment on or make changes to this bug.