1397438 - Add subtests support for talos base vs ref pageloader tests

Assignee

Description

7 years ago

Currently in talos we support having base vs ref pageloader tests, for example bloom_basic. In this scenario, the pageloader test is run on the base page (i.e. bloom-basic) followed by the reference page (bloom-basic-ref), and ultimately the test result is reported as the difference between those two. This talos test type is specified in test.py with the "base_vs_ref=True" flag.

This is done by having the base URL listed in the test manifest first, followed by the reference URL. This is passed into the pageloader add-on, and pageloader sees it as two subtests; then back in talos we process the results, compare them, and change the test output so that it reports a single test being run, with the result replicates actually being the difference between the two.

This works well for single tests but doesn't support subtests. Add support to talos so that a single manifest can list multiple base_vs_ref tests, which are then run and reported as subtests. Each subtest result will be the difference of the corresponding base vs ref for that subtest.

This requires test manifest changes to be able to list multiple base vs ref test pages, pageloader add-on changes to process and run tests accordingly, and talos changes to process and report corresponding results.
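To make the reporting concrete, here is a minimal Python sketch of turning paired base and ref replicates into the reported values. The function name `diff_replicates` is hypothetical, and taking the absolute difference is an assumption; the description above only says the result is the difference between the two.

```python
def diff_replicates(base_runs, ref_runs):
    """Return per-replicate differences between base and ref page timings.

    Assumption: the reported value is the absolute difference of the
    replicates at the same index.
    """
    if len(base_runs) != len(ref_runs):
        raise ValueError("base and ref must have the same number of replicates")
    return [abs(base - ref) for base, ref in zip(base_runs, ref_runs)]


# Made-up timings (ms), for illustration only.
print(diff_replicates([10.0, 12.0, 11.0], [7.0, 13.0, 11.0]))
```

With subtests, the same calculation would simply be repeated once per base/ref pair in the manifest.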

Robert Wood [:rwood]

Assignee

Comment 1

7 years ago

For the test manifest I would suggest something like this:

% http://localhost/tests/perf-reftest/bloom-basic.html vs http://localhost/tests/perf-reftest/bloom-basic-ref.html

And the ability to have multiple lines of those; each of those being one subtest. The pageloader addon can detect the 'vs' keyword and know how to run the tests accordingly (no need to set another tp pageloader command line arg).

Joel Maher ( :jmaher ) (UTC -8)

Comment 2

7 years ago

I think:
& http://localhost/tests/perf-reftest/bloom-basic.html, http://localhost/tests/perf-reftest/bloom-basic-ref.html

this would assume the '&' character will compare the two files split by a ','.


as for the output we just need to have pageloader output the diff in this case and the resulting parsing will not need to be adjusted.

I can imagine the work to load two pages in pageloader, then aggregate the times will be a slight bit of work, is this something that you think would take a couple days to write code for?

Before doing this, we should maybe mock up what some results would look like

Robert Wood [:rwood]

Assignee

Comment 3

7 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #2)
> I think:
> & http://localhost/tests/perf-reftest/bloom-basic.html,
> http://localhost/tests/perf-reftest/bloom-basic-ref.html
> 
> this would assume the '&' character will compare the two files split by a
> ','.

Ok, as long as all the tests will always provide their own measurements i.e. current manifest entries start with '%' vs without.

> as for the output we just need to have pageloader output the diff in this
> case and the resulting parsing will not need to be adjusted.
> 
> I can imagine the work to load two pages in pageloader, then aggregate the
> times will be a slight bit of work, is this something that you think would
> take a couple days to write code for?

Yep I would think so.

> Before doing this, we should maybe mock up what some results would look like

Ok, will do.

Robert Wood [:rwood]

Assignee

Comment 4

7 years ago

Attached file: perf-reftest-mock-local.json

Mock-up of what talos output would look like for the perf-reftest suite if there is just one main test (perf_reftest) with all the tests as subtests, where each subtest is actually a base vs ref test. The mock-up has only 3 subtests to keep it short.

This is what the current output looks like for perf-reftest (i.e. with 'base_replicates', 'ref_replicates', and 'replicates' being the difference) but added as subtests.
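The attachment itself is not reproduced in this thread, so the following Python sketch only illustrates the general shape described above. Only the keys 'base_replicates', 'ref_replicates' and 'replicates' come from the comment; the 'name' and 'subtests' keys, the helper `make_subtest`, the absolute difference, and all numbers are assumptions for illustration.

```python
import json


def make_subtest(name, base_replicates, ref_replicates):
    """Build one base-vs-ref subtest entry (hypothetical layout)."""
    return {
        "name": name,
        "base_replicates": base_replicates,
        "ref_replicates": ref_replicates,
        # 'replicates' holds the per-replicate difference (assumed absolute).
        "replicates": [abs(b - r) for b, r in zip(base_replicates, ref_replicates)],
    }


# One main test with each base-vs-ref pair as a subtest; values are made up.
suite = {
    "name": "perf_reftest",
    "subtests": [
        make_subtest("bloom-basic.html", [10.0, 12.0], [7.0, 13.0]),
        make_subtest("coalesce-1.html", [20.0, 21.0], [18.0, 18.5]),
    ],
}

print(json.dumps(suite, indent=2))
```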

Attachment #8905640 - Flags: feedback?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 5

7 years ago

Comment on attachment 8905640 [details]
perf-reftest-mock-local.json

this looks right, hard to tell without real data, luckily I just spent a while this morning looking at raw perfherder_data json blobs and comparing them in detail, so this was easy to 'visualize'

Attachment #8905640 - Flags: feedback?(jmaher) → feedback+

Robert Wood [:rwood]

Assignee

Updated

7 years ago

Whiteboard: [PI:September]

Robert Wood [:rwood]

Assignee

Updated

7 years ago

No longer blocks: 1374750

Robert Wood [:rwood]

Assignee

Updated

7 years ago

Blocks: 1398193

Comment hidden (mozreview-request)

***

Review commit: https://reviewboard.mozilla.org/r/180972/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/180972/

Robert Wood [:rwood]

Assignee

Comment 7

7 years ago

Attached file: actual_local.json

local.json example as output from the attached patch

Comment hidden (mozreview-request)

Comment on attachment 8909497 [details]
Bug 1397438 - Add subtests support for talos base vs ref pageloader tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/180972/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8909497 [details]
Bug 1397438 - Add subtests support for talos base vs ref pageloader tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/180972/diff/2-3/

Robert Wood [:rwood]

Assignee

Comment 10

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0502ab14ff8f197f12bc183f8eaa8c593e9612d3

Joel Maher ( :jmaher ) (UTC -8)

Comment 11

7 years ago

mozreview-review

Comment on attachment 8909497 [details]
Bug 1397438 - Add subtests support for talos base vs ref pageloader tests;

https://reviewboard.mozilla.org/r/180972/#review186500

overall this is really nice to see and I think close to an r+, a list of small things I have called out.

::: testing/talos/talos/pageloader/chrome/pageloader.js:973
(Diff revision 3)
> +        // be the comparison values of those two pages; more than one line will result in base vs ref subtests
> +        if (items[0].indexOf("&") != -1) {
> +          baseVsRef = true;
> +          flags |= TEST_DOES_OWN_TIMING;
> +          var urlspecBase = items[1].slice(0, -1);
> +          var urlspecRef = items[2];

why do we slice the base and not the ref?  I assume the slice is for the , ?  If so, please document that in a brief comment.

::: testing/talos/talos/pageloader/chrome/pageloader.js:975
(Diff revision 3)
> +          baseVsRef = true;
> +          flags |= TEST_DOES_OWN_TIMING;
> +          var urlspecBase = items[1].slice(0, -1);
> +          var urlspecRef = items[2];
> +        } else {
> +          dumpLine("tp: Error - unknown manifest format!");

I would like to output the invalid line as well so the error message makes the failure more actionable.
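The two points above (documenting why only the base URL is trimmed, and including the offending line in the error) can be sketched in Python as follows. This is not the pageloader.js code; `parse_base_vs_ref_line` is a hypothetical helper, and it assumes the manifest line is split on whitespace into the '&' marker, the base URL with a trailing comma, and the ref URL.

```python
def parse_base_vs_ref_line(line):
    """Parse a '& <base>, <ref>' manifest line into (base_url, ref_url)."""
    items = line.split()
    if len(items) == 3 and "&" in items[0]:
        # Only the base URL carries the ',' separator, so only it is trimmed;
        # the ref URL is the last item and has nothing to strip.
        base_url = items[1].rstrip(",")
        ref_url = items[2]
        return base_url, ref_url
    # Include the invalid line so the failure is actionable.
    raise ValueError("tp: Error - unknown manifest format: %r" % line)


line = ("& http://localhost/tests/perf-reftest/bloom-basic.html, "
        "http://localhost/tests/perf-reftest/bloom-basic-ref.html")
print(parse_base_vs_ref_line(line))
```

Note that `rstrip(",")` only removes a trailing comma, whereas the `slice(0, -1)` in the diff removes the last character unconditionally.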

::: testing/talos/talos/pageloader/chrome/pageloader.js:1003
(Diff revision 3)
> +
> +        url = gIOS.newURI(urlspecRef, null, manifestUri);
> +        if (pageFilterRegexp && !pageFilterRegexp.test(url.spec))
> +          continue;
> +        var pre = 'ref_page_'+ baseVsRefIndex + '_';
> +        d.push({ url, flags, pre });

we do d.push(...), but what is d and how does the 3rd value in the json get handled?

::: testing/talos/talos/run_tests.py:358
(Diff revision 3)
> +    # each set of two results is actually a base test followed by the
> +    # reference test; we want to go through each set of base vs reference
> +    for x in range(0, len(base_and_reference_results.results[0].results), 2):
> +
> +        # separate the 'base' and 'reference' result run values
> +        base_result_runs = base_and_reference_results.results[0].results[x]['runs']

this would be easier to read if we had:
results = base_and_reference_results.results[0].results

then:
base_result_runs = results[x]['runs']

::: testing/talos/talos/run_tests.py:362
(Diff revision 3)
> +        # separate the 'base' and 'reference' result run values
> +        base_result_runs = base_and_reference_results.results[0].results[x]['runs']
> +        ref_result_runs = base_and_reference_results.results[0].results[x + 1]['runs']
> +
> +        # for the subtest name, use the name of the base page
> +        sub_test_name = base_and_reference_results.results[0].results[x]['page']

could we verify that base_and_reference_results.results[0].results[x]['page'] == base_and_reference_results.results[0].results[x+1]['page']

::: testing/talos/talos/run_tests.py:373
(Diff revision 3)
> -                                                 'base_runs': base_result_runs,
> +                                                     'base_runs': base_result_runs,
> -                                                 'ref_runs': ref_result_runs})
> +                                                     'ref_runs': ref_result_runs})
>  
> -    # now step thru each result, compare 'base' vs 'ref', and store the difference in 'runs'
> +        # now step thru each result, compare 'base' vs 'ref', and store the difference in 'runs'
> -    _index = 0
> +        _index = 0
> -    for next_ref in comparison_result.results[0].results[0]['ref_runs']:
> +        for next_ref in comparison_result.results[0].results[subtest_index]['ref_runs']:

similar for this section as above:
results = comparison_result.results[0].results

Attachment #8909497 - Flags: review?(jmaher) → review-

Robert Wood [:rwood]

Assignee

Comment 12

7 years ago

mozreview-review-reply

Comment on attachment 8909497 [details]
Bug 1397438 - Add subtests support for talos base vs ref pageloader tests;

https://reviewboard.mozilla.org/r/180972/#review186500

> why do we slice the base and not the ref?  I assume the slice is for the , ?  If so, please document that in a brief comment.

yes exactly, ok done

> we do d.push(...), but what is d and how does the 3rd value in the json get handled?

'd' is a carry-over from existing pageloader code, defined at the top of function plLoadURLsFromURI; I'll change the name and update the comments more. The third value ('pre') is used in function plRecordTime; it is needed to differentiate the pagename for base vs ref type tests. When recording results / time values, we are basing it all on the pagename to be the unique key in the results. In the case of base vs ref then I add a 'pre' value so the pagename is unique, even when using the same test page as a reference page more than once in the same suite. I'll add more comments.
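A small Python sketch of why the 'pre' value matters when results are keyed by pagename. The helper names are hypothetical and this is not the plRecordTime code; only the 'ref_page_<index>_' prefix shape comes from the diff above, and the timings are made up.

```python
def page_key(pre, pagename):
    """Combine the prefix and page name into the key used to store results."""
    return pre + pagename


def record_time(results, pre, pagename, value):
    """Append a timing under a key that stays unique per base-vs-ref pair."""
    results.setdefault(page_key(pre, pagename), []).append(value)


results = {}
# The same reference page is used by two different base-vs-ref pairs.
record_time(results, "ref_page_0_", "bloom-basic-ref.html", 7.0)
record_time(results, "ref_page_1_", "bloom-basic-ref.html", 7.5)

# Without the prefix both timings would land under one key.
print(sorted(results))
```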

> could we verify that base_and_reference_results.results[0].results[x]['page'] == base_and_reference_results.results[0].results[x+1]['page']

No, because those are two different pages; the first is the base test page i.e. 'bloom-basic.html' and the second (x+1) is the reference test page i.e. 'bloom-basic-ref.html'. For the results (difference between both) subtest name, we want to use the base page name.
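To illustrate the pairing, here is a minimal Python sketch that walks the results two at a time and names each subtest after the base page. The function `pair_base_and_ref` and the 'name' key are hypothetical; the 'page', 'runs', 'base_runs' and 'ref_runs' keys follow the diff quoted in the review, and the sample data is made up. It also uses the local `results` alias suggested in the review.

```python
def pair_base_and_ref(results):
    """Group a flat [base, ref, base, ref, ...] list into subtests."""
    if len(results) % 2 != 0:
        raise ValueError("expected an even number of results (base/ref pairs)")
    subtests = []
    for x in range(0, len(results), 2):
        base, ref = results[x], results[x + 1]
        subtests.append({
            # The subtest is named after the base page, not the ref page.
            "name": base["page"],
            "base_runs": base["runs"],
            "ref_runs": ref["runs"],
        })
    return subtests


results = [
    {"page": "bloom-basic.html", "runs": [10.0, 12.0]},
    {"page": "bloom-basic-ref.html", "runs": [7.0, 13.0]},
]
print(pair_base_and_ref(results))
```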

Comment hidden (mozreview-request)

Comment on attachment 8909497 [details]
Bug 1397438 - Add subtests support for talos base vs ref pageloader tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/180972/diff/3-4/

Joel Maher ( :jmaher ) (UTC -8)

Comment 14

7 years ago

mozreview-review

Comment on attachment 8909497 [details]
Bug 1397438 - Add subtests support for talos base vs ref pageloader tests;

https://reviewboard.mozilla.org/r/180972/#review186648

great stuff!

Attachment #8909497 - Flags: review?(jmaher) → review+

Robert Wood [:rwood]

Assignee

Comment 15

7 years ago

Thanks for the review!

Along with the base vs ref subtest support this patch includes bloom-basic in this new subtest format, and the addition of bloom-basic-2 and coalesce-1. Bug 1398193 will be used to land the rest of the tests (and ensure the tests are green, fast enough, etc. first).

Updated try run:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6ed1fe957f94c1df6a8179bf041aaef97e6c639c

:bholley, do the base vs ref style-perf-tests require 25 iterations each, or is it ok to reduce them to 15 cycles like we did for the perf-reftest-singletons?

Flags: needinfo?(bobbyholley)

Bobby Holley (:bholley)

Comment 16

7 years ago

(In reply to Robert Wood [:rwood] from comment #15)
> Thanks for the review!
> 
> Along with the base vs ref subtest support this patch includes bloom-basic
> in this new subtest format, and the addition of bloom-basic-2 and
> coalesce-1. Bug 1398193 will be used to land the rest of the tests (and
> ensure the tests are green, fast enough, etc. first).
> 
> Updated try run:
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=6ed1fe957f94c1df6a8179bf041aaef97e6c639c
> 
> :bholley, do the base vs ref style-perf-tests require 25 iterations each, or
> is it ok to reduce them to 15 cycles like we did for the
> perf-reftest-singletons?

That's fine. Fewer iterations may also be fine, realistically. With the base-vs-ref tests, we're looking for big differences, not small ones.

Flags: needinfo?(bobbyholley)

Robert Wood [:rwood]

Assignee

Comment 17

7 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #16)
...
> 
> That's fine. Fewer iterations may also be fine, realistically. With the
> base-vs-ref tests, we're looking for big differences, not small ones.

Ok great, I'll do some try runs at 10 iterations. The current alert threshold right now is 5% - do you want to increase that to only alert on larger regressions, or is the 5% threshold fine?

Flags: needinfo?(bobbyholley)

Bobby Holley (:bholley)

Comment 18

7 years ago

(In reply to Robert Wood [:rwood] from comment #17)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #16)
> ...
> > 
> > That's fine. Fewer iterations may also be fine, realistically. With the
> > base-vs-ref tests, we're looking for big differences, not small ones.
> 
> Ok great, I'll do some try runs at 10 iterations. The current alert
> threshold right now is 5% - do you want to increase that to only alert on
> larger regressions, or is the 5% threshold fine?

Is that 5% difference between the testcases, or 5% change in the difference?

In general keeping the threshold precise where possible seems like a good starting point, we can always bump it if it generates spurious alerts.

Flags: needinfo?(bobbyholley)

Robert Wood [:rwood]

Assignee

Comment 19

7 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #18)
...
> 
> Is that 5% difference between the testcases, or 5% change in the difference?
> 
> In general keeping the threshold precise where possible seems like a good
> starting point, we can always bump it if it generates spurious alerts.

5% change in the difference itself (i.e. 5% change in the overall perf-reftest suite result). I'll leave it at 5%.
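As a rough illustration of "5% change in the difference", the following Python sketch compares an old and a new suite result, where each result is already a base-vs-ref difference. This is a simplification under stated assumptions: the function names and numbers are hypothetical, and the actual alerting logic is not described in this bug.

```python
def percent_change(old_value, new_value):
    """Relative change between two results, as a percentage of the old one."""
    if old_value == 0:
        raise ValueError("old value must be non-zero")
    return abs(new_value - old_value) / abs(old_value) * 100.0


def should_alert(old_diff, new_diff, threshold_pct=5.0):
    """True when the base-vs-ref difference itself moved by the threshold or more."""
    return percent_change(old_diff, new_diff) >= threshold_pct


# Made-up suite results (each one is a base-vs-ref difference).
print(should_alert(100.0, 106.0))
print(should_alert(100.0, 103.0))
```

The threshold applies to how much the difference moved between runs, not to how far apart the base and ref pages are from each other.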

Comment hidden (mozreview-request)

Comment on attachment 8909497 [details]
Bug 1397438 - Add subtests support for talos base vs ref pageloader tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/180972/diff/4-5/

Robert Wood [:rwood]

Assignee

Comment 21

7 years ago

Try run with 10 iterations/tppagecycles:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f7f0904f1bd801d43a134a0e4ccaa2a83c559d78

Pulsebot

Comment 22

7 years ago

Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fc352698b470
Add subtests support for talos base vs ref pageloader tests; r=jmaher

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 23

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/fc352698b470

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox57: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla57

Ryan VanderMeulen [:RyanVM]

Comment 24

7 years ago

Backed out for permafailing talos perf-reftest-stylo-disabled-e10s as shown in the logs below.
https://hg.mozilla.org/mozilla-central/rev/47f7b6c64265bc7bdd22eef7ab71abc97cf3f8bf

https://treeherder.mozilla.org/logviewer.html#?job_id=132317866&repo=mozilla-central

Status: RESOLVED → REOPENED

status-firefox57: fixed → affected

Flags: needinfo?(rwood)

Resolution: FIXED → ---

Target Milestone: mozilla57 → ---

Robert Wood [:rwood]

Assignee

Comment 25

7 years ago

Ugh, apologies, thanks for the backout

Flags: needinfo?(rwood)

Robert Wood [:rwood]

Assignee

Updated

7 years ago

Attachment #8909497 - Attachment is obsolete: true

Comment hidden (mozreview-request)

Comment on attachment 8909497 [details]
Bug 1397438 - Add subtests support for talos base vs ref pageloader tests;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/180972/diff/5-6/

Robert Wood [:rwood]

Assignee

Comment 27

7 years ago

Comment on attachment 8909497 [details]
Bug 1397438 - Add subtests support for talos base vs ref pageloader tests;

Updated patch after backout (missed an entry for the no longer existing 'bloom-basic' in talos.json for stylo-disabled). Also rebased, so there are a couple of changes from grabbing the latest.

Attachment #8909497 - Flags: review+ → review?

Robert Wood [:rwood]

Assignee

Updated

7 years ago

Attachment #8909497 - Flags: review? → review?(jmaher)

Robert Wood [:rwood]

Assignee

Comment 28

7 years ago

Looking good on try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d2bb036e1289072ad33e68109590959e447ef1e8

Joel Maher ( :jmaher ) (UTC -8)

Comment 29

7 years ago

mozreview-review

Comment on attachment 8909497 [details]
Bug 1397438 - Add subtests support for talos base vs ref pageloader tests;

https://reviewboard.mozilla.org/r/180972/#review187584

still looks great

Attachment #8909497 - Flags: review?(jmaher) → review+

Pulsebot

Comment 30

7 years ago

Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/87ffa54a5436
Add subtests support for talos base vs ref pageloader tests; r=jmaher

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 31

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/87ffa54a5436

Status: REOPENED → RESOLVED

Closed: 7 years ago → 7 years ago

status-firefox58: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla57

Sylvestre Ledru [:Sylvestre]

Updated

7 years ago

status-firefox57: affected → ---

perf-reftest-mock-local.json (1.55 KB, application/json), Robert Wood [:rwood], 7 years ago, jmaher: feedback+
Bug 1397438 - Add subtests support for talos base vs ref pageloader tests; (59 bytes, text/x-review-board-request), Robert Wood [:rwood], 7 years ago, jmaher: review+
actual_local.json (3.26 KB, application/json), Robert Wood [:rwood], 7 years ago