Closed Bug 1006551 Opened 5 years ago Closed 5 years ago

Add talos scroll test for tp5 pages

Categories

(Testing :: Talos, defect)

x86_64
Windows 8.1
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: avih, Assigned: avih)

References

Details

Attachments

(1 file, 2 obsolete files)

The current talos tscroll measures scroll performance over a set of 6 synthetic pages.

This bug is about scroll-testing the tp5 corpus such that we have more real-world numbers.

My current approach is:
1. Wait for the page to finish loading (during a tp5 run).
2. Scroll down 30 steps of 10px each (using rAF, and probably also APZC following bug 1006531).
3. Report average interval.

- The numbers 30/10 were chosen arbitrarily, but I feel they offer a good balance of usefulness, speed and noise (especially when executed over a big number of pages).

The results would then be collected into a new test name, maybe tp5-scroll? where each subtest corresponds to scroll-perf of one tp5 page.

One thing to consider here, is that for effective scroll perf results, we currently need ASAP mode (which might also carry for the APZC case), and if we want this to run during a tp5 run, it means tp5 would have to run in ASAP mode.

I don't _think_ ASAP would affect "normal" tp5 results much, if at all, but I can't predict it with 100% confidence.
Attached patch bug1006551.v1.WIP.patch (obsolete) — Splinter Review
This patch add a test, tp5os, which is a duplicate of tp5o, with the following differences:

- Adds scroll test for each page (takes less than a second to complete).
- Sets the (whole) test to run in ASAP mode.

Missing:
- Reformat the report to the console, capture the report and sending of the test result as a distinct test name.

It does print the scroll-test result to the console, but as a placeholder for a format which should probably be more in line with talos practices.

Also, while we can run it as a new test while tp5o is still being used, I think, if we could overcome the issues, that it should replace the tp5o run, and that this new tp5o test reports as two different tests:
1. tp5o (replicating the tp5o report and replacing the tp5o run)
2. tp5o-scrollTest (or another name. No preference on my side).

My only unknown here (apart from the technical issues I noted earlier) is that ASAP mode may produce the "tp5o" part of the result with somewhat different numbers.

If that happens, we'll need to think if we still want to run it instead of tp5o, or keep tp5o and make the new tp5os test only report the scroll results.

But this means that both tp5o and tp5os will load the same pages, only tp5os will also scroll them. Feels a bit of wasted resources to me.

Let me know how do you think we should proceed.
Attachment #8425948 - Flags: feedback?(jmaher)
Also, I did test (locally) that tscroll and tp5o didn't break from this patch.
Comment on attachment 8425948 [details] [diff] [review]
bug1006551.v1.WIP.patch

Review of attachment 8425948 [details] [diff] [review]:
-----------------------------------------------------------------

just a few comments, overall this is looking great.

::: talos/compare.py
@@ +42,5 @@
>  test_map['tsvgr_opacity'] = {'id': 225, 'tbplname': 'tsvgr_opacity'}
>  test_map['tresize'] = {'id': 254, 'tbplname': 'tresize'}
>  test_map['tp5n'] = {'id': 206, 'tbplname': 'tp5n_paint'}
>  test_map['tp5o'] = {'id': 255, 'tbplname': 'tp5o_paint'}
> +#placeholder test_map['tp5os'] = {'id': xxx, 'tbplname': 'tp5o_scroll ???'}

I think we can make this normal, I usually file these as good_first_bugs- I need to add ts_paint_cold as well :)

::: talos/page_load_test/scroll/scroll-test.js
@@ +97,5 @@
> +        if (opt_callback) {
> +          setTimeout(opt_callback, 0, result);
> +        } else {
> +          tpRecordTime(result);
> +        }

too bad we can't just set opt_callback to tpRecordTime for the scroll tests :)

::: talos/pageloader/chrome/pageloader.js
@@ +581,5 @@
>  
>    if (doRenderTest)
>      runRenderTest();
>  
> +  plTestScroll();

can we do a:
if (scrollTest)
  plTestScroll();

@@ +591,5 @@
> +  var next = plScrollTestDone;
> +  if (scrollTest)
> +    testScroll(content.contentWindow.wrappedJSObject, SCROLL_TEST_STEP_PX, next, SCROLL_TEST_NUM_STEPS);
> +  else
> +    next();

personally I would rather see plScrollTestDone than next.

@@ +595,5 @@
> +    next();
> +}
> +
> +function plScrollTestDone(result) {
> +  dumpLine("------------> scroll average interval: " + result);

if we call next(), dumpLine will not have anything- could that be confusing?
Attachment #8425948 - Flags: feedback?(jmaher) → feedback+
(In reply to Avi Halachmi (:avih) from comment #1)
> - Sets the (whole) test to run in ASAP mode.

Does this mean that regular tp5 will run in ASAP mode?
(In reply to Timothy Nikkel (:tn) from comment #4)
> Does this mean that regular tp5 will run in ASAP mode?

We're going to check first in a different test that running a test identical to the current tp5 (with all its derivatives) in ASAP mode provides identical/similar-enough/useful-enough (TBD) results compared to the current tp5, and if it does, then yes, that's what it means.

But we don't have to. We could also add a new test tp5-scroll as a stand-alone test, which tests only scroll on the tp5 pages.

The only incentive to combine/piggyback the new scroll test into tp5 is because we already launch all the tp5 pages in the tp5 test (and piggyback them with all the startup and paint tests beyond the original goal of page load time), so I hope to be able to piggyback also the scroll once the rest of the data per page was collected, therefore saving a lot of time per talos run.

But if it turns out to hurt the quality of the current tp5 results, then we won't.

Do you have reason to believe that running the current tp5 in ASAP will reduce the quality of the results?
(In reply to Avi Halachmi (:avih) from comment #5)
> ... so I hope to be able to piggyback also the scroll once the rest
> of the data per page was collected ...

And since changing ASAP mode requires restart, we can't have the tp5 parts in "normal" mode and the scroll parts in ASAP. So since scroll is really not useful without ASAP, then if we piggyback scroll onto tp5, we'll have to ASAP the whole run.
Tp5 seems to be a test designed to measured page load time as a user would see. So changing anything in that environment that users wouldn't have should be considered and justified. I think ASAP mode would materially change what we measure because when nodes are added to the document we handle them in batches. So that if we flush more frequently we will be handling more smaller batches (as opposed to fewer larger batches). This could have a significant perf impact as fewer larger batches are better. So I would support running them as separate tests.
Considering our limited bandwidth, I'm going to drop the optimization considerations about combining tp5o with the scroll test.

My new approach is as follows:

1. Modify the pageloader addon generically such that it can accept a new flag tpscrolltest (the current patch already does it).

2. If this flag is set, then instead of the reporting page load times for each page at the manifest, it will report the average frame interval after loading the page and scrolling it a bit (same methodology and shared code with tscrollx). Will require hopefully not many modifications from the existing patch.

3. Will add a new test tp5o_scroll which runs the same manifest/pages as tp5o, but with this tpscrolltest flag, and without the extra measurements which are typically associated with tp5o (various startup tests etc).


This means we're dropping for now:
- Trying to combine two tests into one run of the same pages (will require several pageloader addon modifications and enhanced parsing and reporting code).

What we're losing:
- Without this optimization, we'll have to run the whole tp5o page set twice: once to produce the "normal" tp5o numbers, and once to produce the scroll numbers. So longer tests run time, but nothing else otherwise.

So this should simplify the system a fair bit and will require overall less modifications and considerations, so it's more likely to land sooner.
Attached patch bug1006551.v2.patch (obsolete) — Splinter Review
This implements the approach from comment 8, and also addresses the review (where applicable) from comment 3. Overall it's a cleaner approach than piggybacking it onto the tp5o run.

TODO:

- Decide on parameters/filters. Currently it's standard-ish 25 pagecycles, median, and ignore-first: 1. But it will probably run a bit longer than tp5o, so if this is an issue, we might want to use less cycles or otherwise tune these parameters.

- I _think_ it should apply relatively cleanly to e10s, but I don't know how to make talos use e10s etc, so I didn't try to make it work without being able to test it. But if you think we should also make it available on e10s, then I'd need some help.

- The usual stuff: add the test and pages to the DBs, set the correct id on compare.py

- Address any other review comments.
Attachment #8425948 - Attachment is obsolete: true
Attachment #8439148 - Flags: review?(jmaher)
Comment on attachment 8439148 [details] [diff] [review]
bug1006551.v2.patch

Review of attachment 8439148 [details] [diff] [review]:
-----------------------------------------------------------------

overall this is a nice patch, thanks for writing it.  Please address my questions/nits here and land it!  After that we need to add graph server entries for tp5o_scroll.  If we want to do a subset of tests, lets figure it out now and we can build the tp5scroll.manifest file up.  I suspect 20 pages will be adequate.

::: talos/page_load_test/scroll/scroll-test.js
@@ +72,5 @@
>        durations.push(duration);
>        doScrollTick();
>  
> +      /* stop scrolling if we can't scroll more, or if we've reached requested number of steps */
> +      if (getPos() == lastScrollPos || opt_numSteps && durations.length >= (opt_numSteps + 2)) {

please add more parens here, this could be interpreted in various ways.  Do we know what opt_numSteps will be?

::: talos/pageloader/chrome/pageloader.js
@@ +52,5 @@
>  
>  var content;
>  
>  var TEST_DOES_OWN_TIMING = 1;
> +var EXECUTE_SCROLL_TEST  = 2;

can you put a comment here that these are binary flags?

@@ +489,5 @@
> +
> +    // Let the page settle down after its load event, then execute the scroll test.
> +    setTimeout(testScroll, 500,
> +               content.contentWindow.wrappedJSObject, SCROLL_TEST_STEP_PX,
> +               content.contentWindow.wrappedJSObject.tpRecordTime, SCROLL_TEST_NUM_STEPS);

setTimeout with 500ms delay, is that smart?  I have always passed params in via:
setTimeout(function () { testScroll(param1, param2, ...) }, 500);

I assume your method will work as well.

@@ +792,5 @@
> +      // own tpRecordTime and not the one from the scroll test).
> +      if (scrollTest && items[0].indexOf("%") < 0) {
> +        items.unshift("%");
> +        flags |= EXECUTE_SCROLL_TEST;
> +      }

do we need to do a flags |= TEST_DOES_OWN_TIMING as well?  the unshift adds the % to the line, but we might be able to make it simipler.  not a requirement
Attachment #8439148 - Flags: review?(jmaher) → review+
Thanks for the quick review.

(In reply to Joel Maher (:jmaher) from comment #10)
> ... If we want to do a subset of tests, lets figure it
> out now and we can build the tp5scroll.manifest file up.  I suspect 20 pages
> will be adequate.

I think we should stick to the tp5o page set. It was decided it's a good set to test page load times, let's derive from it that it's a good set for scroll test. We can play with the pagecycles - _if_ it turns out to consume more resources than we'd like. Which I think is an argument which could also use backing up.

> Do we know what opt_numSteps will be?

The patch uses 100 when the tpscrolltest flag is set.

> can you put a comment here that these are binary flags?

Sure.

> 
> @@ +489,5 @@
> > +
> > +    // Let the page settle down after its load event, then execute the scroll test.
> > +    setTimeout(testScroll, 500,
> > +               content.contentWindow.wrappedJSObject, SCROLL_TEST_STEP_PX,
> > +               content.contentWindow.wrappedJSObject.tpRecordTime, SCROLL_TEST_NUM_STEPS);
> 
> setTimeout with 500ms delay, is that smart?  I have always passed params in
> via:
> setTimeout(function () { testScroll(param1, param2, ...) }, 500);
> 
> I assume your method will work as well.

For our cause the two forms are identical, and IMO the form at the patch is cleaner.

The difference will be that at the patch the testScroll arguments are resolved before the setTimeout statement is executed, while with your version they're resolved once the lambda function callback is invoked. However, since none of those change between these two points in time, they're identical on this case.


> 
> @@ +792,5 @@
> > +      // own tpRecordTime and not the one from the scroll test).
> > +      if (scrollTest && items[0].indexOf("%") < 0) {
> > +        items.unshift("%");
> > +        flags |= EXECUTE_SCROLL_TEST;
> > +      }
> 
> do we need to do a flags |= TEST_DOES_OWN_TIMING as well?  the unshift adds
> the % to the line, but we might be able to make it simipler.  not a
> requirement

If you don't use this flag, then you'll need some other method to identify once the page loads that it needs to execute the scroll test.

But I'm open to any simpler solution which I missed. Just post your suggested code, and if it's simpler and still works I'll take it.
Carrying r+.

Addressed review notes from comment 10.

We've decided on IRC that instead of using less pages we'll use less page cycles, so reduced to 12 cycles while keeping the full tp5o page set (49 pages). The timeout invocation was agreed OK, as well as that the usage of the flag is necessary.

Thanks.

https://hg.mozilla.org/build/talos/rev/c3686483ae64
Attachment #8439148 - Attachment is obsolete: true
Attachment #8439429 - Flags: review+
Would you please pick it up with followup bugs as necessary to complete the DBs integrations etc?
Flags: needinfo?(jmaher)
Depends on: 1025013
started the graph server work in bug 1025013
Flags: needinfo?(jmaher)
Depends on: 1025059
Depends on: 1026579
Depends on: 1028999
Joel pushed a test run of this test on all desktop platforms - https://tbpl-dev.allizom.org/?tree=Try&rev=460d6981994d

I went over one full log from each platform and confirmed the results look good.

I think we're good to go production with this.
Joel added the template and I added the content at https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5o_scroll

If need for more docs arise, please set this keyword again.
Keywords: dev-doc-needed
Blocks: 771326
can we close this bug?
Yeah. It works well.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.