1261152 - Ensure tps is treating e10s fairly

Assignee

Description

•

8 years ago

The tps test is supposed to measure how long it takes to switch tabs in the browser, and present the content of those tabs to the user. It loads up a bunch of sites from the tp5 set, waits for them all to finish loading, and then switches to each of those tabs, one by one.

Here's a profile I recently gathered from an e10s run on Windows:

http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/c44bd51496971b55c413c5c923bf8b6c2c8f2fdcd915a20851d21e9fb819560a995dfc2e9a0b0cbc6f608655a62dfbd4a06199666683507c7974d2d0678c0c42#report=29db309b072a460b01461c67a659fad5cee2d0fd&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A80290,%22end%22%3A80807%7D,%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A80513,%22end%22%3A80570%7D%5D&selection=0,4700,6,7,4701,4702,4,5,6,7,4701,8,9,10,11,12,13,14,15,16,17,4861,4862,4863,4864,4865,287,288,289,290,291,292,293,294,295,296


One thing that strikes me is that it looks like content was done painting at Rasterize #45, and that the next composite is Composite #516 - and that's when the web content should have been displayed to the user.

Instead, we wait a while - we wait for another vsync before we mark the test as done.

I think this test is slightly flawed here - we shouldn't be waiting for that extra vsync.

Mike Conley (:mconley) (:⚙️)

Assignee

Updated

•

8 years ago

Assignee: nobody → mconley

Tracy Walker [:tracy]

Updated

•

8 years ago

Blocks: e10s-tests

tracking-e10s: --- → +

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 1

•

8 years ago

I have more details. Here's a snippet of a tps run on a Linux 64 machine:

http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try/sha512/e4e9f40b66605cb4cff3af7bddb8f8574e5b9215357a696a69c6ac1f4c6c0179b57be480fd395e1588ad577bcdfb7fd609a927342bc76cdc6fabe5958a9b3454#report=72c5d0ad920d1f09457659de33518be2f2d8fd4d&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A93860,%22end%22%3A102630%7D,%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A94760,%22end%22%3A96513%7D,%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A94799,%22end%22%3A94945%7D,%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A94828,%22end%22%3A94901%7D%5D

At 94830, we start the tab switch, and the paint is kicked off in the content process.

That paint completes around 94858. A Vsync tick causes a second, shorter paint to occur (I'm not 100% sure this paint is necessary, but there it is) - that paint goes from 94859 to 94862.

The parent, meanwhile, has heard that the layers are ready from (presumably) that first paint that occurred.

At around 94862, the parent hears both a MozAfterPaint in the parent (since a composite occurred not long ago), as well as the message that the layers are ready.

At that point, the async tab switcher flips the tabpanel deck to reveal the newly painted browser content.

We then wait 12ms for the next composite, which _should_, I believe, show the web content.

The composite occurs, and at 94876, the TabChild dispatches its own MozAfterPaint since the tab has finally been presented to the user.

Here's the kicker - then we wait a whole 16ms before stopping the tps stopwatch.

I suspect tps needs to be modified to pay attention to the MozAfterPaint events that are being dispatched by the TabChild, instead of the requestAnimationFrame stuff it's currently doing.

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 2

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=056d696dd67c

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 3

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e8211e5d7d52

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 4

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b5615c73476c

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 5

•

8 years ago

Attached file MozReview Request: Bug 1261152 - Make TPS more accurate using MozAfterPaint with layer transaction ids. r?jmaher — Details

Review commit: https://reviewboard.mozilla.org/r/46803/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/46803/

Attachment #8741868 - Flags: review?(jmaher)

Attachment #8741869 - Flags: review?(jmaher)

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 6

•

8 years ago

Attached file MozReview Request: Bug 1261152 - Keep the content process alive between tps runs. r?jmaher — Details

Review commit: https://reviewboard.mozilla.org/r/46805/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/46805/

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 7

•

8 years ago

Hey Joel - I think this is going to address our tps "regression". It turns out this isn't really a regression - we're just measuring wrong.

e10s actually appears to perform better than non-e10s in this test with the first patch:

https://treeherder.mozilla.org/perf.html#/e10s?repo=try&revision=fba2a2487c10

The second patch is hopefully going to address the OS X regression here, since it keeps the content process alive between cycles of the test. I think the reason OS X is regressing is because of the asian character sets that are being cached in the single-process case, but in the multi-process case, because the content process shuts down between cycles, no cache exists.

Try build with both patches is here:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=09910a98365f9982fc9331b4b44fb72eb716fddd

Depends on: 1264409

Chris Peterson [:cpeterson]

Updated

•

8 years ago

Blocks: 1251547

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

8 years ago

Attachment #8741869 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 8

•

8 years ago

Comment on attachment 8741869 [details]
MozReview Request: Bug 1261152 - Keep the content process alive between tps runs. r?jmaher

https://reviewboard.mozilla.org/r/46805/#review43401

one issue, It appears we still look for data.times... in the code that receives the data.

::: testing/talos/talos/tests/tabswitch/bootstrap.js:437
(Diff revision 1)
>      if (pref)
>        Services.prefs.setBoolPref("browser.tabs.warnOnCloseOtherTabs", pref);
> -    Services.obs.notifyObservers(win, 'tabswitch-test-results', JSON.stringify({'times': times, 'urls': testURLs}));
> +
> +    remotePage.sendAsyncMessage("tabswitch-test-results", {
> +      times,
> +      urls: testURLs,

previously you had a structure of:
\{'times': value,
 'urls': value\}
 
 now you have a similar structure, just there is nothing labeled 'times'

Joel Maher ( :jmaher ) (UTC -8)

Comment 9

•

8 years ago

Comment on attachment 8741868 [details]
MozReview Request: Bug 1261152 - Make TPS more accurate using MozAfterPaint with layer transaction ids. r?jmaher

https://reviewboard.mozilla.org/r/46803/#review43405

::: testing/talos/talos/tests/tabswitch/bootstrap.js:407
(Diff revision 1)
> +    let times = [];
> +
> +    for (let tab of tabs) {
> +      yield forceGC(win, tab.linkedBrowser);
> +      let time = yield switchToTab(tab);
> +      dump(`${tab.linkedBrowser.currentURI.spec}: ${time}ms\n`);

is this dump seen in automation?  I would think it is not visible.

Attachment #8741868 - Flags: review?(jmaher) → review+

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 10

•

8 years ago

(In reply to Joel Maher (:jmaher) from comment #8)
> Comment on attachment 8741869 [details]
> MozReview Request: Bug 1261152 - Keep the content process alive between tps
> runs. r?jmaher
> 
> https://reviewboard.mozilla.org/r/46805/#review43401
> 
> one issue, It appears we still look for data.times... in the code that
> receives the data.
> 
> ::: testing/talos/talos/tests/tabswitch/bootstrap.js:437
> (Diff revision 1)
> >      if (pref)
> >        Services.prefs.setBoolPref("browser.tabs.warnOnCloseOtherTabs", pref);
> > -    Services.obs.notifyObservers(win, 'tabswitch-test-results', JSON.stringify({'times': times, 'urls': testURLs}));
> > +
> > +    remotePage.sendAsyncMessage("tabswitch-test-results", {
> > +      times,
> > +      urls: testURLs,
> 
> previously you had a structure of:
> \{'times': value,
>  'urls': value\}
>  
>  now you have a similar structure, just there is nothing labeled 'times'

Ah, this some ES6 goodness.

Specifically, this is a shorthand property name. See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Object_initializer#New_notations_in_ECMAScript_2015

Essentially, given some local variable named times and testURLs:

{ times: times, urls: testURLs }

is equivalent to

{ times, urls: testURLs }

Joel Maher ( :jmaher ) (UTC -8)

Comment 11

•

8 years ago

nice, then ship it!

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 12

•

8 years ago

(In reply to Joel Maher (:jmaher) from comment #9)
> Comment on attachment 8741868 [details]
> MozReview Request: Bug 1261152 - Make TPS more accurate using MozAfterPaint
> with layer transaction ids. r?jmaher
> 
> https://reviewboard.mozilla.org/r/46803/#review43405
> 
> ::: testing/talos/talos/tests/tabswitch/bootstrap.js:407
> (Diff revision 1)
> > +    let times = [];
> > +
> > +    for (let tab of tabs) {
> > +      yield forceGC(win, tab.linkedBrowser);
> > +      let time = yield switchToTab(tab);
> > +      dump(`${tab.linkedBrowser.currentURI.spec}: ${time}ms\n`);
> 
> is this dump seen in automation?  I would think it is not visible.

It is, yes - you can see these in the test log.

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

8 years ago

Attachment #8741869 - Flags: review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 13

•

8 years ago

Comment on attachment 8741869 [details]
MozReview Request: Bug 1261152 - Keep the content process alive between tps runs. r?jmaher

https://reviewboard.mozilla.org/r/46805/#review43409

Pulsebot

Comment 14

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/d7d96b820538
https://hg.mozilla.org/integration/mozilla-inbound/rev/44cd41f66946

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 15

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/dcd51036ec228ea2ff054c3e9050b4e87e604b24
Bug 1261152 - Follow-up: bump version, repack and sign tps add-on. r=jmaher

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 16

•

8 years ago

So this seems to have put a pretty dramatic bullet through the tps regression.

non-e10s has gone up. e10s has gone down.

In case people are worried that I've somehow unfairly shifted the test in e10s's favour, here are some profiles I took from my local machine from a single run:

non-e10s: https://cleopatra.io/#report=c77d23d2314b897b2eed8c85b1337d47103711b8&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A48038,%22end%22%3A75017%7D%5D

e10s: https://cleopatra.io/#report=c835c4853b5832ce6162415ff5e95ba1166ec479&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A46761,%22end%22%3A90595%7D%5D

Here's a little snippet of a single tab switch to look at, which I'll discuss next:

non-e10s: https://cleopatra.io/#report=c77d23d2314b897b2eed8c85b1337d47103711b8&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A48038,%22end%22%3A75017%7D,%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A48038,%22end%22%3A48734%7D,%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A48110,%22end%22%3A48241%7D%5D

e10s: https://cleopatra.io/#report=c835c4853b5832ce6162415ff5e95ba1166ec479&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A46761,%22end%22%3A90595%7D,%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A46863,%22end%22%3A47980%7D,%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A47014,%22end%22%3A47101%7D%5D

Notice in the non-e10s snippet, the big paint goes from time 48128 all the way until around 48190, taking approximately ~62ms. The next composite finishes at 48213, and the MozAfterPaint event for it is serviced at 48218, which ends the test with a time of about ~106ms.

In the e10s snippet, the paint gets to start a _lot_ sooner, at 47019 in the content process. That paint lasts until around 47075, lasting about ~55ms. This is likely faster than non-e10s because we don't have to worry about painting any of the browser UI. I believe the composite with that web content finishes around 47078, but it's 15ms later that the MozAfterPaint event finally has a chance to be serviced, and we can stop the stopwatch for a total time of ~77ms.

There's room for improvement in terms of the accuracy of the measurement in both cases. Specifically, if we were to fix bug 1264798, we wouldn't need to care about when the MozAfterPaint event was serviced - we'd be able to get the time that the pixels were presented to the user with a very high-degree of precision.

Katsuki

Comment 17

•

8 years ago

In my netbook, when I open heavy javascript page on a tab and before this tab finished loading / rendering / completing javascript execution, switch to another tabs make spinner on middle page appear for quite a while. This make switching tab experience worse than non e10s. Is this test include heavy javascript page? Or Is this test account for switching tabs before another tab completely loaded or rendered? Or Is it tested on another talos test? Thank you.

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 18

•

8 years ago

(In reply to Katsuki from comment #17)
> In my netbook, when I open heavy javascript page on a tab and before this
> tab finished loading / rendering / completing javascript execution, switch
> to another tabs make spinner on middle page appear for quite a while. This
> make switching tab experience worse than non e10s. Is this test include
> heavy javascript page?

This test opens a series of pages from the tp5 set, some with reasonably complex JS running on them, yes.

> Or Is this test account for switching tabs before
> another tab completely loaded or rendered?

This is different. We wait for all of the pages to be loaded before we start switching through them.

Do you have an example site that I could test?

Flags: needinfo?(Shiratori1991)

Katsuki

Comment 19

•

8 years ago

If I open bugzilla and google on different tab, switching tab between these two tabs seem smooth, google and bugzilla page displayed properly. (I try to quickly switch tab on about 500ms interval).

Then I open this page : https://reviewable.io/reviews/servo/servo/10438  on another tab. Before this page completely rendered or before completely loaded or before review being loaded, I try to switch tab between google and bugzilla on about 500 ms interval, but what I see is just the spinner on middle page. Even switching tab on 2 seconds interval just display the spinner.

Flags: needinfo?(Shiratori1991)

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 20

•

8 years ago

(In reply to Katsuki from comment #19)
> If I open bugzilla and google on different tab, switching tab between these
> two tabs seem smooth, google and bugzilla page displayed properly. (I try to
> quickly switch tab on about 500ms interval).
> 
> Then I open this page : https://reviewable.io/reviews/servo/servo/10438  on
> another tab. Before this page completely rendered or before completely
> loaded or before review being loaded, I try to switch tab between google and
> bugzilla on about 500 ms interval, but what I see is just the spinner on
> middle page. Even switching tab on 2 seconds interval just display the
> spinner.

Interesting. I am able to recreate the spinner that you're seeing, but when I use the same steps in single-process mode, the entire browser UI locks up instead. Are you not seeing this?

Flags: needinfo?(Shiratori1991)

Katsuki

Comment 21

•

8 years ago

Yes I am seeing it on single process, firefox hangs for couple of seconds. Seems like other bug? Is it about jank or responsiveness bug?

Flags: needinfo?(Shiratori1991)

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 22

•

8 years ago

(In reply to Katsuki from comment #21)
> Yes I am seeing it on single process, firefox hangs for couple of seconds.
> Seems like other bug? Is it about jank or responsiveness bug?

It's essentially the same bug, just manifesting differently. In the non-e10s case, the main process is being bogged down by some heavy JS that reviewable.io is doing. In the e10s case, the content process is getting bogged down by that JS. The fact that the e10s-case allows the browser to still be somewhat responsive is a bonus, though it's not amazing that you can't see any tab content until the content process frees up.

The transition to multiple content processes will ease this up, most likely. You might want to follow the e10s-multi metabug at bug 1207306.

Otherwise, the issue you've reported is unrelated to the tps test.

Katsuki

Comment 23

•

8 years ago

OK, thank you.

Carsten Book [:Tomcat]

Comment 24

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d7d96b820538
https://hg.mozilla.org/mozilla-central/rev/44cd41f66946
https://hg.mozilla.org/mozilla-central/rev/dcd51036ec22

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox48: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla48

Joel Maher ( :jmaher ) (UTC -8)

Comment 25

•

8 years ago

lots of improvements/regressions as expected:
https://treeherder.mozilla.org/perf.html#/alerts?id=886

Mike Conley (:mconley) (:⚙️)

Assignee

Comment 26

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c8c69ccd2bdb

Dão Gottwald [:dao]

Updated

•

8 years ago

Depends on: 1283812

MozReview Request: Bug 1261152 - Make TPS more accurate using MozAfterPaint with layer transaction ids. r?jmaher 8 years ago Mike Conley (:mconley) (:⚙️) 58 bytes, text/x-review-board-request	jmaher : review+	Details
MozReview Request: Bug 1261152 - Keep the content process alive between tps runs. r?jmaher 8 years ago Mike Conley (:mconley) (:⚙️) 58 bytes, text/x-review-board-request	jmaher : review+	Details