Assess MozAfterPaint/isMozAfterPaintPending usage in talos

NEW
Unassigned

Status

defect
P3
normal
4 years ago
10 months ago

People

(Reporter: avih, Unassigned)

Tracking

(Depends on 1 bug, Blocks 2 bugs)

unspecified
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10s+)

Details

Attachments

(3 attachments)

There are several talos tests which use MozAfter[Remote?]Paint (from now on - AKA MAP) in talos: a11y, ts_paint, tpaint, tresize, tp5o, tsvg_opacity.

While I haven't used it myself in tests, my recollection is that whenever MAP was involved, there were concerns/evidence that its effect within talos is.. weird or not fully expected.

I want to use this bug to:
- Document how it should be used (if at all).
- Document if it affects different platforms differently.
- Verify that it's used correctly in talos tests (listed above).
- Apply the above to e10s and non e10s (and answer: are they comparable?).

The most recent places where it came up are:

- tsvgx_opacity where matt added MAP few days ago (specifically, see bug 1250717 comment 18 - adding MAP didn't affect non-e10s on win7, but in e10s made the results from much "better" to much "worse" than non e10s).

- My observations with tsvgx at bug 1254121 (different groups of subtests are affected differently when changing MAP, where no single MAP value makes all tests reasonable comparable between e10s and non e10s).

Code wise, tp5o and tsvg_opacity use MAP via a flag to the pageloader addon (also tsvgx if we end up using MAP there too - bug 1254121). While for the other tests AFAIK it's each case on its own.

As a first test case, we could assess its recent effect on tsvgx_opacity (bug 1250717 comment 18).
Jim, AFAIK know you're pretty familiar with tpaint and ts_paint, can we clear the doubt about those? Is MozAfterPaint used correctly in those tests?

(feel free to also refer to other tests in this regard if you're familiar with them too).
Flags: needinfo?(jmathies)
(In reply to Avi Halachmi (:avih) from comment #1)
> Jim, AFAIK know you're pretty familiar with tpaint and ts_paint, can we
> clear the doubt about those? Is MozAfterPaint used correctly in those tests?
> 
> (feel free to also refer to other tests in this regard if you're familiar
> with them too).

I've been curious about MozAfterPaint so I did a bit of debugging. How it appears to work under e10s: 

1)  Content process paints which generates remote layer updates that get shipped over to the browser process.
2) remote layer update get latched into the browser layer tree.
3) A composition cycle gets triggered which composes everything together and presents browser+content to the screen.
4) A 'did composition' event is sent back via CrossProcessCompositorParent to the content process wich triggers MozAfterPaint events on the test's html page listeners.

There's a slight added IPC overhead on the e10s side but generally this event appears to track what we're interested in.
Flags: needinfo?(jmathies)
<avih> jimm: were you also able to verify that it's used correctly in tpaint and ts_paint? (wouldn't mind others too)

As long as the test uses a test html page loaded in a remote browser, should be fine. I looked at a few of these and that what they all appear to do.

tresize:
http://mxr.mozilla.org/mozilla-central/source/testing/talos/talos/startup_test/tresize/addon/content/tresize.js#19

tpaint:
http://mxr.mozilla.org/mozilla-central/source/testing/talos/talos/startup_test/tpaint.html?force=1#93

ts_paint:
http://mxr.mozilla.org/mozilla-central/source/testing/talos/talos/startup_test/tspaint_test.html?force=1#73
as a note, those are the startup tests, we should validate the pageloader tests next.
Thanks Jim.

(In reply to Joel Maher (:jmaher) from comment #4)
> as a note, those are the startup tests, we should validate the pageloader
> tests next.

Yup.

tsvg_opacity should probably be first since it's a simpler case which shows suspicious results - single "simple" page load duration.

If confirmed, hopefully this would imply a confirmation of using MAP for "normal" page load tests, which means tp5o too. This would also mean that the "normal" pages of tsvgx should use MAP too.

tsvgx is a bit more complex since we have to figure out whether or not enabling MAP for page load tests affects animation (ASAP) tests differently between e10s and non e10s (initial results suggest it's affected differently).

Alternatively, as suggested at bug 1254121, we either disable the normal page load subtests of tsvgx (and not use MAP), or split it into two tests where one uses MAP and the other doesn't.

As for a11y, I really know nothing about it, and I _think_ it's lower priority than the above.
(In reply to Avi Halachmi (:avih) from comment #5)
> As for a11y, I really know nothing about it, and I _think_ it's lower
> priority than the above.

Yep, we're not blocking on a11y support for initial rollout.
Blocks: e10s-perf
tracking-e10s: --- → +
Priority: -- → P1
Avi, what are the next steps to confirm that the Talos tests are using MozAfterPaint correctly?
Flags: needinfo?(avihpit)
I'm investigating by running Talos locally using mach talos-test [--e10s] -a tsvgr_opacity, both in e10s mode and in non-e10s mode. I've instrumented my build with debug messages that print timestamps for our various parts of the pipeline.

My initial conclusion is that the measurement is good in e10s mode but incomplete in non-e10s mode. It looks like in non-e10s mode we're sometimes not including the composite of the paint we're interested in in the measurement.
(This patch will only build on OS X.)
(The [content] lines in here should be read as [parent] lines because everything is happening in the same process.)
In the non-e10s output, some tests are including everything we want to measure (and more), like this one:


17:58:00     INFO -  PROCESS | 4882 | Cycle 1(16): loaded http://localhost:61532/tests/svg_opacity/small-group-opacity-2500.svg (next: http://localhost:61532/tests/svg_opacity/big-optimizable-group-opacity-2500.svg)
17:58:00     INFO -  PROCESS | 4882 | [1458165480741] [parent] composite is starting
17:58:00     INFO -  PROCESS | 4882 | [1458165480741] [parent] composite has finished
17:58:00     INFO -  PROCESS | 4882 | [1458165480799] [parent] main thread paint starting
17:58:00     INFO -  PROCESS | 4882 | [1458165480836] [parent] sending layer transaction to the compositor
17:58:00     INFO -  PROCESS | 4882 | [1458165480838] [parent] main thread paint finished [38ms]
17:58:00     INFO -  PROCESS | 4882 | [1458165480838] [parent] nsView::DidCompositeWindow
17:58:00     INFO -  PROCESS | 4882 | [1458165480841] [parent] composite is starting
17:58:00     INFO -  PROCESS | 4882 | [1458165480841] [parent] composite has finished
17:58:00     INFO -  PROCESS | 4882 | [1458165480854] [parent] nsView::DidCompositeWindow
17:58:01     INFO -  PROCESS | 4882 | RSS: Main: 364511232
17:58:01     INFO -  PROCESS | 4882 |
17:58:01     INFO -  PROCESS | 4882 | [1458165481088] [parent] main thread paint starting
17:58:01     INFO -  PROCESS | 4882 | [1458165481130] [parent] sending layer transaction to the compositor
17:58:01     INFO -  PROCESS | 4882 | [1458165481131] [parent] main thread paint finished [42ms]
17:58:01     INFO -  PROCESS | 4882 | [1458165481161] [content] _contentLoadHandler called
17:58:01     INFO -  PROCESS | 4882 | [1458165481186] [parent] main thread paint starting
17:58:01     INFO -  PROCESS | 4882 | [1458165481387] [parent] sending layer transaction to the compositor
17:58:01     INFO -  PROCESS | 4882 | [1458165481389] [parent] main thread paint finished [202ms]
17:58:01     INFO -  PROCESS | 4882 | [1458165481390] [content] _contentPaintHandler called
17:58:01     INFO -  PROCESS | 4882 | [1458165481393] [parent] composite is starting
17:58:01     INFO -  PROCESS | 4882 | [1458165481393] [parent] main thread paint starting
17:58:01     INFO -  PROCESS | 4882 | [1458165481394] [parent] composite has finished
17:58:01     INFO -  PROCESS | 4882 | [1458165481429] [parent] sending layer transaction to the compositor
17:58:01     INFO -  PROCESS | 4882 | [1458165481431] [parent] main thread paint finished [37ms]
17:58:01     INFO -  PROCESS | 4882 | [1458165481432] [parent] nsView::DidCompositeWindow
17:58:01     INFO -  PROCESS | 4882 | [1458165481432] [content] MozAfterPaint fired, sending LoadEvent

There are multiple main thread paints here, and the one we're interested in is the one that's taking the longest time (here, 202ms). We also want to include the composite that is triggered by this paint. And in this particular case, we are including that composite:

17:58:01     INFO -  PROCESS | 4882 | [1458165481186] [parent] main thread paint starting
17:58:01     INFO -  PROCESS | 4882 | [1458165481387] [parent] sending layer transaction to the compositor
17:58:01     INFO -  PROCESS | 4882 | [1458165481389] [parent] main thread paint finished [202ms]
This is the paint we're interested in.

17:58:01     INFO -  PROCESS | 4882 | [1458165481390] [content] _contentPaintHandler called
This function is called from a setTimeout(, 0) and installs a MozAfterPaint listener. Note that we are adding the listener after the paint has already happened! Due to the setTimeout.

17:58:01     INFO -  PROCESS | 4882 | [1458165481393] [parent] composite is starting
17:58:01     INFO -  PROCESS | 4882 | [1458165481393] [parent] main thread paint starting
17:58:01     INFO -  PROCESS | 4882 | [1458165481394] [parent] composite has finished
This is the composite that we are interested in. Interestingly, a new paint has already started on the main thread in the meantime. We are not interested in that paint but it will affect our measurements.

17:58:01     INFO -  PROCESS | 4882 | [1458165481429] [parent] sending layer transaction to the compositor
17:58:01     INFO -  PROCESS | 4882 | [1458165481431] [parent] main thread paint finished [37ms]
The rest of that second paint.

17:58:01     INFO -  PROCESS | 4882 | [1458165481432] [parent] nsView::DidCompositeWindow
17:58:01     INFO -  PROCESS | 4882 | [1458165481432] [content] MozAfterPaint fired, sending LoadEvent
Now we're processing the MozAfterPaint event that was sent by the composite that we were interested in. Processing this event was delayed by the second main thread paint. But we haven't composited the second paint yet, so at least we're not including that paint's composite time in our measurement.



But there also are a few test runs where we don't measure everything. There are actually two types of problems here. Example one:

17:57:57     INFO -  PROCESS | 4882 | Cycle 1(11): loaded http://localhost:61532/tests/svg_opacity/small-group-opacity-2500.svg (next: http://localhost:61532/tests/svg_opacity/big-optimizable-group-opacity-2500.svg)
17:57:57     INFO -  PROCESS | 4882 | [1458165477363] [parent] composite is starting
17:57:57     INFO -  PROCESS | 4882 | [1458165477373] [parent] composite has finished
17:57:57     INFO -  PROCESS | 4882 | [1458165477414] [parent] nsView::DidCompositeWindow
17:57:57     INFO -  PROCESS | 4882 | [1458165477648] [parent] main thread paint starting
17:57:57     INFO -  PROCESS | 4882 | [1458165477684] [parent] sending layer transaction to the compositor
17:57:57     INFO -  PROCESS | 4882 | [1458165477685] [parent] main thread paint finished [36ms]
17:57:57     INFO -  PROCESS | 4882 | RSS: Main: 362844160
17:57:57     INFO -  PROCESS | 4882 |
17:57:57     INFO -  PROCESS | 4882 | [1458165477729] [parent] main thread paint starting
17:57:57     INFO -  PROCESS | 4882 | [1458165477766] [parent] sending layer transaction to the compositor
17:57:57     INFO -  PROCESS | 4882 | [1458165477767] [parent] main thread paint finished [37ms]
17:57:57     INFO -  PROCESS | 4882 | [1458165477768] [parent] nsView::DidCompositeWindow
17:57:57     INFO -  PROCESS | 4882 | [1458165477782] [parent] composite is starting
17:57:57     INFO -  PROCESS | 4882 | [1458165477782] [parent] composite has finished
17:57:57     INFO -  PROCESS | 4882 | [1458165477801] [content] _contentLoadHandler called
17:57:57     INFO -  PROCESS | 4882 | [1458165477827] [parent] main thread paint starting
17:57:58     INFO -  PROCESS | 4882 | [1458165478029] [parent] sending layer transaction to the compositor
17:57:58     INFO -  PROCESS | 4882 | [1458165478031] [parent] main thread paint finished [203ms]
17:57:58     INFO -  PROCESS | 4882 | [1458165478032] [parent] nsView::DidCompositeWindow
17:57:58     INFO -  PROCESS | 4882 | [1458165478032] [content] _contentPaintHandler called
17:57:58     INFO -  PROCESS | 4882 | [1458165478032] [content] no paint was pending, sending LoadEvent
17:57:58     INFO -  PROCESS | 4882 | [1458165478033] [parent] composite is starting
17:57:58     INFO -  PROCESS | 4882 | [1458165478033] [parent] composite has finished
17:57:58     INFO -  PROCESS | 4882 | Cycle 1(12): loaded http://localhost:61532/tests/svg_opacity/smal

Here the problem might be caused by the fact that we're installing the MozAfterPaint handler from a timeout and missing the actual paint. We end up including the paint in our measurements (because it already happened) but we're not waiting for the corresponding composite. So we're not measuring enough.

Example two:


17:57:51     INFO -  PROCESS | 4882 | Cycle 1(2): loaded http://localhost:61532/tests/svg_opacity/small-group-opacity-2500.svg (next: http://localhost:61532/tests/svg_opacity/big-optimizable-group-opacity-2500.svg)
17:57:51     INFO -  PROCESS | 4882 | [1458165471093] [parent] composite is starting
17:57:51     INFO -  PROCESS | 4882 | [1458165471094] [parent] composite has finished
17:57:51     INFO -  PROCESS | 4882 | [1458165471130] [parent] nsView::DidCompositeWindow
17:57:51     INFO -  PROCESS | 4882 | [1458165471211] [parent] main thread paint starting
17:57:51     INFO -  PROCESS | 4882 | [1458165471247] [parent] sending layer transaction to the compositor
17:57:51     INFO -  PROCESS | 4882 | [1458165471248] [parent] main thread paint finished [37ms]
17:57:51     INFO -  PROCESS | 4882 | [1458165471260] [parent] composite is starting
17:57:51     INFO -  PROCESS | 4882 | [1458165471260] [parent] composite has finished
17:57:51     INFO -  PROCESS | 4882 | [1458165471261] [parent] nsView::DidCompositeWindow
17:57:51     INFO -  PROCESS | 4882 | [1458165471378] [parent] main thread paint starting
17:57:51     INFO -  PROCESS | 4882 | [1458165471411] [parent] sending layer transaction to the compositor
17:57:51     INFO -  PROCESS | 4882 | [1458165471412] [parent] main thread paint finished [33ms]
17:57:51     INFO -  PROCESS | 4882 | RSS: Main: 362037248
17:57:51     INFO -  PROCESS | 4882 |
17:57:51     INFO -  PROCESS | 4882 | [1458165471417] [parent] main thread paint starting
17:57:51     INFO -  PROCESS | 4882 | [1458165471458] [parent] sending layer transaction to the compositor
17:57:51     INFO -  PROCESS | 4882 | [1458165471459] [parent] main thread paint finished [42ms]
17:57:51     INFO -  PROCESS | 4882 | [1458165471460] [parent] nsView::DidCompositeWindow
17:57:51     INFO -  PROCESS | 4882 | [1458165471461] [parent] composite is starting
17:57:51     INFO -  PROCESS | 4882 | [1458165471462] [parent] composite has finished
17:57:51     INFO -  PROCESS | 4882 | [1458165471532] [parent] main thread paint starting
17:57:51     INFO -  PROCESS | 4882 | [1458165471570] [parent] sending layer transaction to the compositor
17:57:51     INFO -  PROCESS | 4882 | [1458165471572] [parent] main thread paint finished [39ms]
17:57:51     INFO -  PROCESS | 4882 | [1458165471572] [parent] nsView::DidCompositeWindow
17:57:51     INFO -  PROCESS | 4882 | [1458165471576] [content] _contentLoadHandler called
17:57:51     INFO -  PROCESS | 4882 | [1458165471578] [parent] composite is starting
17:57:51     INFO -  PROCESS | 4882 | [1458165471579] [parent] composite has finished
17:57:51     INFO -  PROCESS | 4882 | [1458165471601] [parent] main thread paint starting
17:57:51     INFO -  PROCESS | 4882 | [1458165471800] [parent] sending layer transaction to the compositor
17:57:51     INFO -  PROCESS | 4882 | [1458165471802] [parent] main thread paint finished [200ms]
17:57:51     INFO -  PROCESS | 4882 | [1458165471804] [content] _contentPaintHandler called
17:57:51     INFO -  PROCESS | 4882 | [1458165471806] [parent] nsView::DidCompositeWindow
17:57:51     INFO -  PROCESS | 4882 | [1458165471806] [content] MozAfterPaint fired, sending LoadEvent

Here _contentPaintHandler (the function that installs the MozAfterPaint listener) is called after the paint, but apparently there is still another paint pending, so we do wait for MozAfterPaint to fire. But it ends up firing before the paint we're interested in has been composited. So we might be picking up a MozAfterPaint event from a previous composite.
I didn't find any problems with e10s on, other than the fact that we're installing the MozAfterPaint listener after we've painted instead of before.
Every run looks like this:


17:51:49     INFO -  PROCESS | 4699 | [1458165109840] [content] _contentLoadHandler called
17:51:49     INFO -  PROCESS | 4699 | [1458165109848] [parent] composite is starting
17:51:49     INFO -  PROCESS | 4699 | [1458165109848] [parent] sending layer transaction to the compositor
17:51:49     INFO -  PROCESS | 4699 | [1458165109849] [parent] composite has finished
17:51:49     INFO -  PROCESS | 4699 | [1458165109849] [parent] nsView::DidCompositeWindow
17:51:49     INFO -  PROCESS | 4699 | [1458165109854] [content] main thread paint starting
  [ ... lots of paints and composites in the parent process due to the blinking urlbar caret ... ] 
17:51:52     INFO -  PROCESS | 4699 | [1458165112188] [parent] main thread paint starting
17:51:52     INFO -  PROCESS | 4699 | [1458165112190] [parent] sending layer transaction to the compositor
17:51:52     INFO -  PROCESS | 4699 | [1458165112190] [parent] main thread paint finished [1ms]
17:51:52     INFO -  PROCESS | 4699 | [1458165112204] [parent] composite is starting
17:51:52     INFO -  PROCESS | 4699 | [1458165112205] [parent] composite has finished
17:51:52     INFO -  PROCESS | 4699 | [1458165112205] [parent] nsView::DidCompositeWindow
17:51:52     INFO -  PROCESS | 4699 | [1458165112582] [content] sending layer transaction to the compositor
17:51:52     INFO -  PROCESS | 4699 | [1458165112582] [content] main thread paint finished [2728ms]
17:51:52     INFO -  PROCESS | 4699 | [1458165112583] [content] _contentPaintHandler called
17:51:52     INFO -  PROCESS | 4699 | [1458165112622] [parent] composite is starting
17:51:52     INFO -  PROCESS | 4699 | [1458165112625] [parent] composite has finished
17:51:52     INFO -  PROCESS | 4699 | [1458165112625] [content] nsView::DidCompositeWindow
17:51:52     INFO -  PROCESS | 4699 | [1458165112626] [content] MozAfterPaint fired, sending LoadEvent



Going through the relevant lines once more:

17:51:49     INFO -  PROCESS | 4699 | [1458165109854] [content] main thread paint starting
This is the paint we're interested in. It's starting.

Then we have lots of parent process painting and compositing due to the blinking caret. We don't care about that. 

17:51:52     INFO -  PROCESS | 4699 | [1458165112582] [content] sending layer transaction to the compositor
17:51:52     INFO -  PROCESS | 4699 | [1458165112582] [content] main thread paint finished [2728ms]
Now we're finished painting our test.

17:51:52     INFO -  PROCESS | 4699 | [1458165112583] [content] _contentPaintHandler called
This is where we're installing the MozAfterPaint handler. Again we're doing this after the paint has already completed.

17:51:52     INFO -  PROCESS | 4699 | [1458165112622] [parent] composite is starting
17:51:52     INFO -  PROCESS | 4699 | [1458165112625] [parent] composite has finished
This is the composite we're interested in.

17:51:52     INFO -  PROCESS | 4699 | [1458165112625] [content] nsView::DidCompositeWindow
17:51:52     INFO -  PROCESS | 4699 | [1458165112626] [content] MozAfterPaint fired, sending LoadEvent
And here we're receiving the MozAfterPaint event after this composite has completed. So our measurement contains all the right things.
Needinfo'ing Matt in case he has comments or ideas on this.
Flags: needinfo?(matt.woodrow)
(In reply to Markus Stange [:mstange] from comment #12)

> Here the problem might be caused by the fact that we're installing the
> MozAfterPaint handler from a timeout and missing the actual paint. We end up
> including the paint in our measurements (because it already happened) but
> we're not waiting for the corresponding composite. So we're not measuring
> enough.

Is the bug here that isMozAfterPaintPending is lying to us?

It checks to see if a paint is scheduled, and returns false if it.

Should it also be checking to see if there is an outstanding DidComposite notification?

> 
> Here _contentPaintHandler (the function that installs the MozAfterPaint
> listener) is called after the paint, but apparently there is still another
> paint pending, so we do wait for MozAfterPaint to fire. But it ends up
> firing before the paint we're interested in has been composited. So we might
> be picking up a MozAfterPaint event from a previous composite.

Should we be repeatedly waiting on MozAfterPaint, for as long as isMozAfterPaintPending is true? This is what the reftest harness does to ensure that all paints have completed.

If we're running into the situation where there are paints *after* the ones we care about, then we should try fixing the test so that they don't happen.
Flags: needinfo?(matt.woodrow)
I've tried Matt's suggestion but something is triggering three or four more paints after the one we care about, even after I've changed pageloader to unfocus the urlbar so that the caret doesn't blink. Still looking into that.
Seeing that we don't have a clear reply yet, let's try to ask a different question:

Assuming we're interesting in measuring* "how quickly can Firefox load a page and make it reasonably ready** for the user?", what would be a good way to do that:

1. For pages we have control over (e.g. at tsvg_opacity).
2. For pages we don't have control over (e.g. at tp5).


* Measuring: we're not necessarily interested in absolute load time, but rather something which could detect performance changes reasonably reliably.

** Reasonably ready: probably be at least the load event for the page, and hopefully some rendering of the content.

Seeing that MozAfterPaint is not easy to use as a reliable tool, and especially with OMTC and e10s adding layers of indirectness which make it harder to compare different configurations, maybe we should try a more black-box approach, where we:

- use ASAP mode
- wait for the load event
- Register and wait for some fixed N requestAnimationFrame callbacks.

Does this have the potential to be more reliable (especially on a longer term and through gfx configurations/technologies) than using MAP?
Flags: needinfo?(avihpit)
maybe we should be asking "what is the purpose of the mozafterpaint event?"  Possibly we are using it incorrectly
Can summarize where we stand here? As far as I can tell, this bug is blocking bug 1255936, which in turn blocks e10s ship signoff.
Blocks: 1255936
Summary: Assess MozAfterPaint usage in talos → Assess MozAfterPaint/isMozAfterPaintPending usage in talos
Zibi, will this bug about reviewing Talos usage of MozAfterPaint still be relevant after you fix tpaint/timeToNonBlankPaint bug 1381988?
Depends on: 1381988
Flags: needinfo?(gandalf)
Hmmm, possibly.

It seems that in order to get bug 1381988 fixed we need to expose the right timing to use, and for that we need bug 1388157 (which will affect tp6 as well).

But with those fixed, we'll only have tpaint solved. There are still other tests to evaluate and possibly move to timeToNonBlankPaint.
Flags: needinfo?(gandalf)
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(gijskruitbosch+bugs)
Moving to p3 because no activity for at least 24 weeks.
Priority: P1 → P3
You need to log in before you can comment on or make changes to this bug.