Closed Bug 1426688 Opened 6 years ago Closed 6 years ago

DAMP tests have pending reflows that overlap with subsequent tests

Categories

(DevTools :: General, enhancement, P3)

Tracking

(firefox59 fixed)

RESOLVED FIXED
Firefox 59

People

(Reporter: ochameau, Assigned: ochameau)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files)

I noticed (while working on bug 1426206) that some reflows happen after the toolbox open sub-tests. Some happen in the middle of the GC we force after opening the toolbox, but ideally these pending reflows should be included in the tests that introduced them. In some cases, these reflows even happen during the next test, the page reload.

Here is how I noticed it.
To fix bug 1426206, I introduced a requestIdleCallback before "open" and "reload" for the netmonitor.
The result was surprising:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=2cf552447043d4ab87fcdda33760ff5b737bd0b6&originalSignature=edaec66500db21d37602c99daa61ac983f21a6ac&newSignature=edaec66500db21d37602c99daa61ac983f21a6ac&showOnlyConfident=1&framework=1&selectedTimeRange=172800
  simple.netmonitor.reload and simple.netmonitor.requestsFinished got faster,
whereas I would expect them to be slower, since with this patch the waterfall actually gets rendered, which it wasn't before.

After looking at the profiler, I saw these reflows here:
  https://perfht.ml/2BW8XWk
One is in the middle of the GCs at 48.6s, and the other comes after the GCs, right before the next test, at 49.4s. Here it seems to happen right before we start the next test, but I imagine it can overlap with and slow down the reload tests.

These reflows are leftovers from simple.netmonitor.open. It is hard to figure out which particular code triggers them; one seems to be related to SVG, as we can see a "load" event for an SVG document.
But we could add generic code that waits for any pending update, and do that for all the tools.
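A minimal sketch of the idea (it mirrors the waitForPendingPaints helper reviewed below, and assumes chrome-privileged code where Ci is available):

  // Wait until the panel window has no pending MozAfterPaint, so that
  // leftover reflows/paints stay attributed to the test that caused them.
  async function waitForPendingPaints(toolbox) {
    let panel = toolbox.getCurrentPanel();
    // Each panel exposes its window object in its own way.
    let window = panel.panelWin || panel._frameWindow || panel.panelWindow;
    let utils = window.QueryInterface(Ci.nsIInterfaceRequestor)
                      .getInterface(Ci.nsIDOMWindowUtils);
    window.performance.mark("pending paints.start");
    while (utils.isMozAfterPaintPending) {
      await new Promise(done => {
        window.addEventListener("MozAfterPaint", function listener() {
          window.performance.mark("pending paint");
          window.removeEventListener("MozAfterPaint", listener);
          done();
        });
      });
    }
  }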

If I do that, and only then apply the bug 1426206 fix, DAMP reports no particular change:
  https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&originalRevision=cbb06e66013e5e1252c7ab84b9d1f7581c4a76c0&newProject=try&newRevision=3f88fa39d327e70f4a43f6b078125fe7b3ec6941&originalSignature=edaec66500db21d37602c99daa61ac983f21a6ac&newSignature=edaec66500db21d37602c99daa61ac983f21a6ac&filter=netmon&framework=1
  I would still expect reload/requestsFinished to be slower, but that is yet another issue. At least it confirms that adding a delay between open and reload no longer impacts reload.

If you look at the profile with the waiting code:
  https://perfht.ml/2Bf63bx
you can see at 55.8s that there is still one reflow, the one related to SVG, which seems to be more asynchronous. But the other one is now included in simple.netmonitor.open.
I have two patches: one that waits for pending reflows after "open" tests, and another that waits after "reload" tests.

The first one changes DAMP results like this:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=234766933e1fb76df36ddbbf8157e90c573391c6&originalSignature=edaec66500db21d37602c99daa61ac983f21a6ac&newSignature=edaec66500db21d37602c99daa61ac983f21a6ac&showOnlyConfident=1&framework=1&selectedTimeRange=172800
  complicated.(debugger,netmonitor,console).open: +4.3%, +5.9%, +4.8%
  complicated.netmonitor.requestsFinished -2.2%
  complicated.netmonitor.close -6%
  simple.(debugger,netmonitor).open +5%, +7%
  inspector.mutations +2.1%
  console.openwithcache.open +70%!!
    About this test: you can see in the profile, without the patches, that it is not waiting correctly:
    https://perfht.ml/2Bfg3BB
    There is a React update (componentDidMount) happening after the test is done.

The second one changes DAMP like this:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&originalRevision=234766933e1fb76df36ddbbf8157e90c573391c6&newProject=try&newRevision=914d8052d1f41288a194b055b14742eb91fdcd05&originalSignature=edaec66500db21d37602c99daa61ac983f21a6ac&newSignature=edaec66500db21d37602c99daa61ac983f21a6ac&showOnlyConfident=1&framework=1
  complicated.debugger.reload +4%
  simple.(inspector,debugger,console).reload +7%, +30%, +13%
  simple.debugger.close -12%

In addition, I added some markers to help debug these pending reflows while writing a new test.
Ideally, there should be none left.
Assignee: nobody → poirot.alex
Comment on attachment 8938390 [details]
Bug 1426688 - Wait for pending paint updates after opening the toolbox.

https://reviewboard.mozilla.org/r/209082/#review215350

Patch looks good, thanks for improving the tests! 

Let's discuss whether to include this timing in the measured performance.
We also need to check with the console folks whether the symptoms seen on openwithcache are intentional or the artifact of a problem.

::: testing/talos/talos/tests/devtools/addon/content/damp.js:494
(Diff revision 1)
> +    let panel = toolbox.getCurrentPanel();
> +    // All panels have its own way of exposing their window object...
> +    let window = panel.panelWin || panel._frameWindow || panel.panelWindow;
> +
> +    window.performance.mark("pending paints.start");
> +    let utils = window.QueryInterface(Ci.nsIInterfaceRequestor)

We should probably create `utils` before creating the marker.
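For illustration, the suggested reordering would be (sketch):

  let utils = window.QueryInterface(Ci.nsIInterfaceRequestor)
                    .getInterface(Ci.nsIDOMWindowUtils);
  // Only place the marker once the utils object is ready.
  window.performance.mark("pending paints.start");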

::: testing/talos/talos/tests/devtools/addon/content/damp.js:500
(Diff revision 1)
> +                      .getInterface(Ci.nsIDOMWindowUtils);
> +    while (utils.isMozAfterPaintPending) {
> +      await new Promise(done => {
> +        window.addEventListener("MozAfterPaint", function listener() {
> +          window.performance.mark("pending paint");
> +          window.removeEventListener("MozAfterPaint", listener);

{ once: true } seems to work fine here. Are we concerned with window-like objects using exotic listener implementations?
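For reference, the { once: true } form would look like this (sketch):

  await new Promise(done => {
    // The listener is removed automatically after firing once,
    // so no manual removeEventListener is needed.
    window.addEventListener("MozAfterPaint", () => {
      window.performance.mark("pending paint");
      done();
    }, { once: true });
  });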

::: testing/talos/talos/tests/devtools/addon/content/damp.js:512
(Diff revision 1)
> +
>    async openToolboxAndLog(name, tool, onLoad) {
>      dump("Open toolbox on '" + name + "'\n");
>      let test = this.runTest(name + ".open.DAMP");
>      let toolbox = await this.openToolbox(tool, onLoad);
> +    await this.waitForPendingPaints(toolbox);

I'm torn about including this in the performance measured for the opening. Since those reflows might sometimes happen during the open, sometimes right after, it makes sense to include them in the "open" measure to stabilize the results.

But there is a risk with this approach of picking up paints that are not related to the tool opening. Imagine we update less critical parts of the UI after the initial display of the tool; those updates would trigger paints. Which event should we trust then: when the tool says it is ready, or when everything has settled?

If we look at the console openwithcache issue you spotted, see some screenshots:
- before requests settled: https://screenletstore.appspot.com/img/ed757d78-f06d-11e7-a2e4-79a45c76030b.png
- after requests settled: https://screenletstore.appspot.com/img/17a888c5-f06e-11e7-828a-79a45c76030b.png

As you can see, the only difference is the scrollbar on the message list. I suppose the console is streaming some of the messages? What is the most important thing to measure here? It feels like waiting for all requests to settle is measuring when the tool is ready to be interacted with. I could see having two timings here: open and interactive.

Maybe as a first step we could just wait after measuring the open. This should be enough to reduce the impact of open tests on subsequent tests.
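Concretely, that first step would mean moving the wait after the measurement, something like (sketch, not the final patch):

  async openToolboxAndLog(name, tool, onLoad) {
    dump("Open toolbox on '" + name + "'\n");
    let test = this.runTest(name + ".open.DAMP");
    let toolbox = await this.openToolbox(tool, onLoad);
    test.done();
    // Still flush leftover paints so they don't leak into the next test,
    // but keep them out of the "open" measure.
    await this.waitForPendingPaints(toolbox);
    return toolbox;
  },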
Attachment #8938390 - Flags: review?(jdescottes) → review+
Comment on attachment 8938391 [details]
Bug 1426688 - Wait for pending paints after page reload.

https://reviewboard.mozilla.org/r/209084/#review215360

::: testing/talos/talos/tests/devtools/addon/content/damp.js:533
(Diff revision 1)
>      let test = this.runTest(name + ".close.DAMP");
>      await gDevTools.closeToolbox(target);
>      test.done();
>    },
>  
> -  async reloadPageAndLog(name, onReload) {
> +  async reloadPageAndLog(name, toolbox, onReload) {

If we look at closeToolboxAndLog, we do a small dance:

  let tab = getActiveTab(getMostRecentBrowserWindow());
  let target = TargetFactory.forTab(tab);
  [...]
  await gDevTools.closeToolbox(target);
  
We could do a similar thing here, but maybe it's best to limit the impact of test code on performance. Do you think we should update closeToolboxAndLog to take a toolbox argument and get the target via toolbox.target's getter?
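The toolbox-argument variant being suggested would look roughly like this (sketch; it assumes toolbox.target returns the target the toolbox was opened on):

  async closeToolboxAndLog(name, toolbox) {
    dump("Close toolbox on '" + name + "'\n");
    let test = this.runTest(name + ".close.DAMP");
    // Use the toolbox's own target instead of re-resolving it from the active tab.
    await gDevTools.closeToolbox(toolbox.target);
    test.done();
  },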

::: testing/talos/talos/tests/devtools/addon/content/damp.js:537
(Diff revision 1)
>  
> -  async reloadPageAndLog(name, onReload) {
> +  async reloadPageAndLog(name, toolbox, onReload) {
>      dump("Reload page on '" + name + "'\n");
>      let test = this.runTest(name + ".reload.DAMP");
>      await this.reloadPage(onReload);
> +    await this.waitForPendingPaints(toolbox);

(same comment as on the other patch about including this or not)
Attachment #8938391 - Flags: review?(jdescottes) → review+
Nicolas, could you share your thoughts about comment 1 and the fact that console's openwithcache test seems to be followed by a lot of reflows?

As you can see, I took some screenshots when the console says it is ready [1], and when reflows are done [2]. Looks like more log lines are being added/rendered in the meantime (the scrollbar of the message list seems to indicate that). Is this something done on purpose by the console? Only display a few messages straight away and then stream more messages? If so, what do you think should be measured by DAMP in this scenario?

[1] https://screenletstore.appspot.com/img/ed757d78-f06d-11e7-a2e4-79a45c76030b.png
[2] https://screenletstore.appspot.com/img/17a888c5-f06e-11e7-828a-79a45c76030b.png

Thanks!
Flags: needinfo?(nchevobbe)
(In reply to Julian Descottes [:jdescottes][:julian] from comment #4)
> ::: testing/talos/talos/tests/devtools/addon/content/damp.js:500
> I'm torn about including this in the performance measured for the opening.
> Since those reflows might sometimes happen during the open, sometimes right
> after, it makes sense to include them in the "open" measure to stabilize the
> results.

That's a good point.
The important thing here is to untangle the DAMP tests; to do that, we have to wait for reflows. That's my main goal.
Whether to include the wait in the current test or make it independent is a decision that may be specific to each test.

The main argument for folding it into existing tests is to avoid introducing too many subtests/graphs that we have to watch.
I don't see any generic argument to justify making it a separate subtest.
But yes, there seems to be a very good argument for making it separate in openwithcache.
I think Nicolas tried to measure the first paint here, i.e. when the console displays the last visible messages.
If I remember correctly, openwithcache was introduced when he optimized the console
to immediately display the last visible messages and delay the DOM node creation for
the rest of the history. So openwithcache measures the first paint, but does not measure the time it takes to process the complete history. I imagine we would also be interested in tracking the history processing, but it would be important to do that in a distinct test.

Nicolas, could you confirm that I'm correct here?

> Maybe as a first step we could just wait after measuring the open. This
> should be enough to reduce the impact of open tests on subsequent tests.

The more I think about this, the more it seems related to many other async things I've been chasing to stabilize DAMP results.
I already added a forced GC, but we do not track its duration.
We may be interested in tracking GC pauses: the longer the GCs we introduce, the longer Firefox will freeze.
It is exactly the same with reflows: even if they happen outside of the precise action we want to track, tracking them may help identify regressions we wouldn't have thought about.
In some other bugs, I also attempted to wait for all pending RDP requests after each test, but I never landed that patch. That is also similar to reflows and GCs.

What about a generic "settle" subtest, where we would put all the waiting code: reflows, GC, RDP requests, ...?
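A rough sketch of what such a settle subtest could look like (the helper name and the GC/RDP hooks are assumptions, not landed code):

  async waitForSettle(name, toolbox) {
    // Record everything that still needs to settle after a test as its own
    // subtest instead of folding it into the previous measure.
    let test = this.runTest(name + ".settle.DAMP");
    await this.waitForPendingPaints(toolbox);
    // GC pauses and pending RDP requests (bug 1420233) could be awaited here too.
    test.done();
  },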

> (Diff revision 1)
> > +                      .getInterface(Ci.nsIDOMWindowUtils);
> > +    while (utils.isMozAfterPaintPending) {
> > +      await new Promise(done => {
> > +        window.addEventListener("MozAfterPaint", function listener() {
> > +          window.performance.mark("pending paint");
> > +          window.removeEventListener("MozAfterPaint", listener);
> 
> { once: true } seems to work fine here. Are we concerned with window-like
> objects using exotic listener implementations?

That's just me not yet thinking of {once: true}; it's still new to me.
(In reply to Julian Descottes [:jdescottes][:julian] from comment #6)
> Nicolas, could you share your thoughts about comment 1 and the fact that
> console's openwithcache test seems to be followed by a lot of reflows.
> 
> As you can see, I took some screenshots when the console says it is ready
> [1], and when reflows are done [2]. Looks like more log lines are being
> added/rendered in the meantime (scrollbar of the message list seems to
> indicate that). Is this something done on purpose by the console? Only
> display a few messages straight away and then stream more messages? If so,
> what do you think should be measured by DAMP in this scenario. 
> 
> [1] https://screenletstore.appspot.com/img/ed757d78-f06d-11e7-a2e4-79a45c76030b.png
> [2] https://screenletstore.appspot.com/img/17a888c5-f06e-11e7-828a-79a45c76030b.png
> 
> Thanks!

Yes, this is intended behavior. The idea was to improve the perceived load time for the user, so when opening the console we only show the last messages that fit in the viewport.

What I think is important is the first messages being displayed; we might as well call it the first meaningful render.
But I also think it would be nice to have the "time to interact", i.e. when the user is able to do something in the console.

Currently, this time to interact is coupled with the time it takes to render all the messages (because the UI is frozen between the first meaningful paint and the "full paint").

But, as we might introduce smarter ways to load cached messages, this time to interact may be represented by something else. One use case that seems right would be the time it takes to open the console, evaluate something, and have it displayed in the output.

I can't really say which one is better since they express different things, but I am fine with the current approach, i.e. saying the console is ready when there's something meaningful for the user to see (and I might be wrong :) ).
Here are DAMP results with a settle subtest, including reflows and GC:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=3bd41aaf1873c089ed86a603ad06f3a164d43ed9&originalSignature=edaec66500db21d37602c99daa61ac983f21a6ac&newSignature=edaec66500db21d37602c99daa61ac983f21a6ac&filter=.settle&framework=1&selectedTimeRange=172800

settle pauses are quite long in many tests.
It no longer impacts tests drastically.
simple.jsdebugger.reload and simple.performance.reload are faster, while simple.webconsole.close is slower (which I can't easily explain).
Thanks for the feedback, Nicolas (clearing ni?). I think you're right: both are important to measure for this particular test. I'm just a bit wary of introducing too many timings in DAMP, but if Alex is fine with adding new *.settle tests, I think that's the best approach for now.

(In reply to Alexandre Poirot [:ochameau] from comment #9)
> Here are DAMP results with a settle subtest, including reflows and GC:
> https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=3bd41aaf1873c089ed86a603ad06f3a164d43ed9&originalSignature=edaec66500db21d37602c99daa61ac983f21a6ac&newSignature=edaec66500db21d37602c99daa61ac983f21a6ac&filter=.settle&framework=1&selectedTimeRange=172800
> 
> settle pauses are quite long in many tests.
> It no longer impacts tests drastically.

That looks good. I'm not 100% sure about including GC: the time to GC somehow represents the memory impact. It's interesting to have this data, but then the measure is no longer the "time before the panel is interactive". I'm not saying it has to be; it's just what I vaguely had in mind.

Maybe we need open + settle + interactive (and interactive should probably be enabled only for specific tests such as openwithcache). 

> simple.jsdebugger.reload and simple.performance.reload are faster while
> simple.webconsole.close is slower (which I can't easily explain)

Can't see any explanation at first glance either, weird.
Flags: needinfo?(nchevobbe)
Here is a comparison of two builds with settle subtests.
The base build only waits for reflows, while the new build also waits for GC after open tests.
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&originalRevision=a0ca65506c363518d3d7748033113683657143a9&newProject=try&newRevision=3bd41aaf1873&originalSignature=edaec66500db21d37602c99daa61ac983f21a6ac&newSignature=edaec66500db21d37602c99daa61ac983f21a6ac&filter=.settle.&framework=1

Maybe GC is special. It isn't something that can be reduced to zero, while reflows and pending RDP requests should ideally be zero, or almost zero.
The average timing of open.settle is around 700ms, and 2% of 700ms is 14ms, so we will most likely miss regressions under roughly 15-20ms, which is about the current timing of pending reflows for complicated.webconsole.open.
Here is a try run for the latest patches:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=fd7cd00254cedeb741cd63b0d819025a57dab570&originalSignature=edaec66500db21d37602c99daa61ac983f21a6ac&newSignature=edaec66500db21d37602c99daa61ac983f21a6ac&filter=.settle&framework=1&selectedTimeRange=172800

Julian, does that look good to you?
"settle" only covers reflows and I would like to give a try to also include RDP request waiting (revive bug 1420233 settle-requests)
Flags: needinfo?(jdescottes)
Yes, I think that's good to land (I'm also in favor of including RDP request settling).

I initially had a question: should the settle tests include the open time in order to reduce their variance? But thinking more about it, I prefer your approach. I'll just summarize my thoughts below.

Part of what needs to settle is done during the open, which explains the high variance (plus the fact that most timings are small). Including the open time in the settle time will reduce the variance. But it will also make it harder to spot regressions or improvements focused on the "settling" of a tool. In the end I think this would only reduce the variance for the tests that have a small settle time (a few ms), where the overlap with open() is significant. And we typically won't really care about those.

For the others, where the settle time is significant enough, it's great to be able to see a dedicated variance only for the settle part. For instance custom.inspector.reload (https://mzl.la/2CzVi7E): 20ms variance on the open test and 90ms variance for the settle test. Very interesting to see such a high variance for settle, probably worth investigating.
Flags: needinfo?(jdescottes)
(In reply to Julian Descottes [:jdescottes][:julian] from comment #15)
> I initially had a question: should settle test include the open time in
> order to reduce their variance? But thinking more about it I prefer your
> approach. I'll just summarize my thoughts below.

My thinking on this is the same as for RDP requests: ideally it should be an assertion that makes the test permafail if there is any pending request/reflow/..., like mochitests do for pending RDP requests.
But DAMP isn't ready for that. We can't disable just one test yet, and running DAMP is still painful (slow, needs special try runs, ...).
I have some plans about that:
* have DAMP support an ini manifest in order to be able to temporarily disable one flaky test,
* speed up DAMP so that it runs as fast as a mochitest,
* try to have DAMP test scripts also be runnable as mochitests so that we can easily check such assertions.

> For instance custom.inspector.reload (https://mzl.la/2CzVi7E): 20ms variance on the open
> test and 90ms variance for the settle test. Very interesting to see such a
> high variance for settle, probably worth investigating.

I think the inspector leaks computation related to the node highlights we do here:
  https://searchfox.org/mozilla-central/source/devtools/client/inspector/markup/markup.js#460-474
It is somewhat similar to openwithcache: we don't necessarily want to measure
that additional step, but we should wait for its completion.
Pushed by apoirot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fb220a3cfbb4
Wait for pending paint updates after opening the toolbox. r=jdescottes
https://hg.mozilla.org/integration/autoland/rev/f730c9da9c1f
Wait for pending paints after page reload. r=jdescottes
Pushed by apavel@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/70822a59d337
Wait for pending paint updates after opening the toolbox. r=jdescottes
https://hg.mozilla.org/mozilla-central/rev/05c8d54e800f
Wait for pending paints after page reload. r=jdescottes
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 59
Changes pushed in comment 17 updated the DAMP baseline:

== Change summary for alert #11067 (as of Thu, 04 Jan 2018 08:19:59 GMT) ==

Improvements:

 69%  damp osx-10-10 opt e10s     349.56 -> 107.07

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=11067
Product: Firefox → DevTools
