Closed Bug 1358712 Opened 7 years ago Closed 6 years ago

uninterruptible reflow at _calcMouseTargetRect@chrome://browser/content/tabbrowser.xml:7697:27 when moving the mouse

Categories

(Firefox :: General, defect, P1)

defect

Tracking

()

VERIFIED FIXED
Firefox 61
Performance Impact low
Tracking Status
firefox61 --- verified

People

(Reporter: rowbot, Assigned: mconley)

References

(Blocks 1 open bug)

Details

(Keywords: perf, stale-bug, Whiteboard: [ohnoreflow][fxperf:p1])

Attachments

(3 files)

Here's the stack:

_calcMouseTargetRect@chrome://browser/content/tabbrowser.xml:7697:27
getMouseTargetRect@chrome://browser/content/tabbrowser.xml:7661:13
_callListener@chrome://browser/content/browser.js:8233:16
addListener@chrome://browser/content/browser.js:8211:5
set_label@chrome://browser/content/tabbrowser.xml:7645:13
updateStatusField@chrome://browser/content/browser.js:4512:7
_showDelayed/this._timer<@chrome://browser/content/browser.js:4921:7
setTimeout handler*_showDelayed@chrome://browser/content/browser.js:4920:19
handleEvent@chrome://browser/content/browser.js:4914:9
EventListener.handleEvent*update@chrome://browser/content/browser.js:4905:7
setOverLink@chrome://browser/content/browser.js:4467:5


This occurs for me when mousing out of the "File a bug" and "File a new bug anyway" links on the OhNoReflow Report page.
This happens when moving the mouse after the status panel content has changed. It's the result of bug 1356663 where we changed this code to not trigger a sync reflow immediately when the status panel content changed.
Blocks: 1356663
Summary: 3.85ms uninterruptible reflow at _calcMouseTargetRect@chrome://browser/content/tabbrowser.xml:7697:27 → uninterruptible reflow at _calcMouseTargetRect@chrome://browser/content/tabbrowser.xml:7697:27 when moving the mouse
No longer blocks: 1356663
Component: Untriaged → General
Depends on: 1356663
Flags: qe-verify?
Priority: -- → P2
Another stack:

_calcMouseTargetRect@chrome://browser/content/tabbrowser.xml:7697:27
getMouseTargetRect@chrome://browser/content/tabbrowser.xml:7661:13
_callListener@chrome://browser/content/browser.js:8233:16
handleEvent/<@chrome://browser/content/browser.js:8225:9
handleEvent@chrome://browser/content/browser.js:8223:5
EventListener.handleEvent*_delayedStartup@chrome://browser/content/browser.js:1531:5
EventListener.handleEvent*onLoad@chrome://browser/content/browser.js:1239:5
onload@chrome://browser/content/browser.xul:1:1
Whiteboard: [ohnoreflow][qf][photon-performance] → [ohnoreflow][qf:p1][photon-performance]
Priority: P2 → P3
Whiteboard: [ohnoreflow][qf:p1][photon-performance] → [ohnoreflow][qf:p1][reserve-photon-performance]
Whiteboard: [ohnoreflow][qf:p1][reserve-photon-performance] → [ohnoreflow][qf:p2][reserve-photon-performance]
Flags: qe-verify? → qe-verify+
QA Contact: adrian.florinescu
Assignee: nobody → kmaglione+bmo
Depends on: 1383367
Status: NEW → ASSIGNED
Priority: P3 → P1
Comment on attachment 8889021 [details]
Bug 1358712: Don't trigger reflows from MousePosTracker.

https://reviewboard.mozilla.org/r/160066/#review166192

::: browser/base/content/browser.js:8568
(Diff revision 1)
>  
> -    this._callListener(listener);
> +    let rect = await PromiseUtils.layoutFlushed(
> +      document, "layout",
> +      () => listener.getMouseTargetRect());
> +
> +    this._callListener(listener, rect);

This looks like it might regress bug 646038?
Attachment #8889021 - Flags: review-
Comment on attachment 8889021 [details]
Bug 1358712: Don't trigger reflows from MousePosTracker.

https://reviewboard.mozilla.org/r/160066/#review166192

> This looks like it might regress bug 646038?

Certainly worth checking.
Comment on attachment 8889021 [details]
Bug 1358712: Don't trigger reflows from MousePosTracker.

https://reviewboard.mozilla.org/r/160066/#review166814

Clearing review until we're sure this doesn't re-open bug 646038.
Attachment #8889021 - Flags: review?(mconley)
(In reply to Dão Gottwald [::dao] from comment #5)
> Comment on attachment 8889021 [details]
> Bug 1358712: Don't trigger reflows from MousePosTracker.
> 
> https://reviewboard.mozilla.org/r/160066/#review166192
> 
> ::: browser/base/content/browser.js:8568
> (Diff revision 1)
> >  
> > -    this._callListener(listener);
> > +    let rect = await PromiseUtils.layoutFlushed(
> > +      document, "layout",
> > +      () => listener.getMouseTargetRect());
> > +
> > +    this._callListener(listener, rect);
> 
> This looks like it might regress bug 646038?

As far as I can tell, it does not. The behavior of the status panel seems to be identical with or without this patch.
Comment on attachment 8889021 [details]
Bug 1358712: Don't trigger reflows from MousePosTracker.

https://reviewboard.mozilla.org/r/160066/#review166848

Okay, I'll give this another look.
Attachment #8889021 - Flags: review- → review?(dao+bmo)
Comment on attachment 8889021 [details]
Bug 1358712: Don't trigger reflows from MousePosTracker.

https://reviewboard.mozilla.org/r/160066/#review171656

Finally was able to test this after figuring out that I need to replace PromiseUtils.layoutFlushed with BrowserUtils.promiseLayoutFlushed. Alas, this does regress bug 646038.
Attachment #8889021 - Flags: review?(dao+bmo) → review-
Keywords: stale-bug
Keywords: perf
This should be for non-remote pages only, which should be rare enough these days that I feel pretty comfortable making this a qf:p3.
Whiteboard: [ohnoreflow][qf:p2][reserve-photon-performance] → [ohnoreflow][qf:p3][reserve-photon-performance]
Whiteboard: [ohnoreflow][qf:p3][reserve-photon-performance] → [ohnoreflow][qf:p3][reserve-photon-performance] [fxperf]
Whiteboard: [ohnoreflow][qf:p3][reserve-photon-performance] [fxperf] → [ohnoreflow][qf:p3][reserve-photon-performance] [fxperf:p3]
I've realized that comment 11 is just dead wrong. We can and do still hit this when hovering links in remote tabs as well. I'm setting this to fxperf:p1 and taking.
Assignee: kmaglione+bmo → mconley
Whiteboard: [ohnoreflow][qf:p3][reserve-photon-performance] [fxperf:p3] → [ohnoreflow][qf:p3][reserve-photon-performance] [fxperf:p1]
Whiteboard: [ohnoreflow][qf:p3][reserve-photon-performance] [fxperf:p1] → [ohnoreflow][qf:p3][fxperf:p1]
Attachment #8960312 - Flags: review?(dao+bmo)
Comment on attachment 8960312 [details]
Bug 1358712 - Get rid of synchronous layout flushes when calculating where to put the StatusPanel.

https://reviewboard.mozilla.org/r/229070/#review235490

::: browser/base/content/browser.js:8998
(Diff revision 1)
> +   *
> +   * We wait until after the next refresh driver tick to account for
> +   * the case that that removeListener called soon after addListener,
> +   * but before a refresh driver tick. This ensures that clean-up
> +   * actions that need to occur after removeListener happen _after_
> +   * any set-up actions that occurred for addListener.

Why are you assuming that the addListener callback still wants to run after removeListener was called? Seems like the promise should instead be rejected.

::: browser/base/content/browser.js:9060
(Diff revision 1)
> -      listener.onMouseLeave();
> +        functionsToCall.push(listener.onMouseLeave.bind(listener));
> +      }
> +    }
> +
> +    window.requestAnimationFrame(() => {
> +      for (let fn of functionsToCall) {

What's the rationale here? I don't think you can assume much about what kind of things onMouseEnter and onMouseLeave will do.
Attachment #8960312 - Flags: review?(dao+bmo)
Comment on attachment 8960312 [details]
Bug 1358712 - Get rid of synchronous layout flushes when calculating where to put the StatusPanel.

https://reviewboard.mozilla.org/r/229070/#review235490

> Why are you assuming that the addListener callback still wants to run after removeListener was called? Seems like the promise should instead be rejected.

I'm thinking mostly of the case where the StatusPanel value is set and then cleared - for example, if a page load completes, and the "Transferring..." StatusPanel message ends up being set to "" before the next refresh driver tick.

During my own testing, this was pretty frequent. It seems wrong to me to reject for such a frequent case.

And the reason I am assuming that addListener still wants to run after removeListener is because I'm assuming the addListener callback is going to set some state, like the StatusPanel does, and that the removeListener will then remove / unset that state. I'm really just trying to make sure that we get an ordered sequence of events here, regardless of how quickly the StatusPanel string changes.

Does that convince you? Or is there another alternative that I should consider?

> What's the rationale here? I don't think you can assume much about what kind of things onMouseEnter and onMouseLeave will do.

\_callListeners is being called from within a promiseDocumentFlushed callback. In that scope, we desperately need to avoid touching the DOM (bug 1441168 will hopefully make doing this impossible). Because we have no real knowledge about what onMouseEnter and onMouseLeave are doing, we can either:

1. Assume that they'll obey the invariant that the DOM must not be touched, or
2. Assume that the consumers will not obey that invarient, and queue up the functions to run when it's safe to touch the DOM instead

I went with 2 on this one.
Are the above explanations sufficient to drop the issues?
Flags: needinfo?(dao+bmo)
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #15)
> Comment on attachment 8960312 [details]
> Bug 1358712 - Get rid of synchronous layout flushes when calculating where
> to put the StatusPanel.
> 
> https://reviewboard.mozilla.org/r/229070/#review235490
> 
> > Why are you assuming that the addListener callback still wants to run after removeListener was called? Seems like the promise should instead be rejected.
> 
> I'm thinking mostly of the case where the StatusPanel value is set and then
> cleared - for example, if a page load completes, and the "Transferring..."
> StatusPanel message ends up being set to "" before the next refresh driver
> tick.
> 
> During my own testing, this was pretty frequent. It seems wrong to me to
> reject for such a frequent case.

Why does that seem wrong? I don't interpret "reject" as a failure per se.

> And the reason I am assuming that addListener still wants to run after
> removeListener is because I'm assuming the addListener callback is going to
> set some state, like the StatusPanel does, and that the removeListener will
> then remove / unset that state.

The inactive attribute can be set unconditionally after the removeListener call, it doesn't really matter whether the addListener callback ended up removing it.

> I'm really just trying to make sure that we
> get an ordered sequence of events here, regardless of how quickly the
> StatusPanel string changes.

But my point is that the StatusPanel doesn't really want further events after calling removeListener. Briefly setting the label and removing the inactive attribute seems like pointless work, even more so if this happens frequently.

> > What's the rationale here? I don't think you can assume much about what kind of things onMouseEnter and onMouseLeave will do.
> 
> \_callListeners is being called from within a promiseDocumentFlushed
> callback. In that scope, we desperately need to avoid touching the DOM (bug
> 1441168 will hopefully make doing this impossible). Because we have no real
> knowledge about what onMouseEnter and onMouseLeave are doing, we can either:
> 
> 1. Assume that they'll obey the invariant that the DOM must not be touched,
> or
> 2. Assume that the consumers will not obey that invarient, and queue up the
> functions to run when it's safe to touch the DOM instead
> 
> I went with 2 on this one.

Okay. Can you document this briefly?
Flags: needinfo?(dao+bmo)
(In reply to Dão Gottwald [::dao] from comment #17)
> 
> Why does that seem wrong? I don't interpret "reject" as a failure per se.
> 

async functions that might consume this API _do_ interpret rejections as a failure, and throw exceptions if they're not explicitly caught. We also have code in our test framework that tries to ensure that we don't have any uncaught rejections: https://searchfox.org/mozilla-central/rev/c217fbde244344fedfd07b57a740c694a456dbca/testing/xpcshell/head.js#1389

Would you prefer that each callpoint must catch() when talking to MousePosTracker?

> > And the reason I am assuming that addListener still wants to run after
> > removeListener is because I'm assuming the addListener callback is going to
> > set some state, like the StatusPanel does, and that the removeListener will
> > then remove / unset that state.
> 
> The inactive attribute can be set unconditionally after the removeListener
> call, it doesn't really matter whether the addListener callback ended up
> removing it.

Alright. I'll see if I can find a way of cancelling the addListener callback if removeListener is called for the same listener.

> Okay. Can you document this briefly?

Sure, will do, thanks.
I've seen a bunch of review requests go by for you dao, so if you're overloaded, let me know and I'll redirect this.
Attachment #8960312 - Flags: review?(dao+bmo) → review?(gijskruitbosch+bugs)
I think Dao is busy - what do you think of this, Gijs?
Attachment #8960312 - Flags: review?(gijskruitbosch+bugs) → review?(felipc)
Hey felipe - do you have cycles to review this, or should I redirect it?
Flags: needinfo?(felipc)
Hey, sorry for the delay. I can review it today
Flags: needinfo?(felipc)
Comment on attachment 8960312 [details]
Bug 1358712 - Get rid of synchronous layout flushes when calculating where to put the StatusPanel.

https://reviewboard.mozilla.org/r/229070/#review240356

Awesome, thanks for the change to onTrackingStarted
Attachment #8960312 - Flags: review?(felipc) → review+
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d9110e28f361
Get rid of synchronous layout flushes when calculating where to put the StatusPanel. r=Felipe
Backout by rgurzau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e1d77f98515e
Backed out changeset d9110e28f361 for wpt mass failures on Linux QuantumRender, e.g. in /css/css-tables/height-distribution/computing-row-measure-1.html
Backed out changeset d9110e28f361 (bug 1358712) for wpt mass failures on Linux QuantumRender, e.g. in /css/css-tables/height-distribution/computing-row-measure-1.html

Backout link: https://hg.mozilla.org/integration/autoland/rev/e1d77f98515ee5dd3ac85f0850b259435d8d7bcc

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=d9110e28f3612bd40cfa21b29b0e05fc5afd10aa

Log link: https://treeherder.mozilla.org/logviewer.html#?job_id=172498430&repo=autoland&lineNumber=1972

Log snippet: 
[task 2018-04-07T21:29:17.334Z] 21:29:17     INFO - TEST-START | /css/css-grid/grid-definition/grid-change-fit-content-argument-001.html
[task 2018-04-07T21:29:17.562Z] 21:29:17     INFO - 
[task 2018-04-07T21:29:17.564Z] 21:29:17     INFO - TEST-PASS | /css/css-grid/grid-definition/grid-change-fit-content-argument-001.html | .grid 1 
[task 2018-04-07T21:29:17.565Z] 21:29:17     INFO - TEST-PASS | /css/css-grid/grid-definition/grid-change-fit-content-argument-001.html | .grid 2 
[task 2018-04-07T21:29:17.566Z] 21:29:17     INFO - TEST-UNEXPECTED-FAIL | /css/css-grid/grid-definition/grid-change-fit-content-argument-001.html | .grid 3 - assert_equals:
Flags: needinfo?(mconley)
So I think I know what's going on here.

TL;DR: lol, this is crazy.

Longer:

So it seems as if in certain circumstances (which are strangely easy to hit in automation with WebRender enabled but much harder locally), when we create a new remote <xul:browser>, it's possible for the content of that browser to finish loading and calculating before the parent has told the child how big the <xul:browser> is. That's because the parent has yet to run layout, so as far as its concerned, the <xul:browser> is of size 0,0.

This means that in TabChild::RecvShow, the TabChild is told that its PuppetWidget has no size, and layout is calculated appropriately. In the case of the test I'm currently working with, that means table cells get widths of 0 when they should be closer to 17. And so my test fails.

After the parent finally figures out how big the <xul:browser> is, it updates the TabChild, which then updates the dimensions of everything inside of it.

So I think the reflows that were being caused by the StatusPanel when updating browser status were keeping these tests passing by forcing sync layout in the parent and ensuring that the TabChild knew the right widget dimensions in time for the test to run.

Hilarious!
Depends on: 1454493
Thanks for the back-out - it looks like landing this would actually open us to potential web breakage because of the increased risk of web content opened in new tabs thinking that the viewport is too small early in their lifetime.

Hoping to fix that in bug 1454493.
Flags: needinfo?(mconley)
Attachment #8969658 - Flags: review?(nika)
Comment on attachment 8969658 [details]
Bug 1358712 - Force the frameloader to go through layout when content causes a TabParent to be created.

https://reviewboard.mozilla.org/r/238454/#review244250

::: dom/ipc/ContentParent.cpp:4744
(Diff revision 2)
>          aNewTabParent = frameLoader->GetTabParent();
> +        // At this point, it's possible the inserted frameloader hasn't gone through
> +        // layout yet. To ensure that the dimensions that we send down when telling the
> +        // frameloader to display will be correct (instead of falling back to a 10x10
> +        // default), we force layout if necessary to get the most up-to-date dimensions.
> +        TabParent::GetFrom(aNewTabParent)->ForceFrameLoaderLayoutIfNecessary();

Is there any chance you could add a mention of this bug to the comment so people can get context if they need it in the future?

::: dom/ipc/TabParent.h:146
(Diff revision 2)
>    void AddWindowListeners();
>  
> +  // Used when content is causing a TabParent to be created, and
> +  // needs to try forcing layout to flush in order to get accurate
> +  // dimensions for the content area.
> +  void ForceFrameLoaderLayoutIfNecessary();

This feels to me like it makes more sense on the nsFrameLoader, rather than here on the TabParent -- especially after Fission, when a single FrameLoader may have many TabParents.
Attachment #8969658 - Flags: review?(nika) → review+
Comment on attachment 8969658 [details]
Bug 1358712 - Force the frameloader to go through layout when content causes a TabParent to be created.

https://reviewboard.mozilla.org/r/238454/#review244250

> Is there any chance you could add a mention of this bug to the comment so people can get context if they need it in the future?

Sure thing.
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/52471746531a
Force the frameloader to go through layout when content causes a TabParent to be created. r=Nika
https://hg.mozilla.org/integration/autoland/rev/a7fc392e48ab
Get rid of synchronous layout flushes when calculating where to put the StatusPanel. r=Felipe
https://hg.mozilla.org/mozilla-central/rev/52471746531a
https://hg.mozilla.org/mozilla-central/rev/a7fc392e48ab
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 61
Depends on: 1456411
Can you please provide some steps to reproduce this issue? Thank you!
Flags: needinfo?(mconley)
(In reply to Cristian Comorasu [:CristiComo] from comment #43)
> Can you please provide some steps to reproduce this issue? Thank you!

Hi CristiComo,

STR:

1. Install Oh no! Reflow! (Instructions: https://mikeconley.github.io/ohnoreflow/)
2. Open the add-on panel, and set the threshold to 0, and click "Not recording. Click to enable." to start recording.
3. Hover any link on any webpage such that the status panel shows up in the bottom corner.

ER:

The add-on should not report a reflow occurred when hovering a link. It will increment a counter in a badge in the add-on button when it observes a reflow (to see some example reflows, type in the URL bar - we reflow like crazy in there).

AR:

We reflow when hovering links.



The AR are true without the patch. With this patch, we successfully avoid reflowing.

Hopefully that's enough!
Flags: needinfo?(mconley)
I can confirm the browser respects the ER from comment 44. I tested using Fx 61.0b6 Developer Edition.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Blocks: 1509489
Performance Impact: --- → P3
Whiteboard: [ohnoreflow][qf:p3][fxperf:p1] → [ohnoreflow][fxperf:p1]
You need to log in before you can comment on or make changes to this bug.