Closed Bug 1742797 Opened 1 year ago Closed 5 months ago

composites happen at 60Hz until the browser window is closed after showing an arrowpanel

Categories

(Core :: Graphics: Layers, defect)

Desktop
Windows
defect

Tracking

()

RESOLVED FIXED
104 Branch
Performance Impact high
Tracking Status
firefox104 --- fixed

People

(Reporter: florian, Assigned: hiro)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Keywords: perf:resource-use, power)

Attachments

(4 files, 2 obsolete files)

I found steps to reproduce for a case that causes composites to happen at 60Hz until the browser window is closed.

  1. Preparation: in a first browser window, start a download (eg. http://test-debit.free.fr/10485760.rnd), let it progress for a few seconds, then cancel it. This makes the downloads toolbar icon appear on all browser windows.
  2. Open a new browser window, and close the first browser window.
  3. Start the profiler.
  4. Click the downloads toolbar icon, this makes the downloads panel appear. Wait for a few seconds.
  5. Dismiss the downloads panel. Wait for a few seconds.
  6. (Optional) Open a new browser window, and close the browser window that was opened at step 2.
  7. Capture the profile.

I have been able to reproduce consistently on the current Nightly on 2 Windows machines. On my fast-ish development laptop (https://share.firefox.dev/3nPqRkF) and on the Quantum reference laptop (https://share.firefox.dev/30Vk6VH). I have not been able to reproduce on Mac (and I have not tried on Linux yet).

In the profiles, when we composite at 60Hz for no obvious reason, the Screenshots track of the downloads panel is black. Reopening the downloads panel makes the composition rate go down to normal.

After step 2, composition happens normally. After step 4 it still happens normally. After step 5 (hiding the downloads panel), it happens at 60Hz. After step 6 it goes back to normal.

Emilio, Markus, could you please forward this needinfo to someone who can figure this out? I'm not sure if this is in cross platform code or if it's Windows specific.

Flags: needinfo?(mstange.moz)
Flags: needinfo?(emilio)

(In reply to Florian Quèze [:florian] from comment #0)

I have not been able to reproduce on Mac (and I have not tried on Linux yet).

On Linux I can't reproduce the 60Hz composite. The one part of the behavior that exists on Linux too is that the panel has its own Screenshots track that ends only when closing the browser window it was attached to. But there's no black screenshots on that track. https://share.firefox.dev/3cMamj0

OS: Unspecified → Windows
Hardware: Unspecified → Desktop

I couldn't repro either on Linux... Maybe Sotaro knows what might cause this on Windows? Maybe windows keeps sending us paint events? Maybe his work on occlusion helps popups as well, unclear.

Flags: needinfo?(emilio) → needinfo?(sotaro.ikeda.g)

I currently don't have a Windows machine to reproduce.

Flags: needinfo?(mstange.moz)

Florian, can you check if this is a regression?

Flags: needinfo?(florian)
Blocks: 1742842

:florian, can you check FPS with debug overlay? It could be checked with the following prefs

  • gfx.webrender.debug.profiler-ui:FPS
  • gfx.webrender.debug.profiler:true

I saw the a lot of "Paint" markers in profiler. But FPS of debug overlay was very low.

It seems like handling of profiler marker is not good. "Paint" marker id set even when rendering does not happen. Rendering happens only when aRender==true.
https://searchfox.org/mozilla-central/rev/7fe9421af35256a95acc4620e9e0b76df7867287/gfx/webrender_bindings/RenderThread.cpp#489

Flags: needinfo?(sotaro.ikeda.g)

:mstange, can you comment to "Paint" marker of comment 5?

Flags: needinfo?(mstange.moz)

(In reply to Sotaro Ikeda [:sotaro] from comment #5)

:florian, can you check FPS with debug overlay? It could be checked with the following prefs

  • gfx.webrender.debug.profiler-ui:FPS
  • gfx.webrender.debug.profiler:true

I saw the a lot of "Paint" markers in profiler. But FPS of debug overlay was very low.

The debug overlay disappears at the same time as the panel when I close the panel, so I can't see what it would show when the bug occurs.

It seems like handling of profiler marker is not good. "Paint" marker id set even when rendering does not happen. Rendering happens only when aRender==true.
https://searchfox.org/mozilla-central/rev/7fe9421af35256a95acc4620e9e0b76df7867287/gfx/webrender_bindings/RenderThread.cpp#489

I don't know if that marker is accurate or misleading. The main problem in this bug is that activity is happening at 60Hz (eg. the PVsyncBridge::Msg_NotifyVsync IPCs are also a problem).

Flags: needinfo?(florian)
See Also: → 1714846

FYI the criticality of this issue will increase as we ship bug 1733587 since the panel will now open automatically for new downloads.

(In reply to Sotaro Ikeda [:sotaro] from comment #6)

:mstange, can you comment to "Paint" marker of comment 5?

I agree it could be better - we should emit a "Renderer Update #WIN" marker instead of a "Composite" marker if aRender is false. However, as Florian said, the real issue still stands: We are waking up at 60Hz. We should stop listening for vsync and let the CPU sleep.

Flags: needinfo?(mstange.moz)

The Downloads panel has fade-out effect when it closes: https://searchfox.org/mozilla-central/rev/65d4d3399afa79c8de5a0cc11752d2ba7c31edc1/toolkit/content/xul.css#319-326

This opacity animation runs on the compositor. It is possible that we never stop the compositor-side animation if the panel is closed, maybe because we never get to an aRender == true composite when the animation is stopped.

Hiro, could you take a look?

Flags: needinfo?(hikezoe.birchill)

Looks like I filed this four years ago, in bug 1413763, and then lost track of it. To answer Hiro's question from back then:

(In reply to Hiroyuki Ikezoe (:hiro) from comment #2)

How do we hide the panel?
If we do hide by setting visiblity:hidden, the transform animation keep
running.

The panel is hidden by making its nsView hidden and by closing the view's nsIWidget. The panel widget has its own compositor.

It is possible that making the hidden panel display:none would stop the compositor animation. This is currently being investigated in bug 1714846.

Depends on: 1714846
Flags: needinfo?(hikezoe.birchill)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #4)

Florian, can you check if this is a regression?

I can reproduce on the 2020-05-22 build, which is the oldest build where the profiler works.

It looks like the animation (in fact it's two transtions) in question has a finite duration (0.18s), so it sounds odd that the animation keeps running on the compositor.

(In reply to Hiroyuki Ikezoe (:hiro) from comment #15)

It looks like the animation (in fact it's two transtions) in question has a finite duration (0.18s), so it sounds odd that the animation keeps running on the compositor.

Never mind about this comment. I did forget that we use fill-mode: both for transitions to avoid glitches on the compositor, thus it keeps running until an explicit transaction which makes the transitions stop happens.

The severity field is not set for this bug.
:jimm, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jmathies)
Severity: -- → S4
Flags: needinfo?(jmathies)
Blocks: 1733587
Blocks: gfx-triage

I think this is fixed by bug 1714846 - Florian, can you confirm? I'm not particularly convinced because I don't see any composite markers at all, so I assume the steps in comment 0 miss, say, which profiler preset you're meant to use or where to look for the markers? https://share.firefox.dev/3EVl5nu is my profile. I did already figure out I had to turn off screenshots because otherwise those trigger markers for the screenshots that seemed like they'd interfere with actually determining what was going on here... but I've no idea if there's anything else I'm missing.

Flags: needinfo?(florian)

(In reply to :Gijs (he/him) from comment #18)

I think this is fixed by bug 1714846 - Florian, can you confirm? I'm not particularly convinced because I don't see any composite markers at all, so I assume the steps in comment 0 miss, say, which profiler preset you're meant to use or where to look for the markers? https://share.firefox.dev/3EVl5nu is my profile.

I guess I forgot to say "look in the Renderer or compositor thread". Your profile has composites until you close the browser window (the parent process has a DOMWindowClose event when the composite markers stop) https://share.firefox.dev/3s2NcO4

I did already figure out I had to turn off screenshots because otherwise those trigger markers for the screenshots that seemed like they'd interfere with actually determining what was going on here...

I'm curious what made you think this. We create a screenshot marker for every composite, and won't trigger any if there's no composite happening.

Flags: needinfo?(florian)

(In reply to Florian Quèze [:florian] from comment #19)

(In reply to :Gijs (he/him) from comment #18)

I think this is fixed by bug 1714846 - Florian, can you confirm? I'm not particularly convinced because I don't see any composite markers at all, so I assume the steps in comment 0 miss, say, which profiler preset you're meant to use or where to look for the markers? https://share.firefox.dev/3EVl5nu is my profile.

I guess I forgot to say "look in the Renderer or compositor thread". Your profile has composites until you close the browser window (the parent process has a DOMWindowClose event when the composite markers stop) https://share.firefox.dev/3s2NcO4

So does that mean this is fixed or did we expect the patch to mean that the composites stop when the panel is closed even while that window remains open? If the latter, who is in a good position to investigate why that doesn't happen? If the former, should we close this bug?

I did already figure out I had to turn off screenshots because otherwise those trigger markers for the screenshots that seemed like they'd interfere with actually determining what was going on here...

I'm curious what made you think this. We create a screenshot marker for every composite, and won't trigger any if there's no composite happening.

The composite markers for screenshots seem to appear even when there is no composite marker on the compositor thread - compare https://share.firefox.dev/3IOceqd and https://share.firefox.dev/3pYsTid (different views on the same profile (with screenshots this time) . Perhaps that's a separate bug?

Flags: needinfo?(florian)

(In reply to :Gijs (he/him) from comment #20)

The composite markers for screenshots seem to appear even when there is no composite marker on the compositor thread - compare https://share.firefox.dev/3IOceqd and https://share.firefox.dev/3pYsTid (different views on the same profile (with screenshots this time) . Perhaps that's a separate bug?

Note that you can Cmd-click threads and get an integrated marker chart of multiple threads: https://share.firefox.dev/3p0PlYM (I also applied a search filter for "composit")
The screenshot markers span the distance between two consecutive composites of the same window.

Ah, and in this particular case, it looks like we're getting "Composite" markers for the hidden panel, but no updated "CompositorScreenshot" markers - instead, the panel's synthesized "CompositorScreenshot" marker spans the full duration of 4.6 seconds until the parent window is closed. So it seems like we're triggering a composite but then canceling compositing somewhere in the pipeline before we capture a new screenshot. Likely because something notices that the window is closed. But the frequent "Composite" and "CompositeToTarget" markers indicate that we still wake up the CPU. And that's the part that this bug is about.

(In reply to :Gijs (he/him) from comment #20)

So does that mean this is fixed or did we expect the patch to mean that the composites stop when the panel is closed even while that window remains open? If the latter, who is in a good position to investigate why that doesn't happen?

We expect composites for the panel to stop when the panel is closed. Given the previous comments (especially comment 16), I would say Hiro would be in a good position to keep investigating.

Flags: needinfo?(florian)

Hiro, based on the last few comments, can you take a look at what's happening in current nightly that keeps us compositing when the downloads panel has been closed, and its content slot has been marked display: none? (I just checked and that change does now take effect - though the downloads panel itself seems to still be display: -moz-popup, it's unclear to me if that's expected - Emilio, did you see that when working on bug 1714846?)

Flags: needinfo?(hikezoe.birchill)

Needinfo for comment #24, sorry for the spam.

Flags: needinfo?(emilio)

Yeah, that's kind of expected. In the past when mentioning this to Neil he mentioned that the current design of popups not being display: none when closed was to avoid recreating native widgets over and over for e.g., the context menu. Maybe the trade-offs are different now?

Flags: needinfo?(emilio)
No longer blocks: gfx-triage

The animation in question here is on this downloadsPanel, right? As far as I can tell, it's the animation keeps running on the compositor and I can't see display:none style on the element after the popup was closed. Oh now I see what you meant, yeah the display style is: -moz-popup...

Flags: needinfo?(hikezoe.birchill)

I am assuming what Gijs wanted to know is why the animation keeps running after closing the popup. The answer is;

  1. The animation is a CSS transition running on the compositor, we let CSS transitions keep running on the compositor until an explicit notification comes from the main-thread
  2. Though this is a guess, in the popup case the notification is tried to be sent after popup closed thus any display list building stuff won't happen thus the notification will never be arrived to the compositor

So, I guess setting transition:none just before closing the popup would solve this issue? (IIRC there's a such popup blah event)

(In reply to Hiroyuki Ikezoe (:hiro) from comment #27)

The animation in question here is on this downloadsPanel, right? As far as I can tell, it's the animation keeps running on the compositor and I can't see display:none style on the element after the popup was closed. Oh now I see what you meant, yeah the display style is: -moz-popup...

FWIW, this sounds very similar to bug 1739161, where the element is made zero size. In that testcase, the animation ends properly only when we take EndEmptyTransaction painting path, which is only used with retained display lists.
Retained display lists are not used for popups, so this might be related.

(In reply to Miko Mynttinen [:miko] from comment #29)

(In reply to Hiroyuki Ikezoe (:hiro) from comment #27)

The animation in question here is on this downloadsPanel, right? As far as I can tell, it's the animation keeps running on the compositor and I can't see display:none style on the element after the popup was closed. Oh now I see what you meant, yeah the display style is: -moz-popup...

FWIW, this sounds very similar to bug 1739161, where the element is made zero size. In that testcase, the animation ends properly only when we take EndEmptyTransaction painting path, which is only used with retained display lists.
Retained display lists are not used for popups, so this might be related.

That's a different, a problem about bug 1739161 is happening on the main-thread, it keeps restyling on the main-thread, on the other hand in this bug the animation isn't restyled on the main-thread.

Attached patch A possible fix (obsolete) — Splinter Review

I figured out a way to solve this. The way is to call ClearCachedWebrenderResources when the popup gets hidden. I was initially thinking that it's not a good way because of resource recreation cost what Emilio commented in comment 26, but we've already clear the resources when recreating the popup. So I am mostly 100% sure it's okay.

Flags: needinfo?(gijskruitbosch+bugs)

I'm a bit confused. I think this happens for every panel? Both based on the conversation here (that seems to suggest that this happens due to the transition that is on the root of the panel/popup node, which isn't being display: none'd), and because that's what this profile seems to show, AFAICT: https://share.firefox.dev/3sl7By8 , and because that's what the dupe (bug 1413763) says.

But the steps here make me think this is somehow DL-panel specific? Was that intentional when filing it?

Given this isn't DL-panel specific, I'm going to unblock bug 1733587 as it's unrelated, really? People in general are much more likely to open "any arrow panel" either before or after the changes on nightly right now.

AIUI, the outgoing opacity transition is intentional, so I don't think frontend and/or the toolkit css and custom element code can fix this, despite the attempt in bug 1714846, and we might need a fix like the one in attachment 9255857 [details] [diff] [review].

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(florian)
No longer blocks: 1733587

(In reply to :Gijs (he/him) from comment #32)

I'm a bit confused. I think this happens for every panel? Both based on the conversation here (that seems to suggest that this happens due to the transition that is on the root of the panel/popup node, which isn't being display: none'd), and because that's what this profile seems to show, AFAICT: https://share.firefox.dev/3sl7By8 , and because that's what the dupe (bug 1413763) says.

But the steps here make me think this is somehow DL-panel specific? Was that intentional when filing it?

It was intentional, but it was me being confused. I tried a bunch of other panels at the time and they didn't reproduce. Testing a bit more today, I can reproduce with the hamburger panel and the control center / identity panels. I can't reproduce with the profiler or pocket panels.

AIUI, the outgoing opacity transition is intentional, so I don't think frontend and/or the toolkit css and custom element code can fix this

I don't think front-end code can avoid the bug without removing the opacity & transform transitions entirely.

we might need a fix like the one in attachment 9255857 [details] [diff] [review].

Here is a profile with this attachment applied: https://share.firefox.dev/3pd4033 It confirms that this change does fix the bug.

We might be able to write a simple test for this using ChromeUtils.vsyncEnabled() before/after the hamburger panel has been shown and hidden.

Flags: needinfo?(florian)
Summary: composites happen at 60Hz until the browser window is closed after showing the downloads panel. → composites happen at 60Hz until the browser window is closed after showing an arrowpanel

(In reply to Florian Quèze [:florian] from comment #33)

We might be able to write a simple test for this using ChromeUtils.vsyncEnabled() before/after the hamburger panel has been shown and hidden.

This seems to just always return true for me on Nightly, on Windows, when run from the browser console, both when the panel is open and when it's not. How is it supposed to help here? And, is there an alternative (perhaps using profiler markers) ?

Flags: needinfo?(florian)

(In reply to :Gijs (he/him) from comment #34)

(In reply to Florian Quèze [:florian] from comment #33)

We might be able to write a simple test for this using ChromeUtils.vsyncEnabled() before/after the hamburger panel has been shown and hidden.

This seems to just always return true for me on Nightly, on Windows, when run from the browser console, both when the panel is open and when it's not.

This always returns true because the browser console uses a CSS animation to make the cursor blink. When testing locally I used a setTimeout to workaround it. Eg. run setTimeout(() => console.log(ChromeUtils.vsyncEnabled()), 1000); in the console and quickly move the focus away from the console to make the cursor disappear.

Flags: needinfo?(florian)
See Also: 1714846

Setting a needinfo for Florian to come back to this when he's back from PTO, as I didn't really get very far. The patch I attached works, as far as I can tell, and the test passes but (a) it's kinda ugly and (b) the test also passes before the patch, so that's not much use. Despite Florian's extensive help, I didn't get very far trying to figure out why - I tried to use the profiler to find out what was happening in the automated test, but it doesn't seem to include profiles of the gpu process, even though when I check if the GPU process is running by using requestProcInfo it is indeed running. If I try to collect shutdown profiles by failing the test, the browser crashes, without useful stacks. So I'm a bit lost, and I'm hoping Florian is in a better position to pick this up (but if anyone else has ideas, I'm sure they'd be appreciated too - we should really try to get this fixed sooner rather than later, considering the potential perf and power usage impacts).

Flags: needinfo?(florian)
Assignee: nobody → hikezoe.birchill
Status: NEW → ASSIGNED

I managed to make the test work (on Linux with xul.panel-animations.enabled=true). So what I've noticed is it seems the issue only happens when the popup is closed by a native mouse click event. And I am now unsure though, I've changed to use the download panel instead of the browser menu.

(Though I didn't want to assign this bug to me, but moz-phab forcibly did it)

Assignee: hikezoe.birchill → nobody
Status: ASSIGNED → NEW
Assignee: nobody → hikezoe.birchill
Status: NEW → ASSIGNED

(In reply to :Gijs (back Jan 4, 2022; he/him) from comment #37)

Setting a needinfo for Florian to come back to this when he's back from PTO, as I didn't really get very far. The patch I attached works, as far as I can tell, and the test passes but (a) it's kinda ugly and (b) the test also passes before the patch, so that's not much use.

Given that Hiro managed to make the test work, do you still need my help?

I tried yesterday Hiro's version of the test on my local Windows build. The test passes with the layout/xul/nsMenuPopupFrame.cpp fix applied, and fails without. So it looks like it's testing what we expect.

I forced the test to fail to have a full profile of it including the GPU process: https://share.firefox.dev/3HvPUQA
One thing I noticed in this profile is that the initial waitForCondition lasted 3.6s, waiting for vsync to be disabled at the beginning of the test. This is due to vsync notifications being observed by a content process for an about:blank document (https://share.firefox.dev/3zl1Th3), I don't have a good understanding of what's causing that yet, but it's something to figure out in a separate bug.

Flags: needinfo?(florian)
Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Florian Quèze [:florian] from comment #39)

One thing I noticed in this profile is that the initial waitForCondition lasted 3.6s, waiting for vsync to be disabled at the beginning of the test. This is due to vsync notifications being observed by a content process for an about:blank document (https://share.firefox.dev/3zl1Th3), I don't have a good understanding of what's causing that yet, but it's something to figure out in a separate bug.

I filed bug 1748466 for this.

Okay, I take over this.

Flags: needinfo?(gijskruitbosch+bugs)
Attachment #9256448 - Attachment description: WIP: Bug 1742797 - ensure we stop compositing after closing panels → Bug 1742797 - Discard WebRender resources when popup hides. r?sotaro!,florian!
Attachment #9255857 - Attachment is obsolete: true
Pushed by hikezoe.birchill@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/14f484747a66
Discard WebRender resources when popup hides. r=Gijs,emilio

Do'h. mView in nsMenuPopupFrame is a dangling pointer... I will figure out a way to avoid the situation.

Flags: needinfo?(hikezoe.birchill)

Move if (auto* widget = GetWidget()) {} block before mView gets destroyed by nsIFrame::DestroyFrom.

Pushed by hikezoe.birchill@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e7b991987ae5
Discard WebRender resources when popup hides. r=Gijs,emilio
Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → 97 Branch
Regressions: 1748808
Regressions: 1748788

Backed out for causing Bug 1748808

Status: RESOLVED → REOPENED
Flags: needinfo?(hikezoe.birchill)
Resolution: FIXED → ---
Target Milestone: 97 Branch → ---
Depends on: 1748788

This is really bad for a range of reason. Bumping the priority. Graphics team might have some thoughts here.

Severity: S4 → S2
Whiteboard: [qf:p1:resource]
No longer regressions: 1748808
See Also: → 1748808

I did investigate two test failures, bug 1748788 and bug 1748808. Now I am pretty sure bug 1748808 is unrelated to this bug. Bug 1748788 is the only one remaining issue to land this bug.

To performance team, I will not spend time for bug 1748788.

Flags: needinfo?(hikezoe.birchill)

A way to avoid bug 1748788 I can think of is to add a new route to invoke ClearAnimationResources and use it in HidePopup, it should just work.

Hey Bas, do you know somebody who might be able to help with bug 1748788? There's a nice power / CPU usage win patch here that's being blocked by it.

Flags: needinfo?(bas)

Jeff would be the best person to figure out who might be able to look at that I think.

Flags: needinfo?(bas) → needinfo?(jmuizelaar)
No longer depends on: 1748788

Hiro, is this ready to reland now that bug 1748788 is fixed?

Depends on: 1748788
Flags: needinfo?(jmuizelaar) → needinfo?(hikezoe.birchill)
No longer regressions: 1748788

Pushed a new try run to see if there's any suspicious failure or not. https://treeherder.mozilla.org/jobs?repo=try&revision=1a10622da604d539f42a7ae867d03ff8fb0e96de

That one is a try auto run, so I might have to trigger additional tasks there.

Flags: needinfo?(hikezoe.birchill)

The test for this bug, browser_panel_vsync.js failed. Something has been changed since the last land?

It looks like the vsync tick on the compositor thread has been less often than before. Thus even if ChromeUtils.vsyncEnabled() returns false, DownloadsPanel.panel.state is not yet closed. With an additional awaiting for popuphidden the test passes properly. I also confirmed with the additional await the test still properly fails without the proper fix in nsMenuPopupFrame.cpp.

Pushed a new try run; https://treeherder.mozilla.org/jobs?repo=try&revision=cc9d09aa4020c0206b98b7f1b80989e17d15ae6c (but it looks like it hasn't yet been queued)

Though bug 1748788 was backed out now, the test browser_panel_vsync.js is quite flaky, I did confirm that resources for compositor animations has been discarded but VsyncSource gets enabled after the discard.

Florian, are you aware of the case where VsyncSource gets re-enabled for some reason?

Flags: needinfo?(florian)
Performance Impact: --- → P1
Whiteboard: [qf:p1:resource]

(In reply to Hiroyuki Ikezoe (:hiro) from comment #59)

Sorry for the late reply, I was on PTO most of last week.

Though bug 1748788 was backed out now, the test browser_panel_vsync.js is quite flaky, I did confirm that resources for compositor animations has been discarded but VsyncSource gets enabled after the discard.

Florian, are you aware of the case where VsyncSource gets re-enabled for some reason?

Flaky in which way? When running locally on my Windows machine, I saw cases where the test timeouts, with the downloads panel still visible and a leftover popuphidden event listener. I haven't been able to reproduce with the profiler running, but my guess is we need to wait for something before sending the click that will trigger the popup closing.

If you are talking about the Linux failures ("TEST-UNEXPECTED-FAIL | browser/base/content/test/performance/browser_panel_vsync.js | vsync should still be off") on the try run from comment 58, I would say the easiest way forward is to get the try server to upload profiles of the failures.
I did that using ./mach try fuzzy browser/base/content/test/performance/browser_panel_vsync.js --env MOZ_PROFILER_STARTUP=1 and here are the results: https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=b821db08fc7f9d9ef7acd4e8aa3191122731d685

Here's one of the profiles from that try run: https://share.firefox.dev/3t2wsGZ
It shows that TestUtils.waitForCondition(() => !ChromeUtils.vsyncEnabled()) ("TestUtils - waitForCondition succeeded after 0 retries - undefined") finished before hiddenPromise ("BrowserTestUtils - waitForEvent: popuphidden"). So at the time we do the final ok(!ChromeUtils.vsyncEnabled(), "vsync should still be off"); we don't have a TestUtils.waitForCondition, and it's normal for vsync to take about 200ms before being disabled after something has visibly changed.

This change https://hg.mozilla.org/try/rev/4899258047a1156aecc42278fd1ba65e190ceb9a seems to be enough to make the test green on try (https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=cccfb7dd33b48a1419826a889a18eff916906fec) minus a few leaks on debug builds that are likely due to running with the profiler.

By the way, it would be nice to provide a message for the second TestUtils.waitForCondition call at https://hg.mozilla.org/try/rev/a3c46d3d95e0f389944c128485baed798b864bf7#l2.53

Flags: needinfo?(florian)

(In reply to Florian Quèze [:florian] from comment #60)

When running locally on my Windows machine, I saw cases where the test timeouts, with the downloads panel still visible and a leftover popuphidden event listener. I haven't been able to reproduce with the profiler running,

I haven't been able to get a profile of the failure locally, but TV jobs on try could! https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=cccfb7dd33b48a1419826a889a18eff916906fec&searchStr=win&selectedTaskRun=HOzdo7EMRQWJ--QAjzOgXw.0

This profile contains both a passing run of the test and a failing run that timeouts: https://share.firefox.dev/3MsmFRY
The sequences of DOM events are different in these 2 cases. The passing case has popuphiding/popuphidden events https://share.firefox.dev/3CygCqE, but no mousedown or click events. The failing case has mousedown, click (and event dblclick!) events: https://share.firefox.dev/3HNLWCN but no popuphiding/popuphidden events.

Depends on: 1760618
No longer depends on: 1760618

The CPU consumption issue seems to have gotten worse in 100.0b. Bug 1714846 did make 97+ seem more responsive when a large DL was in progress but it seems to have regressed or something.

This high CPU util is still present in 102 and 103b.

(In reply to Hiroyuki Ikezoe (:hiro) from comment #52)

A way to avoid bug 1748788 I can think of is to add a new route to invoke ClearAnimationResources and use it in HidePopup, it should just work.

It looks like nobody is currently working on fixing bug 1748788 / bug 1707052, so maybe this suggestion would be the way forward. Can you describe a bit more what "add a new route to invoke ClearAnimationResources" means? Is this something I might be able to do myself? Could this also open a way towards fixing bug 1768495?

Flags: needinfo?(hikezoe.birchill)

This is what I had in my mind. This is just doing a part of what WebRenderLayerManager::ClearCachedResources does.

Though WebRenderLayerManager::ClearCachedResources has a transaction pattern of begin ~ end, honestly I am not sure whether the pattern also is necessary for animation resources or not. I believe Sotaro knows the answer.

Flags: needinfo?(hikezoe.birchill)

(In reply to Florian Quèze [:florian] from comment #61)

(In reply to Florian Quèze [:florian] from comment #60)

When running locally on my Windows machine, I saw cases where the test timeouts, with the downloads panel still visible and a leftover popuphidden event listener. I haven't been able to reproduce with the profiler running,

I haven't been able to get a profile of the failure locally, but TV jobs on try could! https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=cccfb7dd33b48a1419826a889a18eff916906fec&searchStr=win&selectedTaskRun=HOzdo7EMRQWJ--QAjzOgXw.0

This profile contains both a passing run of the test and a failing run that timeouts: https://share.firefox.dev/3MsmFRY
The sequences of DOM events are different in these 2 cases. The passing case has popuphiding/popuphidden events https://share.firefox.dev/3CygCqE, but no mousedown or click events. The failing case has mousedown, click (and event dblclick!) events: https://share.firefox.dev/3HNLWCN but no popuphiding/popuphidden events.

I figured this out. The problem was that we use EventUtils.synthesizeNativeMouseEvent twice to generate clicks, but that function actually generates a mouse down and mouse up. If we run that twice quickly enough, instead of synthesizing 2 separate clicks, we synthesize a double click, which triggers different behaviors. If I go to the Windows mouse settings and make the double clicks slower, then I can reproduce all the time locally. This also explains why the failures happened more often locally (faster developer machine) and stopped happening when profiling (the profiler slowing things down a little bit).

Using EventUtils.synthesizeKey("VK_ESCAPE" instead of a second click to close the panel works.

I pushed to try the patch attached in comment 65 with the test fixes I mentioned in comment 60 and 66: https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=384b458b9b6a2b21871e30a361340cfa4f315fff&selectedTaskRun=cFI-Y9slSTKPX-aYdFBXuA.0

It is green, except on the linux swr-wayland jobs where it fails on an assertion: Assertion failure: mGdkWindow (Getting configure for invisible window?), added in bug 1738729.

To see if the assertion was caused by the fix, I pushed again the same test but with the fix commented out. It failed again with the same assertion on the linux swr-wayland jobs: https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=e8c7ac3ac87850073f5e61fd245415eeb1ab154a
It also failed on the Window 7 jobs (that's expected with the fix commented out!), but didn't fail on the Windows 10 jobs, which is puzzling as on my local Windows 10 and 11 machines, the test fails without the fix and passes with the fix.

Martin, how important is this assertion, and could it be changed to a simple return?

Hiro, thanks for the patch! Do you have any idea of why the test doesn't fail on Windows 10 on try when the fix is disabled? And do you think we could still go ahead and request review to reland without having understood this? (and with a skip-if for the linux swr-wayland jobs)

Flags: needinfo?(stransky)
Flags: needinfo?(hikezoe.birchill)

(In reply to Florian Quèze [:florian] from comment #67)

Do you have any idea of why the test doesn't fail on Windows 10 on try when the fix is disabled?

IIRC, closing the popup window by a native click is the way to make the test reliably fail without any fix, so using VK_ESCAPE did cause it?

EventUtils.synthesizeNativeMouseEvent can accept a callback function argument, so I guess if we use it, we can avoid the double click problem?

FWIW, from my comment #38

I managed to make the test work (on Linux with xul.panel-animations.enabled=true). So what I've noticed is it seems the issue only happens when the popup is closed by a native mouse click event. And I am now unsure though, I've changed to use the download panel instead of the browser menu.

(In reply to Florian Quèze [:florian] from comment #67)

And do you think we could still go ahead and request review to reland without having understood this? (and with a skip-if for the linux swr-wayland jobs)

I'd say yes, if the test reliably works on other platforms.

Flags: needinfo?(hikezoe.birchill)

Just a side-note that the Wayland issue could well be a system bug. Ubuntu 18.04 did not support Wayland officially, so fixes to the display manager don't get backported. So it would be worth retrying once bug 1725245 lands.

(In reply to Hiroyuki Ikezoe (:hiro) from comment #68)

(In reply to Florian Quèze [:florian] from comment #67)

Do you have any idea of why the test doesn't fail on Windows 10 on try when the fix is disabled?

IIRC, closing the popup window by a native click is the way to make the test reliably fail without any fix, so using VK_ESCAPE did cause it?

Yes.

When I push the test to try without the fix and using VK_ESCAPE, there's a failure only on Windows 7.
If I push this using a second click to close the panel, I additionally get TV failures on Windows: https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C3%2C2&revision=09650045400f91e6449c268157c664d780c96d97

EventUtils.synthesizeNativeMouseEvent can accept a callback function argument, so I guess if we use it, we can avoid the double click problem?

My understand of how this observer works is that we would get notified after the click has been dispatched, but waiting for the panel to open already handles that. We would need to wait long enough for the second click to not be considered as a double click by the OS, but that won't be covered by the observer.

(In reply to Florian Quèze [:florian] from comment #67)

And do you think we could still go ahead and request review to reland without having understood this? (and with a skip-if for the linux swr-wayland jobs)

I'd say yes, if the test reliably works on other platforms.

I've been told in #sheriffs that the wayland jobs don't run in CI, and I don't need to add a skip-if annotation. Apparently it's a bug that the job is marked as tier1 on Try.

(In reply to Florian Quèze [:florian] from comment #67)

I pushed to try the patch attached in comment 65 with the test fixes I mentioned in comment 60 and 66: https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=384b458b9b6a2b21871e30a361340cfa4f315fff&selectedTaskRun=cFI-Y9slSTKPX-aYdFBXuA.0

It is green, except on the linux swr-wayland jobs where it fails on an assertion: Assertion failure: mGdkWindow (Getting configure for invisible window?), added in bug 1738729.

To see if the assertion was caused by the fix, I pushed again the same test but with the fix commented out. It failed again with the same assertion on the linux swr-wayland jobs: https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=e8c7ac3ac87850073f5e61fd245415eeb1ab154a
It also failed on the Window 7 jobs (that's expected with the fix commented out!), but didn't fail on the Windows 10 jobs, which is puzzling as on my local Windows 10 and 11 machines, the test fails without the fix and passes with the fix.

Martin, how important is this assertion, and could it be changed to a simple return?

It's somehow expected so don't worry about it.

Flags: needinfo?(stransky)
Pushed by fqueze@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/88f60761e884
Discard WebRender resources when popup hides. r=Gijs,emilio
https://hg.mozilla.org/integration/autoland/rev/cfc1f440fb6d
Add a new IPC call to clear animation resources on WebRender, r=sotaro.
https://hg.mozilla.org/integration/autoland/rev/a692ca1d8401
Fix intermittent test failures, r=hiro.
Pushed by fqueze@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c48ff146a33d
Address review comment and ignore spurious popuphiding events on Linux r=mconley
Regressions: 1779295

Backed out for causing failures on /browser_ext_getViews.js

[task 2022-07-12T17:11:48.760Z] 17:11:48     INFO - TEST-PASS | browser/components/extensions/test/browser/browser_ext_getViews.js | browsing context group is correct - Expected: 1911, Actual: 1911 - 
[task 2022-07-12T17:11:48.760Z] 17:11:48     INFO - Buffered messages finished
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - TEST-UNEXPECTED-FAIL | browser/components/extensions/test/browser/browser_ext_getViews.js | view type is valid - 
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - Stack trace:
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - chrome://mochikit/content/browser-test.js:test_ok:1400
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - chrome://mochikit/content/tests/SimpleTest/ExtensionTestUtils.js:testHandler:87
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - chrome://mochikit/content/tests/SimpleTest/ExtensionTestUtils.js:testResult:97
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://specialpowers/SpecialPowersChild.jsm:listener:2020
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://specialpowers/SpecialPowersChild.jsm:loadExtension/<:1950
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://specialpowers/SpecialPowersChild.jsm:receiveMessage:253
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - JSActor query*resource://specialpowers/SpecialPowersParent.jsm:resultListener:1120
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://gre/modules/ExtensionCommon.jsm:emit:320
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://gre/modules/Extension.jsm:receiveMessage:2536
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - MessageListener.receiveMessage*resource://gre/modules/Extension.jsm:Extension:2340
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://gre/modules/Extension.jsm:startup:2230
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://gre/modules/addons/XPIProvider.jsm:callBootstrapMethod:1852
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://gre/modules/addons/XPIProvider.jsm:startup:1960
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://gre/modules/addons/XPIProvider.jsm:_install:2037
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://gre/modules/addons/XPIProvider.jsm:install:2026
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://gre/modules/addons/XPIInstall.jsm:_activateAddon:4581
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://gre/modules/addons/XPIInstall.jsm:installTemporaryAddon:4440
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://gre/modules/addons/XPIProvider.jsm:XPIProvider[meth]:3284
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://gre/modules/AddonManager.jsm:installTemporaryAddon:2477
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://gre/modules/AddonManager.jsm:installTemporaryAddon:4142
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://testing-common/ExtensionTestCommon.jsm:startup:182
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - resource://specialpowers/SpecialPowersParent.jsm:receiveMessage/<:1196
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - promise callback*resource://specialpowers/SpecialPowersParent.jsm:receiveMessage:1190
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - JSActor query*resource://specialpowers/SpecialPowersChild.jsm:startup:1975
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - chrome://mochitests/content/browser/browser/components/extensions/test/browser/browser_ext_getViews.js:null:154
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - chrome://mochikit/content/browser-test.js:handleTask:988
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - chrome://mochikit/content/browser-test.js:_runTaskBasedTest:1060
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - chrome://mochikit/content/browser-test.js:Tester_execTest:1195
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - chrome://mochikit/content/browser-test.js:nextTest/<:977
[task 2022-07-12T17:11:48.761Z] 17:11:48     INFO - chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/<:1041
[task 2022-07-12T17:11:48.762Z] 17:11:48     INFO - TEST-PASS | browser/components/extensions/test/browser/browser_ext_getViews.js | browsing context group is correct - Expected: 1911, Actual: 1911 -
Flags: needinfo?(florian)
Pushed by fqueze@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/18e11e7ea9d0
Discard WebRender resources when popup hides. r=Gijs,emilio
https://hg.mozilla.org/integration/autoland/rev/8ed69c2c1d49
Add a new IPC call to clear animation resources on WebRender, r=sotaro.
https://hg.mozilla.org/integration/autoland/rev/5b0edd03f698
Fix intermittent test failures, r=hiro.
Attachment #9285125 - Attachment is obsolete: true

(In reply to Natalia Csoregi [:nataliaCs] from comment #77)

Backed out for causing failures on /browser_ext_getViews.js

This was investigated in bug 1748808.

Flags: needinfo?(florian)
Backout by abutkovits@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ee7ddfc6690d
Backed out 4 changesets (bug 1742797, bug 1748808) for causing crashes at mozilla::layers::APZCTreeManagerParent::RecvUpdateZoomConstraints.

browser_test_select_popup_position.js is the test caused the crash.

The changes here unlikely cause the crash, at first glance.

These kind of crashes happened also with the land in comment 74, link here. Bug 1779309 was filled for them then.

[task 2022-07-12T22:21:34.354Z] 22:21:34     INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ mozilla::layers::APZCTreeManagerParent::RecvUpdateZoomConstraints(mozilla::layers::ScrollableLayerGuid const&, mozilla::Maybe<mozilla::layers::ZoomConstraints> const&)]
[task 2022-07-12T22:21:34.354Z] 22:21:34     INFO - Crash dump filename: C:\Users\task_165765759579374\AppData\Local\Temp\tmplwrnxkty.mozrunner\minidumps\61f22c50-d8a2-4c59-9a49-a420248cffc7.dmp
[task 2022-07-12T22:21:34.354Z] 22:21:34     INFO - Operating system: Windows NT
[task 2022-07-12T22:21:34.354Z] 22:21:34     INFO -                   10.0.19041
[task 2022-07-12T22:21:34.354Z] 22:21:34     INFO - CPU: amd64
[task 2022-07-12T22:21:34.354Z] 22:21:34     INFO -      family 6 model 106 stepping 6
[task 2022-07-12T22:21:34.354Z] 22:21:34     INFO -      8 CPUs
[task 2022-07-12T22:21:34.354Z] 22:21:34     INFO - 
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO - Crash reason:  EXCEPTION_ACCESS_VIOLATION_READ
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO - Crash address: 0x0
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO - Process uptime: 4 seconds
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO - 
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO - Thread 8 Compositor (crashed)
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -  0  xul.dll!mozilla::layers::APZCTreeManagerParent::RecvUpdateZoomConstraints(mozilla::layers::ScrollableLayerGuid const&, mozilla::Maybe<mozilla::layers::ZoomConstraints> const&) [APZCTreeManagerParent.cpp:a692ca1d84013f272d5e1170e23e3acdef89be08 : 97 + 0x0]
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      rax = 0x0000000100000023    rdx = 0x0000024b6f4bd880
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      rcx = 0x0000000000000000    rbx = 0x0000024b6f527910
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      rsi = 0x00000082f067f528    rdi = 0x00000082f067f548
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      rbp = 0x0000000000000001    rsp = 0x00000082f067f4c0
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -       r8 = 0x00000082f067f548     r9 = 0x00000082f067f530
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      r10 = 0x0000024b7b46e010    r11 = 0x0000024b6f3e8100
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      r12 = 0x00000082f067faaf    r13 = 0x0000000069a0c300
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      r14 = 0x0000024b6f4bd880    r15 = 0x0000000000000000
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      rip = 0x00007ffb84b353e8
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -     Found by: given as instruction pointer in context
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -  1  xul.dll!mozilla::layers::PAPZCTreeManagerParent::OnMessageReceived(IPC::Message const&) [PAPZCTreeManagerParent.cpp: : 443 + 0x7]
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      rbx = 0x0000024b6f527910    rsi = 0x00000082f067f588
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      rbp = 0x0000000000000001    rsp = 0x00000082f067f4f0
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      r12 = 0x00000082f067faaf    r13 = 0x0000000069a0c300
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      r14 = 0x0000024b6f4bd880    r15 = 0x0000000000000000
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      rip = 0x00007ffb84bb17b5
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -  2  xul.dll!mozilla::layers::PCompositorManagerParent::OnMessageReceived(IPC::Message const&) [PCompositorManagerParent.cpp: : 194 + 0xf]
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      rbx = 0x0000024b6f5ebc20    rsi = 0x0000024b6f527900
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      rdi = 0x0000000000000000    rbp = 0x0000000000000001
[task 2022-07-12T22:21:34.367Z] 22:21:34     INFO -      rsp = 0x00000082f067f5f0    r12 = 0x00000082f067faaf
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      r13 = 0x0000000069a0c300    r14 = 0x0000024b69a50bb0
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      r15 = 0x0000000000000000    rip = 0x00007ffb84aa9e5f
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -  3  xul.dll!mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) [MessageChannel.cpp:a692ca1d84013f272d5e1170e23e3acdef89be08 : 1749 + 0xf]
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      rbx = 0x0000024b6ea9a200    rsi = 0x0000024b6f527900
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      rdi = 0x0000024b69a50c28    rbp = 0x0000000000000001
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      rsp = 0x00000082f067f6e0    r12 = 0x00000082f067faaf
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      r13 = 0x0000000069a0c300    r14 = 0x0000024b6ea9a200
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      r15 = 0x0000000000000000    rip = 0x00007ffb8472b3c0
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -  4  xul.dll!mozilla::ipc::MessageChannel::DispatchMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::UniquePtr<IPC::Message,mozilla::DefaultDelete<IPC::Message> >) [MessageChannel.cpp:a692ca1d84013f272d5e1170e23e3acdef89be08 : 1674 + 0xa]
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      rbx = 0x0000024b6ea513a0    rsi = 0x00000082f067f840
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      rdi = 0x0000024b69a50c28    rbp = 0x00000000ffffffff
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      rsp = 0x00000082f067f740    r12 = 0x00000082f067faaf
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      r13 = 0x0000000069a0c300    r14 = 0x0000024b6ea9a200
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      r15 = 0x0000000000000000    rip = 0x00007ffb8472a835
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -  5  xul.dll!mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::ipc::MessageChannel::MessageTask&) [MessageChannel.cpp:a692ca1d84013f272d5e1170e23e3acdef89be08 : 1474 + 0xa]
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      rbx = 0x0000024b6f528400    rsi = 0x0000024b69a50c28
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      rdi = 0x0000024b6ea9a200    rbp = 0x0000000000000000
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      rsp = 0x00000082f067f820    r12 = 0x00000082f067faaf
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      r13 = 0x0000000069a0c300    r14 = 0x0000024b6ea513a0
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -      r15 = 0x0000000000000000    rip = 0x00007ffb8472abfe
[task 2022-07-12T22:21:34.368Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -  6  xul.dll!mozilla::ipc::MessageChannel::MessageTask::Run() [MessageChannel.cpp:a692ca1d84013f272d5e1170e23e3acdef89be08 : 1572 + 0xa]
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -      rbx = 0x0000024b6ea9a200    rsi = 0x0000024b6f528440
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -      rdi = 0x0000024b6f528400    rbp = 0x0000000000000000
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -      rsp = 0x00000082f067f870    r12 = 0x00000082f067faaf
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -      r13 = 0x0000000069a0c300    r14 = 0x0000024b6ea513a0
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -      r15 = 0x0000000000000000    rip = 0x00007ffb8472ae91
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -  7  xul.dll!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:a692ca1d84013f272d5e1170e23e3acdef89be08 : 1199 + 0xc]
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -      rbx = 0x00000082f067f9a0    rsi = 0x0000024b69a68300
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -      rdi = 0x00000000ffffffff    rbp = 0x0000000000000000
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -      rsp = 0x00000082f067f8c0    r12 = 0x00000082f067faaf
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -      r13 = 0x0000000069a0c300    r14 = 0x0000000000000000
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -      r15 = 0x0000000000000000    rip = 0x00007ffb840fb958
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -  8  xul.dll!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:a692ca1d84013f272d5e1170e23e3acdef89be08 : 465 + 0x16]
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -      rbx = 0x0000000069abe501    rsi = 0x0000024b69abe5c0
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -      rdi = 0x00000082f067fc08    rbp = 0x00000082f067fb50
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -      rsp = 0x00000082f067fa80    r12 = 0x00007ffbbe4bfaa0
[task 2022-07-12T22:21:34.369Z] 22:21:34     INFO -      r13 = 0x00007ffb9c5a9468    r14 = 0x0000024b69a68300
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      r15 = 0x0000024b69abe5e0    rip = 0x00007ffb840ff928
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -  9  xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [MessagePump.cpp:a692ca1d84013f272d5e1170e23e3acdef89be08 : 330 + 0x9]
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      rbx = 0x0000000069abe500    rsi = 0x0000024b69abe5c0
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      rbp = 0x00000082f067fb50    rsp = 0x00000082f067fad0
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      r12 = 0x00007ffbbe4bfaa0    r13 = 0x00007ffb9c5a9468
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      r14 = 0x0000024b69a68300    r15 = 0x0000024b69abe5e0
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      rip = 0x00007ffb8472e0ae
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO - 10  xul.dll!MessageLoop::RunHandler() [message_loop.cc:a692ca1d84013f272d5e1170e23e3acdef89be08 : 373 + 0x15]
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      rbx = 0x0000024b69a68300    rsi = 0x00000082f067fc08
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      rdi = 0x000000000000000a    rbp = 0x00000082f067fb50
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      rsp = 0x00000082f067fb30    r12 = 0x00007ffbbe4bfaa0
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      r13 = 0x00007ffb9c5a9468    r14 = 0x0000024b69ac5b28
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      r15 = 0x0000024b69ac5b20    rip = 0x00007ffb846ec750
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO - 11  xul.dll!MessageLoop::Run() [message_loop.cc:a692ca1d84013f272d5e1170e23e3acdef89be08 : 355 + 0x4]
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      rbx = 0x0000024b69a68300    rsi = 0x00000082f067fc08
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      rdi = 0x000000000000000a    rbp = 0x00000082f0600000
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      rsp = 0x00000082f067fb80    r12 = 0x00007ffbbe4bfaa0
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      r13 = 0x00007ffb9c5a9468    r14 = 0x0000024b69ac5b28
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      r15 = 0x0000024b69ac5b20    rip = 0x00007ffb846ec6c8
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO - 12  xul.dll!static nsThread::ThreadFunc(void*) [nsThread.cpp:a692ca1d84013f272d5e1170e23e3acdef89be08 : 384 + 0x7]
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      rbx = 0x0000024b69a68300    rbp = 0x00000082f0600000
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      rsp = 0x00000082f067fbd0    r12 = 0x00007ffbbe4bfaa0
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      r13 = 0x00007ffb9c5a9468    r14 = 0x0000024b69ac5b28
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -      r15 = 0x0000024b69ac5b20    rip = 0x00007ffb840f8cdd
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.370Z] 22:21:34     INFO - 13  nss3.dll!PR_NativeRunThread(void*) [pruthr.c:a692ca1d84013f272d5e1170e23e3acdef89be08 : 399 + 0xd]
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      rbx = 0x0000024b69a30a30    rsi = 0x0000024b69a94a00
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      rdi = 0x0000024b69a94b38    rbp = 0x00000082f0600000
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      rsp = 0x00000082f067fda0    r12 = 0x00007ffbbe4bfaa0
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      r13 = 0x00007ffb9c5a9468    r14 = 0x0000024b69a30a00
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      r15 = 0x0000024b69a94b48    rip = 0x00007ffb9c496b67
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO - 14  nss3.dll!pr_root(void*) [w95thred.c:a692ca1d84013f272d5e1170e23e3acdef89be08 : 139 + 0xc]
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      rbx = 0x0000024b6b3b1540    rsi = 0x0000024b6b3b1540
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      rdi = 0x0000000000000000    rbp = 0x0000000000000000
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      rsp = 0x00000082f067fe20    r12 = 0x0000000000000000
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      r13 = 0x0000000000000000    r14 = 0x0000000000000000
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      r15 = 0x0000000000000000    rip = 0x00007ffb9c4892b1
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO - 15  ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1> + 0x41
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      rbx = 0x0000024b6b3b1540    rbp = 0x0000000000000000
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      rsp = 0x00000082f067fe50    r12 = 0x0000000000000000
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      r13 = 0x0000000000000000    r14 = 0x0000000000000000
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      r15 = 0x0000000000000000    rip = 0x00007ffbbbed1bb2
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO - 16  kernel32.dll!BaseThreadInitThunk + 0x13
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      rbx = 0x00007ffbbbed1b70    rbp = 0x0000000000000000
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      rsp = 0x00000082f067fe80    r12 = 0x0000000000000000
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      r13 = 0x0000000000000000    r14 = 0x0000000000000000
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      r15 = 0x0000000000000000    rip = 0x00007ffbbc727034
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO - 17  mozglue.dll!patched_BaseThreadInitThunk(int, void*, void*) [WindowsDllBlocklist.cpp:a692ca1d84013f272d5e1170e23e3acdef89be08 : 572 + 0x28]
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      rbx = 0x00007ffbbbed1b70    rbp = 0x0000000000000000
[task 2022-07-12T22:21:34.371Z] 22:21:34     INFO -      rsp = 0x00000082f067feb0    r12 = 0x0000000000000000
[task 2022-07-12T22:21:34.372Z] 22:21:34     INFO -      r13 = 0x0000000000000000    r14 = 0x0000000000000000
[task 2022-07-12T22:21:34.372Z] 22:21:34     INFO -      r15 = 0x0000000000000000    rip = 0x00007ffba4224674
[task 2022-07-12T22:21:34.372Z] 22:21:34     INFO -     Found by: call frame info
[task 2022-07-12T22:21:34.372Z] 22:21:34     INFO - 18  ntdll.dll!RtlUserThreadStart + 0x20
[task 2022-07-12T22:21:34.372Z] 22:21:34     INFO -      rbx = 0x0000000000000000    rsi = 0x0000000000000000
[task 2022-07-12T22:21:34.372Z] 22:21:34     INFO -      rdi = 0x0000000000000000    rbp = 0x0000000000000000
[task 2022-07-12T22:21:34.372Z] 22:21:34     INFO -      rsp = 0x00000082f067ff20    r12 = 0x0000000000000000
[task 2022-07-12T22:21:34.372Z] 22:21:34     INFO -      r13 = 0x0000000000000000    r14 = 0x0000000000000000
[task 2022-07-12T22:21:34.372Z] 22:21:34     INFO -      r15 = 0x0000000000000000    rip = 0x00007ffbbe4e2651
[task 2022-07-12T22:21:34.372Z] 22:21:34     INFO -     Found by: call frame info
~~~
Flags: needinfo?(hikezoe.birchill)

It looks like there are 2 different crashes.

One [@ mozilla::layers::APZCTreeManagerParent::RecvUpdateZoomConstraints(mozilla::layers::ScrollableLayerGuid const&, mozilla::Maybe<mozilla::layers::ZoomConstraints> const&)], which seems to be a pre-existing bug, as bug 1488886 had the exact same stack. Interestingly, all the recent crash reports there are from Thunderbird. The line is https://searchfox.org/mozilla-central/rev/fa71140041c5401b80a11f099cc0cd0653295e2c/gfx/layers/ipc/APZCTreeManagerParent.cpp#97 mTreeManager->UpdateZoomConstraints(aGuid, aConstraints); and the crash address is 0, maybe mTreeManager is null because it has already been destroyed?

Another one [@ mozilla::layers::APZCTreeManager::StartAutoscroll(mozilla::layers::ScrollableLayerGuid const&, mozilla::gfx::PointTyped<mozilla::ScreenPixel,float> const&)].
The line at https://searchfox.org/mozilla-central/rev/fa71140041c5401b80a11f099cc0cd0653295e2c/gfx/layers/apz/src/APZCTreeManager.cpp#1025 is NotifyAutoscrollRejected(aGuid);. So I guess https://searchfox.org/mozilla-central/rev/fa71140041c5401b80a11f099cc0cd0653295e2c/gfx/layers/apz/src/APZCTreeManager.cpp#1064-1067 is inlined. The crash address is also 0, and this is on the GPU process.

(In reply to Florian Quèze [:florian] from comment #83)

Another one [@ mozilla::layers::APZCTreeManager::StartAutoscroll(mozilla::layers::ScrollableLayerGuid const&, mozilla::gfx::PointTyped<mozilla::ScreenPixel,float> const&)].
The line at https://searchfox.org/mozilla-central/rev/fa71140041c5401b80a11f099cc0cd0653295e2c/gfx/layers/apz/src/APZCTreeManager.cpp#1025 is NotifyAutoscrollRejected(aGuid);. So I guess https://searchfox.org/mozilla-central/rev/fa71140041c5401b80a11f099cc0cd0653295e2c/gfx/layers/apz/src/APZCTreeManager.cpp#1064-1067 is inlined. The crash address is also 0, and this is on the GPU process.

I opened a minidump in a debugger to see a better stack, and there's indeed an inlined stack frame. The crashing line is https://searchfox.org/mozilla-central/rev/108c7843696fdf951083889d6fb858953139be96/gfx/layers/apz/src/APZCTreeManager.cpp#1067 (controller->NotifyAsyncAutoscrollRejected(aGuid.mScrollId); where controller is NULL)

(In reply to Hiroyuki Ikezoe (:hiro) from comment #81)

browser_test_select_popup_position.js is the test caused the crash.

The changes here unlikely cause the crash, at first glance.

I think we just ran into a way to reproduce bug 1488886.

I noticed that in all cases where the crash was reproduced, the browser\base\content\test\performance\browser.ini manifest (which includes the new test added in this bug, which moves the mouse using EventUtils.synthesizeNativeMouseEvent({type: "click") and the gfx\layers\apz\test\mochitest\browser.ini manifest (where we intermittently crash at shutdown) ran in the same job (typically bc6). When gfx\layers\apz\test\mochitest\browser.ini ran in bc4 independently of browser\base\content\test\performance\browser.ini, there was no crash.

I could reproduce the crash on try, and if I add a simple change to our new test to move the mouse to the center of the browser window at the end of the test, I can't reproduce the crashes anymore. There is activity in bug 1779985 to avoid this entire class of issues in the future.

Flags: needinfo?(hikezoe.birchill)
Pushed by fqueze@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fa3e753f8f94
Discard WebRender resources when popup hides. r=Gijs,emilio
https://hg.mozilla.org/integration/autoland/rev/494979d407ea
Add a new IPC call to clear animation resources on WebRender, r=sotaro.
https://hg.mozilla.org/integration/autoland/rev/0c27a3d7f914
Fix intermittent test failures, r=hiro.
See Also: → 1488886
Status: REOPENED → RESOLVED
Closed: 11 months ago5 months ago
Resolution: --- → FIXED
Target Milestone: --- → 104 Branch
See Also: → 1780207
See Also: 1780207
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.