Closed Bug 1657597 Opened 1 year ago Closed 2 days ago

Firefox slows down after clicking the hamburger menu button. (Basic/X11)

Categories

(Core :: Graphics: WebRender, defect, P3)

79 Branch
x86_64
Linux
defect

Tracking

RESOLVED DUPLICATE of bug 1635153

People

(Reporter: lucastronks, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0

Steps to reproduce:

Open Firefox and click the hamburger menu button.

Actual results:

The menu opened and Firefox slowed down to about one frame per second, and it continued to run at this speed until the menu was closed.

Expected results:

The menu opened and Firefox remained as fast as it was prior to clicking the hamburger menu button.

hi, thank you for the report. could you capture and share a performance profile that covers this situation?:
https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem

(In reply to [:philipp] from comment #1)

hi, thank you for the report. could you capture and share a performance profile that covers this situation?:
https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem

Hi, thank you for your quick response. Here is the performance report:

https://share.firefox.dev/2XA3y0A

While recording, I found out that it happens in other menus as well, like the downloads and bookmarks, and one extension but not the other. All of this should be visible in the performance profile, including screenshots that show what I was doing at the time. I also switched tabs and scrolled a bit to demonstrate that it really only happens when performing certain actions and not others.

I hope this will be of help.

(In reply to [:philipp] from comment #1)

hi, thank you for the report. could you capture and share a performance profile that covers this situation?:
https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem

Also, please note that I am on version 79, even though for some reason my user agent seemed to report version 78 when I created the report (that could be another bug, who knows ;) ).

i'm not entirely well-versed in interpreting profiles, but if i'm not mistaken the slowdowns here might be related to webrender. please correct me if i'm wrong.

Component: Untriaged → Graphics: WebRender
Keywords: perf
Product: Firefox → Core

Can you do another profile but use custom settings and disable screenshots? Taking screenshots in the profile adds a lot of graphics overhead.

Flags: needinfo?(lucastronks)

Please also open about:support, click on "Copy text to clipboard" and paste it here. Thanks!

OS: Unspecified → Linux
Hardware: Unspecified → x86_64

Another thing to try is running Nightly: https://www.mozilla.org/en-US/firefox/80.0a1/releasenotes/
I'm seeing that in the original profile, the Renderer thread spends a ton of time in clip draw calls. This is totally unexpected. I'm not able to repro it on my Linux/Nightly/WR config.
Marking as S3 for now.

Blocks: gfx-triage
Severity: -- → S3
Priority: -- → P3
Attached file about:support contents
Flags: needinfo?(lucastronks)

(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #6)

Please also open about:support, click on "Copy text to clipboard" and paste it here. Thanks!

Hi,

I have now attached the content of about:support.

(In reply to Timothy Nikkel (:tnikkel) from comment #5)

Can you do another profile but use custom settings and disable screenshots? Taking screenshots in the profile adds a lot of graphics overhead.

Hi,

Done! https://share.firefox.dev/3a7Bzdz

Apologies for the delay.

I have confirmed that this issue does not occur when I set gfx.webrender.all to false in about:config. With gfx.webrender.all set to false, everything works as it should.
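For anyone repeating this comparison, the same pref can be pinned in a profile's user.js file instead of being toggled in about:config. The following is a minimal Python sketch that renders user.js lines; the helper name `render_user_js` is hypothetical, only the pref named in this comment is used, and copying the output into the profile directory is left to the reader.

```python
import json


def render_user_js(prefs):
    """Render a dict of pref name -> value as user.js lines."""
    lines = []
    for name, value in prefs.items():
        # json.dumps produces literals that are valid in user.js:
        # false/true for booleans, bare numbers, and quoted strings.
        lines.append(f"user_pref({json.dumps(name)}, {json.dumps(value)});")
    return "\n".join(lines) + "\n"


# Test configuration described in the comment above: WebRender switched off.
WEBRENDER_OFF = {"gfx.webrender.all": False}

if __name__ == "__main__":
    print(render_user_js(WEBRENDER_OFF), end="")
```

Keeping each test configuration as a small dict makes it easy to swap between the variants discussed in this bug without editing prefs by hand.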

  1. Do you see the same problem after enabling layers.acceleration.force-enabled and restarting Firefox 79?
  2. Does the same problem occur with WebRender and https://nightly.mozilla.org?

@Lucas: Pinging to see if you can provide the information requested above.

Flags: needinfo?(lucastronks)

(In reply to Darkspirit, Servo QA from comment #12)

  1. Do you see the same problem after enabling layers.acceleration.force-enabled and restarting Firefox 79?
  2. Does the same problem occur with WebRender and https://nightly.mozilla.org?

  1. Yes, I do. If I disable gfx.webrender.all (which by itself fixes the issue) but then enable layers.acceleration.force-enabled and restart the browser I have the same issue again.
  2. Yes, I still face the same issue with webrender on the latest (as of time of writing) Firefox Nightly 64-bit for Linux, downloaded from https://www.mozilla.org/en-US/firefox/channel/desktop/#nightly.

I apologize again for the delay. I've been quite busy lately.

Flags: needinfo?(lucastronks)

Regarding the priority of this issue: this bug makes the affected UI elements of the browser essentially unusable (it takes something like 7 seconds to select an option from the hamburger menu, for example). I am not asking for the priority to be changed. I'm just letting you know exactly how bad the slowdown is, because I imagine it could be difficult to gauge the priority if you cannot reproduce the issue.

then enable layers.acceleration.force-enabled and restart the browser I have the same issue again

That confuses me most. Could it be something with direct composition, assuming both WR and non-WR use it?

This is Linux. I have exactly the same Intel APU and don't see this bug with Gnome/KDE Wayland/X11 on Debian Testing. I'll check if I can reproduce it with Cinnamon (comment 8).

Windows is mentioned in the UA due to privacy.resistFingerprinting: true.

Maybe bug 1560457 is not restricted to Nvidia? It would be negatively surprising if Cinnamon messes OpenGL up for all its users.

(thepiguy0 from bug 1560457 comment #5)

(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #4)

Cinnamon

This problem has a long history: https://github.com/linuxmint/Cinnamon/issues/2465

Would this tutorial help? https://forums.linuxmint.com/viewtopic.php?t=277267 (Disable Vsync for the Clutter Compositor)

When applying these tweaks, disabling "Sync to VBlank" actually massively improves overall input latency for my system but then I experienced screen tearing across the whole system. Enabling "Force Composition Pipeline" then removed the tearing for both software and hardware rendered Firefox.

(In reply to lucastronks from comment #15)
Can you check if disabling "Sync to VBlank" helps you as well?

(In reply to Darkspirit, Servo QA from comment #19)

Maybe bug 1560457 is not restricted to Nvidia? It would be negatively surprising if Cinnamon messes OpenGL up for all its users.

(thepiguy0 from bug 1560457 comment #5)

(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #4)

Cinnamon

This problem has a long history: https://github.com/linuxmint/Cinnamon/issues/2465

Would this tutorial help? https://forums.linuxmint.com/viewtopic.php?t=277267 (Disable Vsync for the Clutter Compositor)

When applying these tweaks, disabling "Sync to VBlank" actually massively improves overall input latency for my system but then I experienced screen tearing across the whole system. Enabling "Force Composition Pipeline" then removed the tearing for both software and hardware rendered Firefox.

(In reply to lucastronks from comment #15)
Can you check if disabling "Sync to VBlank" helps you as well?

That option isn't there for me in Mint Start Menu > System Settings > Preferences > General > Compositor Options. I'm guessing that's because I'm on Linux Mint Cinnamon 20 and the people in that forum are on 19 and 19.1. However, I have tried adding CLUTTER_VBLANK=none to /etc/environment like they also said, and tried that with different combinations of VSync method in Mint Start Menu > System Settings > Preferences > General > Compositor Options, and completely logging out and in again to try each different option. None of the combinations I tried resolved the problem.
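When testing an /etc/environment change like this, it can help to confirm that the variable is actually defined in the file before ruling the tweak out. The sketch below is a simplified Python check, assuming plain KEY=VALUE lines; the function name `parse_environment_file` is hypothetical, and the parser does not try to cover every syntax the real file format allows.

```python
def parse_environment_file(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding double quotes around the value.
        result[key.strip()] = value.strip().strip('"')
    return result


if __name__ == "__main__":
    # Path taken from the comment above.
    with open("/etc/environment", encoding="utf-8") as handle:
        settings = parse_environment_file(handle.read())
    print("CLUTTER_VBLANK =", settings.get("CLUTTER_VBLANK"))
```

After logging out and back in, comparing this value with `os.environ.get("CLUTTER_VBLANK")` in a fresh session would show whether the setting reached the desktop session at all.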

@Andrew: Can you take a look at this profile? Could this be related to the allocation issues you investigated?
I see an allocation stall here:

__pthread_cond_wait
PR_Wait
mozilla::wr::WebRenderAPI::Create(mozilla::layers::CompositorBridgeParent*, RefPtr<mozilla::widget::CompositorWidget>&&, mozilla::wr::WrWindowId const&, mozilla::gfx::IntSizeTyped<mozilla::LayoutDevicePixel>)
mozilla::layers::CompositorBridgeParent::AllocPWebRenderBridgeParent(mozilla::wr::PipelineId const&, mozilla::gfx::IntSizeTyped<mozilla::LayoutDevicePixel> const&)
mozilla::layers::PCompositorBridgeParent::OnMessageReceived(IPC::Message const&)
PCompositorBridge::Msg_PWebRenderBridgeConstructor
mozilla::layers::PCompositorManagerParent::OnMessageReceived(IPC::Message const&)
mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&)
mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&)
mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&)
mozilla::ipc::MessageChannel::MessageTask::Run()
nsThread::ProcessNextEvent(bool, bool*)
NS_ProcessNextEvent(nsIThread*, bool)
mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*)
MessageLoop::Run()
nsThread::ThreadFunc(void*)
(root)
Flags: needinfo?(aosmond)

It is taking forever to get events back on xcb_wait_for_special_event on several Mesa calls. A quick look at the source code suggests it might be looping to get more events. We are blocking the main thread in the parent process -- I wonder if there could be some weird interaction where it expected us to handle some X events, but since we were blocked waiting for the WR layer manager to initialize we couldn't unblock ourselves....
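The kind of interaction hypothesized here can be illustrated with a toy model. The Python sketch below is not Firefox, Mesa, or XCB code; every name in it is hypothetical. A worker needs the main thread to handle one event before it can finish initializing, while the main thread either services events first or simply blocks on the worker. Timeouts stand in for the stall so that the script terminates.

```python
import queue
import threading


def init_completes(main_pumps_events, timeout=0.2):
    """Return True if the worker's initialization finishes within the timeout."""
    requests = queue.Queue()     # events the worker needs the main thread to handle
    handled = threading.Event()  # set once the main thread has handled a request
    done = threading.Event()     # set once the worker has finished initializing

    def worker():
        requests.put("needs-main-thread")
        # The worker cannot make progress until the main thread responds.
        if handled.wait(timeout):
            done.set()

    thread = threading.Thread(target=worker)
    thread.start()

    if main_pumps_events:
        # The main thread services the pending event before waiting on the worker.
        try:
            requests.get(timeout=timeout)
            handled.set()
        except queue.Empty:
            pass
    # Without pumping, the main thread blocks here and never services the event.
    finished = done.wait(timeout)
    thread.join()
    return finished


if __name__ == "__main__":
    print("main thread pumps events:", init_completes(True))
    print("main thread only blocks: ", init_completes(False))
```

In the blocking case neither side can proceed until a timeout fires, which is the shape of stall described above; whether that is what actually happens in this bug is exactly the open question.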

With Nightly (the changes aren't in 79), do you still see the problem? If so, can you try setting the environment variable MOZ_X11_EGL=1 and see whether it still reproduces? If it does, would you mind collecting another profile? Thanks!

Flags: needinfo?(aosmond) → needinfo?(lucastronks)
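One way to run the requested test without changing the environment of the whole session is to launch the browser from a small script. The following is a minimal Python sketch; the binary name `firefox` and the helper names `build_env` and `launch` are assumptions, and the Nightly binary may live at a different path on a given system.

```python
import os
import subprocess


def build_env(overrides, base=None):
    """Return a copy of the base environment with the overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


def launch(binary="firefox", overrides=None):
    """Start the browser with extra environment variables (binary is an assumption)."""
    return subprocess.Popen([binary], env=build_env(overrides or {}))


if __name__ == "__main__":
    # Environment variable named in the comment above.
    launch(overrides={"MOZ_X11_EGL": "1"})
```

Because the environment is copied rather than modified in place, other programs started from the same session are unaffected.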

Sotaro, wrt the sync IPC call Send/RecvEnsureConnected, is this purely to get the namespace from the GPU process? In theory, I imagine we could restructure things to make the AllocPWebRenderBridge sync again, post back to the thread to create the API (which will block), and return the namespace/texture factory identifier, and remove the SendEnsureConnected call? Any subsequent IPC messages should be processed after and blocked by the message we posted during AllocPWebRenderBridge.

Flags: needinfo?(sotaro.ikeda.g)

lucastronks: I also note that we disabled the GPU process since that release. You can disable the GPU process by setting the layers.gpu-process.enabled pref to false. That might be a factor in reproducing this in more recent builds...

(In reply to Andrew Osmond [:aosmond] from comment #23)

Sotaro, wrt the sync IPC call Send/RecvEnsureConnected, is this purely to get the namespace from the GPU process?

It is also for getting the TextureFactoryIdentifier. The current implementation is aligned with ClientLayerManager::Initialize().

In theory, I imagine we could restructure things to make the AllocPWebRenderBridge sync again, post back to the thread to create the API (which will block), and return the namespace/texture factory identifier, and remove the SendEnsureConnected call? Any subsequent IPC messages should be processed after and blocked by the message we posted during AllocPWebRenderBridge.

Can you explain more about why "AllocPWebRenderBridge sync again" could address the problem? I ask because the Renderer thread seemed to be blocked before the sync IPC call.

Flags: needinfo?(sotaro.ikeda.g)

(In reply to Sotaro Ikeda [:sotaro] from comment #25)

(In reply to Andrew Osmond [:aosmond] from comment #23)

Sotaro, wrt the sync IPC call Send/RecvEnsureConnected, is this purely to get the namespace from the GPU process?

It is also for getting the TextureFactoryIdentifier. The current implementation is aligned with ClientLayerManager::Initialize().

In theory, I imagine we could restructure things to make the AllocPWebRenderBridge sync again, post back to the thread to create the API (which will block), and return the namespace/texture factory identifier, and remove the SendEnsureConnected call? Any subsequent IPC messages should be processed after and blocked by the message we posted during AllocPWebRenderBridge.

Can you explain more about why "AllocPWebRenderBridge sync again" could address the problem? I ask because the Renderer thread seemed to be blocked before the sync IPC call.

It would be sync so that we could get the namespace and TextureFactoryIdentifier returned from the method directly, but it wouldn't block on the full context creation, just the bare minimum to avoid a dependency on the Renderer thread. I see on Windows we need the sync handle which is derived from ID2DDevice (so there we might need to block anyways), so that might not be possible there.

Otherwise I suspect the remainder of the parameters could simply be cached from the first WebRenderAPI call, similar to what we do for WebRenderAPI::Clone? (After taking into account GPU switching, which might alter things like max texture size.)

Not that I'm saying this needs to be done, just exploring the possibility :).

No longer blocks: gfx-triage
Blocks: wr-linux-perf
No longer blocks: wr-linux

(In reply to Andrew Osmond [:aosmond] from comment #22)

It is taking forever to get events back on xcb_wait_for_special_event on several Mesa calls. A quick look at the source code suggests it might be looping to get more events. We are blocking the main thread in the parent process -- I wonder if there could be some weird interaction where it expected us to handle some X events, but since we were blocked waiting for the WR layer manager to initialize we couldn't unblock ourselves....

Bug 1635153 fixed such a bug.

Status: UNCONFIRMED → RESOLVED
Closed: 2 days ago
Flags: needinfo?(lucastronks)
Resolution: --- → DUPLICATE
Summary: Firefox slows down after clicking the hamburger menu button. → Firefox slows down after clicking the hamburger menu button. (Basic/X11)
Duplicate of bug: 1635153