The entire Nightly UI freezes (not rendering, but still responsive behind the scenes) every once in a while, while browsing 9gag.com
Categories
(Core :: Graphics, defect)
Tracking
()
People
(Reporter: itiel_yn8, Unassigned, NeedInfo)
References
Details
Attachments
(1 file)
|
87.50 KB,
text/plain
|
Details |
Profile taken while this happened (had to use the keyboard shortcuts as the entire content and chrome areas were not rendering):
https://share.firefox.dev/4fQuBvt
I have only managed to reproduce this while browsing 9gag.com. Exact STR is unknown.
I'm on Windows 11, using latest Intel gfx drivers.
about:support contents attached.
Note that I see WEBRENDER_SCISSORED_CACHE_CLEARS is apparently blocked by bug 1603515, however this profile was copy-pasted from an older machine which had HD520, which is no longer true in the current machine.
Comment 2•10 months ago
|
||
There's a bunch of very suspicious COM stuff going on in the main thread. Maybe Jamie or Yannis know what might be going on?
UIAutomation has also been used, and from Matrix conversations it's not from an a11y client:
"accessibility": {
"isActive": true,
"forceDisabled": 0,
"instantiator": "UIAUTOMATION|"
},
Updated•9 months ago
|
Comment 3•9 months ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)
There's a bunch of very suspicious COM stuff going on in the main thread.
Would you mind providing some of the stack frames you're referring to? I see some time spent in NtAlpcSendWaitReceivePort, for example, but I'm wondering whether I'm missing something given my relatively limited access to the profiler (only the call tree).
I don't see much time at all spent in a11y code, so I don't think a11y is the culprit here.
Comment 4•9 months ago
|
||
The stacks are not in a11y code, I agree. I was asking you more in terms of MSCOM expertise, sorry if I didn't make that clear :)
I see a bunch of stacks like this which seem to be doing some RPC shenanigans?
GetTypeInfoOfIIDFwd(_GUID const&, ITypeInfo**, int) [OLEAUT32.dll]
GetTypeInfoOfIID(_GUID const&, ITypeInfo**) [OLEAUT32.dll]
CUnivStubWrapper::Invoke(tagRPCOLEMESSAGE*, IRpcChannelBuffer*) [OLEAUT32.dll]
InvokeStubWithExceptionPolicyAndTracing::__l6::<lambda_c9f3956a20c9da92a64affc24fdd69ec>::operator()() const [onecore\com\combase\dcomrem\channelb.cxx]
ObjectMethodExceptionHandlingAction<<lambda_c9f3956a20c9da92a64affc24fdd69ec> >(InvokeStubWithExceptionPolicyAndTracing::__l6::<lambda_c9f3956a20c9da92a64affc24fdd69ec>, ObjectMethodExceptionHandlingInfo*, ExceptionHandlingResult*, void*) [onecore\com\combase\dcomrem\excepn.hxx]
InvokeStubWithExceptionPolicyAndTracing(IRpcStubBuffer*, tagRPCOLEMESSAGE*, IRpcChannelBuffer*, bool, IServerCall*, ServerCallTracingInfo const*, ObjectMethodExceptionHandlingInfo*, ExceptionHandlingResult*) [onecore\com\combase\dcomrem\channelb.cxx]
DefaultStubInvoke(bool, IServerCall*, IRpcChannelBuffer*, IRpcStubBuffer*, unsigned long*) [onecore\com\combase\dcomrem\channelb.cxx]
SyncStubCall::Invoke(IServerCall*, IRpcChannelBuffer*, IRpcStubBuffer*, unsigned long*) [onecore\com\combase\dcomrem\channelb.cxx]
SyncServerCall::StubInvoke(IRpcChannelBuffer*, IRpcStubBuffer*, unsigned long*) [onecore\com\combase\dcomrem\ServerCall.hpp]
StubInvoke(tagRPCOLEMESSAGE*, _GUID const&, CStdIdentity*, IRpcStubBuffer*, CServerChannel*, tagIPIDEntry*, unsigned long*) [onecore\com\combase\dcomrem\channelb.cxx]
ServerCall::ContextInvoke(tagIPIDEntry*) [onecore\com\combase\dcomrem\ctxchnl.cxx]
DefaultInvokeInApartment(ServerCall*, tagIPIDEntry*) [onecore\com\combase\dcomrem\callctrl.cxx]
ReentrantSTAInvokeInApartment(ServerCall*, tagIPIDEntry*) [onecore\com\combase\dcomrem\reentrantsta.cpp]
ComInvokeWithLockAndIPID(ServerCall*, tagIPIDEntry*) [onecore\com\combase\dcomrem\channelb.cxx]
ThreadDispatch(ServerCall*) [onecore\com\combase\dcomrem\channelb.cxx]
ThreadWndProc(HWND__*, unsigned int, unsigned long long, long long) [onecore\com\combase\dcomrem\chancont.cxx]
UserCallWinProcCheckWow(_ACTIVATION_CONTEXT*, int64_t (*)(tagWND*, unsigned int, uint64_t, int64_t), HWND__*, _WM_VALUE, uint64_t, int64_t, void*, int) [user32.dll]
DispatchMessageWorker [user32.dll]
nsAppShell::ProcessNextNativeEvent(bool) [widget/windows/nsAppShell.cpp]
Comment 5•9 months ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #4)
The stacks are not in a11y code, I agree. I was asking you more in terms of MSCOM expertise, sorry if I didn't make that clear :)
It's all good; I understood. I can't rule out some sort of issue related to accessibility though, even if it's a horribly misbehaving client, which is why I checked and mentioned this.
I see a bunch of stacks like this which seem to be doing some RPC shenanigans?
Yeah, it definitely seems like something is making lots of synchronous COM calls into our process, but I can't figure out where those calls are destined from the profile. Some of them do eventually end up with accessibility, but accessibility takes a fraction of the overall time. That could mean there are a lot of very short accessibility queries and the profiler is just missing them, or it could mean that there are many other non-accessibility COM calls that the profiler is missing. It's very strange. Unfortunately, I think this is going to be very difficult to debug further without being able to reproduce it and get a debugger onto it.
In bug 1646453, we added profiler markers for COM calls, but only for outgoing COM calls (when we're the client), not incoming ones (where we're the server). Having profiler markers for incoming COM calls would make things extremely spammy when an accessibility client is in use. On the other hand, it might make debugging situations like this easier.
Comment 6•9 months ago
|
||
I witnessed what I believe was the same issue on macOS Nightly, suggesting that any MSCOM interaction is not the issue (or not the only issue).
Comment 7•9 months ago
|
||
We have a similar problem reported to SUMO, though I'm not sure if it's directly related -> https://support.mozilla.org/questions/1530784
There's also about:support information there, which might be useful for investigation.
Updated•9 months ago
|
Reproduced this once again, on 9gag.com after ~5 mins of browsing.
This time I took a profile in which I "fixed" the issue towards the end of the profile session, by minimizing then restoring the Nightly window:
https://share.firefox.dev/3UEYFk2
Another profile taken, the same way as above:
https://share.firefox.dev/3HyUeEy
Comment 10•9 months ago
|
||
Thanks :Itiel for the report. Could you take a profile of browsing 9gag under normal conditions (i.e. when the issue is not present) for comparison?
Comment 11•9 months ago
•
|
||
Below are random things that caught my eye but may not be related to the issue:
- activity from 4 addons (LayerX, uBlock Origin, Aternity Extension, Tab Counter);
umppc19811.dllseems to be intercepting system calls;- CrowdStrike as security software.
I agree that the time taken by COM stuff on the main thread is abnormally high, but I'm not sure that it is relevant for the issue being reported, that is UI responsiveness (e.g. keyboard shortcuts work) without any rendering, with minimization restoring normal behavior. I feel like it could be a graphics problem, especially when I see the following:
- the graphics failure
GP+[GFX1-]: Video swapchain present is slow; - in the last two profiles, in the GPU process' compositor, the markers for
ContentPaint Payload Presentedshowing a latency of 20s (the end of the marker is when we receive the minimization event).
:sotaro, :botond, I see you've worked around those areas of the code, what do you think?
| Reporter | ||
Comment 12•9 months ago
|
||
Profile taken from a normal 9gag browsing with no issues:
https://share.firefox.dev/3VT8kE2
Comment 13•8 months ago
|
||
Thanks! As I suspected we can see the high COM activity also in the profile for normal browsing. Attempting a redirection to Graphics accordingly.
Comment 14•8 months ago
|
||
(In reply to Itiel from comment #8)
Reproduced this once again, on 9gag.com after ~5 mins of browsing.
This time I took a profile in which I "fixed" the issue towards the end of the profile session, by minimizing then restoring the Nightly window:
https://share.firefox.dev/3UEYFk2
In this profile, Renderer thread was blocked more than 10 sec at IDCompositionDevice2::Commit() in DCLayerTree::MaybeCommit()
Comment 15•8 months ago
|
||
(In reply to Itiel from comment #9)
Another profile taken, the same way as above:
https://share.firefox.dev/3HyUeEy
In this case, Renderer thread was blocked more than 8 sec in ID3D11DeviceContext4::Wait() under DCSurfaceVideo::CallVideoProcessorBlt().
Comment 16•8 months ago
|
||
From comment 14 and comment 15, GPU seemed to became too busy/blocked.
Comment 17•8 months ago
|
||
Hi Itiel, can you check if any of the following address the problem?
- [1] Set pref media.ffvpx-hw.enabled to false and re-start Firefox.
- [2] Set pref media.hardware-video-decoding.enabled to false and re-start Firefox.
And can you re-take Firefox profiler when the problem happens with "Media" setting?
| Reporter | ||
Comment 18•8 months ago
|
||
Will take me a while (few days at best) to see if toggling them will fix it as this doesn't happen too often. Leaving the needinfo until then
| Reporter | ||
Comment 19•8 months ago
|
||
With media.ffvpx-hw.enabled disabled issue persists.
Profile (media preset): https://share.firefox.dev/4nbIl6J
(I confused the keyboard shortcut and pressed Ctrl+Shift+S instead of Ctrl+Shift+1/2, so you may see some unrelated screenshots-stuff related in between. I had to minimize then restore the Nightly window to see what I had messed up, so half way through the profile the issue got fixed by that)
Going to reenable that pref, disable the other one, retest and report back in a few days.
| Reporter | ||
Comment 20•8 months ago
|
||
Hm, I managed to reproduce this also with media.hardware-video-decoding.enabled disabled but for 3 seconds, not enough for me realize what's happening and to start profiling, and it goes sorted out by itself without minimizing and restoring.
Not sure if this was a one-off thing. Will keep trying.
| Reporter | ||
Comment 21•8 months ago
•
|
||
Welp, that was quick. Managed to reproduce it again, profile (with media preset) of it while the issue remained by the end of the profiling (didn't minimize and restore):
https://share.firefox.dev/4nCS1qJ
And another profile, right after creating the previous profile, but this time I minimized and restored the window to fix the issue, all recorded in this profile:
https://share.firefox.dev/4nelV51
Note that in both profiles there's a video playing from that same website, which continued to play perfectly fine even though the app was visually frozen.
Comment 22•8 months ago
|
||
Thank you for the testings! It became clear from them that the problem happened even without hardware video decoding.
Can you also check if the problem happen with the following prefs with restarting Firefox?
- gfx.webrender.dcomp-video-hw-overlay-win to false
- gfx.webrender.dcomp-video-sw-overlay-win to false
- media.hardware-video-decoding.enabled to false
| Reporter | ||
Comment 23•7 months ago
|
||
After 1 month of testing I can confirm that with these prefs off, the issue doesn't occur.
Going to reenable them just to make sure this is still a thing, as this could have gone away in the meantime...
Will update.
| Reporter | ||
Comment 24•7 months ago
|
||
Yup, just managed to reproduce it with all of the prefs back on.
Comment 25•6 months ago
|
||
(Clearing needinfo since the comments above suggest the issue is graphics or video related.)
Comment 26•3 months ago
•
|
||
Did you determine which of these prefs avoids the freeze? I'm guessing the media related one, but we're curious.
gfx.webrender.dcomp-video-hw-overlay-win to false
gfx.webrender.dcomp-video-sw-overlay-win to false
media.hardware-video-decoding.enabled to false
Description
•