Firefox sometimes freezes/hangs when loading pages with video on YouTube
Categories: Core :: Graphics, defect
People: Reporter: whimboo, Unassigned
Keywords: hang
Attachments: 1 file (23.42 KB, text/plain)
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0
Navigating on YouTube can sometimes cause a freeze / hang of Firefox when loading pages with a video element embedded.
Steps:
- Open https://www.youtube.com/
- Search for anything
- Click a search result
- Wait a second to see if loading the video freezes Firefox
- If not, go back to the search results and try the next result
- Repeat until Firefox freezes
Here is a profile from the freeze: https://perfht.ml/2AMFD5x
Florian, do you have an idea?
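For anyone who wants to automate the steps above, here is a minimal sketch using Selenium (the "a#video-title" selector and the sleep timings are assumptions and may need adjusting to YouTube's current markup):

  # Minimal repro sketch (assumes Python 3 with selenium installed and
  # geckodriver on PATH). If Firefox freezes, the blocked WebDriver call
  # makes the script hang too, which is itself a usable signal that the
  # freeze was reproduced.
  import time
  from selenium import webdriver
  from selenium.webdriver.common.by import By

  driver = webdriver.Firefox()
  driver.get("https://www.youtube.com/results?search_query=test")
  time.sleep(2)

  for index in range(20):
      results = driver.find_elements(By.CSS_SELECTOR, "a#video-title")
      if index >= len(results):
          break
      results[index].click()  # open the next search result
      time.sleep(3)           # give the video element time to load (or freeze)
      driver.back()           # return to the search results
      time.sleep(2)

  driver.quit()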
Comment 1•5 years ago (Reporter)
Here are the details of the graphics card:
Active: Yes
Description: AMD Radeon(TM) R5 Graphics
Vendor ID: 0x1002
Device ID: 0x9874
Driver Version: 16.300.2701.0
Driver Date: 8-9-2016
Drivers: aticfx64 aticfx64 aticfx64 amdxc64 aticfx32 aticfx32 aticfx32 amdxc32 atiumd64 atidxx64 atidxx64 atiumdag atidxx32 atidxx32 atiumdva atiumd6a atitmm64
Subsys ID: 00000000
RAM: 512 MB
Note that this is a very old driver version.
Comment 2•5 years ago
It looks a lot like a GPU process died, and the hang is the time we waited until a new one got created. I see a "PCompositorBridge::Msg_NotifyChildRecreated — sent to Compositor (Thread ID: 7412)" IPC after the hang.
Comment 3•5 years ago (Reporter)
In that profile (https://perfht.ml/2zS2RGI) you can see that it took more than 400s to recreate the GPU process. During that time the whole Firefox UI is frozen.
Is there anything else which I can provide to further investigate the problem?
Comment 4•5 years ago
Henrik, it would be great to test on Nightly.
Ideally, you could help us find the mozregression range where this broke (and/or got fixed).
Also, please attach the full "about:support".
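For reference, a mozregression bisection over Nightly builds by date would look something like this (the dates are placeholders, not a known good/bad range):

  mozregression --good 2020-03-01 --bad 2020-05-20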
Comment 5•5 years ago (Reporter)
Here is the full output from about:support. Note that the included failure log is interesting:
(#0) Error Killing GPU process due to IPC reply timeout
(#1) Error Failed buffer for 0, 0, 1920, 1048
(#2) Error Failed buffer for 1903, 0, 17, 938
(#3) Error Failed buffer for 0, 0, 17, 314
(#4) Error Failed buffer for 0, 0, 1903, 2608
(#5) Error Receive IPC close with reason=AbnormalShutdown
(#6) Error Receive IPC close with reason=AbnormalShutdown
(#7) CP+[GFX1-]: Receive IPC close with reason=AbnormalShutdown
(#8) CP+[GFX1-]: Receive IPC close with reason=AbnormalShutdown
So the GPU process got killed due to an IPC timeout. Why does it take such a long time to get it restarted? Is there maybe a preference to shorten the time?
I'm not saying this is a regression, and I cannot easily run tests with different versions of Firefox because that machine is in a remote office. So I hope the above information helps to better diagnose the problem. Maybe Matt can have a look at it.
Comment 6•5 years ago
It would be good to know if this is reproducible on Nightly, and whether enabling WebRender via the gfx.webrender.all pref changes this in any way.
Otherwise, I'm not sure how to prioritize this yet.
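The pref can be flipped either from about:config or via user.js in the profile directory (a minimal sketch; a browser restart is required):

  // user.js in the Firefox profile directory (created if absent)
  user_pref("gfx.webrender.all", true);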
Comment 7•5 years ago
Is it possible that the GPU process hung, and we tried to send it a sync message from the parent process (SendReceiveMouseInputEvent), which blocked for 400s waiting on the hung process?
Then we finally kill the hung process, and quickly restart it, leading to things working again.
So maybe it's not that restarting the process took a long time, but that it took us a long time to realize we needed to do so (and that it had broken in the first place)?
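Schematically (a toy model of the mechanism described above, not Gecko's actual IPC code), the failure mode looks like this:

  # Toy model: the parent blocks in a synchronous send until the peer
  # replies or a long timeout fires; only then is the hung peer killed and
  # replaced. The 400 s "hang" is the wait for the timeout, not the restart.
  import queue
  import threading
  import time

  reply_queue = queue.Queue()

  def hung_gpu_process():
      time.sleep(10_000)  # never replies, like the hung GPU process

  def send_sync(timeout_s=4):  # Gecko's real timeout here was far longer
      threading.Thread(target=hung_gpu_process, daemon=True).start()
      try:
          return reply_queue.get(timeout=timeout_s)  # "UI thread" blocks here
      except queue.Empty:
          # Only now do we notice the hang; killing and relaunching is quick.
          print("timeout: kill the hung GPU process and launch a new one")
          return None

  send_sync()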
Comment 8•5 years ago
Thanks Matt, that makes sense.
Henrik,
The following experiments would help:
- test on Nightly
- test with WebRender enabled (via gfx.webrender.all, needs a browser restart)
- record a Gecko profile of the hang, but stop recording before the hang is resolved. We want to see what the GPU process is doing during the hang; if the recording stops after the GPU process is restarted, we appear to only get the new GPU process's activity.
Marking as S3 since we don't yet understand how widespread this problem is. Ready to bump it if that changes!
Comment 9•5 years ago (Reporter)
(In reply to Dzmitry Malyshau [:kvark] from comment #8)
> - record a Gecko profile of the hang, but stop recording before the hang is resolved. We want to see what the GPU process is doing during the hang; if the recording stops after the GPU process is restarted, we appear to only get the new GPU process's activity.
Note that I do NOT have a way to just stop the profiler. As mentioned above, the whole Firefox UI is frozen, so no event processing takes place. Or does WebRender avoid this problem by hanging only the affected background processes and not the main thread? I don't have any other idea how to stop the profiler before the new GPU process has been created.
I actually wonder why we are losing the recorded data from the former (hung) GPU process. It would be great to keep it as a separate row. Julien, any idea if that is possible?
Comment 10•5 years ago
Yes, unfortunately we can't get a Gecko profile of a thread that truly hangs, since it will not send the data over (as noted by :mstange in the #gfx room).
Comment 11•5 years ago
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #9)
> I actually wonder why we are losing the recorded data from the former (hung) GPU process.
Child processes send their profiling data to the parent when they shut down, or when the parent process sends an IPC message requesting the profile. When a child process gets killed, we are in neither of these cases.
> It would be great to keep it as a separate row. Julien, any idea if that is possible?
It's not possible right now. In some cases where a process appears unresponsive (e.g. the main thread is busy with JS code running in an infinite loop) but is still processing IPC, if we changed the code that kills the unresponsive process to first request the profile, we might get it. In cases where the child process isn't processing IPC anymore (like we saw in bug 1629824), we still wouldn't get anything.
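As a toy model of that data flow (not the profiler's actual implementation): samples live in the child's memory and only reach the parent on an orderly shutdown or an explicit request, so a hard kill loses them.

  # Toy model: profile samples buffered in the child are lost on a hard kill.
  import multiprocessing as mp

  def child(conn):
      samples = []
      while True:
          msg = conn.recv()
          if msg == "sample":
              samples.append("tick")
          elif msg in ("request-profile", "shutdown"):
              conn.send(samples)  # the only paths where data reaches the parent
              if msg == "shutdown":
                  return

  if __name__ == "__main__":
      parent_conn, child_conn = mp.Pipe()
      proc = mp.Process(target=child, args=(child_conn,))
      proc.start()
      parent_conn.send("sample")
      proc.kill()  # hard kill, as with the hung GPU process: samples are lost
      proc.join()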
Comment 12•5 years ago
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #9)
> I actually wonder why we are losing the recorded data from the former (hung) GPU process. It would be great to keep it as a separate row. Julien, any idea if that is possible?
Gerald will know better, redirecting the request to him :-)
Comment 13•5 years ago (Reporter)
(In reply to Dzmitry Malyshau [:kvark] from comment #10)
> Yes, unfortunately we can't get a Gecko profile of a thread that truly hangs, since it will not send the data over (as noted by :mstange in the #gfx room).
Dzmitry, would you mind telling me again which MOZ_LOG modules I should use when testing again with a debug build of Firefox? I recall something about IPC, but I'm not sure if there is a prefix, or whether there are others. Thanks!
Comment 14•5 years ago (Reporter)
I have an update here. When I tried to reproduce the problem today with a Nightly debug build, I was not able to crash the GPU process. Maybe there is some race condition involved and debug builds are simply too slow to hit it?
Anyway, a 79a1 Nightly opt build actually let me catch a content crash related to the creation of shared memory. I filed that as bug 1643016. The crash happened when Firefox froze, so I hope it is actually helpful. Matt, can you have a look at this?
Also, I recorded a new profile, although hitting the freeze wasn't that easy this time. While navigating forward and backward on some pages, I finally reproduced it about 5 minutes later. Note that there is still no data from the former (crashed) GPU process. Even with WebRender enabled the whole UI still hangs and doesn't let me stop the profiler. So I'm not sure whether you can find new information, but here it is:
https://share.firefox.dev/3cv1JH1
Due to time limitations I was not able to create log files via MOZ_LOG. I could do that next time; maybe by then we'll know more about this crash (if it's related) and might even have it fixed?
Comment 15•5 years ago
The MOZ_LOG question got answered by Matt on #gfx:mozilla.org:
> Maybe "ipc", and some of the "apz.*" ones?
> (unsure if the wildcard actually works, or if you need to specify the apz modules individually)
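Based on that, a logging run on this Windows machine might look like the following (module names are Matt's guesses from above; "apz.manager" is one plausible APZ module, adjust as needed):

  rem Enable verbose IPC and APZ logging, written to a file (cmd.exe syntax)
  set MOZ_LOG=ipc:5,apz.manager:4
  set MOZ_LOG_FILE=%TEMP%\gecko.log
  firefox.exe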
Comment 16•5 years ago
It looks like the crash happens because we've allocated 500 million shmems, and have run out of identifiers...
I can't see how we'd consume anywhere near that many in 5 minutes, especially since most usages are using a pool.
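(For scale, as a back-of-envelope check: 500,000,000 identifiers over the roughly 5 minutes of navigation from comment 14 would be about 1.7 million shmem allocations per second, which indeed seems implausible as real usage.)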
Comment 17•5 years ago (Reporter)
Note that the uptime in that case was only 222 seconds (3 minutes and 42 seconds) - see bp-a7df3eed-20bd-4b27-8c19-a73620200603.
Could that be related to the profiler? Actually, I'm not sure whether it was running in this particular Firefox session.
Comment 18•5 years ago (Reporter)
Matt, do you think that the crash I was seeing could be the reason why the GPU process stopped working? Or was it maybe only a side effect? If it's the reason, I might be blocked on further investigation. Otherwise I might want to try again to create a log file with the aforementioned MOZ_LOG options set.
Comment 19•5 years ago
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #18)
> Matt, do you think that the crash I was seeing could be the reason why the GPU process stopped working? Or was it maybe only a side effect?
Yeah, it seems very likely to be related. It'd be worth trying to log some of the media IPC code to see what it's doing (but I don't think there is existing logging for that, sorry).
Comment 20•5 years ago (Reporter)
(In reply to Matt Woodrow (:mattwoodrow) from comment #19)
> Yeah, it seems very likely to be related. It'd be worth trying to log some of the media IPC code to see what it's doing (but I don't think there is existing logging for that, sorry).
Ok, so how valuable would further investigation be here? The user of that machine actually wants YouTube to work, and I don't want to hold off any longer. It also looks like we won't get much traction on bug 1643967 in the next few weeks. As such I would upgrade the graphics drivers (right now a version from 2016 is installed!), even though that could mean the problem is no longer reproducible.
Comment 21•4 years ago
I think we need someone from the graphics team to be able to reproduce this, to try to narrow it down further, sorry.
Comment 22•4 years ago (Reporter)
With the graphics driver update the problem is gone and no longer reproducible. I'm not sure how easy it will be to reproduce with a similar setup. Maybe we should close this bug as incomplete for now.