Firefox hangs on macOS for seconds, getting worse over time
Categories: Core :: Performance: Responsiveness, defect
People: Reporter: jamesvd, Unassigned, NeedInfo
References: Depends on 1 open bug
Attachments: 2 files, 1 obsolete file
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:141.0) Gecko/20100101 Firefox/141.0
Steps to reproduce:
Pretty much anything; normal web browsing from a fresh browser. As the day goes on, the pauses quickly become more pronounced and disruptive.
Actual results:
Doing anything on any page pauses for seconds, locking up all actions, including typing, scrolling, etc. I get a beach ball, then it resumes and catches up on the typing. It gets worse the longer the browser is open, until it completely freezes. The last time that happened, I was even unable to force quit Firefox from Activity Monitor.
This happens frequently. Right now, it has happened about 5 times while typing on this page.
Profile: https://share.firefox.dev/3IXIvQd
Comment 1•2 months ago
Could you capture another profile, with these changes to the profiler settings?
- Add ",IPC I/O,IPDL" to the "Add custom threads by name:" textbox
- Under Features, also enable "Native Stacks" and "IPC Messages"
Thanks!
Reporter
Comment 2•2 months ago
Requested profiles are attached. This certainly wasn't the browser at its worst, but it takes a while for it to build up, and I've had to restart my browser for company-required updates today.
I might have found something that relates to this. It seems like when I switched the Browser Privacy setting back to default (from Strict) and disabled DNS over HTTPS, things got better, but I didn't leave it running for long before I re-enabled them to try to capture this profile.
Let me know if you need more info.
Comment 3•2 months ago
Uh oh, things are looking pretty rough in that profile!
I'd say the profile indicates that there are a lot of blob URLs around, and that Firefox isn't dealing with them very well. Can you go to about:memory, click Measure, and see which page or add-on has created all those blob URLs?
Comment 4•2 months ago
In the profile, launching a new content process took one full second, with the time spent in IPCBlobUtils::Serialize, PContentParent::SendInitBlobURLs, and IPC::ParamTraits<mozilla::dom::BlobURLRegistrationData>::Read. But even more importantly, it looks like whenever any content process is shutting down, all IPC to the parent process is blocked for 4 seconds! Those four seconds are spent in Node::DestroyAllPortsWithPeer on the IPC I/O thread. Nika, who would be the right person to look into this?
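For context, this cost scales with the number of registered blob URLs: when a new content process launches, every registration is serialized in the parent, sent down as part of process initialization, and deserialized in the child. A minimal sketch of that pattern follows; BlobURLRegistration, ContentProcessChannel, and BroadcastBlobURLs are hypothetical placeholders for illustration, not the actual Gecko API.

// Hypothetical, simplified sketch (not the real Gecko code): the work done to
// hand all registered blob URLs to a freshly launched content process is
// linear in the number of registrations, so ~200,000 of them add up to
// seconds of launch time.
#include <string>
#include <vector>

struct BlobURLRegistration {   // placeholder for mozilla::dom::BlobURLRegistrationData
  std::string url;             // the blob: URL itself
  std::vector<char> payload;   // serialized blob metadata/contents
};

// Placeholder for the IPC channel to the new child process.
struct ContentProcessChannel {
  std::vector<std::vector<char>> sentMessages;
  void Send(std::vector<char> msg) { sentMessages.push_back(std::move(msg)); }
};

void BroadcastBlobURLs(const std::vector<BlobURLRegistration>& registry,
                       ContentProcessChannel& child) {
  for (const auto& reg : registry) {
    // Serialize each registration and send it; one message's worth of work
    // per registered blob URL.
    std::vector<char> msg(reg.url.begin(), reg.url.end());
    msg.insert(msg.end(), reg.payload.begin(), reg.payload.end());
    child.Send(std::move(msg));
  }
}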
Updated•2 months ago
Comment 5•2 months ago
(In reply to Markus Stange [:mstange] from comment #4)
> In the profile, launching a new content process took one full second, with the time spent in IPCBlobUtils::Serialize, PContentParent::SendInitBlobURLs, and IPC::ParamTraits<mozilla::dom::BlobURLRegistrationData>::Read.
That's quite unfortunate. IIRC the BlobURL stuff has been a huge pain in the past, and I remember us having some discussions about wanting to massively reduce the amount of data we need to send down for it, but I don't know how feasible that is right now. Some quick searching found bug 1619943, which looks like an old project to improve that situation but doesn't appear to have moved.
> But even more importantly, it looks like whenever any content process is shutting down, all IPC to the parent process is blocked for 4 seconds! Those four seconds are spent in Node::DestroyAllPortsWithPeer on the IPC I/O thread. Nika, who would be the right person to look into this?
That's not great! The Node::DestroyAllPortsWithPeer code is spending a lot of time specifically within TaskQueue::Dispatch, waiting for the mQueueMonitor. When we lose connection with a process, we need to notify any actors which are still connected to that process that it is now gone (so that they can clean up), and to do that we dispatch a runnable to the nsISerialEventTarget corresponding to each actor.
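To make the shape of that work concrete, here is a heavily simplified sketch; SerialEventTarget, Actor, and NotifyActorsOfDeadPeer below are illustrative stand-ins, not the real MessageChannel/ports code. The key point is that the IPC I/O thread performs one dispatch, and therefore one lock acquisition on the target, per surviving actor.

// Simplified illustration (not the actual Gecko implementation): when a peer
// process goes away, each actor still bound to it gets a channel-error
// runnable dispatched to its own serial event target.
#include <functional>
#include <memory>
#include <vector>

struct SerialEventTarget {                  // stand-in for nsISerialEventTarget
  virtual void Dispatch(std::function<void()> aRunnable) = 0;
  virtual ~SerialEventTarget() = default;
};

struct Actor {                              // stand-in for an IPC actor
  std::shared_ptr<SerialEventTarget> target;
  void OnNotifyMaybeChannelError() { /* actor-specific cleanup */ }
};

void NotifyActorsOfDeadPeer(std::vector<Actor>& actorsBoundToDeadPeer) {
  for (Actor& actor : actorsBoundToDeadPeer) {
    // One dispatch per actor. If many actors share one TaskQueue, every
    // dispatch contends for that queue's monitor with the thread draining it.
    actor.target->Dispatch([&actor] { actor.OnNotifyMaybeChannelError(); });
  }
}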
It seems that in this particular case, we have a very large number of outstanding actors in the parent process, each of which is bound to a TaskQueue, and we're dispatching OnNotifyMaybeChannelError events to each of them to notify them that they are dying (https://searchfox.org/mozilla-central/rev/00d2cc8ebe323e0cde5619004c588d5e08ad1f46/ipc/glue/MessageChannel.cpp#2078-2082). The monitor on these TaskQueues unfortunately appears to be quite contended, meaning that we're spending a lot of time waiting for the mutex to be passed back and forth between threads, leading to the IPC I/O thread being blocked for an extended period of time.
My vague guess is that the bulk of these actors are all bound to the same TaskQueue. This would help explain the high contention we're seeing: over the 4 seconds, we would be both trying to acquire the mutex on the task queue to run the events and trying to acquire it on the I/O thread in order to queue events. Given that in this profile you're already seeing a very large number of Blobs, I'm guessing these may all be PRemoteLazyInputStream actors (which share a single TaskQueue in the parent process).
It might be feasible to reduce the contention a bit here by changing how TaskQueue is implemented to allow it to acquire the Monitor less while running. I'm not 100% sure what that would look like, but I expect we could perhaps move multiple entries from the TaskQueue's queue into the Runner at a time, to allow multiple tasks to be retired before re-acquiring the lock. I'll leave a NI? for myself to look into this more.
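A rough sketch of what that could look like, assuming a much-simplified queue; BatchedTaskQueue below is illustrative only, not the actual mozilla::TaskQueue. The runner swaps all pending tasks out under a single lock acquisition and runs them without re-taking the mutex per task, so dispatchers on other threads contend for the monitor once per batch rather than once per task.

// Illustrative sketch only (not the real mozilla::TaskQueue): drain tasks in
// batches so the queue mutex is acquired once per batch on the runner side,
// reducing contention with threads calling Dispatch().
#include <deque>
#include <functional>
#include <mutex>

class BatchedTaskQueue {
 public:
  void Dispatch(std::function<void()> aTask) {
    std::lock_guard<std::mutex> lock(mMutex);
    mQueue.push_back(std::move(aTask));
  }

  // Runs everything queued so far. The lock is held only while swapping the
  // batch out, not while the tasks execute.
  void RunBatch() {
    std::deque<std::function<void()>> batch;
    {
      std::lock_guard<std::mutex> lock(mMutex);
      batch.swap(mQueue);
    }
    for (auto& task : batch) {
      task();  // multiple tasks retired without re-acquiring the lock
    }
  }

 private:
  std::mutex mMutex;
  std::deque<std::function<void()>> mQueue;
};

Tasks dispatched while a batch is running simply land in the queue for the next RunBatch pass, so ordering is preserved while the monitor is held far more briefly overall.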
Changing this on the IPC side to somehow detect that all of the notifications are going to the same event target seems more difficult, as there are a lot of abstraction layers between Node::DestroyAllPortsWithPeer and the actual dispatch, so our best bet is probably making the dispatch cheaper. (As a side note, reducing the number of Blob URLs which need to be broadcast could also improve the situation here.)
I attached a memory report, but this isn't from the same session as before. Nothing has changed settings-wise since the last one; I just restarted my computer and there are different tabs.
I'll add that some of the artifacts I'm seeing are:
- Typing pauses when typing anything, including this comment. ~1-3s each time.
- When scrolling pages, the new part of the page coming up from the bottom is blank (white) until a few seconds later, even if the scrolling doesn't hang.
- When switching tabs of an already loaded page, I see a blank page with a loading indicator (gray radiating lines in a circle) for a few seconds before the content appears.
- Often when switching tabs I'll get a spinning "beach ball" for a mouse cursor while I wait for the tab to load.
- YouTube videos freeze, but the sound continues playing. This can last 5-10s. No audio stuttering.
All of the symptoms seem to get worse as the day goes on. If the browser is running very slowly, when I go to close it, the window will close, but the process will remain non-responsive in Activity Monitor for about a minute before it finally exits.
In case it matters, I'm on a 2025 Mac M4 Pro w/ 24 GB ram. No other software on my machine is pausing or even slowed down during these hiccups.
Replaced the previous memory report after realizing that anonymizing the report removed the extension names/urls.
Comment 9•2 months ago
When you captured the memory report, was Firefox in the state where it was performing slowly?
Reporter
Comment 10•2 months ago
Yes, though not as bad as it sometimes gets. I'll keep this window open for the rest of the day and tomorrow, and capture a new report when things get really slow.
Comment 11•2 months ago
It's "SquareX Enterprise - Spreedly" that created the blob URLs.
curl -Ls 'https://bugzilla.mozilla.org/attachment.cgi?id=9505587' | gunzip | grep -c 'memory-blob-urls/owner(moz-extension:.*15237d20-dab3-4143-b9e7-1bc847749b7d'
counts 202011 blob URLs (that's over 200,000) for the add-on with the ID 15237d20-dab3-4143-b9e7-1bc847749b7d, which is "SquareX Enterprise - Spreedly".
Comment 12•2 months ago
I've filed bug 1981596 to handle the developer outreach aspect of this issue.
Updated•2 months ago
Comment 13•2 months ago
And I've filed bug 1981600 about the fact that the memory report has over 8GB of heap-unclassified.
Reporter
Comment 14•2 months ago
OK, thanks! I suspected it might be that, since it's something I'm not familiar with. It's installed by my company and I can't disable it.
Is there anything I can do to help with or follow up on?
Reporter
Comment 15•2 months ago
Here's a new profile with it being very slow and locking up every few seconds: https://share.firefox.dev/4m5Azer
Comment 16•2 months ago
I don't think this will fix the issue, because we'll still be creating a vast number of blob URIs, but bug 1983309 might improve the lock contention that shows up in the profile a bit.
Comment 17•2 months ago
The severity field is not set for this bug.
:mstange, could you have a look please?
For more information, please visit BugBot documentation.