Firefox process locks up/freezes after closing a browser tab or switching tabs (via arrowscrollbox's call to scrollIntoView in requestAnimationFrame)
Categories
(Core :: CSS Parsing and Computation, defect)
Tracking
| Release | Tracking | Status |
|---|---|---|
| firefox-esr78 | --- | unaffected |
| firefox89 | --- | wontfix |
| firefox90 | --- | wontfix |
| firefox91 | --- | fixed |
People
(Reporter: robwu, Assigned: emilio)
References
Details
(Keywords: perf:responsiveness, regression, regressionwindow-wanted)
Attachments
(10 files)
Recently (within the last few weeks), Firefox's main process has started to lock up when a tab is closed. Visually, the tab has been fully closed, and the mouse pointer is over the close button of the next tab (since the previous tab has been closed). I have been experiencing this issue after closing a Google Docs or Bugzilla tab shortly after startup; the latest freeze happened when I closed a Google Meet tab after ending a video meeting. I have updated Nightly a couple of times (most recently on the 25th) and the issue happened across different versions.
I've observed this at least four times on macOS. Below is the sample that I took with the system monitor, and the reversed stack trace. At first the stack changed, but now it has been quite stable like this for over an hour. Based on the stack and what I did (closing the tab), I suspect that it's stuck at MozArrowScrollbox's ensureElementIsVisible (arrowscrollbox.js). Higher up the stack is nsRefreshDriver, which was recently refactored in bug 1708325 (coincidence or not?).
RedBlackTree<arena_chunk_map_t, ArenaAvailTreeTrait>::Remove(RedBlackTree<arena_chunk_map_t, ArenaAvailTreeTrait>::TreeNode) (in libmozglue.dylib) + 1370
arena_t::DallocRun(arena_run_t*, bool) (in libmozglue.dylib) + 1597 [0x10190abad]
free (in libmozglue.dylib) + 299 [0x10190ed5b]
servo_arc::Arc$LT$T$GT$::drop_slow::hf95d6fc8cf73aa5a (in XUL) + 189 [0x1061f0e8d]
servo_arc::Arc$LT$T$GT$::drop_slow::ha5d360bf71a0b7af (.llvm.17206588539849639304) (in XUL) + 1061 [0x1061f0495]
Servo_ComputedStyle_Release (in XUL) + 54 [0x106286f96]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 4265 [0x10514b819]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x10514b503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x10514b503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x10514b503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x10514b503]
mozilla::RestyleManager::DoProcessPendingRestyles(mozilla::ServoTraversalFlags) (in XUL) + 1722 [0x10514c86a]
mozilla::PresShell::DoFlushPendingNotifications(mozilla::ChangesToFlush) (in XUL) + 1472 [0x10513cf40]
mozilla::dom::Document::FlushPendingNotifications(mozilla::ChangesToFlush) (in XUL) + 175 [0x104bddd6f]
mozilla::PresShell::ScrollContentIntoView(nsIContent*, mozilla::ScrollAxis, mozilla::ScrollAxis, mozilla::ScrollFlags) (in XUL) + 335 [0x10513ab8f]
mozilla::dom::Element::ScrollIntoView(mozilla::dom::ScrollIntoViewOptions const&) (in XUL) + 153 [0x106ee2fc9]
mozilla::dom::Element_Binding::scrollIntoView(JSContext*, JS::Handle<JSObject*>, void*, JSJitMethodCallArgs const&) (in XUL) + 289 [0x107669521]
bool mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions>(JSContext*, unsigned int, JS::Value*) (in XUL) + 266 [0x104dd82aa]
??? (in <unknown binary>) [0x2db3ef3eca02]
??? (in <unknown binary>) [0x125a60050]
??? (in <unknown binary>) [0x2db3ee55856f]
js::jit::MaybeEnterJit(JSContext*, js::RunState&) (in XUL) + 484 [0x105b10f54]
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) (in XUL) + 1766 [0x10556b186]
JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) (in XUL) + 727 [0x10574e757]
mozilla::dom::FrameRequestCallback::Call(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, double, mozilla::ErrorResult&) (in XUL) + 455 [0x104d13b27]
nsRefreshDriver::RunFrameRequestCallbacks(mozilla::TimeStamp) (in XUL) + 1461 [0x1051219e5]
nsRefreshDriver::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsRefreshDriver::IsExtraTick) (in XUL) + 2221 [0x10511e1ed]
mozilla::RefreshDriverTimer::TickRefreshDrivers(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver> >&) (in XUL) + 229 [0x105124485]
mozilla::RefreshDriverTimer::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) (in XUL) + 91 [0x1051242fb]
mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) (in XUL) + 355 [0x105123f33]
mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::NotifyParentProcessVsync() (in XUL) + 290 [0x105123ab2]
mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::ParentProcessVsyncNotifier::Run() (in XUL) + 32 [0x105123490]
mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (in XUL) + 2649 [0x1044f7dc9]
nsThread::ProcessNextEvent(bool, bool*) (in XUL) + 1658 [0x10450557a]
NS_ProcessPendingEvents(nsIThread*, unsigned int) (in XUL) + 124 [0x104502d2c]
nsAppShell::ProcessGeckoEvents(void*) (in XUL) + 241 [0x10509d6d1]
Comment 1 • 2 years ago (Reporter)
Process sample captured with macOS's Activity Monitor.
Comment 2 • 2 years ago (Reporter)
Linking bug 1708325 for visibility. If it is unrelated after all, remove the "See also".
Comment 3 • 2 years ago (Reporter)
Another freeze, this time not by closing tabs but by switching tabs (from Bugzilla to Google Calendar). When the process was frozen, the tab was still visually highlighted (as if the mouse pointer was above it).
Based on the native stack (and the fact that I clicked on a different tab), the relevant sequence of code is probably:
- On tab select, MozTabbrowserTabs's _handleTabSelect @ https://searchfox.org/mozilla-central/rev/e9eb869e90a8d717678c3f38bf75843e345729ab/toolkit/content/widgets/arrowscrollbox.js#315
- The scheduled animation frame callback called by MozArrowScrollbox's ensureElementIsVisible @ https://searchfox.org/mozilla-central/rev/e9eb869e90a8d717678c3f38bf75843e345729ab/toolkit/content/widgets/arrowscrollbox.js#305,315
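The sequence above can be sketched roughly as follows. This is a simplified model and not the actual arrowscrollbox.js code: `makeScrollbox`, `schedule`, `runFrame` and `fakeTab` are hypothetical names introduced for illustration, and `schedule` stands in for `window.requestAnimationFrame` so that the snippet also runs outside a browser. The point it illustrates is that `scrollIntoView` is called from inside the frame callback, which matches the position of the style flush in the native stack.

```javascript
// Simplified, hypothetical model of the pattern described above; the real
// logic lives in toolkit/content/widgets/arrowscrollbox.js.
function makeScrollbox(schedule) {
  let pendingElement = null;
  let scheduled = false;

  return {
    // Remember the element and defer the actual scroll to the next frame.
    ensureElementIsVisible(element) {
      pendingElement = element;
      if (scheduled) {
        return; // a frame callback is already queued; it will use the latest element
      }
      scheduled = true;
      schedule(() => {
        scheduled = false;
        const target = pendingElement;
        pendingElement = null;
        // Per the native stack, Element::ScrollIntoView flushes pending
        // notifications first, which is where the restyle work happens.
        target.scrollIntoView({ behavior: "instant" });
      });
    },
  };
}

// Stand-in for requestAnimationFrame so the sketch runs outside a browser.
const frameCallbacks = [];
const schedule = (callback) => frameCallbacks.push(callback);
const runFrame = () => frameCallbacks.splice(0).forEach((callback) => callback());

// Fake tab element that only records scrollIntoView calls.
const scrolled = [];
const fakeTab = (name) => ({ scrollIntoView: () => scrolled.push(name) });

const scrollbox = makeScrollbox(schedule);
scrollbox.ensureElementIsVisible(fakeTab("bugzilla"));
scrollbox.ensureElementIsVisible(fakeTab("calendar"));
runFrame();
console.log(scrolled);
```

In this model, two selections within the same frame result in a single `scrollIntoView` call for the most recently requested element.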
Stack trace (top = top of stack):
arena_t::DallocSmall(arena_chunk_t*, void*, arena_chunk_map_t*) (in libmozglue.dylib) + 215 [0x102d51597]
free (in libmozglue.dylib) + 197 [0x102d4ecf5]
Servo_ComputedStyle_Release (in XUL) + 54 [0x106907f96]
servo_arc::Arc$LT$T$GT$::drop_slow::ha5d360bf71a0b7af (.llvm.17206588539849639304) (in XUL) + 18 [0x106871082]
Servo_ComputedStyle_Release (in XUL) + 54 [0x106907f96]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 4265 [0x1057cc819]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (in XUL) + 3475 [0x1057cc503]
mozilla::RestyleManager::DoProcessPendingRestyles(mozilla::ServoTraversalFlags) (in XUL) + 1722 [0x1057cd86a]
mozilla::PresShell::DoFlushPendingNotifications(mozilla::ChangesToFlush) (in XUL) + 1472 [0x1057bdf40]
mozilla::dom::Document::FlushPendingNotifications(mozilla::ChangesToFlush) (in XUL) + 175 [0x10525ed6f]
mozilla::PresShell::ScrollContentIntoView(nsIContent*, mozilla::ScrollAxis, mozilla::ScrollAxis, mozilla::ScrollFlags) (in XUL) + 335 [0x1057bbb8f]
mozilla::dom::Element::ScrollIntoView(mozilla::dom::ScrollIntoViewOptions const&) (in XUL) + 153 [0x107563fc9]
mozilla::dom::Element_Binding::scrollIntoView(JSContext*, JS::Handle<JSObject*>, void*, JSJitMethodCallArgs const&) (in XUL) + 289 [0x107cea521]
bool mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions>(JSContext*, unsigned int, JS::Value*) (in XUL) + 266 [0x1054592aa]
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) (in XUL) + 713 [0x105bebd69]
Interpret(JSContext*, js::RunState&) (in XUL) + 51822 [0x105be1b2e]
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) (in XUL) + 2140 [0x105bec2fc]
JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) (in XUL) + 727 [0x105dcf757]
mozilla::dom::FrameRequestCallback::Call(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, double, mozilla::ErrorResult&) (in XUL) + 455 [0x105394b27]
nsRefreshDriver::RunFrameRequestCallbacks(mozilla::TimeStamp) (in XUL) + 1461 [0x1057a29e5]
nsRefreshDriver::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsRefreshDriver::IsExtraTick) (in XUL) + 2221 [0x10579f1ed]
mozilla::RefreshDriverTimer::TickRefreshDrivers(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver> >&) (in XUL) + 229 [0x1057a5485]
mozilla::RefreshDriverTimer::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) (in XUL) + 91 [0x1057a52fb]
mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) (in XUL) + 355 [0x1057a4f33]
mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::NotifyParentProcessVsync() (in XUL) + 290 [0x1057a4ab2]
mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::ParentProcessVsyncNotifier::Run() (in XUL) + 32 [0x1057a4490]
mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) (in XUL) + 2649 [0x104b78dc9]
nsThread::ProcessNextEvent(bool, bool*) (in XUL) + 1658 [0x104b8657a]
NS_ProcessPendingEvents(nsIThread*, unsigned int) (in XUL) + 124 [0x104b83d2c]
nsAppShell::ProcessGeckoEvents(void*) (in XUL) + 241 [0x10571e6d1]
Comment 4 • 2 years ago (Reporter)
Another one, this time when closing a tab, similar to the initial report. I closed Google Meet after a video call.
I also note that in every stack observed so far, the top of the stack is free.
Comment 5 • 2 years ago
Sounds bad. My wild guess is that this is some sort of memory allocation issue on macOS. CCing glandium since he is the only person I know of who is familiar with this kind of thing.
Comment 6 • 2 years ago (Reporter)
Yet another lockup after closing Bugzilla. This is annoying.
Comment 7 • 2 years ago (Reporter)
Another one, after closing multiple tabs. I had multiple Google Docs tabs that were lazy after session restore; I clicked on them to focus one and then closed the tabs in succession. Then Firefox froze again.
Comment 8 • 2 years ago (Reporter)
Closed one tab (Google Docs) and Nightly froze again.
Note: these freezes happen when I use the mouse pointer to close a tab via the X button on a tab.
Comment 9 • 2 years ago (Reporter)
(In reply to Hiroyuki Ikezoe (:hiro) from comment #5)
> Sounds bad. My wild guess is that this is some sort of memory allocation issue on macOS. CCing glandium since he is the only person I know of who is familiar with this kind of thing.
Adding needinfo to raise attention. Is this an issue with memory or something else?
Comment 10 • 2 years ago
This is more of a hang than a performance issue per se, but I'm hoping the [qf] tag can bring more attention here.
Comment 11 • 2 years ago (Reporter)
I closed a tab while it was restoring on focus (the same tab that I tried to close last time, but session restore hadn't had a chance to remove the closed tab), and Nightly froze again.
Still the same kind of stack trace.
Comment 12 • 2 years ago
If glandium isn't reachable, we could also try pbone.
Comment 15 • 2 years ago
All the stack traces show a free() call in jemalloc, and a malloc call running concurrently on another thread. They also all happen with the same actions performed by Rob, and when called from this Servo code. If this were a jemalloc/memory problem then we'd see this behaviour with other stack traces, not just servo ones, and when the user performs other actions. So I think Servo is the right place to look. I'm going to keep looking but may need to ping some Servo people to get their help.
Thanks.
Comment 16 • 2 years ago
Hi Rob,
Are you able to reproduce this with ASAN or TSAN? If you can find a culprit with mozregression that'd be great, but I understand if it takes a long time to reproduce.
Comment 17 • 2 years ago
Hi Emilio. Could Bug 1707310 be causing this problem? It's the only change I've found to the code in the main thread's stack, a change to servo_arc. Although I don't see how, I'm at the stage of finding candidate causes before beginning to rule them out.
Rob, you may wish to try a build made before Bug 1707310.
Comment 18 • 2 years ago
I suspect heap corruption caused by Servo, because there's more than one thread in jemalloc when this hangs and Servo is always in the stack of the main thread.
Comment 20 • 2 years ago (Assignee)
It should not, I only added a convenience method to get an already-addrefed Arc, and that code runs fairly frequently...
Comment 21 • 2 years ago (Assignee)
The other reason why this might always show servo on the stack is because servo has an uncommon jemalloc setup (we use thread-local arenas, but free stuff on the main thread).
Comment 23 • 2 years ago (Reporter)
bug 1708968 has a similar stack, with a build from May 1st. That predates the nsRefreshDriver changes from bug 1708325, so I'm removing the bug link.
Comment 24 • 2 years ago (Reporter)
(In reply to Paul Bone [:pbone] from comment #16)
> Are you able to reproduce this with ASAN or TSAN? If you can find a culprit with mozregression that'd be great, but I understand if it takes a long time to reproduce.
I cannot reproduce this issue on demand. As seen from the timings of the report, sometimes it takes days before I encounter the issue. But I have also seen the issue happening within 10 minutes or so.
What do you hope to see with ASAN/TSAN? I'm not encountering a crash, merely a deadlock.
Side note: I just looked on Bugzilla to try and find related issues. Keywords: freeze, frozen, freezes, lock, deadlock, non-responsive, unresponsive
Comment 25 • 2 years ago (Reporter)
This is not specific to macOS: there is a report of this happening on Windows at bug 1710876.
Comment 26 • 2 years ago
(In reply to Rob Wu [:robwu] from comment #24)
> (In reply to Paul Bone [:pbone] from comment #16)
> > Are you able to reproduce this with ASAN or TSAN? If you can find a culprit with mozregression that'd be great, but I understand if it takes a long time to reproduce.
> I cannot reproduce this issue on demand. As seen from the timings of the report, sometimes it takes days before I encounter the issue. But I have also seen the issue happening within 10 minutes or so.
Okay, I must have missed that. Sorry.
> What do you hope to see with ASAN/TSAN? I'm not encountering a crash, merely a deadlock.
Heap corruption, e.g. an out-of-bounds write that messes with jemalloc's bookkeeping, or threads that are not synchronised.
> Side note: I just looked on Bugzilla to try and find related issues. Keywords: freeze, frozen, freezes, lock, deadlock, non-responsive, unresponsive
Thanks.
Comment 29 • 2 years ago (Assignee)
So in bug 1711582 and bug 1711769 there are some full-process stacks. From what I can tell, the thing that they all have in common is that the main thread is sampled and it runs a SIGPROF signal handler while holding an allocator lock.
There are some stacks with malloc() and some with free() on the main thread. So the question is why this usually happens with the restyle code on this particular scrollIntoView call. Is it just because it is a common source of long restyles? Or is it something deeper? I suspect it's just the former fwiw, and that the bug should be reproducible without opening / closing tabs if we cause big restyles. I'll try to come up with something.
The only weird thing that the style code does is enabling jemalloc's thread-local arenas for the worker threads in here.
Moving my ni? from bug 1711769 to here in case Gerald has thoughts on how this can happen or any related change that could've landed recently.
Comment 30 • 2 years ago
> From what I can tell, the thing that they all have in common is that the main thread is sampled and it runs a SIGPROF signal handler while holding an allocator lock.
I think the issue is not really that the signal handler is called while a lock is being held. It's that the signal handler itself wants to acquire a lock too, via futex_wait. That's not allowed.
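To make that reasoning concrete, here is a toy single-threaded model of the situation. It is only a sketch: `NonReentrantLock`, `allocatorLock`, `signalHandler` and the `free` stand-in are hypothetical names, and the lock reports failure instead of blocking so that the snippet terminates. In the real case the handler would wait on the lock forever, because the only code that can release it is the code the handler interrupted.

```javascript
// Toy model (hypothetical names) of why a signal handler must not take a lock
// that the interrupted thread may already hold.
class NonReentrantLock {
  constructor() {
    this.held = false;
  }
  // Returns true if the lock was acquired, false if it is already held.
  // A real blocking lock would wait here instead of returning false.
  tryAcquire() {
    if (this.held) {
      return false;
    }
    this.held = true;
    return true;
  }
  release() {
    this.held = false;
  }
}

const allocatorLock = new NonReentrantLock();

// Stand-in for a signal handler that needs the allocator lock.
function signalHandler() {
  if (!allocatorLock.tryAcquire()) {
    // With a blocking lock the handler would wait forever: the lock owner is
    // the interrupted code on the same thread, which cannot resume.
    return "would deadlock";
  }
  allocatorLock.release();
  return "ok";
}

// Stand-in for free(): takes the allocator lock, and `interrupt` simulates a
// signal delivered on the same thread while the lock is held.
function free(interrupt) {
  allocatorLock.tryAcquire();
  const outcome = interrupt ? interrupt() : "not interrupted";
  allocatorLock.release();
  return outcome;
}

console.log(signalHandler()); // handler runs while no lock is held
console.log(free(signalHandler)); // handler interrupts free() mid-operation
```

The first call succeeds because nothing holds the lock; the second reports the would-be deadlock because the handler runs while the simulated free() still owns it.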
Comment 31 • 2 years ago (Assignee)
Right. The signal handler is what ends up dead-locking.
Comment 32 • 2 years ago (Assignee)
So can someone who can reproduce this relatively frequently do something like pasting this in the browser toolbox, leaving Firefox open for a bit, and see if it reproduces the issue without closing / opening tabs?
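function bigRestyle() {
  // Insert a universal rule so that every element needs restyling.
  const s = document.createElement("style");
  s.innerHTML = "* { background-color: red; color: red }";
  document.documentElement.appendChild(s);
  // Reading layout forces a synchronous style flush.
  document.documentElement.getBoundingClientRect();
  // Remove the rule and flush again to restyle everything back.
  s.remove();
  document.documentElement.getBoundingClientRect();
}
// Trigger a big restyle every 50 ms.
setInterval(bigRestyle, 50);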
Something like this might also be worth trying (this one makes Nightly rather unusable of course, hanging the parent process half of the time, but the idea is that the background hang monitor would kick in hopefully):
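function bigRestyle() {
  // Block the main thread for a second before restyling.
  const start = Date.now();
  while (Date.now() - start < 1000) { /* busy wait for a second */ }
  // Add a universal rule, flush style, remove the rule, and flush again.
  const s = document.createElement("style");
  s.innerHTML = "* { background-color: red; color: red }";
  document.documentElement.appendChild(s);
  document.documentElement.getBoundingClientRect();
  s.remove();
  document.documentElement.getBoundingClientRect();
}
// Repeat every 2 seconds, so the parent process hangs about half of the time.
setInterval(bigRestyle, 2000);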
Comment 33 • 2 years ago (Assignee)
Huh, then again, comment 0 is on macOS, so something else is going on...
Comment 34 • 2 years ago (Assignee)
Oh, I think I have an idea of what this might be... One of the unique things the JS code involved does is using a privileged promiseDocumentFlushed API, which I recently refactored in bug 1699844... I wonder if the timing there matches these reports?
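For readers unfamiliar with that API, here is a rough sketch of how it is typically used. It assumes that `promiseDocumentFlushed` is a chrome-only `window` method that takes a callback, runs it once no style or layout flush is pending, and resolves with the callback's return value. `fakeWindow`, `fakeElement` and `measureWidth` are hypothetical stand-ins so that the snippet runs outside privileged Firefox code; they are not taken from arrowscrollbox.js.

```javascript
// Hypothetical stand-in for a chrome window. It only mimics the assumed
// contract: run the callback later and resolve with its return value.
const fakeWindow = {
  promiseDocumentFlushed(callback) {
    return Promise.resolve().then(callback);
  },
};

// Hypothetical element with a fixed layout box.
const fakeElement = {
  getBoundingClientRect: () => ({ width: 120, height: 30 }),
};

// Read layout without forcing a synchronous flush: the callback should only
// read from the DOM, never modify it.
function measureWidth(win, element) {
  return win.promiseDocumentFlushed(() => element.getBoundingClientRect().width);
}

measureWidth(fakeWindow, fakeElement).then((width) => {
  console.log(`measured width: ${width}`);
});
```

In real chrome code the first argument would be the browser `window` rather than a stub.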
Comment 35 • 2 years ago (Assignee)
And if so, I believe this can be caused by corruption from bug 1716481 (which I just got ni?d on a couple days ago).
Comment 36 • 2 years ago (Assignee)
If some of the folks that can repro can use an ASAN nightly for a few days and see if there's any interesting reports that'd be greatly appreciated. I'll try to fix bug 1716481 ASAP anyways.
Comment 37 • 2 years ago
(In reply to Emilio Cobos Álvarez (:emilio) from comment #32)
> So can someone who can reproduce this relatively frequently do something like pasting this in the browser toolbox, leaving Firefox open for a bit, and see if it reproduces the issue without closing / opening tabs?
Neither reproduces the issue for me.
Comment 38 • 2 years ago
With regular nightlies I can reproduce the hang within minutes. With the ASAN build I haven't been able to so far.
But I'll keep using it for a while as requested.
(In reply to Emilio Cobos Álvarez (:emilio) from comment #29)
> So in bug 1711582 and bug 1711769 there are some full-process stacks. From what I can tell, the thing that they all have in common is that the main thread is sampled and it runs a SIGPROF signal handler while holding an allocator lock.
> ...
> Moving my ni? from bug 1711769 to here in case Gerald has thoughts on how this can happen or any related change that could've landed recently.
Thank you Emilio.
Based on your bug 1711769 comment 9, I've filed bug 1717386 (ThreadStackHelper::CollectProfilingStackFrame should not allocate memory).
Now, looking at the other stacks in this bug here, I don't see any use of profiler_suspend_and_sample_thread (which was the main issue there in relation with mallocs in signal handlers), so I doubt they're related, and that's why I've filed bug 1717386 separately.
Comment 40 • 2 years ago
Can I leave this in your capable hands Emilio? You seem to have a lot of the context including Servo and Bug 1716481.
Feel free to bounce this back at me if it's a memory allocator thing or there's something else I can do.
Comment 41 • 2 years ago (Reporter)
Another hang, after closing a tab (Bugzilla) via the context menu. Stack is similar to closing via the close button.
Was using 91.0a1 buildID 20210615134418.
Comment 42 • 2 years ago (Reporter)
Closed tab via Cmd-W (Bugzilla), and it happened again.
Nightly 91.0a1 buildID 20210620210823.
Comment 43 • 2 years ago (Reporter)
Updated Nightly to latest version, and pressed Cmd-W for each Bugzilla tab that restored. Then the bug happened again. That was quick...
Nightly 91.0a1 buildID 20210621213500
Comment 44 • 2 years ago (Assignee)
I'd be curious if you can repro now that bug 1717386 landed? That should fix at least some of the Linux deadlocks reported here, and I'm not familiar with the profiler on macOS but it seems it could be the cause there too.
Comment 45 • 2 years ago (Reporter)
(In reply to Emilio Cobos Álvarez (:emilio) from comment #44)
> I'd be curious if you can repro now that bug 1717386 landed? That should fix at least some of the Linux deadlocks reported here, and I'm not familiar with the profiler on macOS but it seems it could be the cause there too.
I cannot easily verify the fix; I can merely try to see if the issue still happens by closing tabs and getting unlucky.
The profiler never appeared in the stacks that I shared, I'm not sure whether it's related. When my issue happens, I am just closing tabs (or switching tabs on some occasion).
I have just updated to 91.0a1 buildID 20210622212907, and sacrificed dozens of my tabs without triggering this bug. That doesn't prove that the bug got fixed however. I'll continue to watch for the issue and keep attaching stacks whenever I encounter this issue. Based on my reports so far, sometimes the issue happens within minutes, and sometimes it takes almost two weeks to reproduce.
Comment 46 • 2 years ago
(This is not really a qf bug, but something else, but p1 anyhow.)
Comment 47 • 2 years ago
Had no luck reproducing this on the latest Nightly 91.0a1 (2021-06-23) (64-bit) while surfing the web, opening, closing, and moving tabs around, and using session restore over a period of 1 hour. Tested on macOS 10.15.
Comment 48 • 2 years ago (Assignee)
Let's call this fixed tentatively given the two suspect bugs are fixed and nobody has been able to repro this afterwards (see comments in bug 1717386).
If someone can repro this again let's reopen.