Closed Bug 1713170 Opened 3 years ago Closed 3 years ago

Firefox process locks up/freezes after closing a browser tab or switching tabs (via arrowscrollbox's call to scrollIntoView in requestAnimationFrame)

Categories

(Core :: CSS Parsing and Computation, defect)

x86_64
All
defect

Tracking

()

RESOLVED FIXED
91 Branch
Performance Impact high
Tracking Status
firefox-esr78 --- unaffected
firefox89 --- wontfix
firefox90 --- wontfix
firefox91 --- fixed

People

(Reporter: robwu, Assigned: emilio)

References

Details

(Keywords: perf:responsiveness, regression, regressionwindow-wanted)

Attachments

(10 files)

Recently (within the last few weeks), Firefox's main process has started to lock up when a tab is closed. Visually, the tab has fully closed, and the mouse pointer is over the close button of the next tab (since the previous tab has been closed). I have experienced this issue after closing a Google Docs or Bugzilla tab shortly after startup; the latest freeze happened when I closed a Google Meet tab after ending a video meeting. I have updated Nightly a couple of times (most recently on the 25th) and the issue happened across different versions.

I've observed this at least four times on macOS. Below is the sample that I took with the system monitor, and the reversed stack trace. At first the stack changed, but now it has been quite stable like this for over an hour. Based on the stack and what I did (closing the tab), I suspect that it's stuck at MozArrowScrollbox's ensureElementIsVisible (arrowscrollbox.js). Higher up the stack is nsRefreshDriver, which was recently refactored in bug 1708325 (coincidence or not?).

RedBlackTree<arena_chunk_map_t, ArenaAvailTreeTrait>::Remove(RedBlackTree<arena_chunk_map_t, ArenaAvailTreeTrait>::TreeNode)  (in libmozglue.dylib) + 1370
arena_t::DallocRun(arena_run_t*, bool)  (in libmozglue.dylib) + 1597  [0x10190abad]
free  (in libmozglue.dylib) + 299  [0x10190ed5b]
servo_arc::Arc$LT$T$GT$::drop_slow::hf95d6fc8cf73aa5a  (in XUL) + 189  [0x1061f0e8d]
servo_arc::Arc$LT$T$GT$::drop_slow::ha5d360bf71a0b7af (.llvm.17206588539849639304)  (in XUL) + 1061  [0x1061f0495]
Servo_ComputedStyle_Release  (in XUL) + 54  [0x106286f96]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 4265  [0x10514b819]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x10514b503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x10514b503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x10514b503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x10514b503]
mozilla::RestyleManager::DoProcessPendingRestyles(mozilla::ServoTraversalFlags)  (in XUL) + 1722  [0x10514c86a]
mozilla::PresShell::DoFlushPendingNotifications(mozilla::ChangesToFlush)  (in XUL) + 1472  [0x10513cf40]
mozilla::dom::Document::FlushPendingNotifications(mozilla::ChangesToFlush)  (in XUL) + 175  [0x104bddd6f]
mozilla::PresShell::ScrollContentIntoView(nsIContent*, mozilla::ScrollAxis, mozilla::ScrollAxis, mozilla::ScrollFlags)  (in XUL) + 335  [0x10513ab8f]
mozilla::dom::Element::ScrollIntoView(mozilla::dom::ScrollIntoViewOptions const&)  (in XUL) + 153  [0x106ee2fc9]
mozilla::dom::Element_Binding::scrollIntoView(JSContext*, JS::Handle<JSObject*>, void*, JSJitMethodCallArgs const&)  (in XUL) + 289  [0x107669521]
bool mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions>(JSContext*, unsigned int, JS::Value*)  (in XUL) + 266  [0x104dd82aa]
???  (in <unknown binary>)  [0x2db3ef3eca02]
???  (in <unknown binary>)  [0x125a60050]
???  (in <unknown binary>)  [0x2db3ee55856f]
js::jit::MaybeEnterJit(JSContext*, js::RunState&)  (in XUL) + 484  [0x105b10f54]
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)  (in XUL) + 1766  [0x10556b186]
JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>)  (in XUL) + 727  [0x10574e757]
mozilla::dom::FrameRequestCallback::Call(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, double, mozilla::ErrorResult&)  (in XUL) + 455  [0x104d13b27]
nsRefreshDriver::RunFrameRequestCallbacks(mozilla::TimeStamp)  (in XUL) + 1461  [0x1051219e5]
nsRefreshDriver::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsRefreshDriver::IsExtraTick)  (in XUL) + 2221  [0x10511e1ed]
mozilla::RefreshDriverTimer::TickRefreshDrivers(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver> >&)  (in XUL) + 229  [0x105124485]
mozilla::RefreshDriverTimer::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp)  (in XUL) + 91  [0x1051242fb]
mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp)  (in XUL) + 355  [0x105123f33]
mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::NotifyParentProcessVsync()  (in XUL) + 290  [0x105123ab2]
mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::ParentProcessVsyncNotifier::Run()  (in XUL) + 32  [0x105123490]
mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&)  (in XUL) + 2649  [0x1044f7dc9]
nsThread::ProcessNextEvent(bool, bool*)  (in XUL) + 1658  [0x10450557a]
NS_ProcessPendingEvents(nsIThread*, unsigned int)  (in XUL) + 124  [0x104502d2c]
nsAppShell::ProcessGeckoEvents(void*)  (in XUL) + 241  [0x10509d6d1]

Process sample captured with macOS's Activity Monitor.
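
For readers unfamiliar with the pattern suspected in the description, here is a minimal sketch of a requestAnimationFrame callback that calls scrollIntoView. It is not the actual arrowscrollbox.js code: the function name `ensureElementIsVisibleSketch`, the injected `raf` parameter, the `{ block: "nearest" }` options and the fake element are all assumptions made so the snippet runs outside a browser.

```javascript
// Hypothetical sketch; NOT the real arrowscrollbox.js implementation.
// `raf` stands in for window.requestAnimationFrame so this runs anywhere.
function ensureElementIsVisibleSketch(element, raf) {
  // Defer the scroll to the next frame. In the stacks this corresponds to
  // nsRefreshDriver::RunFrameRequestCallbacks -> FrameRequestCallback::Call.
  raf(() => {
    // scrollIntoView needs up-to-date style and layout, so Gecko flushes
    // pending restyles first (Document::FlushPendingNotifications above).
    element.scrollIntoView({ block: "nearest" }); // options are illustrative
  });
}

// Stand-ins for a DOM element and the frame scheduler (assumptions).
const scrollCalls = [];
const fakeElement = {
  scrollIntoView(options) {
    scrollCalls.push(options);
  },
};
const queuedFrameCallbacks = [];
const fakeRaf = (callback) => queuedFrameCallbacks.push(callback);

ensureElementIsVisibleSketch(fakeElement, fakeRaf);
const callsBeforeFrame = scrollCalls.length;

// Simulate the refresh driver tick that runs the frame callbacks.
queuedFrameCallbacks.forEach((callback) => callback(0));
const callsAfterFrame = scrollCalls.length;

console.log(callsBeforeFrame, callsAfterFrame); // logs: 0 1
```

The point of the sketch is only that the scroll, and the style flush it triggers, happens inside the frame callback rather than at the moment the tab is closed or switched.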

Linking bug 1708325 for visibility. If it is unrelated after all, remove the "See also".

See Also: → 1708325

Another freeze, this time not by closing tabs but by switching tabs (from Bugzilla to Google Calendar). When the process froze, the tab was still visually highlighted as if the mouse pointer was above it.

Based on the native stack (and the fact that I clicked on a different tab), the relevant sequence of code is probably:

Stack trace (top = top of stack):

arena_t::DallocSmall(arena_chunk_t*, void*, arena_chunk_map_t*)  (in libmozglue.dylib) + 215  [0x102d51597]
free  (in libmozglue.dylib) + 197  [0x102d4ecf5]
Servo_ComputedStyle_Release  (in XUL) + 54  [0x106907f96]
servo_arc::Arc$LT$T$GT$::drop_slow::ha5d360bf71a0b7af (.llvm.17206588539849639304)  (in XUL) + 18  [0x106871082]
Servo_ComputedStyle_Release  (in XUL) + 54  [0x106907f96]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 4265  [0x1057cc819]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x1057cc503]
mozilla::RestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags)  (in XUL) + 3475  [0x1057cc503]
mozilla::RestyleManager::DoProcessPendingRestyles(mozilla::ServoTraversalFlags)  (in XUL) + 1722  [0x1057cd86a]
mozilla::PresShell::DoFlushPendingNotifications(mozilla::ChangesToFlush)  (in XUL) + 1472  [0x1057bdf40]
mozilla::dom::Document::FlushPendingNotifications(mozilla::ChangesToFlush)  (in XUL) + 175  [0x10525ed6f]
mozilla::PresShell::ScrollContentIntoView(nsIContent*, mozilla::ScrollAxis, mozilla::ScrollAxis, mozilla::ScrollFlags)  (in XUL) + 335  [0x1057bbb8f]
mozilla::dom::Element::ScrollIntoView(mozilla::dom::ScrollIntoViewOptions const&)  (in XUL) + 153  [0x107563fc9]
mozilla::dom::Element_Binding::scrollIntoView(JSContext*, JS::Handle<JSObject*>, void*, JSJitMethodCallArgs const&)  (in XUL) + 289  [0x107cea521]
bool mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions>(JSContext*, unsigned int, JS::Value*)  (in XUL) + 266  [0x1054592aa]
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)  (in XUL) + 713  [0x105bebd69]
Interpret(JSContext*, js::RunState&)  (in XUL) + 51822  [0x105be1b2e]
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)  (in XUL) + 2140  [0x105bec2fc]
JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>)  (in XUL) + 727  [0x105dcf757]
mozilla::dom::FrameRequestCallback::Call(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, double, mozilla::ErrorResult&)  (in XUL) + 455  [0x105394b27]
nsRefreshDriver::RunFrameRequestCallbacks(mozilla::TimeStamp)  (in XUL) + 1461  [0x1057a29e5]
nsRefreshDriver::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsRefreshDriver::IsExtraTick)  (in XUL) + 2221  [0x10579f1ed]
mozilla::RefreshDriverTimer::TickRefreshDrivers(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver> >&)  (in XUL) + 229  [0x1057a5485]
mozilla::RefreshDriverTimer::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp)  (in XUL) + 91  [0x1057a52fb]
mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp)  (in XUL) + 355  [0x1057a4f33]
mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::NotifyParentProcessVsync()  (in XUL) + 290  [0x1057a4ab2]
mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::ParentProcessVsyncNotifier::Run()  (in XUL) + 32  [0x1057a4490]
mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&)  (in XUL) + 2649  [0x104b78dc9]
nsThread::ProcessNextEvent(bool, bool*)  (in XUL) + 1658  [0x104b8657a]
NS_ProcessPendingEvents(nsIThread*, unsigned int)  (in XUL) + 124  [0x104b83d2c]
nsAppShell::ProcessGeckoEvents(void*)  (in XUL) + 241  [0x10571e6d1]
Summary: Firefox process locks up/freezes after closing a browser tab → Firefox process locks up/freezes after closing a browser tab or switching tabs (via arrowscrollbox's call to scrollIntoView in requestAnimationFrame)
Component: Tabbed Browser → CSS Parsing and Computation
Product: Firefox → Core

Another one, with closing tab, similar to the initial report. I closed Google Meet after a video call.

I also note that in every stack observed so far, the top of the stack is free.

Sounds bad. My wild guess is that this is some sort of memory allocation issue on macOS. CCing glandium since he is the only person I know of who is familiar with this kind of thing.

Severity: -- → S2
OS: Unspecified → macOS
Hardware: Unspecified → x86_64

Yet another lockup after closing Bugzilla. This is annoying.

Another one, after closing multiple tabs. I had multiple Google Docs tabs, which were lazy after session restore. I clicked on them to focus one and then closed the tabs in succession. And then Firefox froze again.

Closed one tab (Google Docs) and Nightly froze again.

Note: these freezes happen when I use the mouse pointer to close a tab via the X button on a tab.

(In reply to Hiroyuki Ikezoe (:hiro) from comment #5)

Sounds bad. My wild guess is that this is some sort of memory allocation issue on macOS. CCing glandium since he is the only person I know of who is familiar with this kind of thing.

Adding needinfo to raise attention. Is this an issue with memory or something else?

Flags: needinfo?(mh+mozilla)

This is more of a hang than a performance issue per se, but I'm hoping the [qf] tag can bring more attention here.

Whiteboard: [qf]

I closed a tab while it was restoring on focus (the same tab that I tried to close last time, but session restore hasn't had a chance to remove the closed tab), and Nightly froze again.

Still the same kind of stack trace.

If glandium isn't reachable, we could also try pbone.

Matt, could this be related to bug 1708325?

Flags: needinfo?(matt.woodrow)

Hey pbone, any ideas what might be happening here?

Flags: needinfo?(pbone)

All the stack traces show a free() call in jemalloc, and a malloc call running concurrently on another thread. They also all happen with the same actions performed by Rob, and when called from this Servo code. If this were a jemalloc/memory problem then we'd see this behaviour with other stack traces, not just servo ones, and when the user performs other actions. So I think Servo is the right place to look. I'm going to keep looking but may need to ping some Servo people to get their help.

Thanks.

Assignee: nobody → pbone
Status: NEW → ASSIGNED
Flags: needinfo?(pbone)

Hi Rob,

Are you able to reproduce this with ASAN or TSAN? If you can find a culprit with mozregression that'd be great, but I understand if it takes a long time to reproduce.

Hi Emilio. Could Bug 1707310 be causing this problem? It's the only change I've found to the code in the main thread's stack, a change to servo_arc. Although I don't see how, I'm at the stage of finding candidate causes before beginning to rule them out.

Rob, you may wish to try a build made before Bug 1707310.

Flags: needinfo?(emilio)

I'm suspecting heap corruption caused by servo because there's more than one thread in jemalloc when this hangs and servo is always in the stack of the main thread.

Forgot to NI Rob re asan/tsan.

Flags: needinfo?(rob)

It should not. I only added a convenience method to get an already-addrefed Arc, and that code runs fairly frequently...

Flags: needinfo?(emilio)

The other reason why this might always show servo on the stack is because servo has an uncommon jemalloc setup (we use thread-local arenas, but free stuff on the main thread).

bug 1708968 has a similar stack, with a build from May, 1st. That predates the nsRefreshDriver changes from bug 1708325, so I'm removing the bug link.

Flags: needinfo?(rob)
Flags: needinfo?(matt.woodrow)
See Also: 1708325

(In reply to Paul Bone [:pbone] from comment #16)

Are you able to reproduce this with ASAN or TSAN? If you can find a culprit with mozregression that'd be great, but I understand if it takes a long time to reproduce.

I cannot reproduce this issue on demand. As seen from the timings of the report, sometimes it takes days before I encounter the issue. But I have also seen the issue happening within 10 minutes or so.

What do you hope to see with ASAN/TSAN? I'm not encountering a crash, merely a deadlock.

Side note: I just looked on Bugzilla to try and find related issues. Keywords: freeze, frozen, freezes, lock, deadlock, non-responsive, unresponsive

Not specific to macOS, there is a report of this happening on Windows at bug 1710876.

OS: macOS → All
See Also: → 1710876

(In reply to Rob Wu [:robwu] from comment #24)

(In reply to Paul Bone [:pbone] from comment #16)

Are you able to reproduce this with ASAN or TSAN? If you can find a culprit with mozregression that'd be great, but I understand if it takes a long time to reproduce.

I cannot reproduce this issue on demand. As seen from the timings of the report, sometimes it takes days before I encounter the issue. But I have also seen the issue happening within 10 minutes or so.

Okay, I must have missed that. Sorry.

What do you hope to see with ASAN/TSAN? I'm not encountering a crash, merely a deadlock.

Heap corruption, e.g. an out-of-bounds write that messes with jemalloc's bookkeeping, or threads that are not synchronised.

Side note: I just looked on Bugzilla to try and find related issues. Keywords: freeze, frozen, freezes, lock, deadlock, non-responsive, unresponsive

Thanks.

So in bug 1711582 and bug 1711769 there are some full-process stacks. From what I can tell, the thing that they all have in common is that the main thread is sampled and it runs a SIGPROF signal handler while holding an allocator lock.

There are some stacks with malloc() and some with free() on the main thread. So the question is why does this happen usually with the restyle code on this particular scrollIntoView call. Is it just because it is a common source of long restyles? Or is it something deeper? I suspect it's just the former fwiw, and that the bug should be reproducible without opening / closing tabs if we cause big restyles. I'll try to come up with something.

The only weird thing that the style code does is enabling jemalloc's thread-local arenas for the worker threads in here.

Moving my ni? from bug 1711769 to here in case Gerald has thoughts on how this can happen or any related change that could've landed recently.

Flags: needinfo?(gsquelart)

From what I can tell, the thing that they all have in common is that the main thread is sampled and it runs a SIGPROF signal handler while holding an allocator lock.

I think the issue is not really that the signal handler is called while a lock is being held. It's that the signal handler itself wants to acquire a lock too, via futex_wait. That's not allowed.

Right. The signal handler is what ends up dead-locking.
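
To make the hazard described in the two comments above concrete, here is a toy model in JavaScript. It is only a sketch of the general problem (a signal handler that needs a non-reentrant lock already held by the thread it interrupted), not Gecko or jemalloc code, and not a confirmed diagnosis of the stacks in this bug. `NonReentrantLock`, `signalHandler` and `freeInterruptedBySignal` are hypothetical names, and where a real lock would block forever the model throws so that the snippet terminates.

```javascript
// Toy model only: a real allocator lock blocks the caller instead of throwing.
class NonReentrantLock {
  constructor() {
    this.owner = null;
  }
  acquire(who) {
    if (this.owner !== null) {
      // A real lock would wait here (e.g. in futex_wait). If the waiter is a
      // signal handler running on the owner's own thread, the owner can never
      // resume to release the lock, so the wait never ends.
      throw new Error(`${who} would wait forever: lock held by ${this.owner}`);
    }
    this.owner = who;
  }
  release() {
    this.owner = null;
  }
}

const allocatorLock = new NonReentrantLock();

// Hypothetical handler that allocates, and therefore needs the allocator lock.
function signalHandler() {
  allocatorLock.acquire("signal handler");
  allocatorLock.release();
}

// Models free() on the main thread being interrupted part-way by a signal.
function freeInterruptedBySignal(deliverSignal) {
  allocatorLock.acquire("main thread (free)");
  try {
    deliverSignal(); // the handler runs on top of this frame
  } finally {
    allocatorLock.release();
  }
}

let outcome = "no deadlock";
try {
  freeInterruptedBySignal(signalHandler);
} catch (e) {
  outcome = e.message;
}
console.log(outcome);
```

Run outside the interrupted `free()`, the same handler acquires and releases the lock without trouble, which is why such a hang only shows up when the signal lands at an unlucky moment.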

So can someone who can reproduce this relatively frequently do something like pasting this in the browser toolbox, leaving Firefox open for a bit, and see if it reproduces the issue without closing / opening tabs?

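// Forces a document-wide restyle twice: once when the rule is added
// and once when it is removed.
function bigRestyle() {
  var s = document.createElement("style");
  s.innerHTML = "* { background-color: red; color: red }";
  document.documentElement.appendChild(s);
  // Reading layout forces a synchronous style flush.
  document.documentElement.getBoundingClientRect();
  s.remove();
  document.documentElement.getBoundingClientRect();
}
// Repeat every 50 ms.
setInterval(bigRestyle, 50);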

Something like this might also be worth trying (this one makes Nightly rather unusable of course, hanging the parent process half of the time, but the idea is that the background hang monitor would kick in hopefully):

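// Same as above, but blocks the main thread for a second before each restyle.
function bigRestyle() {
  let start = Date.now();
  while (Date.now() - start < 1000) { /* busy wait for a second */ }
  let s = document.createElement("style");
  s.innerHTML = "* { background-color: red; color: red }";
  document.documentElement.appendChild(s);
  // Reading layout forces a synchronous style flush.
  document.documentElement.getBoundingClientRect();
  s.remove();
  document.documentElement.getBoundingClientRect();
}
// Repeat every 2 seconds, so the process hangs about half of the time.
setInterval(bigRestyle, 2000);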

Huh, then again, comment 0 is on macOS, so something else is going on...

Oh, I think I have an idea of what this might be... One of the unique things the JS code involved does is using a privileged promiseDocumentFlushed API, which I recently refactored in bug 1699844... I wonder if the timing there matches these reports?

And if so, I believe this can be caused by corruption from bug 1716481 (which I just got ni?d on a couple days ago).

See Also: → 1716481

If some of the folks that can repro can use an ASAN nightly for a few days and see if there's any interesting reports that'd be greatly appreciated. I'll try to fix bug 1716481 ASAP anyways.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #32)

So can someone who can reproduce this relatively frequently do something like pasting this in the browser toolbox, leaving Firefox open for a bit, and see if it reproduces the issue without closing / opening tabs?

Neither reproduces the issue for me.

With regular nightlies I can reproduce the hang within minutes. With the ASAN build I haven't been able to so far.
But I'll keep using it for a while as requested.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #29)

So in bug 1711582 and bug 1711769 there are some full-process stacks. From what I can tell, the thing that they all have in common is that the main thread is sampled and it runs a SIGPROF signal handler while holding an allocator lock.
...
Moving my ni? from bug 1711769 to here in case Gerald has thoughts on how this can happen or any related change that could've landed recently.

Thank you Emilio.
Based on your bug 1711769 comment 9, I've filed bug 1717386 (ThreadStackHelper::CollectProfilingStackFrame should not allocate memory).

Now, looking at the other stacks in this bug here, I don't see any use of profiler_suspend_and_sample_thread (which was the main issue there in relation with mallocs in signal handlers), so I doubt they're related, and that's why I've filed bug 1717386 separately.

Flags: needinfo?(gsquelart)

Can I leave this in your capable hands Emilio? You seem to have a lot of the context including Servo and Bug 1716481.

Feel free to bounce this back at me if it's a memory allocator thing or there's something else I can do.

Assignee: pbone → emilio

Another hang, after closing a tab (Bugzilla) via the context menu. Stack is similar to closing via the close button.

Was using 91.0a1 buildID 20210615134418.

Closed tab via Cmd-W (Bugzilla), and it happened again.

Nightly 91.0a1 buildID 20210620210823.

Updated Nightly to latest version, and pressed Cmd-W for each Bugzilla tab that restored. Then the bug happened again. That was quick...

Nightly 91.0a1 buildID 20210621213500

I'd be curious if you can repro now that bug 1717386 landed? That should fix at least some of the Linux deadlocks reported here, and I'm not familiar with the profiler on macOS but it seems it could be the cause there too.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #44)

I'd be curious if you can repro now that bug 1717386 landed? That should fix at least some of the Linux deadlocks reported here, and I'm not familiar with the profiler on macOS but it seems it could be the cause there too.

I cannot easily verify the fix; I can merely try to see if the issue still happens by closing tabs and getting unlucky.
The profiler never appeared in the stacks that I shared, I'm not sure whether it's related. When my issue happens, I am just closing tabs (or switching tabs on some occasion).

I have just updated to 91.0a1 buildID 20210622212907, and sacrificed dozens of my tabs without triggering this bug. That doesn't prove that the bug got fixed however. I'll continue to watch for the issue and keep attaching stacks whenever I encounter this issue. Based on my reports so far, sometimes the issue happens within minutes, and sometimes it takes almost two weeks to reproduce.

(This is not really a qf bug, but something else, but p1 anyhow.)

Whiteboard: [qf] → [qf:p1:responsiveness]
QA Whiteboard: [qa-regression-triage]

Had no luck reproducing this on the latest Nightly 91.0a1 (2021-06-23) (64-bit) while surfing the web, opening, closing, and moving tabs around, and using session restore over a timeframe of 1 hour. Tested on macOS 10.15.

Whiteboard: [qf:p1:responsiveness] → [qf:p1:responsiveness]

Let's call this fixed tentatively given the two suspect bugs are fixed and nobody has been able to repro this afterwards (see comments in bug 1717386).

If someone can repro this again let's reopen.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(mh+mozilla)
Resolution: --- → FIXED
Target Milestone: --- → 91 Branch
Performance Impact: --- → P1
Whiteboard: [qf:p1:responsiveness]