Crash reporter doesn't seem to catch crashes when force quitting after resume from sleep
Categories
(Core :: IPC, defect)
Tracking
()
People
(Reporter: jryans, Unassigned)
Details
Attachments
(1 file)
104.13 KB, text/plain
Comment 1•7 years ago
Comment 2•7 years ago
Comment 3•7 years ago
Comment 4•7 years ago
Comment 5•7 years ago (Reporter)
Comment 6•7 years ago
Comment 7•7 years ago (Reporter)
Comment 8•7 years ago
Comment 9•7 years ago
This is still concerning to me, but also unlikely to go anywhere in time for 65 at this point. Will leave this as fix-optional in case a low-risk fix does arrive at some point.
Comment 10•7 years ago
jryans, can you try the crash me now extension to force a crash?
https://github.com/rhelmer/webext-experiment-crashme
Comment 11•7 years ago
Stephen, have you seen other recent problems with hangs on waking from sleep, on macOS 10.13?
Is there any way we can usefully investigate? Telemetry to check?
Comment 12•7 years ago (Reporter)
(In reply to Gabriele Svelto [:gsvelto] from comment #8)
> (In reply to J. Ryan Stinnett [:jryans] from comment #7)
> > Ah okay, no recent crashes submitted. According to about:crashes, 2018-10-20
> > is the last time I was able to submit a crash to Mozilla,
> If you still have the crash report for that one it would be helpful to
> figure out what's going on.
The report from 2018-10 is https://crash-stats.mozilla.org/report/index/aaea24a2-bb78-4966-9a6e-9bd040181020, but I don't think it's very helpful here, as it's not the same kind of crash this bug is about. The resume from sleep crashes aren't being caught by the Mozilla crash reporter, so I don't have Mozilla crash reports to share for them.
Comment 13•7 years ago (Reporter)
(In reply to Liz Henry (:lizzard) (use needinfo) from comment #10)
> jryans, can you try the crash me now extension to force a crash?
> https://github.com/rhelmer/webext-experiment-crashme
Yes, using this add-on, I was able to trigger a crash and successfully submit a report to Mozilla:
https://crash-stats.mozilla.org/report/index/97ed020c-cabf-4b8a-85be-7f99f0190118
So, this confirms I can still report crashes in the general case.
The unsolved issue seems to be getting the crash reporter to correctly capture resume from sleep crashes, so that they are handled by the Mozilla crash reporter (instead of Apple's) and can be submitted.
Comment 14•7 years ago
(In reply to Liz Henry (:lizzard) (use needinfo) from comment #11)
> Stephen, have you seen other recent problems with hangs on waking from sleep, on macOS 10.13?
> Is there any way we can usefully investigate? Telemetry to check?
The only other recent issue that I'm aware of is bug 1516367. However, that applies to all crashes, not just wake from sleep. Someone who's more familiar with the crash reporter may be able to help here.
Comment 15•7 years ago
Anthony, can anyone from your team investigate crashing on wake from sleep?
Comment 16•7 years ago
This likely isn't a priority for 66, but I think it still could use investigation to make sure there isn't a widespread problem. I'll follow up in email.
Comment 17•7 years ago
It seems that the problem at hand is that resume after suspend is broken.
It is not clear that we should be generating a crash report for a force kill; if nothing else, it could cause false positives. We could discuss that, but focusing on the resume issue would likely be more productive. I have certainly seen resume issues in the past when I used a Mac, and had trouble generating a crash report for them. I don't recall whether I ended up filing a bug about it.
Attaching a debugger would help, as would getting a crash report for the right process (i.e. the one that is hanging or probably deadlocking). Perhaps there is also an issue with deadlock detection not waking up properly either.
Eric - can you discuss this with Nathan and/or Gabriele to figure out a way to get this ticket moving?
Comment 18•7 years ago
There does seem to be a rash of Mac wake-related issues, along the lines of bug 1201401 (crash in CVCGDisplayLink::getDisplayTimes, Mac coming out of sleep (waking) with external monitor).
Comment 19•6 years ago
jryans, are you still seeing this?
Either way, comment 3 and comment 4 indicate we don't expect to get a crash report when force quitting. I'm inclined to wontfix this, but we could morph it into a bug that deals with the underlying issue. AFAICT from the attached crash report, there is some sort of deadlock where we're trying to send a gfxCriticalError and the main thread is blocked waiting on a mutex. I'm going to at least move this over to IPC for now so that they can take a look.
Thread 0:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff5ae90a46 __psynch_mutexwait + 10
1 libsystem_pthread.dylib 0x00007fff5b058b9d _pthread_mutex_lock_wait + 83
2 libsystem_pthread.dylib 0x00007fff5b0564c8 _pthread_mutex_lock_slow + 253
3 libmozglue.dylib 0x00000001079661ae mozilla::detail::MutexImpl::lock() + 142
4 XUL 0x000000010a2b6c07 mozilla::ipc::MessageChannel::Send(IPC::Message*) + 647
5 XUL 0x000000010a38e00b mozilla::dom::PContentChild::SendGraphicsError(nsTString<char> const&) + 443
6 XUL 0x000000010ad3d1fb CrashStatsLogForwarder::Log(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 1195
7 XUL 0x000000010aa35a79 mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::Flush() + 185
8 XUL 0x000000010acf8adc mozilla::layers::CompositorBridgeChild::ActorDestroy(mozilla::ipc::IProtocol::ActorDestroyReason) + 348
9 XUL 0x000000010a6f28d6 mozilla::layers::PCompositorBridgeChild::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) + 1366
10 XUL 0x000000010a36a61f mozilla::layers::PCompositorManagerChild::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) + 111
11 XUL 0x000000010a36a927 mozilla::layers::PCompositorManagerChild::OnChannelError() + 23
12 XUL 0x000000010a2c8d97 mozilla::detail::RunnableMethodImpl<mozilla::ipc::MessageChannel*, void (mozilla::ipc::MessageChannel::*)(), false, (mozilla::RunnableKind)1>::Run() + 39
13 XUL 0x0000000109bec243 nsThread::ProcessNextEvent(bool, bool*) + 2819
14 XUL 0x0000000109beeca8 NS_ProcessNextEvent(nsIThread*, bool) + 56
15 XUL 0x000000010a2c0be7 mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) + 279
16 XUL 0x000000010cbb3f0e nsBaseAppShell::Run() + 126
17 XUL 0x000000010cc34157 nsAppShell::Run() + 151
18 XUL 0x000000010e265378 XRE_RunAppShell() + 488
19 XUL 0x000000010e265014 XRE_InitChildProcess(int, char**, XREChildData const*) + 4196
20 org.mozilla.plugincontainer 0x00000001075b9f39 main + 89
21 libdyld.dylib 0x00007fff5ad40015 start + 1
Thread 3 Crashed:: Chrome_~dThread
0 XUL 0x000000010a2be858 mozilla::ipc::MessageChannel::OnChannelErrorFromLink() + 696
1 XUL 0x000000010a2c0475 non-virtual thunk to mozilla::ipc::ProcessLink::OnChannelError() + 53
2 XUL 0x000000010a297dd4 event_process_active_single_queue + 1684
3 XUL 0x000000010a295e70 event_base_loop + 1824
4 XUL 0x000000010a2822bb base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) + 331
5 XUL 0x000000010a28933b base::Thread::ThreadMain() + 1019
6 XUL 0x000000010a2859ba ThreadFunc(void*) (.llvm.14947838784875774767) + 10
7 libsystem_pthread.dylib 0x00007fff5b058661 _pthread_body + 340
8 libsystem_pthread.dylib 0x00007fff5b05850d _pthread_start + 377
9 libsystem_pthread.dylib 0x00007fff5b057bf9 thread_start + 13
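For anyone triaging a similar report, a rough way to see at a glance which thread is marked as crashed and which threads are parked in a mutex wait is to parse the thread sections of the Apple crash log. The following is only a sketch: the helper names (`parse_threads`, `waiting_on_mutex`) are made up for illustration, the regular expressions assume the flattened line layout shown above, and the "mutex" substring check is a crude heuristic rather than real deadlock detection.

```python
import re

# Matches thread headers such as "Thread 0:: Dispatch queue: ..." or
# "Thread 3 Crashed:: Chrome_~dThread".
THREAD_RE = re.compile(r"^Thread (\d+)( Crashed)?::\s*(.*)$")
# Matches frame lines such as
# "4 XUL 0x000000010a2b6c07 mozilla::ipc::MessageChannel::Send(IPC::Message*) + 647".
FRAME_RE = re.compile(r"^(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(.*?)(?: \+ \d+)?$")


def parse_threads(report):
    """Return one dict per thread: number, name, crashed flag, and frames."""
    threads = []
    for line in report.splitlines():
        line = line.strip()
        header = THREAD_RE.match(line)
        if header:
            threads.append({
                "number": int(header.group(1)),
                "crashed": header.group(2) is not None,
                "name": header.group(3),
                "frames": [],  # list of (module, symbol) tuples, innermost first
            })
            continue
        frame = FRAME_RE.match(line)
        if frame and threads:
            threads[-1]["frames"].append((frame.group(2), frame.group(4)))
    return threads


def waiting_on_mutex(thread):
    """Heuristic: True if one of the four innermost frames mentions a mutex."""
    return any("mutex" in symbol.lower() for _, symbol in thread["frames"][:4])


# A few lines copied from the trace above, used as sample input.
SAMPLE = """\
Thread 0:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff5ae90a46 __psynch_mutexwait + 10
1 libsystem_pthread.dylib 0x00007fff5b058b9d _pthread_mutex_lock_wait + 83
4 XUL 0x000000010a2b6c07 mozilla::ipc::MessageChannel::Send(IPC::Message*) + 647
Thread 3 Crashed:: Chrome_~dThread
0 XUL 0x000000010a2be858 mozilla::ipc::MessageChannel::OnChannelErrorFromLink() + 696
"""

for t in parse_threads(SAMPLE):
    if t["crashed"]:
        state = "crashed"
    elif waiting_on_mutex(t):
        state = "waiting on mutex"
    else:
        state = "other"
    print(t["number"], t["name"], state)
```

A thread flagged as waiting on a mutex is not necessarily deadlocked; it only tells you where to look for the thread that holds the lock.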
Comment 20•6 years ago (Reporter)
I no longer seem to experience this with Nightly in the last month or so, so perhaps the underlying issue has been fixed.
Comment 21•6 years ago
In comment #19, the lock the main thread is waiting for is probably MessageChannel::mMonitor, which the IPC I/O thread (Chrome_ChildThread; we really should rename that) would have to be holding to enter MessageChannel::OnChannelErrorFromLink.
But this doesn't look like a deadlock: see bug 1354200, and specifically the comment that was added with it; this is probably just the child process reacting to the parent process exiting, which used to be a MOZ_CRASH and is now an _exit because it was the opposite of helpful to invoke the OS crash reporter in that case (see also bug 1518470).
The original hang may have been caused by something in the parent process that was destroyed by the Force Quit.
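To illustrate the general pattern described here (a child process noticing that its parent's end of the channel has gone away and exiting quietly rather than crashing), here is a toy sketch. It is an analogy only, not Gecko's IPC code: the "channel" is just the child's stdin pipe, and `CHILD_SOURCE` and `child_exit_code_after_channel_close` are hypothetical names invented for this example.

```python
import subprocess
import sys

# Child program: block reading the "channel" (stdin). When the other end is
# closed the read returns EOF, and the child leaves via os._exit(0) instead
# of raising, so nothing that looks like a crash is produced.
CHILD_SOURCE = r"""
import os, sys
while sys.stdin.buffer.read(4096):
    pass          # keep draining messages until the channel closes
os._exit(0)       # the other side is gone: exit quietly, do not crash
"""


def child_exit_code_after_channel_close():
    """Start the child, send one message, close the channel, return its exit code."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
    )
    child.stdin.write(b"hello")
    child.stdin.close()  # simulate the parent side of the channel going away
    return child.wait(timeout=30)


if __name__ == "__main__":
    print("child exit code:", child_exit_code_after_channel_close())
```

The point of the design is that an abort-style exit in this situation would hand the child to the OS crash reporter even though the child did nothing wrong, whereas a plain exit with status 0 leaves no spurious report behind.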
Comment 22•6 years ago
As a crash reporter bug this seems to be WONTFIX (comment #4); as an IPC bug (that we were causing spurious OS-level crash reports as a side-effect of force-quitting the main process) this looks like a duplicate of bug 1354200.