Crash reporter doesn't seem to catch crashes when force quitting after resume from sleep
Categories
(Core :: IPC, defect)
People
(Reporter: jryans, Unassigned)
Attachments
(1 file, 104.13 KB, text/plain)
Every so often, Firefox will hang when I resume my computer from sleep mode. I then force quit it and restart it to restore my session. Usually I would see the Mozilla crash reporter appear after force quitting (I am pretty sure, at least...), but in the last few weeks I see the Apple crash reporter dialog instead. At the very least, I am pretty sure I would _not_ see the Apple dialog in the past. The crashes are typically in the child process. I have attached an example report from the Apple crash reporter. I am on macOS 10.13.6.
Comment 1•5 years ago
Gabriele, not sure whether you have cycles to look into this, or know someone who does. But this looks pretty bad... Note that jryans is still around if we need some more detail on this.
Comment 2•5 years ago
I remember encountering a similar report some time ago. Leaving the NI to test on my Mac.
Comment 3•5 years ago
I wasn't aware that we ever handled "Force Quit", but it ought to be straightforward to test on an older release. I'm pretty sure that does the Darwin equivalent of SIGKILL, which is not something you can handle.
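To illustrate the point about SIGKILL: a minimal Python sketch, assuming a POSIX system (macOS or Linux) and that it runs on the main thread. The helper name `can_install_handler` is made up for this example and has nothing to do with the Firefox crash reporter. It only shows that the OS refuses to let a process install a handler for SIGKILL, while an ordinary signal such as SIGTERM can be handled.

```python
import signal


def can_install_handler(signum):
    """Return True if the OS lets this process install a handler for signum."""
    try:
        # Try to register a do-nothing handler for the signal.
        previous = signal.signal(signum, lambda signo, frame: None)
    except OSError:
        # The kernel rejects handlers for SIGKILL (and SIGSTOP).
        return False
    # Put the old handler back so the probe has no lasting effect.
    # `previous` is None when the old handler was not installed from Python.
    if previous is not None:
        signal.signal(signum, previous)
    return True


if __name__ == "__main__":
    print("SIGTERM can be handled:", can_install_handler(signal.SIGTERM))
    print("SIGKILL can be handled:", can_install_handler(signal.SIGKILL))
```

If Force Quit really does end in a SIGKILL-style kill, no in-process crash handler gets a chance to run, whatever it registered beforehand.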
Comment 4•5 years ago
Force quitting doesn't trigger the crash reporter. If it appeared before, it might have been because some hang detector had a chance to kill Firefox before it was force quit. Do you still have some submitted reports for those occurrences on crash-stats?
Reporter
Comment 5•5 years ago
I am not sure I understand the question... Since the crashes went to the Apple crash reporter instead of Mozilla's, there was no option to submit them to crash-stats. I have Apple format *.crash files listed in Console.app similar to the attachment on this bug. If there's some way to manually submit those to crash-stats, please let me know. What's a good way to inject a crash that should definitely be caught by the crash reporter, so I can check whether it's working at all for me? About the hang detector, have there been changes in that area recently? Mainly I am just worried that other people could be seeing the same as me, so potentially there are many crash reports no longer being collected.
Comment 6•5 years ago
Sorry for the confusion, I meant whether you still have submissions of the crashes you sent back when the crash reporter was still showing up.
Reporter
Comment 7•5 years ago
Ah okay, no recent crashes submitted. According to about:crashes, 2018-10-20 is the last time I was able to submit a crash to Mozilla, but in reality I do see them every few days or so.
Comment 8•5 years ago
(In reply to J. Ryan Stinnett [:jryans] from comment #7)
> Ah okay, no recent crashes submitted. According to about:crashes, 2018-10-20
> is the last time I was able to submit a crash to Mozilla,
If you still have the crash report for that one it would be helpful to figure out what's going on.
Comment 9•5 years ago
This is still concerning to me, but also unlikely to go anywhere in time for 65 at this point. Will leave this as fix-optional in case a low-risk fix does arrive at some point.
Comment 10•5 years ago
jryans, can you try the crash me now extension to force a crash?
https://github.com/rhelmer/webext-experiment-crashme
Comment 11•5 years ago
Stephen, have you seen other recent problems with hangs on waking from sleep, on macOS 10.13?
Is there any way we can usefully investigate? Telemetry to check?
Reporter
Comment 12•5 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #8)
> (In reply to J. Ryan Stinnett [:jryans] from comment #7)
> > Ah okay, no recent crashes submitted. According to about:crashes, 2018-10-20
> > is the last time I was able to submit a crash to Mozilla,
> If you still have the crash report for that one it would be helpful to
> figure out what's going on.
The report from 2018-10 is https://crash-stats.mozilla.org/report/index/aaea24a2-bb78-4966-9a6e-9bd040181020, but I don't think it's very helpful here, as it's not the same kind of crash this bug is about. The resume from sleep crashes aren't being caught by the Mozilla crash reporter, so I don't have Mozilla crash reports to share for them.
Reporter
Comment 13•5 years ago
(In reply to Liz Henry (:lizzard) (use needinfo) from comment #10)
> jryans, can you try the crash me now extension to force a crash?
> https://github.com/rhelmer/webext-experiment-crashme
Yes, using this add-on, I was able to trigger a crash and successfully submit a report to Mozilla:
https://crash-stats.mozilla.org/report/index/97ed020c-cabf-4b8a-85be-7f99f0190118
So, this confirms I can still report crashes in the general case.
The unsolved issue seems to be getting the crash reporter to correctly capture resume from sleep crashes so that they load in the Mozilla crash reporter (instead of Apple's) and can be submitted.
Comment 14•5 years ago
(In reply to Liz Henry (:lizzard) (use needinfo) from comment #11)
> Stephen, have you seen other recent problems with hangs on waking from sleep, on macOS 10.13?
> Is there any way we can usefully investigate? Telemetry to check?
The only other recent issue that I'm aware of is bug 1516367. However, that applies to all crashes, not just wake from sleep. Someone who's more familiar with the crash reporter may be able to help here.
Comment 15•5 years ago
Anthony, can anyone from your team investigate crashing on wake from sleep?
Comment 16•5 years ago
This likely isn't a priority for 66, but I think it still could use investigation to make sure there isn't a widespread problem. I'll follow up in email.
Updated•5 years ago
It seems that the problem at hand is that resume after suspend is broken.
It is not clear that we should be generating a crash report for a force kill, because if nothing else, it could cause false positives. While we could discuss that, it would likely result in more happiness for us to focus on the resume issue. I have certainly seen resume issues in the past when I used a Mac and had trouble generating a crash report. I don't recall whether I ended up filing a bug about it.
Attaching a debugger would help, as would getting a crash report for the right process (i.e. the one that is hanging or probably deadlocking). Perhaps there is also an issue with deadlock detection not waking up properly either.
Eric - can you discuss this with Nathan and/or Gabriele to figure out a way to get this ticket moving?
Comment 18•5 years ago
There does seem to be a rash of Mac wake-related issues, along the lines of bug 1201401 (crash in CVCGDisplayLink::getDisplayTimes, Mac coming out of sleep (waking) with external monitor).
Comment 19•5 years ago
jryans, are you still seeing this?
Either way, comment 3 and comment 4 indicate we don't expect to get a crash report when force quitting. I'm inclined to wontfix this, but we could morph it into a bug that deals with the underlying issue. AFAICT from the attached crash report, there is some sort of deadlock where we're trying to send a gfxCriticalError and blocking the main thread waiting on a mutex. I'm going to at least move this over to IPC for now so that they can take a look.
Thread 0:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff5ae90a46 __psynch_mutexwait + 10
1 libsystem_pthread.dylib 0x00007fff5b058b9d _pthread_mutex_lock_wait + 83
2 libsystem_pthread.dylib 0x00007fff5b0564c8 _pthread_mutex_lock_slow + 253
3 libmozglue.dylib 0x00000001079661ae mozilla::detail::MutexImpl::lock() + 142
4 XUL 0x000000010a2b6c07 mozilla::ipc::MessageChannel::Send(IPC::Message*) + 647
5 XUL 0x000000010a38e00b mozilla::dom::PContentChild::SendGraphicsError(nsTString<char> const&) + 443
6 XUL 0x000000010ad3d1fb CrashStatsLogForwarder::Log(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 1195
7 XUL 0x000000010aa35a79 mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::Flush() + 185
8 XUL 0x000000010acf8adc mozilla::layers::CompositorBridgeChild::ActorDestroy(mozilla::ipc::IProtocol::ActorDestroyReason) + 348
9 XUL 0x000000010a6f28d6 mozilla::layers::PCompositorBridgeChild::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) + 1366
10 XUL 0x000000010a36a61f mozilla::layers::PCompositorManagerChild::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) + 111
11 XUL 0x000000010a36a927 mozilla::layers::PCompositorManagerChild::OnChannelError() + 23
12 XUL 0x000000010a2c8d97 mozilla::detail::RunnableMethodImpl<mozilla::ipc::MessageChannel*, void (mozilla::ipc::MessageChannel::*)(), false, (mozilla::RunnableKind)1>::Run() + 39
13 XUL 0x0000000109bec243 nsThread::ProcessNextEvent(bool, bool*) + 2819
14 XUL 0x0000000109beeca8 NS_ProcessNextEvent(nsIThread*, bool) + 56
15 XUL 0x000000010a2c0be7 mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) + 279
16 XUL 0x000000010cbb3f0e nsBaseAppShell::Run() + 126
17 XUL 0x000000010cc34157 nsAppShell::Run() + 151
18 XUL 0x000000010e265378 XRE_RunAppShell() + 488
19 XUL 0x000000010e265014 XRE_InitChildProcess(int, char**, XREChildData const*) + 4196
20 org.mozilla.plugincontainer 0x00000001075b9f39 main + 89
21 libdyld.dylib 0x00007fff5ad40015 start + 1
Thread 3 Crashed:: Chrome_~dThread
0 XUL 0x000000010a2be858 mozilla::ipc::MessageChannel::OnChannelErrorFromLink() + 696
1 XUL 0x000000010a2c0475 non-virtual thunk to mozilla::ipc::ProcessLink::OnChannelError() + 53
2 XUL 0x000000010a297dd4 event_process_active_single_queue + 1684
3 XUL 0x000000010a295e70 event_base_loop + 1824
4 XUL 0x000000010a2822bb base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) + 331
5 XUL 0x000000010a28933b base::Thread::ThreadMain() + 1019
6 XUL 0x000000010a2859ba ThreadFunc(void*) (.llvm.14947838784875774767) + 10
7 libsystem_pthread.dylib 0x00007fff5b058661 _pthread_body + 340
8 libsystem_pthread.dylib 0x00007fff5b05850d _pthread_start + 377
9 libsystem_pthread.dylib 0x00007fff5b057bf9 thread_start + 13
Reporter
Comment 20•5 years ago
I no longer seem to experience this with Nightly in the last month or so, so perhaps the underlying issue has been fixed.
Comment 21•5 years ago
In comment #19, the lock the main thread is waiting for is probably MessageChannel::mMonitor, which the IPC I/O thread (Chrome_ChildThread; we really should rename that) would have to be holding to enter MessageChannel::OnChannelErrorFromLink.
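As a rough illustration of that pattern (not the actual IPC code): a small Python sketch in which `monitor` stands in for MessageChannel::mMonitor, and a worker thread stands in for the I/O thread that holds it while handling the channel error. All names here are made up for the example. The main thread's attempt to take the lock stalls only for as long as the other thread holds it.

```python
import threading

# Stand-in for MessageChannel::mMonitor (hypothetical, for illustration only).
monitor = threading.Lock()


def demo():
    holding = threading.Event()   # set once the "I/O thread" owns the monitor
    release = threading.Event()   # set when the "I/O thread" may let go

    def io_thread():
        # Plays the role of the thread inside OnChannelErrorFromLink.
        with monitor:
            holding.set()
            release.wait()

    worker = threading.Thread(target=io_thread)
    worker.start()
    holding.wait()

    # Plays the role of the main thread in Send(): it cannot get the monitor.
    got_it = monitor.acquire(timeout=0.1)
    if got_it:
        monitor.release()
    blocked = not got_it

    # Once the holder lets go, the main thread proceeds normally.
    release.set()
    worker.join()
    acquired_later = monitor.acquire(timeout=1.0)
    if acquired_later:
        monitor.release()
    return blocked, acquired_later


if __name__ == "__main__":
    print(demo())
```

A stack sample taken during the wait would show the main thread parked in the lock call, much like thread 0 in the attached report, even though nothing is permanently stuck.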
But this doesn't look like a deadlock: see bug 1354200, and specifically the comment that was added with it; this is probably just the child process reacting to the parent process exiting, which used to be a MOZ_CRASH and is now an _exit because it was the opposite of helpful to invoke the OS crash reporter in that case (see also bug 1518470).
The original hang may have been caused by something in the parent process that was destroyed by the Force Quit.
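For anyone wondering why the MOZ_CRASH versus _exit distinction matters to the OS: a minimal Python sketch, assuming a POSIX system. The helper `child_status` is hypothetical and only for illustration; `os.abort()` is used as a stand-in for a deliberate crash, not as a claim about how MOZ_CRASH is implemented. A process that calls `_exit` ends with an ordinary exit status, while one that aborts is reported as killed by a signal, which is the kind of termination an OS-level crash reporter reacts to.

```python
import signal
import subprocess
import sys


def child_status(snippet):
    """Run a Python snippet in a child process and return its exit status.

    On POSIX, subprocess reports death-by-signal as a negative return code.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    quiet = child_status("import os; os._exit(0)")   # plain exit, no signal
    crashed = child_status("import os; os.abort()")  # dies from SIGABRT
    print("_exit status:", quiet)
    print("abort status:", crashed, "(SIGABRT is", int(signal.SIGABRT), ")")
```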
Comment 22•5 years ago
As a crash reporter bug this seems to be WONTFIX (comment #4); as an IPC bug (that we were causing spurious OS-level crash reports as a side-effect of force-quitting the main process) this looks like a duplicate of bug 1354200.