Closed Bug 1501575 Opened 6 years ago Closed 5 years ago

Crash reporter doesn't seem to catch crashes when force quitting after resume from sleep

Categories

(Core :: IPC, defect)

Unspecified
macOS
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1354200
Tracking Status
firefox65 - wontfix
firefox66 - wontfix
firefox67 - fix-optional
firefox68 --- affected

People

(Reporter: jryans, Unassigned)

Details

Attachments

(1 file)

Every so often, Firefox will hang when I resume my computer from sleep mode.  I then force quit it and restart it to restore my session.  Usually I would see the Mozilla crash reporter appear after force quitting (I am pretty sure at least...), but in the last few weeks, I see the Apple crash reporter dialog instead.  At the very least, I am pretty sure I would _not_ see the Apple dialog in the past.

The crashes are typically in the child process.  I have attached an example report from the Apple crash reporter.

I am on macOS 10.13.6.
Gabriele, not sure whether you have cycles to look into this, or know someone who does. But it looks pretty bad...

Note that jryans is still around if we need some more detail on this.
Flags: needinfo?(gsvelto)
I remember encountering a similar report some time ago. Leaving the NI to test on my Mac.
I wasn't aware that we ever handled "Force Quit", but it ought to be straightforward to test on an older release. I'm pretty sure that does the Darwin equivalent of SIGKILL which is not something you can handle.
Force quitting doesn't trigger the crash reporter. If it appeared before it might have been because some hang detector had a chance of killing Firefox before it was force-quitted. Do you still have some submitted reports for those occurrences on crash-stats?
Flags: needinfo?(gsvelto) → needinfo?(jryans)
I am not sure I understand the question... Since the crashes went to the Apple crash reporter instead of Mozilla's, there was no option to submit them to crash-stats. I have Apple format *.crash files listed in Console.app similar to the attachment on this bug. If there's some way to manually submit those to crash-stats, please let me know.

What's a good way to inject a crash that should definitely be caught by the crash reporter, so I can check whether it's working at all for me?

About the hang detector, have there been changes in that area recently? Mainly I am just worried that other people could be seeing the same as me, so potentially there are many crash reports no longer being collected.
Flags: needinfo?(jryans) → needinfo?(gsvelto)
Sorry for the confusion, I meant whether you still had the submissions from when the crash reporter was still showing up.
Flags: needinfo?(gsvelto)
Ah okay, no recent crashes submitted. According to about:crashes, 2018-10-20 is the last time I was able to submit a crash to Mozilla, but in reality I do see them every few days or so.
(In reply to J. Ryan Stinnett [:jryans] from comment #7)
> Ah okay, no recent crashes submitted. According to about:crashes, 2018-10-20
> is the last time I was able to submit a crash to Mozilla,

If you still have the crash report for that one it would be helpful to figure out what's going on.

This is still concerning to me, but also unlikely to go anywhere in time for 65 at this point. Will leave this as fix-optional in case a low-risk fix does arrive at some point.

jryans, can you try the crash me now extension to force a crash?
https://github.com/rhelmer/webext-experiment-crashme

Flags: needinfo?(jryans)

Stephen, have you seen other recent problems with hangs on waking from sleep, on macOS 10.13?
Is there any way we can usefully investigate? Telemetry to check?

Flags: needinfo?(spohl.mozilla.bugs)

(In reply to Gabriele Svelto [:gsvelto] from comment #8)
> (In reply to J. Ryan Stinnett [:jryans] from comment #7)
> > Ah okay, no recent crashes submitted. According to about:crashes, 2018-10-20
> > is the last time I was able to submit a crash to Mozilla,
>
> If you still have the crash report for that one it would be helpful to
> figure out what's going on.

The report from 2018-10 is https://crash-stats.mozilla.org/report/index/aaea24a2-bb78-4966-9a6e-9bd040181020, but I don't think it's very helpful here, as it's not the same kind of crash this bug is about. The resume from sleep crashes aren't being caught by the Mozilla crash reporter, so I don't have Mozilla crash reports to share for them.

(In reply to Liz Henry (:lizzard) (use needinfo) from comment #10)
> jryans, can you try the crash me now extension to force a crash?
> https://github.com/rhelmer/webext-experiment-crashme

Yes, using this add-on, I was able to trigger a crash and successfully submit a report to Mozilla:

https://crash-stats.mozilla.org/report/index/97ed020c-cabf-4b8a-85be-7f99f0190118

So, this confirms I can still report crashes in the general case.

The unsolved issue seems to be getting the crash reporter to correctly capture resume-from-sleep crashes so that they load in the Mozilla crash reporter (instead of Apple's) and can be submitted.

Flags: needinfo?(jryans)

(In reply to Liz Henry (:lizzard) (use needinfo) from comment #11)
> Stephen, have you seen other recent problems with hangs on waking from sleep, on macOS 10.13?
> Is there any way we can usefully investigate? Telemetry to check?

The only other recent issue that I'm aware of is bug 1516367. However, that applies to all crashes, not just wake from sleep. Someone who's more familiar with the crash reporter may be able to help here.

Flags: needinfo?(spohl.mozilla.bugs)

Anthony, can anyone from your team investigate crashing on wake from sleep?

Flags: needinfo?(ajones)

This likely isn't a priority for 66, but I think it still could use investigation to make sure there isn't a widespread problem. I'll follow up in email.

It seems that the problem at hand is that resume after suspend is broken.

It is not clear that we should be generating a crash report for a force kill, because if nothing else, it could cause false positives. While we could discuss that, it would likely result in more happiness for us to focus on the resume issue. I have certainly seen resume issues in the past when I used a Mac and had trouble generating a crash report. I don't recall whether I ended up filing a bug about it.

Attaching a debugger would help, as would getting a crash report for the right process (i.e. the one that is hanging or probably deadlocking). Perhaps there is also an issue with deadlock detection not waking up properly either.

Eric - can you discuss this with Nathan and/or Gabriele to figure out a way to get this ticket moving?

Flags: needinfo?(ajones) → needinfo?(erahm)

There does seem to be a rash of Mac wake-related issues, along the lines of bug 1201401 (crash in CVCGDisplayLink::getDisplayTimes, Mac coming out of sleep (waking) with external monitor).

jryans, are you still seeing this?

Either way, comment 3 and comment 4 indicate we don't expect to get a crash report when force quitting. I'm inclined to wontfix this, but we could morph it into a bug that deals with the underlying issue. AFAICT from the attached crash report there is some sort of deadlock where we're trying to send a gfxCriticalError and blocking the main thread waiting on a mutex. I'm going to at least move this over to IPC for now so that they can take a look.

Thread 0:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	0x00007fff5ae90a46 __psynch_mutexwait + 10
1   libsystem_pthread.dylib       	0x00007fff5b058b9d _pthread_mutex_lock_wait + 83
2   libsystem_pthread.dylib       	0x00007fff5b0564c8 _pthread_mutex_lock_slow + 253
3   libmozglue.dylib              	0x00000001079661ae mozilla::detail::MutexImpl::lock() + 142
4   XUL                           	0x000000010a2b6c07 mozilla::ipc::MessageChannel::Send(IPC::Message*) + 647
5   XUL                           	0x000000010a38e00b mozilla::dom::PContentChild::SendGraphicsError(nsTString<char> const&) + 443
6   XUL                           	0x000000010ad3d1fb CrashStatsLogForwarder::Log(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 1195
7   XUL                           	0x000000010aa35a79 mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::Flush() + 185
8   XUL                           	0x000000010acf8adc mozilla::layers::CompositorBridgeChild::ActorDestroy(mozilla::ipc::IProtocol::ActorDestroyReason) + 348
9   XUL                           	0x000000010a6f28d6 mozilla::layers::PCompositorBridgeChild::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) + 1366
10  XUL                           	0x000000010a36a61f mozilla::layers::PCompositorManagerChild::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) + 111
11  XUL                           	0x000000010a36a927 mozilla::layers::PCompositorManagerChild::OnChannelError() + 23
12  XUL                           	0x000000010a2c8d97 mozilla::detail::RunnableMethodImpl<mozilla::ipc::MessageChannel*, void (mozilla::ipc::MessageChannel::*)(), false, (mozilla::RunnableKind)1>::Run() + 39
13  XUL                           	0x0000000109bec243 nsThread::ProcessNextEvent(bool, bool*) + 2819
14  XUL                           	0x0000000109beeca8 NS_ProcessNextEvent(nsIThread*, bool) + 56
15  XUL                           	0x000000010a2c0be7 mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) + 279
16  XUL                           	0x000000010cbb3f0e nsBaseAppShell::Run() + 126
17  XUL                           	0x000000010cc34157 nsAppShell::Run() + 151
18  XUL                           	0x000000010e265378 XRE_RunAppShell() + 488
19  XUL                           	0x000000010e265014 XRE_InitChildProcess(int, char**, XREChildData const*) + 4196
20  org.mozilla.plugincontainer   	0x00000001075b9f39 main + 89
21  libdyld.dylib                 	0x00007fff5ad40015 start + 1
Thread 3 Crashed:: Chrome_~dThread
0   XUL                           	0x000000010a2be858 mozilla::ipc::MessageChannel::OnChannelErrorFromLink() + 696
1   XUL                           	0x000000010a2c0475 non-virtual thunk to mozilla::ipc::ProcessLink::OnChannelError() + 53
2   XUL                           	0x000000010a297dd4 event_process_active_single_queue + 1684
3   XUL                           	0x000000010a295e70 event_base_loop + 1824
4   XUL                           	0x000000010a2822bb base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) + 331
5   XUL                           	0x000000010a28933b base::Thread::ThreadMain() + 1019
6   XUL                           	0x000000010a2859ba ThreadFunc(void*) (.llvm.14947838784875774767) + 10
7   libsystem_pthread.dylib       	0x00007fff5b058661 _pthread_body + 340
8   libsystem_pthread.dylib       	0x00007fff5b05850d _pthread_start + 377
9   libsystem_pthread.dylib       	0x00007fff5b057bf9 thread_start + 13
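For anyone sifting through several of these Apple-format *.crash files, a small script can pull out the crashed thread and its frames. The following is a rough sketch only: the regular expressions are assumptions based on the layout of the excerpt above, not on any documented format, and `crashed_thread` is a name invented for this example.

```python
import re
from typing import List, Optional, Tuple

# Assumed layout, based on the excerpt above:
#   "Thread 3 Crashed:: <name>"           thread header
#   "0   XUL   0x... symbol + offset"     frame line
THREAD_RE = re.compile(r"^Thread (\d+)( Crashed)?:")
FRAME_RE = re.compile(r"^(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(.+)$")

Frame = Tuple[str, str]  # (module, symbol without the trailing "+ offset")


def crashed_thread(report: str) -> Optional[Tuple[int, List[Frame]]]:
    """Return (thread number, frames) for the thread marked 'Crashed', if any."""
    crashed_id: Optional[int] = None
    in_crashed = False
    frames: List[Frame] = []
    for line in report.splitlines():
        header = THREAD_RE.match(line)
        if header:
            if in_crashed:
                break  # the next thread header ends the crashed thread's frames
            in_crashed = header.group(2) is not None
            if in_crashed:
                crashed_id = int(header.group(1))
            continue
        frame = FRAME_RE.match(line)
        if in_crashed and frame:
            symbol = re.sub(r" \+ \d+$", "", frame.group(4))
            frames.append((frame.group(2), symbol))
    if crashed_id is None:
        return None
    return crashed_id, frames


# Two-thread sample shortened from the report attached to this bug.
SAMPLE = (
    "Thread 0:: Dispatch queue: com.apple.main-thread\n"
    "0   libsystem_kernel.dylib        \t0x00007fff5ae90a46 __psynch_mutexwait + 10\n"
    "Thread 3 Crashed:: Chrome_~dThread\n"
    "0   XUL                           \t0x000000010a2be858 "
    "mozilla::ipc::MessageChannel::OnChannelErrorFromLink() + 696\n"
    "1   XUL                           \t0x000000010a2c0475 "
    "non-virtual thunk to mozilla::ipc::ProcessLink::OnChannelError() + 53\n"
)

if __name__ == "__main__":
    print(crashed_thread(SAMPLE))
```

Module names containing spaces would not match the frame pattern as written, so treat it as a starting point rather than a complete parser.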
Component: Crash Reporting → IPC
Flags: needinfo?(erahm) → needinfo?(jryans)
Product: Toolkit → Core

I no longer seem to experience this with Nightly in the last month or so, so perhaps the underlying issue has been fixed.

Flags: needinfo?(jryans)

In comment #19, the lock the main thread is waiting for is probably MessageChannel::mMonitor, which the IPC I/O thread (Chrome_ChildThread; we really should rename that) would have to be holding to enter MessageChannel::OnChannelErrorFromLink.

But this doesn't look like a deadlock: see bug 1354200, and specifically the comment that was added with it; this is probably just the child process reacting to the parent process exiting, which used to be a MOZ_CRASH and is now an _exit because it was the opposite of helpful to invoke the OS crash reporter in that case (see also bug 1518470).
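To make the difference between the two behaviours concrete, here is a toy model in Python (POSIX only). It is not the Gecko code: the lock merely stands in for MessageChannel::mMonitor, and the names `CHILD` and `run_child` are invented for this sketch. A background "I/O" thread takes the lock and then terminates the process while the main thread is blocked waiting for that same lock, either by aborting (roughly what a MOZ_CRASH amounts to from the OS's point of view) or by calling `_exit`.

```python
import subprocess
import sys

# Hypothetical child: the "I/O thread" holds the lock and ends the process
# while the main thread is blocked on that lock.
CHILD = r"""
import os, sys, threading, time

monitor = threading.Lock()
held = threading.Event()

def io_thread(mode):
    with monitor:
        held.set()
        time.sleep(0.2)      # give the main thread time to block on the lock
        if mode == "abort":
            os.abort()       # dies from SIGABRT, seen by the OS as a crash
        os._exit(0)          # immediate, quiet exit with status 0

threading.Thread(target=io_thread, args=(sys.argv[1],), daemon=True).start()
held.wait()
monitor.acquire()            # main thread blocks here until the process ends
"""


def run_child(mode: str) -> int:
    """Run the toy child in the given mode and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        stderr=subprocess.DEVNULL,
        timeout=10,
    )
    return result.returncode


if __name__ == "__main__":
    print("abort:", run_child("abort"))
    print("_exit:", run_child("exit"))
```

In the abort case the child is reported as killed by SIGABRT, and a snapshot taken at that moment shows the main thread parked on the lock, which can look like a deadlock even though the process is simply on its way out. In the `_exit` case the child ends with status 0 and the OS has nothing to report.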

The original hang may have been caused by something in the parent process that was destroyed by the Force Quit.

As a crash reporter bug this seems to be WONTFIX (comment #4); as an IPC bug (that we were causing spurious OS-level crash reports as a side-effect of force-quitting the main process) this looks like a duplicate of bug 1354200.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE