Closed Bug 1401721 Opened 7 years ago Closed 5 years ago

Crash in mozilla::dom::ContentChild::~ContentChild

Categories

(Core :: DOM: Content Processes, defect, P3)

x86
Windows 8
defect

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox57 --- unaffected
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- ?

People

(Reporter: marcia, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, topcrash, Whiteboard: [AV:Webroot SecureAnywhere][AV:K7][inj+])

Attachments

(2 files)

This bug was filed from the Socorro interface and is report bp-47d0651d-bb3d-4dc3-a66f-d77b20170907.
=============================================================

Seen while looking at crash stats: http://bit.ly/2wHyMCJ. Crashes started with build 20170905220108. 20 crashes/23 installs according to crash stats.

Possible regression range based on crash stats: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=632e42dca494ec3d90b70325d9c359f80cb3f38a&tochange=f64e2b4dcf5eec0b4ad456c149680a67b7c26dc4

MOZ_CRASH(Content Child shouldn't be destroyed.)
Hi Bill,
Do you have any idea about how to deal with this bug?
Flags: needinfo?(wmccloskey)
I looked through the regression range. The only suspicious thing I see is bug 1229829. I'll needinfo Alex in case he has ideas.

We could try to add some additional assertions to figure out what's causing this. It seems like we're exiting our content process event loop in a way we don't expect. Normally we go through here:
http://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/dom/ipc/ContentChild.cpp#2333

We could assert that no one calls MessageLoop::Quit() on the main thread in a content process in opt builds. That might turn something up.
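A minimal sketch of the invariant proposed above, for illustration only. `ProcessType`, `QuitRequest`, `ViolatesQuitInvariant` and `CheckQuit` are hypothetical names, not Gecko APIs; a real patch would presumably use a release assertion inside the message loop code rather than return a string.

```cpp
#include <string>

// Hypothetical stand-ins for illustration; these are not Gecko types.
enum class ProcessType { Parent, Content };

struct QuitRequest {
  ProcessType process;  // which kind of process the call happens in
  bool onMainThread;    // whether the loop being quit is the main-thread loop
};

// Returns true if this Quit() call breaks the proposed invariant: nothing
// should quit the main-thread loop of a content process, because the expected
// shutdown path exits the process directly instead.
bool ViolatesQuitInvariant(const QuitRequest& req) {
  return req.process == ProcessType::Content && req.onMainThread;
}

// Returns a crash reason when the invariant is broken, or an empty string
// when the call is acceptable. A real check would crash instead of returning.
std::string CheckQuit(const QuitRequest& req) {
  return ViolatesQuitInvariant(req)
             ? "Quit() called on content process main thread"
             : "";
}
```

Crashing at the call site, rather than later in the destructor, would put the unexpected caller on the crash stack.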
Flags: needinfo?(wmccloskey) → needinfo?(agaynor)
I don't understand how alternate desktops would trigger this. 

There's one other regression caused by them, bug 1400637. That's been determined to be a bad interaction with some anti-virus. Sampling a handful of the crash reports (I don't know how to do it automatically), I see that they almost all have either k7pswsen.dll or WRusr.dll loaded, which are the two DLLs associated with the AV in the other crash, so it seems like a decent bet that this is another formulation of that problem.

Alternate desktops was disabled on beta already: https://bugzilla.mozilla.org/show_bug.cgi?id=1402340#c9, and in that bug we're exploring either blocking the injection of DLLs into our process, or getting the AV vendor to fix their software.
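The manual sampling described above could be automated along these lines. This is a rough sketch that assumes each crash report's loaded-module list has already been extracted into a list of file names; the helper names are hypothetical and nothing here talks to crash-stats.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Lowercases a module name so that "WRusr.dll" and "wrusr.dll" compare equal.
std::string ToLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Returns true if any module in one report matches a suspect DLL name.
bool HasSuspectModule(const std::vector<std::string>& modules,
                      const std::vector<std::string>& suspects) {
  for (const auto& module : modules) {
    const std::string lower = ToLower(module);
    for (const auto& suspect : suspects) {
      if (lower == ToLower(suspect)) return true;
    }
  }
  return false;
}

// Counts how many reports have at least one suspect module loaded.
std::size_t CountReportsWithSuspects(
    const std::vector<std::vector<std::string>>& reports,
    const std::vector<std::string>& suspects) {
  return static_cast<std::size_t>(std::count_if(
      reports.begin(), reports.end(),
      [&](const std::vector<std::string>& modules) {
        return HasSuspectModule(modules, suspects);
      }));
}
```

Comparing the count against the total number of sampled reports gives the "almost all" fraction mentioned in the comment.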
Flags: needinfo?(agaynor)
Depends on: 1400637
Priority: -- → P2
No longer depends on: 1400637
Blocks: 1229829
Caused by bad interaction between anti-virus and a new content process isolation feature. We're working with vendors in related bug 1400637.
Blocks: injecteject
Priority: P2 → P3
Whiteboard: [AV:Webroot SecureAnywhere]
Whiteboard: [AV:Webroot SecureAnywhere] → [AV:Webroot SecureAnywhere][AV:K7]
We've only had 3 reports in the past week, which is good. Let's keep an eye on this as 58 rolls into Beta.
Whiteboard: [AV:Webroot SecureAnywhere][AV:K7] → [AV:Webroot SecureAnywhere][AV:K7][inj+]
This is showing up at the top of early Nightly crashes for 62, but for only a few installs. Looks like Nightly 61 was affected too.
This crash signature recently spiked from 1 to 200 crashes per day (May 6 - 10) and has become the #1 Top Crasher on Nightly 62.0. It affects Windows, Mac and Linux.

Top Crashers for Firefox 62.0a1
Top 50 Crashing Signatures. 7 days ago

Rank 	% 	Diff 	Signature	Count 	Win 	Mac 	Lin 	Installs 	Is GC 	First Appearance
1 	6.42% 	-2.63% 	mozilla::dom::ContentChild::~ContentChild	201 	39 	20 	142 	37 	0 	2012-11-08
Keywords: topcrash
Attached file embed-twitch.html
I'm able to reproduce this consistently in 61/mac by loading this embed-twitch.html page via File > Open File... (no crash when it's served over the network).
I'm not sure I understand how this ever *doesn't* crash.  When the nsAutoPtr at [1] goes out of scope it will destroy the ContentProcess, which destroys its ContentChild member.  Do we normally just exit instead of returning from the run loop?

[1] https://searchfox.org/mozilla-central/rev/93d2b9860b3d341258c7c5dcd4e278dea544432b/toolkit/xre/nsEmbedFunctions.cpp#652
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #10)
> I'm not sure I understand how this ever *doesn't* crash.  When the nsAutoPtr
> at [1] goes out of scope it will destroy the ContentProcess, which destroys
> its ContentChild member.  Do we normally just exit instead of returning from
> the run loop?

ContentChild::ActorDestroy() calls QuickExit(), which does an exit in non-debug builds.
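The ordering discussed in this exchange can be modelled with a small simulation. It is only a sketch: `FakeContentChild`, `Trace` and `RunChild` are invented for illustration, and the "exit" is recorded as a flag instead of really terminating, so the two paths can be compared. It models the non-debug behaviour only.

```cpp
#include <memory>
#include <string>
#include <vector>

// Records what happened instead of really exiting or crashing.
struct Trace {
  std::vector<std::string> events;
  bool exited = false;  // models "the process is gone after QuickExit"
};

// Simplified stand-in for the content child actor; not the real class.
struct FakeContentChild {
  Trace& trace;
  explicit FakeContentChild(Trace& t) : trace(t) {}

  // Normal shutdown: the real code exits the process here, so no destructor
  // would ever run afterwards.
  void ActorDestroy() {
    trace.events.push_back("QuickExit");
    trace.exited = true;
  }

  // Reached only if the run loop returned without the exit above.
  ~FakeContentChild() {
    if (!trace.exited) trace.events.push_back("MOZ_CRASH");
  }
};

// Simulates the child's main function: the owning pointer goes out of scope
// when the run loop returns.
Trace RunChild(bool shutdownViaActorDestroy) {
  Trace trace;
  {
    auto child = std::make_unique<FakeContentChild>(trace);
    if (shutdownViaActorDestroy) child->ActorDestroy();
  }  // owner destroyed here
  return trace;
}
```

In this model the destructor crash appears only when the loop returns by some route that bypasses `ActorDestroy()`, which matches the explanation given for why the crash is normally not seen.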
Comment #2 also answers the question asked in comment #10, now that I've read it more closely.  The suggestion in the last sentence of comment #2 might be useful combined with the repro in comment #9.

(But I'm still wondering why the crash spikes line up with the past two release dates.)
Not sure if this is helpful.

Had crash 1e970bec-fe93-4040-8211-166fa0180706 and it linked to this bug.

I tend to have this crash or a similar one every time I update Firefox in the background. For example, I just upgraded from 60.0.2 to 61 using the built-in package manager in neon. It seems Firefox doesn't do graceful upgrades like Chrome.

In this instance it didn't completely crash the browser, but loading new webpages (I'm guessing this has to do with new/existing child processes) would show the crashed tab page. In the past it would send 1 crash report per upgrade. This time it was more like 6 before I realised I had updated Firefox and needed to restart it.
There is a known problem with updates (especially with Linux distribution packages, or with Firefox's own updater if multiple profiles are running) where the old browser tries to create a child process by launching the new executable.  The spikes around release times suggested this had something to do with that somehow, but “somehow” is the key word there.

Bug 1366808 added code to handle that case more gracefully and is shipping in 62, which I think means we won't get decent UI for this until the 62->63 upgrade.  The 61->62 upgrade will detect the lack of `-parentBuildID` and exit relatively early… but apparently after the message loop is created, so it might still run into this; I'd need to stare at the code some more.  For 62->63 and later, the child should call QuickExit, which will hopefully avoid this bug: https://searchfox.org/mozilla-central/rev/c579ce13ca7864c5df9711eda730ceb00501aed3/dom/ipc/ContentChild.cpp#688
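As a rough illustration of the startup check described in the paragraph above, the sketch below scans a child's command line for `-parentBuildID` and compares the value with the child's own build ID. It assumes the flag is followed by the ID as a separate argument; the function names and the `StartupCheck` enum are hypothetical and do not come from the Gecko source.

```cpp
#include <cstring>
#include <optional>
#include <string>

// Possible outcomes of the hypothetical startup check.
enum class StartupCheck { Ok, MissingParentBuildID, BuildIDMismatch };

// Returns the value following "-parentBuildID", if the flag is present and
// has an argument after it. Assumes the ID is passed as the next argument.
std::optional<std::string> FindParentBuildID(int argc,
                                             const char* const* argv) {
  for (int i = 1; i + 1 < argc; ++i) {
    if (std::strcmp(argv[i], "-parentBuildID") == 0) {
      return std::string(argv[i + 1]);
    }
  }
  return std::nullopt;
}

// Decides whether the child may continue starting up. A parent from before
// the flag existed passes nothing, so the child sees a missing ID; an updated
// binary launched by an old parent sees a mismatch.
StartupCheck CheckParentBuildID(int argc, const char* const* argv,
                                const std::string& ownBuildID) {
  const auto parent = FindParentBuildID(argc, argv);
  if (!parent) return StartupCheck::MissingParentBuildID;
  return *parent == ownBuildID ? StartupCheck::Ok
                               : StartupCheck::BuildIDMismatch;
}
```

Whether either failing outcome avoids the destructor crash depends on how the child exits afterwards, which is the open question in the comment.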

Before 62, we should be hitting the MOZ_RELEASE_ASSERT added in bug 1345978, which would crash the parent process and might explain content processes exiting in an unexpected way (maybe; I haven't tried tracing through the IPC code to see exactly what happens when the other end hangs up the connection).  But that doesn't explain reports of tab crashes and no browser crash, like comment #13.

There's also bug 1463960, which was filed about the Linux distribution case but also about media plugin processes, which weren't covered by the change in bug 1366808.
See Also: → 1366808, 1345978
Adding 63 as affected, as this has risen to the top browser crash in nightly: https://crash-stats.mozilla.com/topcrashers/?product=Firefox&version=63.0a1
So I am not sure we have to track this any longer, as the crashes stopped in 63 nightly in the 20180715014912 build. But it is interesting in the 20180714102053 build that we had 182 crashes/47 installs.
(In reply to Marcia Knous [:marcia - needinfo? me] from comment #16)
> So I am not sure we have to track this any longer, as the crashes stopped in
> 63 nightly in the 20180715014912 build. But it is interesting in the
> 20180714102053 build that we had 182 crashes/47 installs.

Good news then, untracking :)
I had the same crash too.
29a33eaf-418a-4ab3-96fa-b7e8a0180907
Attached video Crash_PoC.avi
PoC for crashing!
Hello, I get this issue too.
Report ID is bp-4abf29af-b161-4d5a-8bc2-6c4a90180911

What I do:
I use Ubuntu 16.04 and try to open a local file stored at the path:
/home/developer/ConanCache/fep_sdk/2.0.1/aev25/testing/package/b5525a6e08c3997401fd88eb2c65832150657452/doc/fep-sdk.html

The content is really simple:

<html>
    <head>
        <meta http-equiv="refresh" content="0; URL=html/index.html"
    </head>
</html>

It is Doxygen documentation, in my case.
Hope it helps.
There are no crashes on 62 release or 63 beta. From comment 14 it sounds like we may still see this crash when people update, so I'll leave it marked affected for 63 for now. The later comments are all about 61 or older versions.
No crashes on 63; we have a few crashes on 62.0.3 only. If we don't see new crashes in a couple of weeks, we should probably close it as WFM.

No crashes in recent builds, closing out as WFM.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME