Closed Bug 1103833 Opened 5 years ago Closed 4 years ago

Shutdown crash in mozilla::`anonymous namespace''::RunWatchdog(void*)

Categories

(Core :: General, defect, critical)

Version: 36 Branch
Platform: All, Windows NT
Type: defect
Priority: Not set
Severity: critical

Tracking

RESOLVED INVALID
Tracking Status
firefox35 --- unaffected
firefox36 - affected
firefox37 - affected
firefox38 --- ?

People

(Reporter: dmajor, Unassigned)

Crash Data

This bug was filed from the Socorro interface and is based on report bp-845152ee-5c65-45bd-8710-2b1ff2141117.
=============================================================

MOZ_CRASH("Shutdown too long, probably frozen, causing a crash.");

This is a topcrash on 36 Nightly. First seen in 20141116030212. Almost entirely Win64. Yoric, can you take a look?
Flags: needinfo?(dteller)
Interesting, it appears on Mac as well. Same start date of 20141116030212.
Crash Signature: [@ mozilla::`anonymous namespace''::RunWatchdog(void*)] → [@ mozilla::`anonymous namespace''::RunWatchdog(void*)] [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
All I can tell you is that xpcom-will-shutdown takes more than 1 minute, so the (brand new) shutdown watchdog takes over and turns this into a crash.
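To make the mechanism concrete, here is a minimal C++ sketch of a shutdown watchdog, assuming a simple design in which a background thread waits for progress reports from the main thread and gives up after a fixed timeout. It is not the actual Gecko watchdog: `ShutdownWatchdog`, `ReportProgress`, and the `onHang` callback are hypothetical names, and the callback stands in for `MOZ_CRASH` so that the sketch can run without aborting the process.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical shutdown watchdog (a sketch, not Gecko's code). A background
// thread waits for the main thread to report shutdown progress; if no report
// arrives within `timeout`, it calls `onHang`. In Gecko the equivalent step
// would be MOZ_CRASH; here it is an injectable callback.
class ShutdownWatchdog {
 public:
  ShutdownWatchdog(std::chrono::milliseconds timeout,
                   std::function<void()> onHang)
      : timeout_(timeout),
        onHang_(std::move(onHang)),
        thread_([this] { Run(); }) {}

  // Tells the watchdog that shutdown finished, then waits for its thread.
  ~ShutdownWatchdog() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      finished_ = true;
    }
    cv_.notify_one();
    thread_.join();
  }

  // Called by the main thread each time a shutdown phase completes.
  void ReportProgress() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ++progress_;
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!finished_) {
      const unsigned seen = progress_;
      // Wake early on progress or completion; otherwise time out.
      const bool advanced = cv_.wait_for(
          lock, timeout_, [&] { return finished_ || progress_ != seen; });
      if (!advanced) {
        lock.unlock();
        onHang_();  // No progress within the timeout: shutdown looks frozen.
        return;
      }
    }
  }

  std::chrono::milliseconds timeout_;
  std::function<void()> onHang_;
  std::mutex mutex_;
  std::condition_variable cv_;
  unsigned progress_ = 0;
  bool finished_ = false;
  std::thread thread_;  // Declared last so it starts after the other members.
};
```

In this sketch every `ReportProgress()` call restarts the clock, so the budget applies per shutdown phase rather than to shutdown as a whole; that is a design choice of the sketch, not necessarily how the real watchdog measures the one-minute limit.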
Flags: needinfo?(dteller)
The main thread is the interesting thread. In this case mozilla::MediaShutdownManager::Shutdown() is never getting past http://hg.mozilla.org/mozilla-central/annotate/a52bf59965a0/dom/media/MediaShutdownManager.cpp#l134

But in other cases that I sampled we're at

mozilla::layers::CompositorParent::ShutDown()
https://crash-stats.mozilla.com/report/index/569807e3-431a-4f7b-a800-0bbc62141124

mozilla::net::nsHttpConnectionMgr::Shutdown()
https://crash-stats.mozilla.com/report/index/be05f4c3-7194-40c1-aaf4-ae0b42141124

Or doing busywork within mozilla::net::CacheStorageService::Shutdown()
https://crash-stats.mozilla.com/report/index/b776b61d-a73a-40b1-81dd-581212141124

It sounds like we should change the signature algorithm for these crashes to report something like "shutdownhang | <mainthreadsignature>". dmajor, can you file that?
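The proposed "shutdownhang | <mainthreadsignature>" rule can be sketched as a small function. This is a hypothetical C++ illustration, not Socorro's signature code: `ComputeSignature` is an invented name, stack frames are plain strings, and it simply takes the main thread's top frame, whereas real signature generation typically skips uninteresting frames first.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical signature rule (a sketch, not Socorro's code): when the
// crashing thread's top frame is the shutdown watchdog, report the main
// thread's top frame behind a "shutdownhang | " prefix, since the main
// thread is where the hang actually is. Frame skip lists are omitted.
std::string ComputeSignature(const std::string& crashingTopFrame,
                             const std::vector<std::string>& mainThreadFrames) {
  const bool isWatchdog =
      crashingTopFrame.find("RunWatchdog") != std::string::npos;
  if (!isWatchdog || mainThreadFrames.empty()) {
    return crashingTopFrame;  // Ordinary crash: keep the usual signature.
  }
  return "shutdownhang | " + mainThreadFrames.front();
}
```

With this rule, the CompositorParent and nsHttpConnectionMgr hangs listed above would land in separate buckets instead of being merged under RunWatchdog.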

It's a bit unfortunate that there isn't a crashreporter annotation along with the MOZ_CRASH in RunWatchdog... were we worried about that call itself deadlocking? Otherwise annotating explicitly that we're crashing because of a shutdown hang would be better than inferring it either from the RunWatchdog top frame or from the ShutdownProgress field, which would lump in unrelated crashes that happen during shutdown. Yoric, can we add an annotation?
Flags: needinfo?(dteller)
Flags: needinfo?(dmajor)
Indeed, I didn't annotate as it might deadlock, since `AnnotateCrashReport` needs a mutex, and also doesn't work off the main thread in content processes. I suppose I can hack around it, if the complexity is really worth the cost (e.g. dispatch a runnable to the main thread, have it call AnnotateCrashReport, while the terminator waits 2 more seconds to be reasonably sure that the runnable has finished its work).
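The workaround proposed here (dispatch a runnable to the main thread, then give it a short grace period) can be sketched in portable C++. This is an illustration under assumptions, not Gecko code: `MainThreadQueue` is a hypothetical stand-in for the main thread's event loop, `AnnotateViaMainThread` is a hypothetical helper, and a `std::string` stands in for the real crash-report annotations.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the main thread's event queue (not Gecko's API).
class MainThreadQueue {
 public:
  void Dispatch(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

  // Blocks until a task is queued, then runs it on the calling thread.
  void RunOne() {
    std::function<void()> task;
    {
      std::unique_lock<std::mutex> lock(mutex_);
      cv_.wait(lock, [this] { return !tasks_.empty(); });
      task = std::move(tasks_.front());
      tasks_.pop_front();
    }
    task();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
};

// Called from the watchdog thread: ask the main thread to record `note`,
// then wait at most `grace` for it. Returns false if the task did not run in
// time; the caller would crash either way. `annotations` is a hypothetical
// placeholder for the crash-report annotations.
bool AnnotateViaMainThread(MainThreadQueue& mainThread, std::string& annotations,
                           const std::string& note,
                           std::chrono::milliseconds grace) {
  auto done = std::make_shared<std::promise<void>>();
  std::future<void> finished = done->get_future();
  mainThread.Dispatch([&annotations, note, done] {
    annotations = note;
    done->set_value();
  });
  return finished.wait_for(grace) == std::future_status::ready;
}
```

With the two-second wait suggested in the comment, the watchdog would pass `std::chrono::seconds(2)` as `grace`. Note that in this sketch, if the main thread is itself the hung thread, the task never runs and the helper simply returns `false` once the grace period expires.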

I figured that `MOZ_CRASH("Shutdown too long, probably frozen, causing a crash.");` would be sufficient, but I hadn't realized that the string is apparently not passed to the crash reporter. Perhaps changing that would be a better strategy?
Flags: needinfo?(dteller)
I don't think that any annotations will work in content processes once we get to XPCOM shutdown, since the IPDL connection should have already dropped.

In chrome processes, I think we can just call AnnotateCrashReport directly and not particularly worry about deadlock: it's a small critical section and very unlikely to be causing shutdown hangs.

Making MOZ_CRASH do something sensible with the message would be nice (should be a separate bug): perhaps have a global Atomic<const char*> that we can shove the literal string into?
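The global-slot idea can be sketched with `std::atomic<const char*>` in place of Mozilla's `Atomic<const char*>`. Everything here is hypothetical: `gCrashReason`, `RecordCrashReason`, `ReadCrashReason`, and `SKETCH_CRASH` are illustrative names, not MOZ_CRASH's actual implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>

// Hypothetical global slot for the message passed to a crash macro. Only
// string literals are meant to be stored, so the pointer never dangles, and
// writing it needs no lock or allocation, which matters on a crash path.
std::atomic<const char*> gCrashReason{nullptr};

// Records the reason; intended to be called only with string literals.
inline void RecordCrashReason(const char* literal) {
  gCrashReason.store(literal, std::memory_order_release);
}

// What a crash reporter would (hypothetically) read when writing the report.
inline const char* ReadCrashReason() {
  return gCrashReason.load(std::memory_order_acquire);
}

// Hypothetical stand-in for MOZ_CRASH: the `"" reason` concatenation only
// compiles when `reason` is a string literal.
#define SKETCH_CRASH(reason)        \
  do {                              \
    RecordCrashReason("" reason);   \
    std::abort();                   \
  } while (0)
```

Storing only a pointer to a literal is what keeps this cheap enough for a crash path: there is no copy, no allocation, and no mutex that could itself deadlock.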
I believe I can do that, but I'm a bit overloaded at the moment. Could this wait until after Portland?
Yes, I don't think it's super urgent.
[Tracking Requested - why for this release]: This has become a top crasher at #4 on Fx36 nightly.
This started crashing because bug 1044020 landed. This likely is just exposing pre-existing shutdown hangs, and is not necessarily evidence of a real regression. So we can track, but I probably wouldn't support backing out bug 1044020 unless we have evidence that it's actually causing shutdown hangs.
> report something like "shutdownhang | <mainthreadsignature>". dmajor can you file that?
Opened bug 1104317
Flags: needinfo?(dmajor)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #7)
> Yes, I don't think it's super urgent.

Filed as bug 1104682.
bp-9622e485-a1e8-4594-8bea-47c212141226: mine, on a laptop with Thunderbird. It was an ordinary shutdown until the crash.
See Also: → 1104317
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #3)
> The main thread is the interesting thread. In this case
> mozilla::MediaShutdownManager::Shutdown() is never getting past
> http://hg.mozilla.org/mozilla-central/annotate/a52bf59965a0/dom/media/MediaShutdownManager.cpp#l134
> 
> But in other cases that I sampled we're at
> 
> mozilla::layers::CompositorParent::ShutDown()
> https://crash-stats.mozilla.com/report/index/569807e3-431a-4f7b-a800-0bbc62141124

If it helps, I can reliably reproduce this on a local VM with a locally built Firefox, with the main thread stuck under CompositorParent::ShutDown(), but only with e10s enabled. The exact same build doesn't show the problem on the VM host (both the host and the VM itself are Win7 x64). It happens with both debug and release builds. Let me know if there's something I can do to help.
Mark, could you share a few crash IDs?
For reasons I don't understand, my crash reports don't seem to have stacks.  I just tried after doing "./mach buildsymbols" and still no joy.  FWIW, I'm building (including the buildsymbols step) on the VM host and running the built binaries on the VM client directly from the build directory.

The most recent reports are:

bp-88777e60-3eef-48b6-a2a0-fa7692150105
bp-08192a0f-d581-40a9-9f13-7d6a02150104
bp-9e591c9f-1180-4dc0-80b6-500c02150104

Debug builds aren't offering to submit error reports - they aren't showing the crash dialog, but instead the normal Windows dialog offering me the ability to debug.
Oh, I missed that it was a local build. The crash-stats server won't know about the symbols for those, even if you buildsymbols.
We never hit this crash in Mozmill tests while Firefox was on Nightly and Aurora. So what's different on Beta? Could this somehow be related to Flash's protected mode being deactivated?

Mihaela, how many times did we see this crash? Is it reproducible on one of the machines? It could help us a lot to get more details about it.
Flags: needinfo?(mihaela.velimiroviciu)
We had ~40 crashes on beta today, across all our Win XP machines.
Flags: needinfo?(mihaela.velimiroviciu)
PS: I am sure this was also causing my profile to lock up, as it happened when my Nightly hung and then closed without showing the crash reporter.
David, does Satdav's crash report help? Thanks.
Flags: needinfo?(dmajor)
I'm noticing that there are now two different crashes happening at the same time.

This one:
https://crash-stats.mozilla.com/report/index/110047fc-a3b0-4b97-9560-2f4d02150118
which points me back to here.

And this one:
https://crash-stats.mozilla.com/report/index/e60fea33-769a-4368-ade7-f0e672150118
It doesn't point to any bug, although it appears to have something to do with the plug-in container, and I truly think that these two crashes are joined at the hip.
Hi, it's also happening on today's Nightly, so be aware. I've included today's 2 crashes; the top one was with a clean profile on Windows 7.
Flags: needinfo?(sledru)
This crash signature is the aggregate of many different issues. We need a fix for bug 1104317 before we can detangle them.

In the meantime I'd like to request that people not post any more crash IDs in this meta-bug. Everyone's got a different root cause, and we're not going to get anywhere with a jumbled mess of conversations.
Depends on: 1104317
Flags: needinfo?(dmajor)
Flags: needinfo?(sledru)
Depends on: 1123698
Keywords: meta
See Also: 1104317
Same here in Nightly (the error has been recurring for 5-6 weeks in Nightly, and now, after the update from 35 to 36, also in Beta).
After the notebook has been in 'Standby' or 'Hibernation' mode, Firefox Nightly can't load any new website, and Sync doesn't work either. Neither shows an error message.
This problem has surfaced frequently lately.
After quitting Firefox, the Nightly process stays in the Task Manager and cannot be stopped there! After about 20-40 seconds, the crash reporter appears.
When I finally hit the 'Quit Firefox' button in this dialog, Nightly still stays in the Task Manager.
https://crash-stats.mozilla.com/report/index/784b59a0-3e96-45b4-a05e-5ff822150107

And today:
I deactivated the WLAN adapter while Nightly was open. After I reactivated it, Nightly couldn't load any new website. https://crash-stats.mozilla.com/report/index/368df209-6394-45b0-9b0a-e60e52150122
I think it's the same issue: the task in Task Manager can't be stopped.
(In reply to David Major [:dmajor] (UTC+13) from comment #26)
> This crash signature is the aggregate of many different issues. We need a
> fix for bug 1104317 before we can detangle them.

Now that this has happened, I filed bug 1124880 on the predominant signature I found in a search for 36.0b2 crashes, but there are more signatures with somewhat lower volume.
I am untracking this one because we are tracking the key crash in bug 1124880.
Crash Signature: [@ mozilla::`anonymous namespace''::RunWatchdog(void*)] [@ mozilla::(anonymous namespace)::RunWatchdog(void*)] → [@ mozilla::`anonymous namespace''::RunWatchdog(void*)] [@ mozilla::(anonymous namespace)::RunWatchdog(void*)] [@ mozilla::`anonymous namespace''::RunWatchdog] [@ mozilla::(anonymous namespace)::RunWatchdog]
(In reply to Sylvestre Ledru [:sylvestre] from comment #30)
> I am untracking this one because we are tracking the key crash in bug 1124880.

As a result, the signatures for this bug are now gone, so it is not useful to keep it open.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INVALID