Crash in RtlEnterCriticalSection | mozilla::MonitorAutoLock::MonitorAutoLock

RESOLVED FIXED in Firefox 56

Status

()

Toolkit
Storage
--
critical
RESOLVED FIXED
9 months ago
9 months ago

People

(Reporter: philipp, Assigned: asuth)

Tracking

({crash, regression})

56 Branch
mozilla57
x86_64
Windows
crash, regression
Points:
---

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox55 unaffected, firefox56 fixed, firefox57 fixed)

Details

(crash signature)

Attachments

(1 attachment)

(Reporter)

Description

9 months ago
This bug was filed from the Socorro interface and is 
report bp-845c5136-b9b2-4d3d-a874-534dd0170810.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	ntdll.dll 	RtlEnterCriticalSection 	
1 	xul.dll 	mozilla::MonitorAutoLock::MonitorAutoLock(mozilla::Monitor&) 	obj-firefox/dist/include/mozilla/Monitor.h:78
2 	xul.dll 	<lambda_f3a3d6898d1d18cace97913c9d9a8661>::operator() 	storage/mozStorageService.cpp:949
3 	xul.dll 	mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*) 	storage/mozStorageService.cpp:944
4 	xul.dll 	nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) 	xpcom/ds/nsObserverList.cpp:112
5 	xul.dll 	nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) 	xpcom/ds/nsObserverService.cpp:295
6 	xul.dll 	mozilla::ShutdownXPCOM(nsIServiceManager*) 	xpcom/build/XPCOMInit.cpp:894
7 	xul.dll 	ScopedXPCOMStartup::~ScopedXPCOMStartup() 	toolkit/xre/nsAppRunner.cpp:1464
8 	xul.dll 	mozilla::UniquePtr<ScopedXPCOMStartup, mozilla::DefaultDelete<ScopedXPCOMStartup> >::reset(ScopedXPCOMStartup*) 	obj-firefox/dist/include/mozilla/UniquePtr.h:345
9 	xul.dll 	XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4800
10 	xul.dll 	XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4867
11 	firefox.exe 	NS_internal_main(int, char**, char**) 	browser/app/nsBrowserApp.cpp:309
12 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:115
13 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
14 	kernel32.dll 	BaseThreadInitThunk 	
15 	ntdll.dll 	RtlUserThreadStart

this crash signature is newly showing up in firefox 56 - so far only from windows users on 64bit versions of firefox.
(Assignee)

Comment 1

9 months ago
I believe this is happening due to the lack of a kung fu death grip.  More specifically, our xpcom-shutdown-threads logic is:
- Remove all the observers, and therefore the only references the Service holds that keep itself alive.  Each Connection that exists also holds a reference to the service, so we are alive only as long as there exists a live Connection instance.
- Now access our own state, iterating over all of the currently registered Connections.  We keep looping if any were in the process of performing an AsyncClose().

This has been dangerous for a really long time.  However, it's only crash-worthy if no Connection instances exist.  We've been doing better at shutdown in a variety of places, which ironically helps make us more likely to crash.

I'm going to add a death grip because it is indeed preferable to have all of the observers removed before we spin our event loop.
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
(Assignee)

Comment 2

9 months ago
Created attachment 8896103 [details] [diff] [review]
storage::Service needs a death grip when removing strong observer references

nsObserverService::RemoveObserver has a comment about "stupid consumers" and
some existing death-grip logic to avoid infinite recursion.  Clearly it should
have gone a step further and done NS_ProxyRelease(..., aAlwaysProxy=true)!
Attachment #8896103 - Flags: review?(bkelly)

Comment 4

9 months ago
Comment on attachment 8896103 [details] [diff] [review]
storage::Service needs a death grip when removing strong observer references

Review of attachment 8896103 [details] [diff] [review]:
-----------------------------------------------------------------

Seems reasonable given the comment.

I'm not sure RemoveObserver() could safely use NS_ProxyRelease().  Not sure how that would play with things like xpcom shutdown where observers spin the event loop, etc.  Also, it would probably trigger a lot more allocations since I think observers get added/removed a lot.  (Although maybe that would be in the noise.)
Attachment #8896103 - Flags: review?(bkelly) → review+
(Assignee)

Comment 5

9 months ago
Thanks for the review!  The NS_ProxyRelease comment was meant as a joke, although does not read as one sufficiently.  I'll amend the commit comment before pushing.
(Assignee)

Comment 6

9 months ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/f8282594dc5aeb01df502a388b492106a2c4ae35
Bug 1389279 - storage::Service needs a death grip when removing strong observer references. r=bkelly
https://hg.mozilla.org/mozilla-central/rev/f8282594dc5a
Status: ASSIGNED → RESOLVED
Last Resolved: 9 months ago
status-firefox57: ? → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57

Comment 8

9 months ago
Hi :asuth,
Since this is a new regression, do you think this is worth uplifting to Beta56?
Flags: needinfo?(bugmail)
(Assignee)

Comment 9

9 months ago
(In reply to Gerry Chang [:gchang] from comment #8)
> Since this is a new regression, do you think this is worth uplifting to
> Beta56?

Yes, crash stats is showing at least some shutdown crashes due to this on 56b, and the fix is simple enough and has baked enough that I'll ask for approval now.
Flags: needinfo?(bugmail)
(Assignee)

Comment 10

9 months ago
Comment on attachment 8896103 [details] [diff] [review]
storage::Service needs a death grip when removing strong observer references

[Feature/Bug causing the regression]:
No direct bug.  This is a longstanding issue that was not made obvious before because many mozStorage consumers were bad at cleaning up.  They've gotten better.

[User impact if declined]:
Rare-ish shutdown crashes.  We're at 4 right now, but it's quite possible that users with lots of (legacy) extensions exacerbate the problem, so the crash rate could be much higher on release if not taken.

[Is this code covered by automated tests?]:
Not directly; there may have been an intermittent orange or two that was this.  Tests would reliably crash on poisoned memory if they reliably exercised this.  However, this was a dumb/naive bug and it's not clear the testing effort is worth it given that we'll never remove the kung fu death grip.

[Has the fix been verified in Nightly?]:
It has baked on nightly.

[Needs manual test from QE? If yes, steps to reproduce]: 
No.

[List of other uplifts needed for the feature/fix]:
No other uplifts.

[Is the change risky?]:
No.

[Why is the change risky/not risky?]:
We're adding a strong refcount at a point when we 100% know our object is not destroyed.  We hold it through the period where we previously had a use-after-free risk.  Totes safe.

[String changes made/needed]:
No string changes.
Attachment #8896103 - Flags: approval-mozilla-beta?
(Assignee)

Comment 11

9 months ago
That is 4 crashes/week.
Comment on attachment 8896103 [details] [diff] [review]
storage::Service needs a death grip when removing strong observer references

Fix for crash that may become worse on release. Let's take this fix for 56 beta 5.
Attachment #8896103 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 13

9 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/e8eb71b4df6f
status-firefox56: affected → fixed
You need to log in before you can comment on or make changes to this bug.