Closed Bug 1389279 Opened 2 years ago Closed 2 years ago

Crash in RtlEnterCriticalSection | mozilla::MonitorAutoLock::MonitorAutoLock

Categories

(Toolkit :: Storage, defect, critical)

56 Branch
x86_64
Windows
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- fixed
firefox57 --- fixed

People

(Reporter: philipp, Assigned: asuth)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-845c5136-b9b2-4d3d-a874-534dd0170810.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	ntdll.dll 	RtlEnterCriticalSection 	
1 	xul.dll 	mozilla::MonitorAutoLock::MonitorAutoLock(mozilla::Monitor&) 	obj-firefox/dist/include/mozilla/Monitor.h:78
2 	xul.dll 	<lambda_f3a3d6898d1d18cace97913c9d9a8661>::operator() 	storage/mozStorageService.cpp:949
3 	xul.dll 	mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*) 	storage/mozStorageService.cpp:944
4 	xul.dll 	nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) 	xpcom/ds/nsObserverList.cpp:112
5 	xul.dll 	nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) 	xpcom/ds/nsObserverService.cpp:295
6 	xul.dll 	mozilla::ShutdownXPCOM(nsIServiceManager*) 	xpcom/build/XPCOMInit.cpp:894
7 	xul.dll 	ScopedXPCOMStartup::~ScopedXPCOMStartup() 	toolkit/xre/nsAppRunner.cpp:1464
8 	xul.dll 	mozilla::UniquePtr<ScopedXPCOMStartup, mozilla::DefaultDelete<ScopedXPCOMStartup> >::reset(ScopedXPCOMStartup*) 	obj-firefox/dist/include/mozilla/UniquePtr.h:345
9 	xul.dll 	XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4800
10 	xul.dll 	XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4867
11 	firefox.exe 	NS_internal_main(int, char**, char**) 	browser/app/nsBrowserApp.cpp:309
12 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:115
13 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
14 	kernel32.dll 	BaseThreadInitThunk 	
15 	ntdll.dll 	RtlUserThreadStart

this crash signature is newly showing up in firefox 56 - so far only from windows users on 64bit versions of firefox.
I believe this is happening due to the lack of a kung fu death grip.  More specifically, our xpcom-shutdown-threads logic is:
- Remove all the observers, and therefore the only references the Service holds that keep itself alive.  Each Connection that exists also holds a reference to the service, so we are alive only as long as there exists a live Connection instance.
- Now access our own state, iterating over all of the currently registered Connections.  We keep looping if any were in the process of performing an AsyncClose().

This has been dangerous for a really long time.  However, it's only crash-worthy if no Connection instances exist.  We've been doing better at shutdown in a variety of places, which ironically helps make us more likely to crash.

I'm going to add a death grip because it is indeed preferable to have all of the observers removed before we spin our event loop.
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
nsObserverService::RemoveObserver has a comment about "stupid consumers" and
some existing death-grip logic to avoid infinite recursion.  Clearly it should
have gone a step further and done NS_ProxyRelease(..., aAlwaysProxy=true)!
Attachment #8896103 - Flags: review?(bkelly)
Comment on attachment 8896103 [details] [diff] [review]
storage::Service needs a death grip when removing strong observer references

Review of attachment 8896103 [details] [diff] [review]:
-----------------------------------------------------------------

Seems reasonable given the comment.

I'm not sure RemoveObserver() could safely use NS_ProxyRelease().  Not sure how that would play with things like xpcom shutdown where observers spin the event loop, etc.  Also, it would probably trigger a lot more allocations since I think observers get added/removed a lot.  (Although maybe that would be in the noise.)
Attachment #8896103 - Flags: review?(bkelly) → review+
Thanks for the review!  The NS_ProxyRelease comment was meant as a joke, although does not read as one sufficiently.  I'll amend the commit comment before pushing.
https://hg.mozilla.org/integration/mozilla-inbound/rev/f8282594dc5aeb01df502a388b492106a2c4ae35
Bug 1389279 - storage::Service needs a death grip when removing strong observer references. r=bkelly
https://hg.mozilla.org/mozilla-central/rev/f8282594dc5a
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Hi :asuth,
Since this is a new regression, do you think this is worth uplifting to Beta56?
Flags: needinfo?(bugmail)
(In reply to Gerry Chang [:gchang] from comment #8)
> Since this is a new regression, do you think this is worth uplifting to
> Beta56?

Yes, crash stats is showing at least some shutdown crashes due to this on 56b, and the fix is simple enough and has baked enough that I'll ask for approval now.
Flags: needinfo?(bugmail)
Comment on attachment 8896103 [details] [diff] [review]
storage::Service needs a death grip when removing strong observer references

[Feature/Bug causing the regression]:
No direct bug.  This is a longstanding issue that was not made obvious before because many mozStorage consumers were bad at cleaning up.  They've gotten better.

[User impact if declined]:
Rare-ish shutdown crashes.  We're at 4 right now, but it's quite possible that users with lots of (legacy) extensions exacerbate the problem, so the crash rate could be much higher on release if not taken.

[Is this code covered by automated tests?]:
Not directly; there may have been an intermittent orange or two that was this.  Tests would reliably crash on poisoned memory if they reliably exercised this.  However, this was a dumb/naive bug and it's not clear the testing effort is worth it given that we'll never remove the kung fu death grip.

[Has the fix been verified in Nightly?]:
It has baked on nightly.

[Needs manual test from QE? If yes, steps to reproduce]: 
No.

[List of other uplifts needed for the feature/fix]:
No other uplifts.

[Is the change risky?]:
No.

[Why is the change risky/not risky?]:
We're adding a strong refcount at a point when we 100% know our object is not destroyed.  We hold it through the period where we previously had a use-after-free risk.  Totes safe.

[String changes made/needed]:
No string changes.
Attachment #8896103 - Flags: approval-mozilla-beta?
That is 4 crashes/week.
Comment on attachment 8896103 [details] [diff] [review]
storage::Service needs a death grip when removing strong observer references

Fix for crash that may become worse on release. Let's take this fix for 56 beta 5.
Attachment #8896103 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.