Closed
Bug 1413501
Opened 8 years ago
Closed 8 years ago
Crash in nsCOMPtr<T>::nsCOMPtr<T> | mozilla::storage::Service::unregisterConnection
Categories
(Core :: SQLite and Embedded Database Bindings, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla59
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox56 | --- | unaffected |
firefox57 | --- | unaffected |
firefox58 | --- | wontfix |
firefox59 | --- | fixed |
People
(Reporter: calixte, Assigned: asuth)
References
(Blocks 1 open bug)
Details
(Keywords: crash, regression, Whiteboard: [clouseau])
Crash Data
Attachments
(2 files)
4.00 KB,
patch
|
mak
:
review+
|
Details | Diff | Splinter Review |
2.64 KB,
patch
|
mak
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-3775d422-f68f-480e-9a55-ec3830171101.
=============================================================
There are 46 crashes in nightly 58 starting with buildid 20171020100426. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1388998.
[1] https://hg.mozilla.org/mozilla-central/rev/2f78917dbcfd
Flags: needinfo?(bugmail)
Assignee | ||
Comment 1•8 years ago
|
||
Okay, so the problem is that SpinningSynchronousClose creates a new AsyncCloseConnection() which has a strong RefPtr<Connection> which puts our refcount up above 1 again which means that when AsyncCloseConnection::~AsyncCloseConnection does NS_ReleaseOnMainThreadSystemGroup(mConnection.forget()) we can end up in Service::unregisterConnection() for the same connection a second time, which causes us to end up with the case where !forgettingRef.
Since Connection claims to implement a threadsafe ISUPPORTS but it's just the mRefCnt that's safe and not our Release method, and adding additional mutexes into the custom Release method is asking for more headaches, I think the simplest thing is just to change Service::unregisterConnection to be okay with the case where forgettingRef is null. We just want to ensure we don't leak Connections and we don't leak their underlying resources.
As long as we invoke unregisterConnection() at least once, we will drop our book-keeping reference. If we invoke it twice, that's fine as long as we don't crash. The "resurrecting" AddRef is unfortunate because it prevents us from having a nice invariant about the refcount never going back above one, but as long as we can't end up with mozStorage-using code ending up with a Connection we don't have in our Connection list, it's fine. I think that holds true.
It's possible there's some other higher level cleanup we can do here, but we're sort of converging to an ugly evolved system that no longer crashes.
The triggering case here seems to be invoking executeAsync and getting GC'ed on the main thread so that when the AsyncStatementExecution completes, it's the last (non-connections-list) strong refcount. This should be easy enough to trigger from JS, so I'll provide a test with the fix.
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Flags: needinfo?(bugmail)
Updated•8 years ago
|
Priority: -- → P1
Updated•8 years ago
|
Crash Signature: [@ nsCOMPtr<T>::nsCOMPtr<T> | mozilla::storage::Service::unregisterConnection] → [@ nsCOMPtr<T>::nsCOMPtr<T> | mozilla::storage::Service::unregisterConnection]
[@ RefPtr<T>::RefPtr<T> | mozilla::storage::Service::unregisterConnection]
Assignee | ||
Comment 2•8 years ago
|
||
Assignee | ||
Comment 3•8 years ago
|
||
Assignee | ||
Comment 4•8 years ago
|
||
My prior analysis was a little too quick. Since the Close() is happening in the destructor, it's not exactly a great idea for us to let a new strong reference be added at that point since that implies redundant destruction or UAF.
So I've moved our failsafe close from the destructor into our too-clever Release() method. I've added comments there, but perhaps restating more simply:
- We invoke Close() when our refcount hits 1. This may cause a transient AddRef/Release pair due to AsyncCloseConnection, where the Release() may or may not occur by the time Close() returns (based on varying notions of event targets).
- So I have us directly manipulate mRefCnt to go to 2 before we call Close() so we can avoid a 2->1 transition.
- Because of that potential delayed release, we have 2 paths to decide when we do the strong Release() via unregisterConnection and dropping our synthetic mRefCnt manipulation so that we avoid a "strong" 2->1 transition.
This is more than a little ugly. I'm open to alternative fixes or suggestions for fixes. I'm at TPAC right now and returning tomorrow, which is why this fix got delayed in the first place, but should be available to try alternatives on Friday/the weekend.
Uploading test next.
Attachment #8926530 -
Flags: review?(mak77)
Assignee | ||
Comment 5•8 years ago
|
||
I verified that this test, if applied without the fix, results in a crash of the xpcshell test and therefore a failure.
Attachment #8926531 -
Flags: review?(mak77)
Comment 6•8 years ago
|
||
Comment on attachment 8926530 [details] [diff] [review]
Part 1: Fix
Review of attachment 8926530 [details] [diff] [review]:
-----------------------------------------------------------------
ugh! Hopefull yone day we can do better than manual bookeeping :(
::: storage/mozStorageConnection.cpp
@@ +587,5 @@
> + // Okay, now our refcount is 2, we trigger Close().
> + Unused << Close();
> + // Now our refcount should either be at 2 (because nothing happened, or the
> + // addref and release pair happened due to SpinningSynchronousClose) or
> + // 3 (because SpinningSynchronousClose hapened but didn't release yet).
typo "hapened"
Attachment #8926530 -
Flags: review?(mak77) → review+
Comment 7•8 years ago
|
||
Comment on attachment 8926531 [details] [diff] [review]
Part 2: Test
Review of attachment 8926531 [details] [diff] [review]:
-----------------------------------------------------------------
::: storage/test/unit/test_connection_failsafe_close.js
@@ +29,5 @@
> +
> + // now we need to wait for that callback to have completed.
> + await callbackInvoked;
> +
> + ok(true, "if we shutdown cleanly and do not crash, then we succeeded");
nit: please use the complete form Assert.ok, per common convention.
Attachment #8926531 -
Flags: review?(mak77) → review+
Pushed by bugmail@asutherland.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/47f845a860d8
Fix for SpinningSynchronousClose unregisterConnection edge-case. r=mak
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b34fc8bfe40
Test for SpinningSynchronousClose unregisterConnection edge-case. r=mak
Assignee | ||
Comment 9•8 years ago
|
||
I made both the requested changes. I also added a few more comment lines about how the fix will not explode even if the memory reporters or shrinkMemory happen during the spun nested event loop.
![]() |
||
Comment 10•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/47f845a860d8
https://hg.mozilla.org/mozilla-central/rev/4b34fc8bfe40
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox59:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Comment 11•8 years ago
|
||
Please request Beta approval on this when you get a chance.
Flags: needinfo?(bugmail)
Flags: in-testsuite+
Assignee | ||
Comment 12•8 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #11)
> Please request Beta approval on this when you get a chance.
Status update: this bug has moved some crashes around, or at least given us intermittent assertion failures, giving us bug 1417326 and bug 1422327 that I will finish investigating and address in the short term. Once those are fixed and baked a little, we can consider uplift. Those all currently appear a bit more edge-casey because they are a function of cross-thread races.
Comment 13•8 years ago
|
||
Too late for Beta58. Mark 58 won't fix.
Updated•7 years ago
|
Flags: needinfo?(bugmail)
Updated•1 year ago
|
Product: Toolkit → Core
You need to log in
before you can comment on or make changes to this bug.
Description
•