Closed Bug 932162 Opened 6 years ago Closed 6 years ago

Crash in mozilla::dom::indexedDB::IndexedDatabaseManager::AddRef invoked by mozilla::dom::indexedDB::FileInfo::Cleanup in cycle collecting DOM Worker death

Categories

(Core :: Storage: IndexedDB, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla28
Tracking Status
firefox26 --- wontfix
firefox27 + fixed
firefox28 --- fixed
firefox29 --- unaffected
firefox-esr24 --- wontfix
b2g18 --- fixed
b2g-v1.1hd --- fixed
b2g-v1.2 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- unaffected

People

(Reporter: asuth, Assigned: asuth)

References

Details

(Keywords: sec-moderate, Whiteboard: [qa-][adv-main27+])

Attachments

(2 files)

In Gaia e-mail we are doing exciting things with message sending where we build composite Blobs on a DOMWorker made out of some Strings and/or TypedArrays plus some Blobs we stashed in IndexedDB (and then postMessaged across to the worker).

I am able to reliably reproduce the following crash in our gaia e-mail backend tests which are being run under a linux x64 b2g-desktop build.  Because I am already trying to work around bug 932143 I know with high probability that the worker is dying because of a self.close() call and that the main page that owns it is probably still alive.  (Because if this wasn't what is happening, we would be crashing due to that bug.)

There's a pretty good chance the main thread JS code has already discarded its references to the IndexedDB Blobs, called close on the IndexedDB instance and forgotten about the IndexedDB instance too.  (The unit tests attempt a clean shutdown of our use of the database.)  I have no idea if a GC would have already happened on the main thread.  (In fact, the main thread holds very few references to the Blobs or the IndexedDB instance, all of the worker happens on the worker and we're just remoting calls to the main thread because we have to.)

Please feel free to ask me to run with any logging/DEBUG builds that would help, or if you'd like me to trace any code for suspiciousness.  This is probably going to block a koi+ blocker unless you think this can only happen on DOMWorker shutdown, in which case it's just an annoyance.  I'm not going to investigate anything right now since it sounded from khuey like this might be fairly clear cut.

#0  0x00002aaaacff32d6 in mozilla::dom::indexedDB::IndexedDatabaseManager::AddRef (this=this@entry=0x2aaad1dab5e0) at /home/visbrero/rev_control/fgit/mozilla-central/dom/indexedDB/IndexedDatabaseManager.cpp:726
#1  0x00002aaaacfd6e8a in nsRefPtr (aRawPtr=0x2aaad1dab5e0, this=<synthetic pointer>) at ../../dist/include/nsAutoPtr.h:911
#2  mozilla::dom::indexedDB::FileInfo::Cleanup (this=this@entry=0x2aaadb147df0) at /home/visbrero/rev_control/fgit/mozilla-central/dom/indexedDB/FileInfo.cpp:107
#3  0x00002aaaacfd6f76 in mozilla::dom::indexedDB::FileInfo::UpdateReferences (this=0x2aaadb147df0, aRefCount=..., aDelta=-1, aClear=<optimized out>) at /home/visbrero/rev_control/fgit/mozilla-central/dom/indexedDB/FileInfo.cpp:92
#4  0x00002aaaacd545b8 in ~nsRefPtr (this=0x2aaad7e60d90, __in_chrg=<optimized out>) at ../../../dist/include/nsAutoPtr.h:887
#5  Destruct (e=0x2aaad7e60d90) at ../../../dist/include/nsTArray.h:534
#6  DestructRange (count=4, start=0, this=0x2aaadaf3fee8) at ../../../dist/include/nsTArray.h:1573
#7  RemoveElementsAt (count=4, start=0, this=0x2aaadaf3fee8) at ../../../dist/include/nsTArray.h:1290
#8  Clear (this=0x2aaadaf3fee8) at ../../../dist/include/nsTArray.h:1301
#9  ~nsTArray_Impl (this=0x2aaadaf3fee8, __in_chrg=<optimized out>) at ../../../dist/include/nsTArray.h:752
#10 ~nsTArray (this=0x2aaadaf3fee8, __in_chrg=<optimized out>) at ../../../dist/include/nsTArray.h:1646
#11 nsDOMFileBase::~nsDOMFileBase (this=0x2aaadaf3fe80, __in_chrg=<optimized out>) at ../../../dist/include/nsDOMFile.h:121
#12 0x00002aaaacd546a3 in nsDOMMultipartFile::~nsDOMMultipartFile (this=0x2aaadaf3fe80, __in_chrg=<optimized out>) at /home/visbrero/rev_control/fgit/mozilla-central/content/base/src/nsDOMBlobBuilder.h:23
#13 0x00002aaaacd55fd2 in nsDOMFile::Release (this=0x2aaadaf3fe80) at /home/visbrero/rev_control/fgit/mozilla-central/content/base/src/nsDOMFile.cpp:461
#14 0x00002aaaadd93c9f in finalize (fop=0x2aaad8901900, this=0x2aaaddaa42b0) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsobjinlines.h:89
#15 finalize<JSObject> (thingSize=48, thingKind=js::gc::FINALIZE_OBJECT2, fop=0x2aaad8901900, this=0x2aaaddaa4000) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:337
#16 FinalizeTypedArenas<JSObject> (dest=..., budget=..., thingKind=js::gc::FINALIZE_OBJECT2, src=0x2aaad89018a8, fop=0x2aaad8901900) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:401
#17 FinalizeArenas (fop=fop@entry=0x2aaad8901900, src=src@entry=0x2aaad89018a8, dest=..., thingKind=thingKind@entry=js::gc::FINALIZE_OBJECT2, budget=...) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:438
#18 0x00002aaaadd949f8 in finalizeNow (thingKind=js::gc::FINALIZE_OBJECT2, fop=0x2aaad8901900, this=0x2aaad80e7838) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:1340
#19 js::gc::ArenaLists::queueObjectsForSweep (this=this@entry=0x2aaad80e7838, fop=fop@entry=0x2aaad8901900) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:1436
#20 0x00002aaaadd94fd2 in BeginSweepingZoneGroup (rt=rt@entry=0x2aaad7cec000) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:3767
#21 0x00002aaaadd97ab1 in BeginSweepPhase (lastGC=true, rt=0x2aaad7cec000) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:3851
#22 IncrementalCollectSlice (rt=rt@entry=0x2aaad7cec000, budget=budget@entry=0, reason=reason@entry=JS::gcreason::DESTROY_RUNTIME, gckind=gckind@entry=js::GC_NORMAL) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:4410
#23 0x00002aaaadd98c68 in GCCycle (rt=rt@entry=0x2aaad7cec000, incremental=incremental@entry=false, budget=budget@entry=0, gckind=gckind@entry=js::GC_NORMAL, reason=reason@entry=JS::gcreason::DESTROY_RUNTIME) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:4555
#24 0x00002aaaadd990f5 in Collect (rt=rt@entry=0x2aaad7cec000, incremental=incremental@entry=false, budget=budget@entry=0, gckind=gckind@entry=js::GC_NORMAL, reason=reason@entry=JS::gcreason::DESTROY_RUNTIME) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:4699
#25 0x00002aaaadd9939d in Collect (reason=JS::gcreason::DESTROY_RUNTIME, gckind=js::GC_NORMAL, budget=0, incremental=false, rt=rt@entry=0x2aaad7cec000) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:4725
#26 js::GC (rt=rt@entry=0x2aaad7cec108, gckind=gckind@entry=js::GC_NORMAL, reason=reason@entry=JS::gcreason::DESTROY_RUNTIME) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:4724
#27 0x00002aaaade8ec7b in JSRuntime::~JSRuntime (this=0x2aaad7cec108, __in_chrg=<optimized out>) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/vm/Runtime.cpp:443
#28 0x00002aaaadd391ce in js_delete<JSRuntime> (p=0x2aaad7cec000) at ../../dist/include/js/Utility.h:319
#29 JS_DestroyRuntime (rt=0x2aaad7cec000) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsapi.cpp:743
#30 0x00002aaaad73811e in mozilla::CycleCollectedJSRuntime::DestroyRuntime (this=this@entry=0x2aaad8901ca0) at /home/visbrero/rev_control/fgit/mozilla-central/xpcom/base/CycleCollectedJSRuntime.cpp:472
#31 0x00002aaaad738aef in mozilla::CycleCollectedJSRuntime::~CycleCollectedJSRuntime (this=0x2aaad8901ca0, __in_chrg=<optimized out>) at /home/visbrero/rev_control/fgit/mozilla-central/xpcom/base/CycleCollectedJSRuntime.cpp:480
#32 0x00002aaaacfbc49b in (anonymous namespace)::WorkerThreadRunnable::Run (this=<optimized out>) at /home/visbrero/rev_control/fgit/mozilla-central/dom/workers/RuntimeService.cpp:946
#33 0x00002aaaad7346f0 in nsThread::ProcessNextEvent (this=0x2aaad7bc9ce0, mayWait=<optimized out>, result=0x2aaad8901e2f) at /home/visbrero/rev_control/fgit/mozilla-central/xpcom/threads/nsThread.cpp:622
#34 0x00002aaaad7064bf in NS_ProcessNextEvent (thread=<optimized out>, thread@entry=0x2aaad7bc9ce0, mayWait=mayWait@entry=true) at /home/visbrero/rev_control/fgit/mozilla-central/xpcom/glue/nsThreadUtils.cpp:251
#35 0x00002aaaad734d99 in nsThread::ThreadFunc (arg=0x2aaad7bc9ce0) at /home/visbrero/rev_control/fgit/mozilla-central/xpcom/threads/nsThread.cpp:250
#36 0x00002aaaabf92cc2 in _pt_root (arg=0x2aaabee55020) at /home/visbrero/rev_control/fgit/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:204
#37 0x00002aaaaacd7f6e in start_thread (arg=0x2aaad8902700) at pthread_create.c:311
#38 0x00002aaaab7f39cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Flags: needinfo?(bent.mozilla)
So this means we're looking at an object that never should have made it to the worker thread in the first place.  We should be blocking something in the structured clone process that we're not ...
Objects leaking across threads sounds bad...
Keywords: sec-high
Yes, it is.  Can you take this jan?  We need to disallow the underlying object type here from being structured cloned to a worker, most likely.  That's probably going to change asuth's app from crashing to not working at all ...
Flags: needinfo?(Jan.Varga)
We're just passing (IndexedDB-owned) Blobs around, nothing more sinister.  Unless I'm missing something, it seems like the IndexedDB blob reference can keep the IndexedDB object it is associated with alive until it is released, and then it can proxy the release to the main thread for sanity/simplicity of analysis purposes.
Ok, but I need to think about it a bit
Assignee: nobody → Jan.Varga
Flags: needinfo?(Jan.Varga)
I talked with Jan about this on Friday and we agreed I'd take this over.  I am arbitrarily asking Kyle to review.

The summary of the problem is that if the last reference to the IndexedDB blob is on the worker when FileInfo::Release gets called, FileInfo::UpdateReferences runs, goes to zero, needsCleanup is true, and so FileInfo::Cleanup gets called.

FileInfo::Cleanup has an NS_ASSERT about being on the main-thread.  If that doesn't stop us, IndexedDatabaseManager's ref-count stuff triggers a MOZ_CRASH.  I think that might have been what caused my crash in comment 0.

The patch moves the cleanup logic into a runnable which it dispatches to the main thread if we're not on the main thread and just invokes directly if we are (to avoid code duplication).

This patch includes a test that will cause a debug build without the fix to assert and/or crash, generating the desired mochitest failure.  I don't think I was clever enough to get the non-debug build to actually crash.  This may have something to do with the changes that khuey landed today to cleanup workers; a new b2g-desktop build no longer crashes reliably even without my fix here.  But to be clear, a firefox debug build with those worker cleanup changes (and without this fix) does still abort/crash.

I have just initiated a try build:
https://tbpl.mozilla.org/?tree=Try&rev=b071f8ef9617
Assignee: Jan.Varga → bugmail
Status: NEW → ASSIGNED
Attachment #827839 - Flags: review?(khuey)
Flags: needinfo?(bent.mozilla)
(In reply to Andrew Sutherland (:asuth) from comment #6)
> FileInfo::Cleanup has an NS_ASSERT about being on the main-thread.  If that
> doesn't stop us, IndexedDatabaseManager's ref-count stuff triggers a
> MOZ_CRASH.  I think that might have been what caused my crash in comment 0.

What do you know, my opt build assertions are actually useful.
Comment on attachment 827839 [details] [diff] [review]
indexeddb-blob-death-v1.diff

Review of attachment 827839 [details] [diff] [review]:
-----------------------------------------------------------------

Upgrade the threadsafety NS_ASSERTION to a MOZ_ASSERT please while you're at it.
Attachment #827839 - Flags: review?(khuey) → review+
Comment on attachment 827839 [details] [diff] [review]
indexeddb-blob-death-v1.diff

Review of attachment 827839 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, this makes sense, thanks for fixing it.
additional r=me
quick update: Thanks for the fast review.  There was a test failure on "B2G Desktop Linux x64 Opt" involving the indexedDB spinup in the iframe that looks deterministic or at least very frequent that I need to look into and resolve.  I have been swamped with FxOS 1.3 blocker reviews that are higher risk than this but should be able to address that test failure and land sometime over the next few days.  If the reason for the test failure is obvious, feel free to chime in with the problem/solution :)
(In reply to Andrew Sutherland (:asuth) from comment #10)
> quick update: Thanks for the fast review.  There was a test failure on "B2G
> Desktop Linux x64 Opt" involving the indexedDB spinup in the iframe that
> looks deterministic or at least very frequent that I need to look into and
> resolve.  I have been swamped with FxOS 1.3 blocker reviews that are higher
> risk than this but should be able to address that test failure and land
> sometime over the next few days.  If the reason for the test failure is
> obvious, feel free to chime in with the problem/solution :)

It turns out indexed DB mochitests don't run on linux x64 opt because they fail; bug 931116.  Unfortunately, I did not know this and did more work than laziness requires.  The good news is that in bug 927889 I figured out how to run a single mochitest under b2g and incidentally this seems to also make at least one other indexedDB test work.

(The bug was making the mochi-tests look like third-party iframes when they should not have been treated as such.)
(In reply to Andrew Sutherland (:asuth) from comment #11)
> (The bug was making the mochi-tests look like third-party iframes when they
> should not have been treated as such.)

Ugh.

And, you are my hero.
mozilla-inbound is closed for PGO failures and the back-outs will take time to bake, landing on b2g-inbound since B2G is the one what cares about this:

https://hg.mozilla.org/integration/b2g-inbound/rev/854452f105dc
https://hg.mozilla.org/mozilla-central/rev/854452f105dc

Don't sec-highs need security-approval to land?
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-b2g18: --- → ?
status-b2g-v1.2: --- → ?
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #14)
> Don't sec-highs need security-approval to land?

Ah, I was not aware.  This appears to be documented at https://wiki.mozilla.org/Security/Bug_Approval_Process.  Consider me informed for the future.  Please let me know if I should file a bugzilla enhancement bug about letting keywords trigger a banner on security pages that could say stuff like "POLICY FOR HANDLING THIS BUG: URL" or "DO NOT LAND THIS YOURSELF, YO!", as I do think it would be handy, but maybe I'm just falling through a small crack here.

The good news is that per https://wiki.mozilla.org/Security_Severity_Ratings I don't think this is actually a sec-high.  We were intentionally crashing during GC cleanup when an invariant is violated.  The main advantage this has over the equally viable "generate a ton of workers and allocate a ton of memory" attack is that the attacker can structure things in such a way that closing the tab triggers the crash.

The secondary attack against this is that Blob cleanup may not occur(?) as a result of the crash, but quota protection should remain in-place so an attacker can entirely leak an origin's disk space quota so it is never released.  If we don't perform Blob cleanup sweeps this seems like a broad bug in Firefox anyways.


I don't think I understand the security weighting for a situation like this, but it seems like a sec-low or whatever a griefing attack that requires the user to go to a URL would be.
How far back does this go? If this is a sec-high, we need to consider this for ESR24 and older branches.
I think it was there from the beginning, that is, when the initial support for stored files in IDB landed which was like 2 years ago.
Can we get an ESR24 and Beta patch for this? How risky would it be?
If possible, we'll want patches for b2g26 and b2g18 as well.
This applies down to esr24 without much trouble. It has some bigger conflicts on b2g18. Andrew, please request beta and esr24 approval on this patch and if possible, attach a branch-specific patch for b2g18. Thanks! :)
Flags: needinfo?(bugmail)
Comment on attachment 827839 [details] [diff] [review]
indexeddb-blob-death-v1.diff

I think this is pretty safe to take on older branches.  Requesting approval per the requests.  Thanks very much for testing the uplift-ability of the patch, RyanVM!  I had started a fresh gecko-dev checkout to do that when I got the mail, but that takes a while! :)

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Always there. 
User impact if declined: Webpages can accidentally or intentionally (if nefarious/jerky) structure disk-persisted Blob creation on DOM web workers to trigger process crashes during garbage collection.  Note that Chrome does not support storing Blobs in their IndexedDB still (http://code.google.com/p/chromium/issues/detail?id=108012), so cross-browser web content may not be using this so much.  (Though Chrome/friends do have WebSQLDatabase, and Firefox doesn't, so Firefox-specific paths could be used.)
Testing completed (on m-c, etc.): The fix landed on Firefox 28 on Nov 19th, 2013 when it was mozilla-central.  It has a mochitest.  But the mochitest is blacklisted on b2g with most of the other mochitests.
Risk to taking this patch (and alternatives if risky): If the patch applies, the patch should be safe.
String or IDL/UUID changes made by this patch: No strings.
Attachment #827839 - Flags: approval-mozilla-esr24?
Attachment #827839 - Flags: approval-mozilla-beta?
Attachment #827839 - Flags: approval-mozilla-b2g26?
Flags: needinfo?(bugmail)
Attachment #827839 - Flags: approval-mozilla-esr24?
Attachment #827839 - Flags: approval-mozilla-esr24+
Attachment #827839 - Flags: approval-mozilla-beta?
Attachment #827839 - Flags: approval-mozilla-beta+
Attachment #827839 - Flags: approval-mozilla-b2g26?
Attachment #827839 - Flags: approval-mozilla-b2g26+
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #22)
> Hopefully a b2g18 backport isn't too much trouble still :)

Do we expect v1.1 devices to get further v1.1 updates?  I had the impression any v1.1 device lucky enough to get an update would end up on v1.2.  (The exception being geeksphone devices which can get updates to any version they want.)
(In reply to Andrew Sutherland (:asuth) from comment #23)
> Do we expect v1.1 devices to get further v1.1 updates?  I had the impression
> any v1.1 device lucky enough to get an update would end up on v1.2.  (The
> exception being geeksphone devices which can get updates to any version they
> want.)

My understanding is that we're committed to partners to support security backports for an eventual v1.1.x release until v1.1 is officially EOL (currently March timeframe).
I'd like to get this on ESR24 since it is a sec-high rated issue.
(In reply to Al Billings [:abillings] from comment #26)
> I'd like to get this on ESR24 since it is a sec-high rated issue.

I'll provide esr24 and b2g18 patches, but note that in comment 15 I assert this might not really be a sec-high.  However, from https://wiki.mozilla.org/Security_Severity_Ratings it wasn't clear to me what it actually is either.  Having more explicit mappings of what various types of potential DoS translate to would be handy.  This is mainly just a way for a page to crash the browser intentionally or accidentally via an assert.  You can't steal data, you can't gain privileges, you can't mess up the contents of the file system, etc.
So the crash is non-exploitable? You can't get access to memory in any deterministic fashion, etc?
(In reply to Al Billings [:abillings] from comment #28)
> So the crash is non-exploitable? You can't get access to memory in any
> deterministic fashion, etc?

Yes.  The crash was either NS_ASSERT or MOZ_CRASH, depending on debug build versus production build.
Flags: needinfo?(bugmail)
I'm re-rating this as a sec-moderate.

With that, I don't think we need this for ESR24.
Keywords: sec-highsec-moderate
Does this have a reproducible testcase?
Flags: needinfo?(bugmail)
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #31)
> Does this have a reproducible testcase?

There's a mochitest in the patch that reliably causes a failure on builds without the fix part of the patch.  And it could be run manually if someone wanted.
Flags: needinfo?(bugmail)
If the mochitest is reliable and has already been checked there's no point in QA rehashing that effort. Please add the verifyme keyword if there's something more we can do.
Whiteboard: [qa-]
Whiteboard: [qa-] → [qa-][adv-main27+]
Andrew, now that this has been downgraded in severity, is it worth the effort to attempt a b2g18 backport?
Flags: needinfo?(bugmail)
I made a backport and pushed it to try.  The changes were the quota manager refactoring and the system-wide ISUPPORTS changes.  I got it to build locally but did not run tests.  Try run is at:
https://tbpl.mozilla.org/?tree=Try&rev=0b4293d6b256
Flags: needinfo?(bugmail)
You've gotten as much in the way of green tests as you can hope to get from a b2g18 Try push. So that's good anyway :)
This is the b2g18 patch, I'm carrying forward r+ since the NS_ISUPPORTS changes were mechanical in nature and the QuotaManager landing only changed where we found out if the FileInfo-owner subsystem was shutting down.

NOTE: This patch was already approved for uplift to esr24 and b2g26 but did not apply cleanly for b2g18.  This is simply updated the patch for b2g18.  I'm just pasting my approval request from those branches here:

Bug caused by (feature/regressing bug #): Always there. 
User impact if declined: Webpages can accidentally or intentionally (if nefarious/jerky) structure disk-persisted Blob creation on DOM web workers to trigger process crashes during garbage collection.  Note that Chrome does not support storing Blobs in their IndexedDB still (http://code.google.com/p/chromium/issues/detail?id=108012), so cross-browser web content may not be using this so much.  (Though Chrome/friends do have WebSQLDatabase, and Firefox doesn't, so Firefox-specific paths could be used.)
Testing completed (on m-c, etc.): The fix landed on Firefox 28 on Nov 19th, 2013 when it was mozilla-central.  It has a mochitest.  But the mochitest is blacklisted on b2g with most of the other mochitests.
Risk to taking this patch (and alternatives if risky): If the patch applies, the patch should be safe.
String or IDL/UUID changes made by this patch: No strings.
Attachment #8374046 - Flags: review+
Attachment #8374046 - Flags: approval-mozilla-b2g18?
Comment on attachment 8374046 [details] [diff] [review]
indexeddb-blob-death-v1-b2g18.diff

Review of attachment 8374046 [details] [diff] [review]:
-----------------------------------------------------------------

Taking for sec-moderate
Attachment #8374046 - Flags: approval-mozilla-b2g18? → approval-mozilla-b2g18+
Group: core-security
You need to log in before you can comment on or make changes to this bug.