Closed Bug 1152026 Opened 10 years ago Closed 9 years ago

IndexedDB cycle-collection crash, probably

Categories

(Core :: Storage: IndexedDB, defect)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla43
blocking-b2g 2.5?
Tracking Status
firefox40 --- wontfix
firefox41 + fixed
firefox42 + fixed
firefox43 --- fixed
firefox-esr31 --- unaffected
firefox-esr38 41+ fixed
b2g-v2.0 --- unaffected
b2g-v2.0M --- unaffected
b2g-v2.1 --- unaffected
b2g-v2.1S --- unaffected
b2g-v2.2 --- fixed
b2g-v2.2r --- fixed
b2g-master --- fixed

People

(Reporter: asuth, Assigned: khuey)

References

Details

(Keywords: crash, sec-moderate, Whiteboard: [fixed in 43 by bug 1179909][post-critsmash-triage][adv-main41+][adv-esr38.3+])

Crash Data

Attachments

(5 files)

I'm using IndexedDB on workers, which rocks. But my (stock, mozilla.org-built linux x86_64) nightly builds are crashing a lot in the cycle collector it seems. Here's one: bp-d90ca7bb-15bc-4ab7-8ef3-a26d12150407 And all of these but the windows one seem to be me: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=ChildFinder%3A%3ANoteJSObject%28JSObject*%29#tab-reports Here's the summary of sorts from the stack provided by the crash reporter infrastructure: ChildFinder::NoteJSObject(JSObject*) TraceCallbackFunc::Trace(JS::Heap<JSObject*>*, char const*, void*) const mozilla::DOMEventTargetHelper::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) mozilla::dom::indexedDB::IDBWrapperCache::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) mozilla::dom::indexedDB::IDBTransaction::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) void nsPurpleBuffer::Block::VisitEntries<RemoveSkippableVisitor>(nsPurpleBuffer&, RemoveSkippableVisitor&) nsPurpleBuffer::RemoveSkippable(nsCycleCollector*, bool, bool, void (*)()) nsCycleCollector_forgetSkippable(bool, bool) FireForgetSkippable CCTimerFired Note that I did initially suspect the devtools indexeddb inspector was at play, and commented to that effect, but I've stopped using that. I'm going to spin a local debug build and switch to that and see if there are any helpful assertion explosions that show up there.
Hm, that crash is on the main thread...
So, things seem to be a bit more stable now for me (still using the same nightly production build that likes to crash) now that my code isn't throwing as many random errors. === Mental state dump for my benefit, probably don't read this: I believe that while I was seeing these crashes, I was triggering an exception in a request "onsuccess" handler of an on-worker function that would be caught and do a console.error('blah', ex, ex.stack), possibly following an app schema upgrade. It was also possible for code to throw an exception and not be caught. The arrow-function closure where these calls happen definitely closed over the IDBRequest whose onsuccess they were handling and may have incidentally closed-over the associated IDBTransaction by virtue of being declared in the same scope. The transaction seems like something scary could be happening with it because even now I'm seeing: "JavaScript error: http://glodastrophe/deps/gelam/js/maildb.js, line 336: InvalidStateError: An attempt was made to use an object that is not, or is no longer, usable" The code in question is: this._db.transaction([TBL_CONFIG, TBL_FOLDER_INFO], 'readonly') And this._db is only populated in the "onsuccess" handler of an IndexedDB.open() call which also has an appropriate "onupgradeneeded" handler. Which seems surprising, but it wouldn't be surprising if I'm being insufficiently diligent about IndexedDB database lifecycle stuff.
Not sure how related this is to my crashes (I'll try and upgrade variant too), but this appears to be a case of my code creating a new transaction inside an onsuccess event and this causing RunBeforeNextEvent to assert. My investigation got cut short because when I did "call DumpJSStack()" my X session locked up until I killed gdb. (I was expecting something bad to happen, just not that.) Context: this is from "./mach run --debug -P aprofile", which I think means that gdb should generally be happy, configuration wise. And the path is in my auto-load-safe-path. Gonna try again now forcing an upgrade and maybe try and use devtools if I manage to avoid the asserts.
Summary: IndexedDB-on-workers cycle-collection crash, probably → IndexedDB cycle-collection crash, probably
I deferred the transaction's creation to a future turn of the event loop/micro-task queue with Promise.resolve().then(...) but the assertion still happens. Just with a shorter stack. And call DumpJSStack() was fine this time.
My optimization was turned up too high so ptarray didn't work and I did it manually. The pre-empted runnable is an outstanding, non-overlapping read-only transaction. (Pre-empted one is readonly over ['config', folderInfo], exploding one is readonly over ['tasks'].)
I think that pre-empted transaction may be doing something weird and suspicious, because if I defer the creation of my new transaction with with setTimeout(blah, 0) or event 5000ms, things still explode. I'm going to stop poking at things now, but let me know if there's something useful I can do if this isn't a known problem.
Severity: normal → critical
Crash Signature: ChildFinder::NoteJSObject(JSObject*) → [@ ChildFinder::NoteJSObject(JSObject*)]
Keywords: crash
[Blocking Requested - why for this release]: We can't move calendar's database interactions into the worker if this doesn't get fixed in v3.
blocking-b2g: --- → 3.0?
Attached file Repro script
I wrote a little script to run a test-agent (gaia unit) test that reproduces the issue here. You do the following after downloading the script to $GAIA/reproduce_bug_1152026.sh: ```bash cd $GAIA git remote add gaye https://github.com/gaye/bug-1169790 git fetch gaye git checkout gaye/bug-1169790 FIREFOX=/path/to/nightly/firefox ./reproduce_bug_1152026.sh ``` The script will (1) start the server that receives test results from the browser (2) build gaia (3) start firefox with the gaia profile and navigate to the test app (4) run the test that reproduces the crash Let me know if this helps! It'd be a bit of a chore for me to produce a standalone webpage that reproduces, but I could do that if you all needed.
Attachment #8622776 - Flags: feedback?(bent.mozilla)
Just FYI you need to wait for a tiny bit (~10-20s on my machine) for gecko to crash after the test runs.
(In reply to Gareth Aye [:gaye] from comment #9) > Created attachment 8622776 [details] > Repro script > > I wrote a little script to run a test-agent (gaia unit) test that reproduces > the issue here. You do the following after downloading the script to > $GAIA/reproduce_bug_1152026.sh: > > ```bash > cd $GAIA > git remote add gaye https://github.com/gaye/bug-1169790 > git fetch gaye > git checkout gaye/bug-1169790 > FIREFOX=/path/to/nightly/firefox ./reproduce_bug_1152026.sh > ``` > > The script will > > (1) start the server that receives test results from the browser > (2) build gaia > (3) start firefox with the gaia profile and navigate to the test app > (4) run the test that reproduces the crash > > Let me know if this helps! It'd be a bit of a chore for me to produce a > standalone webpage that reproduces, but I could do that if you all needed. That repo doesn't exist :/
The site in bug 1176529 reproduces though, so I can debug that.
Ah crap. I was trying to point you to a branch of my fork. You should really `git remote add gaye https://github.com/gaye/gaia` and then the rest should work out... see https://github.com/gaye/gaia/tree/bug-1169790 ?
Group: core-security
Hiding because one of the dupes indicates a thread safety issue. Is this a regression from bug 1137245 as indicated in bug 1176529? This bug was filed two months before that.
Flags: needinfo?(bent.mozilla)
No, I don't think bug 1137245 has anything to do with this.
Flags: needinfo?(bent.mozilla)
The patch in bug 1179909 fixes the site in bug 1176529. I suspect it will fix the Gaia problems too.
Keywords: sec-high
Comment on attachment 8622776 [details] Repro script khuey is on this (thanks!)
Attachment #8622776 - Flags: feedback?(bent.mozilla)
I've been running with a local build using the fix on bug 1179909 (try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=6f94afe72869 for builds too), and it indeed seems to have fixed the problem for me. Woo!
This crash (or one very similar to it) shot up dramatically in the 20150728030209 build, possibly due to 1184574. What's the status of the investigation here?
Flags: needinfo?(nsm.nikhil)
Flags: needinfo?(khuey)
The status is that we need to finish the patch from bug 1179909.
Flags: needinfo?(nsm.nikhil)
Flags: needinfo?(khuey)
It is quite possible that Nightly users are trying out push notifications since Bug 1184574 and operating on IDB leading to crashes.
It shouldn't lead to crashes unless there are promises involved.
Apparently nightly and aurora both spiked on 0728. I guess bug 1184574 is off the hook, but this is still worrying. [Tracking Requested - why for this release]: Existing low-volume crash spiking in 41 and 42.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #25) > It shouldn't lead to crashes unless there are promises involved. There are Promises involved in all the APIs that ServiceWorkers expose. In addition, if users are using some library like localforage, that would involve Promises too.
With the landing of bug 1179909 which fixes this, I think we can call this fixed. Or we can call it something else (worksforme?, dupe to bug 1179909?). Many thanks to :khuey and :bzbarsky and :smaug and all involved.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
fixed wfm
Assignee: nobody → khuey
(In reply to Andrew Sutherland [:asuth] from comment #29) > With the landing of bug 1179909 which fixes this, I think we can call this > fixed. Or we can call it something else (worksforme?, dupe to bug > 1179909?). Many thanks to :khuey and :bzbarsky and :smaug and all involved. Given that the fix for bug 1179909 works, can we request an uplift to Aurora as it affects 42? The patch associated with bug 1179909 is pretty huge, can somebody comment on whether it is safe to uplift to Beta?
Flags: needinfo?(bugmail)
(In reply to Ritu Kothari (:ritu) from comment #31) > Given that the fix for bug 1179909 works, can we request an uplift to Aurora > as it affects 42? The patch associated with bug 1179909 is pretty huge, can > somebody comment on whether it is safe to uplift to Beta? The authoritative answer would seem to be "no", from bug 1179909: (Quoting Nikhil Marathe [:nsm] (please needinfo?) from bug 1179909 comment 18) > Should this be uplifted at least to aurora? There are some push+IDB+Promises > combinations that crash currently. (Quoting Benjamin Smedberg [:bsmedberg] from bug 1179909 comment 20) > I oppose trying to uplift anything non-critical related to event loops. > Event loops are notoriously regression-prone and we need the full release > cycle to find bugs. I personally do think we should disable IndexedDB-on-workers on all branches where bug 1179909 did not land because the crashing is not fun. (And anyone cool enough to use IndexedDB on workers is probably also cool enough to use Promises on workers.)
Flags: needinfo?(bugmail)
IndexedDB on workers has been shipping since 37, so disabling it is not an option IMO.
Given that people are discovering it, we should probably do something though.
Whiteboard: [fixed in 43 by bug 1179909]
Depends on: 1197547
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #35) > Given that people are discovering it, we should probably do something though. I don't suppose there's any chance the pre-empted runnable logic was the entire cause of the cycle collector problem and can be harmlessly disabled?
Attached patch Branch patchSplinter Review
For branches, we can at least explicitly clear the array on the right thread. This should prevent any security issues, although I don't think IndexedDB will actually work with Promises with this patch.
Attachment #8651883 - Flags: review?(amarchesini)
Unless there's some side-effect to when the mPreemptingRunnableInfos gets cleared out, the only correctness problem I saw with IndexedDB-with-promises prior to the bug 1179909 fix was bug 1161690 stalling no-op transactions forever.
Attachment #8651883 - Flags: review?(amarchesini) → review+
(In reply to Andrew Sutherland [:asuth] from comment #38) > Unless there's some side-effect to when the mPreemptingRunnableInfos gets > cleared out, the only correctness problem I saw with IndexedDB-with-promises > prior to the bug 1179909 fix was bug 1161690 stalling no-op transactions > forever. Right. I guess having requests running "primes the pump" so to speak. I didn't account for that behavior, so correctness fallout should be minimal.
Comment on attachment 8651883 [details] [diff] [review] Branch patch [Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: sec-moderate that affects a new feature introduced in ESR 38 User impact if declined: Possible security issues, crashes when IndexedDB is used in certain ways. Fix Landed on Version: 43 Risk to taking this patch (and alternatives if risky): Low risk, alternative is disabling a feature which has been in release for at least 3 cycles. String or UUID changes made by this patch: N/A
Attachment #8651883 - Flags: approval-mozilla-esr38?
Attachment #8651883 - Flags: approval-mozilla-beta?
Attachment #8651883 - Flags: approval-mozilla-aurora?
Attachment #8651883 - Flags: approval-mozilla-b2g37?
Comment on attachment 8651883 [details] [diff] [review] Branch patch Seems safe to uplift to all nom'd branches.
Attachment #8651883 - Flags: approval-mozilla-esr38?
Attachment #8651883 - Flags: approval-mozilla-esr38+
Attachment #8651883 - Flags: approval-mozilla-beta?
Attachment #8651883 - Flags: approval-mozilla-beta+
Attachment #8651883 - Flags: approval-mozilla-b2g37?
Attachment #8651883 - Flags: approval-mozilla-b2g37+
Attachment #8651883 - Flags: approval-mozilla-aurora?
Attachment #8651883 - Flags: approval-mozilla-aurora+
Whiteboard: [fixed in 43 by bug 1179909] → [fixed in 43 by bug 1179909][post-critsmash-triage]
Group: core-security → core-security-release
Whiteboard: [fixed in 43 by bug 1179909][post-critsmash-triage] → [fixed in 43 by bug 1179909][post-critsmash-triage][adv-main41+][adv-esr38.3+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: