Closed Bug 992512 Opened 6 years ago Closed 6 years ago

DOMStorageCache crashes (sDatabase access violation read)

Categories

(Core :: DOM: Core & HTML, defect, critical)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla31
Tracking Status
firefox28 --- wontfix
firefox29 - wontfix
firefox30 + verified
firefox31 --- verified

People

(Reporter: johnp, Assigned: mayhemer)

Details

Attachments

(2 files, 3 obsolete files)

There are multiple crash signatures crashing on sDatabase calls, mostly by ::Clear and ::SetItem being called:

https://crash-stats.mozilla.com/query/?query_type=simple&query=DOMStorageCache

One of those crashes even goes back to 24 ESR.
Could someone pls have a look into it? I don't know DOMStorage code, but should this even be possible to happen? (e.g. calling ::Clear after a ::StopDatabase (or how else does sDatabase get cleared?))
Just a missing null check?
Apparently [1][2] there is a reentrancy of the main thread loop.  This probably allows content script run from during profile-before-change notifications.  How is this possible?  Who is the called of ProcessNextEvent here?  Seems like some javascript does it?  Bad!

The option to fix the crash is not a null check, but shutdown of the DOMStorage database from something else then the profile-before-change notification, probably profile-before-change2.  Apparently, content scripts may run past that point.  I can do it even only from xpcom-shutdown, see [3].  Olli?


[1] https://crash-stats.mozilla.com/report/index/f64b34a2-f3f6-47f6-adc1-f7b6b2140406
[2] https://crash-stats.mozilla.com/report/index/80dc739d-4c74-436e-a28b-fca072140406
[3] http://hg.mozilla.org/mozilla-central/annotate/5fa70bd90a8b/dom/src/storage/DOMStorageObserver.cpp#l296
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
QA Contact: honzab.moz
Attached patch potential fix 1 (obsolete) — Splinter Review
- changing profile-before-change to profile-before-change2 in hope all nested loops and content scripts are done by that time
- other option: only shutdown at xpcom-shutdown, we must do it no longer after that point (synchronously) since the database connections have to be all closed after xpcom-shutdown
Assignee: nobody → honzab.moz
Attachment #8402360 - Flags: review?(bugs)
Note: database connections need to be shutdown by the time xpcom-shutdown-threads runs, not xpcom-shutdown.
Josh, I indeed tried that before I submitted the patch, assertions raised in my face:

>	xul.dll!mozilla::storage::Service::Observe(0x00000000, 0x06a9e790, 0x00000000) Line 940	C++
 	xul.dll!nsObserverList::NotifyObservers(0x00000000, 0x06a9e790, 0x00000000) Line 97	C++
 	xul.dll!nsObserverService::NotifyObservers(0x00000000, 0x06a9e790, 0x00000000) Line 305	C++
 	xul.dll!mozilla::ShutdownXPCOM(0x00481b2c) Line 886	C++
 	xul.dll!NS_ShutdownXPCOM(0x00481b2c) Line 819	C++
 	xul.dll!ScopedXPCOMStartup::~ScopedXPCOMStartup() Line 1202	C++

0x06a9e790 = NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID

Other option is to post an event from either profile2 or xpcom-shutdown that will shut the database down between xpcom-shutdown and xpcom-shutdown-threads.  There is a root loop spin between those too.  I think it could be be even the best solution here.
Tried to post an event only from xpcom-shutdown.  Apparently, not the true solution either.  I can see that the seer thread is being shutdown exactly from [1] and in the seer's thread shutdown nested loop my shutdown event for the local storage database is executed.  [1] is the place I want my shutdown event to be fired too.  So, if this main thread event loop happens a bit sooner from some reason we are still not safe from execution of a content script here.

Adding just a null check will lead to actual (and unnecessary) data lose!

Hmm.. thinking further, we could just let consumers of the database keep a reference to it.  When it is finally released, do the shutdown thing.  No events, just a flag set in profile2 and xpcom-shutdown that the database can go away when consumers are gone.  Question here tho is, if all consumers are gone before the NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID check for open sqlite connections...  If not, then it's a problem beyond scope of this bug (beyond localStorage back end)


[1] http://hg.mozilla.org/mozilla-central/annotate/5fa70bd90a8b/xpcom/build/nsXPComInit.cpp#l879
Attached patch v1 (obsolete) — Splinter Review
- added a null-check with an assertion (no more crashes)
- shutdown delayed as much as possible and is implemented as a post from xpcom-shutdown, see comment 5 for rational
- added a flag to prevent reinitialization of the database after it has ones been shut down


I don't want to implement what's mentioned in comment 6, since it is first quite complicated and second may introduce shutdown of the connection after the check point mentioned in comment 5 or still lead to data lose.  So, eventually not much better then this fix.
Attachment #8402360 - Attachment is obsolete: true
Attachment #8402360 - Flags: review?(bugs)
Attachment #8402375 - Flags: review?(bugs)
(In reply to Honza Bambas (:mayhemer) from comment #2)
> Apparently [1][2] there is a reentrancy of the main thread loop.  This
> probably allows content script run from during profile-before-change
> notifications.  How is this possible?  Who is the called of ProcessNextEvent
> here?  Seems like some javascript does it?  Bad!
Shutting down some thread would also spin the event loop
Looks like Seer thing shuts down some thread.
...after some js spins the event loop first.
(In reply to Olli Pettay [:smaug] from comment #10)
> ...after some js spins the event loop first.

Exactly!  Seer shutdown is sent as an event to prevent loops so it's innocent here.  Something in JS invoked (maybe just indirectly) the main thread looping during profile-before-change notifications looping was on the stack.

I think it will be hard to find that piece of code and enforce not to do looping.
Trying to understand why DOMStorage couldn't just handle the case when something spins the event loop.
(In reply to Olli Pettay [:smaug] from comment #12)
> Trying to understand why DOMStorage couldn't just handle the case when
> something spins the event loop.

Explanation:
- in profile-before-change we shut the database thread (which also blocks until all pending writes are committed to the database) down
- later, a script is executed after that point that still wants to use localStorage, this is something I really would not expect

The patch tries to delay shutdown of the database to a later point, ideally between xpcom-shutdown and xpcom-shutdown-threads.  However, when someone 1) loops the main thread event queue and 2) executes a script from xpcom-shutdown after we did that, we may get to the point we lose data.

If it is OK that web content scripts may execute even during xpcom-shutdown (which I personally would not expect..) then we may need to impl the "Hmm..." note from comment 6.
Comment on attachment 8402375 [details] [diff] [review]
v1

We could do this, but for branches a null check would be nicer, IMO.
Attachment #8402375 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #14)
> There is also
> http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.
> cpp#4524 case.

Hmm.. this ruins it a bit... there is no xpcom-shutdown...  And in all cases [1] we invoke profile(2) data == 'shutdown-persist', so no way to recognize between exit(0) and normal shutdown.

The patch is then wrong...

[1] http://mxr.mozilla.org/mozilla-central/search?string=Notify.*change2&regexp=1&find=&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central
Comment on attachment 8402375 [details] [diff] [review]
v1

comment 15/16
Attachment #8402375 - Flags: review+ → review-
(In reply to Josh Matthews [:jdm] from comment #4)
> Note: database connections need to be shutdown by the time
> xpcom-shutdown-threads runs, not xpcom-shutdown.

fwiw, anything using the profile (and dbs are in the profile) should start its shutdown at profile-before-change (now there is p-b-c2, but the point sticks) and likely be done by xpcom-shutdown. There's no guarantee that the openedDBs check that today happens at xpcom-shutdown-threads may have to be move earlier in future, there's still that plan to exit(0) at xpcom-shutdown around, even if there's no definitive plan for it. So we may have to move the barrier at xpcom-shutdown at a certain point.
(In reply to Marco Bonardo [:mak] from comment #18)
> (In reply to Josh Matthews [:jdm] from comment #4)
> > Note: database connections need to be shutdown by the time
> > xpcom-shutdown-threads runs, not xpcom-shutdown.
> 
> fwiw, anything using the profile (and dbs are in the profile) should start
> its shutdown at profile-before-change (now there is p-b-c2, but the point
> sticks) 

That's exactly what localstorage does!  And hence the bug, because someone is breaking this contract and tries to access localstorage past this shutdown point.

> and likely be done by xpcom-shutdown. 

We do that too.

> There's no guarantee that the
> openedDBs check that today happens at xpcom-shutdown-threads may have to be
> move earlier in future, there's still that plan to exit(0) at xpcom-shutdown
> around, even if there's no definitive plan for it. So we may have to move
> the barrier at xpcom-shutdown at a certain point.

I don't see a relevance here, the current bits of exit(0) doesn't trigger xpcom-shutdown (-threads) at all as Olli mentions in comment 14: http://hg.mozilla.org/mozilla-central/annotate/e7d3a8320289/widget/windows/nsWindow.cpp#l4528

Hence r- on the patch.

I also have another plan how to fix this in the DOM storage code (but I'm quit busy these days with other stuff): after p-b-c tell the database to shutdown after all references to it go away.  In xpcom-shutdown block (or rather force) shutdown with some (small as possible) data lose but no crash.
And by the way - this bug is indirectly caused by nsThread::Shutdown event queue spinning...
Olli, this is the same as v1 (you actually has already r+'ed) except the observer notification changes.  It's actually a null check + check against reinitialization of the database after shutdown.
Attachment #8413213 - Flags: review?(bugs)
Attachment #8413215 - Flags: review?(bugs)
Attachment #8413213 - Flags: review?(bugs) → review+
Attachment #8413215 - Flags: review?(bugs) → review+
Attachment #8413213 - Flags: checkin+
Attachment #8402375 - Attachment is obsolete: true
Comment on attachment 8413215 [details] [diff] [review]
v2 - null check only for branches

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 600307
User impact if declined: potential shutdown crash (null dereference)
Testing completed (on m-c, etc.): similar patch just landed on m-i
Risk to taking this patch (and alternatives if risky): near to zero, just a null check
String or IDL/UUID changes made by this patch: none
Attachment #8413215 - Flags: approval-mozilla-beta?
Attachment #8413215 - Flags: approval-mozilla-aurora?
Didn't catch you on IRC, Olli, maybe for the branch patch would be better to just return NS_OK when the database is not up.  This will be in parity with the regular patch and also have less impact, do you agree to update the branch patch this way?
Flags: needinfo?(bugs)
Does it really matter? It is some unusual case. exception at least tells something went wrong.
Flags: needinfo?(bugs)
Attached patch v3Splinter Review
The patch is based on a plain m-c code, so to land this one, I will back the v2 out first.

- as discussed on IRC, returns NS_ERROR_NOT_INITIALIZED
- MOZ_ASSERT changed to NS_ERROR (imo better)
Attachment #8413267 - Flags: review?(bugs)
Attachment #8413267 - Flags: review?(bugs) → review+
Attachment #8413213 - Attachment is obsolete: true
Attachment #8413213 - Flags: checkin+
Attachment #8413267 - Flags: checkin+
Tracking for 30 (and will approve uplift) but this is not a viable ride-along in a 29 chemspill so we're not going to track there, it can ride for the next 6 weeks.
Attachment #8413215 - Flags: approval-mozilla-beta?
Attachment #8413215 - Flags: approval-mozilla-beta-
Attachment #8413215 - Flags: approval-mozilla-aurora?
Attachment #8413215 - Flags: approval-mozilla-aurora+
QA Contact: honzab.moz
After this fix was pushed, there are no crashes in Socorro on Fx 30 and 31 for the following signatures:
[@ mozilla::dom::DOMStorageCache::Clear(mozilla::dom::DOMStorage const*) ]
[@ mozilla::dom::DOMStorageCache::SetItem(mozilla::dom::DOMStorage const*, nsAString_internal const&, nsString const&, nsString&)]
[@ mozilla::dom::DOMStorageCache::Clear]
[@ mozilla::dom::DOMStorageCache::~DOMStorageCache()]

Marking as verified based on the above results.

Although, with [@ mozilla::dom::DOMStorageCache::~DOMStorageCache()] signature, there are a couple of crashes (~16) on Nightly 32.0a1: https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Adom%3A%3ADOMStorageCache%3A%3A~DOMStorageCache%28%29&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&version=Firefox%3A32.0a1&hang_type=any&date=2014-05-20+08%3A00%3A00&range_value=3#tab-reports
Could this be a concern?
Status: RESOLVED → VERIFIED
Flags: needinfo?(honzab.moz)
Keywords: verifyme
This crashes seem to come from a single user. Also only three reports show a full stacktrace (probably because this is a shutdown crash). This is the first one (the others are equal):

https://crash-stats.mozilla.com/report/index/48a8b7dd-e190-4bce-9600-4b2bb2140518

Some kind of use after free introduced by delaying the db shutdown?
(In reply to Johannes Pfrang [:johnp] from comment #33)
> This crashes seem to come from a single user. Also only three reports show a
> full stacktrace (probably because this is a shutdown crash). This is the
> first one (the others are equal):
> 
> https://crash-stats.mozilla.com/report/index/48a8b7dd-e190-4bce-9600-
> 4b2bb2140518
> 
> Some kind of use after free introduced by delaying the db shutdown?

Thanks, filed bug 1013329 on that.
Flags: needinfo?(honzab.moz)
(In reply to Alexandra Lucinet, QA Mentor [:adalucinet] from comment #32)
> After this fix was pushed, there are no crashes in Socorro on Fx 30 and 31
> for the following signatures:
> [@ mozilla::dom::DOMStorageCache::Clear(mozilla::dom::DOMStorage const*) ]
> [@ mozilla::dom::DOMStorageCache::SetItem(mozilla::dom::DOMStorage const*,
> nsAString_internal const&, nsString const&, nsString&)]
> [@ mozilla::dom::DOMStorageCache::Clear]
> [@ mozilla::dom::DOMStorageCache::~DOMStorageCache()]
> 
> Marking as verified based on the above results.
> 
> Although, with [@ mozilla::dom::DOMStorageCache::~DOMStorageCache()]
> signature, there are a couple of crashes (~16) on Nightly 32.0a1:
> https://crash-stats.mozilla.com/report/
> list?signature=mozilla%3A%3Adom%3A%3ADOMStorageCache%3A%3A~DOMStorageCache%28
> %29&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&ver
> sion=Firefox%3A32.0a1&hang_type=any&date=2014-05-
> 20+08%3A00%3A00&range_value=3#tab-reports
> Could this be a concern?

To answer, this crash (in the dtor) is there since 28 and maybe even earlier.  So it's probably not related to this patch.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.