Closed Bug 992512 Opened 11 years ago Closed 11 years ago

DOMStorageCache crashes (sDatabase access violation read)

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla31

Tracking Flags:

Tracking

Status

firefox28

---

wontfix

firefox29

wontfix

firefox30

verified

firefox31

---

verified

People

(Reporter: johnp, Assigned: mayhemer)

Details

Attachments

(2 files, 3 obsolete files)

potential fix 1 11 years ago Honza Bambas (:mayhemer) 1.51 KB, patch		Details \| Diff \| Splinter Review
v1 11 years ago Honza Bambas (:mayhemer) 8.82 KB, patch	mayhemer : review-	Details \| Diff \| Splinter Review
v2 - subpart of v1 (actually better nullchecks only) 11 years ago Honza Bambas (:mayhemer) 5.79 KB, patch	smaug : review+	Details \| Diff \| Splinter Review
v2 - null check only for branches 11 years ago Honza Bambas (:mayhemer) 1.73 KB, patch	smaug : review+ lsblakk : approval-mozilla-aurora+ lsblakk : approval-mozilla-beta- mayhemer : checkin+	Details \| Diff \| Splinter Review
v3 11 years ago Honza Bambas (:mayhemer) 5.00 KB, patch	smaug : review+ mayhemer : checkin+	Details \| Diff \| Splinter Review

Johannes Pfrang [:johnp]

Reporter

Description

•

11 years ago

There are multiple crash signatures crashing on sDatabase calls, mostly by ::Clear and ::SetItem being called: https://crash-stats.mozilla.com/query/?query_type=simple&query=DOMStorageCache One of those crashes even goes back to 24 ESR. Could someone pls have a look into it? I don't know DOMStorage code, but should this even be possible to happen? (e.g. calling ::Clear after a ::StopDatabase (or how else does sDatabase get cleared?))

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 1

•

11 years ago

Just a missing null check?

Honza Bambas (:mayhemer)

Assignee

Comment 2

•

11 years ago

Apparently [1][2] there is a reentrancy of the main thread loop. This probably allows content script run from during profile-before-change notifications. How is this possible? Who is the called of ProcessNextEvent here? Seems like some javascript does it? Bad! The option to fix the crash is not a null check, but shutdown of the DOMStorage database from something else then the profile-before-change notification, probably profile-before-change2. Apparently, content scripts may run past that point. I can do it even only from xpcom-shutdown, see [3]. Olli? [1] https://crash-stats.mozilla.com/report/index/f64b34a2-f3f6-47f6-adc1-f7b6b2140406 [2] https://crash-stats.mozilla.com/report/index/80dc739d-4c74-436e-a28b-fca072140406 [3] http://hg.mozilla.org/mozilla-central/annotate/5fa70bd90a8b/dom/src/storage/DOMStorageObserver.cpp#l296

Honza Bambas (:mayhemer)

Assignee

Updated

•

11 years ago

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

QA Contact: honzab.moz

Honza Bambas (:mayhemer)

Assignee

Comment 3

•

11 years ago

Attached patch potential fix 1 (obsolete) — Details — Splinter Review

- changing profile-before-change to profile-before-change2 in hope all nested loops and content scripts are done by that time - other option: only shutdown at xpcom-shutdown, we must do it no longer after that point (synchronously) since the database connections have to be all closed after xpcom-shutdown

Assignee: nobody → honzab.moz

Attachment #8402360 - Flags: review?(bugs)

Josh Matthews [:jdm]

Comment 4

•

11 years ago

Note: database connections need to be shutdown by the time xpcom-shutdown-threads runs, not xpcom-shutdown.

Honza Bambas (:mayhemer)

Assignee

Comment 5

•

11 years ago

Josh, I indeed tried that before I submitted the patch, assertions raised in my face: > xul.dll!mozilla::storage::Service::Observe(0x00000000, 0x06a9e790, 0x00000000) Line 940 C++ xul.dll!nsObserverList::NotifyObservers(0x00000000, 0x06a9e790, 0x00000000) Line 97 C++ xul.dll!nsObserverService::NotifyObservers(0x00000000, 0x06a9e790, 0x00000000) Line 305 C++ xul.dll!mozilla::ShutdownXPCOM(0x00481b2c) Line 886 C++ xul.dll!NS_ShutdownXPCOM(0x00481b2c) Line 819 C++ xul.dll!ScopedXPCOMStartup::~ScopedXPCOMStartup() Line 1202 C++ 0x06a9e790 = NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID Other option is to post an event from either profile2 or xpcom-shutdown that will shut the database down between xpcom-shutdown and xpcom-shutdown-threads. There is a root loop spin between those too. I think it could be be even the best solution here.

Honza Bambas (:mayhemer)

Assignee

Comment 6

•

11 years ago

Tried to post an event only from xpcom-shutdown. Apparently, not the true solution either. I can see that the seer thread is being shutdown exactly from [1] and in the seer's thread shutdown nested loop my shutdown event for the local storage database is executed. [1] is the place I want my shutdown event to be fired too. So, if this main thread event loop happens a bit sooner from some reason we are still not safe from execution of a content script here. Adding just a null check will lead to actual (and unnecessary) data lose! Hmm.. thinking further, we could just let consumers of the database keep a reference to it. When it is finally released, do the shutdown thing. No events, just a flag set in profile2 and xpcom-shutdown that the database can go away when consumers are gone. Question here tho is, if all consumers are gone before the NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID check for open sqlite connections... If not, then it's a problem beyond scope of this bug (beyond localStorage back end) [1] http://hg.mozilla.org/mozilla-central/annotate/5fa70bd90a8b/xpcom/build/nsXPComInit.cpp#l879

Honza Bambas (:mayhemer)

Assignee

Comment 7

•

11 years ago

Attached patch v1 (obsolete) — Details — Splinter Review

- added a null-check with an assertion (no more crashes) - shutdown delayed as much as possible and is implemented as a post from xpcom-shutdown, see comment 5 for rational - added a flag to prevent reinitialization of the database after it has ones been shut down I don't want to implement what's mentioned in comment 6, since it is first quite complicated and second may introduce shutdown of the connection after the check point mentioned in comment 5 or still lead to data lose. So, eventually not much better then this fix.

Attachment #8402360 - Attachment is obsolete: true

Attachment #8402360 - Flags: review?(bugs)

Attachment #8402375 - Flags: review?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 8

•

11 years ago

(In reply to Honza Bambas (:mayhemer) from comment #2) > Apparently [1][2] there is a reentrancy of the main thread loop. This > probably allows content script run from during profile-before-change > notifications. How is this possible? Who is the called of ProcessNextEvent > here? Seems like some javascript does it? Bad! Shutting down some thread would also spin the event loop

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 9

•

11 years ago

Looks like Seer thing shuts down some thread.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 10

•

11 years ago

...after some js spins the event loop first.

Honza Bambas (:mayhemer)

Assignee

Comment 11

•

11 years ago

(In reply to Olli Pettay [:smaug] from comment #10) > ...after some js spins the event loop first. Exactly! Seer shutdown is sent as an event to prevent loops so it's innocent here. Something in JS invoked (maybe just indirectly) the main thread looping during profile-before-change notifications looping was on the stack. I think it will be hard to find that piece of code and enforce not to do looping.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 12

•

11 years ago

Trying to understand why DOMStorage couldn't just handle the case when something spins the event loop.

Honza Bambas (:mayhemer)

Assignee

Comment 13

•

11 years ago

(In reply to Olli Pettay [:smaug] from comment #12) > Trying to understand why DOMStorage couldn't just handle the case when > something spins the event loop. Explanation: - in profile-before-change we shut the database thread (which also blocks until all pending writes are committed to the database) down - later, a script is executed after that point that still wants to use localStorage, this is something I really would not expect The patch tries to delay shutdown of the database to a later point, ideally between xpcom-shutdown and xpcom-shutdown-threads. However, when someone 1) loops the main thread event queue and 2) executes a script from xpcom-shutdown after we did that, we may get to the point we lose data. If it is OK that web content scripts may execute even during xpcom-shutdown (which I personally would not expect..) then we may need to impl the "Hmm..." note from comment 6.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 14

•

11 years ago

There is also http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow.cpp#4524 case.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 15

•

11 years ago

Comment on attachment 8402375 [details] [diff] [review] v1 We could do this, but for branches a null check would be nicer, IMO.

Attachment #8402375 - Flags: review?(bugs) → review+

Honza Bambas (:mayhemer)

Assignee

Comment 16

•

11 years ago

(In reply to Olli Pettay [:smaug] from comment #14) > There is also > http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsWindow. > cpp#4524 case. Hmm.. this ruins it a bit... there is no xpcom-shutdown... And in all cases [1] we invoke profile(2) data == 'shutdown-persist', so no way to recognize between exit(0) and normal shutdown. The patch is then wrong... [1] http://mxr.mozilla.org/mozilla-central/search?string=Notify.*change2&regexp=1&find=&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central

Honza Bambas (:mayhemer)

Assignee

Comment 17

•

11 years ago

Comment on attachment 8402375 [details] [diff] [review] v1 comment 15/16

Attachment #8402375 - Flags: review+ → review-

Marco Bonardo [:mak]

Comment 18

•

11 years ago

(In reply to Josh Matthews [:jdm] from comment #4) > Note: database connections need to be shutdown by the time > xpcom-shutdown-threads runs, not xpcom-shutdown. fwiw, anything using the profile (and dbs are in the profile) should start its shutdown at profile-before-change (now there is p-b-c2, but the point sticks) and likely be done by xpcom-shutdown. There's no guarantee that the openedDBs check that today happens at xpcom-shutdown-threads may have to be move earlier in future, there's still that plan to exit(0) at xpcom-shutdown around, even if there's no definitive plan for it. So we may have to move the barrier at xpcom-shutdown at a certain point.

Honza Bambas (:mayhemer)

Assignee

Comment 19

•

11 years ago

(In reply to Marco Bonardo [:mak] from comment #18) > (In reply to Josh Matthews [:jdm] from comment #4) > > Note: database connections need to be shutdown by the time > > xpcom-shutdown-threads runs, not xpcom-shutdown. > > fwiw, anything using the profile (and dbs are in the profile) should start > its shutdown at profile-before-change (now there is p-b-c2, but the point > sticks) That's exactly what localstorage does! And hence the bug, because someone is breaking this contract and tries to access localstorage past this shutdown point. > and likely be done by xpcom-shutdown. We do that too. > There's no guarantee that the > openedDBs check that today happens at xpcom-shutdown-threads may have to be > move earlier in future, there's still that plan to exit(0) at xpcom-shutdown > around, even if there's no definitive plan for it. So we may have to move > the barrier at xpcom-shutdown at a certain point. I don't see a relevance here, the current bits of exit(0) doesn't trigger xpcom-shutdown (-threads) at all as Olli mentions in comment 14: http://hg.mozilla.org/mozilla-central/annotate/e7d3a8320289/widget/windows/nsWindow.cpp#l4528 Hence r- on the patch. I also have another plan how to fix this in the DOM storage code (but I'm quit busy these days with other stuff): after p-b-c tell the database to shutdown after all references to it go away. In xpcom-shutdown block (or rather force) shutdown with some (small as possible) data lose but no crash.

Honza Bambas (:mayhemer)

Assignee

Comment 20

•

11 years ago

And by the way - this bug is indirectly caused by nsThread::Shutdown event queue spinning...

Honza Bambas (:mayhemer)

Assignee

Comment 21

•

11 years ago

Attached patch v2 - subpart of v1 (actually better nullchecks only) (obsolete) — Details — Splinter Review

Olli, this is the same as v1 (you actually has already r+'ed) except the observer notification changes. It's actually a null check + check against reinitialization of the database after shutdown.

Attachment #8413213 - Flags: review?(bugs)

Honza Bambas (:mayhemer)

Assignee

Comment 22

•

11 years ago

Attached patch v2 - null check only for branches — Details — Splinter Review

Attachment #8413215 - Flags: review?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

11 years ago

Attachment #8413213 - Flags: review?(bugs) → review+

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

11 years ago

Attachment #8413215 - Flags: review?(bugs) → review+

Honza Bambas (:mayhemer)

Assignee

Comment 23

•

11 years ago

Thanks for quick r+'s. https://hg.mozilla.org/integration/mozilla-inbound/rev/9e4afd48a727

Honza Bambas (:mayhemer)

Assignee

Updated

•

11 years ago

Attachment #8413213 - Flags: checkin+

Honza Bambas (:mayhemer)

Assignee

Updated

•

11 years ago

Attachment #8402375 - Attachment is obsolete: true

Honza Bambas (:mayhemer)

Assignee

Updated

•

11 years ago

status-firefox28: --- → affected

status-firefox29: --- → affected

status-firefox30: --- → affected

status-firefox31: --- → affected

tracking-firefox29: --- → ?

tracking-firefox30: --- → ?

Honza Bambas (:mayhemer)

Assignee

Comment 24

•

11 years ago

Comment on attachment 8413215 [details] [diff] [review] v2 - null check only for branches [Approval Request Comment] Bug caused by (feature/regressing bug #): 600307 User impact if declined: potential shutdown crash (null dereference) Testing completed (on m-c, etc.): similar patch just landed on m-i Risk to taking this patch (and alternatives if risky): near to zero, just a null check String or IDL/UUID changes made by this patch: none

Attachment #8413215 - Flags: approval-mozilla-beta?

Attachment #8413215 - Flags: approval-mozilla-aurora?

Honza Bambas (:mayhemer)

Assignee

Comment 25

•

11 years ago

Didn't catch you on IRC, Olli, maybe for the branch patch would be better to just return NS_OK when the database is not up. This will be in parity with the regular patch and also have less impact, do you agree to update the branch patch this way?

Flags: needinfo?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 26

•

11 years ago

Does it really matter? It is some unusual case. exception at least tells something went wrong.

Flags: needinfo?(bugs)

Honza Bambas (:mayhemer)

Assignee

Comment 27

•

11 years ago

Attached patch v3 — Details — Splinter Review

The patch is based on a plain m-c code, so to land this one, I will back the v2 out first. - as discussed on IRC, returns NS_ERROR_NOT_INITIALIZED - MOZ_ASSERT changed to NS_ERROR (imo better)

Attachment #8413267 - Flags: review?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

11 years ago

Attachment #8413267 - Flags: review?(bugs) → review+

Honza Bambas (:mayhemer)

Assignee

Comment 28

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/85e868943b61 https://hg.mozilla.org/integration/mozilla-inbound/rev/565704a273f5

Honza Bambas (:mayhemer)

Assignee

Updated

•

11 years ago

Attachment #8413213 - Attachment is obsolete: true

Attachment #8413213 - Flags: checkin+

Honza Bambas (:mayhemer)

Assignee

Updated

•

11 years ago

Attachment #8413267 - Flags: checkin+

Ryan VanderMeulen [:RyanVM]

Comment 29

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/9e4afd48a727 https://hg.mozilla.org/mozilla-central/rev/85e868943b61 https://hg.mozilla.org/mozilla-central/rev/565704a273f5

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla31

Lukas Blakk [:lsblakk] use ?needinfo

Comment 30

•

11 years ago

Tracking for 30 (and will approve uplift) but this is not a viable ride-along in a 29 chemspill so we're not going to track there, it can ride for the next 6 weeks.

status-firefox31: affected → fixed

tracking-firefox29: ? → -

tracking-firefox30: ? → +

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

11 years ago

Attachment #8413215 - Flags: approval-mozilla-beta?

Attachment #8413215 - Flags: approval-mozilla-beta-

Attachment #8413215 - Flags: approval-mozilla-aurora?

Attachment #8413215 - Flags: approval-mozilla-aurora+

Honza Bambas (:mayhemer)

Assignee

Comment 31

•

11 years ago

Comment on attachment 8413215 [details] [diff] [review] v2 - null check only for branches https://hg.mozilla.org/releases/mozilla-aurora/rev/041d2a2ee631

Attachment #8413215 - Flags: checkin+

Honza Bambas (:mayhemer)

Assignee

Updated

•

11 years ago

status-firefox30: affected → fixed

Honza Bambas (:mayhemer)

Assignee

Updated

•

11 years ago

QA Contact: honzab.moz

Ryan VanderMeulen [:RyanVM]

Updated

•

11 years ago

status-firefox28: affected → wontfix

status-firefox29: affected → wontfix

[SV Manager] Florin Mezei, QA (:FlorinMezei)

Updated

•

11 years ago

Keywords: verifyme

Ada [:adalucinet]

Comment 32

•

11 years ago

After this fix was pushed, there are no crashes in Socorro on Fx 30 and 31 for the following signatures: [@ mozilla::dom::DOMStorageCache::Clear(mozilla::dom::DOMStorage const*) ] [@ mozilla::dom::DOMStorageCache::SetItem(mozilla::dom::DOMStorage const*, nsAString_internal const&, nsString const&, nsString&)] [@ mozilla::dom::DOMStorageCache::Clear] [@ mozilla::dom::DOMStorageCache::~DOMStorageCache()] Marking as verified based on the above results. Although, with [@ mozilla::dom::DOMStorageCache::~DOMStorageCache()] signature, there are a couple of crashes (~16) on Nightly 32.0a1: https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Adom%3A%3ADOMStorageCache%3A%3A~DOMStorageCache%28%29&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&version=Firefox%3A32.0a1&hang_type=any&date=2014-05-20+08%3A00%3A00&range_value=3#tab-reports Could this be a concern?

Status: RESOLVED → VERIFIED

status-firefox30: fixed → verified

status-firefox31: fixed → verified

Flags: needinfo?(honzab.moz)

Keywords: verifyme

Johannes Pfrang [:johnp]

Reporter

Comment 33

•

11 years ago

This crashes seem to come from a single user. Also only three reports show a full stacktrace (probably because this is a shutdown crash). This is the first one (the others are equal): https://crash-stats.mozilla.com/report/index/48a8b7dd-e190-4bce-9600-4b2bb2140518 Some kind of use after free introduced by delaying the db shutdown?

Honza Bambas (:mayhemer)

Assignee

Comment 34

•

11 years ago

(In reply to Johannes Pfrang [:johnp] from comment #33) > This crashes seem to come from a single user. Also only three reports show a > full stacktrace (probably because this is a shutdown crash). This is the > first one (the others are equal): > > https://crash-stats.mozilla.com/report/index/48a8b7dd-e190-4bce-9600- > 4b2bb2140518 > > Some kind of use after free introduced by delaying the db shutdown? Thanks, filed bug 1013329 on that.

Flags: needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Assignee

Comment 35

•

11 years ago

(In reply to Alexandra Lucinet, QA Mentor [:adalucinet] from comment #32) > After this fix was pushed, there are no crashes in Socorro on Fx 30 and 31 > for the following signatures: > [@ mozilla::dom::DOMStorageCache::Clear(mozilla::dom::DOMStorage const*) ] > [@ mozilla::dom::DOMStorageCache::SetItem(mozilla::dom::DOMStorage const*, > nsAString_internal const&, nsString const&, nsString&)] > [@ mozilla::dom::DOMStorageCache::Clear] > [@ mozilla::dom::DOMStorageCache::~DOMStorageCache()] > > Marking as verified based on the above results. > > Although, with [@ mozilla::dom::DOMStorageCache::~DOMStorageCache()] > signature, there are a couple of crashes (~16) on Nightly 32.0a1: > https://crash-stats.mozilla.com/report/ > list?signature=mozilla%3A%3Adom%3A%3ADOMStorageCache%3A%3A~DOMStorageCache%28 > %29&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&ver > sion=Firefox%3A32.0a1&hang_type=any&date=2014-05- > 20+08%3A00%3A00&range_value=3#tab-reports > Could this be a concern? To answer, this crash (in the dtor) is there since 28 and maybe even earlier. So it's probably not related to this patch.

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: DOM → DOM: Core & HTML

You need to log in before you can comment on or make changes to this bug.