Closed Bug 1240238 Opened 9 years ago Closed 9 years ago

NS_ERROR_STORAGE_CONSTRAINT on webappsstore.sqlite breaks lots of sites

Categories

(Core :: DOM: Core & HTML, defect)

43 Branch
x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox45 --- unaffected
firefox46 + fixed
firefox47 + fixed

People

(Reporter: andy+bugzilla, Assigned: mayhemer)

References

(Depends on 1 open bug)

Details

(Keywords: regression)

Attachments

(3 files, 2 obsolete files)

Found that I was having a JS error on viewing many sites: NS_ERROR_STORAGE_CONSTRAINT. I found that I had a places.sqlite db of 80 MB in my profile. I stopped Firefox, moved that out of the way and restarted and things started working again. This isn't reproducible in Nightly, so this might already be fixed, but since this broke a large number of sites (bill.com, datadoghq.com, addons.mozilla.org etc) thought it was worth reporting.
I'm not sure how a problem in places.sqlite may break sites, the only thing it does is storing history visits and marking visited links coloring, and those go through an async API (and also through xpidl with e10s enabled). It would be more likely for indexedDB or DOMStorage to break a site than places.sqlite. The side of the database doesn't matter at all, mine is even bigger and I have no problem. If you could send me the broken places.sqlite I could have a look but I'm honestly not sure what I should look at since there's no relation between places and content that may break content.
ehr, I said xpidl but I meant ipc... btw, that's minor.
Also, having a stack of that NS_ERROR_STORAGE_CONSTRAINT from the console would have been useful.
Note that the graphs are blank. Perhaps it wasn't places.sqlite, but rather webappsstore.sqlite. I moved the latter (at 78MB) out of the way and the graph re-rendered. If you can tell me how to get any more data for you, please let me know.
yep, it sounds more like DOM Storage. I think having webappsstore.sqlite and an example page where the bug happens could help. But mostly I guess the problem is webappsstore.sqlite is corrupt and we don't have a way to detect corruption on-the-fly (that is, if the database opens, we consider it good, even if it's not)
Component: General → DOM
Product: Firefox → Core
Summary: NS_ERROR_STORAGE_CONSTRAINT breaks lots of sites → NS_ERROR_STORAGE_CONSTRAINT on webappsstore.sqlite breaks lots of sites
Why are we throwing an NS_ERROR_BLAH and not some sort of DOM exception?
Good question, but without a stack it's hard to guess. If anyone can get by mail the corrupt db and try to reproduce, would be nice to see where we fail.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #6) > Why are we throwing an NS_ERROR_BLAH and not some sort of DOM exception? Because we take whatever mozStoreage spits w/o converting it to a DOM exception at [1]. This and probably [2] are the places to convert mLoadResult to anything of your liking. [1] http://hg.mozilla.org/mozilla-central/annotate/a77b73c7723e/dom/storage/DOMStorageCache.cpp#l656 [2] http://hg.mozilla.org/mozilla-central/annotate/a77b73c7723e/dom/storage/DOMStorageCache.cpp#l239
The best way is IMO to walk the 6 methods on DOMStorage class exposed to DOM and make sure aRv is converted to something DOM related. Because there may be more places then just those at comment 8, most coming from one place tho - DOMStorageDBThread::InsertDBOp. The database can be in some failure state (a mozStorage originated nsresult) which we then directly return to DOM.
even if we fix reporter errors, if this breaks websites it's not great, we likely need to detect this corruption and throw away current webappsstore.sqlite. If it's complex to do while Firefox is running, we could just set a pref, read it on startup and replace it there. I suppose all the pages using DOM Storage can recreate the missing data, right?
Something like what I suggested in bug 1125157 would be useful here. We really can't expect every consumer to implement their own Storage wrapper to catch corruption, having an onCorrupt handler on the connection would be simpler...
Depends on: 1125157
(In reply to Marco Bonardo [::mak] from comment #11) > Something like what I suggested in bug 1125157 would be useful here. > We really can't expect every consumer to implement their own Storage wrapper > to catch corruption, having an onCorrupt handler on the connection would be > simpler... Also seeing this on :rfeeley's machine here. https://arty.name/localstorage.html reports "0 items saved" into storage. Moving "webappsstore.sqlite" out of the profile directory fixes it. the SQLite size is 55mb. I'm gonna check with Ryan if he wants to share the corrupt db or not.
Okay Ryan didn't want to share the DB. If you explain the steps to trace the SQLite error we can do that on his machine when possible.
Why would this be related to IndexedDB?
The first thing that would be helpful is to break at http://mxr.mozilla.org/mozilla-central/source/storage/mozStoragePrivateHelpers.cpp#68 and get a stack.
Flags: needinfo?(vlad)
Could you please run PRAGMA integrity_check; using Sqlite Manager (or sqlite3 command line) on the db? It may be interesting to check the results. Off-hand having a corrupt db is not very useful, the only thing we could do is send it to Sqlite team to check if they can figure out something about how it got corrupted, but if it isn't shareable, we can't even do that. with this kind of corruption we could just throw at any point during any read or write, that's why I don't think fixing a specific point in code helps, unless it happens during the connection startup (but then I assume DOM Storage already has code to throw away the db if open fails).
Ryan is super busy this week with UX meetings, so just going to ni? him. Ryan when you have time could you run (on the ~/Desktop file that we backed up): ``` sqlite3 webappsstore.sqlite ``` that will show you ``` sqlite> ``` in the terminal, type in: ``` sqlite> PRAGMA integrity_check; ```
Flags: needinfo?(vlad) → needinfo?(rfeeley)
I did this in my profile: ``` oding98t.default sqlite3 webappsstore.sqlite SQLite version 3.8.5 2014-08-15 22:37:57 Enter ".help" for usage hints. sqlite> PRAGMA integrity_check; ok sqlite> ``` Here's how the first run pages looks for me in that profile: https://www.dropbox.com/s/km2do3spxs1rjgb/Screenshot%202016-01-21%2011.23.14.png?dl=0&preview=Screenshot+2016-01-21+11.23.14.png
so, it's the data that is somehow broken, doesn't look like something related to Storage. there are also reports in bug 1238354, did we change anything in DOM Storage recently?
Flags: needinfo?(rfeeley)
(In reply to Andy McKay [:andym] from comment #19) > I did this in my profile: > > ``` > oding98t.default sqlite3 webappsstore.sqlite > > SQLite version 3.8.5 2014-08-15 22:37:57 > Enter ".help" for usage hints. > sqlite> PRAGMA integrity_check; > ok > sqlite> > ``` > > Here's how the first run pages looks for me in that profile: > https://www.dropbox.com/s/km2do3spxs1rjgb/Screenshot%202016-01-21%2011.23.14. > png?dl=0&preview=Screenshot+2016-01-21+11.23.14.png Do you use this profile with different versions /builds of Firefox? Ryan used his profile with Stable and Nightly.
(In reply to Marco Bonardo [::mak] from comment #20) > so, it's the data that is somehow broken, doesn't look like something > related to Storage. > > there are also reports in bug 1238354, did we change anything in DOM Storage > recently? Yes, bug 1165214. It changes the database schema, however should be compatible in both ways mostly. I remember testing going back and forth the version before and after the patch with the same database (for both cases - created first with and w/o the patch). Everything was working for me, but something could well slipped between my fingers.
(In reply to Honza Bambas (:mayhemer) from comment #22) > (In reply to Marco Bonardo [::mak] from comment #20) > > so, it's the data that is somehow broken, doesn't look like something > > related to Storage. > > > > there are also reports in bug 1238354, did we change anything in DOM Storage > > recently? > > Yes, bug 1165214. It changes the database schema, however should be > compatible in both ways mostly. I remember testing going back and forth the > version before and after the patch with the same database (for both cases - > created first with and w/o the patch). Everything was working for me, but > something could well slipped between my fingers. However, at this time it's just on mozilla-central.
(In reply to Vlad Filippov :vladikoff from comment #21) > Do you use this profile with different versions /builds of Firefox? > Ryan used his profile with Stable and Nightly. Yes I switch between Stable and Nightly frequently on the same profile.
(In reply to Andy McKay [:andym] from comment #19) > I did this in my profile: > > ``` > oding98t.default sqlite3 webappsstore.sqlite > > SQLite version 3.8.5 2014-08-15 22:37:57 > Enter ".help" for usage hints. > sqlite> PRAGMA integrity_check; > ok > sqlite> > ``` > > Here's how the first run pages looks for me in that profile: > https://www.dropbox.com/s/km2do3spxs1rjgb/Screenshot%202016-01-21%2011.23.14. > png?dl=0&preview=Screenshot+2016-01-21+11.23.14.png Thanks Andy, we have been trying to track down the cause of this and related errors that are reported to FxA metrics. FxA from the firstrun flow also reports: * NS_ERROR_STORAGE_IOERR * NS_ERROR_FILE_NO_DEVICE_SPACE * NS_ERROR_FILE_CORRUPTED * NS_ERROR_STORAGE_CONSTRAINT * NS_ERROR_FAILURE * NS_ERROR_STORAGE_BUSY * NS_ERROR_OUT_OF_MEMORY * SecurityError (This operation is insecure) Some of these errors occur several thousand times an hour, the main reported browsers are Fx 41->Fx 43. What I find confusing about these errors is users should only see the firstrun flow when they create a new profile. A corrupt sqlite file with a new profile is hard to understand.
I recently experienced this error, NS_ERROR_STORAGE_CONSTRAINT, while attempting to access https://accounts.firefox.com. Additionally, I often switch between Nightly and Stable builds.
Attached file o6y6ubjz.qwqeoroe.zip
Attached o6y6ubjz.qwqeoroe.zip is a test profile with the broken storage from /Users/[username]/Library/Application Support/Firefox/Profiles Open any site that uses local storage with that profile (such as https://accounts.firefox.com/) and it should throw the error.
See comment above
Flags: needinfo?(mak77)
Flags: needinfo?(khuey)
Flags: needinfo?(khuey) → needinfo?(honzab.moz)
I think I screwed up something with the database update code in bug 1165214. Going to try the test profile.
Assignee: nobody → honzab.moz
Blocks: 1165214
Flags: needinfo?(honzab.moz)
Honza, I suspect the problem is you removed a UNIQUE constraint (scope, key) in the new schema, that means now you can have multiple rows having the same (scope, key), that means your previous schema version, trying to do this: "CREATE UNIQUE INDEX IF NOT EXISTS scope_key_index" " ON webappsstore2(scope, key)")); will now fail. That means you cannot use the existence of that index to detect downgrades. that means InitDatabase fails in the old version. What do we do in such a case? replace the db? From a quick look sounds like we just bail out? Btw, since you are already looking into this I didn't dig too deep, this is just what I noticed from comparing that broken db with mine.
Flags: needinfo?(mak77)
SELECT count(*), scope, key FROM webappsstore2 GROUP BY scope, key HAVING count(*) > 1; looks like accounts.firefox.com falls into the multiple scope, key case.
So, what's happening here (with the provided profile): - the database has been apparently updated to schema 1, but the origin_key_index is missing from a reason I don't fully understand ; we don't try to create it again when already updated to schema 1 - in Fx versions w/o bug 1165214 (=schema 0) we fail to create scope_key_index index -> hence the storage failures bubbling up to DOM - to fail the index creating is only possible when the scope+key pairs are not unique (I found 22 duplicates for "moc.xoferif.stnuocca.:https:443" and "crosstab.MESSAGE_KEY") - that may happen only when the data were added to the database while there already were not a unique index (with bug 1165214 (=schema 1) it's origin_key_index) I'm not sure how that happened, since we don't allow writes to the database when we fail any part of the open and update process, half-updated tables should be threw away as all is happening in a transaction and is rolled back on a failure. I suspect we failed the update but the transaction didn't rollback correctly and left the schema broken. I know about a certain problem that we loose uniqueness when origin attributes stored with schema 1 expands beyond what the "scope" column can store (which is only appid and inbrowser). It might be this case. Has user context already landed on m-c? If so, Vlad, have you been using it? The fix here is: - preserve uniqueness of scope to allow schema 1 -> schema 0 -> schema 1 jumps be fully handled - try to create the index and on a failure just drop the database and start over with an empty one
(In reply to Marco Bonardo [::mak] from comment #31) > SELECT count(*), scope, key FROM webappsstore2 GROUP BY scope, key HAVING > count(*) > 1; > > looks like accounts.firefox.com falls into the multiple scope, key case. Why are you wasting time when I'm already looking into this?
(In reply to Honza Bambas (:mayhemer) from comment #33) > Why are you wasting time when I'm already looking into this? I'm sorry, I just spent a few minutes looking at the sqlite file, just to be sure it was not related to Storage and noticed that discrepancy with indices.
(In reply to Marco Bonardo [::mak] from comment #34) > (In reply to Honza Bambas (:mayhemer) from comment #33) > > Why are you wasting time when I'm already looking into this? > > I'm sorry, I just spent a few minutes looking at the sqlite file, just to be > sure it was not related to Storage and noticed that discrepancy with indices. Aha. Understand. It just looked to me a bit like we are doing the same thing :)
Attached patch v1 (obsolete) — Splinter Review
Aim is to preserve both data and functionality. - the patch enforces creation of tables and indexes, on failure we throw the database away and start over - the |scope| column still present for backward compat can tho only keep appid and inbrowser and not any other attributes that may be set on the principal of a storage - hence, I store the whole origin attributes suffix as another part of the |scope| string - the old code will just ignore it and will not find the records (which is actually pretty correct) while the uniqueness is preserved - when updating forward after the schema 1 database has been used on schema 0 code, we fully restore scoping information from the |scope| column now Roughly tested. https://treeherder.mozilla.org/#/jobs?repo=try&revision=e590ecd26d07
Attachment #8711768 - Flags: review?(bugs)
Comment on attachment 8711768 [details] [diff] [review] v1 ># HG changeset patch ># User Honza Bambas <honzab.moz@firemni.cz> ># Parent 23937b60386e7cd0c86b7f73eef098c71fc6675f >Bug 1240238 - Full DOMStorage schema 1 scope update uniqueness, r=smaug > >diff --git a/dom/storage/DOMStorageDBThread.cpp b/dom/storage/DOMStorageDBThread.cpp >--- a/dom/storage/DOMStorageDBThread.cpp >+++ b/dom/storage/DOMStorageDBThread.cpp >@@ -60,16 +60,39 @@ Scheme0Scope(DOMStorageCacheBridge* aCac > > if (oa.mAppId != nsIScriptSecurityManager::NO_APP_ID || oa.mInBrowser) { > result.AppendInt(oa.mAppId); > result.Append(':'); > result.Append(oa.mInBrowser ? 't' : 'f'); > result.Append(':'); > } > >+ // If there is more than just appid and/or inbrowser stored in origin >+ // attributes, put it to the schema 0 scope as well. We must do that >+ // to keep the scope column unique (same resolution as schema 1 has >+ // with originAttributes and originKey columns) so that switch between >+ // schema 1 and 0 always works in both ways. >+ nsAutoCString remaining; >+ oa.mAppId = 0; >+ oa.mInBrowser = false; >+ oa.CreateSuffix(remaining); >+ if (!remaining.IsEmpty()) { >+ MOZ_ASSERT(!suffix.IsEmpty()); >+ >+ if (result.IsEmpty()) { >+ // Must contain the old prefix, otherwise we won't search for the whole >+ // origin attributes suffix. >+ result.Append(NS_LITERAL_CSTRING("0:f:")); >+ } >+ // Append the whole suffix despite we have already stored appid and browser. >+ // This always starts with '^' and is then easy to grab and use. >+ result.Append(suffix); >+ result.Append(':'); >+ } Please add some examples where what kinds of strings we're playing with here. Otherwise this and the changes to ExtractOriginData are very hard to read. (...and it took enough time to figure this out that I'd like to see a new patch with the comments fixed.) >+ // If the profile was running schema 1 -> schema 0 -> schema 1 switching >+ // we may have stored the full attributes when there were more than just >+ // appid and inbrowser set on OriginAttrbutes. OriginAttributes >+ // To preserve full uniqueness we store such attributes to the scope key. >+ // Schema 0 code will just ignore it while keeping the scoping unique. Please add some example what kinds of strings we're trying to tokenize here.
Attachment #8711768 - Flags: review?(bugs) → review-
Attached patch v1.1 (obsolete) — Splinter Review
Attachment #8712118 - Flags: review?(bugs)
Attachment #8711768 - Attachment is obsolete: true
Comment on attachment 8712118 [details] [diff] [review] v1.1 >+ // If the profile went through schema 1 -> schema 0 -> schema 1 switching >+ // we may have stored the full attributes origin suffix when there were >+ // more than just appid and inbrowser set on storage principal's >+ // OriginAttributes. >+ // >+ // To preserve full uniqueness we store the suffix to the scope key. >+ // Schema 0 code will just ignore it while keeping the scoping unique. >+ // >+ // The OriginAttributes suffix is a string in a form like: >+ // "^addonId=101&userContextId=5" and it's ensured it always starts with '^' >+ // and never contains ':'. See OriginAttributes::CreateSuffix. >+ Record(); >+ if (CheckChar('^')) { >+ Token t; >+ while (Next(t)) { >+ if (t.Equals(Token::Char(':'))) { >+ Claim(suffix); >+ break; >+ } So the comment still doesn't explain why we're looking at : here. Could you add something about the stuff after : being non-origin attributes part of the string, or it being OriginNoSuffix or something. It would be still nice to have also somewhere an example showing what kinds of string can be stored here, I mean the whole string, not just some substring. (it is unfortunate to need to play with strings here and not with some objects.)
Attachment #8712118 - Flags: review?(bugs) → review+
Attached patch v1.2Splinter Review
Attachment #8712118 - Attachment is obsolete: true
Attachment #8712739 - Flags: review+
Keywords: checkin-needed
(In reply to Honza Bambas (:mayhemer) from comment #40) > Created attachment 8712739 [details] [diff] [review] > v1.2 Now that it landed does this mean that if you update Nightly this error should never happen? or would it require uplifts to older versions?
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
(In reply to Vlad Filippov :vladikoff from comment #42) > (In reply to Honza Bambas (:mayhemer) from comment #40) > > Created attachment 8712739 [details] [diff] [review] > > v1.2 > > Now that it landed does this mean that if you update Nightly this error > should never happen? or would it require uplifts to older versions? We cannot do uplift here. It's dependent on heavy changes to the database that are currently on m-c only. I think it's not that much standard people switch on the same profile between nightly and release. The DOM error should never show up again with nightly. One solution for release is that a user runs the profile with nightly having this patch and then reverts to release (or any non-nightly branch).
Comment on attachment 8712739 [details] [diff] [review] v1.2 Approval Request Comment [Feature/regressing bug #]: 1165214 [User impact if declined]: lot of site breakage (dom throws errors where devs usually don't expect exceptions leading to inability to use many web services) [Describe test coverage new/current, TreeHerder]: lot of domstorage tests [Risks and why]: N/A, probably very low [String/UUID change made/needed]: none
Attachment #8712739 - Flags: approval-mozilla-aurora?
Tracking since this is a regression (landed in m-c, 2016-01-06)
In comment 44 you say "we cannot do uplift here" but then you request uplift in comment 45. Did you change your mind or are you talking about something different in comment 44?
Flags: needinfo?(honzab.moz)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #47) > In comment 44 you say "we cannot do uplift here" Up to beta and further. But we can do an uplift to aurora now (since the base patch for bug 1165214 is now on aurora. > but then you request uplift > in comment 45. Did you change your mind or are you talking about something > different in comment 44?
Flags: needinfo?(honzab.moz)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #47) > In comment 44 you say "we cannot do uplift here" Up to beta and further and at that time even to aurora. But we can do an uplift to aurora now (since the base patch for bug 1165214 is now on aurora). > but then you request uplift > in comment 45. Did you change your mind or are you talking about something > different in comment 44? as above.
Comment on attachment 8712739 [details] [diff] [review] v1.2 Fix for a dataloss regression, please uplift to aurora.
Attachment #8712739 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: