Closed Bug 1244038 Opened 4 years ago Closed 4 years ago

Recent update breaks localStorage on all sites that use it for some users with no error feedback

Categories

(Toolkit :: Storage, defect, critical)

Version: 46 Branch
Type: defect
Priority: Not set
Severity: critical

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla47
Tracking Status
firefox45 --- unaffected
firefox46 blocking fixed
firefox47 blocking fixed

People

(Reporter: kael, Assigned: mayhemer)


Details

(Keywords: dataloss, regression)

Attachments

(4 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:46.0) Gecko/20100101 Firefox/46.0
Build ID: 20160128004012

Steps to reproduce:

Sometime in the last couple of days an update was pushed to the Developer Edition channel that breaks localStorage. The symptoms are similar to bug #980875 and it produces errors that are virtually impossible to debug:

NS_ERROR_FILE_CORRUPTED:  www-upload.js:1780:0
TypeError: yt.www.metadataeditor is undefined
<anonymous>
 upload:4563
 upload:4563:5

Currently for me this is affecting Slack, YouTube, etc. In these cases the entire site, or key portions of it, fails to load and function.

The problem appears to exist in the webappsstore.sqlite file. If I copy it and the associated .sqlite-wal file over to a new profile the bug will reproduce in the same websites.

Previously this bug was only appearing in Nightly, and I could switch back over to Developer Edition for the bug to go away (same profile). So this is definitely a bug that was introduced recently.

I can provide my webappsstore.sqlite file for reproduction purposes but I'm not going to attach it to the bug since it contains some of my personal data and information.
Easy reproduction step: Visit the youtube uploader (URL attached to bug) while logged in to a google account. If you have this bug the uploader will fail to load and produce a script error (the stack above in the report).

This basically means I can't use Firefox as my browser for most purposes right now.

It's really unfortunate that the feedback and error handling code for this problem is so terrible given that people have been reporting general forms of it since 2014. It's impossible for an ordinary user to figure out and it has no reason to be that bad.
Severity: normal → critical
The webappsstore.sqlite file opens correctly in SQLiteSpy and the table contains all the localStorage entries I expect to see. Doing a PRAGMA integrity_check produces the following warnings:

integrity_check
*** in database main ***
On tree page 344 cell 2: 2nd reference to page 294
row 5075 missing from index scope_key_index
row 5076 missing from index scope_key_index
row 5077 missing from index scope_key_index
row 5078 missing from index scope_key_index
row 5079 missing from index scope_key_index
row 5080 missing from index scope_key_index
row 5081 missing from index scope_key_index
non-unique entry in index scope_key_index
non-unique entry in index scope_key_index
non-unique entry in index scope_key_index
non-unique entry in index scope_key_index
non-unique entry in index scope_key_index
non-unique entry in index scope_key_index
non-unique entry in index scope_key_index
non-unique entry in index scope_key_index
non-unique entry in index scope_key_index
wrong # of entries in index scope_key_index

From digging around using a SQL query, the duplicate entry appears to be the localStorage key '__DM__:latestNotificationTimestamp' for twitter.com:443. If I go in and delete all copies of that key (there are 11 rows), then drop and recreate the index, the database file is treated as valid by Firefox again.
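
For anyone else hitting this, a rough sketch of the manual repair described above (the key and row count are from my profile; the scope_key_index column list is my assumption about the schema, not copied from the Firefox source), run from a sqlite3 shell against webappsstore.sqlite with Firefox closed:

    -- remove all copies of the duplicated key (11 rows in this profile)
    DELETE FROM webappsstore2 WHERE "key" = '__DM__:latestNotificationTimestamp';
    -- drop and recreate the broken index
    DROP INDEX IF EXISTS scope_key_index;
    CREATE UNIQUE INDEX scope_key_index ON webappsstore2(scope, "key");
    -- should now report "ok"
    PRAGMA integrity_check;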
SQLite 3.10.2 rode the train to aurora/dev edition (http://hg.mozilla.org/releases/mozilla-aurora/file/tip/db/sqlite3/src/sqlite3.h)

From the integrity check, this sounds more like a SQLite internal error than a bug in our use of SQLite, but maybe I'm misunderstanding the error.  Either way, drh will know!
Flags: needinfo?(drh)
Also, :kael, thanks for reporting the problem and doing the excellent detective work and finding a specific reproduction case and generally being awesome!  This breakage does indeed massively suck but your invaluable assistance should help us get it fixed ASAP.
:mayhemer, assuming you're in charge of the current DOM storage implementation, as a heads-up, this may need a spin-off recovery bug in DOM Storage depending on how transparently mozStorage is able to recover from this problem.  (mozStorage currently has no mechanism for doing things once per upgrade, although it could grow such functionality.  Like if we need to VACUUM everything to cause indexes to be rebuilt, etc.)
Flags: needinfo?(honzab.moz)
QA: it'd be great to have a (non-)repro of this using a current aurora/dev edition build as well as one that pre-dates the 3.10.2 SQLite landing/uplift.  (It landed on trunk on bug 1241121 on the 21st, and I don't know when the Aurora cut-over happened.)

:kael, can you confirm that you don't have any antivirus or anything else that might try and muck with disk accesses?  Thanks!
Flags: needinfo?(kg)
Keywords: qaurgent, qawanted
Broken by bug 1165214 (landed on 46).  Fixed in bug 1240238 (landed on 47).  Asking for uplift on that one.


To anyone who has commented here: do any of you have a backup of webappsstore.sqlite from before the update to Developer Edition, i.e. before this error appeared?  I would be very interested to check which part of the update is breaking it.  I'm afraid the fix in bug 1240238 will make this work, but at the cost of losing the data.  I would really like to avoid that.

(In reply to Andrew Sutherland [:asuth] from comment #5)
> :mayhemer, assuming you're in charge of the current DOM storage
> implementation.

No longer.  Please direct any new issues to Jan Varga.  I can only help diagnose and review if necessary.
Kael: do you have a backup of webappsstore.sqlite from before the update to Developer Edition, i.e. before this error appeared?
Flags: needinfo?(honzab.moz)
(In reply to Andrew Sutherland [:asuth] from comment #6)
> QA: it'd be great to have a (non-)repro of this using a current aurora/dev
> edition build as well as one that pre-dates the 3.10.2 SQLite
> landing/uplift.  (It landed on trunk on bug 1241121 on the 21st, and I don't
> know when the Aurora cut-over happened.)

I think the STR could be switching between Aurora and Beta/Release on the same profile.  I feel this bug confirms there is something more to investigate.

I would really appreciate help here (my local testing was apparently not enough).

If we don't catch this, it may cause many users to lose their localStorage data as this rides the trains.

Thanks.

> 
> :kael, can you confirm that you don't have any antivirus or anything else
> that might try and muck with disk accesses?  Thanks!

I don't think that is the issue.  I simply screwed up the update process in bug 1165214, but I'll need more info to figure out the problem.
Removing the drh ping.  Apologies for the confusion: I was aware of the context of our upgrade to 3.10.2 but not of the DOM storage bug.  I'll start watching the Core::DOM component too.  Removing your cc momentarily.
Flags: needinfo?(drh)
The SQLite developers are all still in panic mode because an SQLite database should never fail an integrity test, unless you are somehow bypassing SQLite APIs to modify the database file.  Errors in schemas and/or SQL should never cause corruption.  We are very interested to know how this corruption came about.

Or maybe I'm misunderstanding the problem here?
Tracking since this is a regression. I am also tentatively marking this as a blocker because it sounds potentially very bad for users. 
 
Andrei, can someone from your team work with Honza to do more testing?   See comment 7.
Honza, what might help? Do you want QA to look for more test cases, find sites where this breaks?
Flags: needinfo?(honzab.moz)
Flags: needinfo?(andrei.vaida)
The DOM Storage regression that Honza pointed out is likely the reason some sites stop working when reusing the profile in different builds. But that causes a CONSTRAINT error.

Here we have an index corruption that can likely be fixed by issuing a REINDEX on the database.
But it should generally not cause a CORRUPT error.

It may just be a coincidence: the bug probably happens due to the regression, and the corruption was already there, silently.

In general, the problem we still have is that we lack an API to notify consumers when their db gets corrupt; that is bug 1125157. DOM Storage should then hook into it.
We don't have a good history of dynamically reacting to db corruption that causes only some calls to return SQLITE_CORRUPT while the db can still be opened normally.
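
For reference, the REINDEX repair mentioned above would look roughly like this (a sketch only; note that if the table really does contain duplicate rows, rebuilding a UNIQUE index will fail with a constraint error and the duplicates have to be deleted first):

    PRAGMA integrity_check;   -- lists the index inconsistencies
    REINDEX scope_key_index;  -- rebuild just the suspect index
    -- or simply: REINDEX;       rebuild every index in the database
    PRAGMA integrity_check;   -- verify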
(In reply to D. Richard Hipp from comment #11)
> The SQLite developers are all still in panic mode because an SQLite database
> should never fail an integrity test, unless you are somehow bypassing SQLite
> APIs to modify the database file.  Errors in schemas and/or SQL should never
> cause corruption.  We are very interested to know how this corruption came
> about.

I don't think we are in an emergency regarding SQLite.
It's hard to tell how this got corrupted; here we use WAL with synchronous=NORMAL, so I can imagine there are cases where corruption is possible.
Our problem is that most of our components detect corruption only when opening a database, but over the years I have noticed various cases where the DB opens properly yet some specific query returns SQLITE_CORRUPT.  We are not yet handling that properly.
So, please release the panic mode button for now.
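
For context, the WAL + synchronous=NORMAL mode mentioned above corresponds to connection settings along these lines (a sketch of the relevant pragmas, not the actual DOM Storage code):

    PRAGMA journal_mode = WAL;
    -- NORMAL trades some durability for speed: a crash at the wrong moment
    -- can drop recently committed transactions
    PRAGMA synchronous = NORMAL;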
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #12)
> Honza, what might help? Do you want QA to look for more test cases, find
> sites where this breaks?

Primarily, anyone experiencing this issue who has a backup of the webappsstore.sqlite* files from BEFORE 2016-01-06 (the date bug 1165214 landed), so that I can try to reproduce.

If that is not available, then someone from QA could take a previously heavily used profile and 1) back up the webappsstore.sqlite* files, 2) switch between Aurora and Beta (or Aurora and Release) and try to reproduce the corruption, ideally also keeping a snapshot of the db files between each update.  Then hand me the database files and probably the STR (I believe it will turn out to be very simple).
Flags: needinfo?(honzab.moz)
The error and warning log of SQLite (https://www.sqlite.org/errlog.html) will issue a callback whenever a CORRUPT error is seen.  There is no overhead in the common case where no errors occur.  The same mechanism also logs all other kinds of errors and warnings, but you could - if you want - ignore the rest.  I don't find the SQLITE_CONFIG_LOG symbol anywhere in the Nightly source tree, which suggests to me that you are not currently using this feature.  Would it help if you did?
I concur with :mak, I think the database corruption was probably latent.  The DOMStorage use of mozStorage and SQLite looks like it just ignores errors (see https://dxr.mozilla.org/mozilla-central/source/dom/storage/DOMStorageDBThread.cpp).  (Compare with the cookie service which nukes the database anytime it sees *any* errors including a full disk.)

In fact, it looks like the DOMStorage implementation will handle errors reasonably well (the requests will still be closed out rather than left hanging).  Since the DOMStorage implementation caches the writes and keeps them in memory for 20 seconds (or whatever) after the origin goes away, in this specific case, Twitter probably never actually notices or is affected by the fact that its storage is not actually persistent.
(In reply to Andrew Sutherland [:asuth] from comment #17)
> I concur with :mak, I think the database corruption was probably latent. 
> The DOMStorage use of mozStorage and SQLite looks like it just ignores
> errors (see
> https://dxr.mozilla.org/mozilla-central/source/dom/storage/
> DOMStorageDBThread.cpp).

Isn't 

    rv = stmt->Execute();
    NS_ENSURE_SUCCESS(rv, rv);

enough?
Sorry, I phrased that badly.  What I meant by ignore is that upon encountering an error from the database, DOMStorage will return an error and keep on using the database without nuking it.  This means that we expect the database to accumulate errors over time until the database gets so corrupted that corruption is reported when the database is initially opened.  This is also consistent with the corruption having to do with twitter but the repro site being youtube.

In contrast, the cookie service will nuke the database at the first sign of corruption (and try and roll back to a backup, I think?).  So if we see any corruption in a cookie database, it's much more likely that it's new, deterministic corruption.
(And by "accumulate errors", I mean that, per mak, it sounds like our fsync-settings for this database mean that we have tied SQLite's hands and that corruption is possible, although ideally occurring at an extremely low rate.)
Now I understand, thanks.  File a bug to introduce the nuke/restore?  But I definitely won't work on it...
(In reply to Andrew Sutherland [:asuth] from comment #19)
> What I meant by ignore is that upon
> encountering an error from the database, DOMStorage will return an error and
> keep on using the database without nuking it.  This means that we expect the
> database to accumulate errors over time until the database gets so corrupted
> that corruption is reported when the database is initially opened.

Yep, but I don't think it's an acceptable solution to have every consumer implement corruption handling differently, rather than having a single callback that tells them "your db is likely broken!".
Also, Places doesn't throw away the db at any sign of error.
I didn't know the cookie service was doing that now; it's interesting, but also sort of overkill, since it considers the db corrupt on ANY error rather than just SQLITE_CORRUPT. This could also explain why I have recently kept losing cookies every other week.
(In reply to Marco Bonardo [::mak] from comment #22)
> Yep, but I don't think it's an acceptable solution to have every consumer
> implement differently the corruption handling, rather than having a single
> callback that tells them "your db is likely broken!".

I agree.  Just wanted to call out that the current behavior meant that the corruption could absolutely be long-standing from a system crash/bad shutdown/etc.

> Also Places doesn't throw away the db at any sign of error.
> I didn't know the cookie service was doing that now, it's interesting, but
> also sort-of overkill, since it's considering the db corrupt for ANY error
> rather than just SQLITE_CORRUPT. This could also explain why recently I kept
> losing cookies every other week.

Yes, see https://bugzilla.mozilla.org/show_bug.cgi?id=1237527#c1.
(In reply to Honza Bambas (:mayhemer) from comment #8)
> Kael: do you have backup of webappsstore.sqlite before update to developer
> edition?  Before this error has appeared?

I have a backup from January 20, but sadly that is after I ran Nightly once on my profile. So if it made any changes to the database file they are already applied to the backup.

The backup is still in the 'corrupt' state though so I can share it with you if it's useful for reproduction.
Flags: needinfo?(kg)
(In reply to Honza Bambas (:mayhemer) from comment #15)
> If that is not available, then someone from QA to try on a previously
> heavily used profile 1) back webappsstore.sqlite* files up 2) do switching
> between aurora and beta (or aurora and release) and try to reproduce the
> corruption.  Maybe also keeping the db files snapshot between each update. 
> Then hand me the database files and probably the STR (I believe it will
> finally be very simple)
I first reproduced the problem (2), then backed up the files (1); I hope the order is not a problem.
The bug is not 100% reproducible; I reproduced it while switching several times between the latest Nightly, the latest Aurora, Nightly 2016-01-05 and the FF 44 release.
On FF 44 I'm getting "NS_ERROR_STORAGE_CONSTRAINT" (bug 1240238).
In the latest Nightly 47.0a1 (2016-01-31) I'm getting "NS_ERROR_FAILURE".
I'm on Win 7 x64.

STR:
1. Login to google
2. Open https://www.youtube.com/upload
Optional: 3. Click to "create channel" if you're trying to upload files to youtube for the first time
Actual results:
The YouTube uploader is frozen - see attached.
Error console log:
"NS_ERROR_FAILURE:
TypeError: yt.www.metadataeditor is undefined"
Flags: needinfo?(andrei.vaida)
(In reply to Paul Silaghi, QA [:pauly] from comment #26)
> Created attachment 8714277 [details]
> webappsstore.sqlite.zip

FWIW: That database is empty.  It contains no content, not even a schema.  The page size is 32768 bytes.  There has never been any write operation against the database, not even a CREATE TABLE.  The database file was created by SQLite version 3.10.2.
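
For anyone who wants to double-check an attached database themselves, the same observations can be made with plain SQL from the sqlite3 shell (a sketch; open the attached webappsstore.sqlite first):

    PRAGMA page_size;                    -- 32768 for this file
    SELECT count(*) FROM sqlite_master;  -- 0 = no tables, no indexes, no schema at all
    PRAGMA integrity_check;              -- "ok" (nothing there to be corrupt yet)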
(In reply to D. Richard Hipp from comment #27)
> FWIW: That database is empty.
So, what does this mean, considering my results in comment 25?
(In reply to Paul Silaghi, QA [:pauly] from comment #25)
> Created attachment 8714276 [details]
> frozen youtube uploader.png
> 
> (In reply to Honza Bambas (:mayhemer) from comment #15)
> > If that is not available, then someone from QA to try on a previously
> > heavily used profile 1) back webappsstore.sqlite* files up 2) do switching
> > between aurora and beta (or aurora and release) and try to reproduce the
> > corruption.  Maybe also keeping the db files snapshot between each update. 
> > Then hand me the database files and probably the STR (I believe it will
> > finally be very simple)
> I've first reproduced the problem (2), then I've backed up the files (1),
> hope the order is not a problem.

The order is wrong.  We need a database from BEFORE it got broken, and then the steps that break it.  It's not simple, but if you make backups incrementally every few steps (keeping versions) we can get the last unbroken db and the simplest steps that break it.

So, no, sorry, this is not what I needed from QA.  Anyway, thanks for the effort...

> The bug is not 100% reproducible, I've reproduce it while switching several
> times between latest nightly, latest aurora, Nightly 2016-01-05 and FF 44
> release.
> On FF 44 I'm getting "NS_ERROR_STORAGE_CONSTRAINT" (bug 1240238).
> In latest nightly 47.0a1 (2016-01-31) I'm getting "NS_ERROR_FAILURE."

That's interesting - something new appearing even with the first fix (bug 1240238) in place.

> I'm on Win 7 x64.
> 
> STR:
> 1. Login to google
> 2. Open https://www.youtube.com/upload
> Optional: 3. Click to "create channel" if you're trying to upload files to
> youtube for the first time
> Actual results:
> The youtube uploader if frozen - see attached.
> Error console log:
> "NS_ERROR_FAILURE:
> TypeError: yt.www.metadataeditor is undefined"

I'll look at the database.

BTW, have you uploaded all the files?  There may also be webappsstore.sqlite-shm and webappsstore.sqlite-wal files alongside the webappsstore.sqlite file.  We need all of them.
(In reply to Paul Silaghi, QA [:pauly] from comment #28)
> (In reply to D. Richard Hipp from comment #27)
> > FWIW: That database is empty.
> So, what does this mean, considering my results in comment 25?

See end of comment 29.
Attached file webappsstore.zip
Ok, I tried again.
Created a new profile with nightly 2016-01-05, then switched to FF 44 and Aurora 46.0a2 (2016-01-31) several times. I've made backups on every switch, until I reproduced the problem.
See the backups attached, the last backup is the one that reproduced the problem.
Followed the STR in comment 25.
(In reply to Paul Silaghi, QA [:pauly] from comment #31)
> Ok, I tried again.
> Created a new profile with nightly 2016-01-05, then switched to FF 44 and
> Aurora 46.0a2 (2016-01-31) several times. I've made backups on every switch,
> until I reproduced the problem.
> See the backups attached, the last backup is the one that reproduced the
> problem.

The databases seem to be well-formed (no corruption).  Snapshots 8 and 9 seem to be identical.  The last difference was from 7 to 8, which contained these changes:

DROP INDEX origin_key_index;
DROP INDEX scope_key_index;
INSERT INTO webappsstore2(rowid,originAttributes,originKey,scope,"key",value) VALUES(6,'','moc.elgoog.5stneilc.:https:443','moc.elgoog.5stneilc.:https:443','tr-drt','{"nrt":1454360396650,"nrt-e":1454360396650,"nrt-r":18420000}');
INSERT INTO webappsstore2(rowid,originAttributes,originKey,scope,"key",value) VALUES(7,'','or.elgoog.www.:https:443','or.elgoog.www.:https:443','epbar::impc','1');
INSERT INTO webappsstore2(rowid,originAttributes,originKey,scope,"key",value) VALUES(8,'','or.elgoog.www.:https:443','or.elgoog.www.:https:443','og-up-OGPL','5082229');
UPDATE webappsstore2 SET originKey='moc.elgoog.sulp.:https:443', scope='moc.elgoog.sulp.:https:443', "key"='nts.ver.0', value='1' WHERE rowid=9;
UPDATE webappsstore2 SET value='{"data":"[]","expiration":1454428382204,"creation":1454341982204}' WHERE rowid=10;
UPDATE webappsstore2 SET originKey='moc.elgoog.sulp.:https:443', scope='moc.elgoog.sulp.:https:443', "key"='nts.cnt.0', value='0,0,0,false' WHERE rowid=11;
INSERT INTO webappsstore2(rowid,originAttributes,originKey,scope,"key",value) VALUES(12,'','moc.ebutuoy.www.:https:443','moc.ebutuoy.www.:https:443','yt-remote-online-screens','{"data":"[]","expiration":1454342052205,"creation":1454341992205}');
INSERT INTO webappsstore2(rowid,originAttributes,originKey,scope,"key",value) VALUES(14,'','moc.ebutuoy.www.:https:443','moc.ebutuoy.www.:https:443','yt-remote-online-screens','{"data":"[]","expiration":1454342072430,"creation":1454342012430}');
INSERT INTO webappsstore2(rowid,originAttributes,originKey,scope,"key",value) VALUES(15,'','moc.ebutuoy.www.:https:443','moc.ebutuoy.www.:https:443','yt-remote-connected-devices','{"data":"[]","expiration":1454428412431,"creation":1454342012431}');
INSERT INTO webappsstore2(rowid,originAttributes,originKey,scope,"key",value) VALUES(16,'','moc.ebutuoy.www.:https:443','moc.ebutuoy.www.:https:443','yt-remote-online-screens','{"data":"[]","expiration":1454342082431,"creation":1454342022431}');
INSERT INTO webappsstore2(rowid,originAttributes,originKey,scope,"key",value) VALUES(17,'','moc.ebutuoy.www.:https:443','moc.ebutuoy.www.:https:443','google-upload::stats','{}');
(In reply to Paul Silaghi, QA [:pauly] from comment #31)
> Created attachment 8714334 [details]
> webappsstore.zip
> 
> Ok, I tried again.
> Created a new profile with nightly 2016-01-05, then switched to FF 44 and
> Aurora 46.0a2 (2016-01-31) several times. I've made backups on every switch,
> until I reproduced the problem.
> See the backups attached, the last backup is the one that reproduced the
> problem.
> Followed the STR in comment 25.

Thanks!  This is what I needed!  Great work!

So, what you did was:
#1 - #5: only ever run on release or aurora w/o bug 1165214 
#6 - ran on Nightly
#7 - back to release (it's still correct)
#8 - forward to Nightly => here we break!


I was able to reproduce the update problem (copying the #7 db to a profile and then running Nightly w/o bug 1240238 on it).

It's kinda weird to me.  The updater fails to create the index, for a reason I don't fully understand, since I stepped over the index creation statement in the debugger.  After the update transaction is committed (no errors), the index is missing on the table.  On the next Nightly (and probably Release too) run, the index creation fails hard -> localStorage starts to throw.

What we do during the update (when there is schema version == 1 and the scope_index exists):
- we start a deferred transaction
- we create-if-not-exists webappsstore2 (with the old schema) ; this is a no-op
- we create-if-not-exists the scope_key index on it ; this is also a no-op
- we rename webappsstore2 to webappsstore2_old, this should effectively also move the index to no longer be on webappsstore2 but rather be on webappsstore2_old
- we create-if-not-exists webappsstore2 with the current schema (now it creates a new table)
- we create-if-not-exists the origin_key index on it <= THIS UNEXPECTEDLY FAILS
- we INSERT all data from webappsstore2_old into webappsstore2

Now the table is left w/o the uniqueness index, non-unique pairs get created, and on the next run the origin_key index creation fails, which puts the storage into a state where it throws errors.


Solution: the simplest one that comes to my mind is to drop all indexes from the table before we rename it.
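
Roughly, the fixed upgrade sequence would then be (a simplified sketch, not the actual patch; column lists per the current schema):

    BEGIN;                                 -- the deferred update transaction
    -- free the index names BEFORE the rename, so the later CREATE INDEX
    -- really creates a fresh index on the new table
    DROP INDEX IF EXISTS scope_key_index;
    DROP INDEX IF EXISTS origin_key_index;
    ALTER TABLE webappsstore2 RENAME TO webappsstore2_old;
    CREATE TABLE IF NOT EXISTS webappsstore2
      (originAttributes TEXT, originKey TEXT, scope TEXT, key TEXT, value TEXT);
    CREATE UNIQUE INDEX IF NOT EXISTS origin_key_index
      ON webappsstore2(originAttributes, originKey, key);
    -- ...copy the rows over from webappsstore2_old, then drop it...
    COMMIT;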
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Attached patch v1Splinter Review
- drop the indexes from webappsstore2 before it's renamed
Attachment #8714448 - Flags: review?(mak77)
(In reply to Honza Bambas (:mayhemer) from comment #33)
> 
> What we do during the update (when there is schema version == 1 and the
> scope_index exists):
> - we start a deferred transaction
> - we create-if-not-exists webappsstore2 (with the old schema) ; this is a
> no-op
> - we create-if-not-exists the scope_key index on it ; this is also a no-op
> - we rename webappsstore2 to webappsstore2_old, this should effectively also
> move the index to no longer be on webappsstore2 but rather be on
> webappsstore2_old

Correct.  But, while the indexes now point to the renamed table, the indexes themselves are not renamed.  So if you try to create a new index on the new table with the same name as the old index, the CREATE INDEX will fail.  Is that what is happening?

> - we create-if-not-exists webappsstore2 with the current schema (now it
> creates a new table)
> - we create-if-not-exists the origin_key index on it <= THIS UNEXPECTEDLY
> FAILS
(In reply to Honza Bambas (:mayhemer) from comment #33)
> What we do during the update (when there is schema version == 1 and the
> scope_index exists)

IIUC, this means you have table with schema 1 and you have both scope_key_index and origin_key_index

> - we rename webappsstore2 to webappsstore2_old, this should effectively also
> move the index to no longer be on webappsstore2 but rather be on
> webappsstore2_old

and this also moves origin_key_index

> - we create-if-not-exists webappsstore2 with the current schema (now it
> creates a new table)
> - we create-if-not-exists the origin_key index on it <= THIS UNEXPECTEDLY
> FAILS

As drh said, it'd be because we already have a different index with the same name

Does this make sense?
Note that when you drop the old table at the end, the related indexes are also removed and leave no trace; that basically leaves you with the new table and without the index.
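
A minimal standalone sketch of that behaviour (hypothetical table/index names, run in a scratch database, nothing Firefox-specific):

    CREATE TABLE t (a TEXT, b TEXT);
    CREATE UNIQUE INDEX idx ON t(a, b);

    ALTER TABLE t RENAME TO t_old;                     -- idx survives, now on t_old
    CREATE TABLE t (a TEXT, b TEXT);
    CREATE UNIQUE INDEX IF NOT EXISTS idx ON t(a, b);  -- silent no-op: the name is taken
    DROP TABLE t_old;                                  -- drops idx along with it
    SELECT name FROM sqlite_master WHERE type = 'index';  -- empty: the new t has no unique index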
Attachment #8714448 - Flags: review?(mak77) → review+
(In reply to D. Richard Hipp from comment #35)
> > - we rename webappsstore2 to webappsstore2_old, this should effectively also
> > move the index to no longer be on webappsstore2 but rather be on
> > webappsstore2_old
> 
> Correct.  But, while the indexes now point to the renamed table, the indexes
> themselves are not renamed.  So if you try to create a new index on the new
> table with the same name as the old index, the CREATE INDEX will fail.  Is
> that what is happening?

No.  Apparently (or at least that's the only way I can explain it to myself) the "IF NOT EXISTS" check returns |false|.  At least the statement behaves exactly as if that were happening: SQLite believes the index still exists, even though the table it belongs to has been renamed (inside a pending transaction).  Is the transaction a problem?  Does IF NOT EXISTS look at records outside the transaction?

The statement to create the index is:

CREATE UNIQUE INDEX IF NOT EXISTS origin_key_index ON webappsstore2(originAttributes, originKey, key)

where the table webappsstore2 has just been created a line before with:

CREATE TABLE IF NOT EXISTS webappsstore2 (originAttributes TEXT, originKey TEXT, scope TEXT, key TEXT, value TEXT)

which WAS created successfully.
(In reply to Marco Bonardo [::mak] from comment #36)
> (In reply to Honza Bambas (:mayhemer) from comment #33)
> > What we do during the update (when there is schema version == 1 and the
> > scope_index exists)
> 
> IIUC, this means you have table with schema 1 and you have both
> scope_key_index and origin_key_index

Yes, presence of scope_key_index means the profile was run on Gecko of any possible previous version (down to 1.9.0)

> 
> > - we rename webappsstore2 to webappsstore2_old, this should effectively also
> > move the index to no longer be on webappsstore2 but rather be on
> > webappsstore2_old
> 
> and this also moves origin_key_index

Yes, I assume that.

> 
> > - we create-if-not-exists webappsstore2 with the current schema (now it
> > creates a new table)
> > - we create-if-not-exists the origin_key index on it <= THIS UNEXPECTEDLY
> > FAILS
> 
> As drh said, it'd be because we already have a different index with the same
> name
> 
> Does this make sense?

Aha!  I thought any index name was fully qualified with the name of the table it was created on (like table.index).  Apparently it isn't.  That's it!!!

Thanks.
No longer depends on: SQLite3.10.2
Blocks: 1165214
Comment on attachment 8714448 [details] [diff] [review]
v1

Approval Request Comment
[Feature/regressing bug #]: 1165214
[User impact if declined]: broken site usage because localStorage throws too many errors to the DOM
[Describe test coverage new/current, TreeHerder]: lots of DOM storage tests for localStorage functionality, but none for updates between different schema versions (impossible to automate); this needs to be done manually by QA.  The fix is waiting to land on m-c, so it cannot yet be externally verified to fix the problem (only verified locally)
[Risks and why]: nearly zero if not completely zero
[String/UUID change made/needed]: none
Attachment #8714448 - Flags: approval-mozilla-aurora?
(In reply to Honza Bambas (:mayhemer) from comment #40)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b5bf60028e2e

Paul, would you be willing to test with this try Nightly build?  It contains the fix and should never break the database.  Thanks for your work so far!
Flags: needinfo?(paul.silaghi)
I used http://ftp.mozilla.org/pub/firefox/try-builds/honzab.moz@firemni.cz-b5bf60028e2e6fc6794109835d0883601da97bab/try-win32/ for testing.
I've switched several times between 46.0a1 (2016-01-05), 44 release and the try build - no problem occurred.
I didn't use the latest aurora/nightly during the testing, in order not to break the db with the currently affected versions, hope that's correct.
Let me know if you need anything else.
Flags: needinfo?(paul.silaghi)
https://hg.mozilla.org/mozilla-central/rev/8d75d53432d3
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
(In reply to Paul Silaghi, QA [:pauly] from comment #44)
> I used
> http://ftp.mozilla.org/pub/firefox/try-builds/honzab.moz@firemni.cz-
> b5bf60028e2e6fc6794109835d0883601da97bab/try-win32/ for testing.
> I've switched several times between 46.0a1 (2016-01-05), 44 release and the
> try build - no problem occurred.
> I didn't use the latest aurora/nightly during the testing, in order not to
> break the db with the currently affected versions, hope that's correct.

Yes, that is correct.

> Let me know if you need anything else.

Thanks for the help here!  That's all we needed.
When should I expect this fix to hit Dev Edition? And will it fix the related problem of cookies vanishing across sessions (as mentioned in comment #22 - I also have this problem as of very recently)?
Comment on attachment 8714448 [details] [diff] [review]
v1

Fix for dataloss regression, manual verification, please uplift to aurora.
Attachment #8714448 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
kael, this should show up in Saturday's aurora build. If that doesn't fix the cookie vanishing issue, please let us know and reopen the bug!