Closed Bug 1478602 Opened 6 years ago Closed 2 years ago

IndexedDB won't run onupgradeneeded in some cases during add-on upgrade

Categories

(Core :: Storage: IndexedDB, defect, P2)

62 Branch
defect

Tracking

()

RESOLVED DUPLICATE of bug 1653276

People

(Reporter: juraj.masiar, Unassigned)

Details

(Whiteboard: DWS_NEXT)

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0
Build ID: 20180723144101

Steps to reproduce:

See the video demonstration:
https://www.youtube.com/watch?v=vzgndaZ2_Vo

1) install 'web-ext' tool
2) extract the attached zip file
3) execute `web-ext run.bat` several times with different delay (every 1 - 5 seconds)


Actual results:

Some windows will contain database error:
NotFoundError
The operation failed because the requested database object could not be found. For example, an object store did not exist but was being opened.


Expected results:

There should be no errors!


This error affects some of the users of my add-on after I changed the database in the last upgrade. More information in this thread:
https://discourse.mozilla.org/t/background-script-extension-page-during-add-on-upgrade/30418
Video demonstration
Priority: -- → P2
asuth: on your radar
Flags: needinfo?(bugmail)
Thank you so much for the reproduction and video showing the reproduction!

From skimming the linked discourse thread at https://discourse.mozilla.org/t/background-script-extension-page-during-add-on-upgrade/30418/7 it sounds like the error cases consistently line up with the addon being updated nearly instantly after being installed in the first place, which can set up some very interesting edge-cases in terms of the upgrade code effectively racing itself.

It's possible that our changes in bug 1454309 than landed in 61 have made a potential assumption-violating race more likely to happen when it comes to multiple pages attempting to open the same database at the same time.  That is, when it comes to upgrades, the OpenDatabaseOp checks the database's version information on the QuotaManager IO thread, decides whether it's dealing with an upgrade or not and saves that decision, then closes the database connection, then takes action based on the currently visible database version.  Database and transaction blocking logic notably only come into play after that.

We expect the upgrade transaction itself to be atomic and not visible to the OpenDatabaseOp, so I don't think it's a case of getting confused about the database version from disk reads.  But it's now possible for the non-race situation of [open#1 sees v=old, upgrade#1 to v=new, open#2 sees v=new] to become [open#1 sees v=old, open#2 sees v=old, upgrade#1 to v=new, PROBLEM].  I'm not sure what PROBLEM is in this case.


If it's possible to get a reproduction when running with MOZ_LOG=IndexedDB:5, that would probably shed a ton of light of what exactly is happening and make it easier/faster for us to further investigate and debug this.  MOZ_LOG can be used in conjunction with MOZ_LOG_FILE, but obviously if you're running multiple runs of web-ext run atop each other, that might not turn out well if they all try and use the same log file.  Some more information at https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Gecko_Logging


In the meantime, if you want a mitigation for this very specific problem, closing the database and re-opening it should trigger the upgrade to run and things to then work.  Note, however, that what you may be seeing in the field is users experiencing completely broken QuotaManager/IndexedDB setups.  We're seeing a lot of that in our ActivityStream telemetry and are working to fix the underlying issues.  (Which is to say, you may have identified a different bug here than your users are experiencing/reporting.)


I'm going to leave the needinfo on me to look into this again later.
Uh, and if you want to be able to differentiate between broken QuotaManager and not broken QuotaManager, you can point your users at https://firefox-storage-test.glitch.me/ and if they/you don't see "Good: Totally Working" for all four lines, that's probably the problem.  (Note that it's important to head to http*S*, some things won't work on http intentionally.)
Thank you for your analysis! Also that page is amazing - I will totally use that for the support I do.

Regarding broken IndexedDB engine - I have a lot of experience with this as I'm getting at least two e-mails every week (mostly due to downgrade from Firefox 57+ or due to running some old CCleaner).

However I have two users that had this problem (NotFoundError) when I released the database upgrade in 9.8 version and the problem was fixed for them when they upgraded to 9.9 where I just incremented the DB version again. But I will write them again to make sure their QuotaManager is working.

One question - you said I can just close and reopen database without increment version number and the onupgradeneeded handler will fire again? That would be a great hotfix. I will test it tomorrow.

BTW: I'm still getting reports with this bug from 9.9 version, so far about 30+ which is something below 0.1% of users. 
And I'm pretty sure I will not introduce database changes anytime soon (it's been very stressful days for me and unpleasant experience for those users).

Thank you again for your help!
(In reply to juraj.masiar from comment #5)
> One question - you said I can just close and reopen database without
> increment version number and the onupgradeneeded handler will fire again?
> That would be a great hotfix. I will test it tomorrow.

Yes, if:
- The problem is a race of multiple upgrades happening concurrently as I theorize.
- When you try and re-open the race doesn't happen again.  This would suggest you might need to use a setTimeout with a random delay so that a second race doesn't happen when the timeout fires.

In the event there's a more serious transactional integrity problem, you would need to bump the schema version.  (And note, of course, that the version must be an integer.)  But I think that's a significantly less likely explanation for the problem.
But how do I know that there is a concurrent `onupgradeneeded` being executed?
Because the error being logged is from the query that cannot find the object store. The opening database didn't produced any errors.

In the meantime I've contacted several users with this error and send them the storage-test page and they all passed OK so I think we can exclude the broken QuotaManager theory.
Also statistically I should get at least 2 reports from Chrome users if this was a bug in my code but I didn't.
adding to DWS next and removing need-info
Flags: needinfo?(bugmail)
Whiteboard: DWS_NEXT

I think this likely ended up being bug 1653276 (which may end up getting some kind of follow-up, however).

Status: UNCONFIRMED → RESOLVED
Closed: 2 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: