Closed Bug 1404344 Opened 2 years ago Closed 2 years ago

Downgrade backport allowing 57 users downgrade to 56

Categories

(Toolkit :: Startup and Profile System, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox57 blocking fixed
firefox58 --- fixed

People

(Reporter: RT, Assigned: asuth)

References

Details

User Story

Context: 
Users can downgrade Firefox by having different instances of Firefox installed and switching between them. This can happen between Firefox instances of the same channel  or Firefox instances of different channels.
Downgrades are not supported although the current marketing push to get users on Nightly and Beta brought concerns that downgrade instances may increase in a context where bugs like https://bugzilla.mozilla.org/show_bug.cgi?id=1290481 can degrade user experience on certain websites.

User stories: 
- As a user on Release 56 who wants to to try Beta 57 and then switch back to release 56, I want to be made aware of potential issues and be offered a way to mitigate them.
- As a user on Release 57 who does not like square tabs, I want to install Firefox 56 with the full installer so I can keep using round tabs.

Attachments

(3 files)

No description provided.
User Story: (updated)
User Story: (updated)
(Commenting on User Story)
> Proposed UX:
> 1 On startup, detect if a profile comes from the future and display an
> infobar with a message "You previously used a more recent version of
> Firefox, we suggest you refresh your profile to get the best possible
> experience", a button allowing to open the Firefox repair utility and a
> learn more link that opens a SUMO page. The infobar can be dismissed and
> won't appear on the next startup.

We may want to say something more along the lines of "We detected a problem with your Firefox installation, ...".
(In reply to Andrew Overholt [:overholt] from comment #1)
> (Commenting on User Story)
> > Proposed UX:
> > 1 On startup, detect if a profile comes from the future and display an
> > infobar with a message "You previously used a more recent version of
> > Firefox, we suggest you refresh your profile to get the best possible
> > experience", a button allowing to open the Firefox repair utility and a
> > learn more link that opens a SUMO page. The infobar can be dismissed and
> > won't appear on the next startup.
> 
> We may want to say something more along the lines of "We detected a problem
> with your Firefox installation, ...".
Thanks, US now updated accordingly


gijs, can you please help clarify if achieving what is suggested on '3' above is a complex task or not? I was pinged your way for this, if someone else is better suited to answer that question thanks for suggesting.
User Story: (updated)
Flags: needinfo?(gijskruitbosch+bugs)
(Commenting on User Story)
> 3 The profile repair utility does what the current profile repair utility
> does but does not remove the plugins or addons that are installed (we assume
> that users downgrading do it because they want to have access to their
> add-ons that are not supported on 57 anymore)
> 
> Proposed implementation:
> A system add-on allows enabling that functionality as early as Firefox 56.

The "profile repair utility" of record is "firefox refresh". What that actually does is:

1) create a clean profile
2) copy some specific data back into the clean profile from the old profile
3) create a backup of the old profile
4) make the new clean-ish profile the default (or at least give it the same name as the old profile)


This means you actually lose a lot of stuff, not just add-ons but also any changed preferences, hsts data, intermediate certificate DB contents that have been built up over time (and/or custom certs you installed), etc. etc.

bug 1017919 covers keeping add-ons. It's not straightforward, I think, certainly not if we want to disable all of them, but even without that keeping add-on state as well as add-on files isn't trivial, AIUI.

But then, even if we did that we'd then wipe all the about:preferences and about:config settings, toolbar customizations, etc. etc., by doing the reset, which also seems undesirable.

That seems like a really big hammer to fix whatever issues we expect people to have with a downgrade (which I assume is basically the indexeddb issue? Are there other issues we're aware of?). Without knowing anything about indexedDB, I'm pretty sure it would be easier to write a system add-on that just wiped the indexeddb database if it was corrupt/unreadable.

Does that help?
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(rtestard)
(In reply to :Gijs from comment #3)

> That seems like a really big hammer to fix whatever issues we expect people
> to have with a downgrade (which I assume is basically the indexeddb issue?
> Are there other issues we're aware of?). Without knowing anything about
> indexedDB, I'm pretty sure it would be easier to write a system add-on that
> just wiped the indexeddb database if it was corrupt/unreadable.
> 
> Does that help?
It does, thanks!

Andrew, does that suggestion seem possible in short timelines or should we fall-back to just an information message with a link to a SUMO page?
Flags: needinfo?(rtestard) → needinfo?(overholt)
Maybe this is another bug, but I would highly recommend that we enable the "separate profile per channel" code we currently use in dev-edition in beta/nightly.  That would minimize problems when we ask people to test new features on a different channel in the future.

This would be in addition to the warning we are discussing here.
(In reply to Ben Kelly [:bkelly] from comment #5)
> Maybe this is another bug, but I would highly recommend that we enable the
> "separate profile per channel" code we currently use in dev-edition in
> beta/nightly.  That would minimize problems when we ask people to test new
> features on a different channel in the future.
> 
> This would be in addition to the warning we are discussing here.

This is tracked in bug 1373244.
(In reply to :Gijs from comment #3)
> That seems like a really big hammer to fix whatever issues we expect people
> to have with a downgrade (which I assume is basically the indexeddb issue?

Any site using IndexedDB/DOM Cache/Service Workers won't work.

> Are there other issues we're aware of?). Without knowing anything about
> indexedDB, I'm pretty sure it would be easier to write a system add-on that
> just wiped the indexeddb database if it was corrupt/unreadable.

I think we'd need to clear all of <profile>/storage and possibly other things, too, but asuth/janv/bkelly will know better.
Flags: needinfo?(overholt) → needinfo?(bugmail)
(In reply to :Gijs from comment #3)
> The "profile repair utility" of record is "firefox refresh". What that
> actually does is:
> 
> 1) create a clean profile
> 2) copy some specific data back into the clean profile from the old profile
> 3) create a backup of the old profile
> 4) make the new clean-ish profile the default (or at least give it the same
> name as the old profile)
> 
> 
> This means you actually lose a lot of stuff, not just add-ons but also any
> changed preferences, hsts data, intermediate certificate DB contents that
> have been built up over time (and/or custom certs you installed), etc. etc.
> 
> bug 1017919 covers keeping add-ons. It's not straightforward, I think,
> certainly not if we want to disable all of them, but even without that
> keeping add-on state as well as add-on files isn't trivial, AIUI.
> 
> But then, even if we did that we'd then wipe all the about:preferences and
> about:config settings, toolbar customizations, etc. etc., by doing the
> reset, which also seems undesirable.

"Refresh" is the tool we recommend to users in other situations with corrupted profiles.  We have documentation, links, etc all supporting this feature.  It also solves this problem.  Using it here is the simplest path to getting a repair tool.

Yes, you lose some other data.  But we are already recommending this in other cases.  I don't see why we have to introduce risk to try to do a more pin-pointed repair.  Remember, we don't know exactly what is going on in these machines and we risk corrupting things the fancier we get.

> That seems like a really big hammer to fix whatever issues we expect people
> to have with a downgrade (which I assume is basically the indexeddb issue?
> Are there other issues we're aware of?). Without knowing anything about
> indexedDB, I'm pretty sure it would be easier to write a system add-on that
> just wiped the indexeddb database if it was corrupt/unreadable.

This is not adequate.  You have to wipe the entire origin atomically or you risk leaving sites in unexpected states.

Since we are trying to fix this on release under time pressure I would highly recommend leaning on our existing refresh tool.  If people complain extensively then we can build something more targeted in the future.
Flags: needinfo?(bugmail)
(In reply to Ben Kelly [:bkelly] from comment #8)
> > That seems like a really big hammer to fix whatever issues we expect people
> > to have with a downgrade (which I assume is basically the indexeddb issue?
> > Are there other issues we're aware of?). Without knowing anything about
> > indexedDB, I'm pretty sure it would be easier to write a system add-on that
> > just wiped the indexeddb database if it was corrupt/unreadable.
> 
> This is not adequate.  You have to wipe the entire origin atomically or you
> risk leaving sites in unexpected states.

Exactly, very good point.
(In reply to Ben Kelly [:bkelly] from comment #8)
> "Refresh" is the tool we recommend to users in other situations with
> corrupted profiles.  We have documentation, links, etc all supporting this
> feature.  It also solves this problem.  Using it here is the simplest path
> to getting a repair tool.
> 
> Yes, you lose some other data.  But we are already recommending this in
> other cases.  I don't see why we have to introduce risk to try to do a more
> pin-pointed repair.  Remember, we don't know exactly what is going on in
> these machines and we risk corrupting things the fancier we get.

I'm suggesting wiping the entire database. I don't understand how that's "fancy", or why it represents more risk than swapping out the entire profile and then copying some files back, which is altogether a more involved operation.

In any case, you're talking about item #1 in the user story here. Item #3, which I was asked about, is about making changes to said tool, and I commented about the feasibility of those changes for a system add-on for 56.

> > That seems like a really big hammer to fix whatever issues we expect people
> > to have with a downgrade (which I assume is basically the indexeddb issue?
> > Are there other issues we're aware of?). Without knowing anything about
> > indexedDB, I'm pretty sure it would be easier to write a system add-on that
> > just wiped the indexeddb database if it was corrupt/unreadable.
> 
> This is not adequate.  You have to wipe the entire origin atomically or you
> risk leaving sites in unexpected states.

I don't understand how "wipe the entirety of indexeddb storage" is not adequate, as it seems like a strict superset of "wipe the entire origin atomically". Can you clarify what you mean here?

> Since we are trying to fix this on release under time pressure I would
> highly recommend leaning on our existing refresh tool.  If people complain
> extensively then we can build something more targeted in the future.

But the hard part here is not the "open the usual refresh dialog" bit, it's the "make changes to refresh" or "wipe indexeddb specifically". And, orthogonally, it's going to be "detect when things are broken in a way that's accurate but not obnoxious" (ie step 2 from the user story, specifically how you define "next occurrence" - if the user closes the infobar and refreshes the page, do we show the infobar again? That's going to be annoying very quickly for websites that work perfectly adequately but have ad iframes that try to use storage or things like that).

It's one thing to ship functionality that people use to fix issues when they're troubleshooting themselves, and with a clear understanding of the trade-offs having read the sumo page etc. It's different to be suggesting that feature as a cure-all whenever anything goes wrong, with much the same casual UI that we use to say "hey, maybe you want to unblock this popup", despite how destructive fxreset really is.

Every time this issue comes up (which it's been doing repeatedly over the last few months), people say "we'd be more comfortable about using refresh if it didn't also delete X, but we might use it anyway and hope you stop deleting X in the future". The fact that people don't understand that "deleting things" isn't how refresh works, and that this happens repeatedly, makes me think that the real issue is that people are trying to use reset/refresh to do something it's not designed to do. This approach has already had consequences in the Firefox 55 cycle. Pushing it aggressively is risky, and I think we overuse it and should investigate more targeted changes when/where that is appropriate. To me, this seems like such a place.
Its not "indexeddb storage".  Its all quota storage which includes IDB, cache API, etc.  It also needs to include wiping other origin things like localstorage, cookies, and service workers.

Building a new tool also will require documenting and providing support for it.  I'm not saying its technically difficult, but getting all the pieces in place to make it a supported product for our users is more than just pointing at our existing refresh tool.  Taking this on after our target release is already out the door seems risky to me.

Anyway, I guess I'll defer the risk analysis to product.
(In reply to Ben Kelly [:bkelly] from comment #11)
> Its not "indexeddb storage".  Its all quota storage which includes IDB,
> cache API, etc.  It also needs to include wiping other origin things like
> localstorage, cookies, and service workers.

Sure. I assume we'd basically just want to remove "all the files related to storage" early on startup based on a commandline switch or env var, similar to how refresh does a restart and then switches files around before continuing with starting up the browser.

> Building a new tool also will require documenting and providing support for
> it.  I'm not saying its technically difficult, but getting all the pieces in
> place to make it a supported product for our users is more than just
> pointing at our existing refresh tool.  Taking this on after our target
> release is already out the door seems risky to me.

This makes sense. I guess I'm concerned that while there's less risk in using a feature we already have, there's a dataloss risk to users who then use that feature and don't expect it to do all the things it does. That's not a development risk, but still a risk to how things work out when we release something that heavily hints to users to "click this button to fix all the problems".
Some comments/questions...

(In reply to Romain Testard [:RT] from current user story)
> As a user on Release 56 who wants to to try Beta 57 and then switch back to release 56, I
> want to be made aware of potential issues and be offered a way to mitigate them.

This is a little too specific/leading. I think the core requirement here is that a user returning to pre-57 from 57+ expects to have a functional Firefox with a minimum of effort.

The best solution is one where it Just Works, without any user interaction or notification.

I'm reluctant to ship a solution that relies on asking the user to do something... Lots of users are just going to ignore any notification (but still be upset that Firefox isn't working right), and it's nigh impossible to enable the user to make an informed decision about the tradeoffs between keeping things as-is vs. performing (for example) a profile reset.

It's conceivable that we could end up doing a notification as the least-worst solution -- given the constraints of needing to solve this quickly -- but I feel like we've jumped a step ahead without exploring the alternatives.



(In reply to Ben Kelly [:bkelly] from comment #5)
> Maybe this is another bug, but I would highly recommend that we enable the
> "separate profile per channel" code we currently use in dev-edition in
> beta/nightly.

There's talk about this, but let's keep that separate. I don't expect that's going to happen for 57 Beta, and it doesn't help with users moving from 57-Release to 56-Release anyway.



(In reply to Ben Kelly [:bkelly] from comment #8)

> "Refresh" is the tool we recommend to users in other situations with
> corrupted profiles.  We have documentation, links, etc all supporting this
> feature.  It also solves this problem.  Using it here is the simplest path
> to getting a repair tool.

I concur with Gijs that this seems like using an awfully big hammer that introduces additional concerns.

The difference here (vs other recommended usage of profile refreshes) is that we _know_ what the problem is, and that _we_ caused it. It's a great tool when the alternative is spending a lot of time diagnosing an issue or being stuck with a broken Firefox because a user is unable to diagnose the issue. It's not so great here, where (from the user's perspective) we'd essentially be causing more (minor) problems to fix the (major) problem.

Is the relevant storage hooked in with Clear Recent History (which is also a documented and supported feature)? I don't know that we'd want to point users at that directly, but having a dot-release or system addon that made use of the relevant APIs to clear storage the same way, automatically, would be more appealing.


(In reply to Ben Kelly [:bkelly] from comment #11)
> Its not "indexeddb storage".  Its all quota storage which includes IDB,
> cache API, etc.  It also needs to include wiping other origin things like
> localstorage, cookies, and service workers.


Err, cookies? Can you expand further?

The discussion I've seen so far (and it's quite possible I've missed something) indicated this was a problem specific to IndexDB, where things using it could go wonky because the underlying storage isn't forwards-compatible with 57+ changes and just errors out. AIUI the browser is allowed to discard such storage at will -- and the web / addons need to deal with that happening in normal usage -- so us clearing that storage was the simplest way to fix the problem. (As opposed to somehow downgrading existing data, or adding code to 56 to make it forwards-compatible.)

Cookies are in a sqlite DB separate from all that, so I don't understand why they're involved in this.

Some of this may be us (or at least me :) needing an update on how this all _actually_ works. Perhaps something's already been written describing exactly what things are affected by this forwards-incompatible change?
Flags: needinfo?(bkelly)
(In reply to Justin Dolske [:Dolske] from comment #13)
> (In reply to Ben Kelly [:bkelly] from comment #11)
> > Its not "indexeddb storage".  Its all quota storage which includes IDB,
> > cache API, etc.  It also needs to include wiping other origin things like
> > localstorage, cookies, and service workers.
> 
> 
> Err, cookies? Can you expand further?
> 
> The discussion I've seen so far (and it's quite possible I've missed
> something) indicated this was a problem specific to IndexDB, where things
> using it could go wonky because the underlying storage isn't
> forwards-compatible with 57+ changes and just errors out. AIUI the browser
> is allowed to discard such storage at will -- and the web / addons need to
> deal with that happening in normal usage -- so us clearing that storage was
> the simplest way to fix the problem. (As opposed to somehow downgrading
> existing data, or adding code to 56 to make it forwards-compatible.)
> 
> Cookies are in a sqlite DB separate from all that, so I don't understand why
> they're involved in this.
> 
> Some of this may be us (or at least me :) needing an update on how this all
> _actually_ works. Perhaps something's already been written describing
> exactly what things are affected by this forwards-incompatible change?

Well, we just changed our UX to clear quota storage and cookies together when triggered from our normal preferences screen.  Users don't seem to understand the difference between cookies and storage.

Also, there is some small risk that sites will use both cookies and quota storage together to manage state.  If one half of their state disappears while the other half of the state remains the site may break.

However, I see that Refresh leave cookies and clears storage today.  So I guess we have some precedence there and its something I can live with.
Flags: needinfo?(bkelly)
(In reply to Justin Dolske [:Dolske] from comment #13)
> 
> The best solution is one where it Just Works, without any user interaction
> or notification.

Do you think it's possible to do something to rollback the changes from bug 1290481 and bug 1396282, asuth?

> Is the relevant storage hooked in with Clear Recent History (which is also a
> documented and supported feature)? I don't know that we'd want to point
> users at that directly, but having a dot-release or system addon that made
> use of the relevant APIs to clear storage the same way, automatically, would
> be more appealing.
> 
> 
> (In reply to Ben Kelly [:bkelly] from comment #11)
> > Its not "indexeddb storage".  Its all quota storage which includes IDB,
> > cache API, etc.  It also needs to include wiping other origin things like
> > localstorage, cookies, and service workers.

baku recently did some work to ensure lots of things are properly cleared so I *think* we have the APIs that we need in place. He can probably clarify what gets cleared with what APIs so we know what the options here are.
Flags: needinfo?(bugmail)
Flags: needinfo?(amarchesini)
(In reply to Andrew Overholt [:overholt] from comment #15)
> Do you think it's possible to do something to rollback the changes from bug
> 1290481 and bug 1396282, asuth?

I just want to note that doing something like this is a one-time fix and doesn't solve anything for future releases.
(In reply to Andrew Overholt [:overholt] from comment #15)
> Do you think it's possible to do something to rollback the changes from bug
> 1290481 and bug 1396282, asuth?

Bug 1396282 is the Places change that added an index and no changes are required because Places already will handle the downgrade well enough.

Bug 1290481 (QuotaManager and DOM Cache API schema's rev'ed) can be hacked up by landing changes on 57, see below.  If bug 1402581 is going to end up rev'ing the schema again, we will want to account for these changes.

Note that :bkelly is very correct in comment 16.  This is a rare situation where downgrading is feasible without data-loss.  We have a large number of enhancements targeted for 58 onwards that will categorically not be feasible, and a solution is needed.  Given the lack of traction on the user-facing enhancements and mitigations, it's beginning to seem that we should indeed just start blowing away all QuotaManager-controlled data for affected origins on downgrade where the origin has an impacted client.

If we go this route, we will probably be looking at at least one more "downgrade=all QuotaManager data gets wiped" cliff.  After that, damage would be limited to the origin having been touched before the user downgrades.  Unfortunately, this sucks in practice because it's the user's WebExtensions, pinned tabs, and sites that use ServiceWorkers that are going to be most affected.  It's only the sites the user doesn't visit that will avoid automatic erasure.


Here's my previous analysis on bug 1290481:
"""
One non-extension approach, assuming we can get 57 beta approval, is to update 57 to bump QM v3 and Cache API v26 back down to QM v2 and Cache API v25 *on disk*, respectively, when they see the higher numbers so that 56 doesn't freak out.

At least, my preliminary auditing of https://bugzilla.mozilla.org/show_bug.cgi?id=1290481 was that:
- The .padding file will only generate NS_WARNINGs; cache's QuotaClient just doesn't care that much about files it doesn't understand: https://hg.mozilla.org/releases/mozilla-beta/file/tip/dom/cache/QuotaClient.cpp#l159
- The content visible fall-out is that an opaque Response will lose its padding bytes that content saw in 57, which is somewhat of a meh regression considering that the opaque padding protection is gone if you downgrade to 56 no matter what.

Elaborating on the hackery, the idea for the DOM Cache API is:
- In dom/cache/DBSchema.cpp, create a wrapper around mozIStorageConnection::GetSchemaVersion.  Technically, we already have one in dom/cache/Connection.cpp if we want to be extra-hacky about things.
- Create a new Cache API schema v27.
- When cache sees v26, it "upgrades" to v27 by bumping the schema number down to v25.
- The "GetSchemaVersion" wrapper handles seeing version 25 specially, checking if the response_padding_size column exists.  If it exists, it returns version 27, otherwise it returns version 26.

The idea for Quota Manager is:
- Looking at the patches, the Quota Manager v2 => v3 upgrade process just looks like a way to front-load the cost of creating the .padding files since dom/cache seems to be able to handle the file not existing.  I presume the upgrade sweep might have been more necessary in prior revisions of the patch stack.  I may also be missing something.
- We could just have Quota Manager put v3 back down to v2 and be done with it, if the above is right.  If we really do want that upgrade sweep, Quota Manager has the concept of minor schema revision.  56 will not freak out if it sees major=2, minor=1.  We can basically rename [3, 0] to [2, 1] and put in a minor fix-up so that when we see [3, 0] we immediately just set it to [2, 1].
"""
Flags: needinfo?(bugmail)
(In reply to Ben Kelly [:bkelly] from comment #16)
> (In reply to Andrew Overholt [:overholt] from comment #15)
> > Do you think it's possible to do something to rollback the changes from bug
> > 1290481 and bug 1396282, asuth?
> 
> I just want to note that doing something like this is a one-time fix and
> doesn't solve anything for future releases.

Yes, you're right. I was thinking about it only because of the perception that the deprecation of old-style addons may lead to an increase in 57->56 downgrades.
Another option (just to have everything on the table) is that we uplift bug 1290481 to a 56 dot release. We'd still need to ensure everyone goes through this dot release before going to 57 ...
(In reply to Andrew Overholt [:overholt] from comment #15)
> (In reply to Justin Dolske [:Dolske] from comment #13)
> > Is the relevant storage hooked in with Clear Recent History (which is also a
> > documented and supported feature)? I don't know that we'd want to point
> > users at that directly, but having a dot-release or system addon that made
> > use of the relevant APIs to clear storage the same way, automatically, would
> > be more appealing.
> > 
> > 
> > (In reply to Ben Kelly [:bkelly] from comment #11)
> > > Its not "indexeddb storage".  Its all quota storage which includes IDB,
> > > cache API, etc.  It also needs to include wiping other origin things like
> > > localstorage, cookies, and service workers.
> 
> baku recently did some work to ensure lots of things are properly cleared so
> I *think* we have the APIs that we need in place. He can probably clarify
> what gets cleared with what APIs so we know what the options here are.

Jan told me it's probably unlikely we can do things once Firefox is already running. I'll let him clarify.
Flags: needinfo?(amarchesini) → needinfo?(jvarga)
See Also: → 1405318
(In reply to Andrew Overholt [:overholt] from comment #20)
> > baku recently did some work to ensure lots of things are properly cleared so
> > I *think* we have the APIs that we need in place. He can probably clarify
> > what gets cleared with what APIs so we know what the options here are.
> 
> Jan told me it's probably unlikely we can do things once Firefox is already
> running. I'll let him clarify.

Yes, if we want to clear entire profile/storage and profile/storage.sqlite, it must be done before quota manager touches that stuff.
Flags: needinfo?(jvarga)
(In reply to Andrew Overholt [:overholt] from comment #19)
> Another option (just to have everything on the table) is that we uplift bug
> 1290481 to a 56 dot release. We'd still need to ensure everyone goes through
> this dot release before going to 57 ...

Yeah, but I guess nobody will approve that since the fix for DOM cache paddings is quite complex.
But I got another idea, we could create a small subset of the fix which only creates a new column in SQL and does the QM upgrade. Something like a placeholder in 56 for real fix in 57.
So 56 would use the same version of storage and DOM cache database scheme w/o actually using the column and .padding file.
(In reply to Jan Varga [:janv] from comment #22)
> (In reply to Andrew Overholt [:overholt] from comment #19)
> > Another option (just to have everything on the table) is that we uplift bug
> > 1290481 to a 56 dot release. We'd still need to ensure everyone goes through
> > this dot release before going to 57 ...
> 
> Yeah, but I guess nobody will approve that since the fix for DOM cache
> paddings is quite complex.
> But I got another idea, we could create a small subset of the fix which only
> creates a new column in SQL and does the QM upgrade. Something like a
> placeholder in 56 for real fix in 57.
> So 56 would use the same version of storage and DOM cache database scheme
> w/o actually using the column and .padding file.

Basically just patches P2 and P6 from bug 1290481.
(In reply to Jan Varga [:janv] from comment #23)
> (In reply to Jan Varga [:janv] from comment #22)
> > (In reply to Andrew Overholt [:overholt] from comment #19)
> > > Another option (just to have everything on the table) is that we uplift bug
> > > 1290481 to a 56 dot release. We'd still need to ensure everyone goes through
> > > this dot release before going to 57 ...
> > 
> > But I got another idea, we could create a small subset of the fix which only
> > creates a new column in SQL and does the QM upgrade. Something like a
> > placeholder in 56 for real fix in 57.
> > So 56 would use the same version of storage and DOM cache database scheme
> > w/o actually using the column and .padding file.
> 
> Basically just patches P2 and P6 from bug 1290481.

This does sound preferable to the added complexity of the magic downgrade I prototyped in bug 1405318.  It also avoids the "downgrading to Firefox 56 debug build will result in DOM Cache API breaking for already initialized origins" problem inherent in that fix.  I think it's a little bit more scary in absolute terms, however.  If we're okay with landing something on 56, wouldn't it be better to just make 56 not freak out when it sees those schema versions rather than introducing a hard schema bump that makes it impossible to downgrade to the prior point release?  (Specifically, we have it accept v26 DOM Cache API as-is and v3.0 QM as-is.)
(In reply to Andrew Sutherland [:asuth] from comment #24)
> This does sound preferable to the added complexity of the magic downgrade I
> prototyped in bug 1405318.  It also avoids the "downgrading to Firefox 56
> debug build will result in DOM Cache API breaking for already initialized
> origins" problem inherent in that fix.  I think it's a little bit more scary
> in absolute terms, however.  If we're okay with landing something on 56,
> wouldn't it be better to just make 56 not freak out when it sees those
> schema versions rather than introducing a hard schema bump that makes it
> impossible to downgrade to the prior point release?  (Specifically, we have
> it accept v26 DOM Cache API as-is and v3.0 QM as-is.)

Yeah, that sounds even safer/simpler to me. I prefer this approach because we would only add hacks to a source code branch, so it will be forgotten soon.
The other approach (bug 1405318) assumes that those changes/hacks will have to land on m-c too and stay basically "forever", right ?
(In reply to Jan Varga [:janv] from comment #25)
> The other approach (bug 1405318) assumes that those changes/hacks will have
> to land on m-c too and stay basically "forever", right ?

Yes and sorta.  The hack needs to land on m-c and the 3.0 QM version will have been "consumed" by the hack; the next version would need to be "2.2" or "4.0".  But "forever" is really just "as long as we handle upgrades of the given Firefox profile without blowing storage away".

I think it can be argued that once we ship ESR 59 overlapped with ESR 52 that ESR 45 is moot.  And given that we introduced storage.sqlite v1.0 in Firefox 49, that it might be reasonable to drop all the older upgrade logic and instead just nuke any "storage" directory without a "storage.sqlite" that thereby implies a pre-v1.0 schema and create a fresh schema.  Using this same rationale, once ESR 52 is mooted by the availability of ESR 66(?) concurrent with ESR 59, we could potentially drop any logic related to the hacks.  (Note that we can certainly support further back than that, but I think we'd want to have a good rationale for why we're keeping the functionality, the code size, and the test overhead around.)
Ok, understood and agreed. Now we just need to decide which approach to take.
Based on the recent discussion above, how do you feel about a 56 point release that makes 56 okay with the 57 schemas?  It need not be pushed out to all users if we assume this is how the downgrade scenario goes:
- User has a 56 install.
- The user gets automatically upgraded to 57, therefore no more 56 install.
- User does not like 57, goes and finds a place to download 56.  I just googled and yahoo'd "download old firefox" and got sent to https://support.mozilla.org/en-US/kb/install-older-version-of-firefox which really seems like it needs an update if we're expecting this to happen a lot.  We can certainly point that at the point-fix of Firefox 56 that addresses this.
- User installs 56 somewhere and everything's great.

Obviously there do end being edge cases if the user installed 57 manually in a separate location, or if the user habitually saves their old versions to a downloads folder and simply installs from there, etc.  In that case you'd want the point release pushed, but I'm not sure how that works if the user is actively doing something to defeat the 57 upgrade from recurring.
Flags: needinfo?(rtestard)
(In reply to Andrew Sutherland [:asuth] from comment #28)
> Based on the recent discussion above, how do you feel about a 56 point
> release that makes 56 okay with the 57 schemas?  It need not be pushed out
> to all users if we assume this is how the downgrade scenario goes:
> - User has a 56 install.
> - The user gets automatically upgraded to 57, therefore no more 56 install.
> - User does not like 57, goes and finds a place to download 56.  I just
> googled and yahoo'd "download old firefox" and got sent to
> https://support.mozilla.org/en-US/kb/install-older-version-of-firefox which
> really seems like it needs an update if we're expecting this to happen a
> lot.  We can certainly point that at the point-fix of Firefox 56 that
> addresses this.
I'll raise that with the SUMO team. We obviously don't want to promote downgrades but I'l check what they advise on the 57 case
> - User installs 56 somewhere and everything's great.
> 
> Obviously there do end being edge cases if the user installed 57 manually in
> a separate location, or if the user habitually saves their old versions to a
> downloads folder and simply installs from there, etc.  In that case you'd
> want the point release pushed, but I'm not sure how that works if the user
> is actively doing something to defeat the 57 upgrade from recurring.

Is there no way to ship this in 57 instead so that by the time 57 gets to release going back to 56 won’t be a problem?
Otherwise I assume this dot release will have to be a watershed? (56.0 is already a watershed because of LZMA/SHA384 and watershed releases have a high cost since they contribute to orphaning users)
Uplifting this to Beta 57 would also have the added benefit to solve the issue for users trying beta 57.
For info Beta downloads seem to be dropping (see https://sql.telemetry.mozilla.org/queries/4881#9980).
Flags: needinfo?(rtestard)
Based on your response re:56, this is the plan :overholt and I have reached based on your feedback from comment 29:

- We're going to go with simultaneously landing the hack from bug 1405318 on mozilla-central (=58) and mozilla-beta (=57) that makes 56 okay with the schemas from the patched build.  *No changes* need to land on mozilla-release (=56) and users downgraded from patched 57 to (non-DEBUG[1] build) 56 should experience no hiccups or problems.  We do this because:
  - It's hard to make sure everyone gets the 56 changes, it's easier to make sure everyone gets the 57 changes, especially since the people we're most concerned about don't have any 57 yet...
  - We've got several 57 betas remaining, 56 already shipped.
  - There's a uniquely high chance of users downgrading from 57 to 56 due to the removal of legacy add-on support from 57.  A unique situation that will not recur going forward.
  - The hack does not require user interaction and will not result in data-loss, whereas most system add-on approaches would.
  - Most system add-on approaches could run into additional complications when attempting to interact with a quota manager that has decided not to initialize.
  - The hack is uniquely feasible in the history of schema upgrades.
  - Everyone seems to be on the same page that the situation is undesirable and that it's critically important to implement profile-per-channel on beta and nightly as a related mitigation as we encourage users to try beta/nightly out.  Also, that downgrades can't be supported, but that additional UI-surfaced mitigations are necessary for handling same-profile downgrades.  (And that if UI-surfaced mitigations don't happen, the storage subsystems really have no option but to wipe origins and/or all QuotaManager-managed storage in order to restore proper browser operation.  The current situation that has endured for years where things just break on downgrade isn't working, and the storage subsystems are currently overdue to improve their corruption handling.  And it's easiest/simplest/safest to treat a downgrade as corruption, as the only possible user action in the face of a downgrade is to upgrade again, but that requires UI that exists at a much higher abstraction level, and that's what bug 1246615 and its dependencies are handling.  And when the UI/profile-logic does get implemented, then the storage subsystems don't have to worry about downgrades anymore, and there's no over-complicated hacky vestigial downgrade code-paths left over to worry about.)

- I'm going to create a glitch.com page to simplify QA testing of this fix and inter-release transitions that will explicitly indicate when storage subsystems are broken or operational.  The primary breakage we are concerned about involves opaque response objects, so this page will also be sure to trigger those edge cases of concern.

- I'm going to spin try builds of patched mozila-central and mozilla-beta so that you/your delegate can arrange for QA to test.

- I'm going to put the patches up for review after manual testing using the glitch.  I'm at least going to put comments in the patch about why the "57 -> 56 -> 57" transition is safe for both subsystems and how the manual testing bears that out.  If I feel it's too hand-wavey or :janv/:bkelly requests, I can also add automated tests that simulate the phenomenon.  (Full cross-version automated tests I fear are out of scope given existing test infrastructure.)


:RT, do you sign off on this approach?  (I think we effectively have consensus already, but :overholt wanted to be sure you approve.)


1: Again, people using 56 DEBUG builds are out of luck, but only developers do that, and they usually aren't downgrading.  Also, this downgrade situation already won't work for them as the various branches stand, so it's not actually a new problem.
Flags: needinfo?(rtestard)
> :RT, do you sign off on this approach?  (I think we effectively have
> consensus already, but :overholt wanted to be sure you approve.)
Yes!
Flags: needinfo?(rtestard)
Updating title and user story to reflect the chosen solution.
User Story: (updated)
Summary: Ship a system add-on that informs release users who downgrade of potential profile issues → Downgrade backport allowing 57 users downgrade to 56
Romain pinged me about this. It is a must for 57, tracked as a blocking issue.
Test page up at https://firefox-storage-test.glitch.me/
Fork-able via https://glitch.com/edit/#!/firefox-storage-test

The test page tests IDB at a database creation/opening granularity, and the DOM Cache API at a keys()/match() granularity, with permutations for normal, plus padded and un-padded no-cors based on Firefox version (after opening caches and put()/add()ing in the base case).

Screenshot-wise, this is what the page looks like:
* Starting with a fresh profile run in Firefox 56:
  https://clicky.visophyte.org/files/screenshots/20171010-011628.png
* Upgrade to unpatched Firefox 57. NOTE[1]:
  https://clicky.visophyte.org/files/screenshots/20171010-005347.png
* Refreshing the page in 57, so effectively no change:
  https://clicky.visophyte.org/files/screenshots/20171010-010326.png
* Downgrading to Firefox 56, things break:
  https://clicky.visophyte.org/files/screenshots/20171010-012209.png
* Upgrade to patched Firefox 57:
  https://clicky.visophyte.org/files/screenshots/20171010-012827.png
* Downgrading to Firefox 56 from patched Firefox 57, things work fine:
  https://clicky.visophyte.org/files/screenshots/20171010-010354.png

And an interesting edge case:
* Profile run in Firefox 56, blowing away the QM directory for the origin but leaving LocalStorage intact:
  https://clicky.visophyte.org/files/screenshots/20171009-201135.png

1: Sometimes on upgrade from 56 to 57 the page thumbnail service appears to have front-run our manual opening of the page, resulting in the page saying "This is the same version (57) as the last time you loaded this page." instead of "You upgraded from 56 to 57."  The broken/not broken indication will be correct, regardless.
# Builds for Testing

## Patched Firefox 57
From https://treeherder.mozilla.org/#/jobs?repo=try&revision=efc62fb7fa68968335c605cf54f1d94631bc3b33

Windows
* 32-bit zip: https://queue.taskcluster.net/v1/task/XkEWniflSpCOvpeqf-EuOw/runs/0/artifacts/public/build/target.zip
* 64-bit zip: https://queue.taskcluster.net/v1/task/Av9i9TcVT961Km_5CBRTkg/runs/0/artifacts/public/build/target.zip

OS X
* dmg: https://queue.taskcluster.net/v1/task/CbEo6r_jQHunOXi5nYV0_Q/runs/0/artifacts/public/build/target.dmg

Linux:
* 32-bit tbz2: https://queue.taskcluster.net/v1/task/OM6gc8EUTPmGqUOg3XQdEg/runs/0/artifacts/public/build/target.tar.bz2
* 64-bit tbz2: https://queue.taskcluster.net/v1/task/SaLkY2BjTfiuVTPrMR0whQ/runs/0/artifacts/public/build/target.tar.bz2

## Patched Firefox 58
From https://treeherder.mozilla.org/#/jobs?repo=try&revision=2a726ba318eb227a823e5c70687dd4272f2fe794

Windows
* 32-bit zip: https://queue.taskcluster.net/v1/task/PIcqkbmyT96IHzlMZmBaCw/runs/0/artifacts/public/build/target.zip
* 64-bit zip: https://queue.taskcluster.net/v1/task/SMtIdKwRQGuC7JXmC3R3qQ/runs/0/artifacts/public/build/target.zip

OS X
* dmg: https://queue.taskcluster.net/v1/task/BadOcUEqRfCUwhcoqkQD1w/runs/0/artifacts/public/build/target.dmg

Linux:
* 32-bit tbz2: https://queue.taskcluster.net/v1/task/aoPbVonhSZeBITMWkqytWw/runs/0/artifacts/public/build/target.tar.bz2
* 64-bit tbz2: https://queue.taskcluster.net/v1/task/b0BsAw8QSjiPePQjSzqfWg/runs/0/artifacts/public/build/target.tar.bz2
I'm putting the patches from bug 1405318 up here for review to consolidate things.  (Flags get confusing otherwise.)  I'm putting the 57 patches up for review here; they graft cleanly onto mozilla-central.  I'm also going to trigger another try run for the 58/mozilla-central patches with debug builds and full test coverage to deal with Murphy's law guaranteeing breakage down the road if we don't.
To improve the Firefox 57 to 56 downgrade scenario, have 57 and 58
re-number the version 3.0 schema introduced by bug 1290481 to
version 2.1.  Firefox 56's Quota Manager will recognize the minor
schema version and accept the schema from the future.
Attachment #8916884 - Flags: review?(jvarga)
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Alter 57 and 58 DOM Cache API to use an on-disk schema version of 25 again,
as was used prior to the landing of bug 1290481 that bumped the disk schema
version to 26.  Patched versions will recognize this internally as schema 27
based on the presence of the column introduced by the 26 upgrade.
Attachment #8916885 - Flags: review?(bkelly)
Upgrade the test names from 3.0/3_0 to 2.1/2_1, consistent with the changes
made in part 1.
Attachment #8916886 - Flags: review?(jvarga)
Duplicate of this bug: 1405318
Attachment #8916885 - Flags: review?(bkelly) → review+
Comment on attachment 8916884 [details] [diff] [review]
Part 1: Alias QuotaManager schema 3.0 to 2.1

Review of attachment 8916884 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, I appreciate the comments in the code.
Attachment #8916884 - Flags: review?(jvarga) → review+
Attachment #8916886 - Flags: review?(jvarga) → review+
Comment on attachment 8916884 [details] [diff] [review]
Part 1: Alias QuotaManager schema 3.0 to 2.1

Approval Request Comment:  (This whole bug is about this landing :)
QA=:Ovidiu signed off on the fixes earlier today.  I think there's already consensus on landing these changes given the QA pass, but I want to land these concurrently-ish on 57 and 58, so I want the approval flag before I do the 58 landing.

Specifically, I would like to land the changes on 57 relatively quickly after the changes migrate via mozilla-inbound to mozilla-central.  There's no need to bypass mozilla-inbound.
Attachment #8916884 - Flags: approval-mozilla-beta?
Attachment #8916885 - Flags: approval-mozilla-beta?
Attachment #8916886 - Flags: approval-mozilla-beta?
Hi Andrew, have these changes landed on m-c? I'll review beta uplift nomination only after these have landed and baked in Nightly58 for a few days. Thanks!
Flags: needinfo?(bugmail)
(In reply to Ritu Kothari (:ritu) from comment #46)
> Hi Andrew, have these changes landed on m-c? I'll review beta uplift
> nomination only after these have landed and baked in Nightly58 for a few
> days. Thanks!

They have not.  Doing so introduces an additional scenario that I was hoping to avoid.  It's transient and recoverable, but it does mean that "downgrade from patched-58 to unpatched-57" introduces a breakage that doesn't currently exist.  Also, "downgrade from patched-58 to non-debug-56" will be fine BUT upgrading from profile-previously-run-under-patched-58-but-then-run-under-non-debug-56 to unpatched-57 will break which is probably counter-intuitive.  Both breakage scenarios will resolve themselves on upgrade to patched-57 from unpatched-57, or from unpatched-57 to patched-58.

In any event, I'll land the patches on inbound now and I'll leave it up to you to decide the time-window trade-offs.  This type of downgrade breakage is the way things normally happen (modulo the 58->56 downgrade being okay after this patch), I just thought we were going to more aggressively land on 57 given the confidence from QA testing.

Once the 57-uplift is approved and is landed, we can commence the second round of manual QA testing.  (But not before, because we'll fail the testing.)
Flags: needinfo?(bugmail)
Comment on attachment 8916884 [details] [diff] [review]
Part 1: Alias QuotaManager schema 3.0 to 2.1

This is a must fix for 57. The sooner it gets tested in beta (if at all) the better. Beta57+
Attachment #8916884 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8916885 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8916886 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Hi Andrew, Romain, Tom, I am really worried about this fix and the complex scenarios that ought to be tested here. Have we made a PI request for this and has SV signed off on this change? Thanks!
Flags: needinfo?(tgrabowski)
Flags: needinfo?(rtestard)
Flags: needinfo?(bugmail)
:Ovidiu has been leading QA/PI testing of the feature.  Prior to landing the functionality on trunk, an extensive round of permutations of [56, 57-patched, 58-patched] upgrade/downgrade tests were performed across all platforms.  The plan is to re-run this testing post-landing on 57 which I guess I'm going to land now depending on what :RyanVM tells me when I ping him.

Excerpting the relevant part of :Ovidiu's sign-off here:
"""
We finished testing the PI Request "Testing 57->56->57 for no data loss". 
The tests were made on Windows 10, Windows 7, Mac OS X 10.12, Ubuntu 16.04. We followed the scenarios discussed with Andrew Overholt and Romain Testard in the kickoff meeting that we had for this issue and we also used the scenario from Andrew Sutherland. You can find all the scenarios in this etherpad (https://public.etherpad-mozilla.org/p/Scenarios_that_we_used_for_data_loss_issue)
"""
Flags: needinfo?(bugmail)
Update on our plan to gain release sign-off for this:
1 Beta 10 gets released on the 20th
2 Softvision (Ovidiu?) proceeds with testing (same as what was run on Nightly) soon after Beta 10 is released and shares test results
Flags: needinfo?(rtestard)
I retested this issue on the following OSes: Mac OS X 10.12, Windows 7 (x32 and x64), Windows 10 (x32 and x64), Ubuntu 16.04 (x64) using the scenarios from https://public.etherpad-mozilla.org/p/Scenarios_that_we_used_for_data_loss_issue.

During the tests on the scenario from https://bugzilla.mozilla.org/show_bug.cgi?id=1404344#c36, we found out one issue. 

Here is the exact scenario, please see the screenshots with the errors:


1. Starting with a fresh profile run in Firefox 56 and open test page https://firefox-storage-test.glitch.me/
The actual result: https://imgur.com/a/gjjOb

 2. Upgrade to unpatched Firefox 57(I used FF beta 57.0b9) and open test page https://firefox-storage-test.glitch.me/
The actual result: https://imgur.com/a/ssgGl

3. Refreshing the page in 57, so effectively no change.
 The actual result: https://imgur.com/a/ssgGl

4. Downgrading to Firefox 56 and open test page https://firefox-storage-test.glitch.me/
The actual result: https://imgur.com/a/hIZRg

5. Upgrade to Firefox beta 57.0b10 and open test page https://firefox-storage-test.glitch.me/
The actual result: https://imgur.com/a/1sCTh

6. Downgrading to Firefox 56 from  Firefox beta 57.0b10 and open test page https://firefox-storage-test.glitch.me/ 
The actual result: https://imgur.com/a/V9xJw


Andrew, please tell me your opinion about this, thanks.
Flags: needinfo?(bugmail)
It looks like you accessed the domain via http rather than https, resulting in the unavailability of the Cache API, which is expected.  It was an oversight on my part to not force a redirect to https when accessed via http, my apologies.  I'll implement a redirect and atomically-as-possible apply it to the test.  (glitches are live-updating, so I'll try and changes in a different glitch first, then propagate the changes.)
Flags: needinfo?(bugmail) → needinfo?(ovidiu.boca)
Thanks Andrew and team for the due diligence. https://bugzilla.mozilla.org/show_bug.cgi?id=1404344#c52 Awesome!!
Will this now allow me to uninstall 57, install 56, and recover my old window/tabs with SessionManager? It WAS all working swell in 57, but then one day, it just stopped. Not nice!
(In reply to Worcester12345 from comment #58)
> Will this now allow me to uninstall 57, install 56, and recover my old
> window/tabs with SessionManager? It WAS all working swell in 57, but then
> one day, it just stopped. Not nice!

Yes.  As far as I can tell from a brief investigation, SessionStore has been stable since at least Firefox 44 when bug 1147822 added a version number to the file format.  If you attempted to downgrade from 57 to 56 and experienced problems with restoring a session before, it's conceivable it was due to problems this patch addresses interfering with session restore.  I'm not aware of any specific bugzilla bugs describing such a problem; if you are, I'd appreciate if you could reference them here or cc/needifo me on the bugs so I can do further research and establish the relevant dependencies or see-also's.  Thanks!
Flags: needinfo?(tgrabowski)
I retested the issue mentioned in comment 55 using the Windows 7(x32) with FF beta 57.0b10 x32 and everything works as expected. Also, I made a check on Windows 7 x64 FF 57.0b10 x64, to make sure all is good. Tell me if you have any thoughts about this, if not from my point of view this issue is verified.
Flags: needinfo?(ovidiu.boca)
You need to log in before you can comment on or make changes to this bug.