Open Bug 1536797 Opened 5 years ago Updated 1 year ago

Changes to localStorage do not always get committed when app exits

Categories

(GeckoView :: General, defect, P3)

66 Branch
Unspecified
Android

Tracking

(firefox66 wontfix, firefox67 wontfix, firefox68 fix-optional)

Tracking Status
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- fix-optional

People

(Reporter: colormatch, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0

Steps to reproduce:

Web-page assigns new value to localStorage.data = "some new value"
directly exit / close the app
reopen the app/page

Actual results:

"localStorage.data" had "old value", as it seems it doesn't always commit the changes on time when you close the app with biult-in GV.

Note that the page was registering the showing the new value before closing the app.
However after restarting the app, the old value is loading.

Expected results:

localStorage should have values committed to storage on time

This localStorage bug might be related to about:config bug 1536872 and process switching bug 1527778.

Depends on: 1527778
OS: All → Android
Priority: -- → P1
Whiteboard: [geckoview:fenix:m4]

As of latest tested version: geckoview-beta-armeabi-v7a/67.0.20190325125126
localStorage (all values) gets committed 5 seconds after a localStorage value gets changed.

If you exit the app before that, the changes to localStorage are lost.

If GV can't use onDestroy() to commit changes to localStorage, at least lower the 5sec commit-delay to no more than 1 sec.

I expect the reason for this is to avoid sites using an "attack vector" to overload GV/device.

Batch-write is a good solution, even if it's just a case of changed multiple localStorage values at the same time, after user action, but we can expect those values to be changed in far less than 1sec.

While 5sec are an eternity for user actions, I doubt we'll see a user performing actions, which require localStorage update and at the same time exiting the app in less than 1sec.

The commit-delay of 1sec should still provide protection against attempts of overloading GV.

67=wontfix. Fenix MVP will use GeckoView 68, so we don't need to uplift this fix to 67 Beta.

Assignee: nobody → esawin

On Android, Gecko sessions are often short-lived and suspect to getting killed by the system without an option to do any blocking I/O.
The 5s timeout for local storage flushing can cause unexpected behavior with data not being made persistent even in regular use cases, e.g., by quickly switching between a custom tab and an app.

Reducing the timeout to 1s or less would mitigate that issue, but introduces performance risks.

Jan, what is your assessment on this?

Flags: needinfo?(jvarga)

This issue is not related to bug 1527778, there is simply a 5s delay for local storage flushing.

No longer depends on: 1527778
Status: UNCONFIRMED → NEW
Ever confirmed: true

Commented in phabricator since I was asked there first.

Flags: needinfo?(jvarga)

I can't help myself, but this looks like quick and dirty fix.
Have you investigated other options ? Is this a design issue ?
If such problem exists with LocalStorage then there's a chance that other storage APIs may be affected too.

Is there any hint we can use to force a flush? For example, in the web platform we have the "pagehide" event[1] and the page-lifecycle spec is proposing a "freeze" event[2]. Since I think web apps are going to need to be able to receive such events themselves and then issue multiple rounds of async requests themselves (although new methods like IDB's commit()[3] are intended to help with this). If this isn't something we can currently do, then it either needs to be a priority for GeckoView to provide or we should maybe give up on GeckoView.

At a high level, LocalStorage is massively abused by many sites, so reducing the timer by a factor of 5 potentially will result in a 5x magnification of disk I/O. We can break the symmetry between bad/dumb LocalStorage spammy sites by introducing some kind of token bucket mechanism[4] so that non-spammy sites can experience reduced I/O delays for LocalStorage, but that's a bigger enhancement.

1: https://developer.mozilla.org/en-US/docs/Web/API/Window/pagehide_event
2: https://wicg.github.io/page-lifecycle/spec.html
3: https://w3c.github.io/IndexedDB/#dom-idbtransaction-commit
4: https://en.wikipedia.org/wiki/Token_bucket

Flags: needinfo?(esawin)

(In reply to Jan Varga [:janv] from comment #9)

I can't help myself, but this looks like quick and dirty fix.
Have you investigated other options ? Is this a design issue ?
If such problem exists with LocalStorage then there's a chance that other storage APIs may be affected too.

Right, sorry about the premature review request, I wish Phabricator had f? support.

There have been discussions on mitigating such issues on Android for a long time.
Generally speaking, all Gecko mechanics that are built on assumptions that don't hold on Android (e.g., long session durations and orderly process shutdowns in this case) are suspect to this, but that's a design issue we can only tackle in fragments.

I've been interested in an assessment of the potential (performance) risks involved with reducing the timeout. Even with additional mechanics based on Android-specific hints to flush storage, this could be a useful to reduce the chance for losing cached data.

(In reply to Andrew Sutherland [:asuth] from comment #10)

Is there any hint we can use to force a flush? For example, in the web platform we have the "pagehide" event[1] and the page-lifecycle spec is proposing a "freeze" event[2]. Since I think web apps are going to need to be able to receive such events themselves and then issue multiple rounds of async requests themselves (although new methods like IDB's commit()[3] are intended to help with this). If this isn't something we can currently do, then it either needs to be a priority for GeckoView to provide or we should maybe give up on GeckoView.

We could look into adding mechanics to flush storage when getting backgrounded or exposing a GeckoView API to allow flushing storage, which would delegate the responsibility to the app.

At a high level, LocalStorage is massively abused by many sites, so reducing the timer by a factor of 5 potentially will result in a 5x magnification of disk I/O. We can break the symmetry between bad/dumb LocalStorage spammy sites by introducing some kind of token bucket mechanism[4] so that non-spammy sites can experience reduced I/O delays for LocalStorage, but that's a bigger enhancement.

I wonder what the ratio between spammy to useful sites depending on persistent local storage is (or any better metric) to gauge what kind of resources we should invest into improving flushing mechanics on mobile.

If we were to expose a GeckoView API to flush storage, which storage APIs would it make sense to bundle/include into that?

Flags: needinfo?(esawin)

(In reply to Eugen Sawin [:esawin] from comment #11)

I wonder what the ratio between spammy to useful sites depending on persistent local storage is (or any better metric) to gauge what kind of resources we should invest into improving flushing mechanics on mobile.

I think a big problem with localStorage is that we can't assume the site is single-minded in its use of localStorage. localStorage usage patterns cover:

  • Small persistent values that change rarely, at least after LocalStorage's optimizations that void doing anything if the value written is the value that's already present.
  • As a shared-memory construct between windows that also generates "storage" events, allowing sites to use it for things like mutexes/only nag once. In these cases, persistence may not even be desired.
  • As an easy-to-use bulk-ish storage for a variety of things like attempting to cache JS resources, deferring failed network requests, or debug logging.

An even more clever LocalStorage implementation might track what's going on for LocalStorage use by tracking how different keys are used, both in their size and rate of value turnover. For example, one might imagine storing small saturating counters for keys that get decremented over time (maybe with 2 time-scales?) like early read count, late read count, write count, delete count (with delete count implying a bounded number of tombstones). The idea would be that a value that gets read a lot but written rarely should more urgently be flushed to disk. However values that experience high write counts, or have high turnover as indicated by high write/delete numbers might have their writes delayed for a very long time, especially if they're never read. (And the early/late read distinction could help us optimize our preload logic.) One might think of this as adopting a technique from generational garbage collection.

LocalStorage provides basically no durability/consistency guarantees per spec anymore (given that hand-waving is done in the face of multiprocess browsers), so it's really a question of webcompat at this point and to what extent browser vendors can work together to try and maintain sanity for a lot of this stuff. We probably want to try and standardize more as we get fancier, as the more complicated the implementation, the more potential problems we cause for users and other browser vendors.

If we were to expose a GeckoView API to flush storage, which storage APIs would it make sense to bundle/include into that?

So, I think LocalStorage and SessionStorage are the only persistence-related APIs that end up doing any type of intentional delaying of writes. (LocalStorage has its 5 second timer and SessionStorage currently is persisted by SessionStore which snoops on SessionStorage and has its own write throttling.) Every other storage API is best-effort and frequently provides feedback to content about when something is expected to have durably hit the disk (which may depend on guarantees not actually present on our Android implementations).

But take a step back here, I think these are the interesting lifecycle events that are potentially relevant from a storage perspective:

  1. The user is closing this tab.
  2. The user is switching away from this tab and may never interact with the tab again. This would also include the screen being turned off. This may also count for bfcache.
  3. The user is switching away from the browser and the browser may be terminated as a result.
  4. The user is explicitly quitting the browser.

I think we want LocalStorage's flushing to be aware of the 1st and 2nd cases. Right now LocalStorage probably already has what it needs for the 1st case assuming the nsGlobalWindowInner still gets explicitly closed. If you might optimize by not calling that, then a hint will be needed. The 2nd case LocalStorage we may also be able to get from the existing pagehide hookups in nsGlobalWindowInner and the bfcache. This is something LocalStorage also does not process and if GeckoView consumers don't tell Gecko this, they probably should... So I guess LocalStorage needs enhancements spun off for these and GeckoView just needs to be sure to be triggering these.

For the 3rd case, I think this is a notification Gecko doesn't really get but could be useful in the realm of something similar to the idle service. In theory the idle service helps Firefox/Gecko be responsive by not doing busy-work when the user might be actively using the browser. This is less relevant in a world Gecko is both more multi-threaded and more multi-process. It's also probably exactly wrong for Android (and maybe we don't generate idle service notifications for that?) as when the user switches away from the browser may be exactly the worst time to try and do busy-work, especially if it's likely to get terminated partway through. So the idea with a notification here would be to help QuotaManager and IndexedDB and other idle-service aware things to know to 100% not do that stuff right now, at least not until the user's explicit actions have been durably persisted to disk.

Having something like GeckoSession.flush() , which the app can call, will do it for me.

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #12)

Thanks a lot for the discussion!

Let me share some thoughts on your ideas.

  1. The user is closing this tab.
  2. The user is switching away from this tab and may never interact with the tab again. This would also include the screen being turned off. This may also count for bfcache.

GeckoView doesn't manage tabs. Tabs are an app-level mechanic and can be implemented in various ways, e.g., by having a 1:1 mapping of tabs to GeckoSessions, by reusing a GeckoSession to load session state for specific tabs or by loading session state into a new GeckoSession instance for each "tab-switch".

All tab-related interactions are less relevant for this issue, since this is done through user interaction with the app in foreground.

  1. The user is switching away from the browser and the browser may be terminated as a result.
  2. The user is explicitly quitting the browser.

There is really no regular quitting on Android, but case #3 is relevant.

For the 3rd case, I think this is a notification Gecko doesn't really get but could be useful in the realm of something similar to the idle service. In theory the idle service helps Firefox/Gecko be responsive by not doing busy-work when the user might be actively using the browser. This is less relevant in a world Gecko is both more multi-threaded and more multi-process. It's also probably exactly wrong for Android (and maybe we don't generate idle service notifications for that?) as when the user switches away from the browser may be exactly the worst time to try and do busy-work, especially if it's likely to get terminated partway through. So the idea with a notification here would be to help QuotaManager and IndexedDB and other idle-service aware things to know to 100% not do that stuff right now, at least not until the user's explicit actions have been durably persisted to disk.

I would like to know more about the idle service, it would be great if you could provide a relevant link.

I agree that it sounds like a risky mechanic for Android, getting "backgrounded" can mean process termination with little to no warning. Which is also why I think flushing in this phase would be risky, too.

(In reply to colormatch from comment #13)

Having something like GeckoSession.flush() , which the app can call, will do it for me.

If such an API existed, in which situation would you call it, in onSaveInstanceState?
Also, would reducing the flushing timeout to 1s be a reliable solution for your use case?

(In reply to Eugen Sawin [:esawin] from comment #14)

GeckoView doesn't manage tabs. Tabs are an app-level mechanic and can be implemented in various ways, e.g., by having a 1:1 mapping of tabs to GeckoSessions, by reusing a GeckoSession to load session state for specific tabs or by loading session state into a new GeckoSession instance for each "tab-switch".

Gotcha. But someone needs to poke Gecko from GeckoView to make https://developer.mozilla.org/en-US/docs/Web/API/Page_Visibility_API work or it's likely that pages will attempt to act like they're an active foregrounded tab at all times and Gecko will help them monopolize the CPU and GPU.

And the event example in https://www.html5rocks.com/en/tutorials/pagevisibility/intro/ works fine in the reference browser, so we're good to go. I think this means localStorage already has enough to go on to address this bug.

The only issue then becomes whether Gecko stays alive long enough for the changes to durably hit disk when tab hide/closure forces a flush. Certainly if there was a GeckoSession.flush method and it returned a Promise/Future so that Gecko can report back when the changes have hit the filesystem, that would help. (I say 'filesystem' because we don't necessarily need to wait for an fsync; we're guarding against application termination, not system crash.)

I would like to know more about the idle service, it would be great if you could provide a relevant link.

(In reply to Eugen Sawin [:esawin] from comment #14)

(In reply to colormatch from comment #13)

Having something like GeckoSession.flush() , which the app can call, will do it for me.

If such an API existed, in which situation would you call it, in onSaveInstanceState?
Also, would reducing the flushing timeout to 1s be a reliable solution for your use case?

I would call flush on:
onBackPressed() and onPause()

Reducing the timeout to 1s should be enough in my case, but would rather have fewer flushes.

If the timeout is settable by the host-app it will be even better.
If I can ensure a flush on onBackPressed() and onPause(), I'll probably go with a timeout of 30s.

[geckoview:fenix:p2] because Eugen says this bug doesn't need to block Fenix MVP.

Whiteboard: [geckoview:fenix:m4] → [geckoview:fenix:p2]
Depends on: 1544076

I think that we need to resolve this type of issues within GeckoView, without depending on app-level mechanics, as much as possible.

As a first step, we're going to make GeckoRuntime Android lifecycle-aware in bug 1544076. This should enable us to flush storage caches when getting backgrounded.

GV becoming lifecycle-aware of the host-app, may make it more robust, but steps into the app-level mechanics' dependence.

What happens to GV's lifecycle-monitoring if the host-app using GV decides to make a clean exit with Process.killProcess / System.exit onBackPressed?

In a browser if there were many tabs/sessions running, flushing may be more appropriate at tab-switching, and not flushing multiple sessions when getting backgrounded or destroyed.

In an host-app where the settings, choices, and user interaction values are constantly being updated in localStorage/sessionStorage (so the user can continue at the point they were, before closing the app), one may want to have very long flush timeouts, and only flush when the user is preparing to close the app, or getting backgrounded.

Blocks: 1545266

(In reply to colormatch from comment #19)

GV becoming lifecycle-aware of the host-app, may make it more robust, but steps into the app-level mechanics' dependence.

The goal is to decrease the dependence on app-level mechanics by reacting to fundamental process-level states.

What happens to GV's lifecycle-monitoring if the host-app using GV decides to make a clean exit with Process.killProcess / System.exit onBackPressed?

It's impossible to account for all specialized app-mechanics without overloading our API and increasing API maintenance.

However, once our automated lifecycle mechanics are more established, we can expose a simplified API to control it, to account for more irregular use cases.

In a browser if there were many tabs/sessions running, flushing may be more appropriate at tab-switching, and not flushing multiple sessions when getting backgrounded or destroyed.

I don't think I agree. Switching tabs potentially means fetching and rendering of resources from net and cache. You want to do this efficiently for optimal perceived performance. You don't want to introduce additional I/O for data unrelated to the new tab in this phase.

For that matter, app backgrounding is the better phase to flush queued caches. You still have the timeout-based flushing which should adequately account for the many-sessions scenario.

In an host-app where the settings, choices, and user interaction values are constantly being updated in localStorage/sessionStorage (so the user can continue at the point they were, before closing the app), one may want to have very long flush timeouts, and only flush when the user is preparing to close the app, or getting backgrounded.

When there are many of such elements that need to be saved, you don't want to do it in a single flush to avoid losing data. The combination of timeout and lifecycle-based flushing should be ideal for this use case. You would periodically flush storage when the app is in foreground (5s) and force-flush storage when getting backgrounded, no periodic flushes while being backgrounded.

(In reply to Eugen Sawin [:esawin] from comment #20)

(In reply to colormatch from comment #19)

What happens to GV's lifecycle-monitoring if the host-app using GV decides to make a clean exit with Process.killProcess / System.exit onBackPressed?

It's impossible to account for all specialized app-mechanics without overloading our API and increasing API maintenance.

However, once our automated lifecycle mechanics are more established, we can expose a simplified API to control it, to account for more irregular use cases.

How about having GeckoSession.close() do a flush() before closing?

In a browser if there were many tabs/sessions running, flushing may be more appropriate at tab-switching, and not flushing multiple sessions when getting backgrounded or destroyed.

I don't think I agree. Switching tabs potentially means fetching and rendering of resources from net and cache. You want to do this efficiently for optimal perceived performance. You don't want to introduce additional I/O for data unrelated to the new tab in this phase.

For that matter, app backgrounding is the better phase to flush queued caches. You still have the timeout-based flushing which should adequately account for the many-sessions scenario.

ex, currently I have a browser with 586 tabs opened.
If backgrounding in a case like this won't be a problem, we're all good. :)

Eugen, you have a patch on this bug to flush localStorage. Do you plan to land this as part of your GV lifecycle patches? Do we want this for Fenix MVP?

Flags: needinfo?(esawin)
Whiteboard: [geckoview:fenix:p2] → [geckoview:fenix:m5]

(In reply to Chris Peterson [:cpeterson] from comment #22)

Eugen, you have a patch on this bug to flush localStorage. Do you plan to land this as part of your GV lifecycle patches? Do we want this for Fenix MVP?

I don't think we want that patch, but a patch that does something along the lines described in https://bugzilla.mozilla.org/show_bug.cgi?id=1546430#c3.

I don't think this should be blocking MVP, but I would prefer to have a fix in to avoid issues with short-lived sessions like custom tabs.

Flags: needinfo?(esawin)

(In reply to colormatch from comment #21)

How about having GeckoSession.close() do a flush() before closing?

I think this is worth considering.

ex, currently I have a browser with 586 tabs opened.
If backgrounding in a case like this won't be a problem, we're all good. :)

We definitely need to make sure that is the case and btw. thanks a lot for your reports and feedback!

(In reply to Eugen Sawin [:esawin] from comment #23)

I don't think we want that patch, but a patch that does something along the lines described in https://bugzilla.mozilla.org/show_bug.cgi?id=1546430#c3.

I don't think this should be blocking MVP

OK

but I would prefer to have a fix in to avoid issues with short-lived sessions like custom tabs.

Will custom tab fix be your patch in bug 1546430? Should that background telemetry bug block Fenix MVP then?

Whiteboard: [geckoview:fenix:m5] → [geckoview:fenix:p2]

(In reply to Eugen Sawin [:esawin] from comment #24)

(In reply to colormatch from comment #21)

How about having GeckoSession.close() do a flush() before closing?

I think this is worth considering.

In my case this will be just as good as GeckoSession.flush(), so you have my vote for it.

ex, currently I have a browser with 586 tabs opened.
If backgrounding in a case like this won't be a problem, we're all good. :)

We definitely need to make sure that is the case and btw. thanks a lot for your reports and feedback!

imo, "to be sure" situations like this won't be a problem, GV will have to be aware of the host app's design, which moves away from the concept of independence.
Whether it will be "tab-switching" or some something else, I think this should be handled by the host-app/browser, which will decide on when it's best to force GV's session/s to flush().

And you are welcome. :)
GV runs my code 4 times faster and the interface is much smoother compared to WebView, so it is the obvious choice for me.

I'm editing a bunch of GeckoView bugs. If you'd like to filter all this bugmail, search and destroy emails containing this UUID:

e88a5094-0fc0-4b7c-b7c5-aef00a11dbc9

Priority: P1 → P2

We're seeing localStorage values not surviving through browser redirects during oauth 2 flow (https://oauth.net/2/). This is a serious problem as we need to maintain a state value through redirect to the underlying id authentication system.

I can give more detail to reproduce if needed, though I believe this defect covers the problem. Is there work being done to deal with this?

Thanks.

(In reply to Bruce Roberts from comment #28)

We're seeing localStorage values not surviving through browser redirects during oauth 2 flow (https://oauth.net/2/). This is a serious problem as we need to maintain a state value through redirect to the underlying id authentication system.

I can give more detail to reproduce if needed, though I believe this defect covers the problem. Is there work being done to deal with this?

Please file a new bug as oauth flows could be interacting with multi-process issues and/or tracking protection. In particular, please be very clear about what versions of the browser are in use on what platforms, what domains are in use, and whether iframes are involved. It's probably best to file the bug in Core :: DOM: Web Storage to start with and we can move it as appropriate. https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=DOM%3A+Web+Storage should get you there.

See Also: → 1572232
Rank: 25
Whiteboard: [geckoview:fenix:p2]
Attachment #9055538 - Attachment is obsolete: true
Whiteboard: [geckoview:m88]
Assignee: esawin → nobody
Priority: P2 → P3
Whiteboard: [geckoview:m88]
Severity: normal → S3
Rank: 25 → 333
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: