Closed Bug 1581187 Opened 2 months ago Closed Last month

Blank address bar and page upon session restore for pages with registered Service Workers

Component: Core :: DOM: Service Workers (defect, P2)
Status: VERIFIED FIXED
Target Milestone: mozilla71
Tracking Status: firefox71 verified
Reporter: overholt, Assigned: perry
Attachments: 2 files

With parent intercept enabled (dom.serviceWorkers.parent_intercept = true), upon restart (i.e. session restore), pages with registered service workers that I have in pinned tabs (e.g. GMail, Slack) "restore" to blank pages with empty URL bars.

Perhaps related: sometimes going to twitter.com I get a blank page but not a blank URL bar (i.e. what I typed in/completed "https://www.twitter.com" remains as I typed/completed it). This only happens on the first attempt (I think in a session) to go to the page.

Priority: -- → P2
Blocks: 1456995
No longer depends on: ServiceWorkers-e10s

I was able to partially reproduce this: I got a blank page (but not an empty URL bar) on my GMail pinned tab after session restore with both SW parent intercept and Fission enabled. Refreshing the tab brought it back up.

asuth has this captured in pernosco as of at least yesterday and is working on a fix.

Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Pushed by pjiang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/655e52f0fae3
only shutdown remote Service Workers on browser shutdown r=asuth
Status: ASSIGNED → RESOLVED
Closed: Last month
Resolution: --- → FIXED
Target Milestone: --- → mozilla71
Assignee: bugmail → perry

Hello!
Reproduced the issue using Firefox 71.0a1 (20190913092859) on Windows 10x64 by following the next steps:

  1. "Restore previous session" activated and a registered Service Worker on twitter or https://www.united.com/
  2. Closing and reopening Firefox.

The issue is verified fixed with Firefox 71.0a1 (20191009213914) on Windows 10x64, macOS 10.14 and Ubuntu 18.04. The pages are correctly restored after closing and reopening Firefox while having a registered Service Worker on the mentioned websites. No blank pages/ tabs are presented.

Status: RESOLVED → VERIFIED

Hi all.

For the last few nightlies I appear to be having this issue (the last 3-4 days I'd say). It is still occurring with the latest nightly (14-10-2019). This only seems to impact pinned tabs and even then only a few of them.

Is there something that I need to do to the session restore file in order to correct this issue? Otherwise it might be that this bug isn't completely resolved, or a similar bug is at play here.

Hello Ryan!
I can't seem to reproduce this while having several pinned tabs and closing/ reopening Firefox.
Can you provide us the list of websites and platforms that you are experiencing this with?
Are there some particular steps to trigger the issue?
Does setting dom.serviceWorkers.parent_intercept to false fix the issue?
Is this issue reproducible while using a clean profile? Thank you!

Flags: needinfo?(sciguyryan)

(In reply to Alexandru Trif, QA [:atrif] from comment #8)

Hi Alexandru.

Thanks for your response. I was experiencing the issue with Twitter in particular. However flipping the dom.serviceWorkers.parent_intercept to false and then back to true again seems to have corrected it upon restart and I can no longer reproduce.

I'll keep an eye on it to see if it resurfaces. I don't have a good handle on why that would have fixed it, but it seems to have done so.

Flags: needinfo?(sciguyryan)

(In reply to Ryan Jones-Ward [:sciguyryan] from comment #9)

I'm glad that the problem is fixed. If the problem still occurs in the future, please let us know. Thank you for your help!

Ryan, FYI to help understand if the problem is recurring or a new one, (and for posterity) the underlying problem was basically:

  • A registered ServiceWorker is persisted to disk in 2 places:
    1. The ServiceWorkerRegistrar stores the existence of the registration and the Cache API storage name in PROFILE/serviceworker.txt
    2. The actual storage of the script happens in Cache API storage under PROFILE/storage/default/ORIGIN/cache
  • At shutdown we were purging the Cache API storage for ServiceWorkers, but leaving the registration around. (Shutdown behavior was improved, but there was a surprise hiding in some logic...). The way this was happening was fundamentally race-prone and so the purge might not actually happen based on a number of other browser shutdown factors.
    • There is work under way to improve the test infrastructure so that these types of scenarios can be more easily tested without resorting to writing python marionette tests.
  • So when the browser was next started there was no longer actually any storage for any ServiceWorkers. We have logic in place to handle the problem where a registration exists in serviceworker.txt but there is no backing Cache API storage by purging the registration, but it still breaks the first load. The next time the load is attempted, there is no longer a ServiceWorker, so the normal page load flow happens, and then this reinstalls the ServiceWorker.
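
To make the sequence above concrete, here is a toy model in Python. It is only a sketch, not Gecko code: the `ToyProfile` class, its method names, and the returned strings are invented for illustration, with one set standing in for serviceworker.txt and another for the Cache API storage.

```python
"""Toy model of the failure sequence described above (not Gecko code)."""


class ToyProfile:
    def __init__(self):
        # Stand-in for PROFILE/serviceworker.txt: scopes that have a registration.
        self.registrations = set()
        # Stand-in for PROFILE/storage/default/ORIGIN/cache: scopes with a stored script.
        self.script_storage = set()

    def install(self, scope):
        # Installing a ServiceWorker writes to both places.
        self.registrations.add(scope)
        self.script_storage.add(scope)

    def buggy_shutdown(self):
        # The bug: script storage is purged, registrations are left behind.
        self.script_storage.clear()

    def load(self, scope):
        if scope in self.registrations:
            if scope not in self.script_storage:
                # Recovery path: purge the orphaned registration,
                # but this particular load is already broken.
                self.registrations.discard(scope)
                return "blank page"
            return "loaded via ServiceWorker"
        # No registration: normal page load, which reinstalls the worker.
        self.install(scope)
        return "loaded from network"


profile = ToyProfile()
profile.install("https://twitter.com/")
profile.buggy_shutdown()
first = profile.load("https://twitter.com/")
second = profile.load("https://twitter.com/")
third = profile.load("https://twitter.com/")
print(first, "|", second, "|", third)
```

In this model only the first load after the buggy shutdown fails; the second goes over the network and reinstalls the worker, and the third is served through it again.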

So it wouldn't be surprising for you to experience an initial failed load on other sites that have ServiceWorkers but you haven't visited since the bug was fixed. It would be surprising for a single site like twitter to have the problem recur. However, the session store could have become unhappy about the failed loads and had some weird state that is now corrected.

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #11)

Hi Andrew.

After a cold restart of my system I was able to reproduce this again. If I go through the process of flipping the pref then the same thing will happen again - good until the next cold restart.

Two pinned tabs (Twitter and GMail) were both non-loaded and when I clicked on them nothing was displayed at all. I'm not really sure how to go about presenting this as testable so that it can be debugged. If anyone can tell me how to gather the information required to debug this, please let me know. Happy to provide whatever I can.

Cheers all.

Thanks for your assistance in digging into this deeper, Ryan. (Aside: I'm going to include various menu paths you're probably already aware of, which I include for the benefit of anyone else who is able to reproduce.)

To rule out edge-cases, I first want to check: Do you use any settings in about:preferences#privacy (via the hamburger menu, "Preferences", then clicking on the "Privacy & Security" tab) that cause Firefox to clear data at shutdown? Specifically, have you either checked "Delete cookies and site data when Nightly is closed" under "Cookies and Site Data" or have you changed "Nightly will...Remember history" under "History" to something else?

Assuming you have the defaults set there, the question would be: are there any incriminating-looking error messages in the "Browser Console" when you start up Firefox? (The Browser Console is accessible from the hamburger menu under "Web Developer".) The errors I would expect to see would start with "Quota". The other question is whether the devtools for either tab show anything interesting that mentions service workers. (They are accessible via hamburger menu, "Web Developer", "Toggle Tools".)

Assuming there's nothing obvious there, what would be ideal is if we can use some other site that uses a ServiceWorker to test whether the problem is only happening at startup, or if it's also happening the first time you go to load a ServiceWorker even later in the browsing session. For example, https://whatwg.org/ uses a ServiceWorker. The general test idea here would be:

  • Before shutting down the browser, go to https://whatwg.org/ and make sure the page loads and that a ServiceWorker has been installed.
  • (Firefox gets shut down.)
  • Next time you start the browser, observe whether twitter/gmail are broken.
  • Browse to https://whatwg.org/ via opening a new tab and typing, autocompleting, or pasting the URL. See if the page ends up blank with an empty URL bar similar to twitter/gmail. Knowing whether it loads okay or not would be very informative.
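
Alongside the manual test above, the on-disk state could be checked directly. The following Python sketch only assumes the PROFILE/storage/default/ORIGIN/cache layout mentioned earlier in this bug; the helper name `origins_missing_cache` is hypothetical, and the directory names used in the demo (for example `https+++whatwg.org`) are an assumption about how origins are encoded on disk, so take the real names from your own profile.

```python
import tempfile
from pathlib import Path


def origins_missing_cache(profile_dir, origin_dir_names):
    """Return the origin directory names that have no Cache API directory on disk."""
    storage_root = Path(profile_dir) / "storage" / "default"
    return [
        name
        for name in origin_dir_names
        if not (storage_root / name / "cache").is_dir()
    ]


# Demo against a throwaway directory tree, not a real Firefox profile.
with tempfile.TemporaryDirectory() as demo_profile:
    root = Path(demo_profile) / "storage" / "default"
    # Assumed directory names; one origin has a cache directory, the other does not.
    (root / "https+++whatwg.org" / "cache").mkdir(parents=True)
    (root / "https+++twitter.com").mkdir(parents=True)
    missing = origins_missing_cache(
        demo_profile, ["https+++whatwg.org", "https+++twitter.com"]
    )

print(missing)
```

An origin that still has a registration but shows up in `missing` after a restart would match the orphaned-registration state described in comment 11.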

In the meantime we can try to add additional logging, shown in the browser or devtools consoles, that explains what's going on.

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #13)

> Thanks for your assistance in digging into this deeper, Ryan. (Aside: I'm going to include various menu paths you're probably already aware of, which I include for the benefit of anyone else who is able to reproduce.)
>
> To rule out edge-cases, I first want to check: Do you use any settings in about:preferences#privacy (via the hamburger menu, "Preferences", then clicking on the "Privacy & Security" tab) that cause Firefox to clear data at shutdown? Specifically, have you either checked "Delete cookies and site data when Nightly is closed" under "Cookies and Site Data" or have you changed "Nightly will...Remember history" under "History" to something else?

I usually have the privacy set to strict, but I can still reproduce this with it set to the standard setting too.
Clear Offline Website Data is enabled on shut down. That's the only thing set there.

> Assuming you have the defaults set there, the question would be: are there any incriminating-looking error messages in the "Browser Console" when you start up Firefox? (The Browser Console is accessible from the hamburger menu under "Web Developer".) The errors I would expect to see would start with "Quota". The other question is whether the devtools for either tab show anything interesting that mentions service workers. (They are accessible via hamburger menu, "Web Developer", "Toggle Tools".)

There is nothing of that sort listed in the logs at all. I have set the logs to persist to see if I missed anything of relevance. If there is, I cannot find it. I will include a copy of the log here for you to have a look; you might have a better idea of what to look for.

> Assuming there's nothing obvious there, what would be ideal is if we can use some other site that uses a ServiceWorker to test whether the problem is only happening at startup, or if it's also happening the first time you go to load a ServiceWorker even later in the browsing session. For example, https://whatwg.org/ uses a ServiceWorker. The general test idea here would be:
>
>   • Before shutting down the browser, go to https://whatwg.org/ and make sure the page loads and that a ServiceWorker has been installed.
>   • (Firefox gets shut down.)
>   • Next time you start the browser, observe whether twitter/gmail are broken.
>   • Browse to https://whatwg.org/ via opening a new tab and typing, autocompleting, or pasting the URL. See if the page ends up blank with an empty URL bar similar to twitter/gmail. Knowing whether it loads okay or not would be very informative.

Fascinating. I have done this exactly as you outlined, and it does indeed reproduce the issue every time.

> In the meantime we can try to add additional logging, shown in the browser or devtools consoles, that explains what's going on.

Thanks. If there is anything else you'd like me to try, please let me know. Happy to help pin this one down.

Attached file log.txt

This is a copy of the Web Browser log when starting the browser.

Thanks very much for the response and data. This suggests there's a problem where Sanitizer.jsm is not clearing the ServiceWorker registrations but is clearing the underlying cache storages. The logic to clear the registrations happens before clearing the cache storages at https://searchfox.org/mozilla-central/rev/ebe492edacc75bb122a2b380e4cafcca3470864c/mobile/android/modules/Sanitizer.jsm#215 suggesting there's a regression occurring.

And indeed the problem seems to be that:

This needs a little more analysis, but broadly speaking based on existing control flow, the options would be to:

  1. Move the PropagateUnregister parent-intercept-bail check down below the call to ServiceWorkerRegistrar::UnregisterServiceWorker at https://searchfox.org/mozilla-central/rev/ebe492edacc75bb122a2b380e4cafcca3470864c/dom/serviceworkers/ServiceWorkerManagerService.cpp#169
  2. Expose an explicit non-hacky method on nsIServiceWorkerManager with explicit registration-cleared-by-chrome semantics that calls ServiceWorkerManager::MaybeSendUnregister under the hood.
  3. Fix bug 1183245 so that it's the clearing of the cache storage that triggers the registration to be wiped, helping us eliminate the hacky ServiceWorkerCleanUp.jsm module (https://searchfox.org/mozilla-central/source/toolkit/components/cleardata/ServiceWorkerCleanUp.jsm) which only exists because that bug isn't fixed.
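
As a rough illustration of option 1, here is a hypothetical Python simplification of the control flow. The real code is C++ in ServiceWorkerManagerService.cpp; the function names below and the plain set standing in for ServiceWorkerRegistrar are invented for this sketch, and the "current" ordering is an assumption inferred from the wording of option 1 rather than copied from the source.

```python
def propagate_unregister_current(parent_intercept, registrar, scope):
    """Assumed current ordering: the bail check runs before the on-disk unregister."""
    if parent_intercept:
        # Bails early, so the registration is never removed from the registrar.
        return
    registrar.discard(scope)
    # (Propagation to other processes would follow here.)


def propagate_unregister_option1(parent_intercept, registrar, scope):
    """Option 1: unregister on disk first, then apply the parent-intercept bail."""
    registrar.discard(scope)
    if parent_intercept:
        return
    # (Propagation to other processes would follow here.)


scope = "https://twitter.com/"

before = {scope}
propagate_unregister_current(True, before, scope)

after = {scope}
propagate_unregister_option1(True, after, scope)

print(before, after)
```

With parent intercept enabled, the first ordering leaves the registration in place while the second removes it, which is the difference option 1 is after.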

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #16)

Thanks for the breakdown on what could be going on here. As I said in my last post, if there is anything you'd like me to test (specific builds or whatnot) or if there is any further information I can provide please let me know. I will answer any needinfo's sent my way.

I'm still waiting for some more information here. Does this bug need to be reopened to track the progress here? As it stands, the bug is still reproducible in the latest nightly and I still haven't been able to get any closer to figuring out why.

(In reply to Ryan Jones-Ward [:sciguyryan] from comment #18)

Apologies, I spun off comment 16 to bug 1589708 in haste and failed to correctly cc you and establish the "see also" link. It sounds like the fix there was insufficient and we'll need to reopen.
