Walmart Grocery service worker fails to launch after initial run
Categories
(Core :: DOM: Service Workers, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox72 | --- | unaffected |
firefox73 | + | verified disabled |
firefox74 | + | verified |
firefox75 | --- | verified |
People
(Reporter: chris.cushman, Assigned: perry)
References
Details
(Keywords: regression, reproducible)
Attachments
(3 files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0
Steps to reproduce:
- Go to: https://grocery.walmart.com/
- Click the picture of the first item you see.
- You will be navigated to that product. Close out the page and try to go back to the website.
(to fix the problem you will have to clear the cookies for that website: https://support.mozilla.org/en-US/kb/clear-cookies-and-site-data-firefox#w_clear-cookies-for-any-website)
Actual results:
The website will not load. You will see a blank page with the address https://grocery.walmart.com/ in the URL bar.
Expected results:
The page should load.
Reporter | ||
Comment 1•5 years ago
|
||
I didn't know about corrupt cookies until my question was answered by the exceptional support volunteers: https://support.mozilla.org/en-US/questions/1273210
Comment 2•5 years ago
|
||
Hi Chris,
Thanks for contacting us. Given that your question was already answered then, is it ok if we mark this issue as resolved?
Regards, have a great day!
Clara
Updated•5 years ago
|
Reporter | ||
Comment 3•5 years ago
|
||
Hi Clara,
This is a temporary fix and I was hoping that you guys could find out the actual issue and solve it so that I don't have to keep clearing the cookies from the specific website each time. I asked the question because I couldn't get the site to work at all and they told me how to work around it, I mentioned this so it would give you guys a better idea of how to fix it.
Reporter | ||
Updated•5 years ago
|
Comment 4•5 years ago
|
||
Hi Chris, my apologies!
I've chosen a component for this bug in hope that someone with more expertise may look at it. We'll await their answer. If you consider that there's another component that's more proper for this case you may change it.
Regards, Clara.
Comment 6•5 years ago
|
||
This bug doesn't have a reproducible set of steps to reproduce. Without that I'm afraid there isn't anything that I or anyone else can really do here.
Comment 7•5 years ago
|
||
I managed to reproduce it with comment 0 very intermittently. Once I got into a state that it would fail to load the page, I tried to see which cookies were "corrupted", and it seems it's not the cookies at all here, but the service worker that's messing things up.
Steps that always reproduce for me:
- start with clean profile
- go to https://grocery.walmart.com/ and do what comment 0 says
- open page to about:debugging and > This Nightly > Take a look at active SW
- Wait for the wallmart SW to go into stopped state
- go to https://grocery.walmart.com/ again and witness bug
- Optionally, click start on SW. Terminal output:
Child 14018, Main Thread] WARNING: '!window', file /home/icecold/mozilla-central/dom/cache/CacheStorage.cpp, line 573
[Child 14018, Main Thread] WARNING: '!window', file /home/icecold/mozilla-central/dom/cache/CacheStorage.cpp, line 573
[Child 14018, Main Thread] WARNING: 'mIsWorkerScript || (mState != ServiceWorkerState::Parsed && mState != ServiceWorkerState::Installing)', file /home/icecold/mozilla-central/dom/workers/ScriptLoader.cpp, line 1717
[Parent 13857, IPDL Background] WARNING: 'aOwner->mState == RemoteWorkerController::eTerminated', file /home/icecold/mozilla-central/dom/workers/remoteworkers/RemoteWorkerController.cpp, line 399
[Parent 13857, IPDL Background] WARNING: 'aResult.IsReject()', file /home/icecold/mozilla-central/dom/workers/remoteworkers/RemoteWorkerControllerParent.cpp, line 108
Updated•5 years ago
|
Comment 9•5 years ago
|
||
Calling this tracking+ for now, but it sounds to me like this could be a blocker for sw-e10s if confirmed.
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Updated•5 years ago
|
Updated•5 years ago
|
Comment 15•5 years ago
|
||
The service worker imports scripts:
importScripts(
'/sw-data.js?cb=' + Date.now(),
'https://storage.googleapis.com/workbox-cdn/releases/3.4.1/workbox-sw.js'
);
The first script fails to load since this is a second run of the service worker, where we don't allow it to load new scripts, and that script is always new due to the use of Date.now()
. I'd like to know how it worked with parent intercept off; we should probably also rethink the policy regarding loading new scripts in subsequent runs.
Comment 16•5 years ago
|
||
We used to forbid service worker from importing new scripts after
installation, restricting them to scripts they already imported.
This restriction never actually worked before the inception of
parent intercept mode, and now it turns out that the Walmart
Grocery site depends on this ability to function properly. This
commit removes the restriction altogether in order to avoid
breaking sites that already depend on it.
Comment 17•5 years ago
|
||
Does Chrome allow these sorts of second-run imports? Does the spec?
Comment 18•5 years ago
|
||
The spec forbids them entirely. See https://phabricator.services.mozilla.com/D60698#1851619 for a little more discussion.
Comment 19•5 years ago
|
||
I see. What does Chrome do here?
Comment 20•5 years ago
|
||
Comment 21•5 years ago
|
||
Comment 22•5 years ago
|
||
So we have actually an explicit WPT test that ensures we do not allow these second-run imports? Then I wonder how we ever passed this test (or has it been enabled recently?).
If the spec is such clear, probably we must break the site then. Given that we have some time until SW e10s is enabled in release, maybe we can just warn them?
Comment 23•5 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #22)
So we have actually an explicit WPT test that ensures we do not allow these second-run imports? Then I wonder how we ever passed this test (or has it been enabled recently?).
The failure of installing.https.html
is a TEST-UNEXPECTED-PASS
, it is configured to expect FAIL at https://searchfox.org/mozilla-central/rev/ad6234148892bc29bf08d736604ac71925138040/testing/web-platform/meta/service-workers/service-worker/installing.https.html.ini
Comment 24•5 years ago
|
||
Failure log: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&revision=641d24ccb2f69c69b13aff168d983ec057857807&selectedJob=286975206
https://treeherder.mozilla.org/logviewer.html#?job_id=286957357&repo=autoland
Backout link: https://hg.mozilla.org/integration/autoland/rev/92d629854868
There is:
TEST-UNEXPECTED-FAIL | /service-workers/service-worker/import-scripts-updated-flag.https.html | import script not previously imported - assert_equals: expected (string) "NetworkError" but got (object) null
TEST-UNEXPECTED-PASS | /service-workers/service-worker/installing.https.html | installing is set - expected FAIL
Comment 25•5 years ago
|
||
There's a test that ensures that we don't allow new scripts to be imported, and it's 3 years old. Since it passed with child intercept, it seems that it had the same behavior -- at least in some cases. I'll try to find out why it doesn't stop this website from running its service worker, and also why Chrome lets it run. Anyhow, we should probably drop this patch.
So far I didn't manage to get in touch with anyone at Walmart; any ideas would be helpful.
Comment 26•5 years ago
|
||
:mconca, can you try to find a way to reach out to Walmart? Thank you!
Comment 27•5 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #22)
If the spec is such clear, probably we must break the site then. Given that we have some time until SW e10s is enabled in release, maybe we can just warn them?
The rationale for this fix that we attempted to land (previously discussed at https://phabricator.services.mozilla.com/D60698#1851619) was that we don't want to change this behavior between child-intercept and parent-intercept because it confuses the decision process in switching to parent intercept. Especially since broken Service Workers will manifest as broken sites until we learn how to mark ServiceWorkers as broken and automatically bypass them. (We do have a bug on that at least for corrupted content purposes, but there are variants on this failure mode that wouldn't result in an explicit corrupted content error page that would still be worth reloading the page to bypass the ServiceWorker when this happens.)
If we land this patch/another variant that maintains the child-intercept status quo, then we would want a follow-up bug to fix this behavior and improve the WPT tests to cover this exact scenario. We still pass the WPT test added (https://searchfox.org/mozilla-central/source/dom/serviceworkers/test/importscript_worker.js#31) which tests new loads after the top-level script execution has completed, at least. Haven't looked at the regressions on this bug yet.
Comment 28•5 years ago
|
||
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #19)
I see. What does Chrome do here?
It appears that what happens is:
- The ServiceWorker throws an error.
- Intercept is aborted and the network connection is reset to go to the network.
navigator.serviceWorker.controller
ends up null.- The network panel shows that the connection went to the network.
- They constantly end up reinstalling the ServiceWorker with every visit, which is interesting because the bytewise checks should think things are stable and the soft update should only run once per day.
Comment 29•5 years ago
•
|
||
I will see if we have a way to contact Walmart. I would not be comfortable landing a patch that breaks previous Firefox behavior, even if it is more correct according to the spec. Walmart is the case we know about, there are likely other sites that do this that we are unaware of, especially if Chrome has been allowing this behavior.
Comment 30•5 years ago
|
||
(In reply to Mike Conca [:mconca] from comment #29)
I will see if we have a way to contact Walmart. I would not be comfortable landing a patch that breaks previous Firefox behavior, even if it is more correct according to the spec. Walmart is the case we know about, there are likely other sites that do this that we are unaware of, especially if Chrome has been allowing this behavior.
I had this page cached, so didn't see asuth's comment on Chrome behavior before commenting. Looks like Chrome doesn't permit new script import, but the site works because it is reinstalling the worker on each visit. That makes me feel better about this patch, and most sites wouldn't allow new imports because Chrome hasn't allowed it. I will continue to contact Walmart about this issue.
Comment 31•5 years ago
|
||
The fallback seems to be non-standard behavior. I filed an issue with the spec to have it specified.
Comment 32•5 years ago
|
||
Just to be clear: does this mean, we will not land the patch?
Comment 33•5 years ago
|
||
I think the best solution in this case is to go to the network like Chrome does, and if possible, standardize this behavior. This means we won't land the current patch, and a new one is required.
Comment 34•5 years ago
|
||
Fx73 was reverted to the previous behavior. Chris, can you confirm that current Fx73 beta builds are working OK for you?
Reporter | ||
Comment 35•5 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #34)
Fx73 was reverted to the previous behavior. Chris, can you confirm that current Fx73 beta builds are working OK for you?
Yes version 73.0 (64-bit) is working for me! Nightly version 74.0a1 (2020-02-04) (64-bit) still doesn't though.
Comment 36•5 years ago
•
|
||
Thank you for the confirmation. It is expected to not work with the current Nightly.
Updated•5 years ago
|
Comment 37•5 years ago
|
||
(In reply to Yaron Tausky [:ytausky] from comment #33)
I think the best solution in this case is to go to the network like Chrome does, and if possible, standardize this behavior. This means we won't land the current patch, and a new one is required.
It seems like we have a good path forward for this issue. In parallel, we've reached out to Walmart via two different channels to let them know 1) we are working on this, and 2) the design of their site and use of service workers is not optimal for users of any browser.
Comment 38•5 years ago
|
||
Summarizing my current understanding of what's going on and what should happen (but currently doesn't):
- The service worker is installed.
- The service worker shuts down.
- The user navigates to the website, triggering the Handle Fetch algorithm through step 3.1 of the HTTP fetch algorithm.
- In step 18 of the algorithm the Run Service Worker algorithm is called.
- In step 7.12 the worker's script is evaluated. During the evaluation the script calls
importScripts
, and fails the check in step 4.1, so aNetworkError
is thrown. - In step 7.13 of the Run Service Worker algorithm we erroneously set startFailed to true and abort, instead of continuing.
- Back in step 18 of the Handle Fetch algorithm we erroneously set handleFetchFailed to true and skip step 19.
- In step 23 we erroneously return a network error instead of continuing to step 24 and returning response, which is null.
- Back in the HTTP fetch algorithm we don't fall back to the network, as we should.
So this situation is actually specified correctly and our implementation deviates from the spec starting at point 6 above.
Assignee | ||
Comment 39•5 years ago
|
||
Adding to comment 38, in particular, step 7.12 of the Run Service Worker algorithm states:
Let evaluationStatus be the result of running the classic script script if script is a classic script, otherwise, the result of running the module script script if script is a module script.
The run a classic script algorithm is run with rethrow errors not given, so its value is false. Additionally, script's error to rethrow is null; this value is only non-null in the event of a script parse error, which couldn't have happened because the script is known to have been evaluated when the ServiceWorker installs, and script evaluation implies no parse error. So, we definitely reach step 7 of run a classic script:
Otherwise, set evaluationStatus to ScriptEvaluation(script's record).
We also definitely reach step 12.a of the ScriptEvaluation algorithm:
Set result to the result of evaluating scriptBody.
Because evaluating the ServiceWorker script in its activated
state throws an uncaught exception, result.[[Type]] is "throw" and evaluationStatus from run a classic script is an abrupt completion. Then, because rethrow errors is false, step 8.3.1 of run a classic script returns the evaluationStatus. Finally, because evaluationStatus.[[Value]] is not empty (a NetworkError
was thrown) and the [Terminate Service Worker] algorithm doesn't run, neither steps 7.13 nor 7.14 of Run Service Worker sets startFailed to true. Rather, startFailed remains false throughout Run Service Worker and the algorithm returns the evaluationStatus of the ServiceWorker script (which is the serviceWorker's start status) and not failure.
The final result is that, in this case, the Handle Fetch's step 18
If the result of running the Run Service Worker algorithm with activeWorker is failure, then set handleFetchFailed to true.
does not set handleFetchFailed to true because the result of Run Service Worker is not failure.
Updated•5 years ago
|
Assignee | ||
Comment 40•5 years ago
|
||
Assignee | ||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 41•5 years ago
|
||
Assignee | ||
Updated•5 years ago
|
Comment 44•5 years ago
|
||
bugherder |
Comment 46•5 years ago
|
||
Backed out for permafailing on shared-worker-import-data-url.any.html
backout: https://hg.mozilla.org/integration/autoland/rev/ff928c66923e98f671f44592a1f179c780ca7a75
failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=288713695&repo=autoland&lineNumber=9737
[task 2020-02-13T13:34:03.867Z] 13:34:03 INFO - TEST-START | /workers/modules/shared-worker-import-data-url.any.html
[task 2020-02-13T13:34:03.872Z] 13:34:03 INFO - Closing window 53
[task 2020-02-13T13:34:14.152Z] 13:34:14 INFO - TEST-UNEXPECTED-ERROR | /workers/modules/shared-worker-import-data-url.any.html | SyntaxError: import declarations may only appear at top level of a module
[task 2020-02-13T13:34:14.152Z] 13:34:14 INFO - data:text/javascript,import "http://web-platform.test:8000/workers/modules/resources/static-import-worker.js?pipe=header(Access-Control-Allow-Origin, *)";:1:0
[task 2020-02-13T13:34:14.152Z] 13:34:14 INFO - TEST-INFO expected TIMEOUT | took 10287ms
Comment 53•5 years ago
|
||
Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/ff928c66923e
Assignee | ||
Comment 54•5 years ago
|
||
The failure is "TEST-UNEXPECTED-ERROR | /workers/modules/shared-worker-import-data-url.any.html | SyntaxError: import declarations may only appear at top level of a module", but the expectation described in shared-worker-import-data-url.any.js.ini is "[TIMEOUT, ERROR]" for the test case, so I'm not sure why it was reported as an unexpected error. I'll re-run the WPTs.
Assignee | ||
Comment 55•5 years ago
|
||
Comment 56•5 years ago
|
||
(In reply to Perry Jiang [:perry] from comment #54)
The failure is "TEST-UNEXPECTED-ERROR | /workers/modules/shared-worker-import-data-url.any.html | SyntaxError: import declarations may only appear at top level of a module", but the expectation described in shared-worker-import-data-url.any.js.ini is "[TIMEOUT, ERROR]" for the test case, so I'm not sure why it was reported as an unexpected error. I'll re-run the WPTs.
It's a frequent intermittent: bug 1612699.
Comment 57•5 years ago
|
||
Bugbug thinks this bug is a regression, but please revert this change in case of error.
Comment 58•5 years ago
|
||
Comment 61•5 years ago
|
||
bugherder |
Assignee | ||
Comment 64•5 years ago
•
|
||
Comment on attachment 9125694 [details]
Bug 1603484 - worker script evaluation resulting in abrupt completion shouldn't be a start failure r?#dom-workers-and-storage,asuth
Beta/Release Uplift Approval Request
- User impact if declined: Workers that throw an uncaught exception during initial script evaluation will be considered closed, when they should be considered running (according to the web/shared workers specifications).
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: Yes
- If yes, steps to reproduce: https://bugzilla.mozilla.org/show_bug.cgi?id=1603484#c0
- List of other uplifts needed: https://phabricator.services.mozilla.com/D63348
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The change is just allowing the worker to continue running if it throws an uncaught exception.
- String changes made/needed:
Assignee | ||
Updated•5 years ago
|
Comment 65•5 years ago
•
|
||
Verified the issue with Firefox 75.0a1 (20200219215002) on Windows 10x64, macOS 10.15 and Ubuntu 18.04. The https://grocery.walmart.com/ website is working after following the STR from comment 0 and comment 7.
The only problem is that after loading https://grocery.walmart.com/ in a new tab Firefox tries to register a new service worker for the page in about:debugging#/runtime/this-firefox and for a brief period of time. Is this the expected behavior? Thanks!
Comment 66•5 years ago
|
||
AFAIU, yes. The site is behaving in a strange and unexpected but not forbidden way. See comment 16 and comment 39 for details. We added also an explicit WPT test.
Updated•5 years ago
|
Comment 67•5 years ago
|
||
Comment on attachment 9125694 [details]
Bug 1603484 - worker script evaluation resulting in abrupt completion shouldn't be a start failure r?#dom-workers-and-storage,asuth
Fix for a top site, patch has tests, uplift approved for 74.0b6, thanks.
Note to Sheriffs: there is an additional patch to uplift to beta mentioned in the uplift request.
Assignee | ||
Comment 68•5 years ago
|
||
(In reply to Alexandru Trif, QA [:atrif] from comment #65)
Verified the issue with Firefox 75.0a1 (20200219215002) on Windows 10x64, macOS 10.15 and Ubuntu 18.04. The https://grocery.walmart.com/ website is working after following the STR from comment 0 and comment 7.
The only problem is that after loading https://grocery.walmart.com/ in a new tab Firefox tries to register a new service worker for the page in about:debugging#/runtime/this-firefox and for a brief period of time. Is this the expected behavior? Thanks!
Yes, this looks like the expected behavior. The second service worker that appears to be registering has the same URL as the one already registered (at least it does for me when I try), so the second service worker should disappear from about:debugging.
Comment 69•5 years ago
|
||
(In reply to Perry Jiang [:perry] from comment #68)
(In reply to Alexandru Trif, QA [:atrif] from comment #65)
Verified the issue with Firefox 75.0a1 (20200219215002) on Windows 10x64, macOS 10.15 and Ubuntu 18.04. The https://grocery.walmart.com/ website is working after following the STR from comment 0 and comment 7.
The only problem is that after loading https://grocery.walmart.com/ in a new tab Firefox tries to register a new service worker for the page in about:debugging#/runtime/this-firefox and for a brief period of time. Is this the expected behavior? Thanks!Yes, this looks like the expected behavior. The second service worker that appears to be registering has the same URL as the one already registered (at least it does for me when I try), so the second service worker should disappear from about:debugging.
Yep is disappearing after a period of time. That being said the issue is verified with Firefox 75.0a1 (20200219215002) on Windows 10x64, macOS 10.15 and Ubuntu 18.04.
Comment 70•5 years ago
|
||
bugherder uplift |
Comment 71•5 years ago
|
||
Verified with Firefox 74.0b6 (20200221001238) on Windows 10x64, macOS 10.15 and Ubuntu 18.04. The website is working after following STR from comment 0 and comment 7.
Description
•