Closed Bug 1247580 Opened 4 years ago Closed 4 years ago

service worker controlled pages sometimes fail to load and you just see a white screen

Categories

(Core :: DOM: Service Workers, defect)

32 Branch
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox44 --- unaffected
firefox45 --- unaffected
firefox46 --- fixed
firefox47 --- fixed

People

(Reporter: bkelly, Assigned: bkelly)

References

Details

(Keywords: regression)

Attachments

(2 files, 1 obsolete file)

I have two different reports of an intermittent load failure with service worker sites using fetch event and skipWaiting():

1) I saw this on my own site [1] with nightly 47.  Clearing storage fixed the issue for me.
2) Alex Gibson reports [2] he has seen this on SVGOMG [3] with dev edition 46.  He had to "manually unregister" the service worker to fix it.

I initially thought the problem on my own blog was due to me having a messed up profile from local testing.  The report from Alex, though, makes me think there is something else going on.

We don't have steps to reproduce yet, but filing this bug to track the issue.

If you think you see this, please capture a zip of your profile!

[1]: https://blog.wanderview.com
[2]: https://twitter.com/alex_gibson/status/697737149697945600
[3]: https://jakearchibald.github.io/svgomg/
Some weak theories as to what is going on:

1) Updates are somehow still firing in the middle of the navigation and skipWaiting() is killing the active worker in the process of handling the fetch event.  This has been reported on FF44. [1]  This really shouldn't happen on FF45+, though, since we now delay updates.  It would also not be a permanent failure and things would work when opened in a new tab.

2) The Cache API is trying to do a schema update and is failing for some reason.  This *should* result in network requests falling back to normal operation without interception, though.  This would be consistent with needing to wipe the storage, though.  (Alex notes he previously had it open on same dev edition version, although maybe we did a silent update on browser restart since this report is so close to the branch merge.)

[1]: https://twitter.com/ericf/status/697479909598490624
For the record, I hit this bug using Firefox Dev Edition 46.0a2 (2016-02-10). I had previously visited the svgomg site around 6/7 days prior using the same profile. I also use a separate profile for Dev Edition than I do for Release/Beta/Nightly.
I'm going to test the cache schema change theory in comment 0 by hacking a failure into the code.  That should show if it correctly falls back to network.
I simulated failures in cache schema upgrade, opening the sqlite db, match query, and put query.  None of these seemed to cause the problem in this bug.  If Cache API is busted the site just falls back to network as designed.

Let me see if problems with the ServiceWorkerRegistrar could cause this.
I experienced a similar issue while reproducing bug 1241045.
I used that test case, then removed the service worker. I then tested some new code on the same URL, and the old content (from the first, failing, test case) was loaded.
Despite clearing the cache and everything, the sqlite files in storage/ still referenced the old content.

Unfortunately I removed the directory since then and I haven't kept a copy.
Depends on: 1247623
Another theory:

The ssl cert associated with the site is expiring between when its cached by the service worker and returned in the failing case.

Combined with bug 1247623 this could cause the problem we are seeing here.  Also, SVGOMG's current cert was issued on Jan 19, 2016.  If there was some overlap with last cert then perhaps that fits the time window.
I got some profile data from @dangoor.  I think I might know whats going on.  Also, see bug 1246319.
See Also: → 1246319
So, thanks to the fixes in bug 1247623 and bug 1246319 I may have reproduced this locally again.

It seems we fail to deserialize the resource's ssl cert starting in FF46.  This is because the UUID changed in bug 1240173.

I will hack in a work around for now and file a follow-up to come up with a more reliable serialization strategy for the security info.
Blocks: 1240173
I will look into writing a gtest this afternoon.
See Also: → 1248628
Comment on attachment 8719803 [details] [diff] [review]
Allow old nsIX509Cert serialized objects to be read off disk. r=bz

>+ *       serialization than more complex changes will be needed.

s/than/then/

>+  //        workers around this issue.

s/workers/works/

r=me.  Make sure to uplift to 46!
Attachment #8719803 - Flags: review?(bzbarsky) → review+
Comment on attachment 8719990 [details] [diff] [review]
P2 Add gtest to ensure we can continue to deserialize old security info strings. r=bz

r=me.  I assume you've verified that the test fails without the patch?
Attachment #8719990 - Flags: review?(bzbarsky) → review+
Comment on attachment 8719987 [details] [diff] [review]
P1 Allow old nsIX509Cert serialized objects to be read off disk. r=bz

Approval Request Comment
[Feature/regressing bug #]: Service workers interaction with bug 1247623. 
[User impact if declined]: If a user visited a service worker enabled site prior to bug 1247623 and then visit after that bug, then the user will get a base experience.  Depending on what the SW does, they may see a white screen and the site will never load.
[Describe test coverage new/current, TreeHerder]: P2 includes a gtest.
[Risks and why]: Minimal.  We are simply adding a compat workaround to load security information with an old UUID.
[String/UUID change made/needed]: None.
Attachment #8719987 - Flags: approval-mozilla-aurora?
Keywords: regression
Comment on attachment 8719990 [details] [diff] [review]
P2 Add gtest to ensure we can continue to deserialize old security info strings. r=bz

See comment 17.
Attachment #8719990 - Flags: approval-mozilla-aurora?
Note, if the idl commit hook complains about us not bumping the UUID when uplifting, please do not change the UUID!  Instead we should use the IGNORE commit syntax since we only added a comment to the idl.  Bumping the UUID would create further bustage here.
https://hg.mozilla.org/mozilla-central/rev/26ef2d27d757
https://hg.mozilla.org/mozilla-central/rev/e9a8fbf9af4d
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Comment on attachment 8719987 [details] [diff] [review]
P1 Allow old nsIX509Cert serialized objects to be read off disk. r=bz

Fixes recent SW regression, includes a new test, please uplift to aurora.
Attachment #8719987 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8719990 [details] [diff] [review]
P2 Add gtest to ensure we can continue to deserialize old security info strings. r=bz

Test changes, please uplift to aurora.
Attachment #8719990 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.