Closed Bug 1180765 Opened 9 years ago Closed 9 years ago

medium.com connection hangs on subsequent loads

Categories

(Core :: DOM: Service Workers, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla42
Tracking Status
firefox40 --- unaffected
firefox41 + fixed
firefox42 + fixed

People

(Reporter: ttaubert, Assigned: ehsan.akhgari)

Attachments

(1 file)

medium.com seems to serve a service worker. This is the second time I got into a state where medium.com will stay in the "connecting" phase and is just stuck. Opening the same link in a private window loads just fine.

I previously resolved that by going to about:serviceworkers and clearing the cached version there, but I thought it would be great to investigate what's going on. I'll just leave it as-is and might need some assistance to debug this.
Tim, can you reproduce this with the rev from bug 1178508?  You need to wipe your serviceworker registration for medium.com after updating nightly.

Thanks.
Flags: needinfo?(ttaubert)
I might have a script registered before that landed, not sure. I'll clear the cache and reopen if I see it again. Thanks!
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(ttaubert)
Resolution: --- → INVALID
Here we are again. So what can I do to debug this? :)
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Flags: needinfo?(bkelly)
Blocks: 1178508
Seconding Tim’s offer to help debug this.  I’m getting the same problem, where medium loads fine the first time, and then subsequent visits hang, and unregistering the SW restarts that cycle.
Tim, Eric, are you guys running in e10s or single process?  What platform?  How often does it trigger?

Obviously we have a race condition somewhere and I'm not triggering it.  Maybe if I make my environment look more like yours I can reproduce it now.
Flags: needinfo?(ttaubert)
Flags: needinfo?(eric)
Flags: needinfo?(bkelly)
I think maybe we should also disable service workers on nightly until we can isolate and fix this issue.
Assignee: nobody → bkelly
Status: REOPENED → ASSIGNED
I have e10s disabled, and am on OS X 10.8.5.  However, the current build only seems to have this problem in my default setup, which has a number of add-ons.  My testing profile, with no add-ons, does not seem to hang; at least, I haven’t been able to get it to do so in my testing.  Here are my enabled add-ons:

ADB Helper
Advanced Cookie Manager
Autofill Forms
Awesome Screenshot
Bookmark Shortcut Keys
CS Lite Mod
Custom New Tab (currently broken by the new tab behaviors, as it happens)
Disconnect
DOM Blaster
Flashblock
JS Switch
Multirow Bookmarks Toolbar Plus
RSS Icon in Awesombar
Stylish (no styles defined for medium.com)
Terms of Service; Didn't Read
uBlock Origin
Valence
Video DownloadHelper

Tim, any overlap?
Flags: needinfo?(eric)
I run with e10s enabled, OS X 10.10. Add-ons:

AutoAuth	2.1
BugzillaJS	
Customizable Shortcuts
Easy App Tabs
Self-Destructing Cookies

No add-on overlap it seems. Ben, what do we use to store the service workers? If it's a separate file I could probably send it to you?
Flags: needinfo?(ttaubert)
Ehsan says he can reproduce on his mac and will look at it.

I expect this is more of a race condition and my Windows machine just doesn't get the timing right (or wrong, as the case may be).  The specific file is probably not the problem.
Assignee: bkelly → ehsan
Eric, Tim, can you disable dom.serviceWorkers.enabled for the time being and see if that avoids the problem for you?
Flags: needinfo?(ttaubert)
Flags: needinfo?(eric)
Hmm, this doesn't seem to help?
Flags: needinfo?(ttaubert)
I've been through several (self-initiated) restarts of Firefox, with SWs both enabled and disabled.  What I've found is that if there is a SW for medium.com present, *even if SWs are disabled*, the hang behavior occurs.  So you have to close all tabs/windows with Medium in them, unregister the SW, _then_ disable SWs, and the problem goes away entirely.  All medium.com URLs load fine for me in that particular state.

I also found that, in situations where the SW was present and thus page loads were hanging, it was usually possible to get the page to load normally with a shift-click on the reload icon.  This was true whether SWs were enabled or disabled.  As long as the SW was registered, regardless of SW enable state, shift-reloads could get the page to load; regular reloads, or regular page visits (including typing URLs into the address bar), got hung.
Flags: needinfo?(eric)
Interesting followup: I left SWs disabled, and went about my day.  A few minutes ago, I tried to go to medium.com, and it hung.  I opened about:config, and dom.serviceWorkers.enabled was still false.  I closed the still-hung medium.com tab, made sure I didn’t have any other tabs open to medium.com (I didn’t), and then enabled SWs.  When I loaded up about:serviceworkers, it showed that I had a SW for medium.com.  I am pretty confident that I didn’t have one after my last round of testing (described in comment #12), and as I say, I left SWs disabled after I was done.
So far, I have found out that inserting an entry into the cache for https://medium.com/service-worker.js is failing with:

The SQL statement \'INSERT INTO entries (request_method, request_url_no_query, request_url_no_query_hash, request_url_query, request_url_query_hash, request_referrer, request_headers_guard, request_mode, request_credentials, request_contentpolicytype, request_cache, request_body_id, response_type, response_url, response_status, response_status_text, response_headers_guard, response_body_id, response_security_info_id, response_principal_info, response_redirected, response_redirected_url, cache_id ) VALUES (:request_method, :request_url_no_query, :request_url_no_query_hash, :request_url_query, :request_url_query_hash, :request_referrer, :request_headers_guard, :request_mode, :request_credentials, :request_contentpolicytype, :request_cache, :request_body_id, :response_type, :response_url, :response_status, :response_status_text, :response_headers_guard, :response_body_id, :response_security_info_id, :response_principal_info, :response_redirected, :response_redirected_url, :cache_id );\' could not be compiled due to an error: table entries has no column named response_principal_info

My database's version is 14.  The entries table is like the following:

CREATE TABLE entries (
  id INTEGER NOT NULL PRIMARY KEY,
  request_method TEXT NOT NULL,
  request_url_no_query TEXT NOT NULL,
  request_url_no_query_hash BLOB NOT NULL,
  request_url_query TEXT NOT NULL,
  request_url_query_hash BLOB NOT NULL,
  request_referrer TEXT NOT NULL,
  request_headers_guard INTEGER NOT NULL,
  request_mode INTEGER NOT NULL,
  request_credentials INTEGER NOT NULL,
  request_contentpolicytype INTEGER NOT NULL,
  request_cache INTEGER NOT NULL,
  request_body_id TEXT NULL,
  response_type INTEGER NOT NULL,
  response_url TEXT NOT NULL,
  response_status INTEGER NOT NULL,
  response_status_text TEXT NOT NULL,
  response_headers_guard INTEGER NOT NULL,
  response_body_id TEXT NULL,
  response_security_info_id INTEGER NULL REFERENCES security_info(id),
  response_redirected INTEGER NOT NULL,
  response_redirected_url TEXT NOT NULL,
  cache_id INTEGER NOT NULL REFERENCES caches(id) ON DELETE CASCADE
)

The response_principal_info column was added in bug 1169044 without bumping the DB version, and I reviewed that.  Sorry, I screwed up.  :(
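
For anyone who wants to check a database for the same mismatch, here is a minimal sketch in Python. It assumes the schema version is stored in SQLite's user_version pragma (not confirmed anywhere in this bug), and it runs against an in-memory stand-in with a cut-down entries table rather than a real caches.sqlite. The helper name describe_entries_table is made up for illustration.

```python
import sqlite3


def describe_entries_table(conn):
    """Return (schema_version, column_names) for the entries table."""
    # Assumption: the schema version lives in PRAGMA user_version.
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in conn.execute("PRAGMA table_info(entries)")]
    return version, columns


# Stand-in for an old database: version 14, created before the column existed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    "id INTEGER NOT NULL PRIMARY KEY, "
    "response_url TEXT NOT NULL)"
)
conn.execute("PRAGMA user_version = 14")

version, columns = describe_entries_table(conn)
missing = "response_principal_info" not in columns

# Newer code that expects the column fails when the INSERT is compiled.
try:
    conn.execute(
        "INSERT INTO entries (response_url, response_principal_info) "
        "VALUES (?, ?)",
        ("https://medium.com/service-worker.js", ""),
    )
    insert_error = None
except sqlite3.OperationalError as exc:
    insert_error = str(exc)

print(version, missing, insert_error)
```

The point of the sketch is that the version number alone says the schema is current, while the column list says otherwise, so the version check never gets a chance to catch the stale table.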
Blocks: 1169044
Attachment #8631316 - Flags: review+
In theory we should do a migration, but let's go with the wipe version bump on this one since it's breaking people in the wild.
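
To make the two options concrete, here is a rough sketch in Python of a wipe on version mismatch next to an in-place migration. This is not the attached patch. EXPECTED_VERSION = 15, the TEXT type for the new column, the cut-down table, and the function names wipe_if_stale and migrate_if_stale are all assumptions for illustration, as is storing the version in PRAGMA user_version.

```python
import sqlite3

EXPECTED_VERSION = 15  # hypothetical: one higher than the version 14 reported above

# Cut-down schema; the column type for response_principal_info is a guess.
NEW_SCHEMA = """
CREATE TABLE entries (
  id INTEGER NOT NULL PRIMARY KEY,
  response_url TEXT NOT NULL,
  response_principal_info TEXT NOT NULL
)
"""


def schema_version(conn):
    return conn.execute("PRAGMA user_version").fetchone()[0]


def wipe_if_stale(conn):
    """Wipe approach: drop old data and recreate. Returns True if wiped."""
    if schema_version(conn) == EXPECTED_VERSION:
        return False
    conn.execute("DROP TABLE IF EXISTS entries")
    conn.execute(NEW_SCHEMA)
    conn.execute(f"PRAGMA user_version = {EXPECTED_VERSION}")
    return True


def migrate_if_stale(conn):
    """Migration approach: keep rows, add the column. Returns True if migrated."""
    if schema_version(conn) == EXPECTED_VERSION:
        return False
    # SQLite requires a non-NULL default when adding a NOT NULL column.
    conn.execute(
        "ALTER TABLE entries ADD COLUMN "
        "response_principal_info TEXT NOT NULL DEFAULT ''"
    )
    conn.execute(f"PRAGMA user_version = {EXPECTED_VERSION}")
    return True


def make_old_db():
    """Build a version 14 stand-in database holding one cached entry."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries ("
        "id INTEGER NOT NULL PRIMARY KEY, response_url TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO entries (response_url) VALUES (?)",
        ("https://medium.com/service-worker.js",),
    )
    conn.execute("PRAGMA user_version = 14")
    return conn


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
```

Calling wipe_if_stale(make_old_db()) leaves an empty entries table at the new version, while migrate_if_stale(make_old_db()) keeps the existing row. The wipe is simpler and loses only cached data that can be refetched, which fits the reasoning above for choosing it here.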
Comment on attachment 8631316 [details] [diff] [review]
Bump the caches.sqlite version numbers because of the field that was added in bug 1169044

Approval Request Comment
[Feature/regressing bug #]: Bug 1169044
[User impact if declined]: medium.com (and possibly other websites) won't load.
[Describe test coverage new/current, TreeHerder]: Locally.
[Risks and why]: This is very low risk.
[String/UUID change made/needed]: None.
Attachment #8631316 - Flags: approval-mozilla-aurora?
So does that mean we should be better at handling DB failures that might occur for whatever reason?
(In reply to Tim Taubert [:ttaubert] from comment #19)
> So does that mean we should be better at handling DB failures that might
> occur for whatever reason?

Yes.  Ehsan filed bug 1181887 for that.
https://hg.mozilla.org/mozilla-central/rev/012f4ec8e2a6
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Tim, Ehsan: Could you please confirm that the fix from the nightly build works for you? Once verified, I will approve for uplift. 

Adding a tracking flag for FF41.
Flags: needinfo?(ehsan)
This patch missed today's Nightly.  I will have to wait until tomorrow to test it.
medium.com started to work for me as soon as I updated to today's Nightly.
Status: RESOLVED → VERIFIED
Flags: needinfo?(ehsan)
I’m also getting consistent loading of medium.com.  That’s with SWs enabled, and with an SW for medium.com listed in about:serviceworkers.  This feels fixed!
Comment on attachment 8631316 [details] [diff] [review]
Bump the caches.sqlite version numbers because of the field that was added in bug 1169044

Thanks for the verification Ehsan. Approving for uplift to Aurora.
Attachment #8631316 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Flags: needinfo?(ttaubert)