Closed Bug 1275917 Opened 3 years ago Closed 3 years ago

Recently, some websites don't finish loading and the throbber keeps spinning forever

Categories

(Core :: Networking, defect)


Tracking


RESOLVED FIXED
mozilla50
Tracking Status
firefox48 --- unaffected
firefox49 + fixed
firefox50 + fixed

People

(Reporter: mstange, Assigned: mcmanus)

References

Details

(Keywords: regression, Whiteboard: [necko-active])

Attachments

(1 file, 1 obsolete file)

This started happening for me a few days ago, on OS X Nightly: Sometimes, when I open a new tab and navigate to a page (by entering a URL or by Cmd-clicking a link), the tab stays blank and the throbber keeps spinning forever, until I give up and reload the page. I think the page title was displayed in the tab during that time, but I'm not 100% sure.

Unfortunately I don't have URLs. I'll grab them when I see the issue again. It has happened to me about 15 times over the last 5 days or so.
I've seen things like this too. Also, perhaps related, my gmail tabs don't ever finish loading.
I saw this on yesterday's nightly, but not (so far) today. This kind of thing can be tricky to get in front of the right people; it's generally a matter of what has been added/removed from the load group, which is something necko orchestrates but doesn't actually do the adding/removing for, so it could literally be anywhere.

If it comes back for me I'll try to capture it in rr.
I've hit it a couple of times in the last few days, including this morning. I will try to get logging.
I feel like I've had a couple of extra cases of pages not loading (which would then reload fine); that would cause the throbber issue if it were a small subresource that wasn't loading.
This needs an owner soon. Jason, please dispatch.
Assignee: nobody → jduell.mcbugs
Whiteboard: [necko-active]
I mostly get this on reddit.com, about once a day. I use uBlock.
[Tracking Requested - why for this release]: Pages don't finish loading, which is pretty obviously user-visible.
I see this quite often on https://stephendonner.wordpress.com/ - but not, strangely, wpengine.com-hosted WordPress sites, like https://blog.mozilla.org/webqa/ (perhaps due to different CDNs?)

In particular, on the sites where I've noticed it happening, it's often stuck at "Waiting for...s[x].wp.com" where x=1, 2, etc.

Nightly, both OS X and Windows 10 builds.
Tracking 50+ - as Ryan notes this is very visible to the end user.
Duplicate of this bug: 1279764
Did anyone try clearing the cache when this happens? (Use Control+Shift+Delete to get the dialog for that.)

I recently cleared all history (browsing, download, form, search), cache, active logins, and offline website data. Then I also deleted the profile folder in AppData\Local, leaving the one in AppData\Roaming. It seemed like the problem was happening more and more often before, and this helped. It hasn't happened yet after that cleanup.
Duplicate of this bug: 1274787
I've had situations where a page wouldn't load and the throbber on the tab would spin.  Clicking on my bookmark for the site a second time during the spinning throbber would usually get the page to load.  I don't know if my problem is the same as this bug report.

Because of Bug 1271944 I disabled the predictor.  Lo and behold I haven't had the problem I stated above since then.
Clicking on the bookmark a second time almost never helped me. Once a URL stopped loading properly, there were very few occasions when I got it to load eventually without restarting Firefox.
The problem happened again, and clearing only the cache via the Control+Shift+Delete interface made the URL work again. I did not restart Firefox or clear any other things. This is in 50.0a1 (2016-06-19) 64-bit in Windows 10 build 14367.
comment 15
Flags: needinfo?(honzab.moz)
Setting network.predictor.enable-prefetch to false in about:config has not helped me in 49.0a2 (2016-06-20) in 32-bit Windows 7. Reddit links get stuck, and clearing only the cache makes them work again.

I've seen cache clearing work enough times that I believe in it as a workaround. Cache cleaning has never failed me. It's not a very nice workaround because of impact on performance and bandwidth usage, and the need to repeat it when the problem comes up again.

I'm using my usual extensions. I went back to my regular profile and re-enabled them after seeing that disabling extensions, safe mode, and even a new profile didn't help.
We should land bug 1239686, at least on Nightly; it will be helpful for diagnostics in situations like this.
(In reply to Dragana Damjanovic [:dragana] from comment #18)
> We should land bug 1239686, at least on Nightly; it will be helpful for
> diagnostics in situations like this.

FWIW, I tried to get a Necko log on Nightly builds, but seem to be hitting bug 1272704 :-(
I have been seeing this problem for a while now, probably similar to the time this bug has been open.  However the problem almost exclusively happened when trying to open Google search result links, so I'd been blaming it on Google (and particularly their 'redirect' rewriting of the links) and/or my ISP's connection to Google.  I had gotten an addon to fix Google's redirect links to see if that would fix it, and it did reduce the number of such problems a great deal.

However over the last couple of weeks I've had another problem that, now that I look at it, may be related.  Open up the debugging area, go to the Network tab, and reload a page.  When I do that, almost all loaded assets from the page have a Connecting time (orange bar, Connecting in the Timings tab) ranging from 500 milliseconds to over 5 seconds.  The most common delay is between 1 and 2 seconds.  Total page load time tends to be on the order of 8-10 seconds instead of the expected 2-3 seconds (for relatively complex pages).  All other timing elements are either 0 ms (DNS resolution, Sending, Receiving) or a reasonably expected time (Waiting lasts for a time typically comparable to a ping result).  Also note that this is the case even when the vast majority of the page elements will return a 304 for already being in the cache.

Safe mode and such had no real effect.  Last week I did a full profile reset, and for a couple days the problem went away (possibly related to the suggestion of fully clearing the cache?).  But, eventually, it was back.  At the present time, reloading a page generally has the page sitting on the grey (unconnected) spinner for several seconds before it switches to the blue (connected) spinner, and even then, the time it takes for the page to load is excessively long compared to other browsers.

However, it does seem like there could be a connection between very slow connection times, and perpetual connection times that never finish loading at all.

Currently using the Aurora/Developer channel, v49.
Working through another bug I had opened, #1271944, which deals with how prefetch affects load times, got me to try turning off the prefetch mechanism. I found that turning prefetch off completely eliminates the slow Connecting times I described above. I've only been using this for a short time, and will note if it reverts to the old bad behavior, but it's possible there is a connection between the two issues.
I'm getting this heavily on 9gag - http://9gag.com/gag/aK3PB2g
I open it with a middle mouse click in a new tab, to let it load in the background.
It never completes loading. Also, the image is fuzzy/partially downloaded.

I had opened another bug just for the image loading (fuzzy/pixelated) issue, though - many sites' images are extremely slow to load - https://bugzilla.mozilla.org/show_bug.cgi?id=1268237
I think this is probably a dup of bug 1281366.  Can the affected people try tomorrow's nightly and see if that improves things?  (I mean, it's an intermittent thing, but still...)
(In reply to Nathan Froyd [:froydnj] from comment #23)
> I think this is probably a dup of bug 1281366.  Can the affected people try
> tomorrow's nightly and see if that improves things?  (I mean, it's an
> intermittent thing, but still...)

thanks nathan - that's a good pointer
(In reply to Patrick McManus [:mcmanus] from comment #24)
> (In reply to Nathan Froyd [:froydnj] from comment #23)
> > I think this is probably a dup of bug 1281366.  Can the affected people try
> > tomorrow's nightly and see if that improves things?  (I mean, it's an
> > intermittent thing, but still...)
> 
> thanks nathan - that's a good pointer

Dropping ni based on this.
Flags: needinfo?(honzab.moz)
Doesn't seem to be helping here.
The situation is much better here.
Reddit still gives me the permaspinner frequently with a trunk build from today. I've seen it on other sites intermittently as well.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #28)
> Reddit still gives me the permaspinner frequently with a trunk build from
> today. I've seen it on other sites intermittently as well.

If it is somehow reproducible for you, can you make an http log?
Thanks!
Flags: needinfo?(ryanvm)
I haven't been able to get an NSPR log no matter what I do :(
Flags: needinfo?(ryanvm)
Ryan, have you managed to get something from mozregression, as you suggested on IRC?
Flags: needinfo?(ryanvm)
See if you get any improvement if you...

1. toggle the following to false:

network.predictor.enable-prefetch
network.prefetch-next

2. toggle the following to true:
network.dns.disablePrefetch

3. Set the value of the following to 0:

network.predictor.prefetch-force-valid-for
By the way, be sure to do all steps and not just one or the other.
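For convenience, all four prefs above can be applied in one shot through user.js in a scratch profile instead of flipping them individually in about:config. This is only a sketch; the profile directory path is an assumption, so point it at the profile you actually test with:

```shell
# Sketch: apply the four prefetch-related prefs via user.js.
# PROFILE_DIR is an assumption -- substitute your own test profile path.
PROFILE_DIR="${PROFILE_DIR:-/tmp/ff-prefetch-test-profile}"
mkdir -p "$PROFILE_DIR"
cat >> "$PROFILE_DIR/user.js" <<'EOF'
user_pref("network.predictor.enable-prefetch", false);
user_pref("network.prefetch-next", false);
user_pref("network.dns.disablePrefetch", true);
user_pref("network.predictor.prefetch-force-valid-for", 0);
EOF
echo "Wrote prefs to $PROFILE_DIR/user.js"
```

Firefox reads user.js at startup and copies the values into prefs.js, so restart the browser after writing it.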
Seems like things have improved recently. I'm using Developer Edition in Windows 7 with network.predictor.enable-prefetch set to false and Nightly in Windows 10 Insider with network.predictor.enable-prefetch left at the default, true. On both, all I've seen recently is sites failing to load once, and then successfully loading when I try again. There haven't been any recent occasions when I had to clear the cache or restart Firefox to access a site.
I've implemented a limit for nspr log file size (rotate, bug 1244306).  If this still reproduces for anyone, at least occasionally, please:

- install a try build from [1] (Nightly capable of log size limiting)
- enable logging by setting following environment variables:

  MOZ_LOG=rotate:100,nsHttp:4,cache2:5,LoadGroup:5
  MOZ_LOG_FILE=/some/writable/directory/moz.log

  (or on Windows: MOZ_LOG_FILE=c:\some\writable\directory\moz.log)


When the problem reproduces, close firefox, and then zip and upload all moz.log.* files from your some/writable/directory (there should be files with .0, .1, .2, .3 extensions, each having up to 25MB)

Don't forget to note which URL was the troubling one (the page URL at least).

Thanks.

[1] http://archive.mozilla.org/pub/firefox/try-builds/honzab.moz@firemni.cz-27bc6057389372e2d844755c56b02b0c44d619e9/
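On Linux/OS X the setup above can be scripted along these lines. This is a sketch only; the log directory and the path to the try-build binary are assumptions:

```shell
# Sketch: rotated-logging environment for the try build described above.
# LOGDIR and the firefox binary path are assumptions -- adjust for your system.
LOGDIR="${LOGDIR:-$HOME/moz-logs}"
mkdir -p "$LOGDIR"
export MOZ_LOG="rotate:100,nsHttp:4,cache2:5,LoadGroup:5"
export MOZ_LOG_FILE="$LOGDIR/moz.log"
echo "Rotated logs will land in $LOGDIR as moz.log.0 .. moz.log.3"
# /path/to/try-build/firefox    # uncomment: reproduce the hang, quit, then zip moz.log.*
```

After reproducing, quit Firefox and zip every moz.log.* file in that directory, as requested above.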
Tracking 49+ for the same reason as in Comment 9.
I'm still seeing this. I'll start using the try build now.
(In reply to Markus Stange [:mstange] from comment #37)
> I'm still seeing this. I'll start using the try build now.

Markus, please use nsHttp:5 instead of 4.  Somebody screwed up the logging recently again, so that it doesn't log the important stuff.  You need all the "5" crap :)
Got logs for a problem on reddit.com, forwarding them to Patrick, since it is apparently a spdy problem.
(In reply to IU from comment #32)
> See if you get any improvement if you...
> 
> 1. toggle the following to false:
> 
> network.predictor.enable-prefetch
> network.prefetch-next
> 
> 2. toggle the following to true:
> network.dns.disablePrefetch
> 
> 3. Set the value of the following to 0:
> 
> network.predictor.prefetch-force-valid-for

I was seeing the problem with a gmail tab where it would never complete loading (the throbber kept spinning and File->Print reports the page is still loading). The firefox parent process also seems to be much busier and the OS reports it using much more CPU than normal. top (on OS X) shows the firefox process continually spiking in CPU usage until the gmail tab is closed.

With the above config changes and today's nightly, the gmail tab loads normally and CPU usage by the parent is considerably lower, per a 30-second visual check of top (on OS X).
Tracking this down via mozregression turned out to be more of a PITA than I'd originally thought. However, I did manage to end up at bug 865314, which looks like a plausible candidate given comment 39. So that's encouraging at least. It might also explain why some people have had better results by disabling the predictor?

Some notes:
1) Ctrl+F5 was needed to reproduce, and sometimes it took quite a few attempts before it finally would. AFAICT, just F5 didn't work for reproducing.
2) Bisection was done on Win10. I wasn't able to reproduce this on Linux. Not saying other OSes are unaffected, just that Win10 appears to be easier to reproduce it on.
3) The hangs generally appeared to be waiting on a random ad server or thumbs.redditmedia.com.
Flags: needinfo?(ryanvm) → needinfo?(mcmanus)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #41)
> Tracking this down via mozregression turned out to be more of a PITA than
> I'd originally thought. However, I did manage to end up at bug 865314, which
> looks like a plausible-looking candidate given comment 39. So that's
> encouraging at least. 

that's interesting - thanks. Nick is going to start looking at this while I'm afk this week. 865314 has surfaced latent/hard-to-otherwise-see bugs in the past so that's one possibility.

>Also might explain why some people have had better
> results by disabling the predictor?

plausible but hard to know at this stage..

> 
> Some notes:
> 1) Ctrl+F5 was needed to reproduce, and sometimes it took quite a few
> attempts before it finally would. AFAICT, just F5 didn't work for
> reproducing.

so that makes sense - ctrl-f5 creates new connections, while plain f5 does not.

> 2) Bisection was done on Win10. I wasn't able to reproduce this on Linux.
> Not saying other OSes are unaffected, just that Win10 appears to be easier
> to reproduce it on.
> 3) The hangs generally appeared to be waiting on a random ad server or
> thumbs.redditmedia.com.

thanks again.. any particular reddit url or just the homepage?
Flags: needinfo?(mcmanus)
Just the homepage.
Another sample, if needed.  Just created a temp profile for the try build and tried to load the page.

Log available here: https://drive.google.com/file/d/0B0A0wGYYRRdaWjJabjh5SVMwWTA/view?usp=sharing

Page load in log starts around line 99,000.

Site: https://codeblog.jonskeet.uk/

Loading on this page regularly stalls when trying to load one of two scripts from https://s2.wp.com/_static (+stuff).  Sometimes also stalls on a stylesheet at s1.wp.com; I didn't check when running the try build.  Entire page loads in Chrome in under a second.
Thanks to the folks who have provided logs.

From the log in comment 44:


2016-06-29 21:47:17.516000 UTC - [Socket Thread]: V/nsHttp nsHttpConnection::DontReuse 8da2020 spdysession=11e83000
2016-06-29 21:47:17.516000 UTC - [Socket Thread]: I/nsHttp Http2Session::DontReuse 11e83000
2016-06-29 21:47:17.516000 UTC - [Socket Thread]: I/nsHttp Http2Session::Close 11e83000 0
2016-06-29 21:47:17.516000 UTC - [Socket Thread]: I/nsHttp Http2Session::GenerateGoAway 11e83000 code=0
2016-06-29 21:47:17.516000 UTC - [Socket Thread]: V/nsHttp Http2Session::LogIO 11e83000 stream=0 id=0x0 [Generate GoAway]
2016-06-29 21:47:17.515000 UTC - [Socket Thread]: V/nsHttp ReportSpdyConnection 8a09880,s2.wp.com conn 8da2020 prefers 8a09100,s1.wp.com
2016-06-29 21:47:17.516000 UTC - [Socket Thread]: V/nsHttp ReportSpdyConnection graceful close of conn=8da2020 ent=8a09880 to migrate to preferred (desharding)
2016-06-29 21:47:17.516000 UTC - [Socket Thread]: V/nsHttp nsHttpConnection::StartSpdy moves single transaction ae85400 into SpdySession 11e83000

So that's a transaction being moved into a session that has already been marked goAway; that's not going to work. It also shouldn't happen :) Let's figure out why.
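For anyone scanning their own rotated logs for the same failure, a quick filter over the lifecycle lines quoted above narrows things down fast. A sketch; the moz.log.* file names assume the MOZ_LOG_FILE setup suggested in comment 36:

```shell
# Sketch: pull the connection-coalescing lifecycle lines out of rotated logs.
# The pattern strings come from the log excerpt above; moz.log.* naming is an
# assumption based on the MOZ_LOG_FILE setup suggested earlier in this bug.
grep -n -E 'GenerateGoAway|DontReuse|ReportSpdyConnection|StartSpdy moves single transaction' moz.log.* 2>/dev/null | head -40
```

A GoAway followed shortly by a "StartSpdy moves single transaction" on the same session id is the signature of the problem described here.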
Assignee: jduell.mcbugs → mcmanus
Attachment #8768059 - Flags: review?(hurley)
The analysis in comment 45 is not quite right.

The transaction in question is actually the first transaction for the h2 session. When h2 was discovered via ALPN, the connection found that it was no longer the preferred connection for this origin: in parallel, a new connection had been established that should do the coalescing. As a side effect it marked itself dontreuse, which in turn impacted the starting of the first transaction.

This is a long-standing bug in both h2 and spdy, but bug 865314 makes it far more obvious and frequent because it introduces new parallel TLS handshakes; it would be pretty hard to trip across otherwise.

The fix is just a late check: when AddStream is done, see whether the h2/spdy session has been marked dontreuse, and if so toss the transaction back to the connection manager (so it ends up on the preferred connection).
Blocks: 865314
Attachment #8768059 - Flags: review?(hurley) → review+
Thanks all - especially David - the testcase in comment 44 is pretty close to a minimal repro. Much appreciated.
Sorry, had to back this out for causing near-perma failures like https://treeherder.mozilla.org/logviewer.html#?job_id=31259526&repo=mozilla-inbound
Flags: needinfo?(mcmanus)
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ee56ec8064fb
Backed out changeset 7132b115ab91 for causing timeouts in test_notification_http2.js
Thanks :tomcat

The patch rescheduled streams on sessions marked goaway, but a pushed stream is pre-bound to a particular session, so rescheduling it just sends it back to the same place in a loop. In that case the goaway doesn't matter, because the stream was already initiated (by the server).

So the fix here is just to make the patch conditional on !pushed.
Flags: needinfo?(mcmanus)
Attachment #8768059 - Attachment is obsolete: true
Attachment #8768612 - Flags: review?(hurley) → review+
https://hg.mozilla.org/mozilla-central/rev/752282c3e152
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Comment on attachment 8768612 [details] [diff] [review]
1st transaction on dontresused h2 session

Approval Request Comment
[Feature/regressing bug #]: 865314 (actually a long-standing bug, but almost unnoticeable without 865314), which landed on 49
[User impact if declined]: occasional lost http transactions
[Describe test coverage new/current, TreeHerder]: well exercised by existing treeherder tests
[Risks and why]: low - very targeted patch
[String/UUID change made/needed]: none
Attachment #8768612 - Flags: approval-mozilla-aurora?
Just updated my local build and reddit is much happier today!
Comment on attachment 8768612 [details] [diff] [review]
1st transaction on dontresused h2 session

The reddit must flow. Please uplift to aurora!
Attachment #8768612 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+