Closed Bug 1281366 Opened 5 years ago Closed 5 years ago

Various websites stall forever when loading

Categories

(Core :: DOM: Core & HTML, defect)

49 Branch
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox48 --- unaffected
firefox49 + fixed
firefox50 + fixed

People

(Reporter: kael, Assigned: bzbarsky)

References

()

Details

(Keywords: regression)

Attachments

(4 files)

Various websites have begun stalling forever when loading in Firefox. The most consistent one is SRL, the website in the bug's URL field - speedrunslive.com. In broken builds of FF it never finishes loading - the favicon remains a spinner, the reload button is a stop button, and the page displays a 'LOADING' message. I've also seen this problem very frequently with Google Calendar, and as you might imagine that's kind of a pain.

Loading the same page in Edge or Chrome works, and it works in older builds of Firefox as well.

On SRL, at least, no active network requests appear in the Network panel in developer tools, so it's impossible to tell what operation is actually stalled. The status bar tooltip says it's waiting for cdn.speedrunslive.com, though.

Bisected SRL with mozregression, here's the output:

2016-06-21T19:40:30: INFO : Narrowed inbound regression window from [3ef68cc3, bc6334ae] (4 revisions) to [f66a5b9d, bc6334ae] (2 revisions) (~1 steps left)
2016-06-21T19:40:30: DEBUG : Starting merge handling...
2016-06-21T19:40:30: DEBUG : Using url: https://hg.mozilla.org/integration/mozilla-inbound/json-pushes?changeset=bc6334aea116516a4f639cf1fd0ead741968624b&full=1
2016-06-21T19:40:30: DEBUG : Found commit message:
Bug 1273255 - Don't try alternate URIs at the end of pageload if the channel was redirected: we don't want to be applying that sort of fixup to the redirect target URI.  r=smaug

2016-06-21T19:40:30: INFO : The bisection is done.
2016-06-21T19:40:30: INFO : Stopped


According to mozregression the last working regular build was 2016-05-23 and the first broken regular build was 2016-05-24. I'm not sure if I've been observing this problem that long, but I'm on Dev channel, so...
Does this reproduce if you use "www.speedrunslive.com" instead of the non-www-prefixed version?
Blocks: 1273255
Flags: needinfo?(kg)
(In reply to K. Gadd (:kael) from comment #0)
> On SRL, at least, no active network requests appear in the Network panel in
> developer tools, so it's impossible to tell what operation is actually
> stalled. The status bar tooltip says it's waiting for cdn.speedrunslive.com,
> though.

jsnajdr, any idea why whatever request it's waiting for isn't picked up by devtools?
Flags: needinfo?(jsnajdr)
It still happens if I type 'www.speedrunslive.com' into the address bar and press enter, yes
Flags: needinfo?(kg)
hmm, I don't see the alternate fixing from bug 1273255 being hit at all, either in my debugger or when I add printfs. I could be doing something wrong, of course - but I think mozregression is misleading us. The two last revisions from the output point to:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=f66a5b9d&tochange=bc6334ae

which doesn't just include bug 1273255 but also bug 1267989 and bug 1273661 and bug 909633.

Boris, any chance one of those broke this site instead? (I mean, I guess not the microdata removal, but the other ones...)

:wlach, can we make mozregression not output bug numbers when there's 1 push with multiple bugs (and so it's really not sure which one caused an issue) ?
Flags: needinfo?(wlachance)
Flags: needinfo?(bzbarsky)
(In reply to :Gijs Kruitbosch from comment #2)
> jsnajdr, any idea why whatever request it's waiting for isn't picked up by
> devtools?

It seems that we're waiting for something even before the request is initiated. I tried this:

- open the browser toolbox, go to netmonitor there
- go to the www.speedrunslive.com page
- the bug is now triggered, the favicon is a spinner, the reload button is a stop button
- click on the stop button to stop the page load

Now, the favicon appears and the browser toolbox netmonitor shows a new request for http://cdn.speedrunslive.com/images/favico.png, served from cache.

My debugging shows that this request is initiated (i.e., the channel.asyncOpen2 method is called and the the http-on-opening-request event is fired) only after I clicked the stop button, not before.
Flags: needinfo?(jsnajdr)
ni? myself to fire off Try pushes for the intermediate bugs in that range
Flags: needinfo?(ryanvm)
> Boris, any chance one of those broke this site instead? 

Absolutely.  Looking into it.
Flags: needinfo?(bzbarsky)
Flags: needinfo?(bzbarsky)
Regressed by bug 1267989.

If you want to try out any of the intermediate builds from that range:
https://treeherder.mozilla.org/#/jobs?repo=try&author=ryanvm@gmail.com&fromchange=df992ead14f2262ecc7d0e55bb9d9921c9ba3c6e&tochange=e083501859ec055ff4411ba7e5bdab78853caf02&group_state=expanded
Blocks: 1267989
No longer blocks: 1273255
Component: General → DOM
Flags: needinfo?(ryanvm)
Keywords: regression
Product: Firefox → Core
Flags: in-testsuite?
Attached file Minimalish testcase
Blocks: 1280091
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
That wpt times out without the patch, passes with.
Flags: needinfo?(bzbarsky)
Comment on attachment 8764421 [details] [diff] [review]
Ensure that when unblocking the scriptloader we try to process scripts if we have _any_ scripts to process, not just parser-blocking ones

Uh, such a messy code.
This should be safe, and needed now that we have aDocument->ScriptLoader()->RemoveExecuteBlocker(); call, but
I couldn't convince myself that there weren't bugs even before bug 1267989.
Attachment #8764421 - Flags: review?(bugs) → review+
Pushed by bzbarsky@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d41de66c5fdd
Ensure that when unblocking the scriptloader we try to process scripts if we have _any_ scripts to process, not just parser-blocking ones.  r=smaug
Comment on attachment 8764421 [details] [diff] [review]
Ensure that when unblocking the scriptloader we try to process scripts if we have _any_ scripts to process, not just parser-blocking ones

Approval Request Comment
[Feature/regressing bug #]: Bug 1267989
[User impact if declined]: Some sites will sometimes fail to run some of their
   scripts and hence will not work correctly.
[Describe test coverage new/current, TreeHerder]: Well, more now than we used to
   have.  Patch does include a test for this.
[Risks and why]: Risk is medium; I don't think the scriptloader logic goes below
   that on the dial.  That said, I think this was pretty straightforwardly
   buggy before this patch in that we'd fail post a runnable to run scripts
   when we had some pending scripts to run.  Now we check all the places that
   might be pending when deciding whether to post the runnable.
[String/UUID change made/needed]: None.
Attachment #8764421 - Flags: approval-mozilla-aurora?
Tracking this user visible regression for both 49 and 50.
https://hg.mozilla.org/mozilla-central/rev/d41de66c5fdd
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
(In reply to :Gijs Kruitbosch from comment #4)

> :wlach, can we make mozregression not output bug numbers when there's 1 push
> with multiple bugs (and so it's really not sure which one caused an issue) ?

Filed bug 1281790
Flags: needinfo?(wlachance)
So this appears to fix 1282163 which manifests on beta. Is there any chance we'd risk uplifting to beta? Or is it even possible?
(In reply to Eric Rahm [:erahm] from comment #18)
> So this appears to fix 1282163 which manifests on beta. Is there any chance
> we'd risk uplifting to beta? Or is it even possible?

I'm under the impression that would require uplifting at least bug 1267989 as well (and maybe others?) . bz is out this week and part of next week. Olli, what is your impression?
Flags: needinfo?(bugs)
well, bug 1267989 did have a regression, this bug. Though, one could also say that bug 1267989 just revealed this ancient bug.
Without regressions I'd be rather comfortable taking Bug 1267989 to beta ( https://bugzilla.mozilla.org/show_bug.cgi?id=1267989#c18 ), but since there was a regression...
Though, if the release calendar is right, the next merge day is 2016-08-01 so we do have some time to get feedback.

But would be really really nice to understand why bug 1282163 happen on beta and not on 47, so, regression range.

Overall, I think taking Bug 1267989 and bug 1281366 to beta isn't super risky. It is code which gets executed all the time, so we should catch issues rather soon. But we should land the patches only once we understand why bug 1282163 happens on 48 but not on 47.
Flags: needinfo?(bugs)
Fine with me to give this a try on beta (I am covering for Sylvestre for the next couple of weeks), though please don't think of merge day as the cutoff -- think of that as at least a week earlier, since the RC beta build and beta to release merge will happen early morning 7/25.
Flags: needinfo?(bugs)
Comment on attachment 8764421 [details] [diff] [review]
Ensure that when unblocking the scriptloader we try to process scripts if we have _any_ scripts to process, not just parser-blocking ones

May be a little risky but let's make sure this looks good on aurora before trying to bring these scriptloading changes to beta.
Attachment #8764421 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
This is a regression bug from bug 1267989, and that is the one we should consider (with this fix too).
Flags: needinfo?(bugs)
I'll check if the patch for both bugs apply to 49, and if not, upload such patches.
Er, wait, bug 1267989 is already in FF49.
The patch here applies cleanly to FF49.

(I'll prepare patches for FF48 too, for this one and bug 1267989. But I still think we need regression range for bug 1282163 before considering these patches for FF48.)
sylvestre i guess this need beta approval
Flags: needinfo?(sledru)
In comment #9, Ryan says that 48 is not affected?!
Flags: needinfo?(sledru)
bug 1267989 revealed this old bud, so this is definitely needed for aurora.
If we take bug 1267989 also to beta, then we should take also this.
See comment 26 and others.
To echo Olli's comments, I'm pretty opposed to uplifting bug 1267989 to beta just on a guess.  We should figure out what's really going on in bug 1282163 instead.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.