Closed Bug 1370629 Opened 7 years ago Closed 11 months ago

Report of browser being unresponsive on www.lematin.ch and requiring force kill

Categories

(Core :: Networking, defect, P3)


RESOLVED INCOMPLETE

People

(Reporter: lmandel, Unassigned)

Details

(Whiteboard: [necko-triaged])

Reported on the ESR list with Firefox 52.1 and 45.9. I'm guessing at the component based on the number of network requests but cannot reproduce on OS X.

Using FF 45.9.0.6310 & Windows 7.
(1) Start FF
(2) goto this URL www.lematin.ch (newspaper in Geneva, Switzerland)
Results: Browser waits two to three minutes for the page to load (it does eventually load);
Results: Other browsers show no issue;
(4) Close browser
Browser hangs and requires a kill process.
(5) Restart Browser and Try different web sites, even ones that were ok before;
Results: Browser waits for web site for over two / three minutes;
(5) Kill Process, Restart browser and repeat step 2 and 5:
Same Result;
(6) Kill Process, Replace FF user profile and repeat step 2 and 5:
Same Result;
(7) Uninstall FF for users who were affected; Install FF 52.1.0.6316
Issue no longer reproducible

Link to mailing list thread https://mail.mozilla.org/private/enterprise/2017-May/007712.html
> (2) goto this URL www.lematin.ch (newspaper in Geneva, Switzerland)
[FYI] This loaded in 30 seconds for me on Firefox 55, with a fast network and computer, in e10s.

> (6) Kill Process, Replace FF user profile and repeat step 2 and 5:
> Same Result;

I take this to mean that replacing the profile and going back to www.lematin.ch still causes the hang, but that replacing the profile alone would stop the browser from hanging on other websites.

> (7) Uninstall FF for users who were affected; Install FF 52.1.0.6316
> Issue no longer reproducible

I'm wondering if it's still worth trying to fix this, since 45.9 is approaching EOL.
A lot of work has been done since then, and I doubt that there is a single feature that we can uplift to fix this.
This is fixed in esr52. Should we mark this as WONTFIX for 45?
Flags: needinfo?(mcmanus)
yes, wontfix 45 given we have more modern answers
Flags: needinfo?(mcmanus)
This is fixed in esr52, so marking this as WONTFIX.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Reopening as this is reported to still be an issue in ESR 52.2.1.

From the enterprise list today:

"I don't know about anyone else, but I am still having the slowness issue with Firefox, even after upgrading to the (unannounced) 52.2.1 release.  For this reason I am pushing Chrome to our users because we cannot keep having random page timeouts slowing us down like this.

I cannot find a way to recreate the issue, other than at least once per hour the whole browser stops loading pages and it takes anywhere from 30-60 seconds before I can use firefox again.  I can open new tabs and start trying to load pages, but none of them load until whatever is bottle-necking the browser finally releases."
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Flags: needinfo?(dd.mozilla)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #5)
> Reopening as this is reported to still be an issue in ESR 52.2.1.
> 
> From the enterprise list today:
> 
> "I don't know about anyone else, but I am still having the slowness issue
> with Firefox, even after upgrading to the (unannounced) 52.2.1 release.  For
> this reason I am pushing Chrome to our users because we cannot keep having
> random page timeouts slowing us down like this.
> 
> I cannot find a way to recreate the issue, other than at least once per hour
> the whole browser stops loading pages and it takes anywhere from 30-60
> seconds before I can use firefox again.  I can open new tabs and start
> trying to load pages, but none of them load until whatever is bottle-necking
> the browser finally releases."

Is it possible to get a http log from someone who can reproduce the issue?
Flags: needinfo?(dd.mozilla) → needinfo?(lmandel)
Or at least start logging when this occurs. We have an experimental logging feature in about:networking.
(In reply to Dragana Damjanovic [:dragana] from comment #6)
> Is it possible to get a http log from someone who can reproduce the issue?

I'm happy to post the request back to the enterprise list or you can do that yourself to interact with the people who have reported the problem there.

https://mail.mozilla.org/private/enterprise/2017-May/007712.html

Do you have docs that I can share that detail how to get an HTTP log or how to enable the experimental logging feature?
Flags: needinfo?(lmandel)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #8)
> (In reply to Dragana Damjanovic [:dragana] from comment #6)
> > Is it possible to get a http log from someone who can reproduce the issue?
> 
> I'm happy to post the request back to the enterprise list or you can do that
> yourself to interact with the people who have reported the problem there.
> 
> https://mail.mozilla.org/private/enterprise/2017-May/007712.html
> 
> Do you have docs that I can share that detail how to get an HTTP log or how
> to enable the experimental logging feature?

Yes, https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Using_aboutnetworking
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #8)
> (In reply to Dragana Damjanovic [:dragana] from comment #6)
> > Is it possible to get a http log from someone who can reproduce the issue?
> 
> I'm happy to post the request back to the enterprise list or you can do that
> yourself to interact with the people who have reported the problem there.
> 
> https://mail.mozilla.org/private/enterprise/2017-May/007712.html
> 
> Do you have docs that I can share that detail how to get an HTTP log or how
> to enable the experimental logging feature?

I do not have access. How can I get access?

So what I would like to get is:

A http log file:
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

The rotate:200 parameter limits the log file to 200MB; when it is full, new log entries overwrite the old ones. Users can adjust this parameter if they want. So when the problem happens, please close the browser as soon as possible so that the relevant part of the log is not overwritten.

HTTP logging will slow Firefox down, which may be annoying.

The other option is to use about:networking:
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Using_aboutnetworking
They can start logging as soon as they notice the problem and stop it when the problem goes away.
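For reference, here is a minimal sketch of the environment-variable setup described on the MDN page above. The module list and log file path are illustrative, not prescribed by this bug; adjust them as needed.

```shell
# Illustrative MOZ_LOG setup for HTTP logging (module names per the MDN
# HTTP_logging page; the file path below is just an example).
# rotate:200 caps the log at roughly 200 MB of rotating chunks.
MOZ_LOG="timestamp,rotate:200,nsHttp:5,nsSocketTransport:5,nsHostResolver:5"
MOZ_LOG_FILE="/tmp/firefox-http.log"
export MOZ_LOG MOZ_LOG_FILE
# Start Firefox from this same shell so it inherits the variables:
# firefox
echo "MOZ_LOG=$MOZ_LOG"
```

On Windows the same variables can be set with `set MOZ_LOG=...` in a command prompt before launching firefox.exe.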
Hi Lawrence,
Could you offer the http log for clarifying this bug?
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Using_aboutnetworking
Thanks!
Flags: needinfo?(lmandel)
I followed up with a request to the enterprise list today.
Flags: needinfo?(lmandel)
I have posted logs from one of our ESR users on Google Drive. Note that access is restricted to those who need access to fix this bug.

https://drive.google.com/drive/folders/0B-n79IWvmJYYbEdldXBjS08wY2c?usp=sharing
ni Dragana to take a look at the log.
Flags: needinfo?(dd.mozilla)
The log starts at 17:20:30 and ends at about 17:24:07 (shutdown)

In this log I do not see any stalls in necko; data are read and written to the network normally. There are short periods where no data are read or written, but for no more than a second or two, which is normal.

I assume this log captured a stall of 30-60 seconds. If so, that stall is not in necko, but I cannot say where it is.

Sorry for this delay.

Has the user noted which URIs they tried to load?
Flags: needinfo?(dd.mozilla) → needinfo?(lmandel)
I have shared the responses from comment 15 with the user and asked if they can share the URIs.
Response from the user "It should have been our internal wiki, facebook, google, and perhaps twitter and sbnation." The user also noted that they haven't been using Firefox recently due to this pageload issue and that 52.3 still exhibits the same behavior, although it has somewhat faster load times (45 seconds vs. 60 seconds).
Flags: needinfo?(lmandel)
So, I think first of all, we need the users to determine if this is also happening in Firefox release and Nightly.
The fact that I'm not able (at all) to reproduce it might mean that we've already fixed it via some other bug. If that is the case, I think the best way to deliver a fix to ESR is for the user to use mozregression to identify the patch(es) that need to be backported.
Priority: -- → P3
Whiteboard: [necko-triaged]
This seems to be a nsChannelClassifier problem.
We start a channel at 2017-09-19 15:31:27.840000
The channel is suspended by nsChannelClassifier at:
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::Starting nsChannelClassifier 16e5d0a0 [this=1b983800]
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::SuspendInternal [this=1b983800]

It is resumed a minute later at:
2017-09-19 15:32:29.812000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::ResumeInternal [this=1b983800]

At the same time 14 other channels are resumed as well.

Francois, can you take a look? Do you know what has changed that could cause this problem?


Relevant part of the log:

1) a channel is created
2017-09-19 15:31:27.840000 UTC - [Main Thread]: V/nsHttp Creating HttpBaseChannel @1b983800
2017-09-19 15:31:27.840000 UTC - [Main Thread]: D/nsHttp Creating nsHttpChannel [this=1b983800]
2017-09-19 15:31:27.840000 UTC - [Main Thread]: V/nsHttp HttpBaseChannel::Init [this=1b983800]
...
2017-09-19 15:31:27.843000 UTC - [Main Thread]: D/nsHttp nsHttpChannel 1b983800 Using default connection info
2017-09-19 15:31:27.843000 UTC - [Main Thread]: V/nsHttp Init nsHttpConnectionInfo @84b8eb0
...
2017-09-19 15:31:27.843000 UTC - [Main Thread]: D/nsHttp nsHttpHandler::NotifyObservers [chan=1b983848 event="http-on-modify-request"]
2017-09-19 15:31:27.846000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::SuspendInternal [this=1b983800]
2017-09-19 15:31:27.846000 UTC - [Main Thread]: D/nsHttp nsHttpHandler::NotifyObservers [chan=1b983848 event="http-on-useragent-request"]
2017-09-19 15:31:27.846000 UTC - [Main Thread]: D/nsHttp Waiting until resume BeginConnect [this=1b983800]
2017-09-19 15:31:27.875000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::ResumeInternal [this=1b983800]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::HandleBeginConnectContinue [this=1b983800]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::BeginConnect [this=1b983800] prefetching
...
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::ContinueBeginConnectWithResult [this=1b983800]
...
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::Connect [this=1b983800]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: V/nsHttp HttpBaseChannel::SetRequestHeader [this=1b983800 header="Upgrade-Insecure-Requests" value="1" merge=0]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: V/nsHttp nsHttpConnectionMgr::SpeculativeConnect [ci=.......sbnation.com:80]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp NulHttpTransaction::NullHttpTransaction() mActivityDistributor is active [this=13a882e0, sbnation.com]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsSocketTransport STS dispatch [1ba4f380]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsSocketTransport PollableEvent::Signal
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsSocketTransport PollableEvent::Signal PR_Write 1
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::OpenCacheEntry [this=1b983800]
2017-09-19 15:31:27.881000 UTC - [Socket Thread]: D/nsSocketTransport     ...returned after 3309 milliseconds
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::OnCacheEntryAvailable [this=1b983800 entry=1a330fc0 new=1 appcache=0 status=0 mAppCache=0 mAppCacheForWrite=0]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::SetupTransaction [this=1b983800]
2017-09-19 15:31:27.881000 UTC - [Socket Thread]: D/nsSocketTransport PollableEvent::Clear
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp Creating nsHttpTransaction @c6a6800
2017-09-19 15:31:27.881000 UTC - [Socket Thread]: D/nsSocketTransport PollableEvent::Signal PR_Read 1
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel 1b983800 created nsHttpTransaction c6a6800
...
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpTransaction c6a6800 SetRequestContext 267f67f0
2017-09-19 15:31:27.881000 UTC - [Main Thread]: V/nsHttp nsHttpConnectionMgr::AddTransaction [trans=c6a6800 -10]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsSocketTransport STS dispatch [1c58fbc0]
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsSocketTransport PollableEvent::Signal
...
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsSocketTransport PollableEvent::Signal PR_Write 1
2017-09-19 15:31:27.882000 UTC - [Socket Thread]: D/nsSocketTransport PollableEvent::Signal
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::ContinueBeginConnectWithResult result [this=1b983800 rv=0 mCanceled=0]
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::Starting nsChannelClassifier 16e5d0a0 [this=1b983800]
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::SuspendInternal [this=1b983800]
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsStreamPump nsInputStreamPump::Suspend [this=1a1a6aa0]

In the lines above, the channel is suspended.

And resume comes at line:
2017-09-19 15:32:29.812000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::ResumeInternal [this=1b983800]
2017-09-19 15:32:29.812000 UTC - [Main Thread]: D/nsStreamPump nsInputStreamPump::Resume [this=1a1a6aa0]
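As an illustrative sketch (not part of the original analysis), the suspend/resume gap for channel 1b983800 can be computed from the two quoted lines with a little awk. The log file path is hypothetical, and same-day timestamps are assumed.

```shell
# Recreate the two quoted log lines, then compute the stall duration in seconds
# by pairing SuspendInternal/ResumeInternal timestamps for this=1b983800.
cat > /tmp/necko.log <<'EOF'
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::SuspendInternal [this=1b983800]
2017-09-19 15:32:29.812000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::ResumeInternal [this=1b983800]
EOF
awk '/SuspendInternal \[this=1b983800\]/ { split($2, t, ":"); s = t[1]*3600 + t[2]*60 + t[3] }
     /ResumeInternal \[this=1b983800\]/  { split($2, t, ":"); print (t[1]*3600 + t[2]*60 + t[3]) - s }' /tmp/necko.log
# prints 61.93
```

That matches the roughly one-minute stall the user reported.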
Flags: needinfo?(francois)
The channel classifier has changed a lot since 52. Thanks to Kershaw for most of these optimizations and fixes.

It would be good to know if that's still a problem on 57+. I'm not sure there's a lot of value in debugging this on 52.
Flags: needinfo?(francois)
(In reply to François Marier [:francois] from comment #20)
> The channel classifier has changed a lot since 52. Thanks mostly to Kershaw
> for most of these optimizations and fixes.
> 
> It would be good to know if that's still a problem on 57+. I'm not sure
> there's a lot of value in debugging this on 52.

Is there any work around for this problem? Turning off a part of the channel classifier?
Flags: needinfo?(francois)
(In reply to Dragana Damjanovic [:dragana] from comment #21)
> Is there any work around for this problem? Turning off a part of the channel
> classifier?

privacy.trackingprotection.enabled = false
privacy.trackingprotection.pbmode.enabled = false
privacy.trackingprotection.annotate_channels = false

If turning off the above eliminates the problem then it is indeed a problem with the channel classifier.
Flags: needinfo?(francois)
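For anyone applying these prefs across a fleet, here is a sketch of dropping them into a profile's user.js (a standard Firefox mechanism). The profile directory below is a placeholder; the real path is shown in about:support.

```shell
# Placeholder profile directory; substitute the real one from about:support.
PROFILE_DIR="${PROFILE_DIR:-/tmp/example-profile}"
mkdir -p "$PROFILE_DIR"
# user.js prefs are applied on every startup, overriding prefs.js.
cat >> "$PROFILE_DIR/user.js" <<'EOF'
user_pref("privacy.trackingprotection.enabled", false);
user_pref("privacy.trackingprotection.pbmode.enabled", false);
user_pref("privacy.trackingprotection.annotate_channels", false);
EOF
```

If the stall disappears with these prefs, that would confirm the channel classifier as the culprit.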
Severity: normal → S3

This bug is pretty old. To summarize the comments above: there is no confirmation from anyone who can replicate that the channel classifier is the issue, nor whether it is even possible to replicate on 57+ anymore.

Please re-open if we can get some kind of confirmation.

Status: REOPENED → RESOLVED
Closed: 7 years ago11 months ago
Resolution: --- → INCOMPLETE