Closed Bug 1370629 Opened 7 years ago Closed 11 months ago

Report of browser being unresponsive on www.lematin.ch and requiring force kill

Categories

(Core :: Networking, defect, P3)


RESOLVED INCOMPLETE

People

(Reporter: lmandel, Unassigned)

Details

(Whiteboard: [necko-triaged])

Reported on the ESR list with Firefox 52.1 and 45.9. I'm guessing at the component based on the number of network requests but cannot reproduce on OS X.

Using FF 45.9.0.6310 & Windows 7.
(1) Start FF
(2) goto this URL www.lematin.ch (newspaper in Geneva, Switzerland)
Results: Browser waits two to three minutes for the page to load (it does eventually load);
Results: Other browsers show no issue;
(4) Close browser
Browser hangs and requires a kill process.
(5) Restart Browser and Try different web sites, even ones that were ok before;
Results: Browser waits for web site for over two / three minutes;
(5) Kill Process, Restart browser and repeat step 2 and 5:
Same Result;
(6) Kill Process, Replace FF user profile and repeat step 2 and 5:
Same Result;
(7) Uninstall FF for users who were affected; Install FF 52.1.0.6316
Issue no longer reproducible

Link to mailing list thread https://mail.mozilla.org/private/enterprise/2017-May/007712.html
> (2) goto this URL www.lematin.ch (newspaper in Geneva, Switzerland)
[FYI] This loaded in 30 seconds for me on Firefox 55, with a fast network and computer, in e10s.

> (6) Kill Process, Replace FF user profile and repeat step 2 and 5:
> Same Result;

I take this to mean that replacing the profile and going back to www.lematin.ch still causes the hang, but that replacing the profile alone would stop the browser from hanging on other websites.

> (7) Uninstall FF for users who were affected; Install FF 52.1.0.6316
> Issue no longer reproducible

I'm wondering if it's still worth trying to fix this, since 45.9 is approaching EOL.
A lot of work has been done since then, and I doubt that there is a single feature that we can uplift to fix this.
This is fixed in esr52. Should we mark this as WONTFIX for 45?
Flags: needinfo?(mcmanus)
yes, wontfix 45 given we have more modern answers
Flags: needinfo?(mcmanus)
This is fixed in esr52, so marking this as WONTFIX.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Reopening as this is reported to still be an issue in ESR 52.2.1.

From the enterprise list today:

"I don't know about anyone else, but I am still having the slowness issue with Firefox, even after upgrading to the (unannounced) 52.2.1 release.  For this reason I am pushing Chrome to our users because we cannot keep having random page timeouts slowing us down like this.

I cannot find a way to recreate the issue, other than at least once per hour the whole browser stops loading pages and it takes anywhere from 30-60 seconds before I can use firefox again.  I can open new tabs and start trying to load pages, but none of them load until whatever is bottle-necking the browser finally releases."
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Flags: needinfo?(dd.mozilla)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #5)
> Reopening as this is reported to still be an issue in ESR 52.2.1.
> 
> From the enterprise list today:
> 
> "I don't know about anyone else, but I am still having the slowness issue
> with Firefox, even after upgrading to the (unannounced) 52.2.1 release.  For
> this reason I am pushing Chrome to our users because we cannot keep having
> random page timeouts slowing us down like this.
> 
> I cannot find a way to recreate the issue, other than at least once per hour
> the whole browser stops loading pages and it takes anywhere from 30-60
> seconds before I can use firefox again.  I can open new tabs and start
> trying to load pages, but none of them load until whatever is bottle-necking
> the browser finally releases."

Is it possible to get a http log from someone who can reproduce the issue?
Flags: needinfo?(dd.mozilla) → needinfo?(lmandel)
Or at least start logging when this occurs. We have an experimental logging feature in about:networking.
(In reply to Dragana Damjanovic [:dragana] from comment #6)
> Is it possible to get a http log from someone who can reproduce the issue?

I'm happy to post the request back to the enterprise list or you can do that yourself to interact with the people who have reported the problem there.

https://mail.mozilla.org/private/enterprise/2017-May/007712.html

Do you have docs that I can share that detail how to get an HTTP log or how to enable the experimental logging feature?
Flags: needinfo?(lmandel)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #8)
> (In reply to Dragana Damjanovic [:dragana] from comment #6)
> > Is it possible to get a http log from someone who can reproduce the issue?
> 
> I'm happy to post the request back to the enterprise list or you can do that
> yourself to interact with the people who have reported the problem there.
> 
> https://mail.mozilla.org/private/enterprise/2017-May/007712.html
> 
> Do you have docs that I can share that detail how to get an HTTP log or how
> to enable the experimental logging feature?

Yes, https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Using_aboutnetworking
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #8)
> (In reply to Dragana Damjanovic [:dragana] from comment #6)
> > Is it possible to get a http log from someone who can reproduce the issue?
> 
> I'm happy to post the request back to the enterprise list or you can do that
> yourself to interact with the people who have reported the problem there.
> 
> https://mail.mozilla.org/private/enterprise/2017-May/007712.html
> 
> Do you have docs that I can share that detail how to get an HTTP log or how
> to enable the experimental logging feature?

I do not have access. How can I get access?

So what I would like to get is:

A http log file:
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

The rotate:200 parameter limits the log file to 200MB; when it is full, new log entries overwrite the old ones. Users can adjust this parameter if they want. So when the problem happens, please close the browser as soon as possible so that the relevant part of the log is not overwritten.

HTTP logging will slow Firefox down, which may be annoying.

The other option is to use about:networking:
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Using_aboutnetworking
They can start logging as soon as they notice the problem and stop it when the problem goes away.
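For reference, here is a minimal sketch of the environment-variable setup described on the MDN page above. The module list and log file path are illustrative, not prescribed by this bug; adjust them as needed.

```shell
# Illustrative MOZ_LOG setup for HTTP logging (module names per the MDN
# HTTP_logging page; the file path below is just an example).
# rotate:200 caps the log at roughly 200 MB of rotating chunks.
MOZ_LOG="timestamp,rotate:200,nsHttp:5,nsSocketTransport:5,nsHostResolver:5"
MOZ_LOG_FILE="/tmp/firefox-http.log"
export MOZ_LOG MOZ_LOG_FILE
# Start Firefox from this same shell so it inherits the variables:
# firefox
echo "MOZ_LOG=$MOZ_LOG"
```

On Windows the same variables can be set with `set MOZ_LOG=...` in a command prompt before launching firefox.exe.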
Hi Lawrence,
Could you offer the http log for clarifying this bug?
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging#Using_aboutnetworking
Thanks!
Flags: needinfo?(lmandel)
I followed up with a request to the enterprise list today.
Flags: needinfo?(lmandel)
I have posted logs from one of our ESR users on Google Drive. Note that access is restricted to those who need access to fix this bug.

https://drive.google.com/drive/folders/0B-n79IWvmJYYbEdldXBjS08wY2c?usp=sharing
ni Dragana to take a look at the log.
Flags: needinfo?(dd.mozilla)
The log starts at 17:20:30 and ends at about 17:24:07 (shutdown)

In this log I do not see any stalls in necko; data are read and written to the network normally. There are short periods where no data are read or written, but for no more than a second or two, which is normal.

I assume this log captured a stall of 30-60 seconds. If so, that stall is not in necko, but I cannot say where it is.

Sorry for this delay.

Has the user noted which URIs they tried to load?
Flags: needinfo?(dd.mozilla) → needinfo?(lmandel)
I have shared the responses from comment 15 with the user and asked if they can share the URIs.
Response from the user "It should have been our internal wiki, facebook, google, and perhaps twitter and sbnation." The user also noted that they haven't been using Firefox recently due to this pageload issue and that 52.3 still exhibits the same behavior, although it has somewhat faster load times (45 seconds vs. 60 seconds).
Flags: needinfo?(lmandel)
So, I think first of all, we need the users to determine if this is also happening in Firefox release and Nightly.
The fact that I'm not able (at all) to reproduce it might mean that we've already fixed it via some other bug. If that is the case, I think the best way to deliver a fix to ESR is for the user to use mozregression to identify the patch(es) that need to be backported.
Priority: -- → P3
Whiteboard: [necko-triaged]
This seems to be a nsChannelClassifier problem.
We start a channel at 2017-09-19 15:31:27.840000
The channel is suspended by nsChannelClassifier at:
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::Starting nsChannelClassifier 16e5d0a0 [this=1b983800]
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::SuspendInternal [this=1b983800]

It is resumed a minute later at:
2017-09-19 15:32:29.812000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::ResumeInternal [this=1b983800]

At the same time 14 other channels are resumed as well.

Francois, can you take a look? Do you know what has changed that could cause this problem?


Relevant part of the log:

1) a channel is created
2017-09-19 15:31:27.840000 UTC - [Main Thread]: V/nsHttp Creating HttpBaseChannel @1b983800
2017-09-19 15:31:27.840000 UTC - [Main Thread]: D/nsHttp Creating nsHttpChannel [this=1b983800]
2017-09-19 15:31:27.840000 UTC - [Main Thread]: V/nsHttp HttpBaseChannel::Init [this=1b983800]
...
2017-09-19 15:31:27.843000 UTC - [Main Thread]: D/nsHttp nsHttpChannel 1b983800 Using default connection info
2017-09-19 15:31:27.843000 UTC - [Main Thread]: V/nsHttp Init nsHttpConnectionInfo @84b8eb0
...
2017-09-19 15:31:27.843000 UTC - [Main Thread]: D/nsHttp nsHttpHandler::NotifyObservers [chan=1b983848 event="http-on-modify-request"]
2017-09-19 15:31:27.846000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::SuspendInternal [this=1b983800]
2017-09-19 15:31:27.846000 UTC - [Main Thread]: D/nsHttp nsHttpHandler::NotifyObservers [chan=1b983848 event="http-on-useragent-request"]
2017-09-19 15:31:27.846000 UTC - [Main Thread]: D/nsHttp Waiting until resume BeginConnect [this=1b983800]
2017-09-19 15:31:27.875000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::ResumeInternal [this=1b983800]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::HandleBeginConnectContinue [this=1b983800]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::BeginConnect [this=1b983800] prefetching
...
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::ContinueBeginConnectWithResult [this=1b983800]
...
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::Connect [this=1b983800]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: V/nsHttp HttpBaseChannel::SetRequestHeader [this=1b983800 header="Upgrade-Insecure-Requests" value="1" merge=0]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: V/nsHttp nsHttpConnectionMgr::SpeculativeConnect [ci=.......sbnation.com:80]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp NulHttpTransaction::NullHttpTransaction() mActivityDistributor is active [this=13a882e0, sbnation.com]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsSocketTransport STS dispatch [1ba4f380]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsSocketTransport PollableEvent::Signal
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsSocketTransport PollableEvent::Signal PR_Write 1
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::OpenCacheEntry [this=1b983800]
2017-09-19 15:31:27.881000 UTC - [Socket Thread]: D/nsSocketTransport     ...returned after 3309 milliseconds
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::OnCacheEntryAvailable [this=1b983800 entry=1a330fc0 new=1 appcache=0 status=0 mAppCache=0 mAppCacheForWrite=0]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::SetupTransaction [this=1b983800]
2017-09-19 15:31:27.881000 UTC - [Socket Thread]: D/nsSocketTransport PollableEvent::Clear
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp Creating nsHttpTransaction @c6a6800
2017-09-19 15:31:27.881000 UTC - [Socket Thread]: D/nsSocketTransport PollableEvent::Signal PR_Read 1
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpChannel 1b983800 created nsHttpTransaction c6a6800
...
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsHttp nsHttpTransaction c6a6800 SetRequestContext 267f67f0
2017-09-19 15:31:27.881000 UTC - [Main Thread]: V/nsHttp nsHttpConnectionMgr::AddTransaction [trans=c6a6800 -10]
2017-09-19 15:31:27.881000 UTC - [Main Thread]: D/nsSocketTransport STS dispatch [1c58fbc0]
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsSocketTransport PollableEvent::Signal
...
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsSocketTransport PollableEvent::Signal PR_Write 1
2017-09-19 15:31:27.882000 UTC - [Socket Thread]: D/nsSocketTransport PollableEvent::Signal
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::ContinueBeginConnectWithResult result [this=1b983800 rv=0 mCanceled=0]
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::Starting nsChannelClassifier 16e5d0a0 [this=1b983800]
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::SuspendInternal [this=1b983800]
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsStreamPump nsInputStreamPump::Suspend [this=1a1a6aa0]

In the lines above, the channel is suspended.

And resume comes at line:
2017-09-19 15:32:29.812000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::ResumeInternal [this=1b983800]
2017-09-19 15:32:29.812000 UTC - [Main Thread]: D/nsStreamPump nsInputStreamPump::Resume [this=1a1a6aa0]
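As an illustrative sketch (not part of the original analysis), the suspend/resume gap for channel 1b983800 can be computed from the two quoted lines with a little awk. The log file path is hypothetical, and same-day timestamps are assumed.

```shell
# Recreate the two quoted log lines, then compute the stall duration in seconds
# by pairing SuspendInternal/ResumeInternal timestamps for this=1b983800.
cat > /tmp/necko.log <<'EOF'
2017-09-19 15:31:27.882000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::SuspendInternal [this=1b983800]
2017-09-19 15:32:29.812000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::ResumeInternal [this=1b983800]
EOF
awk '/SuspendInternal \[this=1b983800\]/ { split($2, t, ":"); s = t[1]*3600 + t[2]*60 + t[3] }
     /ResumeInternal \[this=1b983800\]/  { split($2, t, ":"); print (t[1]*3600 + t[2]*60 + t[3]) - s }' /tmp/necko.log
# prints 61.93
```

That matches the roughly one-minute stall the user reported.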
Flags: needinfo?(francois)
The channel classifier has changed a lot since 52. Thanks to Kershaw for most of these optimizations and fixes.

It would be good to know if that's still a problem on 57+. I'm not sure there's a lot of value in debugging this on 52.
Flags: needinfo?(francois)
(In reply to François Marier [:francois] from comment #20)
> The channel classifier has changed a lot since 52. Thanks mostly to Kershaw
> for most of these optimizations and fixes.
> 
> It would be good to know if that's still a problem on 57+. I'm not sure
> there's a lot of value in debugging this on 52.

Is there any work around for this problem? Turning off a part of the channel classifier?
Flags: needinfo?(francois)
(In reply to Dragana Damjanovic [:dragana] from comment #21)
> Is there any work around for this problem? Turning off a part of the channel
> classifier?

privacy.trackingprotection.enabled = false
privacy.trackingprotection.pbmode.enabled = false
privacy.trackingprotection.annotate_channels = false

If turning off the above eliminates the problem then it is indeed a problem with the channel classifier.
Flags: needinfo?(francois)
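For anyone applying these prefs across a fleet, here is a sketch of dropping them into a profile's user.js (a standard Firefox mechanism). The profile directory below is a placeholder; the real path is shown in about:support.

```shell
# Placeholder profile directory; substitute the real one from about:support.
PROFILE_DIR="${PROFILE_DIR:-/tmp/example-profile}"
mkdir -p "$PROFILE_DIR"
# user.js prefs are applied on every startup, overriding prefs.js.
cat >> "$PROFILE_DIR/user.js" <<'EOF'
user_pref("privacy.trackingprotection.enabled", false);
user_pref("privacy.trackingprotection.pbmode.enabled", false);
user_pref("privacy.trackingprotection.annotate_channels", false);
EOF
```

If the stall disappears with these prefs, that would confirm the channel classifier as the culprit.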
Severity: normal → S3

This bug is pretty old. To summarize the comments above: there is no confirmation from anyone who can replicate that the channel classifier is the issue, nor whether it is even possible to replicate on 57+ anymore.

Please re-open if we can get some kind of confirmation.

Status: REOPENED → RESOLVED
Closed: 7 years ago11 months ago
Resolution: --- → INCOMPLETE