Closed Bug 1337826 Opened 7 years ago Closed 7 years ago

V52 beta 4 under Windows 7 crashes once per hour, was once a week before

Categories

(Core :: Networking: HTTP, defect)

52 Branch
x86_64
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox52 --- fixed
firefox-esr52 --- fixed
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: frank, Assigned: dragana)

References

Details

(Keywords: crash, Whiteboard: [necko-active])

Crash Data

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Build ID: 20170206101855

Steps to reproduce:

No clue, sometimes it crashes when I click a link, sometimes it crashes when in background.


Actual results:

Firefox closes hard, shows "send problem" form. 


Expected results:

Firefox should continue to run.
https://developer.mozilla.org/en-US/docs/How_to_get_a_stacktrace_for_a_bug_report
Severity: normal → critical
Flags: needinfo?(frank)
Keywords: crash
1. I have sent plenty of crash reports. Didn't this help?
2. I am in the wrong environment tonight, if you need more, tell me and I see I can send it tomorrow from the affected machine (I do not use beta on the other machines).
3. Some today crashes also exhibited a "plugin container" popup. Removing Google Hangouts from the permanent tabs seems to make Firefox slightly more stable (2h till crash).
You must provide a crash ID to get the crash signature.
Ah, maybe for historical reasons: Things where much more stable before, below find all previous crashes. Setup and usage essentially didn't change much lately.

bp-e6f5f431-2a66-4ad6-9e49-da8ff2170203
	3.2.17	14:47
bp-5735a18f-b963-4317-b31f-7cdcd2170131
	31.1.17	13:52
bp-35aa79f1-7c86-46d2-a0bd-1def42170127
	27.1.17	16:38
bp-3bd37b6e-880a-4d34-a623-a73072170127
	27.1.17	16:37
bp-d2313819-c8a9-467b-9b85-6702a2170127
	27.1.17	16:13
bp-bc1529d5-686e-4030-9780-150c22170127
	27.1.17	16:13
bp-e99d58c9-d9db-47e2-a097-c29a72161213
	13.12.16	15:42
bp-07a8aa7f-5dd5-499b-bc5f-b08c32161117
	17.11.16	09:52
bp-db212784-fcad-4562-8ea7-440aa2160901
	1.9.16	09:00
bp-ed8c48ec-dc46-4363-925e-4df6c2160628
	28.6.16	09:41
Crash Signature: [@ mozilla::net::nsHttpConnection::DontReuse ]
Component: Untriaged → Networking: HTTP
OS: Unspecified → Windows 7
Product: Firefox → Core
Hardware: Unspecified → x86_64
Hi, thank you for the report.

May I ask you to make a log and send it to me. The instructions are:

https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

can you change the line:
set MOZ_LOG=timestamp,rotate:200,nsHttp:5,nsSocketTransport:5,nsStreamPump:5,nsHostResolver:5

into:
set MOZ_LOG=timestamp,rotate:200,nsHttp:5,nsSocketTransport:5,nsStreamPump:5

rotate:200 means that the log file will not exceed 200MB and then it will start to over write the earliest log lines. If you have space to make a larger file that would be good too. The file is a txt and it can be zipped well. Please attach the log or send it to me via e-mail.

Thanks a lot.
Hi, 

I received beta 5 today, thought it might help. However I crashed it very soon again. 

I hope a b5 log will suffice? 

I'll go and check how to do a log, however there might be privacy concerns about logs. 

How much log do you need? 
Till a crash?

Hope I can help, its Friday night here today, might not get to it soon I am afraid.
(In reply to Frank Nestel from comment #7)
> Hi, 
> 
> I received beta 5 today, thought it might help. However I crashed it very
> soon again. 
> 
> I hope a b5 log will suffice? 

Yes

> 
> I'll go and check how to do a log, however there might be privacy concerns
> about logs. 
> 
> How much log do you need? 
> Till a crash?

I do not know exactly. I cannot tell. Try it wit 200MB and if it is not enough we can try with more.

> 
> Hope I can help, its Friday night here today, might not get to it soon I am
> afraid.

No worry. Thanks!
Assignee: nobody → dd.mozilla
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [necko-active]
Took my company notebook home and did some after work experiments. 

Network situation is not like at office, logging seems to have performance impact. In other word, I do not get it to crash, however I got it to stall, i.e. nothing being loaded any more, no page, no ajax, while a youtube was still playing for a while. Tell me, if the attached tar.gz file helps, if I can do something to the logging or setup.

Here is the link to the logfiles (as tar.gz): https://drive.google.com/file/d/0B_nbjQohON5weTdjZGlkSjFyYWM/view?usp=sharing
(In reply to Frank Nestel from comment #9)
> Took my company notebook home and did some after work experiments. 
> 
> Network situation is not like at office, logging seems to have performance
> impact. In other word, I do not get it to crash, however I got it to stall,
> i.e. nothing being loaded any more, no page, no ajax, while a youtube was
> still playing for a while. Tell me, if the attached tar.gz file helps, if I
> can do something to the logging or setup.
> 
> Here is the link to the logfiles (as tar.gz):
> https://drive.google.com/file/d/0B_nbjQohON5weTdjZGlkSjFyYWM/view?usp=sharing

Firefox with logging is slower.

From the log I see that you network is really slow, there is extremely high RTT. I do not see any problem in firefox, the main thread and the socket thread are not blocked but at some point a tcp connection establishment takes 3seconds. Do you have some firewall install? To be sure whether it is the network or some firewall delaying packet I would need a wireshark dump.
Network is not that bad here, checking a Chrome running in parallel, it really feels like Firefox is blocking itself: Chrome is running smooth. 

Anyway, as mentioned, I am in the wrong network right now, and I did not got Firefox+Logging to crash. 

I will be in the same place/network like last weeks crashes only on Tuesday. I will try to do some logging then. Hopefully I will get to the crash. Like often: Different timing sometimes heals issues :-(

Wild speculation: Having tabs open with https://hangouts.google.com and https://plus.google.com contributes to the observed issues.
OK, back at office, I managed a "trace till crash". Hope this is more helpful!

Logs can be found here: https://drive.google.com/file/d/0B_nbjQohON5weXBDSXJwT2xjX1U/view?usp=sharing

And here is the corresponding crash: https://crash-stats.mozilla.com/report/index/977174c5-13bf-4416-a83b-b0b4f2170214
Can it be, that it is now the logging itself, that is crashing?
Thanks for the log. I think I found what went wrong.
You are using the pipelining. The pipelining is turned off by default, because it is an old technology that has a lot of problems(the technique has problems not the code).

You can try turning it off.
I will described the problem in details in the next comment.
For this transaction we are using pipeline, but i am wondering if we could have a similar problem with h2.

2017-02-14 08:12:10.078000 UTC - [Socket Thread]: V/nsHttp nsHttpConnectionMgr::ActivateTimeoutTick() this=e553c00 mTimeoutTick=1fb3f4b0
2017-02-14 08:12:10.079000 UTC - [Socket Thread]: V/nsHttp nsHttpConnectionMgr::DispatchTransaction [ent-ci=P......sfl-hza-asg-01.schaeffler.com:8080 ab9a040 trans=3c497400 caps=63 conn=4bd5f470 priority=-10]
2017-02-14 08:12:10.079000 UTC - [Socket Thread]: V/nsHttp nsHttpConnectionMgr::DispatchAbstractTransaction [ci=P......sfl-hza-asg-01.schaeffler.com:8080 trans=3c497400 caps=63 conn=4bd5f470]
2017-02-14 08:12:10.079000 UTC - [Socket Thread]: V/nsHttp    using pipeline datastructure.
2017-02-14 08:12:10.079000 UTC - [Socket Thread]: V/nsHttp nsHttpPipeline::AddTransaction [this=36091460 trans=3c497400]
2017-02-14 08:12:10.079000 UTC - [Socket Thread]: V/nsHttp nsHttpPipeline::SetConnection [this=36091460 conn=399acfd0]
2017-02-14 08:12:10.079000 UTC - [Socket Thread]: V/nsHttp nsHttpConnection::Activate [this=4bd5f470 trans=36091464 caps=63]
2017-02-14 08:12:10.079000 UTC - [Socket Thread]: V/nsHttp nsHttpConnection::SetupSSL 4bd5f470 caps=0x63 P......sfl-hza-asg-0

3 other transaction are attached to the pipeline as well.

The first 2 return 407. So we do AuthRetry and the 2 transactions provide a connection:

2017-02-14 08:12:10.186000 UTC - [Main Thread]: D/nsHttp   transaction 3c497400 provides connection 36091460
2017-02-14 08:12:10.186000 UTC - [Main Thread]: D/nsHttp nsHttpTransaction::DeleteSelfOnConsumerThread [this=3c497400]
2017-02-14 08:12:10.186000 UTC - [Main Thread]: D/nsHttp Destroying nsHttpTransaction @3c497400
2017-02-14 08:12:10.186000 UTC - [Main Thread]: D/nsHttp nsHttpTransaction 3c497400 request context set to null in ReleaseBlockingTransaction() - was 0
2017-02-14 08:12:10.186000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::DoAuthRetry [this=3cb3c800]
2017-02-14 08:12:10.186000 UTC - [Main Thread]: V/nsHttp HttpBaseChannel::SetRequestHeader [this=3cb3c800 header="Cookie" value="" merge=0]
2017-02-14 08:12:10.186000 UTC - [Main Thread]: D/nsHttp nsHttpHandler::NotifyObservers [chan=3cb3c82c event="http-on-modify-request"]
2017-02-14 08:12:10.188000 UTC - [Main Thread]: V/nsHttp HttpBaseChannel::SetRequestHeader [this=3cb3c800 header="pragma" value="akamai-x-cache-on, akamai-x-cache-remote-on, akamai-x-check-cacheable, akamai-x-get-cache-key, akamai-x-get-ssl-client-session-id, akamai-x-get-true-cache-key, akamai-x-get-request-id" merge=1]
2017-02-14 08:12:10.188000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::SetupTransaction [this=3cb3c800]
2017-02-14 08:12:10.188000 UTC - [Main Thread]: D/nsHttp Creating nsHttpTransaction @b7bc400
2017-02-14 08:12:10.188000 UTC - [Main Thread]: D/nsHttp nsHttpChannel 3cb3c800 created nsHttpTransaction b7bc400
2017-02-14 08:12:10.188000 UTC - [Main Thread]: D/nsHttp nsHttpTransaction::Init [this=b7bc400 caps=65]


While dispatching the new transaction 0xb7bc400 we take httpConnection out of the pipeline(httPipeline is 0x36091460 and nsHttpConnection is 0x4bd5f470) and trying to activate nsHttpConnection fails with a hard error:

2017-02-14 08:12:10.189000 UTC - [Socket Thread]: V/nsHttp nsHttpConnectionMgr::ProcessNewTransaction trans=b7bc400 sticky connection=4bd5f470
2017-02-14 08:12:10.189000 UTC - [Socket Thread]: V/nsHttp nsHttpConnectionMgr::DispatchTransaction [ent-ci=P......sfl-hza-asg-01.schaeffler.com:8080 ab9a040 trans=b7bc400 caps=65 conn=4bd5f470 priority=-10]
2017-02-14 08:12:10.189000 UTC - [Socket Thread]: V/nsHttp nsHttpConnectionMgr::DispatchAbstractTransaction [ci=P......sfl-hza-asg-01.schaeffler.com:8080 trans=b7bc400 caps=65 conn=4bd5f470]
2017-02-14 08:12:10.189000 UTC - [Socket Thread]: V/nsHttp    not using pipeline datastructure due to class solo.
2017-02-14 08:12:10.189000 UTC - [Socket Thread]: V/nsHttp nsHttpConnection::Activate [this=4bd5f470 trans=b7bc400 caps=65]
2017-02-14 08:12:10.189000 UTC - [Socket Thread]: V/nsHttp   conn->Activate failed [rv=804b000f]
2017-02-14 08:12:10.189000 UTC - [Socket Thread]: V/nsHttp nsHttpConnectionMgr::ConditionallyStopTimeoutTick armed=1 active=8
2017-02-14 08:12:10.189000 UTC - [Socket Thread]: V/nsHttp   ProcessNewTransaction Hard Error trans=b7bc400 rv=804b000f
2017-02-14 08:12:10.189000 UTC - [Socket Thread]: D/nsHttp nsHttpTransaction::Close [this=b7bc400 reason=804b000f]


In nsHttpConnectionMgr::ProcessNewTransaction we really take the nsHttpConnection reference:
https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpConnectionMgr.cpp#1899

so nsHttpPipeline has a reference to ConnectionHandler which has a reference to nsHttpConnection. TakeHttpConnection() take the reference to nsHttpConnection from ConnectionHandler but nsPipeline still have a reference to ConnectionHandler. 

The problem is that ConnectionHandler very often uses reference to nsHttpConnection without checking if it is null.

The crash reports are bit strange, because it is crashing in nsHttpConnection. I think that the crash interpreter is wrong.

I will write a patch that forbids using sticky connection if it is a pipeline.

Patrick, you know this code better, I have a filling that this code can cause problems not only with httpPipeline, what do you think?
Flags: needinfo?(mcmanus)
(In reply to Dragana Damjanovic [:dragana] from comment #14)
> Thanks for the log. I think I found what went wrong.
> You are using the pipelining. The pipelining is turned off by default,
> because it is an old technology that has a lot of problems(the technique has
> problems not the code).
> 
> You can try turning it off.
> I will described the problem in details in the next comment.

OK, I turned of pipelining by now. Haven't tweaked this for 2 years or so. I thought pipelining does not mix with SPDY or HTTP2 anyway?!

Thank you!
I looked it up, this is not going to affect h2.
Flags: needinfo?(mcmanus)
(In reply to Frank Nestel from comment #16)
> 
> OK, I turned of pipelining by now. Haven't tweaked this for 2 years or so. I
> thought pipelining does not mix with SPDY or HTTP2 anyway?!
> 
> Thank you!

It does not. Http2 and pipelining will not be used at the same time, but you can still use pipelining on http 1 connections.

May I ask you to try running this firefox (with pipelining turned off)  on your home network? It was stange that was behaving so bad. Thanks!
Attachment #8837260 - Flags: review?(mcmanus)
Here are logs from my home WLAN. Pingtime to German google.de was around 30ms, not super bad. No crash.

https://drive.google.com/file/d/0B_nbjQohON5wVTl4RjZkdUVYaVU/view?usp=sharing

BTW I have no clue about the object hierarchy. Can it be, that pipelining+SPDY creates memory leaks in FireFox? Memory consumption here was getting worse lately.
Depends on: 130655
so the sticky stuff is generally related to NTLM connection based auth, which is not normally deployed with h2 (as conn based auth really makes no sense on h2).. so I think we're ok there.

as you've probably seen I think its time to kill pipelines anyhow - there is just a wasted maint burden there. bur this should probly be uplifted (please nom), so lets land it.
Comment on attachment 8837260 [details] [diff] [review]
bug_1337826.patch

Review of attachment 8837260 [details] [diff] [review]:
-----------------------------------------------------------------

thanks for tracking this down
Attachment #8837260 - Flags: review?(mcmanus) → review+
Depends on: 1340655
No longer depends on: 130655
Keywords: checkin-needed
Comment on attachment 8837260 [details] [diff] [review]
bug_1337826.patch

Approval Request Comment
[Feature/Bug causing the regression]: Conbination of the http pipelining and http authentication. Both a very log existing features. 
[User impact if declined]: crash
[Is this code covered by automated tests?]: No.
[Has the fix been verified in Nightly?]:no
[Needs manual test from QE? If yes, steps to reproduce]: 
[List of other uplifts needed for the feature/fix]: only this change, it is really small one.
[Is the change risky?]: Very low, if an authentication fails the patch forbids reusing the same connection if pipeline is used. This code path already exists and this patch adds one more check.
[Why is the change risky/not risky?]: The pipeline is turned off by default. It is going to affect very small amount of users. The affected users use authentication and have pipelining turned on. 
[String changes made/needed]:none
Attachment #8837260 - Flags: approval-mozilla-beta?
Attachment #8837260 - Flags: approval-mozilla-aurora?
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3f70707e9a98
Do not use sticky connection if it is pipeline. r=mcmanus
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/3f70707e9a98
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Comment on attachment 8837260 [details] [diff] [review]
bug_1337826.patch

Fix a crash. Aurora53+.
Attachment #8837260 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Do we know what made this worse for the reporter in 52?
(In reply to Julien Cristau [:jcristau] from comment #28)
> Do we know what made this worse for the reporter in 52?

As the reporter I have no clue, it just happened to bekomme way more often from 52 beta 3 to 52 beta 4. Someone probably could do the diffs? I cannot rule out however, that my companies firewalls have been tempered just in this time interval. Different department, different story ..
(In reply to Julien Cristau [:jcristau] from comment #28)
> Do we know what made this worse for the reporter in 52?

I do not know. This code is old and it will crash every time when the http pipelining and http authentication is involved. But both need to be there. In Firefox pipeline is turned off by default(and it will be removed soon). The user said that he has not changed his config settings recently. So maybe server changed, e.g. started to accept pipelining.
Comment on attachment 8837260 [details] [diff] [review]
bug_1337826.patch

fix crash with http pipeline and auth, beta/release 52+

Should be in 52 rc2.
Attachment #8837260 - Flags: approval-mozilla-release+
Attachment #8837260 - Flags: approval-mozilla-beta?
Attachment #8837260 - Flags: approval-mozilla-beta+
Blocks: 1347856
I have a report of crashes from 53.0.2 with pipelining enabled 

https://www.reddit.com/r/firefox/comments/6co4q2/continual_firefox_5302_crashes/

Can we get it verified that the patch as uplifted actually fixed the crash?
Flags: needinfo?(dd.mozilla)
(In reply to Wes Kocher (:KWierso) from comment #36)
> I have a report of crashes from 53.0.2 with pipelining enabled 
> 
> https://www.reddit.com/r/firefox/comments/6co4q2/
> continual_firefox_5302_crashes/
> 
> Can we get it verified that the patch as uplifted actually fixed the crash?

The patch has fixed the problem I have seen in the log. Probably there are different paths that can cause the same crash. The pipelining has never turned on by default and it is removed in firefox 54.

The best approach is to turn off pipeling.

I could take a look at a http log(comment #6) to figure out what is happening. Maybe it is worth fixing this for 52esr.
Flags: needinfo?(dd.mozilla)
See Also: → 1451293
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: