Stack overflow nsHttpConnectionMgr with port out of range

RESOLVED FIXED in Firefox 15

Status

()

Core
Networking: HTTP
--
critical
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: bc, Assigned: mcmanus)

Tracking

(Blocks: 1 bug, {crash, reproducible})

Trunk
mozilla17
All
Windows XP
crash, reproducible
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox15+ verified, firefox16+ fixed, firefox17 fixed)

Details

(Whiteboard: qa-, crash signature, URL)

Attachments

(4 attachments)

(Reporter)

Description

5 years ago
1. http://www.reliancecommodities.co.in/
2. stack overflow crash in nsHttpConnectionMgr

Note repeated occurrences of:

WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80070057: file c:/work/mozilla/builds/nightly/mozilla/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 2368
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80070057: file c:/work/mozilla/builds/nightly/mozilla/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 1728
WARNING: NS_ENSURE_TRUE(port >= 0 && port <= 0xFFFF) failed: file c:/work/mozilla/builds/nightly/mozilla/netwerk/base/src/nsSocketTransportService2.cpp, line 528

Beta/15, Aurora/16, Nightly/17 Windows XP, 7

the crash automation pseudo-stack/signature in debug is:

NS_DebugBreak_P | nsSocketTransportService::CreateTransport(char const**, unsigned int, nsACString_internal const&, int, nsIProxyInfo*, nsISocketTransport**) nsHttpConnectionMgr::nsHalfOpenSocket::SetupStreams(nsISocketTransport**, nsIAsyncInputStream**, nsIAsyncOutputStream**, bool) nsHttpConnectionMgr::nsHalfOpenSocket::SetupPrimaryStreams() nsHttpConnectionMgr::CreateTransport(nsHttpConnectionMgr::nsConnectionEntry*, nsAHttpTransaction*, unsigned char, bool) nsHttpConnectionMgr::MakeNewConnection(nsHttpConnectionMgr::nsConnectionEntry*, nsHttpTransaction*)

though crashes in opt builds are all over the place:

bp-d624faba-7cd9-4286-9179-caca32120805
bp-23968934-be67-4ceb-b683-f4f002120805
bp-10d01768-4dc4-4186-bb1a-40bbd2120805
bp-cb568ff3-e3af-4d7c-bd08-9878d2120805
I'll triage this monday. Its possible this is related to something I promoted to beta a week ago.
Assignee: nobody → mcmanus
(Reporter)

Comment 2

5 years ago
Nightly:
Found regression between 20120719205043-20120720205642
Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3a05d298599e&tochange=045c11dd41a6
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/07/2012-07-20-mozilla-central-debug/firefox-17.0a1.en-US.debug-win32.installer.exe
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/07/2012-07-21-mozilla-central-debug/firefox-17.0a1.en-US.debug-win32.installer.exe

These look like possible winners:

bug 775515 nshttpconnectionmgr::restrictconnections() for half opens only if they never connected r=honzab
bug 758972 - make spdysession::verifystream() a DEBUG only operation r=honzab
bug 775508 http connection diagnostic half-open elapsed syn time incorrect r=honzab
bug 770331 - always try and negotiate HTTP Keep-Alive r=biesi

Aurora: Found regression between 20120725192259-20120727000417
Beta:   Found regression between 20120728074337-20120729162537

Unfortunately the pushlog ranges for Beta and Aurora don't include these bugs. I'm not sure whats up with that.
Created attachment 649398 [details] [diff] [review]
test.0

test case to illustrate the problem, and verify the fix
Attachment #649398 - Flags: review?(honzab.moz)
Created attachment 649402 [details] [diff] [review]
fix part 1 v0

762162 triggered a latent bug. in 762126 we changed the invocation of ProcessPendingQForEntry in the dtor of nsHalfSocket() to happen more often.

This bug shows that there was always an infinite loop with that logic, we just weren't seeing it before. The ProcessPendingQ logic should not be done on the same stack freeing the nsHalfOpen socket, instead it should go through the PostEvent interface to avoid messing with a pending vector that might be currently being iterated.

It turns out there is more to be addressed here, that will be in the next patch.
Attachment #649402 - Flags: review?(honzab.moz)
Created attachment 649405 [details] [diff] [review]
ifx part 2 v0

A transaction that synchronously fails CreateTransport() will have the transaction::close() method called (which will generate onStopRequest), but will not be deleted from the pending queue.. this both gums up the scheduling works (constantly trying to dispatch that item) and is essentially a memory leak as the transaction entry holds a reference.

a reference to an invalid port is one such use case.
Attachment #649405 - Flags: review?(honzab.moz)
I think I'm going to suggest a backout from beta of the code that changed the halfopen dtor handling, and then a promotion of these fixes to aurora.
Created attachment 649522 [details] [diff] [review]
beta (ff15) backout of 762162 v0

This is a straight backout of 762162 from beta (ff15).

While I really want to see 762162 fixed, given the non-trivial complexity of the necessary fix I'm going to split the baby and suggest we should back it out of beta-15 and validate the followon fix on aurora-16 (once it gets reviewed and merged into m-c).

I say that based on the relatively few complaints I've heard about 762162, but if drivers feel that is more of a problem than I am representing then the main patches in this bug should be approved for beta as followon fixes to 762162.
Attachment #649522 - Flags: approval-mozilla-beta?
Depends on: 762162
Comment on attachment 649398 [details] [diff] [review]
test.0

Review of attachment 649398 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/test/unit/test_invalidport.js
@@ +16,5 @@
> +    do_throw("Should not get any data!");
> +  },
> +
> +  onStopRequest: function test_onStopR(request, ctx, status) {
> +    if (counter++ == 10)

Nicer would be to have this as a const at the top of the test.

Is 10 enough?  Maybe on your machine is, what about other platforms?

@@ +25,5 @@
> +};
> +
> +function run_test() {
> +  do_test_pending();
> +  execute_test();

Just swap the two lines please (do_test_pending() last).
Attachment #649398 - Flags: review?(honzab.moz) → review+
Attachment #649402 - Flags: review?(honzab.moz) → review+
Comment on attachment 649522 [details] [diff] [review]
beta (ff15) backout of 762162 v0

approving backout, and yes - let's get the follow-on fixes on Aurora.
Attachment #649522 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
status-firefox15: --- → affected
status-firefox16: --- → affected
tracking-firefox15: --- → +
tracking-firefox16: --- → +
Comment on attachment 649405 [details] [diff] [review]
ifx part 2 v0

Review of attachment 649405 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab
Attachment #649405 - Flags: review?(honzab.moz) → review+
backout of from beta-15
https://hg.mozilla.org/releases/mozilla-beta/rev/ac27ec3a105c
status-firefox15: affected → fixed
status-firefox17: --- → affected
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/a15c1d9a4490
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/1d635991a241
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/05ccb3544dc8
status-firefox17: affected → fixed
https://hg.mozilla.org/mozilla-central/rev/a15c1d9a4490
https://hg.mozilla.org/mozilla-central/rev/1d635991a241
https://hg.mozilla.org/mozilla-central/rev/05ccb3544dc8
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Comment on attachment 649402 [details] [diff] [review]
fix part 1 v0

in 3 parts. followon fixes for aurora as per comments 7 and 9.. they are now on m-c. 1 line of this covers a regression from 762162 (part 1), but the rest addresses a long standing deadlock bug illustrated by the STR (part 2). Test case included. (part 3)
Attachment #649402 - Flags: approval-mozilla-aurora?
Comment on attachment 649405 [details] [diff] [review]
ifx part 2 v0

in 3 parts. followon fixes for aurora as per comments 7 and 9.. they are now on m-c. 1 line of this covers a regression from 762162 (part 1), but the rest addresses a long standing deadlock bug illustrated by the STR (part 2). Test case included. (part 3)
Attachment #649405 - Flags: approval-mozilla-aurora?
Comment on attachment 649398 [details] [diff] [review]
test.0

in 3 parts. followon fixes for aurora as per comments 7 and 9.. they are now on m-c. 1 line of this covers a regression from 762162 (part 1), but the rest addresses a long standing deadlock bug illustrated by the STR (part 2). Test case included. (part 3)
Attachment #649398 - Flags: approval-mozilla-aurora?
Attachment #649398 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #649402 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #649405 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
remote:   https://hg.mozilla.org/releases/mozilla-aurora/rev/47a96c3f4b5e
remote:   https://hg.mozilla.org/releases/mozilla-aurora/rev/ffcd0fbd502a
remote:   https://hg.mozilla.org/releases/mozilla-aurora/rev/4768b4381334
status-firefox16: affected → fixed
Keywords: verifyme
Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101 Firefox/15.0
Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0

Verified that Firefox 15 beta 5 does not crash on Windows 7 and on Win XP when following the steps to reproduce from the Description.

Checked the crash stats on Socorro for all the 4 signatures and I only found one crash report after the bug was fixed that has a different crash stack from the ones listed in this bug.
https://crash-stats.mozilla.com/report/index/0b392988-9c07-4ee4-b1d9-1eb172120815

Setting tracking flag for Firefox 15 to verified.
status-firefox15: fixed → verified
Keywords: verifyme
QA Contact: simona.marcu
Whiteboard: qa-
You need to log in before you can comment on or make changes to this bug.