Last Comment Bug 780522 - Stack overflow nsHttpConnectionMgr with port out of range
: Stack overflow nsHttpConnectionMgr with port out of range
Status: RESOLVED FIXED
qa-
: crash, reproducible
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: Trunk
: All Windows XP
: -- critical (vote)
: mozilla17
Assigned To: Patrick McManus [:mcmanus]
: Simona B [:simonab ]
: Patrick McManus [:mcmanus]
Mentors:
http://www.reliancecommodities.co.in/
Depends on: 762162
Blocks: 532972
  Show dependency treegraph
 
Reported: 2012-08-05 16:48 PDT by Bob Clary [:bc:]
Modified: 2012-08-30 06:51 PDT (History)
7 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
verified
+
fixed
fixed


Attachments
test.0 (1.93 KB, patch)
2012-08-06 13:35 PDT, Patrick McManus [:mcmanus]
honzab.moz: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
fix part 1 v0 (1.00 KB, patch)
2012-08-06 13:42 PDT, Patrick McManus [:mcmanus]
honzab.moz: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
ifx part 2 v0 (9.26 KB, patch)
2012-08-06 13:52 PDT, Patrick McManus [:mcmanus]
honzab.moz: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
beta (ff15) backout of 762162 v0 (14.18 KB, patch)
2012-08-06 19:11 PDT, Patrick McManus [:mcmanus]
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Bob Clary [:bc:] 2012-08-05 16:48:57 PDT
1. http://www.reliancecommodities.co.in/
2. stack overflow crash in nsHttpConnectionMgr

Note repeated occurrences of:

WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80070057: file c:/work/mozilla/builds/nightly/mozilla/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 2368
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80070057: file c:/work/mozilla/builds/nightly/mozilla/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 1728
WARNING: NS_ENSURE_TRUE(port >= 0 && port <= 0xFFFF) failed: file c:/work/mozilla/builds/nightly/mozilla/netwerk/base/src/nsSocketTransportService2.cpp, line 528

Beta/15, Aurora/16, Nightly/17 Windows XP, 7

the crash automation pseudo-stack/signature in debug is:

NS_DebugBreak_P | nsSocketTransportService::CreateTransport(char const**, unsigned int, nsACString_internal const&, int, nsIProxyInfo*, nsISocketTransport**) nsHttpConnectionMgr::nsHalfOpenSocket::SetupStreams(nsISocketTransport**, nsIAsyncInputStream**, nsIAsyncOutputStream**, bool) nsHttpConnectionMgr::nsHalfOpenSocket::SetupPrimaryStreams() nsHttpConnectionMgr::CreateTransport(nsHttpConnectionMgr::nsConnectionEntry*, nsAHttpTransaction*, unsigned char, bool) nsHttpConnectionMgr::MakeNewConnection(nsHttpConnectionMgr::nsConnectionEntry*, nsHttpTransaction*)

though crashes in opt builds are all over the place:

bp-d624faba-7cd9-4286-9179-caca32120805
bp-23968934-be67-4ceb-b683-f4f002120805
bp-10d01768-4dc4-4186-bb1a-40bbd2120805
bp-cb568ff3-e3af-4d7c-bd08-9878d2120805
Comment 1 Patrick McManus [:mcmanus] 2012-08-05 17:08:19 PDT
I'll triage this monday. Its possible this is related to something I promoted to beta a week ago.
Comment 2 Bob Clary [:bc:] 2012-08-05 19:42:23 PDT
Nightly:
Found regression between 20120719205043-20120720205642
Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3a05d298599e&tochange=045c11dd41a6
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/07/2012-07-20-mozilla-central-debug/firefox-17.0a1.en-US.debug-win32.installer.exe
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/07/2012-07-21-mozilla-central-debug/firefox-17.0a1.en-US.debug-win32.installer.exe

These look like possible winners:

bug 775515 nshttpconnectionmgr::restrictconnections() for half opens only if they never connected r=honzab
bug 758972 - make spdysession::verifystream() a DEBUG only operation r=honzab
bug 775508 http connection diagnostic half-open elapsed syn time incorrect r=honzab
bug 770331 - always try and negotiate HTTP Keep-Alive r=biesi

Aurora: Found regression between 20120725192259-20120727000417
Beta:   Found regression between 20120728074337-20120729162537

Unfortunately the pushlog ranges for Beta and Aurora don't include these bugs. I'm not sure whats up with that.
Comment 3 Patrick McManus [:mcmanus] 2012-08-06 13:35:59 PDT
Created attachment 649398 [details] [diff] [review]
test.0

test case to illustrate the problem, and verify the fix
Comment 4 Patrick McManus [:mcmanus] 2012-08-06 13:42:27 PDT
Created attachment 649402 [details] [diff] [review]
fix part 1 v0

762162 triggered a latent bug. in 762126 we changed the invocation of ProcessPendingQForEntry in the dtor of nsHalfSocket() to happen more often.

This bug shows that there was always an infinite loop with that logic, we just weren't seeing it before. The ProcessPendingQ logic should not be done on the same stack freeing the nsHalfOpen socket, instead it should go through the PostEvent interface to avoid messing with a pending vector that might be currently being iterated.

It turns out there is more to be addressed here, that will be in the next patch.
Comment 5 Patrick McManus [:mcmanus] 2012-08-06 13:52:07 PDT
Created attachment 649405 [details] [diff] [review]
ifx part 2 v0

A transaction that synchronously fails CreateTransport() will have the transaction::close() method called (which will generate onStopRequest), but will not be deleted from the pending queue.. this both gums up the scheduling works (constantly trying to dispatch that item) and is essentially a memory leak as the transaction entry holds a reference.

a reference to an invalid port is one such use case.
Comment 6 Patrick McManus [:mcmanus] 2012-08-06 14:03:44 PDT
I think I'm going to suggest a backout from beta of the code that changed the halfopen dtor handling, and then a promotion of these fixes to aurora.
Comment 7 Patrick McManus [:mcmanus] 2012-08-06 19:11:28 PDT
Created attachment 649522 [details] [diff] [review]
beta (ff15) backout of 762162 v0

This is a straight backout of 762162 from beta (ff15).

While I really want to see 762162 fixed, given the non-trivial complexity of the necessary fix I'm going to split the baby and suggest we should back it out of beta-15 and validate the followon fix on aurora-16 (once it gets reviewed and merged into m-c).

I say that based on the relatively few complaints I've heard about 762162, but if drivers feel that is more of a problem than I am representing then the main patches in this bug should be approved for beta as followon fixes to 762162.
Comment 8 Honza Bambas (:mayhemer) 2012-08-07 14:49:44 PDT
Comment on attachment 649398 [details] [diff] [review]
test.0

Review of attachment 649398 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/test/unit/test_invalidport.js
@@ +16,5 @@
> +    do_throw("Should not get any data!");
> +  },
> +
> +  onStopRequest: function test_onStopR(request, ctx, status) {
> +    if (counter++ == 10)

Nicer would be to have this as a const at the top of the test.

Is 10 enough?  Maybe on your machine is, what about other platforms?

@@ +25,5 @@
> +};
> +
> +function run_test() {
> +  do_test_pending();
> +  execute_test();

Just swap the two lines please (do_test_pending() last).
Comment 9 Lukas Blakk [:lsblakk] use ?needinfo 2012-08-09 10:14:04 PDT
Comment on attachment 649522 [details] [diff] [review]
beta (ff15) backout of 762162 v0

approving backout, and yes - let's get the follow-on fixes on Aurora.
Comment 10 Honza Bambas (:mayhemer) 2012-08-09 12:58:49 PDT
Comment on attachment 649405 [details] [diff] [review]
ifx part 2 v0

Review of attachment 649405 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab
Comment 11 Patrick McManus [:mcmanus] 2012-08-09 14:34:44 PDT
backout of from beta-15
https://hg.mozilla.org/releases/mozilla-beta/rev/ac27ec3a105c
Comment 14 Patrick McManus [:mcmanus] 2012-08-10 07:19:38 PDT
Comment on attachment 649402 [details] [diff] [review]
fix part 1 v0

in 3 parts. followon fixes for aurora as per comments 7 and 9.. they are now on m-c. 1 line of this covers a regression from 762162 (part 1), but the rest addresses a long standing deadlock bug illustrated by the STR (part 2). Test case included. (part 3)
Comment 15 Patrick McManus [:mcmanus] 2012-08-10 07:20:08 PDT
Comment on attachment 649405 [details] [diff] [review]
ifx part 2 v0

in 3 parts. followon fixes for aurora as per comments 7 and 9.. they are now on m-c. 1 line of this covers a regression from 762162 (part 1), but the rest addresses a long standing deadlock bug illustrated by the STR (part 2). Test case included. (part 3)
Comment 16 Patrick McManus [:mcmanus] 2012-08-10 07:20:40 PDT
Comment on attachment 649398 [details] [diff] [review]
test.0

in 3 parts. followon fixes for aurora as per comments 7 and 9.. they are now on m-c. 1 line of this covers a regression from 762162 (part 1), but the rest addresses a long standing deadlock bug illustrated by the STR (part 2). Test case included. (part 3)
Comment 18 Simona B [:simonab ] 2012-08-17 05:21:51 PDT
Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101 Firefox/15.0
Mozilla/5.0 (Windows NT 5.1; rv:15.0) Gecko/20100101 Firefox/15.0

Verified that Firefox 15 beta 5 does not crash on Windows 7 and on Win XP when following the steps to reproduce from the Description.

Checked the crash stats on Socorro for all the 4 signatures and I only found one crash report after the bug was fixed that has a different crash stack from the ones listed in this bug.
https://crash-stats.mozilla.com/report/index/0b392988-9c07-4ee4-b1d9-1eb172120815

Setting tracking flag for Firefox 15 to verified.

Note You need to log in before you can comment on or make changes to this bug.