Intermittent invalid certificate error prompt in security tests causing timeouts

VERIFIED FIXED in mozilla2.0b9

Status

()

Core
Networking: HTTP
VERIFIED FIXED
7 years ago
6 years ago

People

(Reporter: aaronmt, Assigned: mayhemer)

Tracking

Trunk
mozilla2.0b9
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 betaN+)

Details

(Whiteboard: [mozmill], softblocker)

Attachments

(5 attachments, 10 obsolete attachments)

31.92 KB, image/png
Details
33.55 KB, image/png
Details
18.53 KB, patch
Details | Diff | Splinter Review
9.96 KB, patch
mcmanus
: review+
Details | Diff | Splinter Review
19.97 KB, patch
mayhemer
: review+
Details | Diff | Splinter Review
(Reporter)

Description

7 years ago
Created attachment 492336 [details]
Certificate error prompt (screenshot)

Currently, when running the testSecurity suite a select number of tests are timing out due to an intermittent invalid certificate prompt. The certificate error prompt dictates: 

"mozilla.org:443 uses an invalid security certificate. The certificate is only valid for *.mozilla.org

(Error code: ssl_error_bad_cert_domain)"

As well, the likes of the same issues appears for mur.at:443, "The certificate is only valid for secure.mur.at"

[1] http://mxr.mozilla.org/mozilla-central/source/security/manager/pki/resources/content/certerror.xul

I have seen the issue intermittently in the following tests:

* testEncryptedPageWarning.js
* testSecurityNotification.js
* testSSLDisabledErrorPage.js
* testSubmitUnencryptedInfoWarning.js
* testUnknownIssuer.js

See screenshot for the certificate error prompt
When did that start? Which builds are affected? I miss a couple of information in this report. Even a link to the brasstacks top results.
(Reporter)

Comment 2

7 years ago
Assume it started recently, although not seeing any published results. Seeing this on default only. I'm letting it run locally right now, happens intermittently as indicated.
(Reporter)

Comment 3

7 years ago
Created attachment 492357 [details]
Certificate error prompt (mur.at) (screenshot)
So it has been started recently on your own box and qa-horus doesn't fail here?
(Reporter)

Comment 5

7 years ago
Yes, I just reproduced this on qa-horus using the latest nightly, running the testSSLDisabledErrorPage repeatedly.
(Reporter)

Comment 6

7 years ago
Able to reproduce with prompts.tab_modal.enabled, disabled as well.
Please try to find out if this has been regressed lately. I have never seen that behavior in the last couple of months.
(Reporter)

Comment 8

7 years ago
Started happening with the November 21st nightly on mozilla-central

November 20st (Good) - baa6cc2f72e4
November 21st (Bad) -  be4b69a4babf

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=baa6cc2f72e4&tochange=be4b69a4babf
(In reply to comment #8)
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=baa6cc2f72e4&tochange=be4b69a4babf

I wonder if bug 592284 comes into play here.
(Reporter)

Comment 10

7 years ago
That is indeed the culprit. Changing the preference: network.http.connection-retry-timeout, to a value less than the default 250 will promote the dialog to appear. Let's get Patrick CC'ed in here to see what we can do.
(Reporter)

Updated

7 years ago
Depends on: 592284
(Assignee)

Comment 11

7 years ago
We apparently fail EnsureDocShellStuffKnown.  Link to security info may be lost.  I'll look into this.
Moving to the hopefully correct component.
Component: Mozmill Tests → Security: UI
Product: Mozilla QA → Core
QA Contact: mozmill-tests → ui
Whiteboard: [mozmill]
Version: unspecified → Trunk
(In reply to comment #11)
> We apparently fail EnsureDocShellStuffKnown.  Link to security info may be
> lost.  I'll look into this.

Thanks - I'm out of town and might not be able to look at this until monday.. 

the answer is likely as simple as not doing the reclaimconnection() bit in ReleaseBackupTransport() for TLS connections. (it will still get cleaned up, just not put in the reuse pool).
(Assignee)

Updated

7 years ago
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Would the reporter try this build:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/mcmanus@ducksong.com-2be20026e175/

That build removes just the reclaim logic associated with surplus extra connections. If that does successfully isolate the problem we can build something a little more nuanced for merge.
(In reply to comment #0)
> Created attachment 492336 [details]
> Certificate error prompt (screenshot)
> 
> Currently, when running the testSecurity suite a select number of tests are
> timing out due to an intermittent invalid certificate prompt. The certificate
> error prompt dictates: 
> 

Hi, Can you provide some information on how to run the testSecurity suite? I'd like to confirm the problem and potential solution before wasting everyone's time with a blind patch, if possible.

I'm sorry if the answer should be obvious..
(Reporter)

Comment 16

7 years ago
Hey Patrick,

The simplest way to run the suite is to download the Mozmill add-on here https://addons.mozilla.org/en-US/firefox/addon/9018/ and checkout/clone the latest version of our Mozmill tests at http://hg.mozilla.org/qa/mozmill-tests.

In Mozmill, you can click Actions button, and click Run Directory and select testSecurity under firefox.

I will also try that above try build right now.
(Reporter)

Comment 17

7 years ago
(In reply to comment #14)
> Would the reporter try this build:
> 
> http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/mcmanus@ducksong.com-2be20026e175/
> 
> That build removes just the reclaim logic associated with surplus extra
> connections. If that does successfully isolate the problem we can build
> something a little more nuanced for merge.

Tried out this build with a few test runs while setting the preference, network.http.connection-retry-timeout, to about 100. Results are indicating that your patched build resolves the issue.
(In reply to comment #16)
> Hey Patrick,
> 
> The simplest way to run the suite is to download the Mozmill add-on here

thanks for that quick info. I can reproduce now. Mozmill is pretty neat.

I figured it was just a matter of setting the security callbacks on the cloned connection that is reclaimed, but that doesn't do the trick. I'll keep looking and update the bug accordingly.
Created attachment 493785 [details] [diff] [review]
Fix security info patch (v1)

# HG changeset patch
# Parent 0e6b8ae8eb21087ff8e8adb64fb1a6621081ecde
Bug 613977 - Intermittent invalid certificate error prompt
Bug 614677 - Connection is reset message appears intermittently
Bug 614950 - Connections stall occasionally after 592284 landed

A couple of follow-on changes to 592284 rolled together to prevent
diff conflicts

1] hold a hard reference to the http transaction in nsHttpConnection
to accomodate interrogation of security callbacks after the
identification of the network EOF. (613977)

2] Set the securitycallback information for unused speculative
connections in the connection manager to be the new cloned connection
rather than the one they originated on. (613977)

3] When adding unused speculative connections to the connection
manager, due so with a short timeout (<= 5 seconds) as some servers
get grumpy if they haven't seen a request in that time. Most will
close the connection, but some will just sit there quietly and RST
things when the connection is used - so if you don't use the
connection quickly don't use it at all. This is probably a L4 load
balancer issue, actually. Mozillazine illustrates the
problem. Connections are made in bursts anyhow, so the reuse optimization is
likely still quite useful. (614677 and 614950)

4] mark every connection in the connection manager persistent
conneciton pool as "reused". This allows the transaction to be
restarted if a RST is recvd upon sending the requests (see #2) - with
the conservative timeout this is now a rare event, but still possible
so recovery is the right thing to do. (614677 and 614950)
Attachment #493785 - Flags: review?(honzab.moz)
Blocks: 614950
Blocks: 614677
a try build with the included patch for verification:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/mcmanus@ducksong.com-8420011e4738/
(Reporter)

Comment 21

7 years ago
(In reply to comment #20)
> a try build with the included patch for verification:
> 
> http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/mcmanus@ducksong.com-8420011e4738/

Build above work's great. I am not encountering the dialog.
I have to agree with Aaron, running good here over past two hours.  In fact, it feels smoother and faster than the try build from bug 614950

Comment 23

7 years ago
No connection resets, so far, with this build but I am still seeing connection stalls. It's not site specific and is difficult to reproduce for any given site.

Comment 24

7 years ago
Hold that! Having now run with the build for several hours, as stated before, I'm not seeing ant connection reset messages, I am, however, getting a great many server not found messages. 

I'm running 3.6.14pre along side minefield and I've not had a single issue...

Comment 25

7 years ago
(In reply to comment #20)
> a try build with the included patch for verification:
> 
> http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/mcmanus@ducksong.com-8420011e4738/

Unfortunately, for me, problems are still there even with this tryserver build :-(
(In reply to comment #25)
> (In reply to comment #20)
> > a try build with the included patch for verification:
> > 
> > http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/mcmanus@ducksong.com-8420011e4738/
> 
> Unfortunately, for me, problems are still there even with this tryserver build
> :-(

Can you be clear on what problem you are seeing. Are you seeing "intermittent invalid certificate error prompt in security tests"?
(Reporter)

Comment 27

7 years ago
Steps to reproduce are fairly trivial, without the need to run some tests

1. Set the value of network.http.connection-retry-timeout, to a value of 100 or less.
2. Visit the website https://mozilla.org
3. If you don't see the dialog, keep holding refresh until you do.

Comment 28

7 years ago
(In reply to comment #26)
> (In reply to comment #25)
> > (In reply to comment #20)
> > > a try build with the included patch for verification:
> > > 
> > > http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/mcmanus@ducksong.com-8420011e4738/
> > 
> > Unfortunately, for me, problems are still there even with this tryserver build
> > :-(
> 
> Can you be clear on what problem you are seeing. Are you seeing "intermittent
> invalid certificate error prompt in security tests"?

Indeed, it's not the same problem, sorry I did a mistake by posting here, I thought we were speaking about connection reset/timeout problem since this try build was posted in my topic about "Connecting..."/"Connection Reset"... and similar problems : (http://forums.mozillazine.org/viewtopic.php?f=23&t=2043587&start=0)

To be short : I don't experiment this bug and I don't speak about the same problem

Sorry again
(In reply to comment #27)
> Steps to reproduce are fairly trivial, without the need to run some tests
> 
> 1. Set the value of network.http.connection-retry-timeout, to a value of 100 or
> less.
> 2. Visit the website https://mozilla.org
> 3. If you don't see the dialog, keep holding refresh until you do.

Hi Aaron, I just want to be clear that in c27 you say you're seeing the old dialog problem again even with the patch-try-build from c20? Or you just giving a new STR for the old problem? (i.e. I'm trying to figure out if you're still experiencing a security problem, or if we are down to zouk/grimwalds server not found issues)

I cannot reproduce the security issue when using this patch using the STR in 27.
(Reporter)

Comment 30

7 years ago
Patrick, I was just listing simpler STR rather than the running the tests as you listed in comment #26.

 With the build in comment #20, I am unable to reproduce the issue.
Created attachment 494794 [details] [diff] [review]
Fix security info and connection stalls patch v2

changed to: obtain an nshttpconnection object from the connection manager,
subject to the max connection constraints, at the same time as
starting the backup conneciton. If we defer that until recycling time
the exceeded limits of the SocketService can cause problems for other
connections.

as an aside, our global limit of 50 is way too low for almost every system and makes this kind of bug too easy to hit (it of course is still a bug). 

Other code not using the httpconnectionmgr but using the nsSocketService would also trigger this.. I bet FTP falls into that category.

I have started a try server build, when it is done they will be here:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/mcmanus@ducksong.com-0e1fc09e5195
Attachment #493785 - Attachment is obsolete: true
Attachment #494794 - Flags: review?(honzab.moz)
Attachment #493785 - Flags: review?(honzab.moz)
blocking2.0: --- → ?
(Reporter)

Comment 32

7 years ago
(In reply to comment #31)
> http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/mcmanus@ducksong.com-0e1fc09e5195

Unable to reproduce the certificate prompts on this build as well. Works for me.
Blocks: 603503

Updated

7 years ago
Blocks: 592284
No longer blocks: 614950
No longer depends on: 592284
(Assignee)

Comment 33

7 years ago
Patrick, can you exaplain please, how is it possible that not holding the reference to the transaction did work before your patch for accelarion has been introduced?
(Assignee)

Comment 34

7 years ago
Comment on attachment 494794 [details] [diff] [review]
Fix security info and connection stalls patch v2

> nsresult
> nsHttpConnection::ReleaseBackupTransport(nsISocketTransport *sock,
>+    mBackupConnection->mSocketTransport = sock;
>+    mBackupConnection->mSocketTransport->SetEventSink(mBackupConnection, nsnull);
>+    mBackupConnection->mSocketTransport->SetSecurityCallbacks(mBackupConnection);
>+    mBackupConnection->mSocketOut = outs;
>+    mBackupConnection->mSocketIn = ins;

Yes, this is what I missed during the review.  Could we move these exact lines to a method of nsHttpConnection?  Let's call it AssignTransport or alike.

I can give r+ after I get answer to comment 33.
(In reply to comment #33)
> Patrick, can you exaplain please, how is it possible that not holding the
> reference to the transaction did work before your patch for accelarion has been
> introduced?

It's a little sketchy for me too - but this is what I understand:

there is a call to getinterface() that fails because mTransaction has been closed, which results in the prompt at the heart of this bug. mTransaction is closed via CloseTransaction because we have seen BASE_STREAM_CLOSED.

the getInterface() which has this problem is proxied from nsNSSSocketInfo::EnsureDocShellDependentStuffKnown which appears up the stack from an SSL HandshakeCallback().

That doesn't happen without the parallel syn patch. I think this is the sequence of events:

First the idle timer goes off and we initiate 2 parallel connections. SSL in this case. Each has the same connection object as the security callback info. One of those completes successfully, goes through the handhakecallback() just fine, and reads the connection until BASE_STREAM_CLOSED without incident - when that happens mTransaction is set to nsnull in httpconnection.

But before the connection and transaction object are destroyed (while the dom stuff is running if I understand that) the other TCP connection completes and goes into the ssl HandshakeCallback() code, which calls into the same connection object looking for the security context - this is why we were seeing GetInterface() with the transaction set to nsnull. 

This is the right transaction to be calling that on, even if it is for the second time - so keeping the reference around makes sense to me.  And we want to give it as much time as possible to complete, as opposed to canceling the outstanding socket connection at the time of CloseTransaction, because a successful handshake has value to us in the persistent connection pool.

does that make sense?
Created attachment 495633 [details] [diff] [review]
494794: Fix security info and connection stalls patch v3

address comment 34, and a whitespace change for column width
Attachment #494794 - Attachment is obsolete: true
Attachment #495633 - Flags: review?(honzab.moz)
Attachment #494794 - Flags: review?(honzab.moz)
(Assignee)

Comment 37

7 years ago
Comment on attachment 495633 [details] [diff] [review]
494794: Fix security info and connection stalls patch v3

I can still reproduce the cert error dialog with:
- changeset 63d439971e7e
- windows debug build
- applied this patch
- connect a localhost server with enabled SSL in a way to make the cert invalid (e.g to 127.0.0.1 or something like this)
- hold F5 to constantly refresh
- almost immediatelly I get the cert dialog
- this is not a problem with the transaction reference IMO but with race condition between socket detach and call to EnsureDocShellDependentStuffKnown: mCallbacks may have already been released from nsNSSSocketInfo that time

This patch and even turning off the parallel connections don't help to fix it.

For now I'm removing the review request and going to investigate the race.
Attachment #495633 - Flags: review?(honzab.moz)
(Reporter)

Comment 38

7 years ago
Are there any recent try server builds that I can test?
(In reply to comment #37)
> Comment on attachment 495633 [details] [diff] [review]
> 494794: Fix security info and connection stalls patch v3

> For now I'm removing the review request and going to investigate the race.

ok, thanks. If it looks like it might go past the b8 window we should probably consider a patch with just the mtransaction bits removed as that should still address the resets and stalls people are seeing in 614677.

fwiw using your STR I can make it happen too, using a linux64 debug build... I tested 3.6 as well just in case it was a long standing issue, but I cannot produce there.
(Assignee)

Comment 40

7 years ago
I was also able to reproduce with Fx4b7 (the same machine/same local server) ones.

Patrick, please update your patch ones again:
- remove the bits with mTransactionReference from your patch, it may lead to leaks and we apparently don't need it
- the new method (AssignTransport) should be bacwards: you should call mBackupConnection->AssignTransport(trans, sin, sout); the way you did it in the last patch is just another "release transport" kind of method
- this method will actually duplicate some bits from CreateTransport method
(Assignee)

Comment 41

7 years ago
I am also able to reproduce with Fx3.6.12
(Assignee)

Updated

7 years ago
Assignee: honzab.moz → mcmanus
Component: Security: UI → Networking: HTTP
QA Contact: ui → networking.http
(In reply to comment #40)

> - this method will actually duplicate some bits from CreateTransport method

can you rephrase this comment? I'm not sure what to do with it..
Created attachment 495895 [details] [diff] [review]
connection stalls, resets, and some missing setsecuritycallbacks v4
Attachment #495633 - Attachment is obsolete: true
Attachment #495895 - Flags: review?(honzab.moz)
(Assignee)

Comment 44

7 years ago
Comment on attachment 495895 [details] [diff] [review]
connection stalls, resets, and some missing setsecuritycallbacks v4

>@@ -149,25 +154,30 @@ nsHttpConnection::IdleSynTimeout(nsITime
>+        gHttpHandler->ConnMgr()->GetConnection(self->mConnInfo,
>+                                               self->mSocketCaps,
>+                                               getter_AddRefs(self->mBackupConnection));

80 lines please (probably wrap between getter_AddRefs( and self->mBackupConnection)

>+nsHttpConnection::AssignTransport(nsISocketTransport *sock,
>+                                  nsIAsyncOutputStream *outs,
>+                                  nsIAsyncInputStream *ins)
>+{
>+    mSocketTransport = sock;
>+    mSocketTransport->SetEventSink(this, nsnull);
>+    mSocketTransport->SetSecurityCallbacks(this);
>+    mSocketOut = outs;
>+    mSocketIn = ins;
>+}

That's it, but please:
- check for retvals of SetEventSink and SetSecurityCallbacks
- assign mSocketTransport after those checks passed

> nsHttpConnection::ReleaseBackupTransport(nsISocketTransport *sock,

>+    mBackupConnection->mIsReused = PR_TRUE;

So, the reason to set this is because the server leaves a connection to linger but when we write to it (we actually send the request) the server closes the connection and we get failure while we read or we get exception on the socket?  If so, then I have to agree with setting this flag.  Otherwise, if we detect the connection close synchronously during write (but according to comment 19, para 3 it is not the case, right?), there is no need to do it.


It is important to emphesize that setting this flag may lead to resends of a request and therefor e.g. repeated purchases or other types of POSTs.  Would be good to think of some more sophisticated decision about a 'stall' connection and its reuse here.

>+    mBackupConnection->AssignTransport(sock, outs, ins);

Check result here and only output a warning on failure.  Loosing a backup connection because it wasn't able to accept a transport is not a fatal failure to break the existing connection.

>+    nsresult rv = gHttpHandler->ReclaimConnection(mBackupConnection);
>+    mBackupConnection = nsnull;
>     return rv;

Also please only report a warning (NS_WARNING) here.

In general, I don't see a reason why ReleaseBackupTransport should ever return a failure now, actually, in this form it could be a void method.  It just always has to release the backup connection.

>diff --git a/netwerk/protocol/http/nsHttpConnection.h b/netwerk/protocol/http/nsHttpConnection.h
>+    void         AssignTransport(nsISocketTransport *sock,
>+                                 nsIAsyncOutputStream *outs,
>+                                 nsIAsyncInputStream *ins);

Please put declaration and definition of AssignTransport close to CreateTransport methods.  It is related more to them then to SelectPrimaryTransport.

>diff --git a/netwerk/protocol/http/nsHttpConnectionMgr.cpp b/netwerk/protocol/http/nsHttpConnectionMgr.cpp
> nsHttpConnectionMgr::AtActiveConnectionLimit(nsConnectionEntry *ent, PRUint8 caps)
>+    // If we have more active connections than the global limit, then we're done --

80 lines please.


Please just confirm my question about mIsReused and otherwise the patch is ok.
(In reply to comment #44)
> Comment on attachment 495895 [details] [diff] [review]
> connection stalls, resets, and some missing setsecuritycallbacks v4
> 
> >+    mBackupConnection->mIsReused = PR_TRUE;
> 
> So, the reason to set this is because the server leaves a connection to linger
> but when we write to it (we actually send the request) the server closes the
> connection and we get failure while we read or we get exception on the socket? 

yes. it is worth noting that under normal circumstances when a server times us out it sends a FIN, which we note before writing to the connection and we never send the request down that connection (we just move onto a different one). The reuse bit is necessary either when there is a race condition (server sends the fin after we send the request but before the server receives it), or in the more common case when some other NAT-like device has timed out its mapping.. a content switch or firewall will do this without sending anything to firefox until our request is received, at which point it generates the RST.

> 
> Please just confirm my question about mIsReused and otherwise the patch is ok.

great, I'll update the patch with your r+. Thanks.

will the patch be ok for b8? Should I set checkin-needed?
Created attachment 496054 [details] [diff] [review]
connection stalls, resets, and some missing setsecuritycallbacks v5

addresses review comments in comment 44
Attachment #495895 - Attachment is obsolete: true
Attachment #495895 - Flags: review?(honzab.moz)
Attachment #496054 - Flags: review+
Attachment #496054 - Flags: approval2.0?
(Assignee)

Comment 47

7 years ago
(In reply to comment #45)
> will the patch be ok for b8? Should I set checkin-needed?

We should get this to b8 IMO.  First get b+ or a+ then set checkin-needed.  If I get to this then I can land it for you.
(In reply to comment #41)
> I am also able to reproduce with Fx3.6.12

Honza filed that old bug here:
https://bugzilla.mozilla.org/show_bug.cgi?id=617381

Aaron, that's the root cause of the security prompt issue - not a regression but certainly easier to see now.

the patch in here should still be committed as it repairs a real problem with setting the security callbacks on the extra connection, and solves the connection reset/hang issue.
Keywords: regression
Can someone help drivers understand if this is must-need for beta8 vs something that can wait for beta9? I'm a little nervous taking unbaked network changes as we're trying to close down for building beta8.
blocking2.0: ? → beta9+
Based on what Patrick's telling me in email, I think this can wait until after beta 8 - please check it in sometime next week.
Comment on attachment 496054 [details] [diff] [review]
connection stalls, resets, and some missing setsecuritycallbacks v5

removing approval request as this is a blocker
Attachment #496054 - Flags: approval2.0?

Comment 52

7 years ago
(In reply to comment #50)
> Based on what Patrick's telling me in email, I think this can wait until after
> beta 8 - please check it in sometime next week.

Given that this is responsible for bug 614677, I suppose the beta 8 user experience is of little concern?
I'm going to make a b8 patch that disables the underlying feature by default. We can turn it back on when the fix-patch here gets added to the pre-b9 nightlies.
(In reply to comment #53)
> I'm going to make a b8 patch that disables the underlying feature by default.
> We can turn it back on when the fix-patch here gets added to the pre-b9
> nightlies.

That pref patch is in 614677 (which is really the regression it works around) so this bug can stay B9.
Created attachment 496813 [details] [diff] [review]
connection stalls, resets, and some missing setsecuritycallbacks v6 [Check in comment 64]

update the patch to also re-enable the retry timeout preference for beta 9. Not to be commited until beta 8 is tagged for release.
Attachment #496054 - Attachment is obsolete: true
Comment on attachment 496813 [details] [diff] [review]
connection stalls, resets, and some missing setsecuritycallbacks v6 [Check in comment 64]

So is this +r and ready to check in, or needs another review?
it is r+ for beta 9..

did beta8 get tagged today?
(In reply to comment #57)
> it is r+ for beta 9..
> 
> did beta8 get tagged today?

I guess I should say a+ for beta 9.
(Reporter)

Comment 59

7 years ago
Patrick,

Is there, or can you provide a recent try-server build with your latest patches that I can use to confirm that the issue is fixed during our run of Mozmill security tests?

> Patrick,
> 
> Is there, or can you provide a recent try-server build with your latest patches
> that I can use to confirm that the issue is fixed during our run of Mozmill
> security tests?

As mentioned in comment 48, the root cause of the cert prompt you are seeing in mozmill is not a new bug. It exists in 3.6 even if mozmill doesn't show it there - 

https://bugzilla.mozilla.org/show_bug.cgi?id=617381

the change to 4.0 made it more likely to appear, but the race remained so honza had me remove the code that reduced the race. So the patch will not help much mozmill. It will help for some (I suspect small) subset of cases where the securityinfo was not properly assigned.

The patch will help with the connection resets, hangs, etc as well.
Patrick: is this ready to land?
(In reply to comment #61)
> Patrick: is this ready to land?

yes, as soon as tree opens up for beta9 commits.
honza, I'm concerned that closing this with the current patch while leaving 617381 open is going to feel like a regression even though the underlying bug is long standing.

thoughts?
connection stalls, resets, and some missing setsecuritycallbacks v6  pushed as e465d98c56c2

leaving bug open for a few more minutes to see if we cannot make the mozmill situation better even though that's technically an old problem. (617381)
Created attachment 498116 [details] [diff] [review]
dont-reclaim-secondary-ssl v1

Honza,

would you consider this patch which cancels the extra outstanding SSL request when one of them succeeds? That means we don't get the benefit of "reusing" the unused connection, which is a shame given SSL is expensive to setup, but we still do the early retry which is the main goal for the feature. 

It's basically just a workaround until 617381 is fixed - it should bring us back down to the same amount of pain that was visible from it before syn retry.

I have confirmed locally that it makes the mozmill security tests run ok and submitted a minimal try run.
Attachment #498116 - Flags: review?(honzab.moz)
Aaron, et al, there is a try build ongoing at:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/mcmanus@ducksong.com-bf177d48a2fb/

it will have the patch from comment 65 in it. Can you confirm it makes mozmill happier?
(Reporter)

Comment 67

7 years ago
(In reply to comment #66)
> Can you confirm it makes mozmill happier?

Tested this build and am unable to reproduce the intermittent prompts in our security suite of tests. I also tested by altering the connection retry timeout preference to a lesser value and was unable to reproduce. Looks good.
(In reply to comment #67)

> Tested this build and am unable to reproduce the intermittent prompts in our
> security suite of tests. I also tested by altering the connection retry timeout
> preference to a lesser value and was unable to reproduce. Looks good.


it also had a green test run through try server.
(Assignee)

Comment 69

7 years ago
Patrick, few questions:

- the patch that was referring the transaction (introduced mTransactionReferece), did it fix the mozmill security tests?
- did you ever try your self to analyze cause of why the pop-up dialog shows?  there might be a different cause for it then bug 617381.
(In reply to comment #69)
> Patrick, few questions:
> 
> - the patch that was referring the transaction (introduced
> mTransactionReferece), did it fix the mozmill security tests?

yes, it did.

> - did you ever try your self to analyze cause of why the pop-up dialog shows? 
> there might be a different cause for it then bug 617381.

I'm reasonably sure it is the same issue as per comment 35 - ensuredocshell calls into connection::getinterface after the :Close() has been called on the main connection. That's hard (but as you illustrated, not impossible) to make happen when you only have the one connection, but we are seeing it all the time with the backup connection establishing the TCP handshake several ticks after the primary one has gotten its full SSL data moving.
(Assignee)

Comment 71

7 years ago
For the record, the dialog is displayed because:

- we are connecting https://mozilla.org
- an initial connection + initial transport is created
- we time the syn idle interval out
- we ask the connection manager for the backup connection
- then the backup transport is created by the initial connection
- therefor the backup transport callbacks are assigned the initial connection
- and also the ssl socket callbacks are assigned the initial connection
- the initial transport (now assigned the main transport) finishes its job with for instance SSL_ERROR_BAD_CERT_DOMAIN -> the error is correctly reported by the error page
- then we immediately close and forget the transaction and throw away the backup connection (that has never been assigned a transport)
- when the backup transport gets to the point it has detected the bad cert as well, it calls to the initial connection's GetInterface
- mTransaction is gone at that moment -> we fail the GetInterface call and therefor nsNSSSocketInfo::GetExternalErrorReporting returns false -> no error page but rather the dialog


I see one major problem with all the system we have here:
- the backup transport is created with callbacks set to the initial connection
- this callback is during creation of the socket adopted by the ssl socket layer
- when we bind the backup socket with the backup connection, ssl socket layer will point to the initial connection and transaction -> this is bad

At the moment I'm not sure how much it is bad.

Looks like with the architecture we have we shouldn't even start the backup transport for ssl sockets.

I have to think.
(Assignee)

Comment 72

7 years ago
Simply said, the system doesn't count with a possibility that a socket transport ever changes its http connection.  The acceleration patch breaks that rule..
(In reply to comment #72)
> Simply said, the system doesn't count with a possibility that a socket
> transport ever changes its http connection.  The acceleration patch breaks that
> rule..

The patch in comment 65 does not break that assumption.

It craetes 2 sockets with the same HTTP connection and abandons 1 (or both) of them while still doing early syn retries. It however does not do recycling of extra connections.
Some really minor nits, that might help in thinking about this:

(In reply to comment #71)
> For the record, the dialog is displayed because:

> - the initial transport (now assigned the main transport) finishes its job with
> for instance SSL_ERROR_BAD_CERT_DOMAIN -> the error is correctly reported by
> the error page

It is not necessarily the initial transport. It could be either the first or second one to start, but it is the first one to finish setting up the connection that does this bit you describe. Again, it could be either socket (which is the whole point of early retry).

> - then we immediately close and forget the transaction and throw away the
> backup connection (that has never been assigned a transport)

The backup connection is not discarded when the primary is established, it continues to try and connect in the background.
(In reply to comment #72)
> Simply said, the system doesn't count with a possibility that a socket
> transport ever changes its http connection.  The acceleration patch breaks that
> rule..

as long as the callbacks are set, where is the problem? If the event sink and the security callbacks cannot be updated then why don't SetEventSink and SetSecurityCallbacks throw errors if they are changed? (I admit I'm not 100% positive they don't do that - but I don't think they do.)

It seems to me that the original nsHttpConnection object should just be prepared to answer GetInterface() via a valid transaction reference for those securitycallbacks as long as there is a socket that might handshake against it or an active transaction. That is exactly what it does in the pre-syn-retry world, its just that the scope of that reference needs to be extended now to cover the period with backup socket running after the transaction reference has been dropped.

The fact that the secondary connection does not have a transaction reference when it is assigned to the second nsHttpConnection object is not a big deal. All idle connection in the connectionmanager have null transaction references.

If that logic isn't convincing, the patch in comment 65 (which I guess needs some comment fixes) should still be acceptable. It will do the 2 connection attempts in parallel (so both sockets have the same nsHTTPConnection object) but upon one of them succeeding it closes the other one down - so nothing ever has its nsHttpConnection object changed and the extra socket is never reused. The lack of reuse is kind of a shame, but at least you still get the early syn retry behavior to deal with lossy networks.
(Assignee)

Comment 76

7 years ago
Created attachment 498617 [details] [diff] [review]
keep transaction, reset callbacks on re-activate

(In reply to comment #73)
> The patch in comment 65 does not break that assumption.
Fully agree.


(In reply to comment #74)
> The backup connection is not discarded when the primary is established, it
> continues to try and connect in the background.
Ah, yes, right.  But only until we finish the primary transport work.  After that, after we throw the transaction away, we never reach SelectPrimaryTransport method anymore.


Summary of problems I see here:
- we leave security info of the psm (ssl) nspr socket of the backup transport assigned to wrong callbacks (=wrong instance of nsHttpConnection)
- this means that it actually reaches a wrong transaction, if there even still is a transaction, when calling to those callbacks
- *one* of consequences of it is, if there is no transaction, we wrongly decide on the error reporting state (this bug)

Changing the code of nsSocketTransport to also re-set callbacks on the ssl control object of the nspr socket when it is set a new callbacks itself doesn't help.  We get OnOutputStreamReady before the certificate is examined.  We set the backup connection as callbacks that is not assigned a transaction at that moment.  When the transport detects a wrong certificate it will pop a dialog up anyway - GI fails.

So, this doesn't fix the problem of error reporting by the dialog.  If we hold the transaction referred after we close it, the only occasion we call on the callbacks is bad cert error.  If there is no cert/connection error, then the connection may freely linger for a while with incorrect callbacks assigned because until we use the connection there is not any call to the callbacks made (except a cert error).

With this, we have to (and the patch does):
- hold the transaction, as Patrick did in one of the earlier version of the patch
- set callbacks on the security info as well at the moment the connection gets activated
- allow backup transport be joint with the backup connection even after we throw the transaction away, otherwise we may loose a working connection
Attachment #498617 - Flags: review?(mcmanus)
(In reply to comment #76)

> So, this doesn't fix the problem of error reporting by the dialog.  If we hold
> the transaction referred after we close it, the only occasion we call on the
> callbacks is bad cert error.  If there is no cert/connection error, then the
> connection may freely linger for a while with incorrect callbacks assigned
> because until we use the connection there is not any call to the callbacks made
> (except a cert error).

I didn't understand this piece. Thank you.
Attachment #498617 - Flags: review?(mcmanus) → review+
Are we agreed that "keep transaction, reset callbacks on re-activate" is the right thing to do?
(Assignee)

Updated

7 years ago
Attachment #498116 - Flags: review?(honzab.moz)
(Assignee)

Comment 79

7 years ago
(In reply to comment #78)
> Are we agreed that "keep transaction, reset callbacks on re-activate" is the
> right thing to do?

Actually, the core fix in my patch is to keep the transaction referred to access callbacks later.  There is actually no other way to do that this late w/o large API changes.

To reset the callbacks is a fix of a problem unrelated to this bug, but actually "blocking" it.  We had to do that in the original version of the http acceleration patch, but I missed that during review.


As I'm thinking of it now, the true fix would be to completely change the acceleration code like this:
- put every transaction to the pending queue, and start a connection in the background
- server them to transactions asynchronously, as we already do in case of max connections number limit
- a connection would be assigned to the transaction on successful connect or after a failure
- this way we may remove some of your http acceleration changes in nsHttpConnection and simply establish two connections (after the 250ms timeout of course) in the background and serve the one that is established sooner to a pending transaction
- the second one may linger as a reused/idle connection for a while, as we do now, but in a much cleaner way, w/o assigning a transport, resetting callbacks, keeping on already closed transactions at least
- however, to prevent the cert error dialog we would have to change the code that decides on the way of error reporting (there is already an older bug on that, don't have a number by hand)
- based on how that code would be re-architecture ensure prevention of the dialog for firefox connections coming from a window (that will be assigned a transaction w/ a docshell)

All this would use a lot of discoveries and code you have already made, just would be much cleaner and prevent bugs like these (I'm also worried of other regressions).  Problem is that it seems to be too late to do these changes.
(In reply to comment #79)
> (In reply to comment #78)
> > Are we agreed that "keep transaction, reset callbacks on re-activate" is the
> > right thing to do?
> 
> Actually, the core fix in my patch is to keep the transaction referred to
> access callbacks later.  There is actually no other way to do that this late
> w/o large API changes.
> 

right, I was just referring to the name you gave the attachment. For FF 4 I think the right thing to do is to just commit that code to the trunk asap to maximize its exposure to nightlies before beta 9 and leave it be until 4.0 ships. I can't identify any actionable problems with it.

The better structure for this, long term, is probably to put it in the connection manager where the amount of parallelism, and conceivably even opportunistic prefetching, can be managed in a wider context (taking in consideration other connections to the host, etc..) and not invoke Activate() until there is a writable socket to attach it to. But when interest in this feature for FF 4 came about there had already been some alphas/betas shipped and I didn't want to propose anything involving that kind of restructuring - and I certainly don't want to do it now at beta 9 :)

We can feel good about shipping this (with your patch based on my patch) in ff 4 and then make it better for ff4+ .. imo check it in and close this bug and possibly file a follow on.
(Assignee)

Comment 81

7 years ago
(In reply to comment #80)
> The better structure for this, long term, is probably to put it in the
> connection manager where the amount of parallelism, and conceivably even
> opportunistic prefetching, can be managed in a wider context (taking in
> consideration other connections to the host, etc..) and not invoke Activate()
> until there is a writable socket to attach it to. But when interest in this
> feature for FF 4 came about there had already been some alphas/betas shipped
> and I didn't want to propose anything involving that kind of restructuring -
> and I certainly don't want to do it now at beta 9 :)

Exactly, fully agree!  It's good approach to use just a simple patch to verify the idea and then rework it to a better looking and working code.

> We can feel good about shipping this (with your patch based on my patch) in ff
> 4 and then make it better for ff4+ .. imo check it in and close this bug and
> possibly file a follow on.

I'll land it tomorrow morning of my time.
(Assignee)

Comment 82

7 years ago
Comment on attachment 496813 [details] [diff] [review]
connection stalls, resets, and some missing setsecuritycallbacks v6 [Check in comment 64]

Changeset URL for reference: http://hg.mozilla.org/mozilla-central/rev/e465d98c56c2
Attachment #496813 - Attachment description: connection stalls, resets, and some missing setsecuritycallbacks v6 → connection stalls, resets, and some missing setsecuritycallbacks v6 [Check in comment 64]
(Assignee)

Comment 83

7 years ago
Created attachment 499288 [details] [diff] [review]
keep transaction, reset callbacks on re-activate  [Check in as f397877da0dd][Backed out comment 85 (leaks/test failures)]

qref...
Attachment #498617 - Attachment is obsolete: true
Attachment #499288 - Flags: review?(mcmanus)
Attachment #499288 - Flags: review?(mcmanus) → review+
(Assignee)

Comment 84

7 years ago
Comment on attachment 499288 [details] [diff] [review]
keep transaction, reset callbacks on re-activate  [Check in as f397877da0dd][Backed out comment 85 (leaks/test failures)]

http://hg.mozilla.org/mozilla-central/rev/f397877da0dd
Attachment #499288 - Attachment description: keep transaction, reset callbacks on re-activate → keep transaction, reset callbacks on re-activate [Check in comment 84]
(Assignee)

Updated

7 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
(Assignee)

Updated

7 years ago
Attachment #499288 - Attachment description: keep transaction, reset callbacks on re-activate [Check in comment 84] → keep transaction, reset callbacks on re-activate [Backed out comment 85 (leaks/test failures)]
(Assignee)

Updated

7 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Updated

7 years ago
Assignee: mcmanus → honzab.moz
Status: REOPENED → ASSIGNED
a (the?) leak is caused because there is a circular relationship between nshttptranscation and nshttpconnection which is normally broken when nshttptransaction::Close() is called from nshttpconnection::closetransaction(). 

in the days before syn-accel the connection dropped its ref to the transaction there, and the transaction drops its reference to the connection unless there is a "sticky-retry" which is used for authorization retries. 

Now of course the connection maintains its ref through dtor and the transaction never dropped it on sticky-retry, so we get the circular reference and a leak.

I've never understood the sticky retry thing.. http is stateless - why can't we just let the connection manager choose a connection in the usual way? It shouldn't be an efficiency thing, the connection manager will favor persistent connections. Is it perhaps derived from some old broken website that demanded a get->401->get sequence on the same connection? I would feel odd breaking it at this point in beta 4.

hmm
(Assignee)

Comment 86

7 years ago
(In reply to comment #85)
At least what I know about sticking is NTLM authentication that needs 2 or 3 rounds to authenticate a *connection*.
(In reply to comment #86)
> (In reply to comment #85)
> At least what I know about sticking is NTLM authentication that needs 2 or 3
> rounds to authenticate a *connection*.

yes, forgot that.
Created attachment 499452 [details] [diff] [review]
keep callbacks, reset callbacks on re-activate

I'll include below the "interdiff" vs the one that was pushed earlier today.

Here are the conceptual changes:

* the most important change is to not keep a long lived reference to the transaction, but rather make that a long lived reference to the transaction's security callbacks. That's the only part of the transaction we use after closing it anyhow, and that avoids the circular reference. We can't release that on the network thread, so store its target and ns_ProxyRelease() it when appropriate. To minimize the proxy releasing, don't take the ref at all unless we create the backup connection after 250ms.

* remove the ResetCallbacksOnActivation flag - there is already a codepath for "first activation of a connection that already has an active socket" so just add it to that.

This makes the leak reports I was looking at from TBPL happy (specifically mochitest-5 TEST_PATH=toolkit/components/passwordmgr/test/) on my desktop, and additionally mozmill runs cleanly.

I will submit to try server and update the bug tomorrow.


::::

diff --git a/netwerk/protocol/http/nsAHttpTransaction.h b/netwerk/protocol/http/nsAHttpTransaction.h
--- a/netwerk/protocol/http/nsAHttpTransaction.h
+++ b/netwerk/protocol/http/nsAHttpTransaction.h
@@ -39,16 +39,17 @@
 #define nsAHttpTransaction_h__
 
 #include "nsISupports.h"
 
 class nsAHttpConnection;
 class nsAHttpSegmentReader;
 class nsAHttpSegmentWriter;
 class nsIInterfaceRequestor;
+class nsIEventTarget;
 
 //----------------------------------------------------------------------------
 // Abstract base class for a HTTP transaction:
 //
 // A transaction is a "sink" for the response data.  The connection pushes
 // data to the transaction by writing to it.  The transaction supports
 // WriteSegments and may refuse to accept data if its buffers are full (its
 // write function returns NS_BASE_STREAM_WOULD_BLOCK in this case).
@@ -57,17 +58,18 @@ class nsIInterfaceRequestor;
 class nsAHttpTransaction : public nsISupports
 {
 public:
     // called by the connection when it takes ownership of the transaction.
     virtual void SetConnection(nsAHttpConnection *) = 0;
 
     // called by the connection to get security callbacks to set on the 
     // socket transport.
-    virtual void GetSecurityCallbacks(nsIInterfaceRequestor **) = 0;
+    virtual void GetSecurityCallbacks(nsIInterfaceRequestor **,
+                                      nsIEventTarget **) = 0;
 
     // called to report socket status (see nsITransportEventSink)
     virtual void OnTransportStatus(nsresult status, PRUint64 progress) = 0;
 
     // called to check the transaction status.
     virtual PRBool   IsDone() = 0;
     virtual nsresult Status() = 0;
 
@@ -83,17 +85,18 @@ public:
                                    PRUint32 count, PRUint32 *countWritten) = 0;
 
     // called to close the transaction
     virtual void Close(nsresult reason) = 0;
 };
 
 #define NS_DECL_NSAHTTPTRANSACTION \
     void SetConnection(nsAHttpConnection *); \
-    void GetSecurityCallbacks(nsIInterfaceRequestor **); \
+    void GetSecurityCallbacks(nsIInterfaceRequestor **, \
+                              nsIEventTarget **);       \
     void OnTransportStatus(nsresult status, PRUint64 progress); \
     PRBool   IsDone(); \
     nsresult Status(); \
     PRUint32 Available(); \
     nsresult ReadSegments(nsAHttpSegmentReader *, PRUint32, PRUint32 *); \
     nsresult WriteSegments(nsAHttpSegmentWriter *, PRUint32, PRUint32 *); \
     void     Close(nsresult reason);
 
diff --git a/netwerk/protocol/http/nsHttpConnection.cpp b/netwerk/protocol/http/nsHttpConnection.cpp
--- a/netwerk/protocol/http/nsHttpConnection.cpp
+++ b/netwerk/protocol/http/nsHttpConnection.cpp
@@ -46,16 +46,17 @@
 #include "nsISocketTransportService.h"
 #include "nsISocketTransport.h"
 #include "nsIServiceManager.h"
 #include "nsISSLSocketControl.h"
 #include "nsStringStream.h"
 #include "netCore.h"
 #include "nsNetCID.h"
 #include "nsAutoLock.h"
+#include "nsProxyRelease.h"
 #include "prmem.h"
 
 #ifdef DEBUG
 // defined by the socket transport service while active
 extern PRThread *gSocketThread;
 #endif
 
 static NS_DEFINE_CID(kSocketTransportServiceCID, NS_SOCKETTRANSPORTSERVICE_CID);
@@ -81,17 +82,16 @@ nsHttpConnection::nsHttpConnection()
     , mLock(nsnull)
     , mLastReadTime(0)
     , mIdleTimeout(0)
     , mKeepAlive(PR_TRUE) // assume to keep-alive by default
     , mKeepAliveMask(PR_TRUE)
     , mSupportsPipelining(PR_FALSE) // assume low-grade server
     , mIsReused(PR_FALSE)
     , mCompletedSSLConnect(PR_FALSE)
-    , mResetCallbackOnActivation(PR_FALSE)
     , mActivationCount(0)
 {
     LOG(("Creating nsHttpConnection @%x\n", this));
 
     // grab a reference to the handler to ensure that it doesn't go away.
     nsHttpHandler *handler = gHttpHandler;
     NS_ADDREF(handler);
 }
@@ -105,16 +105,23 @@ nsHttpConnection::~nsHttpConnection()
         mIdleSynTimer = nsnull;
     }
 
     if (mBackupConnection) {
         gHttpHandler->ReclaimConnection(mBackupConnection);
         mBackupConnection = nsnull;
     }
     
+    if (mCallbacks)
+    {
+        nsIInterfaceRequestor *cbs = nsnull;
+        mCallbacks.swap(cbs);
+        NS_ProxyRelease(mCallbackTarget, cbs);
+    }
+
     NS_IF_RELEASE(mConnInfo);
 
     if (mLock) {
         PR_DestroyLock(mLock);
         mLock = nsnull;
     }
 
     // release our reference to the handler
@@ -167,16 +174,20 @@ nsHttpConnection::IdleSynTimeout(nsITime
             return;
         nsresult rv =
             self->CreateTransport(self->mSocketCaps,
                                   getter_AddRefs(self->mSocketTransport2),
                                   getter_AddRefs(self->mSocketIn2),
                                   getter_AddRefs(self->mSocketOut2));
         if (NS_SUCCEEDED(rv)) {
             sCreateTransport2++;
+            self->mTransaction->
+                GetSecurityCallbacks(
+                    getter_AddRefs(self->mCallbacks),
+                    getter_AddRefs(self->mCallbackTarget));
             self->mSocketOut2->AsyncWait(self, 0, 0, nsnull);
         }
     }
 
     return;
 }
 
 // called on the socket thread
@@ -188,29 +199,19 @@ nsHttpConnection::Activate(nsAHttpTransa
     NS_ABORT_IF_FALSE(PR_GetCurrentThread() == gSocketThread, "wrong thread");
     LOG(("nsHttpConnection::Activate [this=%x trans=%x caps=%x]\n",
          this, trans, caps));
 
     NS_ENSURE_ARG_POINTER(trans);
     NS_ENSURE_TRUE(!mTransaction, NS_ERROR_IN_PROGRESS);
 
     // take ownership of the transaction
-    mTransactionReference = trans;
     mTransaction = trans;
     mActivationCount++;
 
-    if (mResetCallbackOnActivation) {
-        mResetCallbackOnActivation = PR_FALSE;
-
-        rv = mSocketTransport->SetEventSink(this, nsnull);
-        NS_ENSURE_SUCCESS(rv, rv);
-        rv = mSocketTransport->SetSecurityCallbacks(this);
-        NS_ENSURE_SUCCESS(rv, rv);
-    }
-
     // set mKeepAlive according to what will be requested
     mKeepAliveMask = mKeepAlive = (caps & NS_HTTP_ALLOW_KEEPALIVE);
 
     // need to handle SSL proxy CONNECT if this is the first time.
     if (mConnInfo->UsingSSL() && mConnInfo->UsingHttpProxy() && !mCompletedSSLConnect) {
         rv = SetupSSLProxyConnect();
         if (NS_FAILED(rv))
             goto failed_activation;
@@ -223,27 +224,29 @@ nsHttpConnection::Activate(nsAHttpTransa
     else {
         NS_ABORT_IF_FALSE(mSocketOut && mSocketIn,
                           "Socket Transport and SocketOut mismatch");
         
         // If this is the first transaction on this connection, but
         // we already have a socket that means the socket was created
         // speculatively in the past, not used at that time, and
         // given to the connection manager.
-        if (mActivationCount == 1)
+        if (mActivationCount == 1) {
             sWastedReuseCount++;
-
+            rv = mSocketTransport->SetEventSink(this, nsnull);
+            NS_ENSURE_SUCCESS(rv, rv);
+            rv = mSocketTransport->SetSecurityCallbacks(this);
+            NS_ENSURE_SUCCESS(rv, rv);
+        }
         rv = mSocketOut->AsyncWait(this, 0, 0, nsnull);
     }
     
 failed_activation:
-    if (NS_FAILED(rv)) {
+    if (NS_FAILED(rv))
         mTransaction = nsnull;
-        mTransactionReference = nsnull;
-    }
     return rv;
 }
 
 void
 nsHttpConnection::Close(nsresult reason)
 {
     LOG(("nsHttpConnection::Close [this=%x reason=%x]\n", this, reason));
 
@@ -492,17 +495,17 @@ nsHttpConnection::OnHeadersAvailable(nsA
             NS_ASSERTION(NS_SUCCEEDED(rv), "mSocketOut->AsyncWait failed");
         }
         else {
             LOG(("SSL proxy CONNECT failed!\n"));
             // NOTE: this cast is valid since this connection cannot be
             // processing a transaction pipeline until after the first HTTP/1.1
             // response.
             nsHttpTransaction *trans =
-                    static_cast<nsHttpTransaction *>(mTransaction);
+                     static_cast<nsHttpTransaction *>(mTransaction.get());
             trans->SetSSLConnectFailed();
         }
     }
 
     return NS_OK;
 }
 
 void
@@ -654,18 +657,16 @@ nsHttpConnection::CreateTransport(PRUint
     return NS_OK;
 }
 
 nsresult
 nsHttpConnection::AssignTransport(nsISocketTransport *sock,
                                   nsIAsyncOutputStream *outs,
                                   nsIAsyncInputStream *ins)
 {
-    mResetCallbackOnActivation = PR_TRUE;
-
     mSocketTransport = sock;
     mSocketOut = outs;
     mSocketIn = ins;
     return NS_OK;
 }
 
 void
 nsHttpConnection::CloseTransaction(nsAHttpTransaction *trans, nsresult reason)
@@ -887,17 +888,18 @@ nsHttpConnection::SetupSSLProxyConnect()
     request.SetRequestURI(buf);
     request.SetHeader(nsHttp::User_Agent, gHttpHandler->UserAgent());
 
     // send this header for backwards compatibility.
     request.SetHeader(nsHttp::Proxy_Connection, NS_LITERAL_CSTRING("keep-alive"));
 
     // NOTE: this cast is valid since this connection cannot be processing a
     // transaction pipeline until after the first HTTP/1.1 response.
-    nsHttpTransaction *trans = static_cast<nsHttpTransaction *>(mTransaction);
+    nsHttpTransaction *trans =
+        static_cast<nsHttpTransaction *>(mTransaction.get());
     
     val = trans->RequestHead()->PeekHeader(nsHttp::Host);
     if (val) {
         // all HTTP/1.1 requests must include a Host header (even though it
         // may seem redundant in this case; see bug 82388).
         request.SetHeader(nsHttp::Host, nsDependentCString(val));
     }
 
@@ -1085,18 +1087,17 @@ nsHttpConnection::OnTransportStatus(nsIT
 NS_IMETHODIMP
 nsHttpConnection::GetInterface(const nsIID &iid, void **result)
 {
     // NOTE: This function is only called on the UI thread via sync proxy from
     //       the socket transport thread.  If that weren't the case, then we'd
     //       have to worry about the possibility of mTransaction going away
     //       part-way through this function call.  See CloseTransaction.
     NS_ASSERTION(PR_GetCurrentThread() != gSocketThread, "wrong thread");
- 
-    if (mTransactionReference) {
-        nsCOMPtr<nsIInterfaceRequestor> callbacks;
-        mTransactionReference->GetSecurityCallbacks(getter_AddRefs(callbacks));
-        if (callbacks)
-            return callbacks->GetInterface(iid, result);
-    }
+        
+    nsCOMPtr<nsIInterfaceRequestor> callbacks = mCallbacks;
+    if (!callbacks && mTransaction)
+        mTransaction->GetSecurityCallbacks(getter_AddRefs(callbacks), nsnull);
+    if (callbacks)
+        return callbacks->GetInterface(iid, result);
 
     return NS_ERROR_NO_INTERFACE;
 }
diff --git a/netwerk/protocol/http/nsHttpConnection.h b/netwerk/protocol/http/nsHttpConnection.h
--- a/netwerk/protocol/http/nsHttpConnection.h
+++ b/netwerk/protocol/http/nsHttpConnection.h
@@ -48,16 +48,17 @@
 #include "prlock.h"
 #include "nsAutoPtr.h"
 
 #include "nsIStreamListener.h"
 #include "nsISocketTransport.h"
 #include "nsIAsyncInputStream.h"
 #include "nsIAsyncOutputStream.h"
 #include "nsIInterfaceRequestor.h"
+#include "nsIEventTarget.h"
 #include "nsITimer.h"
 
 //-----------------------------------------------------------------------------
 // nsHttpConnection - represents a connection to a HTTP server (or proxy)
 //
 // NOTE: this objects lives on the socket thread only.  it should not be
 // accessed from any other thread.
 //-----------------------------------------------------------------------------
@@ -160,34 +161,39 @@ private:
     nsCOMPtr<nsIAsyncOutputStream>  mSocketOut;
 
     nsresult                        mSocketInCondition;
     nsresult                        mSocketOutCondition;
 
     nsCOMPtr<nsIInputStream>        mSSLProxyConnectStream;
     nsCOMPtr<nsIInputStream>        mRequestStream;
 
-    // mTransaction only points to mTransactionReference if the
+    // mTransaction only points to the HTTP Transaction callbacks if the
     // transaction is open, otherwise it is null.
-    nsRefPtr<nsAHttpTransaction>    mTransactionReference;
-    nsAHttpTransaction             *mTransaction; // hard ref
+    nsRefPtr<nsAHttpTransaction>    mTransaction;
+
+    // The security callbacks are only stored if we initiate a
+    // backup connection because they need to be proxy released
+    // on the main thread.
+    nsCOMPtr<nsIInterfaceRequestor> mCallbacks;
+    nsCOMPtr<nsIEventTarget>        mCallbackTarget;
+
     nsHttpConnectionInfo           *mConnInfo;    // hard ref
 
     PRLock                         *mLock;
 
     PRUint32                        mLastReadTime;
     PRUint16                        mMaxHangTime;    // max download time before dropping keep-alive status
     PRUint16                        mIdleTimeout;    // value of keep-alive: timeout=
 
     PRPackedBool                    mKeepAlive;
     PRPackedBool                    mKeepAliveMask;
     PRPackedBool                    mSupportsPipelining;
     PRPackedBool                    mIsReused;
     PRPackedBool                    mCompletedSSLConnect;
-    PRPackedBool                    mResetCallbackOnActivation;
 
     PRUint32                        mActivationCount;
 
     // These items are used to implement a parallel connection opening
     // attempt when network.http.connection-retry-timeout has expired
     PRUint8                         mSocketCaps;
     nsCOMPtr<nsITimer>              mIdleSynTimer;
     nsRefPtr<nsHttpConnection>      mBackupConnection;
diff --git a/netwerk/protocol/http/nsHttpPipeline.cpp b/netwerk/protocol/http/nsHttpPipeline.cpp
--- a/netwerk/protocol/http/nsHttpPipeline.cpp
+++ b/netwerk/protocol/http/nsHttpPipeline.cpp
@@ -305,26 +305,29 @@ nsHttpPipeline::SetConnection(nsAHttpCon
     NS_IF_ADDREF(mConnection = conn);
 
     PRInt32 i, count = mRequestQ.Length();
     for (i=0; i<count; ++i)
         Request(i)->SetConnection(this);
 }
 
 void
-nsHttpPipeline::GetSecurityCallbacks(nsIInterfaceRequestor **result)
+nsHttpPipeline::GetSecurityCallbacks(nsIInterfaceRequestor **result,
+                                     nsIEventTarget        **target)
 {
     NS_ASSERTION(PR_GetCurrentThread() == gSocketThread, "wrong thread");
 
     // return security callbacks from first request
     nsAHttpTransaction *trans = Request(0);
     if (trans)
-        trans->GetSecurityCallbacks(result);
-    else
+        trans->GetSecurityCallbacks(result, target);
+    else {
         *result = nsnull;
+        *target = nsnull;
+    }
 }
 
 void
 nsHttpPipeline::OnTransportStatus(nsresult status, PRUint64 progress)
 {
     LOG(("nsHttpPipeline::OnStatus [this=%x status=%x progress=%llu]\n",
         this, status, progress));
 
diff --git a/netwerk/protocol/http/nsHttpTransaction.cpp b/netwerk/protocol/http/nsHttpTransaction.cpp
--- a/netwerk/protocol/http/nsHttpTransaction.cpp
+++ b/netwerk/protocol/http/nsHttpTransaction.cpp
@@ -323,19 +323,22 @@ nsHttpTransaction::TakeResponseHead()
 void
 nsHttpTransaction::SetConnection(nsAHttpConnection *conn)
 {
     NS_IF_RELEASE(mConnection);
     NS_IF_ADDREF(mConnection = conn);
 }
 
 void
-nsHttpTransaction::GetSecurityCallbacks(nsIInterfaceRequestor **cb)
+nsHttpTransaction::GetSecurityCallbacks(nsIInterfaceRequestor **cb,
+                                        nsIEventTarget        **target)
 {
     NS_IF_ADDREF(*cb = mCallbacks);
+    if (target)
+        NS_IF_ADDREF(*target = mConsumerTarget);
 }
 
 void
 nsHttpTransaction::OnTransportStatus(nsresult status, PRUint64 progress)
 {
     LOG(("nsHttpTransaction::OnSocketStatus [this=%x status=%x progress=%llu]\n",
         this, status, progress));
Attachment #499288 - Attachment is obsolete: true
Attachment #499452 - Flags: review?(honzab.moz)
Created attachment 499455 [details] [diff] [review]
keep callbacks, reset callbacks on re-activate v2

drat - update that last patch for a nsnull check I needed in nsHttpPipeline::GetSecurityCallbacks(). Having pipeline tests someday soon will be good.
Attachment #499455 - Flags: review?(honzab.moz)
Attachment #499452 - Flags: review?(honzab.moz)
Created attachment 499460 [details] [diff] [review]
: keep callbacks, reset callbacks on re-activate v3

I'm too tired - I forgot to clear the callbacks in between activate iterations in the last patch which is necessary for the period before the timer fires.
Attachment #499452 - Attachment is obsolete: true
Attachment #499455 - Attachment is obsolete: true
Attachment #499460 - Flags: review?(honzab.moz)
Attachment #499455 - Flags: review?(honzab.moz)
Attachment #499460 - Attachment is patch: true
Attachment #499460 - Attachment mime type: application/octet-stream → text/plain
> I will submit to try server and update the bug tomorrow.

One try server failure - mTransaction wasnull in IdleSyntTimeout.. I have to convince myself that is expected and then we can just check for it and move on. but the leaks and wide crashes are gone.
Created attachment 499487 [details] [diff] [review]
keep callbacks, reset callbacks on re-activate v4 [Check in comment 98]

Correct the try server failure - mTransaction could have been null in ::IdleSynTimeout on a failed activation (e.g. CreateTransaport()).
Attachment #499460 - Attachment is obsolete: true
Attachment #499487 - Flags: review?(honzab.moz)
Attachment #499460 - Flags: review?(honzab.moz)
Try server is green (or starable) for the v4 patch.
(Assignee)

Comment 94

7 years ago
Comment on attachment 499487 [details] [diff] [review]
keep callbacks, reset callbacks on re-activate v4 [Check in comment 98]

>+nsHttpConnection::ReleaseCallbacks()
>+        nsIInterfaceRequestor *cbs = nsnull;
>+        mCallbacks.swap(cbs);


Better might be, but up to you:

        nsIInterfaceRequestor *cbs;
        mCallbacks.forget(cbs);


r=honzab

P.S.: this starts to be complicated.  Releasing this in final will be conditioned by having a very good test for all this acceleration stuff.  After major release we have to remove all patches for the acceleration and start again coding it in the connection manager.  Probably as said in comment 79.
Attachment #499487 - Flags: review?(honzab.moz) → review+

Comment 95

7 years ago
As per today's meeting, beta 9 will be a time-based release. Marking these all betaN+. Please move it back to beta9+ if  you believe it MUST be in the next beta (ie: trunk is in an unshippable state without this)
blocking2.0: beta9+ → betaN+

Updated

7 years ago
Whiteboard: [mozmill] → [mozmill], softblocker
Honza, looks like this is ready to land, can you check this in for Patrick, assuming I'm right here?
(In reply to comment #96)
> Honza, looks like this is ready to land, can you check this in for Patrick,
> assuming I'm right here?

I recently got push access of my own - will push tomorrow morning when I can watch tbpl.
attachment labeled "keep callbacks, reset callbacks on re-activate v4" committed as http://hg.mozilla.org/mozilla-central/rev/257af9cad364
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago7 years ago
Resolution: --- → FIXED
Blocks: 623201
(Assignee)

Updated

7 years ago
Attachment #498116 - Attachment is obsolete: true
(Assignee)

Updated

7 years ago
Attachment #499487 - Attachment description: keep callbacks, reset callbacks on re-activate v4 → keep callbacks, reset callbacks on re-activate v4 [Check in comment 98]
(Assignee)

Updated

7 years ago
Attachment #499288 - Attachment description: keep transaction, reset callbacks on re-activate [Backed out comment 85 (leaks/test failures)] → keep transaction, reset callbacks on re-activate [Check in as f397877da0dd][Backed out comment 85 (leaks/test failures)]
Attachment #499288 - Attachment is obsolete: false
Aaron, what's the state for our Mozmill tests? Do they pass now?
(Assignee)

Updated

7 years ago
Blocks: 623921
Target Milestone: --- → mozilla2.0b9
(Reporter)

Comment 100

7 years ago
Glad to report that the original issue reported in comment #0 is resolved.

Tested on 

Mozilla/5.0 (X11; Linux x86_64; rv:2.0b9pre) Gecko/20110107 Firefox/4.0b9pre 20110107030356
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b9pre) Gecko/20110107 Firefox/4.0b9pre 20110107030356
Status: RESOLVED → VERIFIED
No longer blocks: 603503
You need to log in before you can comment on or make changes to this bug.