Last Comment Bug 480619 - Firefox does not load CSS and images when loading a secure (SSL, https) page for the first time (or after ‘clear private data’) and requires refresh to load properly.
: Firefox does not load CSS and images when loading a secure (SSL, https) page ...
Status: VERIFIED FIXED
: verified1.9.0.15, verified1.9.1, verified1.9.2
Product: Core
Classification: Components
Component: Networking (show other bugs)
: Trunk
: All All
: -- major (vote)
: mozilla1.9.2a1
Assigned To: Bjarne (:bjarne)
:
Mentors:
https://signup.live.com/signup.aspx
: 487403 490457 492495 493529 511631 (view as bug list)
Depends on:
Blocks: 496335
  Show dependency treegraph
 
Reported: 2009-02-27 13:02 PST by Ben
Modified: 2011-10-14 02:29 PDT (History)
28 users (show)
jst: blocking1.9.1-
jst: wanted1.9.1+
mbeltzner: blocking1.9.1.1-
mbeltzner: wanted1.9.1.x+
dveditz: wanted1.9.0.x+
ehsan: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
beta1-fixed
.4+
.4-fixed


Attachments
HTTP Log results after reproducing error. (99.45 KB, application/zip)
2009-03-03 08:30 PST, Ben
no flags Details
about:cache entries(Memory Cache Device) (11.11 KB, text/plain)
2009-03-03 21:22 PST, WADA
no flags Details
Initial, brute force approach (1.11 KB, patch)
2009-04-24 08:45 PDT, Bjarne (:bjarne)
no flags Details | Diff | Review
Lighter version (1.81 KB, patch)
2009-04-25 15:19 PDT, Bjarne (:bjarne)
no flags Details | Diff | Review
Even lighter version... (6.28 KB, patch)
2009-04-27 14:45 PDT, Bjarne (:bjarne)
jduell.mcbugs: review+
Details | Diff | Review
Updated according to comments from reviewer (5.91 KB, patch)
2009-05-22 03:07 PDT, Bjarne (:bjarne)
no flags Details | Diff | Review
Ultra-simple version derived from comments by biesi (1.58 KB, patch)
2009-05-25 13:10 PDT, Bjarne (:bjarne)
no flags Details | Diff | Review
Added similar logic to poll-method (2.30 KB, patch)
2009-05-26 07:12 PDT, Bjarne (:bjarne)
kaie: review+
Details | Diff | Review
Cleaned-up version of patch ready for check-in (1.69 KB, patch)
2009-06-03 15:11 PDT, Bjarne (:bjarne)
mbeltzner: approval1.9.1-
mbeltzner: approval1.9.1.1-
dveditz: approval1.9.1.4+
dveditz: approval1.9.0.15+
Details | Diff | Review
On my way to a test (2.52 KB, patch)
2009-07-12 11:09 PDT, :Ehsan Akhgari (busy, don't ask for review please)
no flags Details | Diff | Review
Test (3.01 KB, patch)
2009-07-14 04:38 PDT, :Ehsan Akhgari (busy, don't ask for review please)
no flags Details | Diff | Review
Test (v2) (3.06 KB, patch)
2009-07-17 12:38 PDT, :Ehsan Akhgari (busy, don't ask for review please)
cbiesinger: review+
kaie: review+
Details | Diff | Review
Test for check-in (3.35 KB, patch)
2009-08-21 06:45 PDT, :Ehsan Akhgari (busy, don't ask for review please)
no flags Details | Diff | Review

Description Ben 2009-02-27 13:02:58 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5

After you navigate away from the secure site you are attempting to view, 'clear private data' (including Cache, Cookies, Offline Website Data, and Authenticated Sessions), and then visit that secure page (possibly for the first time), roughly 50% of the time will result in a page that does not load CSS and images (you may need to repeat this process a few times to see the issue).  Initially, I thought this was an internal server problem and related to calling an external CSS file.  However, all those who have tested this on home machines (outside of our office) are getting the same results.  I have also tried several work arounds include: embedding the CSS (which ended with more of the page loading, but still not all), moving images to a different secure server (no change), and calling the external CSS twice (no change).  The only way we have found to ensure the page loading properly is to identify different browsers and have the page load twice (essentially a refresh) for those using Firefox.

Also, I have found that when first attempting to load specific secure images, upon hitting return or clicking the green go arrow to start loading the page, nothing happens.  For example, if after you navigate away from the secure site (in this case Google), 'clear private data' (including Cache, Cookies, Offline Website Data, and Authenticated Sessions), plug https://www.google.com/accounts/adwords/images/adwords_logo.gif into the address page and then hit enter or the green go arrow, every so often the browser will just sit idle (you may need to repeat this process a few times to see the issue).  After placing your cursor in the address bar and hitting enter again or hitting the green go button again, the image will then load.  I think this is somehow related to what is happening with the errors that occur during load of a full page.  I’ve also seen this happen to entire webpages such as https://hitchingpost.readyhosting.com/secure/contact.htm.


Reproducible: Sometimes

Steps to Reproduce:
1. In Firefox, navigate away from the site you will be testing a secure page on.
2. “Clear Private Data…” (including Cache, Cookies, Offline Website Data, and Authenticated Sessions).
3. Visit a secure page (especially one that relies more on CSS design).
Actual Results:  
Webpage loads, but without CSS and images.

Expected Results:  
Full webpage load, with CSS and images.
Comment 1 Matthias Versen [:Matti] 2009-02-27 18:38:10 PST
Do you get this in the Firefox safemode ?
http://support.mozilla.com/en-US/kb/Safe+Mode
Comment 2 Ben 2009-03-02 08:33:05 PST
Yes, I get the same behavior.
Comment 3 Matthias Versen [:Matti] 2009-03-02 09:52:22 PST
Can you please attach a networking log :
http://www.mozilla.org/projects/netlib/http/http-debugging.html
Comment 4 WADA 2009-03-02 18:29:15 PST
(In reply to comment #0)
> Gecko/2008120122 Firefox/3.0.5

When(from which Fx version) did your problem start to occur frequently?
After upgrade from previous 3.0.x to current 3.0.5?
Or after upgrade from Fx 2 to Fx 3?
If latter, reduce following value to 2. Frequency of problem declines?
> network.http.max-persistent-connections-per-server (default: Fx 2=2, Fx 3=6)
Comment 5 Ben 2009-03-03 08:30:59 PST
Created attachment 365211 [details]
HTTP Log results after reproducing error.
Comment 6 Ben 2009-03-03 08:32:10 PST
ps – Can anyone duplicate what my peers and I are experiencing here?

Matti – I’ve attached a networking log.  Note, I took the following steps to get the error.

1. Opened Firefox (my homepage is google.com).
2. Went to the secure URL noted in the bug info (https://signup.live.com/signup.aspx).  The error did not occur.
3. Went back to my homepage.
4. Cleared my private data (including Cache, Cookies, Offline Website Data, and Authenticated Sessions).
5. (ERROR) Went back to the secure URL noted.  The error occurred here (browser did not load CSS and images).
6. Exited browser.


WADA – I found the error in 3.0.5 toward the end of January.  I’m not exactly sure when this work machine was upgraded to 3.0.5 (I’ll check with our guys here to see if we have a log of that upgrade).  I also have a laptop at home with Firefox 2 on it and I am able to reproduce the error in version 2 as well.
Comment 7 Ben 2009-03-03 09:15:46 PST
WADA – I just realized what you were saying about reducing that value to 2 in the config.  So I reduced that value to 2 in Fx 3.0.5 and now the error seems to be happening 100% of the time.
Comment 8 WADA 2009-03-03 19:12:59 PST
(In reply to comment #7)
> So I reduced that value to 2 in Fx 3.0.5 and now the error seems to be happening 100% of the time.

The IIS server looks tolerant with persistent connection=6. Please change back to default.

(In reply to comment #7)
>(snip)
> 3. Went back to my homepage.
> 4. Cleared my private data (including Cache, Cookies, Offline Website Data, and Authenticated Sessions).
> 5. (ERROR) Went back to the secure URL noted.  The error occurred here (browser did not load CSS and images).
>(snip)

> https://secure.wlxrs.com/yDLmcF2FzGTerKJw2Us!q0tI0qMxoipUtbqnHdX2IWd-W!MqS7tFT1jrjehtb5LNlvZlJHl-Mrc/Base/14.3.8033/hig+signup.css
There is no log of "200 OK" for second HTTP GET of the css file(Step 5, Back just after clear cahce).
What's wrong?

(A) First HTTP GET in log. Step 2, before clear cahce. Log for "200 OK" exists.
> Line-6262 : [328140]:   GET /yDLmcF2FzGTerKJw2Us!q0tI0qMxoipUtbqnHdX2IWd-W!MqS7tFT1jrjehtb5LNlvZlJHl-Mrc/Base/14.3.8033/hig+signup.css HTTP/1.1
> Line-6263 : [328140]:   Host: secure.wlxrs.com
>(snip)
> Line-7366 :HTTP/1.x 200 OK
> Line-7367 :Content-Length: 25570
> Line-7368 :Content-Type: text/css
>(snip)
> Line-7371 :Server: Microsoft-IIS/6.0
>(snip)

(B) Second HTTP GET in log. Step 5, Back just after clear cache. Log for "200 OK" doesn't exist.
> https://secure.wlxrs.com/yDLmcF2FzGTerKJw2Us!q0tI0qMxoipUtbqnHdX2IWd-W!MqS7tFT1jrjehtb5LNlvZlJHl-Mrc/Base/14.3.8033/hig+signup.css
> Line-18951 : GET /yDLmcF2FzGTerKJw2Us!q0tI0qMxoipUtbqnHdX2IWd-W!MqS7tFT1jrjehtb5LNlvZlJHl-Mrc/Base/14.3.8033/hig+signup.css HTTP/1.1
> Line-18952 : Host: secure.wlxrs.com
>(snip)
> No log of "200 OK"

Phenomenon(CSS is not used) is reproduced by Fx 3.0.6 on MS Win, with single tab only, with next two pages.
(1) Mozilla Firefox Central : http://en-us.www.mozilla.com/en-US/firefox/central/
(2) Sign Up page of Windows Live.
  Go http://home.live.com/ (Windows Live Home)
  Click "Sign Up" => Jumps to next page. I used this URL(with search string).
> https://signup.live.com/signup.aspx?id=251248&ru=http%3a%2f%2fhome.live.com%2fdefault.aspx%3fnew%3dtrue&cp=2&mkt=en-US&rollrs=12&lic=1 
  The Sing Up page uses next css file(no other CSS is used)
> https://secure.wlxrs.com/yDLmcF2FzGTerKJw2Us!q0tI0qMxoipUtbqnHdX2IWd-W!MqS7tFT1jrjehtb5LNlvZlJHl-Mrc/Base/14.3.8033/hig+signup.css
  And HTTP log for some images have following Referer(images are loaded by the CSS).
> Referer: https://secure.wlxrs.com/yDLmcF2FzGTerKJw2Us!q0tI0qMxoipUtbqnHdX2IWd-W!MqS7tFT1jrjehtb5LNlvZlJHl-Mrc/Base/14.3.8033/hig+signup.css

If cookie is cleared, client can't send cookie.
Server(IIS)/Server Application(PHP?)/JScript behaviour seems;
  When cookie is not sent by client,
     (a) JScript doesn't generate <link rel="stylesheet"> in some situations,
  or (b) JScript fails to generate <link rel> in some situations,
  or (c) <link rel="stylesheet"> is generated and HTTP GET is issued by Fx, 
         (c-1) but server application doesn't return ".css" file. 
               (server application ignores HTTP GET for the .css file)
         (c-2) but delay occurs in returning ".css" file data(timeout). 
  So, if cache is cleared and (c) occurs, CSS can't be used.
In your log, it looks that (c) occurs, although Fx's caching of css related issue is still suspected. ("No log of 200 OK for .css" doesn't always mean "no response from server really".)

By the way, an imapge/jpeg file is sent with "Content-Encoding: gzip" by server.
This kind of header sometimes means "careless site(or cache server) configuration".
> Line-14013 : 480[3283c0]:   HTTP/1.1 200 OK
> Line-14014 : 480[3283c0]:   Cache-Control: no-cache
> Line-14015 : 480[3283c0]:   Connection: close
> Line-14016 : 480[3283c0]:   Date: Tue, 03 Mar 2009 05:04:19 GMT
> Line-14017 : 480[3283c0]:   Pragma: no-cache
> Line-14018 : 480[3283c0]:   Content-Type: image/jpeg
> Line-14019 : 480[3283c0]:   Expires: Tue, 03 Mar 2009 05:03:19 GMT
> Line-14020 : 480[3283c0]:   Server: Microsoft-IIS/6.0
> Line-14021 : 480[3283c0]:   PPServer: PPV: 30 H: BAYIDSHIPV1C06 V: 0
> Line-14022 : 480[3283c0]:   P3P: CP="DSP CUR OTPi IND OTRi ONL FIN"
> Line-14023 : 480[3283c0]:   Set-Cookie: HIPChallenge=44fXR7y0nCeTSLMMU62AWoJ!ybeLq0SSqNZU*PF8BpU8QYybSfIUaswdLeGHMdrLkn3dO7qD0ydf0hDvvR2xU8VDKTumqkYODkYoViyVwK4SIWEA7ckYkal84F5CvslylANTYBH0lo354uoCUelrPRuQ6HAaSOLzC7KyeuZWrO3GgOGNeJGShkyzRRa!LdldjFHK2jVr*BVnA7rgUwfESkzx3npwHBsb6USAmlG5Bg6mJe5Tlu96Tzg791FmmvmQ!*RtAsaUnE9zrkPb*zipR7H5fVmMGIJx54BngYaF2kY*23zo4sdYRjH0v6L0fYaIaxkuwnd8zDy8CDyu!ensWdj2p7rbZzSeOR; domain=.live.com;path=/;version=1
> Line-14024 : 480[3283c0]:   Content-Encoding: gzip
> Line-14025 : 480[3283c0]:   Vary: Accept-Encoding
> Line-14026 : 480[3283c0]:   Transfer-Encoding: chunked
Comment 9 WADA 2009-03-03 21:21:21 PST
Correction. <link rel> was hard coded in HTML source. Sorry for wrong comments.
> ...</script><link rel="StyleSheet" href="https://secure.wlxrs.com/yDLmcF2FzGTerKJw2Us!q0tI0qMxoipUtbqnHdX2IWd-W!MqS7tFT1jrjehtb5LNlvZlJHl-Mrc/Base/14.3.8033/hig+signup.css" type="text/css"/>
> </head>
(... Why no line break before "<link rel" ... ) 

Phenomenon was observed with Fx latest-trunk(2008/3/03 build) too.
When problem(CSS not used) occurs after "Clear Recent History", no entry for the ".css" was found in about:cache(Memory Cache Device), even though "Done" is displayed in status bar. Entry for the ".css" was created after Reload, and CSS was applied to page.

Loading was canceled/terminated at mid of document load process due to internal 
error?
"Content-Encoding:gzip" of HTML is relevant?
"Content-Type: text/html" for XHTML document is relevant?
 ( http://www.w3.org/TR/xhtml-media-types/#media-types )

> Line-212: 1480[3283c0]:   HTTP/1.1 200 OK
> Line-213: 1480[3283c0]:   Cache-Control: private, max-age=0
>(snip)
> Line-216: 1480[3283c0]:   Content-Type: text/html; charset=UTF-8
> Line-217: 1480[3283c0]:   Content-Encoding: gzip
>(snip)
> Note: This server uses "on-the-fly compression" for HTML,
>       or mis-uses "Content-Encoding:gzip".
> (uses as if "Content-Transfer-Encoding" which is not defined by HTTP 1.1)
Comment 10 WADA 2009-03-03 21:22:58 PST
Created attachment 365384 [details]
about:cache entries(Memory Cache Device)
Comment 11 WADA 2009-03-04 00:41:48 PST
This site returns JavaScript file with "Content-Encoding: gzip", in addition to text/html.

> Line-20169 : 0[328140]: uri=https://secure.shared.live.com/8Cvn5p7UVQ6YTKP9Tiy5Gw/js/omnitureH2.js
>(snip)
> Line-20184 : 0[328140]:   GET /8Cvn5p7UVQ6YTKP9Tiy5Gw/js/omnitureH2.js HTTP/1.1
> Line-20184 : 0[328140]:   Host: secure.shared.live.com
>(snip)
> Line-20995 : 1480[3283c0]:   HTTP/1.1 200 OK
> Line-20996 : 1480[3283c0]:   Content-Type: application/x-javascript
>(snip)
> Line-20999 : 1480[3283c0]:   Server: Microsoft-IIS/6.0
>(snip)
> Line-21002 : 1480[3283c0]:   Content-Encoding: gzip
>(snip)
> Line-21005 : 1480[3283c0]:   Content-Length: 7196
>(snip)

Bug 363109 is relevant?
> Bug 363109 gzipped CSS and JS over SSL fails when reloading
"Script loader"(if such function exists) version of Bug 314689?
> Bug 314689 nsCSSLoader::Stop should cancel the necko channels
Comment 12 Ben 2009-03-04 08:18:51 PST
WADA – I really appreciate your analysis!  I must admit I’m a little out of my league with your findings.  I’ll continue to follow along, as I am highly interested in the outcome/resolution on this, but I’m not sure what else I can do to help.  Of course if there is something I can do, I will.

Where do we go from here?  Does anyone else have input?  Is this 'Ben's Bug'?
Comment 13 WADA 2009-03-06 20:40:23 PST
"Content-Encoding: gzip" seems to be illerevant.

Quick test result with Fx latest-trunk on MS Win-XP SP3.
(0) Open Error Console.
(1) Load this page (Bug 480619). text/html, No Content-Encoding: gzip.
(1-a) Execute Reload, Shift+Reload several times.
(1-b) Access pages of links described in step (4). 
(2) Clear Recent History, Remove: my entire history, Select all items.
(3) (3-a) Shift+Reload
    (3-b) Move to URLBar, Enter
    (3-c) Move to URLBar, Insert blank and remove it, Click arrow icon
          (Go to the address in the location bar)
    => Nothing is done
    After several retries of (3-a)/(3-b)/(3-c), this page was reloaded,
    and following error was displayed in Error Console.
> Warning: Unknown property '-moz-opacity'.  Declaration dropped.
> Source File: https://bugzilla.mozilla.org/skins/standard/yui/autocomplete.css Line: 7
>(Some warnings on CSS)
> Error: keywordAutoComplete.textboxFocusEvent is null
> Source File: https://bugzilla.mozilla.org/show_bug.cgi?id=480619 Line: 1503
(4) "Save Link As" or "Ctrl+Click" of https links.
     HTTPS links in this bug.
      (A) Two attachment links of this bug
          https: //bugzilla.mozilla.org/attachment.cgi?id=nnnnnn
          => 302 moved
          => https: //bug480619.bugzilla.mozilla.org/attachment.cgi?id=nnnnnn
      (A-1) First attachment, application/zip, No Content-Encoding: gzip 
      (A-2) Second attachment, text/plain, Content-Encoding: gzip
      (B) Two links to MS's http site
      (B-1) Link to ".js"  application/x-javascript, Content-Encoding: gzip
      (B-2) Link to ".css", text/css, No Content-Encoding: gzip
 
   (4-a) "Save Link As" => Following dialog was displayed.
> Download Error
>  The download cannot be saved because an unknown error occurred.
>  Please Try again.
   (4-b) Ctrl+Click => Nothing is done(blank page is displayed)
    => After several retries of (4-a)/(4-b), download or display was done.
Above phenomena in step (4) couldn't be observed for http links in this bug.

Confirming, based on my test result in Comment #8.
Comment 14 WADA 2009-03-06 21:21:31 PST
s/(B) Two links to MS's http site/(B) Two links to MS's https site/
Comment 15 Daniel Veditz [:dveditz] 2009-04-13 14:54:12 PDT
Is this flaw limited to 1.9.0? I don't see any comments saying anyone's tried it on 1.9.1
Comment 16 Boris Zbarsky [:bz] 2009-04-13 18:29:05 PDT
So for the second GET mentioned in comment 8, here's the issue:

1480[3283c0]:   PR_Write returned [n=-1]
1480[3283c0]: ErrorAccordingToNSPR [in=-5929 out=80004005]
1480[3283c0]: nsSocketTransport::OnMsgOutputClosed [this=3dce8e0 reason=80004005]
1480[3283c0]:   ReadSegments returned [rv=0 read=0 sock-cond=80004005]
1480[3283c0]: nsHttpConnection::CloseTransaction[this=2e38f40 trans=3a6b080 reason=80004005]
1480[3283c0]: nsHttpTransaction::Close [this=3a6b080 reason=80004005]

0x80004005 is NS_ERROR_FAILURE.  Not helpful...  But -5929 is PR_SOCKET_SHUTDOWN_ERROR.

I thought we had an existing bug on this sort of thing, where already-closed sockets got reused....  Ehsan ran into it with private browsing, no?
Comment 17 WADA 2009-04-13 18:56:33 PDT
(In reply to comment #15)
> Is this flaw limited to 1.9.0? I don't see any comments saying anyone's tried it on 1.9.1

Sorry for my unclear comments on Fx version. "Fx latest-trunk" in my comment is nightly trunk build downloaded from;
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/
Example : "Fx latest-trunk" 2009/3/19 build
> Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090319 Minefield/3.6a1pre

I've changed Version: to Trunk.
Comment 18 WADA 2009-04-14 16:09:52 PDT
Following is open bugs of "HTTP && Socket" in summary(changed within one year).
Boris, Bug 470055? Bug 448575 is relevant too? 

> Bug 255141 PSM attempts to access socket notificationCallbacks on main thread ...(snip)
> Bug 448575 nsHttpChannel silently eats socket errors that occur after the response body is being received
> Bug 450078 HTTP1.1 Keep-Alive mechanism and having a small WEB-Server with only 2 open sockets. -> Not all elements are shown.
> Bug 470055 pkix_HttpCertStore_FindSocketConnection reuses closed socket, OCSP fails
> Bug 481301 ASSERTION: wrong thread: 'PR_GetCurrentThread() == gSocketThread' [@ nsHttpConnection::ResumeRecv]
Comment 19 Boris Zbarsky [:bz] 2009-04-14 18:47:59 PDT
The bug I'm thinking of had a summary more like "SSL sites not loading"...
Comment 20 :Ehsan Akhgari (busy, don't ask for review please) 2009-04-14 20:40:17 PDT
(In reply to comment #19)
> The bug I'm thinking of had a summary more like "SSL sites not loading"...

Yeah, it's bug 463256.  See bug 463256 comment 21 for my analysis on the problem.  This is caused by the call to nsISecretDecoderRing::LogoutAndTeardown which happens in the clear private data dialog as well.

Bug 463256 comment 26 mentions the reason for this behavior.  We worked around it in private browsing by using the offline mode to shut down Necko sockets, but I'm not sure what would be a proper fix in this case.
Comment 21 Mike Beltzner [:beltzner, not reading bugmail] 2009-04-21 12:04:33 PDT
It's not a very common user action in Firefox, so I don't think this needs to block (especially if the fix is difficult) but would be willing to reconsider if someone comes up with a compelling use case. I'll let jst make a call about blocking.

It's not great.
Comment 22 :Gavin Sharp [email: gavin@gavinsharp.com] 2009-04-21 12:07:11 PDT
*** Bug 487403 has been marked as a duplicate of this bug. ***
Comment 23 Johnny Stenback (:jst, jst@mozilla.com) 2009-04-23 17:23:56 PDT
I'd really like to see this fixed for 1.9.1, but I can't say I'd hold the release for it. Bjarne, can you spend some time looking into this? If this is fixable in a safe way we'd consider taking the fix for 1.9.1 still. If you don't have time, please let me know.
Comment 24 Bjarne (:bjarne) 2009-04-24 08:45:17 PDT
Created attachment 374473 [details] [diff] [review]
Initial, brute force approach

Applying the same trick as for bug #463256 seems to make this work. Mochitests also run fine (except for the usual ones on my test-box).

Question is whether this approach is too heavy. It is probably possible to be more light-handed by e.g. re-initializing nsSocketTransportService in the "net:clear-active-logins" case in nsHttpHandler::Observe(). I haven't tried that yet but I'll try it out on Monday unless the current fix is acceptable.
Comment 25 Bjarne (:bjarne) 2009-04-24 08:48:04 PDT
Oh - forgot to mention that the patch is for trunk...
Comment 26 :Ehsan Akhgari (busy, don't ask for review please) 2009-04-24 10:57:38 PDT
(In reply to comment #24)
> Created an attachment (id=374473) [details]
> Initial, brute force approach
> 
> Applying the same trick as for bug #463256 seems to make this work. Mochitests
> also run fine (except for the usual ones on my test-box).

This is less than ideal.  I still feel bad about bug 463256... :-)  For a sample of a problem caused by this approach, see bug 488772.  We're really looking for a way around this, if possible.

Come to think of it, it's actually worse than the private browsing case, because this would cause the download manager / add-on manager, etc. to prompt if there are any downloads in progress.

> Question is whether this approach is too heavy. It is probably possible to be
> more light-handed by e.g. re-initializing nsSocketTransportService in the
> "net:clear-active-logins" case in nsHttpHandler::Observe(). I haven't tried
> that yet but I'll try it out on Monday unless the current fix is acceptable.

I'd very much like to see how this turns out.  Please let us know if that solves the problem in a clean way, so that I can move the private browsing service to do a similar thing instead of the current hack.  Thanks!
Comment 27 Johnny Stenback (:jst, jst@mozilla.com) 2009-04-24 16:08:34 PDT
Comment on attachment 374473 [details] [diff] [review]
Initial, brute force approach

Clearing review request while waiting to see if the smaller hammer approach is acceptable here :)
Comment 28 Bjarne (:bjarne) 2009-04-25 15:19:06 PDT
Created attachment 374616 [details] [diff] [review]
Lighter version

This version is "lighter", but it still shuts down and restarts the STS thread.

(In reply to comment #26)
> Come to think of it, it's actually worse than the private browsing case,
> because this would cause the download manager / add-on manager, etc. to prompt
> if there are any downloads in progress.

User requested to clear private data, including "authenticated sessions", and should imo not be shell shocked if active connections are also terminated. :)  Maybe we could even make the GUI indicate this by e.g. changing the text in the dialogue from "Authenticated sessions" to "Authenticated sessions and active connections" ?

However, I agree with bug #488772 in that it would be more elegant to shut down SSL connections only. In order to implement this we need something like the two loops in nsSocketTransportService::Run() at lines 588++ in nsSocketTransportService2.cpp, but we also need to figure out from SocketContext whether a socket is ssl. Then we must stuff this into a method DetachSSLSockets() or similar and either expose it via a suitable API or implement an extra action in nsSocketTransportService::Observe() (I prefer the latter). No black magic, but it might be a little tricky to deduce ssl from the SocketContext-object.

I'll give this a whack on Monday unless the current patch with above arguments about users being kept responsible for their actions are acceptable. :)
Comment 29 Henrik Skupin (:whimboo) 2009-04-27 07:37:35 PDT
See also bug 474364 where a similar thing happens on restart. I do not clear private data on shutdown, so those pages are still in the cache.
Comment 30 Bjarne (:bjarne) 2009-04-27 14:45:26 PDT
Created attachment 374817 [details] [diff] [review]
Even lighter version...

I think this is about as light as we can get...

Upon receiving the "net:clear-active-logins" msg, nsHttpHandler dispatches a new msg with topic "net:detach-nss-sockets". nsSocketTransportService reacts to this by dispatching an event which will detach sockets in the NSS layer (running on the socket-thread). We could make this simpler by letting nsSocketTransportService listen directly to "net:clear-active-logins", but if someone else wants to use this mechanism to close down NSS-sockets we need to keep the two topics separated in order to avoid undesired side-effects.

A few things about this implementation should be mentioned (input is appreciated)

- The string "net:detach-nss-sockets" is harcoded - is there a natural place for constants like this? It is shared between nsHttpHandler and nsSocketTransportService...

- I think it's safe for nsDetachNssSocketsEvent to keep the reference to nsSocketTransportService as a raw pointer, but someone with stronger xpcom-fu may not share this opinion...  :]

- There should be a better way to filter the sockets than comparing layer-name with "NSS layer" - ideas are welcome here

Any ideas about how to set up a testcase for this is also appreciated...
Comment 31 Matthias Versen [:Matti] 2009-04-28 06:32:20 PDT
*** Bug 490457 has been marked as a duplicate of this bug. ***
Comment 32 Johnny Stenback (:jst, jst@mozilla.com) 2009-04-30 17:18:40 PDT
Comment on attachment 374817 [details] [diff] [review]
Even lighter version...

I'm afraid I'm not a good reviewer for this change. Jason, wanna dig into this one?
Comment 33 Matthias Versen [:Matti] 2009-05-12 05:44:40 PDT
*** Bug 492495 has been marked as a duplicate of this bug. ***
Comment 34 :Ehsan Akhgari (busy, don't ask for review please) 2009-05-12 08:21:09 PDT
Thanks for the patch, it looks really great! :-)

(In reply to comment #30)
> Any ideas about how to set up a testcase for this is also appreciated...

Hmm, I would assume that you would open two channels, one to a http:// location and one to a https:// location, and send out "net:clear-active-logins" and verify that the http channel lives on but the https one aborts.  Does that make sense?
Comment 35 Matthias Versen [:Matti] 2009-05-18 06:51:18 PDT
*** Bug 493529 has been marked as a duplicate of this bug. ***
Comment 36 Bjarne (:bjarne) 2009-05-18 09:56:13 PDT
> Hmm, I would assume that you would open two channels, one to a http:// location
> and one to a https:// location, and send out "net:clear-active-logins" and
> verify that the http channel lives on but the https one aborts.  Does that make
> sense?

Makes sense, but how do I check that a channel "lives"? Persistent connections?
Comment 37 :Ehsan Akhgari (busy, don't ask for review please) 2009-05-19 01:45:29 PDT
(In reply to comment #36)
> > Hmm, I would assume that you would open two channels, one to a http:// location
> > and one to a https:// location, and send out "net:clear-active-logins" and
> > verify that the http channel lives on but the https one aborts.  Does that make
> > sense?
> 
> Makes sense, but how do I check that a channel "lives"? Persistent connections?

I'm not really sure.  A dumb trick would be to use the stuff added in bug 396226 and feed some different data through the connection after observing the notification, and on the HTTP client part expect (or don't expect) that data.

Of course there may be a better way to do that using nsIChannel wizardry.  :-)
Comment 38 Bjarne (:bjarne) 2009-05-19 14:40:12 PDT
Not really clear to me how to test this... :)  I'll think about it some more - maybe the server written for bug #486769 could be used, but I need to figure out how to support https on that one first.

Any chance to get a review on the patch while we're thinking about testing?
Comment 39 Jason Duell [:jduell] (needinfo? me) 2009-05-21 17:27:22 PDT
Comment on attachment 374817 [details] [diff] [review]
Even lighter version...

Looks good.  A much lighter touch than the previous fixes :)

Boris, this seems fairly straightforward, but if you're not comfortable reviewing this code, let me know and I can ping biesi.
Comment 40 Boris Zbarsky [:bz] 2009-05-21 18:47:29 PDT
This seems ok, with the following nits:

1)  Use nsRunnableMethod and just add a method that calls the detaching
    function.  That will incidentally fix the fact that you're not holding a
    strong reference, which I do think is a bug.
2)  Space after ',' when calling strcmp.
3)  I assume that things work ok if the event fires after xpcom shutdown?  If
    not, you presumably need to revoke it then.
Comment 41 Bjarne (:bjarne) 2009-05-22 03:07:54 PDT
Created attachment 379101 [details] [diff] [review]
Updated according to comments from reviewer

> 1)  Use nsRunnableMethod and just add a method that calls the detaching
>     function.  That will incidentally fix the fact that you're not holding a
>     strong reference, which I do think is a bug.

Ok. Using nsRefPtr - is that ok in this situation?

> 2)  Space after ',' when calling strcmp.

Right...

> 3)  I assume that things work ok if the event fires after xpcom shutdown?  If
>     not, you presumably need to revoke it then.

Uhh... what? Is that a potential scenario? Any hints or pointers to code/tests which checks for this?

There are two things I would like to change in this patch, but I'm depending on input on how to achieve it :

1) I would like to identify sockets without using the name of the socket-layer (last issue in comment #30)

2) If it's possible to parametrize the call to a nsRunnableMethod, the function nsSocketTransportService::DetachNssSockets() could be dropped (making code-size smaller)

And, of-course, ideas for how to write a test for this would be nice. Unfortunately, I don't think the approach from comment #37 will do the trick.
Comment 42 Boris Zbarsky [:bz] 2009-05-22 10:27:56 PDT
> Ok. Using nsRefPtr - is that ok in this situation?

I'd use nsCOMPtr, since the type is nsIRunnable.  And probably NS_NEW_RUNNABLE_METHOD.

> Uhh... what? Is that a potential scenario? 

Well.. what prevents it?  You put an event into an event queue.  What guarantees that it will fire before we fire the xpcom-shutdown notification, if there is already an event in the queue that will try to shutdown?

I'm not saying it's definitely possible.  I'm saying we either need to prove it's impossible or handle it.  ccing Benjamin for his shutdown fu.

I have no idea how to deal with your issue (1).  There is no way to parametrize nsRunnableMethod, but I think the codesize is still smaller than having a separate class...
Comment 43 Benjamin Smedberg [:bsmedberg] 2009-05-22 10:34:29 PDT
https://wiki.mozilla.org/XPCOM_Shutdown

runnables can fire after xpcom-shutdown and xpcom-shutdown-threads until the event queue is cleared. The component manager should be available during this time, but various modules will have begun to shut themself down.
Comment 44 Christian :Biesinger (don't email me, ping me on IRC) 2009-05-25 05:41:02 PDT
Why do we need special code for this, especially hardcoding NSS in the STS? In particular: We handle servers closing the connection to an idle socket, and connection resets, etc.

So the specific question is, I guess: Why do we attempt to reuse this socket?

The HTTP code attempts to make sure that the socket is still alive before reusing it:
PRBool
nsHttpConnection::CanReuse()
{
    return IsKeepAlive() && (NowInSeconds() - mLastReadTime < mIdleTimeout)
                         && IsAlive();
}

And IsAlive() basically calls nsSocketTransport::IsAlive, which basically calls PR_Recv. Shouldn't that be returning an error here, making us consider the socket to be dead? Similarly, shouldn't the STS loop polling the sockets already notice that the socket is in an error state?
Comment 45 Bjarne (:bjarne) 2009-05-25 07:49:37 PDT
So, question may be why PR_Read() called from IsAlive() succeeds while PR_Write() fails?

PR_Recv() from IsAlive() calls nsSSLThread::requestRecvMsgPeek() , which do not take conditions PK11LoggedOut / AlreadyShutDown into account (as opposed to nsSSLThread::requestWrite() and nsSSLThread::requestRead()). I'll try to follow this track for a little while and see if it resolves the original issue...
Comment 46 Bjarne (:bjarne) 2009-05-25 13:10:05 PDT
Created attachment 379598 [details] [diff] [review]
Ultra-simple version derived from comments by biesi

I must admit that this approach feels more elegant...  :)  The observation is that the call to PR_Recv() from IsAlive() is made with parameter PR_MSG_PEEK, which translates into a call to nsSSLThread::requestRecvMsgPeek(). This patch adds logic to nsSSLThread::requestRecvMsgPeek() for certain states and sets an appropriate error-code on the socket.

Works nicely with the test-URL for this defect but I haven't checked the duplicated cases. Probably kaie or MisterSSL are more appropriate reviewers, but I'll ask bz initially.
Comment 47 Christian :Biesinger (don't email me, ping me on IRC) 2009-05-26 01:48:09 PDT
Does the poll code require fixing too? It seems like it should report an EXCEPT flag for these sockets, and if it does that then the socket transport service would already notice that the socket has an error, instead of waiting for HTTP to detect it. (and it would improve things for stuff like Chatzilla too)
Comment 48 Bjarne (:bjarne) 2009-05-26 07:12:06 PDT
Created attachment 379680 [details] [diff] [review]
Added similar logic to poll-method

Assuming previous comment was referring to nsSSLThread::requestPoll().

This patch is a "little more aggressive" than previous patch since it now triggers and returns on bad conditions very early in the method-call. Not sure about this, but it's worth a try...

Tested with test-URL for this defect and also by browsing some bank-sites which use https.
Comment 49 Christian :Biesinger (don't email me, ping me on IRC) 2009-05-26 07:41:40 PDT
Comment on attachment 379680 [details] [diff] [review]
Added similar logic to poll-method

I'm not qualified to review PSM code.

But yeah, looks like I did refer to nsSSLThread::requestPoll (via nsNSSIOLayer.cpp:nsSSLIOLayerPoll)

I'm not sure that keeping the LOG in nsSocketTransport is worth keeping...
Comment 50 Boris Zbarsky [:bz] 2009-05-26 12:52:34 PDT
Do you still want my review on the " Updated according to comments from reviewer" diff?
Comment 51 Bjarne (:bjarne) 2009-05-27 00:02:53 PDT
Comment on attachment 379101 [details] [diff] [review]
Updated according to comments from reviewer

Cleared request for review
Comment 52 Bjarne (:bjarne) 2009-05-27 00:06:24 PDT
Comment on attachment 379680 [details] [diff] [review]
Added similar logic to poll-method

Trying new reviewer for PSM code...
Comment 53 Jason Duell [:jduell] (needinfo? me) 2009-06-02 18:30:41 PDT
From my glance over how this bug has evolved, it sounds like the latest method proposed here might also fix the necko problems in bug 463256.  I.e., we might be able to avoid shutting down and restarting the whole necko layer.   Ehsan, does that sounds like a possibility?
Comment 54 :Ehsan Akhgari (busy, don't ask for review please) 2009-06-03 03:06:45 PDT
(In reply to comment #53)
> From my glance over how this bug has evolved, it sounds like the latest method
> proposed here might also fix the necko problems in bug 463256.  I.e., we might
> be able to avoid shutting down and restarting the whole necko layer.   Ehsan,
> does that sounds like a possibility?

Yes, it almost certainly does.
Comment 55 Bjarne (:bjarne) 2009-06-03 03:11:39 PDT
Looks like kaie is busy. Feel free to suggest reviewers for this patch...
Comment 56 :Ehsan Akhgari (busy, don't ask for review please) 2009-06-03 03:45:39 PDT
Comment on attachment 379680 [details] [diff] [review]
Added similar logic to poll-method

Maybe dveditz could help out with the review here?
Comment 57 Boris Zbarsky [:bz] 2009-06-03 04:46:25 PDT
Have you e-mailed Kai?
Comment 58 Kai Engert (:kaie) 2009-06-03 10:55:28 PDT
Comment on attachment 379680 [details] [diff] [review]
Added similar logic to poll-method

Thanks to everyone for your research, thoughts and this patch. It looks reasonable.

r=kaie
Comment 59 Bjarne (:bjarne) 2009-06-03 15:11:33 PDT
Created attachment 381395 [details] [diff] [review]
Cleaned-up version of patch ready for check-in

Removed LOG as suggested by biesi, added meta-information
Comment 60 Bjarne (:bjarne) 2009-06-03 15:14:00 PDT
Requesting check-in to bake it on trunk. No test available - ideas for how to make one are appreciated.
Comment 61 Boris Zbarsky [:bz] 2009-06-03 15:15:26 PDT
Comment on attachment 381395 [details] [diff] [review]
Cleaned-up version of patch ready for check-in

Need trunk approval....
Comment 62 Henrik Skupin (:whimboo) 2009-06-03 16:20:51 PDT
Not fixed yet. Please leave relnote as it is.
Comment 63 Christian :Biesinger (don't email me, ping me on IRC) 2009-06-07 14:48:47 PDT
(In reply to comment #60)
> Requesting check-in to bake it on trunk. No test available - ideas for how to
> make one are appreciated.

Well, we support HTTPS now in the framework, right? So make an HTTPS request, do whatever clear private data does that breaks sockets, and make another request and see if that works.

Could do this on the socket transport level too - open a socket transport stream to the SSL port, do whatever clear private data does, and see if the stream got notified (e.g. the asyncWait callback got called, etc)
Comment 64 Mike Beltzner [:beltzner, not reading bugmail] 2009-06-11 00:31:05 PDT
(no longer need approval for trunk)
Comment 65 Boris Zbarsky [:bz] 2009-06-12 19:11:21 PDT
Pushed http://hg.mozilla.org/mozilla-central/rev/98c455c298d5

Still need tests here.
Comment 66 Henrik Skupin (:whimboo) 2009-06-14 10:33:23 PDT
Verified fixed on trunk with builds on all platforms like Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090613 Minefield/3.6a1pre (.NET CLR 3.5.30729) ID:20090613042114

Boris, can we completely automate those tests or do we also need a Litmus or Mozmill test?
Comment 67 Boris Zbarsky [:bz] 2009-06-14 15:01:23 PDT
The idea would be to have automated tests.
Comment 68 Christian :Biesinger (don't email me, ping me on IRC) 2009-06-14 15:27:07 PDT
I described in comment 63 how an automated test could be written.
Comment 69 Mike Beltzner [:beltzner, not reading bugmail] 2009-06-15 21:05:05 PDT
Comment on attachment 381395 [details] [diff] [review]
Cleaned-up version of patch ready for check-in

(Not going to take this for 1.9.1, minusing the approval flag)
Comment 70 :Ehsan Akhgari (busy, don't ask for review please) 2009-07-01 05:35:29 PDT
(In reply to comment #63)
> (In reply to comment #60)
> > Requesting check-in to bake it on trunk. No test available - ideas for how to
> > make one are appreciated.
> 
> Well, we support HTTPS now in the framework, right? So make an HTTPS request,
> do whatever clear private data does that breaks sockets, and make another
> request and see if that works.
> 
> Could do this on the socket transport level too - open a socket transport
> stream to the SSL port, do whatever clear private data does, and see if the
> stream got notified (e.g. the asyncWait callback got called, etc)

Would a test like what I this help here?

<http://mxr.mozilla.org/mozilla-central/source/browser/components/privatebrowsing/test/browser/browser_privatebrowsing_sslsite_transition.js>
Comment 71 bernhard krumm 2009-07-01 22:40:02 PDT
CSS is also missing after relaoding TABS when updating to 3.5

I opened the "Firefox 3.5 for developers" page 
( https://developer.mozilla.org/en/Firefox_3.5_for_developers )

then I upgraded to 3.5

after/when upgrading the opened page was reloaded without CSS.
a 2nd load in another TAB was correct with CSS 
a refresh in first TAB also helped to load CSS
Comment 72 Henrik Skupin (:whimboo) 2009-07-02 01:19:22 PDT
Bernhard this is more bug 474364.
Comment 73 :Ehsan Akhgari (busy, don't ask for review please) 2009-07-12 09:37:40 PDT
Comment on attachment 381395 [details] [diff] [review]
Cleaned-up version of patch ready for check-in

See bug 496335 comment 7.
Comment 74 Christian :Biesinger (don't email me, ping me on IRC) 2009-07-12 09:54:40 PDT
(In reply to comment #70)
> Would a test like what I this help here?
> 
> <http://mxr.mozilla.org/mozilla-central/source/browser/components/privatebrowsing/test/browser/browser_privatebrowsing_sslsite_transition.js>

Not really. What was wrong with the approach I suggested?
Comment 75 :Ehsan Akhgari (busy, don't ask for review please) 2009-07-12 10:03:56 PDT
(In reply to comment #74)
> (In reply to comment #70)
> > Would a test like what I this help here?
> > 
> > <http://mxr.mozilla.org/mozilla-central/source/browser/components/privatebrowsing/test/browser/browser_privatebrowsing_sslsite_transition.js>
> 
> Not really. What was wrong with the approach I suggested?

Hmm, I think it implements your first suggestion:

> Well, we support HTTPS now in the framework, right? So make an HTTPS request,
> do whatever clear private data does that breaks sockets, and make another
> request and see if that works.

In this case, toggling the private browsing mode is the action which breaks sockets.
Comment 76 Christian :Biesinger (don't email me, ping me on IRC) 2009-07-12 10:09:23 PDT
That test is at a much higher level than a necko test should be. Hm... though I guess this shouldn't be a Necko test anyway, as it was a bug in PSM, so this should actually be in security/manager somewhere.

My point is: this should be a test using channels/sockets and nsISecretDecoderRing, not a test that uses docshells and nsIPrivateBrowsingService. It also should probably be an xpcshell test.

>> Well, we support HTTPS now in the framework, right? So make an HTTPS request,
>> do whatever clear private data does that breaks sockets, and make another
>> request and see if that works.
>In this case, toggling the private browsing mode is the action which breaks
>sockets.

Kinda. What actually breaks sockets is logging out of/tearing down the secret decoder ring, i.e.:
292         let sdr = Cc["@mozilla.org/security/sdr;1"].
293                   getService(Ci.nsISecretDecoderRing);
294         sdr.logoutAndTeardown();
Comment 77 :Ehsan Akhgari (busy, don't ask for review please) 2009-07-12 11:09:11 PDT
Created attachment 388146 [details] [diff] [review]
On my way to a test

Here is what I have so far.  Based on comment 63, I expected at least one of the reader or writer to get called, but none did.  What am I doing wrong?
Comment 78 Christian :Biesinger (don't email me, ping me on IRC) 2009-07-14 01:48:17 PDT
hm... I think that code should work. if you remove the logoutAndTeardown call, does that make a difference?
Comment 79 :Ehsan Akhgari (busy, don't ask for review please) 2009-07-14 04:38:33 PDT
Created attachment 388452 [details] [diff] [review]
Test

OK, my mistake was to use example.org:443.  That only becomes meaningful through the proxy set up by the test harness as a pac file.  For this test we should use the SSL server directly, that is, localhost:4443.

Asking review from biesi to confirm that this test is testing the correct thing, and from kaie as a security peer.
Comment 80 Boris Zbarsky [:bz] 2009-07-14 12:00:39 PDT
The test server isn't guaranteed to be running on port 4443, is it?
Comment 81 :Ehsan Akhgari (busy, don't ask for review please) 2009-07-15 01:05:21 PDT
(In reply to comment #80)
> The test server isn't guaranteed to be running on port 4443, is it?

I'm not sure if this currently guaranteed or not, but it is hardcoded:

<http://mxr.mozilla.org/mozilla-central/source/build/automation.py.in#321>
Comment 82 Boris Zbarsky [:bz] 2009-07-15 22:07:44 PDT
I thought we had support for our test stuff running on a different port if the one it wants is in use.  Did that not land?
Comment 83 Jeff Walden [:Waldo] (remove +bmo to email) 2009-07-15 22:35:03 PDT
That's reftest-only, also much easier there since the server is in-process.  ssltunnel would need to communicate the port it chose back to the Python script to be splatched into a user.js, and that's not quite as easy.

I'm confused as to why the raw port must be used, but do note that https://example.org:443 is fail because it's not in the set of mappings, so naturally it wouldn't work for that choice:

http://mxr.mozilla.org/mozilla-central/source/build/pgo/server-locations.txt
Comment 84 :Ehsan Akhgari (busy, don't ask for review please) 2009-07-17 12:38:35 PDT
Created attachment 389185 [details] [diff] [review]
Test (v2)

(In reply to comment #83)
> I'm confused as to why the raw port must be used, but do note that
> https://example.org:443 is fail because it's not in the set of mappings, so
> naturally it wouldn't work for that choice:

Thanks for the note.  I'm using the raw port because the socket transport layer does not use the proxy pac file.  I tested and made sure that the example.com:443 combination can't be used in this test as well.

This version of the patch is a bit better because it makes sure that the socket gets dropped after the teardown operation.
Comment 85 Mike Beltzner [:beltzner, not reading bugmail] 2009-07-22 17:32:36 PDT
Comment on attachment 381395 [details] [diff] [review]
Cleaned-up version of patch ready for check-in

Please renominate when you have the tests approved.
Comment 86 :Ehsan Akhgari (busy, don't ask for review please) 2009-08-02 07:00:13 PDT
Pinging biesi and kaie for the test review...
Comment 87 Kai Engert (:kaie) 2009-08-20 03:10:47 PDT
Comment on attachment 389185 [details] [diff] [review]
Test (v2)

moa=kaie
Comment 88 :Ehsan Akhgari (busy, don't ask for review please) 2009-08-20 07:59:24 PDT
biesi: ping?
Comment 89 Matthias Versen [:Matti] 2009-08-20 09:18:27 PDT
*** Bug 511631 has been marked as a duplicate of this bug. ***
Comment 90 Christian :Biesinger (don't email me, ping me on IRC) 2009-08-21 04:17:34 PDT
Comment on attachment 389185 [details] [diff] [review]
Test (v2)

sorry for the delay

+  onInputStreamReady: function(stream) {
+    SimpleTest.ok(tearDown, "The stream should be closed after a teardown of secure decoder ring");

You should probably check that the stream has an error state. I.e. try reading from it and see if it throws the desired exception (or, just call available() and see if it throws the expected exception)

+var writer = {
+  onOutputStreamReady: function(stream) {
+  }
+};

you don't need this, do you?

+      getService(Ci.nsISecretDecoderRing).
+      logoutAndTeardown();

can you indent these two lines, so that it's obvious that they are continuation lines?

+outStream.asyncWait(writer, 0, 1, currentThread);

no need for this

actually you don't even need to open the output stream
Comment 91 :Ehsan Akhgari (busy, don't ask for review please) 2009-08-21 06:45:21 PDT
Created attachment 395829 [details] [diff] [review]
Test for check-in

Addressed all of the review comments on the test.  I have submitted this to the try server, and after a green unit test run, I'll land it on m-c.
Comment 92 Christian :Biesinger (don't email me, ping me on IRC) 2009-08-21 06:49:30 PDT
+    try {
+      stream.available();
+    }

Shouldn't you add a:

  SimpleTest.ok(!tearDown, "Stream should be in an error state");

after this call?
Comment 93 :Ehsan Akhgari (busy, don't ask for review please) 2009-08-21 07:56:44 PDT
(In reply to comment #92)
> +    try {
> +      stream.available();
> +    }
> 
> Shouldn't you add a:
> 
>   SimpleTest.ok(!tearDown, "Stream should be in an error state");
> 
> after this call?

I make sure a NS_ERROR_FAILURE is thrown in the catch block, otherwise the test will time out.  Isn't that enough?
Comment 94 Christian :Biesinger (don't email me, ping me on IRC) 2009-08-21 10:22:05 PDT
I think it's better to be explicit about the fact that an unthrown exception is a failure.
Comment 95 :Ehsan Akhgari (busy, don't ask for review please) 2009-08-22 03:32:01 PDT
The test is failing on Linux.  Here is what happens: on Linux, the sink's onTransportStatus method is never called, so the secure decoder ring is never torn down.  After an amount of time which is longer than Windows, reader's onInputStreamReady is called, and calling available on the stream passed to it throws a NS_ERROR_NET_RESET error.

Interestingly, the connection on port 4443 is actually established during the test:

ehsan@ehsan-laptop:~/moz/src$ sudo netstat -p | grep ^tcp
tcp        0      0 ehsan-laptop:8888       ehsan-laptop:37426      TIME_WAIT   -               
tcp        0      0 ehsan-laptop:8888       ehsan-laptop:37424      TIME_WAIT   -               
tcp        0      0 ehsan-laptop:37423      ehsan-laptop:8888       TIME_WAIT   -               
tcp        0      0 ehsan-laptop:4443       ehsan-laptop:48014      ESTABLISHED 25018/ssltunnel 
tcp        0      0 ehsan-laptop:48014      ehsan-laptop:4443       ESTABLISHED 25019/firefox-bin
tcp        0      0 ehsan-laptop:8888       ehsan-laptop:37422      TIME_WAIT   -               
tcp        0      0 ehsan-laptop:8888       ehsan-laptop:37427      TIME_WAIT   -               

Do you have any idea why the socket transport service doesn't get notified about the connection being established on Linux?  This happens both on the try server and locally.
Comment 96 Christian :Biesinger (don't email me, ping me on IRC) 2009-08-22 04:16:20 PDT
Can you try setting the transport event sink before opening the input stream? That might possibly fix the problem (and is probably what should be done anyway)
Comment 97 :Ehsan Akhgari (busy, don't ask for review please) 2009-08-22 10:41:31 PDT
Thanks, that did fix the problem!  I'll submit another try server run to make sure that the test passes on all platforms before pushing.
Comment 98 :Ehsan Akhgari (busy, don't ask for review please) 2009-08-22 23:52:05 PDT
Test landed as:

<http://hg.mozilla.org/mozilla-central/rev/30ead1e19478>
Comment 99 :Ehsan Akhgari (busy, don't ask for review please) 2009-08-23 03:47:00 PDT
Test landed on 1.9.2 as well:

<http://hg.mozilla.org/releases/mozilla-1.9.2/rev/5b0d486603de>
Comment 100 Daniel Veditz [:dveditz] 2009-08-28 12:49:45 PDT
Comment on attachment 381395 [details] [diff] [review]
Cleaned-up version of patch ready for check-in

Approved for 1.9.1.4, a=dveditz for release-drivers
Comment 101 :Ehsan Akhgari (busy, don't ask for review please) 2009-08-29 23:53:55 PDT
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/bcae819c3b43
Comment 102 Daniel Veditz [:dveditz] 2009-08-31 14:18:30 PDT
Comment on attachment 381395 [details] [diff] [review]
Cleaned-up version of patch ready for check-in

Approved for 1.9.0.15, a=dveditz for release-drivers
Comment 103 :Ehsan Akhgari (busy, don't ask for review please) 2009-09-01 10:44:21 PDT
The SSL tunnel is not available on the 1.9.0.x branch, so it is not possible to test this on that branch.  Is it OK to land it there without the test?
Comment 104 Samuel Sidler (old account; do not CC) 2009-09-01 10:55:48 PDT
Yes, that's fine.
Comment 105 :Ehsan Akhgari (busy, don't ask for review please) 2009-09-01 12:42:26 PDT
Attachment 381395 [details] [diff] landed on 1.9:

Checking in security/manager/ssl/src/nsSSLThread.cpp;
/cvsroot/mozilla/security/manager/ssl/src/nsSSLThread.cpp,v  <--  nsSSLThread.cpp
new revision: 1.17; previous revision: 1.16
Comment 106 Al Billings [:abillings] 2009-09-15 12:07:50 PDT
Verified fixed for 1.9.1.4 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.4pre) Gecko/20090915 Shiretoko/3.5.4pre (.NET CLR 3.5.30729).
Comment 107 Al Billings [:abillings] 2009-09-16 16:37:52 PDT
Verified fixed for 1.9.0.15 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.15pre) Gecko/2009091606 GranParadiso/3.0.15pre (.NET CLR 3.5.30729).
Comment 108 Honza Bambas (:mayhemer) 2009-09-17 12:10:04 PDT
(In reply to comment #103)
> The SSL tunnel is not available on the 1.9.0.x branch, so it is not possible to
> test this on that branch.  Is it OK to land it there without the test?

Is it possible to have a test for this on the trunk? in security/manager/ssl/tests/ or in content/something/tests/...
Comment 109 :Ehsan Akhgari (busy, don't ask for review please) 2009-09-17 13:45:07 PDT
(In reply to comment #108)
> (In reply to comment #103)
> > The SSL tunnel is not available on the 1.9.0.x branch, so it is not possible to
> > test this on that branch.  Is it OK to land it there without the test?
> 
> Is it possible to have a test for this on the trunk? in
> security/manager/ssl/tests/ or in content/something/tests/...

I have already written and landed a test for it: <http://mxr.mozilla.org/mozilla-central/source/security/manager/ssl/tests/mochitest/bugs/test_bug480619.html>.  It also exists in 1.9.1 and 1.9.2.
Comment 110 Mike Beltzner [:beltzner, not reading bugmail] 2009-10-30 13:13:06 PDT
Removing relnote
Comment 111 Henrik Skupin (:whimboo) 2009-10-30 13:43:07 PDT
Already verifed fixed on 1.9.2 with an alpha build (comment 66). Adding verified1.9.2 keyword.

Note You need to log in before you can comment on or make changes to this bug.