Closed Bug 480619 Opened 16 years ago Closed 15 years ago

Firefox does not load CSS and images when loading a secure (SSL, https) page for the first time (or after ‘clear private data’) and requires refresh to load properly.

Categories

(Core :: Networking, defect)

defect
Not set
major

Tracking


VERIFIED FIXED
mozilla1.9.2a1
Tracking Status
status1.9.2 --- beta1-fixed
blocking1.9.1 --- .4+
status1.9.1 --- .4-fixed

People

(Reporter: bclock, Assigned: bjarne)

References


Details

(Keywords: verified1.9.0.15, verified1.9.1, verified1.9.2)

Attachments

(5 files, 8 obsolete files)

99.45 KB, application/zip
Details
11.11 KB, text/plain
Details
5.91 KB, patch
Details | Diff | Splinter Review
1.69 KB, patch
beltzner
: approval1.9.1-
Details | Diff | Splinter Review
3.35 KB, patch
Details | Diff | Splinter Review
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5

After you navigate away from the secure site you are attempting to view, 'clear private data' (including Cache, Cookies, Offline Website Data, and Authenticated Sessions), and then visit that secure page again (possibly for the first time), roughly 50% of the time the page loads without CSS and images (you may need to repeat this process a few times to see the issue). Initially, I thought this was an internal server problem related to calling an external CSS file. However, everyone who has tested this on home machines (outside of our office) gets the same results. I have also tried several workarounds, including: embedding the CSS (which ended with more of the page loading, but still not all of it), moving images to a different secure server (no change), and calling the external CSS twice (no change). The only way we have found to ensure the page loads properly is to detect the browser and have the page load twice (essentially a refresh) for visitors using Firefox.

Also, I have found that when first attempting to load specific secure images, nothing happens upon hitting Enter or clicking the green Go arrow to start loading the page. For example, if after you navigate away from the secure site (in this case Google) you 'clear private data' (including Cache, Cookies, Offline Website Data, and Authenticated Sessions), put https://www.google.com/accounts/adwords/images/adwords_logo.gif into the address bar and then hit Enter or the green Go arrow, every so often the browser will just sit idle (you may need to repeat this process a few times to see the issue). After placing your cursor in the address bar and hitting Enter again, or hitting the green Go button again, the image will then load.
I think this is somehow related to the errors that occur during the load of a full page. I've also seen this happen to entire webpages such as https://hitchingpost.readyhosting.com/secure/contact.htm.

Reproducible: Sometimes

Steps to Reproduce:
1. In Firefox, navigate away from the site on which you will be testing a secure page.
2. "Clear Private Data…" (including Cache, Cookies, Offline Website Data, and Authenticated Sessions).
3. Visit a secure page (especially one that relies heavily on CSS design).

Actual Results: Webpage loads, but without CSS and images.
Expected Results: Full webpage load, with CSS and images.
Version: unspecified → 3.0 Branch
Do you get this in Firefox Safe Mode? http://support.mozilla.com/en-US/kb/Safe+Mode
Yes, I get the same behavior.
Component: General → Networking
Product: Firefox → Core
QA Contact: general → networking
Version: 3.0 Branch → 1.9.0 Branch
(In reply to comment #0)
> Gecko/2008120122 Firefox/3.0.5
When (i.e. from which Fx version) did your problem start to occur frequently? After an upgrade from a previous 3.0.x to the current 3.0.5? Or after an upgrade from Fx 2 to Fx 3? If the latter, reduce the following value to 2. Does the frequency of the problem decline?
> network.http.max-persistent-connections-per-server (default: Fx 2=2, Fx 3=6)
P.S. – Can anyone duplicate what my peers and I are experiencing here?

Matti – I've attached a networking log. Note, I took the following steps to get the error.
1. Opened Firefox (my homepage is google.com).
2. Went to the secure URL noted in the bug info (https://signup.live.com/signup.aspx). The error did not occur.
3. Went back to my homepage.
4. Cleared my private data (including Cache, Cookies, Offline Website Data, and Authenticated Sessions).
5. (ERROR) Went back to the secure URL noted. The error occurred here (the browser did not load CSS and images).
6. Exited the browser.

WADA – I found the error in 3.0.5 toward the end of January. I'm not exactly sure when this work machine was upgraded to 3.0.5 (I'll check with our guys here to see if we have a log of that upgrade). I also have a laptop at home with Firefox 2 on it, and I am able to reproduce the error in version 2 as well.
WADA – I just realized what you were saying about reducing that value to 2 in the config. So I reduced that value to 2 in Fx 3.0.5 and now the error seems to be happening 100% of the time.
(In reply to comment #7)
> So I reduced that value to 2 in Fx 3.0.5 and now the error seems to be happening 100% of the time.
The IIS server looks tolerant of persistent connections=6. Please change back to the default.

(In reply to comment #7)
>(snip)
> 3. Went back to my homepage.
> 4. Cleared my private data (including Cache, Cookies, Offline Website Data, and Authenticated Sessions).
> 5. (ERROR) Went back to the secure URL noted. The error occurred here (browser did not load CSS and images).
>(snip)
> https://secure.wlxrs.com/yDLmcF2FzGTerKJw2Us!q0tI0qMxoipUtbqnHdX2IWd-W!MqS7tFT1jrjehtb5LNlvZlJHl-Mrc/Base/14.3.8033/hig+signup.css

There is no log of "200 OK" for the second HTTP GET of the CSS file (Step 5, back just after clearing the cache). What's wrong?

(A) First HTTP GET in the log (Step 2, before clearing the cache). A log entry for "200 OK" exists.
> Line-6262 : [328140]: GET /yDLmcF2FzGTerKJw2Us!q0tI0qMxoipUtbqnHdX2IWd-W!MqS7tFT1jrjehtb5LNlvZlJHl-Mrc/Base/14.3.8033/hig+signup.css HTTP/1.1
> Line-6263 : [328140]: Host: secure.wlxrs.com
>(snip)
> Line-7366 :HTTP/1.x 200 OK
> Line-7367 :Content-Length: 25570
> Line-7368 :Content-Type: text/css
>(snip)
> Line-7371 :Server: Microsoft-IIS/6.0
>(snip)

(B) Second HTTP GET in the log (Step 5, back just after clearing the cache). No log entry for "200 OK" exists.
> https://secure.wlxrs.com/yDLmcF2FzGTerKJw2Us!q0tI0qMxoipUtbqnHdX2IWd-W!MqS7tFT1jrjehtb5LNlvZlJHl-Mrc/Base/14.3.8033/hig+signup.css
> Line-18951 : GET /yDLmcF2FzGTerKJw2Us!q0tI0qMxoipUtbqnHdX2IWd-W!MqS7tFT1jrjehtb5LNlvZlJHl-Mrc/Base/14.3.8033/hig+signup.css HTTP/1.1
> Line-18952 : Host: secure.wlxrs.com
>(snip)
> No log of "200 OK"

The phenomenon (CSS is not applied) is reproduced by Fx 3.0.6 on MS Windows, with a single tab only, with the following two pages.
(1) Mozilla Firefox Central: http://en-us.www.mozilla.com/en-US/firefox/central/
(2) The Sign Up page of Windows Live. Go to http://home.live.com/ (Windows Live Home), click "Sign Up" => jumps to the next page. I used this URL (with search string).
> https://signup.live.com/signup.aspx?id=251248&ru=http%3a%2f%2fhome.live.com%2fdefault.aspx%3fnew%3dtrue&cp=2&mkt=en-US&rollrs=12&lic=1

The Sign Up page uses the following CSS file (no other CSS is used):
> https://secure.wlxrs.com/yDLmcF2FzGTerKJw2Us!q0tI0qMxoipUtbqnHdX2IWd-W!MqS7tFT1jrjehtb5LNlvZlJHl-Mrc/Base/14.3.8033/hig+signup.css
And the HTTP log for some images has the following Referer (the images are loaded by the CSS):
> Referer: https://secure.wlxrs.com/yDLmcF2FzGTerKJw2Us!q0tI0qMxoipUtbqnHdX2IWd-W!MqS7tFT1jrjehtb5LNlvZlJHl-Mrc/Base/14.3.8033/hig+signup.css

If the cookie is cleared, the client can't send a cookie. The server (IIS) / server application (PHP?) / JScript behaviour seems to be: when a cookie is not sent by the client,
(a) JScript doesn't generate <link rel="stylesheet"> in some situations, or
(b) JScript fails to generate <link rel> in some situations, or
(c) <link rel="stylesheet"> is generated and an HTTP GET is issued by Fx,
  (c-1) but the server application doesn't return the ".css" file (the server application ignores the HTTP GET for the .css file), or
  (c-2) but a delay occurs in returning the ".css" file data (timeout).
So, if the cache is cleared and (c) occurs, CSS can't be used. In your log, it looks like (c) occurs, although a Firefox CSS-caching issue is still suspected. ("No log of 200 OK for .css" doesn't always mean "no response from the server really".)

By the way, an image/jpeg file is sent with "Content-Encoding: gzip" by the server. This kind of header sometimes indicates a careless site (or cache server) configuration.
> Line-14013 : 480[3283c0]: HTTP/1.1 200 OK
> Line-14014 : 480[3283c0]: Cache-Control: no-cache
> Line-14015 : 480[3283c0]: Connection: close
> Line-14016 : 480[3283c0]: Date: Tue, 03 Mar 2009 05:04:19 GMT
> Line-14017 : 480[3283c0]: Pragma: no-cache
> Line-14018 : 480[3283c0]: Content-Type: image/jpeg
> Line-14019 : 480[3283c0]: Expires: Tue, 03 Mar 2009 05:03:19 GMT
> Line-14020 : 480[3283c0]: Server: Microsoft-IIS/6.0
> Line-14021 : 480[3283c0]: PPServer: PPV: 30 H: BAYIDSHIPV1C06 V: 0
> Line-14022 : 480[3283c0]: P3P: CP="DSP CUR OTPi IND OTRi ONL FIN"
> Line-14023 : 480[3283c0]: Set-Cookie: HIPChallenge=44fXR7y0nCeTSLMMU62AWoJ!ybeLq0SSqNZU*PF8BpU8QYybSfIUaswdLeGHMdrLkn3dO7qD0ydf0hDvvR2xU8VDKTumqkYODkYoViyVwK4SIWEA7ckYkal84F5CvslylANTYBH0lo354uoCUelrPRuQ6HAaSOLzC7KyeuZWrO3GgOGNeJGShkyzRRa!LdldjFHK2jVr*BVnA7rgUwfESkzx3npwHBsb6USAmlG5Bg6mJe5Tlu96Tzg791FmmvmQ!*RtAsaUnE9zrkPb*zipR7H5fVmMGIJx54BngYaF2kY*23zo4sdYRjH0v6L0fYaIaxkuwnd8zDy8CDyu!ensWdj2p7rbZzSeOR; domain=.live.com;path=/;version=1
> Line-14024 : 480[3283c0]: Content-Encoding: gzip
> Line-14025 : 480[3283c0]: Vary: Accept-Encoding
> Line-14026 : 480[3283c0]: Transfer-Encoding: chunked
Correction. The <link rel> was hard-coded in the HTML source. Sorry for the wrong comments.
> ...</script><link rel="StyleSheet" href="https://secure.wlxrs.com/yDLmcF2FzGTerKJw2Us!q0tI0qMxoipUtbqnHdX2IWd-W!MqS7tFT1jrjehtb5LNlvZlJHl-Mrc/Base/14.3.8033/hig+signup.css" type="text/css"/>
> </head>
(... Why is there no line break before "<link rel"? ...)

The phenomenon was observed with Fx latest-trunk (2008/3/03 build) too. When the problem (CSS not used) occurs after "Clear Recent History", no entry for the ".css" was found in about:cache (Memory Cache Device), even though "Done" is displayed in the status bar. An entry for the ".css" was created after Reload, and the CSS was applied to the page. Was loading canceled/terminated in the middle of the document load process due to an internal error? Is "Content-Encoding: gzip" of the HTML relevant? Is "Content-Type: text/html" for an XHTML document relevant? ( http://www.w3.org/TR/xhtml-media-types/#media-types )
> Line-212: 1480[3283c0]: HTTP/1.1 200 OK
> Line-213: 1480[3283c0]: Cache-Control: private, max-age=0
>(snip)
> Line-216: 1480[3283c0]: Content-Type: text/html; charset=UTF-8
> Line-217: 1480[3283c0]: Content-Encoding: gzip
>(snip)
> Note: This server uses "on-the-fly compression" for HTML,
> or mis-uses "Content-Encoding: gzip".
> (uses it as if it were "Content-Transfer-Encoding", which is not defined by HTTP 1.1)
This site returns a JavaScript file with "Content-Encoding: gzip" as well, in addition to the text/html.
> Line-20169 : 0[328140]: uri=https://secure.shared.live.com/8Cvn5p7UVQ6YTKP9Tiy5Gw/js/omnitureH2.js
>(snip)
> Line-20184 : 0[328140]: GET /8Cvn5p7UVQ6YTKP9Tiy5Gw/js/omnitureH2.js HTTP/1.1
> Line-20184 : 0[328140]: Host: secure.shared.live.com
>(snip)
> Line-20995 : 1480[3283c0]: HTTP/1.1 200 OK
> Line-20996 : 1480[3283c0]: Content-Type: application/x-javascript
>(snip)
> Line-20999 : 1480[3283c0]: Server: Microsoft-IIS/6.0
>(snip)
> Line-21002 : 1480[3283c0]: Content-Encoding: gzip
>(snip)
> Line-21005 : 1480[3283c0]: Content-Length: 7196
>(snip)

Is Bug 363109 relevant?
> Bug 363109 gzipped CSS and JS over SSL fails when reloading
A "script loader" (if such a function exists) version of Bug 314689?
> Bug 314689 nsCSSLoader::Stop should cancel the necko channels
WADA – I really appreciate your analysis! I must admit I’m a little out of my league with your findings. I’ll continue to follow along, as I am highly interested in the outcome/resolution on this, but I’m not sure what else I can do to help. Of course if there is something I can do, I will. Where do we go from here? Does anyone else have input? Is this 'Ben's Bug'?
"Content-Encoding: gzip" seems to be irrelevant. Quick test result with Fx latest-trunk on MS Win-XP SP3.

(0) Open the Error Console.
(1) Load this page (Bug 480619). text/html, no Content-Encoding: gzip.
  (1-a) Execute Reload, Shift+Reload several times.
  (1-b) Access the pages linked in step (4).
(2) Clear Recent History, Remove: my entire history, select all items.
(3)
  (3-a) Shift+Reload
  (3-b) Move to the URL bar, Enter
  (3-c) Move to the URL bar, insert a blank and remove it, click the arrow icon (Go to the address in the location bar)
  => Nothing is done.
After several retries of (3-a)/(3-b)/(3-c), this page was reloaded, and the following errors were displayed in the Error Console.
> Warning: Unknown property '-moz-opacity'. Declaration dropped.
> Source File: https://bugzilla.mozilla.org/skins/standard/yui/autocomplete.css Line: 7
>(Some warnings on CSS)
> Error: keywordAutoComplete.textboxFocusEvent is null
> Source File: https://bugzilla.mozilla.org/show_bug.cgi?id=480619 Line: 1503
(4) "Save Link As" or "Ctrl+Click" of https links. HTTPS links in this bug:
(A) Two attachment links of this bug
  https://bugzilla.mozilla.org/attachment.cgi?id=nnnnnn => 302 moved => https://bug480619.bugzilla.mozilla.org/attachment.cgi?id=nnnnnn
  (A-1) First attachment, application/zip, no Content-Encoding: gzip
  (A-2) Second attachment, text/plain, Content-Encoding: gzip
(B) Two links to MS's http site
  (B-1) Link to ".js", application/x-javascript, Content-Encoding: gzip
  (B-2) Link to ".css", text/css, no Content-Encoding: gzip
  (4-a) "Save Link As" => The following dialog was displayed.
> Download Error
> The download cannot be saved because an unknown error occurred.
> Please try again.
  (4-b) Ctrl+Click => Nothing is done (a blank page is displayed)
=> After several retries of (4-a)/(4-b), the download or display succeeded.
The phenomena in step (4) could not be observed for the http links in this bug.

Confirming, based on my test result in Comment #8.
Status: UNCONFIRMED → NEW
Ever confirmed: true
s/(B) Two links to MS's http site/(B) Two links to MS's https site/
Is this flaw limited to 1.9.0? I don't see any comments saying anyone's tried it on 1.9.1
Flags: wanted1.9.0.x+
Flags: blocking1.9.1?
So for the second GET mentioned in comment 8, here's the issue:

1480[3283c0]: PR_Write returned [n=-1]
1480[3283c0]: ErrorAccordingToNSPR [in=-5929 out=80004005]
1480[3283c0]: nsSocketTransport::OnMsgOutputClosed [this=3dce8e0 reason=80004005]
1480[3283c0]: ReadSegments returned [rv=0 read=0 sock-cond=80004005]
1480[3283c0]: nsHttpConnection::CloseTransaction[this=2e38f40 trans=3a6b080 reason=80004005]
1480[3283c0]: nsHttpTransaction::Close [this=3a6b080 reason=80004005]

0x80004005 is NS_ERROR_FAILURE. Not helpful... But -5929 is PR_SOCKET_SHUTDOWN_ERROR. I thought we had an existing bug on this sort of thing, where already-closed sockets got reused.... Ehsan ran into it with private browsing, no?
(In reply to comment #15)
> Is this flaw limited to 1.9.0? I don't see any comments saying anyone's tried it on 1.9.1
Sorry for my unclear comments on the Fx version. "Fx latest-trunk" in my comments means the nightly trunk build downloaded from:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/
Example: "Fx latest-trunk" 2009/3/19 build
> Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090319 Minefield/3.6a1pre
I've changed Version: to Trunk.
Version: 1.9.0 Branch → Trunk
The following are open bugs with "HTTP && Socket" in the summary (changed within one year). Boris, Bug 470055? Is Bug 448575 relevant too?
> Bug 255141 PSM attempts to access socket notificationCallbacks on main thread ...(snip)
> Bug 448575 nsHttpChannel silently eats socket errors that occur after the response body is being received
> Bug 450078 HTTP1.1 Keep-Alive mechanism and having a small WEB-Server with only 2 open sockets. -> Not all elements are shown.
> Bug 470055 pkix_HttpCertStore_FindSocketConnection reuses closed socket, OCSP fails
> Bug 481301 ASSERTION: wrong thread: 'PR_GetCurrentThread() == gSocketThread' [@ nsHttpConnection::ResumeRecv]
The bug I'm thinking of had a summary more like "SSL sites not loading"...
(In reply to comment #19) > The bug I'm thinking of had a summary more like "SSL sites not loading"... Yeah, it's bug 463256. See bug 463256 comment 21 for my analysis on the problem. This is caused by the call to nsISecretDecoderRing::LogoutAndTeardown which happens in the clear private data dialog as well. Bug 463256 comment 26 mentions the reason for this behavior. We worked around it in private browsing by using the offline mode to shut down Necko sockets, but I'm not sure what would be a proper fix in this case.
It's not a very common user action in Firefox, so I don't think this needs to block (especially if the fix is difficult) but would be willing to reconsider if someone comes up with a compelling use case. I'll let jst make a call about blocking. It's not great.
I'd really like to see this fixed for 1.9.1, but I can't say I'd hold the release for it. Bjarne, can you spend some time looking into this? If this is fixable in a safe way we'd consider taking the fix for 1.9.1 still. If you don't have time, please let me know.
Assignee: nobody → bjarne
Flags: wanted1.9.1+
Flags: blocking1.9.1?
Flags: blocking1.9.1-
Keywords: relnote
Attached patch Initial, brute force approach (obsolete) — Splinter Review
Applying the same trick as for bug #463256 seems to make this work. Mochitests also run fine (except for the usual ones on my test-box). Question is whether this approach is too heavy. It is probably possible to be more light-handed by e.g. re-initializing nsSocketTransportService in the "net:clear-active-logins" case in nsHttpHandler::Observe(). I haven't tried that yet but I'll try it out on Monday unless the current fix is acceptable.
Attachment #374473 - Flags: review?(jst)
Oh - forgot to mention that the patch is for trunk...
(In reply to comment #24)
> Created an attachment (id=374473) [details]
> Initial, brute force approach
>
> Applying the same trick as for bug #463256 seems to make this work. Mochitests
> also run fine (except for the usual ones on my test-box).

This is less than ideal. I still feel bad about bug 463256... :-) For a sample of a problem caused by this approach, see bug 488772. We're really looking for a way around this, if possible. Come to think of it, it's actually worse than the private browsing case, because this would cause the download manager / add-on manager, etc. to prompt if there are any downloads in progress.

> Question is whether this approach is too heavy. It is probably possible to be
> more light-handed by e.g. re-initializing nsSocketTransportService in the
> "net:clear-active-logins" case in nsHttpHandler::Observe(). I haven't tried
> that yet but I'll try it out on Monday unless the current fix is acceptable.

I'd very much like to see how this turns out. Please let us know if that solves the problem in a clean way, so that I can move the private browsing service to do a similar thing instead of the current hack. Thanks!
Comment on attachment 374473 [details] [diff] [review] Initial, brute force approach Clearing review request while waiting to see if the smaller hammer approach is acceptable here :)
Attachment #374473 - Flags: review?(jst)
Attached patch Lighter version (obsolete) — Splinter Review
This version is "lighter", but it still shuts down and restarts the STS thread.

(In reply to comment #26)
> Come to think of it, it's actually worse than the private browsing case,
> because this would cause the download manager / add-on manager, etc. to prompt
> if there are any downloads in progress.

The user requested to clear private data, including "authenticated sessions", and should IMO not be shell-shocked if active connections are also terminated. :) Maybe we could even make the GUI indicate this by e.g. changing the text in the dialogue from "Authenticated sessions" to "Authenticated sessions and active connections"?

However, I agree with bug #488772 in that it would be more elegant to shut down SSL connections only. In order to implement this we need something like the two loops in nsSocketTransportService::Run() at lines 588++ in nsSocketTransportService2.cpp, but we also need to figure out from the SocketContext whether a socket is SSL. Then we must put this into a method DetachSSLSockets() or similar and either expose it via a suitable API or implement an extra action in nsSocketTransportService::Observe() (I prefer the latter). No black magic, but it might be a little tricky to deduce SSL from the SocketContext object. I'll give this a whack on Monday unless the current patch, with the above arguments about users being held responsible for their actions, is acceptable. :)
Attachment #374473 - Attachment is obsolete: true
Attachment #374616 - Flags: review?(jst)
See also bug 474364 where a similar thing happens on restart. I do not clear private data on shutdown, so those pages are still in the cache.
Status: NEW → ASSIGNED
Attached patch Even lighter version... (obsolete) — Splinter Review
I think this is about as light as we can get... Upon receiving the "net:clear-active-logins" msg, nsHttpHandler dispatches a new msg with the topic "net:detach-nss-sockets". nsSocketTransportService reacts to this by dispatching an event which will detach sockets in the NSS layer (running on the socket thread). We could make this simpler by letting nsSocketTransportService listen directly to "net:clear-active-logins", but if someone else wants to use this mechanism to close down NSS sockets, we need to keep the two topics separated in order to avoid undesired side-effects. A few things about this implementation should be mentioned (input is appreciated):
- The string "net:detach-nss-sockets" is hardcoded - is there a natural place for constants like this? It is shared between nsHttpHandler and nsSocketTransportService...
- I think it's safe for nsDetachNssSocketsEvent to keep the reference to nsSocketTransportService as a raw pointer, but someone with stronger xpcom-fu may not share this opinion... :]
- There should be a better way to filter the sockets than comparing the layer name with "NSS layer" - ideas are welcome here
Any ideas about how to set up a testcase for this are also appreciated...
Attachment #374616 - Attachment is obsolete: true
Attachment #374817 - Flags: review?(jst)
Attachment #374616 - Flags: review?(jst)
Attachment #374817 - Flags: review?(jst) → review?(jduell.mcbugs)
Comment on attachment 374817 [details] [diff] [review] Even lighter version... I'm afraid I'm not a good reviewer for this change. Jason, wanna dig into this one?
Thanks for the patch, it looks really great! :-) (In reply to comment #30) > Any ideas about how to set up a testcase for this is also appreciated... Hmm, I would assume that you would open two channels, one to a http:// location and one to a https:// location, and send out "net:clear-active-logins" and verify that the http channel lives on but the https one aborts. Does that make sense?
> Hmm, I would assume that you would open two channels, one to a http:// location > and one to a https:// location, and send out "net:clear-active-logins" and > verify that the http channel lives on but the https one aborts. Does that make > sense? Makes sense, but how do I check that a channel "lives"? Persistent connections?
(In reply to comment #36)
> > Hmm, I would assume that you would open two channels, one to a http:// location
> > and one to a https:// location, and send out "net:clear-active-logins" and
> > verify that the http channel lives on but the https one aborts. Does that make
> > sense?
>
> Makes sense, but how do I check that a channel "lives"? Persistent connections?

I'm not really sure. A dumb trick would be to use the stuff added in bug 396226 and feed some different data through the connection after observing the notification, and on the HTTP client part expect (or don't expect) that data. Of course there may be a better way to do that using nsIChannel wizardry. :-)
Not really clear to me how to test this... :) I'll think about it some more - maybe the server written for bug #486769 could be used, but I need to figure out how to support https on that one first. Any chance to get a review on the patch while we're thinking about testing?
Comment on attachment 374817 [details] [diff] [review] Even lighter version... Looks good. A much lighter touch than the previous fixes :) Boris, this seems fairly straightforward, but if you're not comfortable reviewing this code, let me know and I can ping biesi.
Attachment #374817 - Flags: superreview?(bzbarsky)
Attachment #374817 - Flags: review?(jduell.mcbugs)
Attachment #374817 - Flags: review+
This seems ok, with the following nits:
1) Use nsRunnableMethod and just add a method that calls the detaching function. That will incidentally fix the fact that you're not holding a strong reference, which I do think is a bug.
2) Space after ',' when calling strcmp.
3) I assume that things work ok if the event fires after xpcom shutdown? If not, you presumably need to revoke it then.
> 1) Use nsRunnableMethod and just add a method that calls the detaching
> function. That will incidentally fix the fact that you're not holding a
> strong reference, which I do think is a bug.
Ok. Using nsRefPtr - is that ok in this situation?
> 2) Space after ',' when calling strcmp.
Right...
> 3) I assume that things work ok if the event fires after xpcom shutdown? If
> not, you presumably need to revoke it then.
Uhh... what? Is that a potential scenario? Any hints or pointers to code/tests which check for this?

There are two things I would like to change in this patch, but I'm depending on input on how to achieve them:
1) I would like to identify sockets without using the name of the socket layer (last issue in comment #30)
2) If it's possible to parametrize the call to an nsRunnableMethod, the function nsSocketTransportService::DetachNssSockets() could be dropped (making the code size smaller)
And, of course, ideas for how to write a test for this would be nice. Unfortunately, I don't think the approach from comment #37 will do the trick.
Attachment #374817 - Attachment is obsolete: true
Attachment #379101 - Flags: review?(bzbarsky)
Attachment #374817 - Flags: superreview?(bzbarsky)
> Ok. Using nsRefPtr - is that ok in this situation?
I'd use nsCOMPtr, since the type is nsIRunnable. And probably NS_NEW_RUNNABLE_METHOD.
> Uhh... what? Is that a potential scenario?
Well.. what prevents it? You put an event into an event queue. What guarantees that it will fire before we fire the xpcom-shutdown notification, if there is already an event in the queue that will try to shut down? I'm not saying it's definitely possible. I'm saying we either need to prove it's impossible or handle it. CCing Benjamin for his shutdown fu.

I have no idea how to deal with your issue (1). There is no way to parametrize nsRunnableMethod, but I think the code size is still smaller than having a separate class...
https://wiki.mozilla.org/XPCOM_Shutdown runnables can fire after xpcom-shutdown and xpcom-shutdown-threads until the event queue is cleared. The component manager should be available during this time, but various modules will have begun to shut themself down.
Why do we need special code for this, especially hardcoding NSS in the STS? In particular: we handle servers closing the connection to an idle socket, connection resets, etc. So the specific question is, I guess: why do we attempt to reuse this socket? The HTTP code attempts to make sure that the socket is still alive before reusing it:

PRBool
nsHttpConnection::CanReuse()
{
    return IsKeepAlive() &&
           (NowInSeconds() - mLastReadTime < mIdleTimeout) &&
           IsAlive();
}

And IsAlive() basically calls nsSocketTransport::IsAlive, which basically calls PR_Recv. Shouldn't that be returning an error here, making us consider the socket to be dead? Similarly, shouldn't the STS loop polling the sockets already notice that the socket is in an error state?
So, the question may be why PR_Read() called from IsAlive() succeeds while PR_Write() fails. PR_Recv() from IsAlive() calls nsSSLThread::requestRecvMsgPeek(), which does not take the PK11LoggedOut / AlreadyShutDown conditions into account (as opposed to nsSSLThread::requestWrite() and nsSSLThread::requestRead()). I'll try to follow this track for a little while and see if it resolves the original issue...
I must admit that this approach feels more elegant... :) The observation is that the call to PR_Recv() from IsAlive() is made with the parameter PR_MSG_PEEK, which translates into a call to nsSSLThread::requestRecvMsgPeek(). This patch adds logic to nsSSLThread::requestRecvMsgPeek() for certain states and sets an appropriate error code on the socket. It works nicely with the test URL for this defect, but I haven't checked the duplicated cases. Probably kaie or MisterSSL would be more appropriate reviewers, but I'll ask bz initially.
Attachment #379598 - Flags: review?(bzbarsky)
Does the poll code require fixing too? It seems like it should report an EXCEPT flag for these sockets, and if it does that then the socket transport service would already notice that the socket has an error, instead of waiting for HTTP to detect it. (and it would improve things for stuff like Chatzilla too)
Assuming the previous comment was referring to nsSSLThread::requestPoll(). This patch is a "little more aggressive" than the previous patch, since it now triggers and returns on bad conditions very early in the method call. Not sure about this, but it's worth a try... Tested with the test URL for this defect and also by browsing some bank sites which use https.
Attachment #379598 - Attachment is obsolete: true
Attachment #379680 - Flags: review?(cbiesinger)
Attachment #379598 - Flags: review?(bzbarsky)
Comment on attachment 379680 [details] [diff] [review] Added similar logic to poll-method I'm not qualified to review PSM code. But yeah, it looks like I did refer to nsSSLThread::requestPoll (via nsNSSIOLayer.cpp:nsSSLIOLayerPoll). I'm not sure the LOG in nsSocketTransport is worth keeping...
Do you still want my review on the " Updated according to comments from reviewer" diff?
Attachment #379101 - Flags: review?(bzbarsky)
Comment on attachment 379101 [details] [diff] [review] Updated according to comments from reviewer Cleared request for review
Comment on attachment 379680 [details] [diff] [review] Added similar logic to poll-method Trying new reviewer for PSM code...
Attachment #379680 - Flags: review?(kaie)
From my glance over how this bug has evolved, it sounds like the latest method proposed here might also fix the necko problems in bug 463256. I.e., we might be able to avoid shutting down and restarting the whole necko layer. Ehsan, does that sounds like a possibility?
(In reply to comment #53) > From my glance over how this bug has evolved, it sounds like the latest method > proposed here might also fix the necko problems in bug 463256. I.e., we might > be able to avoid shutting down and restarting the whole necko layer. Ehsan, > does that sounds like a possibility? Yes, it almost certainly does.
Looks like kaie is busy. Feel free to suggest reviewers for this patch...
Comment on attachment 379680 [details] [diff] [review] Added similar logic to poll-method Maybe dveditz could help out with the review here?
Attachment #379680 - Flags: review?(dveditz)
Have you e-mailed Kai?
Comment on attachment 379680 [details] [diff] [review] Added similar logic to poll-method Thanks to everyone for your research, thoughts and this patch. It looks reasonable. r=kaie
Attachment #379680 - Flags: review?(kaie) → review+
Removed LOG as suggested by biesi, added meta-information
Attachment #379680 - Attachment is obsolete: true
Attachment #379680 - Flags: review?(dveditz)
Requesting check-in to bake it on trunk. No test available - ideas for how to make one are appreciated.
Keywords: relnote, checkin-needed
Comment on attachment 381395 [details] [diff] [review] Cleaned-up version of patch ready for check-in Need trunk approval....
Attachment #381395 - Flags: approval1.9.1?
Not fixed yet. Please leave relnote as it is.
Keywords: relnote
Blocks: 496335
(In reply to comment #60)
> Requesting check-in to bake it on trunk. No test available - ideas for how to
> make one are appreciated.

Well, we support HTTPS now in the framework, right? So make an HTTPS request, do whatever clear private data does that breaks sockets, and make another request and see if that works.

Could do this on the socket transport level too - open a socket transport stream to the SSL port, do whatever clear private data does, and see if the stream got notified (e.g. the asyncWait callback got called, etc.)
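For illustration, here is a rough sketch of the socket-transport-level flow suggested above, written with plain stand-in objects because the real test needs XPCOM (the actual interfaces are nsISocketTransportService and nsISecretDecoderRing). FakeSocketTransport and its teardown method are hypothetical stand-ins for the real transport and for sdr.logoutAndTeardown():

```javascript
// Hypothetical model of the suggested test: register an asyncWait reader,
// tear the connection down, and verify the reader got notified.
function FakeSocketTransport() {
  this.readers = [];
  this.closed = false;
}
// Stand-in for nsIAsyncInputStream.asyncWait: remember the reader callback.
FakeSocketTransport.prototype.asyncWait = function (reader) {
  this.readers.push(reader);
};
// Stand-in for what "clear private data" does: sdr.logoutAndTeardown()
// kills the socket, which should notify every waiting reader.
FakeSocketTransport.prototype.teardown = function () {
  this.closed = true;
  this.readers.forEach((r) => r.onInputStreamReady(this));
};

let notified = false;
const transport = new FakeSocketTransport();
transport.asyncWait({ onInputStreamReady() { notified = true; } });
transport.teardown();
// notified is now true: the waiting reader saw the teardown
```

The test passes exactly when the reader callback fires after teardown; the bug here was that real sockets sat idle instead.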
Flags: wanted1.9.1.x?
Whiteboard: [3.5.1?]
(no longer need approval for trunk)
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Keywords: checkin-needed
Verified fixed on trunk with builds on all platforms like Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090613 Minefield/3.6a1pre (.NET CLR 3.5.30729) ID:20090613042114 Boris, can we completely automate those tests or do we also need a Litmus or Mozmill test?
Status: RESOLVED → VERIFIED
OS: Windows XP → Other
Hardware: x86 → All
Target Milestone: --- → mozilla1.9.2a1
The idea would be to have automated tests.
I described in comment 63 how an automated test could be written.
Attachment #381395 - Flags: approval1.9.1? → approval1.9.1-
Comment on attachment 381395 [details] [diff] [review] Cleaned-up version of patch ready for check-in (Not going to take this for 1.9.1, minusing the approval flag)
Flags: blocking1.9.1.1?
Whiteboard: [3.5.1?]
(In reply to comment #63)
> (In reply to comment #60)
> > Requesting check-in to bake it on trunk. No test available - ideas for how to
> > make one are appreciated.
>
> Well, we support HTTPS now in the framework, right? So make an HTTPS request,
> do whatever clear private data does that breaks sockets, and make another
> request and see if that works.
>
> Could do this on the socket transport level too - open a socket transport
> stream to the SSL port, do whatever clear private data does, and see if the
> stream got notified (e.g. the asyncWait callback got called, etc)

Would a test like this help here? <http://mxr.mozilla.org/mozilla-central/source/browser/components/privatebrowsing/test/browser/browser_privatebrowsing_sslsite_transition.js>
CSS is also missing after reloading tabs when updating to 3.5. I opened the "Firefox 3.5 for developers" page ( https://developer.mozilla.org/en/Firefox_3.5_for_developers ), then I upgraded to 3.5. After upgrading, the opened page was reloaded without CSS. A second load in another tab was correct with CSS; a refresh in the first tab also helped to load the CSS.
Bernhard, this is more likely bug 474364.
OS: Other → All
Attachment #381395 - Flags: approval1.9.1.1?
Comment on attachment 381395 [details] [diff] [review] Cleaned-up version of patch ready for check-in See bug 496335 comment 7.
(In reply to comment #74)
> (In reply to comment #70)
> > Would a test like this help here?
> >
> > <http://mxr.mozilla.org/mozilla-central/source/browser/components/privatebrowsing/test/browser/browser_privatebrowsing_sslsite_transition.js>
>
> Not really. What was wrong with the approach I suggested?

Hmm, I think it implements your first suggestion:

> Well, we support HTTPS now in the framework, right? So make an HTTPS request,
> do whatever clear private data does that breaks sockets, and make another
> request and see if that works.

In this case, toggling the private browsing mode is the action which breaks sockets.
That test is at a much higher level than a necko test should be. Hm... though I guess this shouldn't be a Necko test anyway, as it was a bug in PSM, so this should actually be in security/manager somewhere. My point is: this should be a test using channels/sockets and nsISecretDecoderRing, not a test that uses docshells and nsIPrivateBrowsingService. It also should probably be an xpcshell test.

>> Well, we support HTTPS now in the framework, right? So make an HTTPS request,
>> do whatever clear private data does that breaks sockets, and make another
>> request and see if that works.

> In this case, toggling the private browsing mode is the action which breaks
> sockets.

Kinda. What actually breaks sockets is logging out of/tearing down the secret decoder ring, i.e.:

let sdr = Cc["@mozilla.org/security/sdr;1"].
          getService(Ci.nsISecretDecoderRing);
sdr.logoutAndTeardown();
Attached patch On my way to a test (obsolete) — Splinter Review
Here is what I have so far. Based on comment 63, I expected at least one of the reader or writer callbacks to get called, but neither did. What am I doing wrong?
blocking1.9.1: --- → needed
Flags: wanted1.9.1.x?
Flags: wanted1.9.1.x+
Flags: blocking1.9.1.1?
Flags: blocking1.9.1.1-
Hm... I think that code should work. If you remove the logoutAndTeardown call, does that make a difference?
Attached patch Test (obsolete) — Splinter Review
OK, my mistake was to use example.org:443. That only becomes meaningful through the proxy set up by the test harness via a PAC file. For this test we should use the SSL server directly, that is, localhost:4443. Asking review from biesi to confirm that this test is testing the correct thing, and from kaie as a security peer.
Attachment #388146 - Attachment is obsolete: true
Attachment #388452 - Flags: review?(cbiesinger)
Attachment #388452 - Flags: review?(kaie)
The test server isn't guaranteed to be running on port 4443, is it?
(In reply to comment #80)
> The test server isn't guaranteed to be running on port 4443, is it?

I'm not sure if this is currently guaranteed or not, but it is hardcoded: <http://mxr.mozilla.org/mozilla-central/source/build/automation.py.in#321>
I thought we had support for our test stuff running on a different port if the one it wants is in use. Did that not land?
That's reftest-only, and also much easier there since the server is in-process. ssltunnel would need to communicate the port it chose back to the Python script to be splatched into a user.js, and that's not quite as easy. I'm confused as to why the raw port must be used, but do note that https://example.org:443 fails because it's not in the set of mappings, so naturally it wouldn't work for that choice: http://mxr.mozilla.org/mozilla-central/source/build/pgo/server-locations.txt
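For context, mappings in server-locations.txt are whitespace-separated origin/options lines; a hypothetical excerpt (illustrative, not the actual contents of the file) might look like:

```
http://mochi.test:8888   primary,privileged
https://example.com:443  privileged
```

An origin absent from this list, such as https://example.org:443 here, is never routed through ssltunnel by the harness proxy, which is why that choice fails.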
Attached patch Test (v2) (obsolete) — Splinter Review
(In reply to comment #83)
> I'm confused as to why the raw port must be used, but do note that
> https://example.org:443 fails because it's not in the set of mappings, so
> naturally it wouldn't work for that choice.

Thanks for the note. I'm using the raw port because the socket transport layer does not use the proxy PAC file. I tested and made sure that the example.com:443 combination can't be used in this test either. This version of the patch is a bit better because it makes sure that the socket gets dropped after the teardown operation.
Attachment #388452 - Attachment is obsolete: true
Attachment #389185 - Flags: review?(kaie)
Attachment #389185 - Flags: review?(cbiesinger)
Attachment #388452 - Flags: review?(kaie)
Attachment #388452 - Flags: review?(cbiesinger)
Comment on attachment 381395 [details] [diff] [review] Cleaned-up version of patch ready for check-in Please renominate when you have the tests approved.
Attachment #381395 - Flags: approval1.9.1.1? → approval1.9.1.1-
Pinging biesi and kaie for the test review...
Comment on attachment 389185 [details] [diff] [review] Test (v2) moa=kaie
Attachment #389185 - Flags: review?(kaie) → review+
biesi: ping?
Comment on attachment 389185 [details] [diff] [review] Test (v2)

Sorry for the delay.

+ onInputStreamReady: function(stream) {
+   SimpleTest.ok(tearDown, "The stream should be closed after a teardown of secure decoder ring");

You should probably check that the stream has an error state. I.e. try reading from it and see if it throws the desired exception (or just call available() and see if it throws the expected exception).

+var writer = {
+  onOutputStreamReady: function(stream) {
+  }
+};

You don't need this, do you?

+ getService(Ci.nsISecretDecoderRing).
+ logoutAndTeardown();

Can you indent these two lines, so that it's obvious that they are continuation lines?

+outStream.asyncWait(writer, 0, 1, currentThread);

No need for this. Actually, you don't even need to open the output stream.
Attachment #389185 - Flags: review?(cbiesinger) → review+
Addressed all of the review comments on the test. I have submitted this to the try server, and after a green unit test run, I'll land it on m-c.
Attachment #389185 - Attachment is obsolete: true
+ try {
+   stream.available();
+ }

Shouldn't you add a:

SimpleTest.ok(!tearDown, "Stream should be in an error state");

after this call?
(In reply to comment #92)
> + try {
> +   stream.available();
> + }
>
> Shouldn't you add a:
>
> SimpleTest.ok(!tearDown, "Stream should be in an error state");
>
> after this call?

I make sure an NS_ERROR_FAILURE is thrown in the catch block, otherwise the test will time out. Isn't that enough?
I think it's better to be explicit about the fact that an unthrown exception is a failure.
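The check being discussed can be sketched in plain JS, with stub objects standing in for the real nsIAsyncInputStream (checkTornDownStream and both stubs are hypothetical, for illustration only): available() should throw once the connection is torn down, and a call that does not throw is reported explicitly as a failure rather than left to time out.

```javascript
// Returns true when the stream is in an error state (available() throws),
// false otherwise -- the caller treats false as an explicit test failure.
function checkTornDownStream(stream) {
  try {
    stream.available();   // should throw once the connection is torn down
  } catch (e) {
    return true;          // expected; NS_ERROR_FAILURE in the real test
  }
  return false;           // no exception: stream is not in an error state
}

// Stubs: a torn-down stream throws, a live stream reports 0 bytes available.
const deadStream = { available() { throw new Error("NS_ERROR_FAILURE"); } };
const liveStream = { available() { return 0; } };
```

The explicit false branch is the point of the review comment: a silently non-throwing available() should fail the test immediately instead of hanging until the harness timeout.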
The test is failing on Linux. Here is what happens: on Linux, the sink's onTransportStatus method is never called, so the secure decoder ring is never torn down. After an amount of time which is longer than on Windows, the reader's onInputStreamReady is called, and calling available on the stream passed to it throws an NS_ERROR_NET_RESET error. Interestingly, the connection on port 4443 is actually established during the test:

ehsan@ehsan-laptop:~/moz/src$ sudo netstat -p | grep ^tcp
tcp 0 0 ehsan-laptop:8888  ehsan-laptop:37426 TIME_WAIT   -
tcp 0 0 ehsan-laptop:8888  ehsan-laptop:37424 TIME_WAIT   -
tcp 0 0 ehsan-laptop:37423 ehsan-laptop:8888  TIME_WAIT   -
tcp 0 0 ehsan-laptop:4443  ehsan-laptop:48014 ESTABLISHED 25018/ssltunnel
tcp 0 0 ehsan-laptop:48014 ehsan-laptop:4443  ESTABLISHED 25019/firefox-bin
tcp 0 0 ehsan-laptop:8888  ehsan-laptop:37422 TIME_WAIT   -
tcp 0 0 ehsan-laptop:8888  ehsan-laptop:37427 TIME_WAIT   -

Do you have any idea why the socket transport service doesn't get notified about the connection being established on Linux? This happens both on the try server and locally.
Can you try setting the transport event sink before opening the input stream? That might possibly fix the problem (and is probably what should be done anyway)
Thanks, that did fix the problem! I'll submit another try server run to make sure that the test passes on all platforms before pushing.
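To illustrate why the ordering fix above works, here is a hedged sketch with stub objects (StubTransport is hypothetical; the real calls are nsISocketTransport's setEventSink and openInputStream): a status event that fires before any sink is attached is simply never delivered.

```javascript
// Hypothetical model: the transport emits a status event when the stream
// is opened, and the event is lost if no sink has been attached yet.
function StubTransport() {
  this.sink = null;
}
StubTransport.prototype.setEventSink = function (sink) {
  this.sink = sink;
};
StubTransport.prototype.openInputStream = function () {
  // Models the connection being established as a side effect of opening
  // the stream; without a sink, the notification is silently dropped.
  if (this.sink) this.sink.onTransportStatus("STATUS_CONNECTED_TO");
};

// Wrong order: the event fires before the sink exists and is dropped.
let seenLate = false;
const t1 = new StubTransport();
t1.openInputStream();
t1.setEventSink({ onTransportStatus() { seenLate = true; } });

// Right order: sink first, then open the stream; the event is delivered.
let seenEarly = false;
const t2 = new StubTransport();
t2.setEventSink({ onTransportStatus() { seenEarly = true; } });
t2.openInputStream();
```

In the real test this race was timing-dependent (it happened to pass on Windows), which is why setting the event sink before opening the input stream is the robust order.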
Flags: in-testsuite? → in-testsuite+
Whiteboard: [test needs landing on 1.9.2]
Attachment #381395 - Flags: approval1.9.1.3?
Whiteboard: [test needs landing on 1.9.2]
Comment on attachment 381395 [details] [diff] [review] Cleaned-up version of patch ready for check-in Approved for 1.9.1.4, a=dveditz for release-drivers
Attachment #381395 - Flags: approval1.9.1.3? → approval1.9.1.4+
Attachment #381395 - Flags: approval1.9.0.15?
Comment on attachment 381395 [details] [diff] [review] Cleaned-up version of patch ready for check-in Approved for 1.9.0.15, a=dveditz for release-drivers
Attachment #381395 - Flags: approval1.9.0.15? → approval1.9.0.15+
The SSL tunnel is not available on the 1.9.0.x branch, so it is not possible to test this on that branch. Is it OK to land it there without the test?
Yes, that's fine.
Attachment 381395 [details] [diff] landed on 1.9:

Checking in security/manager/ssl/src/nsSSLThread.cpp;
/cvsroot/mozilla/security/manager/ssl/src/nsSSLThread.cpp,v <-- nsSSLThread.cpp
new revision: 1.17; previous revision: 1.16
Keywords: fixed1.9.0.15
Verified fixed for 1.9.1.4 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.4pre) Gecko/20090915 Shiretoko/3.5.4pre (.NET CLR 3.5.30729).
Keywords: verified1.9.1
Verified fixed for 1.9.0.15 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.15pre) Gecko/2009091606 GranParadiso/3.0.15pre (.NET CLR 3.5.30729).
(In reply to comment #103)
> The SSL tunnel is not available on the 1.9.0.x branch, so it is not possible to
> test this on that branch. Is it OK to land it there without the test?

Is it possible to have a test for this on the trunk? In security/manager/ssl/tests/ or in content/something/tests/...
(In reply to comment #108)
> (In reply to comment #103)
> > The SSL tunnel is not available on the 1.9.0.x branch, so it is not possible to
> > test this on that branch. Is it OK to land it there without the test?
>
> Is it possible to have a test for this on the trunk? In
> security/manager/ssl/tests/ or in content/something/tests/...

I have already written and landed a test for it: <http://mxr.mozilla.org/mozilla-central/source/security/manager/ssl/tests/mochitest/bugs/test_bug480619.html>. It also exists in 1.9.1 and 1.9.2.
blocking1.9.1: needed → .4+
Already verified fixed on 1.9.2 with an alpha build (comment 66). Adding verified1.9.2 keyword.
Keywords: verified1.9.2

Wow, I was shocked when I saw that this was created 11 years ago... because I just encountered this issue here. Most of my webpages are loading twice, the first time without CSS! I've been using Firefox Quantum 68.0.1 (64-bit) on Windows 10 since yesterday, and I've imported all settings from Chrome. Please reopen the issue.

Sorry, I wasn't completely honest: Adblocker Ultimate was still enabled, even though it was disabled for the webpage :-(
When disabling that plugin, the page loaded correctly.
