Bug 513129 (closed): After disabling SSL and TLS in preferences, keep-alive connections are still present until they time out

Opened: 10 years ago
Closed: 4 years ago
Status: RESOLVED FIXED
Component: Core :: Networking: Cache
Type: defect
Target Milestone: mozilla46
Tracking Status: firefox46 fixed
Reporter: whimboo
Assignee: mcmanus
Keywords: regression
Attachments: 6 files

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3pre) Gecko/20090820 Shiretoko/3.5.3pre ID:20090820030726

This bug was discovered during our Mozmill test runs for the security subgroup in Litmus.

When you disable SSL 3.0 in the options, you are still able to use the SSL protocol in some situations. It is highly reproducible for me with the Verisign home page, which still loads a couple of times when I hit the reload button. After some reloads I get an error page, but reloading again shows me the page content once more. See the recorded screencast below:

http://screencast.com/t/gNsITXofjKFa

When users uncheck the SSL 3 box, we should never use a secure connection and should always show the error page.

Is this the correct component for this bug?
Here are some steps:

1. Open the Verisign website
2. Uncheck SSL in the options
3. Reload the website (sometimes that will bring up the error page directly)
4. Enable SSL again
5. Reload the website
6. Disable SSL
7. Reload the website (no error page!)
8. Repeat step 7 a couple of times

In step 8 you will notice that the page loads successfully at first and that the error page shows up after a couple of reloads. You have to reload the page around 10 times before you reliably get the error page each time.
(In reply to comment #1)
> Here some steps:
> 
> 1. Open the Verisign website
> 2. Uncheck SSL in the options
> 3. Reload the website (sometimes that will bring up the error page directly)
> 4. Enable SSL again
> 5. Reload the website
> 6. Disable SSL
> 7. Reload the website (no error page!)
> 8. Repeat step 7 a couple of times
> 
> With step 8 you will notice that the page is loaded successfully at the
> beginning and show up the error page after a couple of reloaded once. You have
> to reload the page around 10 times until you definitely get the error page each
> time.

I get the same results, although I see a 'Connection has been reset' error where the screencast shows 'Secure connection failed'. Tested on trunk:

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.3a1pre) Gecko/20090827 Minefield/3.7a1pre
Is this a regression? Any tie to bug 368130?
Assignee: nobody → kaie
Component: Security → Security: PSM
QA Contact: toolkit → psm
(In reply to comment #3)
> Is this a regression? Any tie to bug 368130?

I can check this on Sunday. Meanwhile, let's add Nelson.
Depending on one's perspective, this bug is either
a) invalid, or 
b) a duplicate of bug 498311, or 
c) a duplicate of bug 368130.
I'll let someone else decide which.

The demo movie starts by showing the user go into prefs and disabling 
SSL 3.0, but leaving TLS (which is SSL 3.1) enabled.  I gather that the user
did not realize that TLS is just another name for SSL, and expected that 
with SSL 3.0 disabled, no https connections would be possible.  That was 
simply a misunderstanding.  It is certainly quite expected that https will
continue to work just fine with most sites with SSL 3.0 disabled, when 
SSL 3.1 (TLS) remains enabled.  So, that aspect of this bug is invalid.

Then the user did a bunch of rapid reloads and saw an error about SSL being
disabled.  That was a demonstration of two separate bugs in firefox. They are:

Bug 368130: Bogus "SSL is disabled" error when SSL3 is disabled
Bug 498311: HTTPS site is marked TLS intolerant when the stop button is clicked

Both of those bugs are bugs in PSM's code that attempts to detect and work 
around servers that do not properly implement SSL protocol version negotiation.
There are https servers that work AOK if you ask them to speak SSL 3.0, and 
should be able to successfully negotiate and use SSL 3.0 when you ask them to 
use SSL 3.1, but instead, when you ask them to use SSL 3.1, they fail in any 
of a large number of ways.  Such servers are said to be "TLS intolerant".
PSM tries to detect such servers and revert to SSL 3.0 for them.  But there 
are bugs in that TLS intolerant server detection code.  

One bug is that, if you're connecting to a server with TLS and you click stop
or reload, PSM erroneously concludes that the server is TLS intolerant.  It
says "This server cannot do TLS", so it tries to switch to SSL 3.0.  It should
never declare a server as TLS intolerant just because the user clicked stop or
reload.  

When PSM declares a server to be TLS intolerant, that means that thereafter,
PSM will never try to use TLS with that server, but will only try to use 
SSL 3.0.  But if SSL 3.0 has been disabled, then it will fail completely, 
and will output that "SSL has been disabled" message.  So, PSM should NEVER
mark a server as TLS intolerant when SSL 3.0 is disabled.  PSM should only mark
a server as TLS intolerant when SSL 3.0 is enabled.  But today, PSM will mark
a server as TLS intolerant any time a TLS handshake attempt fails, even if 
SSL 3.0 is disabled.  That's the second bug.

Whenever the preferences for SSL 3.0 and TLS are changed, PSM is supposed to 
clear all the remembered information about SSL/TLS sessions and intolerance.
I think (not completely sure) it does that correctly, which is why things 
seem to work again, even AFTER you DISABLE SSL 3.0.  You disable SSL 3.0, 
but TLS is still enabled.  PSM forgets about the TLS intolerant server settings
and tries TLS again, and this time, it succeeds.  So, you see the https page,
even though SSL 3 is disabled.  That's the correct behavior.
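
The two rules described in this comment (a stop or reload by the user is not evidence of intolerance, and intolerance should not be recorded while SSL 3.0 is disabled) can be sketched roughly as below. This is an illustrative model only, not PSM's code; the function `shouldMarkTlsIntolerant` and its parameter names are hypothetical.

```javascript
// Illustrative model only; NOT PSM's implementation.
// The function name and the fields of `attempt` are hypothetical.
function shouldMarkTlsIntolerant(attempt) {
  // A handshake that succeeded says nothing about intolerance.
  if (!attempt.handshakeFailed) return false;
  // Rule 1: a stop/reload by the user is not the server's fault.
  if (attempt.abortedByUser) return false;
  // Rule 2: with SSL 3.0 disabled there is nothing to fall back to,
  // so remembering "intolerant" could only turn into a hard failure.
  if (!attempt.ssl3Enabled) return false;
  return true;
}

console.log(shouldMarkTlsIntolerant({ handshakeFailed: true, abortedByUser: true, ssl3Enabled: true }));   // false
console.log(shouldMarkTlsIntolerant({ handshakeFailed: true, abortedByUser: false, ssl3Enabled: false })); // false
console.log(shouldMarkTlsIntolerant({ handshakeFailed: true, abortedByUser: false, ssl3Enabled: true }));  // true
```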
Nelson, that was just my fault. The same situation can be reproduced by turning off both preferences (SSL and TLS). Our Mozmill test does it this way and always fails because the page is displayed normally.

In that case I don't think it is a dupe of the other two mentioned bugs because secure channels are completely off now.
Summary: SSL protocol is still used sometimes even with SSL 3.0 disabled in options → SSL protocol is still used sometimes even with SSL 3.0 and TLS 1.0 disabled in options
A major visual change happened between the builds below, when we enabled the error page. Before the patch on bug 107491 we always displayed a modal warning dialog saying that a connection could not be established. Now the warning/error no longer appears on the first couple of reloads.

Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.9a3pre) Gecko/20070217 Minefield/3.0a3pre

Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.9a3pre) Gecko/20070218 Minefield/3.0a3pre
Blocks: 107491
Keywords: regression
So, your bug report is about a problem in a version from February of 2007?  

Show us a problem with a current version of Firefox, a problem that does 
not involve rapid reloading of web pages.
OK, I did some more checks this morning, and this bug is all about the error page and the cache. After disabling SSL and TLS, a reload does not create a new connection to the server. To check this I set up my own server, and no entries appear in the Apache log when I reload.

With builds from before the switch to error pages, you were always prompted by a modal dialog saying that no secure connection can be used because it has been disabled. With builds from after the switch, up to today, you will not notice that immediately. Only after a couple of reloads is the cache somehow invalidated and the error shown (as you can see in my screencast). If you clear the cache before reloading the page, the error page appears promptly.

So the question is: shouldn't we always show the error page to the user, even when the cache has data for this particular site?
Severity: major → normal
Summary: SSL protocol is still used sometimes even with SSL 3.0 and TLS 1.0 disabled in options → No SSL error page shown when page is reloaded and SSL 3.0 and TLS 1.0 disabled in options
I'm not sure this is a bug, or that it belongs to PSM.  
The issue, as I understand it, is a browser web page cache issue.  

Even after turning off SSL, pages previously fetched via SSL continue to 
be fetched from the cache.  Only when the browser (not PSM) decides that 
it needs to go check with the server again (for some reason) does it then
discover that it can no longer communicate with the server.  

If the browser is not checking with the server frequently enough for the 
"freshness" of https pages, that is not a PSM issue, I'm pretty sure.  
So, please send this bug to the product/component for the web page cache,
whatever that is.
Moving over to Core/Networking:Cache.
Assignee: kaie → nobody
Component: Security: PSM → Networking: Cache
QA Contact: psm → networking.cache
Blocks: 514528
No longer blocks: 506100
Whiteboard: [mozmill]
This breaks one of our Mozmill tests constantly when running the whole test-suite. Adding new whiteboard entry.

Who can help us to drive this bug forward?
Whiteboard: [mozmill] → [mozmill][mozmill-test-blocked]
I'm not sure I understand why the behaviour needs fixing here. As I understand it:

- User disables all SSL/TLS
- User loads page from cache, which is displayed without incident
- Forcing the user to load from the web shows the error page

In general, showing the page from cache is more likely to be useful to a user than showing the error page (which, basically by definition, can't be useful except to get them to turn on SSL again, which they just chose to turn off).

If they were being put at increased risk because of this, if we were doing something terrible like falling back to http, that would be a problem, but we're loading a local copy. The information might be stale, but their security is not put at risk by interacting with a local copy of a securely delivered page - at least not in any way I can see at first blush.

I understand that this behaviour is breaking mozmill, but if there isn't a good user-facing reason to change it, then I think it's the test that should change, not the browser.
I basically agree with comment 13.  Why is this (desired) behavior a problem for MozMill, exactly?
Thanks Johnathan for your reply. As it turns out it's not a bug but a feature. I will update the Mozmill test so that we clear the cache right before starting the SSL test. Fix upcoming by tomorrow.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Summary: No SSL error page shown when page is reloaded and SSL 3.0 and TLS 1.0 disabled in options → [mozmill] No SSL error page shown when page is reloaded and SSL 3.0 and TLS 1.0 disabled in options
Whiteboard: [mozmill][mozmill-test-blocked]
I will have to put it back, because digging deeper into this problem shows me that even with no data in any cache device we create new HTTPS connections with both prefs disabled. Firebug shows the activity in the net panel.

I have created a minimized Mozmill test which I will attach in a moment. Boris or Johnathan, it would be fantastic if one of you could spare 2 minutes to run the test and check it for yourself.
Assignee: hskupin → nobody
Status: ASSIGNED → NEW
Whiteboard: [mozmill][mozmill-test-blocked]
Attached file Mozmill test
To run the test, follow these steps:
1. Install Mozmill (https://addons.mozilla.org/de/firefox/addon/9018)
2. Open the Mozmill IDE and load this test
3. Run the test from the editor

The test runs these steps:
1. Loads a secure web page from Verisign
2. Opens about:blank to leave any HTTPS site
3. Clears all entries from all cache devices
4. Disables both preferences
5. Loads about:cache and waits 2s (each device shows 0 entries)
6. Loads the Verisign web page again
I'm not likely to have time to do anything involving installing extensions this week.  Not sure about next week, even.

That said, if you want to attach an HTTP log of what happens when you do the steps in comment 17, that would be pretty useful.
Whiteboard: [mozmill][mozmill-test-blocked] → [mozmill][mozmill-test-blocked][mozmill-test-failure]
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.3a5pre) Gecko/20100607 Minefield/3.7a5pre

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.6pre) Gecko/20100607 Namoroka/3.6.6pre

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.11pre) Gecko/20100607 Shiretoko/3.5.11pre

Reproducible as described in comment 1 on Shiretoko and Namoroka, but not Minefield.
Clearing the cache works around this issue (Tools > Clear Recent History > Cache)
  prefBranch.setBoolPref("security.enable_ssl3", false);
  prefBranch.setBoolPref("security.enable_tls", false);

  controller.open("about:cache");
- controller.sleep(2000);
+ controller.sleep(10000);

  controller.open("https://www.verisign.com");

Waiting for 10 seconds seems to make the test work, so there appears to be some sort of time dependence here. Manual testing seems to confirm this: running through the steps quickly produces the bug, but going slowly does not.
Here is the HTTP Header for the initial GET:

https://www.verisign.com/

GET / HTTP/1.1
Host: www.verisign.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.6pre) Gecko/20100607 Namoroka/3.6.6pre
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: v1st=450A00AC21813B13; mbox=session#1276030245919-438476#1276032829|PC#1276030245919-438476.19#1339102969|check#true#1276031029

HTTP/1.1 200 OK
Date: Tue, 08 Jun 2010 21:08:04 GMT
Server: Apache
Expires: Thu, 01 Dec 1994 16:00:00 GMT, Thu, 01 Dec 1994 16:00:00 GMT
Accept-Ranges: bytes
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html

-----

This appears in the log every time verisign loads.  When the error page appears as it should, there is nothing at all.  However, I see this header again each time verisign loads when it shouldn't.
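
The response above advertises `Keep-Alive: timeout=5, max=100`, i.e. the server offers to keep an idle connection open for roughly five seconds. A minimal sketch of reading those parameters follows; `parseKeepAlive` is a hypothetical helper written for illustration, not a Firefox or Mozmill API.

```javascript
// Hypothetical helper: turn a Keep-Alive header value such as
// "timeout=5, max=100" into an object of numeric parameters.
function parseKeepAlive(value) {
  const params = {};
  for (const part of value.split(",")) {
    const [name, raw] = part.trim().split("=");
    if (name && raw !== undefined) {
      params[name.toLowerCase()] = Number(raw);
    }
  }
  return params;
}

const serverHint = parseKeepAlive("timeout=5, max=100");
console.log(serverHint.timeout); // 5 (seconds an idle connection may stay open)
console.log(serverHint.max);     // 100 (requests allowed on the connection)
```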
Anthony, it would be helpful if you could create and attach the HTTP log that Boris requested in comment 18. It will contain much more detailed information about the connections.
(In reply to comment #23)
> Anthony, it would be helpful if you could create and attach an HTTP log which
> Boris requested in comment 18. It will have much more in detail information
> about the connections.

What is the best way to generate an http log which would yield information valuable to this bug?
> What is the best way to generate an http log

https://developer.mozilla.org/en/HTTP_Logging
(In reply to comment #25)
> > What is the best way to generate an http log
> 
> https://developer.mozilla.org/en/HTTP_Logging

The log file is HUGE...20.3MB...is there some way I can minimize this?
Uh... did you log the entire test run or something?  Or just the failing test?
(In reply to comment #27)
> Uh... did you log the entire test run or something?  Or just the failing test?

Even just running the test attached to this bug, the log file is 5.9MB.  I believe that is over the attachment limit for Bugzilla.
Attached file http log (linux)
Here's the http log, zipped up, from running the attached testcase on Ubuntu 9.10 32-bit and Namoroka 20100610.
> the log file is 5.9MB.

Of course the first 2.9MB of this was just zero bytes...  wonder how you got those in there.  

But OK.  The attachment loads https://www.verisign.com twice, no?  In the log I only see one newChannel call for https://www.verisign.com.
(In reply to comment #31)
> > the log file is 5.9MB.
> 
> Of course the first 2.9MB of this was just zero bytes...  wonder how you got
> those in there.  
> 
> But OK.  The attachment loads https://www.verisign.com twice, no?  In the log I
> only see one newChannel call for https://www.verisign.com.

From comment 17, here are the steps the test performs:
1. Load https://www.verisign.com
2. Opens about:blank to leave any HTTPS site
3. Clears all entries from all cache devices
4. Disables both preferences
5. Loads about:cache and waits 2s (each device shows 0 entries)
6. Loads the Verisign web page again

In step 6 we should get the error page, not verisign.
My point was that the attached log doesn't show step 6 happening at all.
(In reply to comment #33)
> My point was that the attached log doesn't show step 6 happening at all.

Visually, this step happens.  If it's not appearing in the log something strange is happening...
Indeed.  Would you mind generating the log one more time?
Attached file http log v2 (linux)
Here's another one...
Same thing.  Lots of zeroes up front, then only one load of https://www.verisign.com/
(In reply to comment #37)
> Same thing.  Lots of zeroes up front, then only one load of
> https://www.verisign.com/

I would not expect anything different since I'm running the same steps on the same environment.  Going back and forth on this bug having me test is really creating a lot of overhead.  If you want, I can show you how to set up Mozmill so you can test this locally.  It takes less than 5 minutes.
I'll be honest.  It's a low priority for me, and the time taken would likely be several hours worth once you factor in the interruption quotient.

I can do it once I finish the more important things on my list, I guess.  But that might well be a while.
Where "a while" is weeks to months, if I get lucky.

I really suggest figuring out why your HTTP logs are ending up broken as a good start.  Those leading 3 megs of zero bytes should NOT be there.
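
As a stopgap while the cause of the zero bytes is unknown, a log can be trimmed and searched before attaching it. The sketch below is only an assumption about how one might do that: `stripLeadingNulBytes` and `countLinesContaining` are hypothetical helpers, and the sample text is synthetic, not taken from the attached logs.

```javascript
// Hypothetical helpers for inspecting a log; not part of any Mozilla tool.

// Drop the run of zero bytes at the start of a byte array.
function stripLeadingNulBytes(bytes) {
  let start = 0;
  while (start < bytes.length && bytes[start] === 0) start++;
  return bytes.subarray(start);
}

// Count the lines that contain every one of the given substrings.
function countLinesContaining(text, needles) {
  return text
    .split("\n")
    .filter((line) => needles.every((needle) => line.includes(needle)))
    .length;
}

// Synthetic sample: four NUL bytes followed by two made-up log lines.
const sampleText = [
  "request started for https://www.verisign.com/",
  "request started for about:blank",
].join("\n");
const padded = new Uint8Array(4 + sampleText.length);
padded.set(new TextEncoder().encode(sampleText), 4);

const cleaned = new TextDecoder().decode(stripLeadingNulBytes(padded));
console.log(countLinesContaining(cleaned, ["https://www.verisign.com"])); // 1
```

For a real file, the bytes would come from reading the log from disk instead of the synthetic array used here.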
(In reply to comment #40)
> Where "a while" is weeks to months, if I get lucky.
> 
> I really suggest figuring out why your HTTP logs are ending up broken as a good
> start.  Those leading 3 megs of zero bytes should NOT be there.

Seems to me like that's a separate bug that should block this one.  Especially since I am doing exactly, to the letter, the steps in MDC to set up logging (as you linked in comment 25)
Does the same thing happen if you log a vanilla Firefox nightly with no mozmill involved?
Here is an HTTP log from running the test manually (without Mozmill).

NOTE: The STR are a bit different when running manually...

1. Start Firefox
2. Go to https://www.verisign.com
3. Disable SSL3 and TLS1
4. Click the reload button

RESULT: Verisign page reloads.  You can keep clicking reload and verisign will continue to load (sometimes without CSS styling).  Eventually the error page will display.

The log file only contains the first reload (to keep file size down).
(In reply to comment #39)
> I'll be honest.  It's a low priority for me, and the time taken would likely be
> several hours worth once you factor in the interruption quotient.
> 
> I can do it once I finish the more important things on my list, I guess.  But
> that might well be a while.

Boris, your help would be really appreciated, and when I look at today's comments I really get the impression that we are wasting time for both of us. I suspect that you read the logs in under 3 minutes. As Anthony said, setting up Mozmill and running this test is only 4 clicks away:

1. Start a Firefox build and install https://addons.mozilla.org/de/firefox/addon/9018/
2. After the restart open Mozmill (Tools -> Mozmill)
3. Copy the test content from attachment 418172 [details] into Mozmill's editor
4. Select Test | Run.

I have reduced the test completely, so there is really no overhead: you don't have to install any environment or fetch all the tests and shared APIs.

I'm sure that if you could run it yourself, we would save a lot of time on both sides.
To supplement the last comment, I'd like to speak to priorities.  Boris, I know this is not high on your list of priorities, but it's very high on our list of priorities.  This blocks us from fixing our Mozmill tests.  Getting a green testrun in Mozmill is a very important goal for QA.  We'll need a green test-run baseline once Mozmill is integrated with buildbot.  

Long story short, we'd really appreciate it if you could give us a few more minutes of your time to look into this bug.
OK, so...  I followed the steps in comment 43 on Linux with a trunk build.  I clicked the reload button, and got an error page immediately.

I also tried following the steps in comment 44 but that add-on is not compatible with a tip trunk build.  Time to pull an older rev and rebuild, I guess...
(In reply to comment #46)
> OK, so...  I followed the steps in comment 43 on Linux with a trunk build.  I
> clicked the reload button, and got an error page immediately.
> 
> I also tried following the steps in comment 44 but that add-on is not
> compatible with a tip trunk build.  Time to pull an older rev and rebuild, I
> guess...

I would also be interested in seeing if you enable NSPR logging and get the same junk bits at the beginning.
With an older build I tried the steps from comment 44 again.  Was step 4 supposed to be "run editor"?

Assuming so, here's the relevant part of the log from when we start the second Verisign load, which due to the test's clearing the cache is NOT loaded from cache:

-2134894528[7f9080a11040]: nsHttpConnectionMgr::AddTransaction [trans=634fb980 -10]
2001729296[7f9080a119d0]: nsHttpConnectionMgr::OnMsgNewTransaction [trans=7f90634fb980]
2001729296[7f9080a119d0]: nsHttpConnectionMgr::GetConnection [ci=.S.www.verisign.com:443 caps=1]
2001729296[7f9080a119d0]: nsHttpConnectionMgr::AtActiveConnectionLimit [ci=.S.www.verisign.com:443 caps=1]
2001729296[7f9080a119d0]:    total=0, persist=0
2001729296[7f9080a119d0]:    reusing connection [conn=72d6a340]

So we're making a request to the server, all right.  But we're reusing an existing keep-alive HTTP connection.  That means no need for connection setup, and no need for an SSL/TLS handshake.  This last means that the SSL prefs don't affect things, I would assume.

The time-dependence has to do with how long keep-alive connections stick around.  In the verisign case, 5 seconds based on the headers the server sends.
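
To make the timing concrete, here is a toy model of the behaviour described above: an idle keep-alive connection is reused without a handshake, so the SSL/TLS prefs are only consulted once that connection has expired. This is a sketch under stated assumptions, not Necko's code; `ToyConnectionPool`, its method, and the pref field names are hypothetical.

```javascript
// Toy model of keep-alive reuse; hypothetical names, NOT Necko's code.
class ToyConnectionPool {
  constructor(prefs) {
    this.prefs = prefs;    // e.g. { ssl3: true, tls: true }
    this.idle = new Map(); // host -> { expiresAt } in milliseconds
  }

  // Returns a string describing how the request was served.
  request(host, nowMs, keepAliveSeconds) {
    const conn = this.idle.get(host);
    if (conn && nowMs < conn.expiresAt) {
      // Reuse: no connection setup and no handshake, so the prefs
      // are never looked at on this path.
      conn.expiresAt = nowMs + keepAliveSeconds * 1000;
      return "reused";
    }
    this.idle.delete(host);
    // New connection: a handshake is needed, so the prefs matter here.
    if (!this.prefs.ssl3 && !this.prefs.tls) {
      return "error: no protocol enabled";
    }
    this.idle.set(host, { expiresAt: nowMs + keepAliveSeconds * 1000 });
    return "new connection";
  }
}

const prefs = { ssl3: true, tls: true };
const pool = new ToyConnectionPool(prefs);
const host = "www.verisign.com";

console.log(pool.request(host, 0, 5));     // "new connection"
prefs.ssl3 = false;
prefs.tls = false;
console.log(pool.request(host, 2000, 5));  // "reused" (still inside the 5 s window)
console.log(pool.request(host, 12000, 5)); // "error: no protocol enabled"
```

In this model a request 2 seconds after the last one still succeeds, while one arriving after the 5-second window has passed gets the error, which is consistent with the test passing only when the pause is raised from 2 to 10 seconds.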

If you want to work around this, you could take down the network stack and bring it back up again after clearing the cache, I think.  Though that's a bit of a hack.
This is the log I was just looking at, in case someone cares.  The log from comment 43 is still corrupt, fwiw.

Now I'm going to go to sleep; I have to be up in 4 hours, so this exercise has already eaten up 20% of my sleep for the night.
Boris, thanks for taking some time to look into this.  We really appreciate it.  A workaround that I found works reliably is to make the test pause for 10 seconds.  Based on the language in your previous comment, you think this is not a bug due to keep-alive?
Thanks Boris! I never thought that keep-alive connections could come into play here. While writing this comment I talked with Biesi on IRC, and as he mentioned, it's something we should fix, but it's not a high priority. I will update the summary to reflect the latest findings.

(In reply to comment #50)
>  A workaround that I found works reliably is to make the test pause for 10
> seconds.

Please, no workaround. As long as we can't run tests under httpd+ssltunnel, please simply use another website for the verification check, one we don't use in any of our other tests. But let's follow up on bug 514528.
Summary: [mozmill] No SSL error page shown when page is reloaded and SSL 3.0 and TLS 1.0 disabled in options → After disabling SSL and TLS in preferences, keep-alive connections are still present until they time out
Whiteboard: [mozmill][mozmill-test-blocked][mozmill-test-failure]
> you think this is not a bug due to keep-alive?

I think it might be nice to kill off keep-alive connections when the SSL prefs change, but it's low priority...

You may be able to get away with turning off keep-alive (see the "network.http.keep-alive" preference).  Would that be useful to you guys?
(In reply to comment #52)
> You may be able to get away with turning off keep-alive (see the
> "network.http.keep-alive" preference).  Would that be useful to you guys?

That's absolutely fantastic! Thanks Boris. We are disabling keep-alive for now until this bug has been fixed.
Assignee: nobody → mcmanus
Status: NEW → ASSIGNED
Comment on attachment 8699650 [details] [diff] [review]
Dont reuse connections after security prefs change

Review of attachment 8699650 [details] [diff] [review]:
-----------------------------------------------------------------

This is a pretty big hammer to use (hits a lot of prefs that shouldn't matter for conn reuse), but this is probably such a rare occurrence that I imagine it's not going to be world-ending.
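
The "big hammer" trade-off can be illustrated with a toy model in which any change under a watched pref prefix discards all idle connections. This is only a guess at the general shape of such a fix, not the contents of the attachment: `ToyIdleConnections`, the `security.` prefix check, and the pref name `security.some_unrelated_pref` are assumptions made for illustration.

```javascript
// Toy model, hypothetical names; the actual patch may work differently.
class ToyIdleConnections {
  constructor() {
    this.hosts = new Set();
  }
  keepAlive(host) {
    this.hosts.add(host);
  }
  canReuse(host) {
    return this.hosts.has(host);
  }
  // Called for every pref change. Any pref under the assumed "security."
  // prefix flushes all idle connections, whether or not that pref is
  // actually relevant to connection reuse.
  onPrefChanged(prefName) {
    if (prefName.startsWith("security.")) {
      this.hosts.clear();
    }
  }
}

const idle = new ToyIdleConnections();
idle.keepAlive("www.verisign.com");

idle.onPrefChanged("network.http.keep-alive");
console.log(idle.canReuse("www.verisign.com")); // true (pref outside the prefix)

idle.onPrefChanged("security.enable_tls");
console.log(idle.canReuse("www.verisign.com")); // false (connection no longer reused)

idle.keepAlive("www.verisign.com");
idle.onPrefChanged("security.some_unrelated_pref"); // hypothetical pref name
console.log(idle.canReuse("www.verisign.com")); // false (the "big hammer" effect)
```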
Attachment #8699650 - Flags: review?(hurley) → review+
https://hg.mozilla.org/mozilla-central/rev/533020268a78
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46