Closed Bug 587407 Opened 14 years ago Closed 9 years ago

Disallow connections when the server uses a DHE key that is too weak

Categories

(Core :: Security: PSM, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED
Tracking Status
blocking2.0 --- .x+

People

(Reporter: Gavin, Unassigned)

References


Details

(Whiteboard: [pcm-tcpip] [urls in comment 33][softblocker])

Attachments

(1 file, 5 obsolete files)

This is the trunk solution to the compatibility problem reported in bug 583337 (caused by bug 575620, and worked around on 1.9.2). I think it should therefore block gecko 2.0.
blocking2.0: --- → ?
Whiteboard: [pcm-tcpip]
Gavin: I won't implement a fallback-and-retry style workaround
for these broken servers in Gecko 2.0.  I already filed TE bugs
for all the broken servers we know of.  Due to a limitation of
Bugzilla, TE bugs cannot block Gecko 2.0.  Please use bug 584138
as the Gecko 2.0 blocker.

This bug should be marked WONTFIX.
Why should it be WONTFIX? I don't understand why we wouldn't want to try alternate ciphers if we reject only one of the several ciphers a server supports. That's apparently what Opera does, and Nelson suggests that it's easy to do.

What are the downsides?
Summary: PSM should attempt to renegotiate with DHE disabled when an SSL connection fails due to use of DHE cipher suite with weak keys → PSM should attempt to renegotiate when an SSL connection fails due to use of cipher suite with weak keys
Gavin: The downside is that the fallback-and-retry workaround
will hide the problem, so the insecure servers won't be noticed
and fixed.

A workaround, once added, becomes a "technical debt", and
requires a lot of work in the future to determine if it can
be removed.  I'd rather pay the TE cost upfront.
Well, arguably there is nothing to "fix" (apart perhaps from inefficiency) if the end result is that users connect to the site securely in all browsers. But your point about technical debt is taken.

If we're going to take a hard-line stance on servers that we used to connect to without problems, we shouldn't be alone. What are other browsers doing about this problem?
If we had not implemented a workaround in the Firefox 3.6.8
update, that would have been a hard-line stance.

We give the sites enough time (until Firefox 4 final ships)
to correct the server configuration error.  We reported the
bug to the maintainers of the SSL implementation used in
those sites.  They told me they will fix the bug in the
next release.

So I think we're handling this issue properly.

This is what Google Chrome is doing about this problem:
http://code.google.com/p/chromium/issues/detail?id=51694#c7
How are you expecting that sites will discover the error before Firefox 4 ships?

We've yet to ship this stricter behavior in anything but betas/nightlies, which have far fewer users than our releases. We're likely to discover that a lot more than 3 servers are affected, possibly in various different ways.
Gavin: This kind of debate, first with Nelson, and now with you,
is wearing me out.  I would rather spend my time on the TE bugs.
I'm sorry that I don't have more to say about this issue.
Let's summarize our options and opinions.


Option (1)
----------
- make DHE ciphers our least preferred ciphers; this is what other browsers do
- Nelson objects, because he believes DHE "deserves" higher priority.

Nelson, in bug 583337 comment 11 you argued that DHE offers a protection mechanism on top of other ciphers.

But with default settings, the user doesn't know when this additional protection is effective. If the user simply surfs the web, there may be times when this protection is used, and times when it's not being used.

My point is, while the protection mechanism is "nice to have when possible", nobody is able to rely on this by default, because the server makes the decision which cipher gets chosen.

In my understanding, in order to have a guaranteed benefit of the DHE protection, either the server must require DHE ciphers, or the client must require the use of a DHE cipher. Both can do so by disabling any non-DHE ciphers.

Because the choice of cipher is unknown to most users, I feel it's not mandatory to have the DHE ciphers listed ahead of other ciphers.

Because of this I'd see the workaround from bug 583337 as a valid option for mozilla-central (and Firefox 4), too.


(more options to follow in separate comments)
Option (2)
----------
Add another fallback level in PSM, which means:

If a server offers DHE, but uses a weak key with DHE, then reset the connection, disable DHE for this server in the remainder of the current session, and use any other cipher.

Nelson likes this approach, but it's not happening in NSS; it would have to be done in the application logic (PSM).

The disadvantage is the "technical debt" that Wan-Teh cited.
Wan-Teh made it clear that Chromium will not use this Option 2.

Also, I have to agree that I don't like the going-back-and-forth, trying multiple mechanisms. This always has the risk that some smart hacker finds a way to abuse our fallback mechanisms for a downgrade attack.
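For concreteness, here is a minimal sketch of the retry in Option 2 (illustrative only -- not the patch attached later in this bug). It assumes the two AES DHE_RSA suites shown are the relevant ones to disable and leaves out the socket teardown/reconnect plumbing:

  #include "prerror.h"
  #include "ssl.h"
  #include "sslerr.h"
  #include "sslproto.h"

  // The DHE suites to turn off before retrying (not an exhaustive list).
  static const PRInt32 kDheSuites[] = {
    TLS_DHE_RSA_WITH_AES_256_CBC_SHA,
    TLS_DHE_RSA_WITH_AES_128_CBC_SHA,
  };

  // Retry only for the specific "weak ephemeral DH key" handshake error.
  bool ShouldRetryWithoutDHE(PRErrorCode err)
  {
    return err == SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY;
  }

  // Disable the DHE suites on the replacement socket before reconnecting.
  SECStatus DisableDHEOnSocket(PRFileDesc* sslFd)
  {
    for (size_t i = 0; i < sizeof(kDheSuites) / sizeof(kDheSuites[0]); ++i) {
      if (SSL_CipherPrefSet(sslFd, kDheSuites[i], PR_FALSE) != SECSuccess) {
        return SECFailure;
      }
    }
    return SECSuccess;
  }

In PSM, the natural place for the "remember this host for the rest of the session" part would be the same per-host cache already used for TLS intolerance fallback (see the later patches in this bug).
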
Option (3)
----------
Take a hard-line stance.
If DHE is preferred by a server, but the server configuration made poor choices, fail the connection.

Try to evangelize the web to fix such broken servers.
Wan-Teh prefers this approach.
My opinion:

I appreciate that we have a new error code that makes people aware in scenarios where client or server require the use of DHE, but the server enforces a weak key.

On the one hand, I don't like Option 3.
If the server supports other secure mechanisms (besides DHE),
then there shouldn't be a reason to frustrate our users (and make them go elsewhere), as Gavin has emphasized.

On the other hand, I like Nelson's argument from bug 583337 comment 10,
reminding us how a similar scenario got handled years ago.
It's good to make admins aware of mistakes and protect users.
We all are trying to make Firefox a more secure browser,
and we should be brave to take a hard-line stance when needed.

So the question is, is it necessary to take a hard-line stance?

I propose:

- keep the current trunk behaviour and ship it with the next one or two Firefox 4 beta(s)

- postpone the decision to a later time (after the next 1 or 2 betas, but before Firefox 4 final)

- change the SSL error page (not about bad certs, the one about protocol errors) to remind people to use the Firefox "Report broken website" feature

- collect more evidence, how many sites have the problem?


Should we learn that it's really a very small number of sites, let's keep the hard-line stance.


Gavin, in bug 583337 comment 24 you said
  "we haven't even released a beta with this change in it yet"

but isn't that wrong? Contrary to your comment, I believe Firefox 4 beta 2 and later already have this new behavior. Bug 583337 was reported for beta 2 (not for a nightly).


But should we learn that there's really a bigger number of reports of broken sites, say at least 15 sites detected by Firefox 4 beta 5, I'd agree that we shouldn't do Option 3.


In that case I'd propose Option (1) until someone has been able to implement a well-tested Option (2).


Gavin, what do you think?
Kai, in reply to comment 9, The point is to give the user the benefit of DHE
whenever the server properly implements it.  Not all servers do, but a user
who is careful to visit sites that are known to do so should have the assurance
that he will get that benefit.  The patch now taken by Mozilla assures that 
will almost never happen unless the user disables the non-DHE cipher suites, 
or the server does so.  Worse, we're *SILENTLY* taking use of DHE away.

Our duty is to give users the best security that the client and server can mutually support.  That includes DHE when it is properly implemented 
(adequate key size) in the server.  The patch effectively disables DHE for 
all.  Since it is silent, users will not even know that they may need to 
take action to restore DHE service where they think it is important.

It's never necessary to take a hard line stance if you really only care 
about APPEARING to want security rather than really DELIVERING security.

Security that never says "no" is sham security.
Nelson, yes, I understand. But to repeat, users who want to use DHE are still able to force the client to do so, and those who don't force the client to use a particular cipher probably haven't worried about it much, so I'm also in favor of keeping sites working, rather than insisting on a specific order of ciphers.

As I said, I'd like us to collect more data, before we make a decision for one side or the other, because I agree it's a difficult decision.
I have one more idea, a variation of Option (2).

Instead of "automatic retrying because of weak ciphers", we could improve the error reporting and require the user to ask to retry.

In detail:
- user connects to a site, the server selects a cipher, but we decide the key used is too weak

- we show an error page, as we currently do.

- we add a message to the page, and a button, like this:

    "The site www.someplace.com asked (Firefox) to use insecure communication
     parameters. To protect you from attacks, (Firefox) has terminated the
     connection.

     Error code: sec_err_bla"

     Button [Attempt to use a different communication mechanism]


This could be seen as a compromise between "take a hard stance" and "make it work anyway".

Instead of adding "more technical debt" to the fallback mechanism, we enable the user to be aware, and offer the user the chance to go ahead anyway.

Thoughts?
(In reply to comment #12)
> I appreciate that we have a new error code that makes people aware in scenarios
> where client or server require the use of DHE, but the server enforces a weak
> key.

As an aside: don't forget to add SSL_ERROR_WEAK_SERVER_KEY to nsNSSErrors.cpp/nsserrors.properties, otherwise users would only get PSM's meaningless "Secure Connection Failed" error.
I certainly yield on crypto expertise to the other minds already on this thread, but I confess to a certain amount of confusion about this discussion.

If a server presented a set of ciphers that included ones we didn't support, even in the preferred position, we'd just move down the list until we found one we did, right? And that logic would not, in the general case, have anything to do with *why* we don't support CIPHER_X, be it for lack of implementation resource, or dislike for the cipher, or it being known weak, or what-have-you. We'd just support it, or not, wouldn't we?

I certainly agree that the DHE+weak_key situation is not that simple - we do support (indeed, prefer!) the cipher, but only find out later in the process that the key is unacceptably weak. So certainly I agree with Wan-Teh that this is exceptional behaviour but it feels to me (and again, I welcome correction on this point) like it's a pretty minor deviation. If the cipher had the decency to identify itself as DHEWEAK in the cipher list, we'd just have rolled down to the next one and continued on our merry way, wouldn't we? If the totality of the difference here is that we discover later in the handshake that what advertised itself as DHE was really DHEWEAK, then I admit I'd prefer to roll down to the next (secure) cipher in the list, rather than killing the page load.

I'm worried about taking this stance, though, because it has me seeming to disagree with WTC, which rarely happens, and basically never has me in the right, long term.

(PS to Nelson - While I think I agree with the general sentiment that "security which never says no is a sham", I also feel pretty strongly that perfect security users route around is substantially inferior to strong security that users work within. I'm concerned, though not convinced, that that might be the result here, just trying to talk it through. Apologies if the rehash is frustrating.)
Johnathan,
> If a server presented a set of ciphers ...

Servers don't present a set of ciphers.  Clients do.  Servers pick one 
from the client's list.  Most servers simply pick the first one in the 
client's list that the server also supports.  Some servers pick the one
that is thought to be the strongest that is mutually supported by the
client and server.  

Within the set of cipher suites with a given symmetric key size, NSS clients list cipher suites that provide "forward secrecy" ahead of those that do not.
This is to give competent servers that pick the first mutually supported one
a chance to use the best available security.  Of course, it also gives 
incompetent servers a chance to demonstrate what they're made of.
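To make the selection model concrete, here is a toy sketch (generic C++, not NSS code) of the "pick the first mutually supported suite from the client's ordered list" behaviour most servers implement:

  #include <algorithm>
  #include <cstdint>
  #include <vector>

  // clientOrder is the ClientHello list, most preferred first;
  // serverEnabled is whatever the server is configured to accept.
  uint16_t PickCipherSuite(const std::vector<uint16_t>& clientOrder,
                           const std::vector<uint16_t>& serverEnabled)
  {
    for (uint16_t suite : clientOrder) {
      if (std::find(serverEnabled.begin(), serverEnabled.end(), suite) !=
          serverEnabled.end()) {
        return suite;  // first client preference the server also supports
      }
    }
    return 0;  // no overlap: the server sends handshake_failure
  }

A server applying its own preferences would instead walk serverEnabled and pick the first suite the client offered.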

Historically, with every asymmetric algorithm, we've encountered servers that tried to use absurdly small values. Misconfigured servers happen from time to
time.  We've always "taken the lead" in enforcing minimums.  We are offering 
security, not fun and games with encryption.  

IMO, Firefox must present the user with a CLEAR message regarding the security of the connection.  That message is not "it used (or didn't use) SSL" without any regard to the strength of the crypto involved.  That message is "it was 
secure (or wasn't)".  SSL with absurdly weak keys is not secure.  We should 
not deceive the user into thinking that it is.  

But it is also not sufficient to simply "open the lock", because users don't
notice that.  When a user clicks an https link, when a web page author puts 
an https link in his page, the expectation is that the user will get a page
delivered securely.  It's not good enough to deliver the page insecurely, 
and then silently hide the lock (that most users never notice).  When the 
user has asked for the page to be delivered securely, the choices are 
(a) deliver it securely, or (b) don't deliver it and tell the user why not.

Do you also feel that perfect security that users work around is inferior 
to absurdly weak security that users work with?  256-bit DHE is a JOKE.
It is so weak that we are embarrassed to have ever allowed it.  

This must be about the 10th time we've had this discussion.  Once again, 
you're saying "Our users don't REALLY want security.  They want to get to 
a page that looks like they expect it to, regardless of the consequences,
and we want to be the browser with which they do so."  Is that really a 
discussion that the browser that claims to be taking the high road of 
security should be having so often?

There's a reason I took a job where I'm not paid to contribute to Mozilla
any more.  I'm tired of Mozilla's ongoing conflict of interest with regard
to security.  

The browser that promotes itself as taking the lead with security is afraid 
to enforce cryptographic security strength minimums, and actually says 
(bug 583337 comment 15) it does not wish to "take the lead" in "making  
sites incompatible for security reasons".  

I can think of another service industry in which the purveyors advertise 
their security ("You won't get any nasty disease when you use MY service") 
but who are also quick to allow their users to disregard/discard security 
measures when it increases business ("I won't make you use protection").
There's helpful content in there, Nelson, and I appreciate that, but there's also a lot of inflammatory rhetoric - I hope you'll not mind too much if I leave most of that out - it's not clear that it helps.

We shouldn't support anything we deem insecure. I'm not sure what the conflict of interest is that you're referring to but we don't have shareholders to kowtow to or nefarious intent - our job is to build a safe and usable browser for our users. Safe and usable - that's not a conflict, it's a balance.

You've been around Mozilla long enough not to actually believe the fallacious synecdoche of "it was said in a bug comment ergo it represents unassailable Mozilla policy forevermore" so I'm reading that instead as frustration about a comment you disagree with. I guess that's your prerogative, but I'd sort of rather focus on the question at hand.

Again - I don't want us to support things we don't think are secure, I actually don't think anyone has been advocating that. Do you, or is that a rhetorical device? I'm certainly not saying "let's use DHE with tiny key sizes," in case that wasn't clear.

I'm trying to re-examine how strongly we believe that the technical debt WTC describes around supporting renegotiation in cases where we encounter weak DHE is too costly. If we renegotiate with a different cipher in those cases, my understanding is that our users will indeed get a secure connection we believe in - that is undeniably a better outcome for that user. It's a worse outcome for us in terms of maintenance burden on that code, and it's a worse outcome for TE because we lose the opportunity to, for lack of a better phrase, use our users as a stick to force sites with weak-key DHE into line. I think the debate about whether DHE with weak keys is "secure enough" or not is a non-sequitur. Let's stipulate that we won't accept it. The question remains as to whether we try to do what our user wanted ("Connect to this site securely") or bail as an advocacy point. That will come down very much to the perceived costs of doing what the user wants, and the perceived wins of using them as an advocacy tool.

Is that a conversation we can have without you calling me a prostitute? I don't feel like I've earned that moniker here, or elsewhere, but I have a lot of respect for your contributions to the project over the years, so I'm trying to find a way to not take it personally.
(In reply to comment #18)
> The browser that promotes itself as taking the lead with security is afraid 
> to enforce cryptographic security strength minimums, and actually says 
> (bug 583337 comment 15) it does not wish to "take the lead" in "making  
> sites incompatible for security reasons".  

You misinterpreted my comment, perhaps because I phrased it poorly. We should absolutely "take the lead" on refusing to connect to servers using insecure ciphers. No one is arguing that we should continue pretending that the connection is secure when it isn't.

The end result of the patches that caused bug 583337 was that Firefox started refusing to connect to servers because only *one* of the multiple cipher options supported by the server (the one sent first in the ClientHello message) was weak. There are alternative ways to connect to these servers that we consider secure enough to not fail the connection, but we don't use them because of implementation details. My opposition is to making our users suffer (e.g. they can no longer pay their gas bill) because of these implementation details. Ignoring this impact to our users - or deflecting it by blaming "incompetent server administrators" - is irresponsible, and that's why I filed this bug.

We may disagree on the tradeoffs involved (not impacting users vs. encouraging fixes to the servers), but I'd sort of like to be able to do that without being accused of not caring about security.
- I conclude from comment 17 that Johnathan prefers Option 2.
- I think Gavin would be fine with either Option 1 or 2
- Nelson had originally proposed to go with Option 2.
- Wan-Teh prefers Option 3

It seems we are heading towards a democratic majority for Option 2.
Option 2 is more work, but maybe that's the way we have to go.

Now the question is, 
should we do Option 2 as described in comment 9, 
or the variant described in comment 15 ?
Wrong comment number in the previous comment. Correcting my questions:

Should we do Option 2 as described in comment 10 (automatic fallback)
or the variant described in comment 15 (user assisted fallback) ?

I have a tendency towards automatic fallback.
Option 1 and Option 2 would satisfy me.

(I don't like the modified option 2 from comment 15 - pushing the decision on to users has no upside, as far as I can tell.)
Back to answering Johnathan's original question:

Nelson mostly answered it. The client provides the list of cipher suites in order of its preference, and the server picks from the list. Usually the server picks the first cipher on the client's list that it supports, but the SSL protocol allows the server to apply its own preferences.

NSS orders stronger ciphers before weaker ciphers. At the same key strength, ciphers with perfect forward secrecy (like DHE) are preferred over ciphers without it.

In DHE, we do not know the strength of the DHE key until we are already partway through the handshake (it's sent in the ServerKeyExchange message). At that point there is no option but to continue or abort, leaving the application (read: PSM) to decide what to do.

This question was brought before the browser security list and the following was determined:
    Opera implements a modified option 2 (It doesn't disable DHE, it just moves it to the end of the list -- If DHE is again renegotiated, it removes all security decoration). The threshold for Opera is higher. DHE keys must be 1024 bits or higher.
    There was some indication that IE did not support DHE_RSA (and DHE_DSA is exceedingly rare). I'm not sure whether this is true; there was no authoritative response, and the poster got some other details wrong.
    I don't know if there were any browsers that list DHE below non-DHE ciphers.

I believe long term option 1 is not viable (though I was ok for that to be a short term solution). I haven't seen anyone advocating for this, and I think this option draws the most negative energy from Nelson (probably deservedly).

I mildly prefer option 3. There's something to be said for exposing problems on the web. Long term it does not do websites any favors by hiding what could be serious configuration issues that affect overall security. Many years ago we started being more careful of RSA operations and found someone with an RSA key with exponent 1. Allowing users to connect to that website was just the wrong thing.

Option 2 is acceptable to me, but only because I'm not really a fan of 'perfect forward secrecy' as a deployable security concept. I'll put my rant on perfect forward secrecy in a separate comment. That being said, if I *really* cared about perfect forward secrecy, I would likely be a bit annoyed at option 2: it quietly goes to something else.

> The end result of the patches that caused bug 583337 was that Firefox started
> refusing to connect to servers because only *one* of the multiple cipher
> options supported by the server (the one sent first in the ClientHello
> message)

It's reasonable to bring this up, but it may also be reasonable that this is really correct behaviour. That first cipher, which is supposed to be the stronger cipher, can be implemented badly. If implemented badly, then it becomes considerably weaker -- even dangerous.

Whether or not we should just fall back is really a matter of degree. If we were talking 512 bit keys, then we are talking a larger number of servers, and a key size that, though weaker, is still strong enough for most individual users (I would not trust that certain governments couldn't break 512 bit DH, however). The annoyance of fixing all those servers is much higher than it's worth, so fallback seems more reasonable. In the current case, it's easier to get the three servers fixed (which represent one hardware vendor) and as a result we make the internet more secure as a whole (one has already fixed its website).

If we were talking 100 servers, then I would be leaning more toward Option 2.

Upshot, I mildly prefer Option 3 in this case over Option 2. Both are strongly preferred to Option 1.

bob
> (I don't like the modified option 2 from comment 15 - pushing the decision on
> to users has no upside, as far as I can tell.)

I agree here.
<promised pfs rant>

So the idea behind perfect forward secrecy is that the keys used to do the key exchange are ephemeral, and the server operator or client cannot be forced to divulge them to some random government.

The problem is, in practice, it's difficult to know if you have perfect forward secrecy:

First, as we saw, few browsers give you any guarantee about the size of the DH key used in the exchange. From the surface UI, you only see the certificate and the negotiated key. If the DH key was 256 bits, the government does not need to get any escrowed keys. (The same might be true for 512 bit keys.)

Second, we don't know if the keys really are ephemeral. The server could be presenting the same fixed key every time, or the server could be presenting a set of 1000 keys that it has already escrowed. Unless you control both the server and the client, you don't really know if you have pfs. Again the browser does not give you any information (nor can I think of a way that the average user could get the information about pfs).

If you happen to control your server and your client, however, you can benefit from pfs (or if you trust your server provider). In that case, there isn't a way that your recorded encrypted transcript could be replayed, so it isn't completely useless.

Anyway that's where I'm coming from in not ranking pfs as high as others. I'm not against it, I'm just not convinced it provides more than marginal benefit.
</end rant>
(In reply to comment #24)
> I mildly prefer option 3. There's something to be said for exposing problems on
> the web. Long term it does not do websites any favors by hiding what could be
> serious configuration issues that effect overall security.

I understand this reasoning, but if every client implemented the NSS+PSM+Option 2 behavior, or something equivalent to it, then the insecurity would be purely theoretical, right? No clients would ever use the insecure cipher, instead they would all fall back to alternate secure ciphers that the broken servers also support. Subjecting our user base to functional regressions in an attempt to fix things for other (theoretical?) clients that don't do the key validation we do seems like the wrong trade-off.

(You could also argue that if the server is broken in this particular way, it may be more likely to be broken in other ways as well, but I don't think that influences my evaluation of the tradeoff very much.)
Gavin,
DHE is not "the insecure cipher".  Please disabuse yourself of that notion.   When properly used by a server, it is the MOST secure one, which is why NSS puts it ahead of the others.

Johnathan,
You speak for Mozilla.  So do Gavin and Mike (e.g. in bug 583337).  You're 
all on record.  Mozilla has spoken.

When I speak of an industry whose purveyors quickly sacrifice principle in 
exchange for market share, I am speaking not of you personally, but of organizations whose behavior consistently is that way.  I will write more 
about this in email, since it does not directly contribute to the resolution 
of this bug.

Regarding this bug, be clear that Mozilla's present position is that she has already sacrificed some user security in the form of forward secrecy.  
Mozilla has disabled that forward secrecy by default for all users.

Here is a restatement of the 3 options given in comments above:

3) restore user security - with no added "technical debt" and if the UI is
done well, some added pressure on the bad servers to improve their security.

2) restore user security - with added "technical debt" (something that 
Mozilla has repeatedly and eagerly taken the lead in doing in the past, 
to retain market share and continue to claim superior security).

1) leave things as they are now, with user security silently lessened.
> I understand this reasoning, but if every client implemented the NSS+PSM+Option
> 2 behavior, or something equivalent to it, then the insecurity would be purely
> theoretical, right?

That would be correct, but in this case, we know for a fact that a pretty significant number of browsers don't -- every old mozilla-based browser. Keeping more broken servers from coming up protects older mozilla users (at least, I don't know if there are other browsers that would negotiate DH in this case). Whether or not it's worth the 'I can't connect to this server with FF but I can with IE' hit is really what we are discussing here. The PR hit of 3 websites (one of which is now fixed), representing one hardware vendor (who has now been informed of the issue and is supposedly addressing it) seems pretty small compared to the benefit that unpatched and ancient browsers from mozilla can now safely connect to those servers.

I admit the trade-off between option 2 and option 3 is not clear. I don't think option 2 is wrong (in the way option 1 is). The key question is: is it really worth a small PR hit to Mozilla to make the internet environment as a whole better? Where one falls on this depends on how big you think the PR hit is versus how much real change to the internet we can effect.

> DHE is not "the insecure cipher".

To be fair to Gavin, I don't think he was at all implying that in his statement. He meant the insecure use of DHE (not every non-crypto person uses the crypto vocabulary as precisely as crypto people do). He's clearly talking about fallback in the case where the DHE intermediate key is too weak, which I think is a solution acceptable to you. 

The current debate really seems to be around option 3 versus option 2. Really, my goal is to make it clear there are real benefits to option 3 and that it should not just be dismissed by the 'regression' argument. I don't really have heartache if we come down on the option 2 side (though in that case I may lobby for stronger DHE intermediate key requirements -- say 700 or 800 bits).

bob
Let's talk about coding Option 2:

The new error code is generic. If I understand correctly, it is intended to be used with any kind of weak server keys. It's not restricted to DH; it could happen with other algorithms.

If I'm right, I conclude we should not fix the code to retry without DH, but dynamically find the algorithm that failed.

After NSS returns with the error, we need to inspect the SSL socket and retrieve information about the algorithm that was attempted, so we can disable that specific algorithm when retrying.

The only (externally available) NSS API I could find that might supply such information is SSL_SecurityStatus, but it returns "null" for cipher name.

This brings me to my questions:

Question (I)
Do you agree that SSL_ERROR_WEAK_SERVER_KEY can happen with non-DHE ?

Question (II)
Is there any external API that would allow me to query 
properties of the failing socket, in particular version number and
  sslSocket->ssl3.hs.kea_def->exchKeyType
?

Question (III)
If not, any proposals for the new API?
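For reference, here is a hedged sketch of the kind of query Question (II) is asking about, using only the existing public SSL_GetChannelInfo / SSL_GetCipherSuiteInfo APIs. It assumes the handshake has progressed far enough for the channel info to be populated, which (as a later review comment in this bug points out) is not guaranteed in every callback:

  #include "ssl.h"
  #include "sslt.h"

  // Fills *keaKeyBits and returns true if the socket negotiated a DH(E) key
  // exchange, so the caller can judge whether the server's key was too weak.
  bool GetDHKeyExchangeBits(PRFileDesc* sslFd, PRUint32* keaKeyBits)
  {
    SSLChannelInfo channel;
    if (SSL_GetChannelInfo(sslFd, &channel, sizeof(channel)) != SECSuccess) {
      return false;
    }
    SSLCipherSuiteInfo suite;
    if (SSL_GetCipherSuiteInfo(channel.cipherSuite, &suite,
                               sizeof(suite)) != SECSuccess) {
      return false;
    }
    if (suite.keaType != ssl_kea_dh) {  // DHE suites report ssl_kea_dh here
      return false;
    }
    *keaKeyBits = channel.keaKeyBits;   // size of the server's ephemeral DH key
    return true;
  }
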
Attached patch Experimental Patch v1 (obsolete) — Splinter Review
This patch disables DH after a weak key error and retries.
(It doesn't use the dynamic querying for DH that I proposed in the previous comment)

It requires a newer version of NSS than the one currently used by mozilla-central:
(I used: python client.py update_nss NSS_3_12_BRANCH )

Any reviews and comments welcome.
Attachment #468795 - Flags: review?
Attached patch Experimental Patch v1 fixed (obsolete) — Splinter Review
Attachment #468795 - Attachment is obsolete: true
Attachment #468795 - Flags: review?
Attachment #468797 - Flags: review?
https://portal-plumprod.cgc.enbridge.com/
https://www.citylink.com.au/

(it appears that https://www.tescodiets.com/ got fixed)
Whiteboard: [pcm-tcpip] → [pcm-tcpip] [urls in comment 33]
(In reply to comment #29)
> The PR hit of 3 websites (one of which is now fixed)

> a small PR hit to Mozilla

We have not determined that this only affects 3 websites, or that the "PR hit" will be small. It's too early to really know, because so far all we have is beta testing feedback. So I think it's reasonable to at least be prepared for the worst :)
What about a static black-list solution? Option 1 but only for servers on the black list.
(In reply to comment #30)
>
> Question (I)
> Do you agree that SSL_ERROR_WEAK_SERVER_KEY can happen with non-DHE ?

I'm going to rename that error code to SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY
so that there is no ambiguity what kind of key it is.  See bug 587234 comment
21.

In your experimental patch, you only need to disable
TLS_DHE_RSA_WITH_AES_256_CBC_SHA.  Remember, lowering the priority of that
cipher suite alone is enough to solve the problem for FF 3.6.9.

Using a static blacklist is a good idea, too.
blocking2.0: ? → betaN+
I checked in the part of Kai's "Experimental Patch v1 fixed"
(attachment 468797 [details] [diff] [review]) that adds an error message for the
new error code SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY.

I don't know hg well enough, so my hg commit message
incorrectly shows me as the patch author.  Sorry about that.

http://hg.mozilla.org/mozilla-central/rev/d04cc71e768b
Attachment #476560 - Attachment description: Add error message for SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY, by Kai Engert → Add error message for SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY, by Kai Engert (checked in)
This is the remainder of Kai's experimental patch.
Attachment #468797 - Attachment is obsolete: true
Attachment #468797 - Flags: review?
(In reply to comment #12)
> - change the SSL error page (not about bad certs, the one about protocol
> errors) to remind people to use the Firefox "Report broken website" feature

Note that after bug 572695 this is now called "Submit Feedback..." which no longer stands out as the menuitem referred to in the error page text "use the command found in the help menu to report this broken site".

I'm assuming someone in this CC list will know what to do about that and where to file a bug better than I do.
Depends on: 600104
No longer depends on: 600104
Wan-Teh: Is it really problematic to put the fallback logic in PSM? Or, is your objection primarily to falling back silently so server admins never have to fix their sites? PSM seems like the ideal place for it because the TLS 1.0 fallback logic is there and the inevitable TLS 1.1/1.2 fallback logic will probably be there too. The changes to PSM seem minor and it seems a lot better to me to keep all of this unfortunate cruft in one well-managed and well-understood place.

(In reply to comment #6)
> This is what Google Chrome is doing about this problem:
> http://code.google.com/p/chromium/issues/detail?id=51694#c7

(In reply to comment #39)
> Note that after bug 572695 this is now called "Submit Feedback..." which no
> longer stand out as the menuitem referred to in the error page text "use the
> command found in the help menu to report this broken site".
> 
> I'm assuming someone in this CC list will know what to do about that and where
> to file a bug better than I do.

The end-user is likely to access websites from multiple devices and browsers, with varying behaviors: refuse connection (future Chrome), fall back (Opera), silently use the weak key (many older browsers). Consequently, the weakness of the server's key affects Firefox users even if it doesn't affect them when they are using Firefox. Accordingly, we should notify the user about the problem.

I propose that Firefox should fall back (Option 2) *and* display a doorhanger with a warning about the weak key and a "report this problem" button that will assist us in notifying the website and helping them correct the problem. If we want to go all out, I can create a wiki page describing the problem and/or provide some instructions for server administrators to correct the problem. Then we can include a link to the wiki page in the doorhanger. The UI and wiki page should be factored out into separate bugs if this is the approach we are going to take.

Note that the cases seen so far seem to be using 256-bit keys even though 512+ is the default for almost all software. That suggests to me that the server administrators consciously chose to use a 256-bit key; I wouldn't be surprised if this is a result of misinformed attempts to achieve "256 bit security." This is why I think the admins of the servers affected are going to need extra assistance in understanding and resolving the issue on their servers.
Depends on: 600104
All the broken servers use the same SSL acceleration device.
Let's call it Device X in this bug report.

I've reported this bug to the vendor of Device X, and the
vendor has implemented a fix.  The fix is going through the
QA process.

I've also got step-by-step instructions for disabling DHE
cipher suites from a customer of Device X.  The instructions
can be incorporated into Firefox 4 release notes.

So, this bug is Device X vs. Firefox 4.  Device X can be
updated, and its customers can disable DHE cipher suites
before the update is released.  Firefox 4 should not yield
and implement a fallback that will allow *new* broken
implementations to appear.
I spoke about this during the NSS-DEV teleconference and then again privately with Robert and Wan-Teh. I don't want to put words in other people's mouths but there seem to now be no strong objections to option #2 (fallback). Robert suggested that if we are going to fall back then we should do so whenever the DHE key is less than 1024 bits and I strongly agree with this. Wan-Teh seemed to think this was reasonable as well. The three of us also seem to be in agreement that it would be good to do something UI-wise but there's no time to do that for Firefox 4.0.

I suggest that we resolve this bug by having me clean up Kai's patch (with Kai's help) so that PSM falls back to RSA key exchange when NSS reports the DHE key is too small and RSA key exchange with the same bulk cipher is enabled and the RSA key is larger than the DHE key. Bug 134735 was already filed long ago about warning about weak keys, which we can use to prioritize and organize any warning UI separately. (My understanding is that it is too late to implement any new UI for this in FF 4.0 because of strings freeze.)

Is there anybody that objects to this?
Assignee: nobody → bsmith
The plan in comment 42 sounds great to me.
(In reply to comment #42)
> I spoke about this during the NSS-DEV teleconference and then again privately
> with Robert and Wan-Teh. I don't want to put words in other people's mouths but
> there seem to now be no strong objections to option #2 (fallback). Robert
> suggested that if we are going to fall back then we should do so whenever the
> DHE key is less than 1024 bits and I strongly agree with this. Wan-Teh seemed
> to think this was reasonable as well.


Fair enough, but if you want to use 1024 bits as the threshold value right now, then a PSM patch won't be sufficient, because NSS is currently hard-coded to 512 bits, as I understand it:

http://mxr.mozilla.org/mozilla-central/source/security/nss/lib/ssl/ssl3con.c#5309


For reasons of simplicity, we might want to consider this approach:

- for now, accept that the threshold will be 512

- reach agreement that the NSS team will increase the value 
  to 1024 bits in the next NSS release

- Mozilla/PSM automatically gets the upgraded threshold when we
  upgrade Mozilla to that NSS release, something we frequently do


If you accept the above, life is simple, and we can proceed with the PSM patch.

If you disagreed with the above, you would have to find a way to change this limit outside of NSS. Given that there is no callback to define it, you'd have to apply a patch on top of NSS in the copy that Mozilla is using. However, the NSS developers prefer to avoid this practice, and it's usually considered OK only for fixing very urgent issues, e.g. build fixes. So, in order to do it the right way, you'd have to propose a new API, get that API added to NSS, and wait for that API to appear in a next NSS release.
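For illustration only, the proposed API might look something like the declaration below; this function is hypothetical and does not exist in NSS, it is just the kind of knob that would have to be proposed and accepted upstream:

  // Hypothetical addition to ssl.h -- raise libssl's built-in minimum size
  // for the server's ephemeral DH key (currently hard-coded to 512 bits).
  SECStatus SSL_SetMinimumDHEKeyBits(PRFileDesc* fd, PRUint32 minKeyBits);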


> The three of us also seem to be in
> agreement that it would be good to do something UI-wise wise but there's no
> time to do that for Firefox 4.0.
> 
> I suggest that we resolve this bug by having me clean up Kai's patch (with
> Kai's help) so that PSM falls back to RSA key exchange


The fallback logic in the currently attached patch is "fall back to anything but these excluded DH ciphers", but yeah, for practical purposes this seems equivalent to saying "fall back to RSA".


> when NSS reports the DHE
> key is too small and RSA key exchange with the same bulk cipher is enabled and
> the RSA key is larger than the DHE key.


The current patch doesn't have such a combined logic that compares key sizes. When it detects the weak DH key, it will retry the connection without DH, and will accept whatever NSS accepts.

If you'd like to impose a new RSA key size check right away, that seems like additional work that will probably complicate fixing this bug.
(In reply to comment #44)
> For reasons of simplicity, we might want to consider this approach:
> 
> - for now, accept that the threshould will be 512

OK, let's use Bug 134735 to organize what to do about 512+.

> - reach agreement with the NSS team will increase the value 
>   to 1024 bits in the next NSS release
> 
> - Mozilla/PSM automatically gets the upgraded threshould when we
>   upgrade Mozilla to that NSS release, something we frequently do

It may be that, for compatibility reasons, we need a soft limit instead of a hard limit for 512+. If we need a soft limit then we will need a new API in NSS to help us implement it. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6330287 

> > when NSS reports the DHE
> > key is too small and RSA key exchange with the same bulk cipher is enabled and
> > the RSA key is larger than the DHE key.
> 
> The current patch doesn't have such a combined logic that compares key sizes.
> When it detects the weak DH key, it will retry the connection without DH, and
> will accept whatever NSS accepts.

Yes, I understand. With the 512 limit, then we don't have to compare key sizes. But, we should still ensure that the RSA equivalent of the DHE-RSA cipher suite is enabled, because we don't want to fall back to a weaker bulk cipher in degenerate cases where people have made mistakes in about:config.
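A minimal sketch of that extra check, assuming the weak suite was TLS_DHE_RSA_WITH_AES_256_CBC_SHA and using the existing SSL_CipherPrefGet API; a real implementation would map every enabled DHE suite to its plain-RSA sibling rather than hard-coding one pair:

  #include "ssl.h"
  #include "sslproto.h"

  // Only fall back if the RSA sibling with the same bulk cipher is enabled,
  // so the retry cannot land on a weaker bulk cipher.
  bool RsaEquivalentEnabled(PRFileDesc* sslFd)
  {
    PRBool enabled = PR_FALSE;
    if (SSL_CipherPrefGet(sslFd, TLS_RSA_WITH_AES_256_CBC_SHA,
                          &enabled) != SECSuccess) {
      return false;
    }
    return enabled == PR_TRUE;
  }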

In order for the fallback to be safe, a change to libssl is needed. See Bug 601635.
bsmith - ping on this betaN blocker?
We know of two websites that are still affected: bug 586645 and bug 597166. Most of the other affected websites we have found have disabled DHE cipher suites to solve the problem on their ends. I think this is the solution that was provided by the vendor of the product that causes this problem to their customers. Doing what Wan-Teh suggested--i.e. nothing--looks much better now that we've seen most other sites with this problem (that we know of) implement that workaround.

Kai, what do you think?
The question is, are we confident that no users will require this after a public release?

If we're confident, I'm fine with the "do nothing" approach.

However, if we still had doubts, it would be good to be prepared.
We could add this automatic retry code, disable it by default, but allow users to enable it using about:config.
Anyway, comment 48 is just an idea.

I'm fine to try the "do nothing" approach for FF 4 final,
and if it really is necessary,
we could add this code on the stable branch for FF 4.0.x,
with or without pref.
Using the SSL Labs dataset, here are eight (8) more domains (out of 867,361) that have keys less than 512 bits:

www.play65partners.com 
www.selecsonic.com
www.sanyo-i.jp 
www.gmd.com
www.imagesatlantic.com
www.infogenesisasp.com
www.play89partners.com
www.atlanticpayments.com

Here are the DHE key length distributions from this initial run:

  <512:     10
   512:    309
   768:   3596
  1024: 460106
  2048:    278
Whiteboard: [pcm-tcpip] [urls in comment 33] → [pcm-tcpip] [urls in comment 33][softblocker]
softblockers that require a beta are likely no longer in scope for 4.0.  sounds like this needs to move to FF 4.0.x or be made a hardblocker.
This can wait until FF 4.0.x. Other PSM bugs with higher impact have already been minused.
blocking2.0: ? → .x+
(In reply to comment #50)
> www.play65partners.com 
> www.selecsonic.com
> www.sanyo-i.jp 
> www.gmd.com
> www.imagesatlantic.com
> www.infogenesisasp.com
> www.play89partners.com
> www.atlanticpayments.com

Of all of these, only www.gmd.com has the weak key error now. The other sites were apparently fixed, or they are using untrusted (self-signed) certificates and are unimportant. We haven't received (AFAICT) any bug reports since FF 4.0 about this error. So, there is no longer any need (IMO) to consider dealing with the SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY error from NSS, as long as NSS doesn't change the threshold for issuing that error to be 512 or more. (That is, Wan-Teh was right; I was wrong.)

> Here are the DHE key length distributions from this initial run:
> 
> <512: 10
> 512 309
> 768 3596
> 1024 460106
> 2048 278

Less than 1% of websites use a key size less than 1024 bits, which I think should be our minimum key size. We should enforce this minimum for Firefox 6.
Summary: PSM should attempt to renegotiate when an SSL connection fails due to use of cipher suite with weak keys → PSM should attempt to renegotiate when the server uses a DHE key that is too weak
Does this mean, this bug is WONTFIX ?
I think it means we don't have to worry about the case where the key size is less than 512 bits. But, we need to do it for the case where the key size is 512-1024 bits, where NSS won't return an error but where we already know that the encryption is more-or-less broken.
Assignee: brian → nobody
My vote is to just increase the minimum DHE key size limit, targeting 512-bit DHE first and setting a deadline for elimination of 768-bit DHE.
BTW, now that the bug is long fixed, does anyone know what the vendor that used 256-bit DHE was?
Assignee: nobody → VYV03354
Attachment #476560 - Attachment is obsolete: true
Attachment #476562 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8532914 - Flags: review?(dkeeler)
Thanks, though it is not my preferred solution. Fortunately, I think 768-bit DHE is stronger than 768-bit RSA. It does not appear that there are many 512-bit DHE sites left (outside of export cipher suites).
I already tried to notify one of the 512-bit DHE sites, BTW:
https://www.ssllabs.com/ssltest/analyze.html?d=mobilelinkgen.com
Should I file a TE bug?
I'm not sure it has anything to do with Firefox. Firefox will choose ECDHE cipher suites for the site even without the patch.
I know, and I haven't seen any other 512-bit DHE sites on ssllabs.com, which means we will probably be able to kill it off soon.
(In reply to Yuhong Bao from comment #59)
> Thanks, though it is not my preferred solution. Fortunately, I thin 768-bit
> DHE is stronger than 768-bit RSA. It does not appear that there are many
> 512-bit DHE sites left (outside of export cipher suites).

Even a 1024-bit DH key is only approximately as strong as an 80-bit symmetric key (RFC 3766). We are going to force 2048-bit key length against RSA signatures (bug 1049740). I don't think we should allow DH keys shorter than 1024 bits.
Yea, I think a deadline for 768-bit DHE is a good idea anyway as I mentioned above. It is just that 512-bit DHE should be targeted first due to the usage being much less.
(In reply to Masatoshi Kimura [:emk] from comment #58)
> Created attachment 8532914 [details] [diff] [review]
> Implement a fallback for weak DH keys (< 1024 bits)

It's taking me a little while to get through this review. I'm hoping to have it done sometime tomorrow and certainly by the end of day Friday.
Comment on attachment 8532914 [details] [diff] [review]
Implement a fallback for weak DH keys (< 1024 bits)

Review of attachment 8532914 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good. Let's add a bit more documentation, though. Also, we should extend BadCertServer and add end-to-end xpcshell testcases for this. Additionally, it would be really nice to be able to gather telemetry on how often we attempt to fallback but then can't negotiate a common ciphersuite (as in, how often we're breaking servers for users as a result of this). Finally, I'd like Brian or someone more familiar with the networking side of things to confirm that infoObject->SetCanceled is the right thing to do and that we're doing it in all the appropriate places.

::: security/manager/ssl/src/nsNSSComponent.cpp
@@ +621,5 @@
>  
>  // Table of pref names and SSL cipher ID
> +enum CipherPrefFlags {
> +  enc_rc4 = 1 << 0,
> +  kx_dh = 1 << 1,

Let's use more descriptive names for these enum values.

::: security/manager/ssl/src/nsNSSIOLayer.h
@@ +209,5 @@
>      uint16_t tolerant;
>      uint16_t intolerant;
>      PRErrorCode intoleranceReason;
>      StrongCipherStatus strongCipherStatus;
> +    uint32_t dhKeyBits;

We should probably have a bit of documentation for this:
"The size of the DH key used. 0 if DH wasn't used."
Attachment #8532914 - Flags: review?(dkeeler) → review+
Comment on attachment 8532914 [details] [diff] [review]
Implement a fallback for weak DH keys (< 1024 bits)

Brian, I'd appreciate you taking a look at this. I can find someone else familiar with the networking code if you're busy, though.
Attachment #8532914 - Flags: review?(brian)
I think that IE doesn't support DHE_RSA other than GCM, but it would still be good to add telemetry for both how often this fallback happens and how often it results in failed connections, so that for example a deadline can be set.
Comment on attachment 8532914 [details] [diff] [review]
Implement a fallback for weak DH keys (< 1024 bits)

Review of attachment 8532914 [details] [diff] [review]:
-----------------------------------------------------------------

Masatoshi-san, thanks for taking this on. I think this bug is more tricky than we initially realized. I think there are large changes needed. I did not review the whole patch carefully; once I found the larger issues, I stopped reviewing.

::: security/manager/ssl/src/nsNSSCallbacks.cpp
@@ +922,5 @@
>    }
>  }
>  
> +static void
> +AccumulateNonECCKeySize(Telemetry::ID probe, uint32_t bits);

Unlike function definitions, function declarations do not have the function name in the first column. This should all be on one line. If it overflows 80 chars, then wrap the parameters like normal.

@@ +972,5 @@
> +                            channelInfo.keaKeyBits);
> +    infoObject->SetCanceled(SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY,
> +                            PlainErrorMessage);
> +    return SECSuccess;
> +  }

First, see bug 952863. This may just be unnecessary now.

But, even without bug 952863, you shouldn't need to call AccumulateNonECCKeySize or SetCancelled here, because HandshakeCallback will still get called, and you already do those things in HandshakeCallback. In general, the CanFalseStartCallback should be ONLY about answering the yes/no question of whether we should false start; it should never try to cancel the connection.

@@ +1204,4 @@
>      rv = SSL_GetCipherSuiteInfo(channelInfo.cipherSuite, &cipherInfo,
>                                  sizeof cipherInfo);
>      MOZ_ASSERT(rv == SECSuccess);
>      if (rv == SECSuccess) {

This code, as it was previously written was OK because the (rv == SECSuccess) branch wasn't doing anything security-sensitive, so it didn't matter in a material way if it never executed. However, now you are adding security-sensitive checks into the (rv == SECSuccess) branch which are not acceptable to skip. That's bad. If the code is going to have this kind of structure, then infoObject->SetCanceled should be called here in the (rv != SECSuccess) case.

@@ +1213,5 @@
> +          channelInfo.keaKeyBits < ioLayerHelpers.mMinDHKeyBits) {
> +        AccumulateNonECCKeySize(Telemetry::SSL_WEAK_DH_KEY_FALLBACK,
> +                                channelInfo.keaKeyBits);
> +        infoObject->SetCanceled(SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY,
> +                                PlainErrorMessage);

I am not confident that calling infoObject->SetCanceled is enough. Once the handshake callback returns, libssl may internally decide to send some sensitive data before PSM can test whether SetCanceled was called.

In fact, libssl might have already decided to send some sensitive non-application-data even before the HandshakeCallback gets called. For example, that might happen if Gecko were to ever implement the encrypted client certificate extension or ChannelID.

libssl already has a check for minimum DHE size; it's just hard-coded to 512 bits. I suggest that you instead define an API for NSS's libssl that lets you change that minimum DHE key size, so that the enforcement is done within libssl. Then libssl will know to never send any sensitive data in the event of a too-small key. Once you do that, you can translate the "DHE key too small" error code into the error code that will cause a retry in the nsNSSIOLayer.cpp code that translates handshake error codes into the error code that causes connection retries. This is a much more bulletproof solution.

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +110,5 @@
>  // responding when we attempt to false start.
>  static const bool FALSE_START_REQUIRE_NPN_DEFAULT = true;
>  
> +// keep this value in sync with security-prefs.js
> +static const uint32_t DEFAULT_MIN_DH_KEY_BITS = 1024;

Here and elsewhere, please use "DHE" instead of "DH" because there is a distinction between those two in TLS: "DH" cipher suites don't use ephemeral keys.

@@ +1007,5 @@
>    } else {
>      entry.tolerant = 0;
>      entry.intolerant = 0;
>      entry.intoleranceReason = intoleranceReason;
> +    entry.dhKeyBits = 0;

Why are you remembering the number of bits? Don't you just need to remember a boolean of whether there were "enough" bits?

@@ +1053,2 @@
>  void
>  nsSSLIOLayerHelpers::adjustForTLSIntolerance(const nsACString& hostName,

Perhaps "adjustForFallback" would be a better name since this is now less about TLS intolerance and more about other kinds of fallback.

@@ +1054,5 @@
>  nsSSLIOLayerHelpers::adjustForTLSIntolerance(const nsACString& hostName,
>                                               int16_t port,
>                                               /*in/out*/ SSLVersionRange& range,
> +                                             /*out*/ StrongCipherStatus& strongCipherStatus,
> +                                             /*out*/ bool& weakDHKey)

Please use an enum here. In the calling code, it would be unclear whether true means "strong" or "weak". By using an enum, you avoid that confusion.

@@ +1297,5 @@
>    }
> +
> +  if (err == SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY) {
> +    // Do not fallback if the key length is shorter than 512 bits
> +    // to workaround bug 601635.

I think "to workaround bug 601635" is misleading. Also, I think referring to bug 601635 is unclear because there's no accepted resolution for that bug.

Actually, I think you might be misunderstanding the point of bug 601635. In fact, I was probably misunderstanding the point when I filed it. The point I was trying to make in that bug is that a MITM shouldn't be able to disable DHE by forging a DHE key smaller than what would trigger us to disable DHE. In order to ensure that doesn't happen, the security of the entire handshake must be validated--i.e. the peer's finished message must be processed and the certificate must be verified successfully--before libssl returns SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY. Again, this means changing libssl. That is why bug 601635 is in the NSS product, not the PSM product.
Attachment #8532914 - Flags: review?(brian) → review-
Comment on attachment 8532914 [details] [diff] [review]
Implement a fallback for weak DH keys (< 1024 bits)

Review of attachment 8532914 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +1297,5 @@
>    }
> +
> +  if (err == SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY) {
> +    // Do not fallback if the key length is shorter than 512 bits
> +    // to workaround bug 601635.

Actually, I'll take part of this back: Because the signature of the DHE parameters also covers ClientHello.client_random, you don't have to wait until the Finished message is verified. But you do need to wait until the certificate is verified to return SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY.
Comment on attachment 8532914 [details] [diff] [review]
Implement a fallback for weak DH keys (< 1024 bits)

Review of attachment 8532914 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/manager/ssl/src/nsNSSCallbacks.cpp
@@ +972,5 @@
> +                            channelInfo.keaKeyBits);
> +    infoObject->SetCanceled(SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY,
> +                            PlainErrorMessage);
> +    return SECSuccess;
> +  }

Sometimes fallback attempts failed unless I added the check here. Perhaps HandshakeCallback is too late to fail the handshake (as you said in the following comment).

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +1007,5 @@
>    } else {
>      entry.tolerant = 0;
>      entry.intolerant = 0;
>      entry.intoleranceReason = intoleranceReason;
> +    entry.dhKeyBits = 0;

The "security.tls.min_dh_key_bits" pref value may be lowered after we decided the DH key was not enough long. If we don't have to make the pref "live" or if we don't need the pref at all, I can make it a boolean.

@@ +1297,5 @@
>    }
> +
> +  if (err == SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY) {
> +    // Do not fallback if the key length is shorter than 512 bits
> +    // to workaround bug 601635.

I meant to say "MiTM can disable DHE by forging a DHE key smaller than 512 bits due to bug 601635. Do not try to fallback as a temporary measure." here. MiTM can't disable DHE unless we call rememberDHKeyBits().

But apparently changing NSS is inevitable anyway...
Unassigning as of now.
Assignee: VYV03354 → nobody
Status: ASSIGNED → NEW
I think we should consider doing the telemetry first.
We can't gather meaningful telemetry unless we actually try the fallback and see whether it fails.
I believe I found a solution without changing NSS.
- libssl will never send any sensitive data until the certificate is verified (otherwise it is a vulnerability).
- MiTM cannot tamper with the DHE key length because we have verified the certificate (according to your comment #70).
- SetCertVerificationResult already uses SetCanceled to report the failure. I'm just riding on that.
Please correct me if I'm wrong.
Attachment #8532914 - Attachment is obsolete: true
Attachment #8536033 - Flags: feedback?(brian)
I am talking about telemetry about what DHE key size is being used currently.
(In reply to Yuhong Bao from comment #76)
> I am talking about telemetry about what DHE key size is being used currently.

http://telemetry.mozilla.org/#filter=release%2F34%2FSSL_KEA_DHE_KEY_SIZE_FULL
Thanks, and I hope it will be preserved after this patch.
Comment on attachment 8536033 [details] [diff] [review]
WIP: Implement a fallback for weak DH keys (< 1024 bits), v2

Review of attachment 8536033 [details] [diff] [review]:
-----------------------------------------------------------------

I appreciate the effort, Masatoshi-san. However, I think this doesn't solve the fundamental issues regarding the libssl state machine. It seems like you are hesitant to modify libssl, but I think modifying libssl is by far the best way of solving these issues. I will try to create a proof-of-concept patch for bug 601635 to help you.

::: security/manager/ssl/src/nsNSSComponent.h
@@ +153,5 @@
>  
>    ::mozilla::TemporaryRef<mozilla::psm::SharedCertVerifier>
>      GetDefaultCertVerifier() MOZ_OVERRIDE;
>  
> +  // The following three methods are thread-safe.

How is DisableDHECiphersOnSocket thread-safe? It seems like it doesn't do anything to be thread safe.

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +670,5 @@
>  }
>  
> +bool
> +nsNSSSocketInfo::IsDHEKeyWeak(PRErrorCode& errorCode,
> +                              SSLErrorMessageType& errorMessageType)

Please mark output parameters with "/*out*/".
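i.e., in the class declaration (presumably nsNSSIOLayer.h), the same signature annotated:

bool IsDHEKeyWeak(/*out*/ PRErrorCode& errorCode,
                  /*out*/ SSLErrorMessageType& errorMessageType);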

@@ +678,5 @@
> +    return false;
> +  }
> +
> +  SSLChannelInfo channelInfo;
> +  SECStatus rv = SSL_GetChannelInfo(mFd, &channelInfo, sizeof(channelInfo));

This code is racy. In general, libssl doesn't guarantee that you can call SSL_GetChannelInfo in the auth certificate callback. In particular, SSL_GetChannelInfo requires the enoughFirstHsDone flag to be set for it to return useful information. However, in the case of async cert verification, the code that will eventually set enoughFirstHsDone may or may not have run, depending on how much of the handshake we've received from the caller. In fact, we might execute this code before we even receive the ServerKeyExchange message containing the DHE parameters from the server! (We kick off cert verification when we receive the server's Certificate message, which comes before the ServerKeyExchange message.)

Note also that SSL_GetChannelInfo will return SECSuccess with non-useful information when enoughFirstHsDone isn't set!
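For contrast, a minimal sketch of the pattern that is safe per the above: only consult keaKeyBits from a point where the first handshake is known to be complete (e.g., the handshake callback), not from the auth certificate callback. The helper name and threshold parameter are illustrative.

#include "ssl.h"

// Call only after the first handshake has finished (e.g., from the
// function registered with SSL_HandshakeCallback), so that
// SSL_GetChannelInfo returns meaningful key-exchange data.
static bool DHEKeyBelowThreshold(PRFileDesc* fd, PRUint32 minBits)
{
  SSLChannelInfo channelInfo;
  if (SSL_GetChannelInfo(fd, &channelInfo, sizeof(channelInfo)) != SECSuccess) {
    return false;
  }
  return channelInfo.keaType == ssl_kea_dh && channelInfo.keaKeyBits < minBits;
}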

@@ +710,5 @@
>  // attempt to acquire locks that are already held by libssl when it calls
>  // callbacks.
>  void
>  nsNSSSocketInfo::SetCertVerificationResult(PRErrorCode errorCode,
>                                             SSLErrorMessageType errorMessageType)

IIRC, SetCertVerificationResult only gets called for *async* cert verification. However, we also have *sync* cert verification, which is used by Thunderbird. Search for "We can't do certificate verification on a background thread". I think this code won't execute in the sync case.

@@ +715,5 @@
>  {
>    NS_ASSERTION(mCertVerificationState == waiting_for_cert_verification,
>                 "Invalid state transition to cert_verification_finished");
>  
> +  if (mFd && !IsDHEKeyWeak(errorCode, errorMessageType)) {

I think this should be something like this:

if (mFd) {
  if (errorCode == 0) {
    // replace errorCode/errorMessageType if the DHE key was
    // weak, but only for valid certificates, to avoid a MitM
    // causing us to disable DHE cipher suites by giving us
    // a forged ServerKeyExchange signed with an invalid cert.
    (void) IsDHEKeyWeak(errorCode, errorMessageType);
  }
  // ... (rest of the existing if (mFd) body continues here)
}

However, see my comments above regarding why you can't do this in the auth certificate callback.
Attachment #8536033 - Flags: feedback?(brian) → feedback+
I attached a patch for bug 601635, but I haven't tested it.

I think you should consider another, simpler, approach: Why not just fail the handshake when the DHE parameters are <1024 bits, without any fallback? Maybe that would have too negative of a compatibility impact, but it seems better than doing the fallback to me.
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #80)
> I attached a patch for bug 601635, but I haven't tested it.
> 
> I think you should consider another, simpler, approach: Why not just fail
> the handshake when the DHE parameters are <1024 bits, without any fallback?
> Maybe that would have too negative of a compatibility impact, but it seems
> better than doing the fallback to me.

Exactly why I suggested starting with a <768-bit cutoff first. Fortunately, breaking 768-bit DHE (a discrete-log problem) is more difficult than factoring 768-bit RSA, and I think it would have to be done individually for each session.
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #77)
> (In reply to Yuhong Bao from comment #76)
> > I am talking about telemetry about what DHE key size is being used currently.
> 
> http://telemetry.mozilla.org/#filter=release%2F34%2FSSL_KEA_DHE_KEY_SIZE_FULL

I looked at this telemetry. It says that ~9/1,000 (0.9%) of DHE handshakes use a key size of less than 1024 bits. I also looked at the key exchange algorithm selection telemetry [1]. It says that about 48/1,000 (4.8%) of handshakes use DHE key exchange. That means 0.009 * 0.048 = 0.000432 = 0.04% = 1/2,500 handshakes would fail if there were a hard limit of 1024 bits. That seems low enough to me to justify having a hard limit that is controllable with a pref, without any fallback logic for disabling DHE cipher suites.

[1] http://telemetry.mozilla.org/#filter=release%2F34%2FSSL_KEY_EXCHANGE_ALGORITHM_FULL
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #80)
> I attached a patch for bug 601635, but I haven't tested it.
> 
> I think you should consider another, simpler, approach: Why not just fail
> the handshake when the DHE parameters are <1024 bits, without any fallback?
> Maybe that would have too negative of a compatibility impact, but it seems
> better than doing the fallback to me.

I'm fine with the 1024-bit hard limit, but I am considering bumping the lower limit to 2048 bits. That will not be possible without a fallback. Do you think DHE with a 1024-bit key is better than (no-FS) RSA at this time?
Flags: needinfo?(brian)
Also, if server admins upgrade the server to support >1024-bit DHE keys, they should rather add support for ECDHE. Is DHE support really needed at all?
(In reply to Masatoshi Kimura [:emk] from comment #83)
> (In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment
> #80)
> > I attached a patch for bug 601635, but I haven't tested it.
> > 
> > I think you should consider another, simpler, approach: Why not just fail
> > the handshake when the DHE parameters are <1024 bits, without any fallback?
> > Maybe that would have too negative of a compatibility impact, but it seems
> > better than doing the fallback to me.
> 
> I'm fine with the 1024-bit hard-limit, but I consider bumping the
> lower-limit to 2048-bit. It will not be possible without any fallback. Do
> you think DHE with 1024-bit key is better than (no-FS) RSA at this time?

I think so. Remember that breaking DHE (a discrete log) is more difficult than factoring RSA at the same size, and I think that each session would have to be attacked individually.
(In reply to Masatoshi Kimura [:emk] from comment #83)
> I'm fine with the 1024-bit hard-limit, but I consider bumping the
> lower-limit to 2048-bit. It will not be possible without any fallback. Do
> you think DHE with 1024-bit key is better than (no-FS) RSA at this time?

Yes, I think 1024-bit DHE is much better than no-FS RSA for almost every use. Forward Secrecy is really important.

(In reply to Masatoshi Kimura [:emk] from comment #84)
> Also, if server admins upgrade the server to support >1024-bit DHE key, they
> should rather add a support for ECDHE.

Right. The newest versions of most servers support ECDHE P-256 and 2048-bit DHE. So, when the server admin upgrades, they'll almost always get ECDHE support too. Most servers prefer ECDHE over DHE (AFAICT). Many server admins will reduce the DHE support to 1024-bit DHE because some popular clients cannot do >1024-bit DHE. (The admin of a VERY large service just verified this to me a couple of days ago.) Also, P-256 ECDHE is stronger than 2048-bit DHE (quantum attacks aside), and ECDHE is much more efficient. So, ECDHE is the solution, not stronger DHE.

> Is the DHE support really needed at all?

It makes sense to offer TLS_DHE_x for the cases where TLS_RSA_x has to be offered, for all x, because 1024-bit DHE is better than any non-FS RSA. However, I believe the best policy is to avoid adding any new TLS_DHE_* cipher suites.

(In reply to Yuhong Bao from comment #85)
> I think so. Remember that DHE is more difficult to factor than RSA and I
> think that each session would have to be factored individually.

Right. Also, often an attacker will be able to steal the RSA private keys, but it won't be able to steal any (maybe a few) (EC)DHE keys, and if the server is implemented well, there will be few uses (ideally one) of each (EC)DHE key. This is the primary advantage of TLS_(EC)DHE_* over TLS_RSA_*.
Flags: needinfo?(brian)
OK, let's avoid over-engineering.
Summary: PSM should attempt to renegotiate when the server uses a DHE key that is too weak → Disallow connections when the server uses a DHE key that is too weak
No longer depends on: 601635
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #86)
> (In reply to Masatoshi Kimura [:emk] from comment #84)
> > Also, if server admins upgrade the server to support >1024-bit DHE key, they
> > should rather add a support for ECDHE.
> 
> Right. The newest versions of most servers support ECDHE P-256 and 2048-bit
> DHE. So, when the server admin upgrades, they'll almost always get ECDHE
> support too. Most servers prefer ECDHE over DHE (AFAICT). Many server admins
> will reduce the DHE support to 1024-bit DHE because some popular clients
> cannot do >1024-bit DHE. (The admin of a VERY large service just verified
> this to me a couple of days ago.)
One of them is JSSE before Java 8. It also hardcodes the server side to 768 bits (one of the reasons why I suggested trying that first). Luckily, Java 7 and later also support ECDHE. I wonder if OpenJDK has considered backporting Java 8's JSSE to Java 7.
FREAK just hit the news. Given that obtaining the master secret allows the Finished hash to be forged, I wonder if 512-bit DHE would be vulnerable to a similar attack too.
We already disallow 512-bit DHE keys in NSS.
Update: the DHE attack would require the attacker to hold the connection open during the discrete-log computation. I wonder if GPUs would be fast enough.
The attack does not have to be online. The server typically generates DH params at startup and keeps them until it is restarted (typically several days). Even the public key is reused by default in many servers. This gives the attacker time to compute the discrete log of the public key.
(In reply to Masatoshi Kimura [:emk] from comment #90)
> We already disallows 512-bit DHE key in NSS.

I am not sure this is true. In ssl3con.c there is this:

	if (dh_p.len < 512/8) {
	    errCode = SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY;
	    goto alert_loser;
	}

Also there's bug 623658, which is something to keep in mind.
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #93)
> (In reply to Masatoshi Kimura [:emk] from comment #90)
> > We already disallows 512-bit DHE key in NSS.
> 
> I am not sure this is true. In ssl3con.c there is this:
> 
> 	if (dh_p.len < 512/8) {
> 	    errCode = SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY;
> 	    goto alert_loser;
> 	}
> 
> Also there's bug 623658, which is something to keep in mind.

Oops, I confused <512 with <=512.
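To spell out the boundary (this is just the arithmetic of the check quoted from ssl3con.c above, not new NSS code):

// dh_p.len is a byte count and 512/8 == 64, so:
static bool rejectedAsWeak(unsigned dhPrimeLenBytes)
{
  return dhPrimeLenBytes < 512/8;
}
// rejectedAsWeak(63) == true   -> 504-bit prime: SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY
// rejectedAsWeak(64) == false  -> 512-bit prime: accepted, matching the test below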
In case there's any doubt, I tested the latest Firefox and it does accept 512-bit groups.
The changes in the recent IcedTea releases to use 1024-bit DHE are disabled by default. Set jdk.tls.ephemeralDHKeySize to jdk8 to enable.
Fixed by bug 1166031.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Due to this change (or rather bug #1166031), Firefox 41 can no longer access Cisco Unified Communications 9.1.2 appliance web console and web dialer. The diagnostics page says: […] ssl_error_weak_server_ephemeral_dh_key.

As a workaround, you can make the following change in about:config:

security.ssl3.dhe_rsa_aes_128_sha=false
I'm afraid that the workaround didn't work for me, will69.

Well done, Firefox - now I'm using a different browser. After 15 years or so of using Netscape or Firefox, I can no longer access a site that is absolutely imperative to my bread and butter, here:
http://www.leeds.ac.uk/library/authentication/link.cgi?Item=Westlaw

I really don't give a stuff about the copyrights that Westlaw might be leaking because Thompson/Reuters are incapable of maintaining their IT infrastructure; what I care about is being able to do my work!

How to kill off the last remaining firefox users, brilliant!

Nose face anyone?
Actually, doing both:
security.ssl3.dhe_rsa_aes_128_sha=false
security.ssl3.dhe_rsa_aes_256_sha=false

seems to work around the problem described in comment 97

Really guys, this is the peak of arrogance - if someone wants to use an insecure service, then that's their business.
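(If you need to roll the workaround out to many machines: the same two prefs quoted above can also go in a user.js file in each Firefox profile. Note that this disables the DHE_RSA cipher suites entirely, so you lose forward secrecy on servers that offer nothing better than DHE.)

// user.js in the Firefox profile directory -- same workaround as the
// about:config changes above: stop offering the two DHE_RSA suites so
// the server negotiates a non-DHE suite instead of a weak DHE key.
user_pref("security.ssl3.dhe_rsa_aes_128_sha", false);
user_pref("security.ssl3.dhe_rsa_aes_256_sha", false);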
Seems to be 768-bit DHE, which I already suggested as the minimum.
"Actually, doing both:
security.ssl3.dhe_rsa_aes_128_sha=false
security.ssl3.dhe_rsa_aes_256_sha=false"

This resolved the issue for me. I noticed after updating to Firefox 39 on Mac OS I was unable to access my TridentHE login page. It would work in Safari and Chrome, so I knew the site was operational. At the very least there should be an option to ignore the warning rather than block it outright. For me, TridentHE is a mission-critical internal site that should not be blocked by a browser without my consent.
grrrr, systems down again today due to the militant stance on SSL. Not ALL use of SSL is over the Internet or has a MITM risk at all. The web sites are internal; reporting the error does not help, as I am my own sysadmin.
Thanks guys. Fire drills all over the company due to this. You've cost us dozens of developer hours. We're considering removing FF from our supported browsers.
(In reply to Steve from comment #104)
> Thanks guys. Fire drills all over the company due to this. You've cost us
> dozens of developer hours. We're considering removing FF from our supported
> browsers.

The good news is that for those with support contracts Oracle finally released an update for Java 6 and 7 this Tuesday. I agree that they should have gone for the 768-bit DHE minimum for now.
I agree - we have also had to move to Chrome. Whilst I appreciate the attempt to raise security matters, not allowing developers control over their own internal sites seems heavy-handed and self-defeating.
(In reply to peterarmistead from comment #106)
> we have also had to move to Chrome

Chrome 45+ will also disable <1024-bit DHE key.
https://code.google.com/p/chromium/issues/detail?id=490240