Note: There are a few cases of duplicates in user autocompletion which are being worked on.

Disable insecure TLS version fallback

RESOLVED FIXED in Firefox 37

Status

()

Core
Security: PSM
--
critical
RESOLVED FIXED
3 years ago
29 days ago

People

(Reporter: Stefan Franke, Assigned: emk)

Tracking

({site-compat})

33 Branch
mozilla37
site-compat
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox35+ wontfix, firefox36+ wontfix, firefox37+ fixed, relnote-firefox 37+)

Details

Attachments

(7 attachments, 5 obsolete attachments)

1.14 KB, patch
keeler
: review+
Details | Diff | Splinter Review
5.65 KB, patch
keeler
: review+
Details | Diff | Splinter Review
1.31 KB, patch
emk
: review+
keeler
: review+
Details | Diff | Splinter Review
6.00 KB, patch
Details | Diff | Splinter Review
3.43 KB, patch
Details | Diff | Splinter Review
38.69 KB, text/plain
Details
528.11 KB, text/plain
Details
(Reporter)

Description

3 years ago
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0
Build ID: 20141011015303

Steps to reproduce:

I connect to a server providing TLS 1.0, TLS 1.1 and TLS 1.2.


Actual results:

An attacker blockes the handshake if TLS 1.1 or TLS 1.2 is used.

The Browser downgrades the TLS version to TLS 1.0.


Expected results:

The Browser MUST not downgrade the security level. Since the TLS protocol negotiates the TLS version itself.

The offending code is in nsNSSIOLayer.cpp:

void
nsSSLIOLayerHelpers::adjustForTLSIntolerance(const nsACString& hostName,
                                             int16_t port,
                                             /*in/out*/ SSLVersionRange& range)
{
  IntoleranceEntry entry;

  {
    nsCString key;
    getSiteKey(hostName, port, key);

    MutexAutoLock lock(mutex);
    if (!mTLSIntoleranceInfo.Get(key, &entry)) {
      return;
    }
  }

  entry.AssertInvariant();

  if (entry.intolerant != 0) {
    // We've tried connecting at a higher range but failed, so try at the
    // version we haven't tried yet, unless we have reached the minimum.
    if (range.min < entry.intolerant) {
      range.max = entry.intolerant - 1;
    }
  }
}
(Reporter)

Updated

3 years ago
OS: Windows 7 → All
Hardware: x86_64 → All
(Reporter)

Updated

3 years ago
Component: Untriaged → Security
(Reporter)

Updated

3 years ago
Severity: normal → critical

Comment 1

3 years ago
Mozilla is already aware, see https://blog.mozilla.org/security/2014/10/14/the-poodle-attack-and-the-end-of-ssl-3-0/

"SSLv3 will be disabled by default in Firefox 34, which will be released on Nov 25. The code to disable it is landing today in Nightly, and will be promoted to Aurora and Beta in the next few weeks."
Status: UNCONFIRMED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → WORKSFORME
(Reporter)

Comment 2

3 years ago
This ticket is not related to SSL 3.0

It's about bogus TLS 1.2 downgrade to TLS 1.0
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
(Assignee)

Comment 3

3 years ago
Then how is this bug related to POODLE?
Summary: (POODLE) TLS is downgraded by Browser → TLS is downgraded by Browser
(Reporter)

Comment 4

3 years ago
Part of POODLE is "On Downgraded Legacy Encryption".

An attacker can still provoke using TLS 1.0 instead of TLS 1.2.

And TLS 1.0 is weaker than TLS 1.2.
(Assignee)

Comment 5

3 years ago
It applies to downgrading attacks generally. How is it specially related to POODLE?
(Assignee)

Comment 6

3 years ago
Now you can effectively disable the insecure fallback by setting "security.tls.version.fallback-limit" to 3.

I found a discussion about removing the insecure fallback in dev.tech.crypto:
https://groups.google.com/d/msg/mozilla.dev.tech.crypto/Qis_B8b2b7U/CCOouHbcLN4J

FYI, Brian Smith introduced the insecure fallback to TLS 1.1 or later (Bug 839310). According to his comment, he didn't care much about making Firefox more secure than only-TLS1.0-capable browsers.
https://bugzilla.mozilla.org/show_bug.cgi?id=839310#c6
Blocks: 839310
Status: UNCONFIRMED → NEW
Component: Security → Security: PSM
Depends on: 1083058
Ever confirmed: true
Product: Firefox → Core
Please don't close bugs that you didn't open, unless you are a module owner or module peer in this module.

Back when I wrote my comments in bug 839310, our goal was to turn TLS 1.2 and TLS 1.1 on in Firefox ASAP. Because we wanted to do so ASAP, and because other browsers were already doing the non-secure fallback, we didn't have much choice except to do the fallback the same way the others do. I always planned on trying to find some way to do so with minimal compatibility impact, but I left Mozilla before I could do so.

Now, you can see on the IETF TLS WG mailing list that I am arguing that all browsers should remove the non-secure fallback. And, I've provided a pretty good argument as to why. See [1]. I believe now is the best time, politically, to disable the non-secure fallback in all browsers, and there are practical security concerns motivating the decision to do so. Mozilla should take the lead in doing so, so that other browsers feel comfortable in joining them. However, that said, I have not done the research to verify how plausible that is, as far as compatibility is concerned, and I haven't talked to Firefox management about whether they're willing to accept the compatibility risk. David Keeler is the PSM module owner now. So, it will be up to other people, at Mozilla, to decide this.

[1] http://www.ietf.org/mail-archive/web/tls/current/msg14072.html

Comment 8

3 years ago
As a former developer of NSS from 2001-2009, I wholeheartedly agree with Brian.
I have always thought that this PSM insecure fallback was crazy.
IMO, it's high time for it to go.
There is mitigation for this: bug 1083058 allows you to control this behaviour now.
(In reply to Martin Thomson [:mt] from comment #9)
> There is mitigation for this: bug 1083058 allows you to control this
> behaviour now.

That's not a real mitigation, because we can't expect end-users to know about that preference. What matters is the default behavior.
(Assignee)

Comment 11

3 years ago
Created attachment 8511585 [details] [diff] [review]
Disable insecure version fallback (Nightly only)

According to [1], TLS 1.2 intolerance servers were under 1% even a year ago.
Let's experiment the fallback-free world.

[1] http://www.ietf.org/mail-archive/web/tls/current/msg10657.html
Attachment #8511585 - Flags: review?(dkeeler)
(Assignee)

Updated

3 years ago
Attachment #8511585 - Attachment description: bug1084025_disable_insecure_fallback → Disable insecure version fallback (Nightly only)
What information do you plan to collect from this experiment?  The screams of frustrated users?  We haven't had many as a result of disabling SSL 3.0, even in Aurora, so I'm not sure that we'll be able to definitely say that the experiment succeeded.  Failure is easier to detect, but I don't want to have this experiment fail.

Here's an experiment that might work, and the only cost is latency (which can be tuned differently).

Leave fallback in place for now.  But - with some probability - ignore any previously learned intolerance information.  Then, record telemetry if a previously determined intolerant version becomes marked as tolerant; and the converse if the value is re-confirmed.  We can use this to learn about the false fallback rates.  From that, we can determine what reasons for fallback can be removed.

With SSL 3.0 out of the picture, this is definitely worth pursuing in one form or other, but I think that the above experiment might be a better first step.
I extrapolated from Mozilla's telemetry data that disabling SSL 3.0 will cause about 7 times as many failures as disabling non-secure fallback from TLS 1.2 to lower versions. I think it is likely that the failure can be fixed in a different way than re-enabling the TLS intolerance fallback logic. Also, many of the things that need to be done to make disabling the TLS intolerance fallback logic tolerable should also be done prior to disabling SSL 3.0, to minimize the compatibility impact of SSL 3.0.

I think it is a good idea to land Masatoshi's patch from bug 1088915 before landing a patch for this bug, because Masatoshi's patch from bug 1088915 is likely to be a foundation for some compatibility workarounds that don't involve version fallback.
Comment on attachment 8511585 [details] [diff] [review]
Disable insecure version fallback (Nightly only)

Review of attachment 8511585 [details] [diff] [review]:
-----------------------------------------------------------------

Please make sure you get David's review before checking this in.

::: netwerk/base/public/security-prefs.js
@@ +3,5 @@
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  pref("security.tls.version.min", 1);
>  pref("security.tls.version.max", 3);
> +#ifdef NIGHTLY_BUILD

I don't think it is a good idea to have these conditions in these patches. Instead, you could just do this unconditionally, and then when Nightly gets uplifted to Aurora, the change can be backed out of Aurora, if it is causing problems. That is the normal way of doing things and it works well.
Attachment #8511585 - Flags: review+
(Assignee)

Comment 15

3 years ago
Created attachment 8511941 [details] [diff] [review]
Disable insecure version fallback, v2

OK, it would be simple enough in case we have to flip it back.
Attachment #8511941 - Flags: review?(dkeeler)
(Assignee)

Updated

3 years ago
Attachment #8511585 - Flags: review?(dkeeler)
(Assignee)

Updated

3 years ago
Attachment #8511585 - Attachment is obsolete: true

Updated

3 years ago
Summary: TLS is downgraded by Browser → Disable insecure TLS version fallback
Comment on attachment 8511941 [details] [diff] [review]
Disable insecure version fallback, v2

Review of attachment 8511941 [details] [diff] [review]:
-----------------------------------------------------------------

I'm all for trying this, but we need some way of measuring the outcome (e.g. a telemetry count of failures due to not falling back). r=me with that.
Attachment #8511941 - Flags: review?(dkeeler) → review+
(Assignee)

Comment 17

3 years ago
(In reply to David Keeler (:keeler) [use needinfo?] from comment #16)
> I'm all for trying this, but we need some way of measuring the outcome (e.g.
> a telemetry count of failures due to not falling back). r=me with that.

SSL_TLS12_INTOLERANCE_REASON_PRE will serve the purpose.
(Reporter)

Comment 18

3 years ago
> I'm all for trying this, but we need some way of measuring the outcome (e.g.
> a telemetry count of failures due to not falling back). r=me with that.

Does this telemetry contain the information how many failures are caused by attackers?
(Assignee)

Comment 19

3 years ago
(In reply to Stefan Franke from comment #18)
> > I'm all for trying this, but we need some way of measuring the outcome (e.g.
> > a telemetry count of failures due to not falling back). r=me with that.
> 
> Does this telemetry contain the information how many failures are caused by
> attackers?

It's impossible by definition. If it's possible, we could stop the fallback only when attackers shut down the connection.
(Assignee)

Comment 20

3 years ago
Created attachment 8513857 [details] [diff] [review]
Add telemetry to measure failures due to not falling back

I had to change the logic to reduce false positives, so requesting an additional review.
Attachment #8513857 - Flags: review?(dkeeler)
Comment on attachment 8513857 [details] [diff] [review]
Add telemetry to measure failures due to not falling back

Review of attachment 8513857 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for doing this. In general this is good, but it will need rebasing (see comments). Also, I made a suggestion for a slightly different way of collecting the data that I think will give us more insight. r- for now.

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +879,5 @@
>  
>    MutexAutoLock lock(mutex);
>  
> +  bool fallbackLimitReached =
> +    intolerant <= minVersion || intolerant <= mVersionFallbackLimit;

I was thinking we would want to distinguish between hitting the limit from mVersionFallbackLimit vs. hitting minVersion (and maybe even depending on what minVersion is?)

How about we have a histogram with 8 buckets. The first 4 will be for hitting minVersion, for minVersion=0, 1, 2, 3. The last 4 will be the same, but for mVersionFallbackLimit. Does that sound like useful data to collect?

@@ +892,4 @@
>        // We already know that the server is intolerant at a lower version.
>        return true;
> +    } else if (fallbackLimitReached) {
> +      // We can't fall back any further. Assume that intolerance isn't the issue.

Looks like this will have to be rebased on top of the recent changes from bug 1088950.

@@ +906,5 @@
> +    Telemetry::Accumulate(Telemetry::SSL_FALLBACK_LIMIT_REACHED, true);
> +    return false;
> +  }
> +
> +

nit: unnecessary blank line

::: toolkit/components/telemetry/Histograms.json
@@ +6382,5 @@
>      "n_values": 64,
>      "description": "TLS/SSL version intolerance was falsely detected, server rejected handshake"
>    },
> +  "SSL_FALLBACK_LIMIT_REACHED": {
> +    "expires_in_version": "never",

We should probably have an expiration on this, since this is more or less a temporary experiment, right? Version 40 or maybe 42 is probably good.
Attachment #8513857 - Flags: review?(dkeeler) → review-
(Assignee)

Comment 22

3 years ago
Created attachment 8515427 [details] [diff] [review]
Add telemetry to measure failures due to not falling back

Resolved review comments.
Attachment #8513857 - Attachment is obsolete: true
Attachment #8515427 - Flags: review?(dkeeler)
(Assignee)

Comment 23

3 years ago
Created attachment 8515897 [details] [diff] [review]
Add telemetry to measure failures due to not falling back

Moved the telemetry call to the caller.
Attachment #8515897 - Flags: review?(dkeeler)
(Assignee)

Updated

3 years ago
Attachment #8515427 - Attachment is obsolete: true
Attachment #8515427 - Flags: review?(dkeeler)
Comment on attachment 8515897 [details] [diff] [review]
Add telemetry to measure failures due to not falling back

Review of attachment 8515897 [details] [diff] [review]:
-----------------------------------------------------------------

I think we can gather this telemetry with fewer changes to the code. Also, be careful about how you convert version values to histogram buckets. For example, TLS 1.0 is represented by the value 0x0301, not 0x01.

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +879,5 @@
>    MutexAutoLock lock(mutex);
> +  forgetIntolerance(key, lock);
> +}
> +
> +void nsSSLIOLayerHelpers::forgetIntolerance(const nsCString& key,

I'm not sure refactoring this function is useful.

@@ -903,5 @@
>  {
> -  if (intolerant <= minVersion || intolerant <= mVersionFallbackLimit) {
> -    // We can't fall back any further. Assume that intolerance isn't the issue.
> -    forgetIntolerance(hostName, port);
> -    return false;

These changes to rememberIntolerantAtVersion change the control flow in a way that I'm not convinced is equivalent to the original code. Why can't we add the bucket determination/telemetry accumulation here?

@@ +940,5 @@
> +    if (intolerant <= minVersion) {
> +      fallbackLimitBucket += minVersion;
> +    }
> +    if (intolerant <= mVersionFallbackLimit) {
> +      fallbackLimitBucket += (mVersionFallbackLimit << 2);

We should document how this is encoded (i.e. what each bucket in the histogram corresponds to).
And, actually, this won't work as-is, because mVersionFallbackLimit is something like 0x0301, which puts the bucket out of range for just 16 values.

@@ +1296,5 @@
>    if (!socketInfo->SharedState().IOLayerHelpers()
>                   .rememberIntolerantAtVersion(socketInfo->GetHostName(),
>                                                socketInfo->GetPort(),
> +                                              range.min, range.max, err,
> +                                              fallbackLimitBucket)) {

I think it would be best if rememberIntolerantAtVersion accumulated the appropriate telemetry on its own, rather than having an extra output parameter.
Attachment #8515897 - Flags: review?(dkeeler) → review-
(Assignee)

Comment 25

3 years ago
(In reply to David Keeler (:keeler) [use needinfo?] from comment #24)
> ::: security/manager/ssl/src/nsNSSIOLayer.cpp
> @@ +879,5 @@
> >    MutexAutoLock lock(mutex);
> > +  forgetIntolerance(key, lock);
> > +}
> > +
> > +void nsSSLIOLayerHelpers::forgetIntolerance(const nsCString& key,
> 
> I'm not sure refactoring this function is useful.

MutexAutoLock is an RAII helper for mozilla::Mutext [1] which is a bare wrapper around PRLock [2] which is not recursive at least on Linux [3][4]. I moved a forgetIntolerance call into the MutexAutoLock scope, so I have to add a non-lock version of the function.
[1] https://developer.mozilla.org/ja/docs/Namespace/Mozilla/MutexAutoLock
[2] https://developer.mozilla.org/en-US/docs/Namespace/Mozilla/Mutex
[3] https://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/pthreads/ptsynch.c?rev=3a49b88c5bb0#52
[4] https://stackoverflow.com/questions/19863734/what-is-pthread-mutex-adaptive-np#answer-19863902

> @@ -903,5 @@
> >  {
> > -  if (intolerant <= minVersion || intolerant <= mVersionFallbackLimit) {
> > -    // We can't fall back any further. Assume that intolerance isn't the issue.
> > -    forgetIntolerance(hostName, port);
> > -    return false;
> 
> These changes to rememberIntolerantAtVersion change the control flow in a
> way that I'm not convinced is equivalent to the original code. Why can't we
> add the bucket determination/telemetry accumulation here?

I'm attempting to avoid accumulating the telemetry if we already know whether the server is tolerant at the version. I called it "false positives" in comment #20. If we don't have to exclude this case, the second patch is unneccessary in the first place because SSL_TLS12_INTOLERANCE_REASON_PRE will be sufficient as I said in comment #17. Do you think this patch is not needed?
Flags: needinfo?(dkeeler)
What scenarios cause these false positives?
Assignee: nobody → VYV03354
Flags: needinfo?(dkeeler) → needinfo?(VYV03354)
(Assignee)

Comment 27

3 years ago
1. Access example.com with TLS version 1.2. Firefox will remember the server is tolerant at TLS 1.2.
2. Press the reload button. Access failed due to network glitches.
3. rememberIntolerantAtVersion will find that the version reaches the fallback limit. It will accumulate the new telemetry and returns false while we already know the server is tolerant at version 1.2. I called it the "false positive".

Moreover, retryDueToTLSIntolerance will accumulate SSL_TLS12_INTOLERANCE_REASON_PRE and will not accumulate SSL_TLS12_INTOLERANCE_REASON_POST whenever rememberIntolerantAtVersion returns true. Therefore the new telemetry is redundant.
Flags: needinfo?(VYV03354)
(Assignee)

Comment 28

3 years ago
> whenever rememberIntolerantAtVersion returns true.

Correction: whenever rememberIntolerantAtVersion returns false.
See Also: → bug 999544
(Assignee)

Comment 29

3 years ago
Created attachment 8530477 [details] [diff] [review]
Add telemetry to measure failures due to not falling back, v2

I'm not still convinced of the usefulness of this telemetry, but I can live with that and I would like to land this early in the cycle.

> We should document how this is encoded (i.e. what each bucket in the
> histogram corresponds to).

I think the new code is self-documenting enough.
Attachment #8515897 - Attachment is obsolete: true
Attachment #8530477 - Flags: review?(dkeeler)
(Assignee)

Comment 30

3 years ago
Created attachment 8530480 [details] [diff] [review]
Add telemetry to measure failures due to not falling back, v3

OK, I found a much cleaner way to reduce false positives.
Attachment #8530477 - Attachment is obsolete: true
Attachment #8530477 - Flags: review?(dkeeler)
Attachment #8530480 - Flags: review?(dkeeler)
(Assignee)

Comment 31

3 years ago
https://bugzilla.mozilla.org/show_bug.cgi?id=1084025

Comment 32

3 years ago
My feeling is that the best thing would be to set a clear deadline to disable TLS 1.2 intolerance fallback (I suggest 6 months to a year), and to announce it widely. TLS 1.3 intolerance will probably take longer to solve though, though I recently tried to notify a few sites.
(In reply to Yuhong Bao from comment #32)
> My feeling is that the best thing would be to set a clear deadline to
> disable TLS 1.2 intolerance fallback (I suggest 6 months to a year), and to
> announce it widely.

That cannot be done until more is learned about TLS 1.2/1.1 intolerance. EMK's patch in this bug will help us learn more about it. With EMK's patch, people at Mozilla can use Matt Wobensmith's compatibility testing system to identify some of the most popular servers that would be affected by this change. EMK's patch can always be augmented with improved compatibility logic and/or backed out if it is too disruptive. 

There are many compatibility issues that get treated as "version intolerance" that aren't really version intolerance, so once there is a (partial) list of affected sites, people can triage that list to find out if it is really version intolerance, or intolerance to a specific cipher suite, or intolerance to a specific extension, or intolerance to the specific ordering of extensions in NSS's ClientHello, or something else. Then safer workarounds can be implemented for those other issues (e.g. reordering cipher suites, and/or reordering extensions, and/or asking CAs to give their customers better certificates).

Comment 34

3 years ago
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #33)
> (In reply to Yuhong Bao from comment #32)
> > My feeling is that the best thing would be to set a clear deadline to
> > disable TLS 1.2 intolerance fallback (I suggest 6 months to a year), and to
> > announce it widely.
> 
> That cannot be done until more is learned about TLS 1.2/1.1 intolerance.
> EMK's patch in this bug will help us learn more about it. With EMK's patch,
> people at Mozilla can use Matt Wobensmith's compatibility testing system to
> identify some of the most popular servers that would be affected by this
> change. EMK's patch can always be augmented with improved compatibility
> logic and/or backed out if it is too disruptive. 

I agree actually, and that the deadline set should depend on how many servers are affected.
(Reporter)

Comment 35

3 years ago
(In reply to Yuhong Bao from comment #34)
> (In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment
> #33)
> > (In reply to Yuhong Bao from comment #32)
> > > My feeling is that the best thing would be to set a clear deadline to
> > > disable TLS 1.2 intolerance fallback (I suggest 6 months to a year), and to
> > > announce it widely.
> > 
> > That cannot be done until more is learned about TLS 1.2/1.1 intolerance.
> > EMK's patch in this bug will help us learn more about it. With EMK's patch,
> > people at Mozilla can use Matt Wobensmith's compatibility testing system to
> > identify some of the most popular servers that would be affected by this
> > change. EMK's patch can always be augmented with improved compatibility
> > logic and/or backed out if it is too disruptive. 
> 
> I agree actually, and that the deadline set should depend on how many
> servers are affected.

Summary:
Since it is impossible to distinguish between a bad server or an attacked server, attacks are permanently supported.

The intention of this tickt was to prevent attacks.
Comment on attachment 8530480 [details] [diff] [review]
Add telemetry to measure failures due to not falling back, v3

Review of attachment 8530480 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM with nits addressed.

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +874,1 @@
>                                              int16_t port)

nit: indentation (also, put the return type on its own line)

@@ +917,5 @@
> +    uint32_t fallbackLimitBucket = 0;
> +    // added if the version has reached the min version.
> +    if (intolerant <= minVersion) {
> +      switch (minVersion) {
> +      case SSL_LIBRARY_VERSION_TLS_1_0:

nit: indent the body of the switch statement one more level

@@ +931,5 @@
> +    }
> +    // added if the version has reached the fallback limit.
> +    if (intolerant <= mVersionFallbackLimit) {
> +      switch (mVersionFallbackLimit) {
> +      case SSL_LIBRARY_VERSION_TLS_1_0:

nit: indentation

::: security/manager/ssl/src/nsNSSIOLayer.h
@@ +224,5 @@
>                                     uint16_t intolerant, uint16_t minVersion,
>                                     PRErrorCode intoleranceReason);
>    bool rememberStrongCiphersFailed(const nsACString& hostName, int16_t port,
>                                     PRErrorCode intoleranceReason);
> +  // returns the known tolerant version

nit: add "or 0 if there is no known tolerant version" or equivalent

::: toolkit/components/telemetry/Histograms.json
@@ +6456,5 @@
> +  "SSL_FALLBACK_LIMIT_REACHED": {
> +    "expires_in_version": "default",
> +    "kind": "enumerated",
> +    "n_values": 16,
> +    "description": "TLS/SSL version fallback reached the limit, stopped the fallback"

Which limit? From what I can tell, this telemetry will be collected both when the version fallback limit is reached and when the minimum enabled version is reached. We should document that here.
Attachment #8530480 - Flags: review?(dkeeler) → review+
(Assignee)

Comment 37

3 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/35f5ec149ad5
https://hg.mozilla.org/integration/mozilla-inbound/rev/38616c7e2da9
Status: NEW → ASSIGNED
(Assignee)

Comment 38

3 years ago
And followup to fix the wrong description:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fa0ba58df12f
https://hg.mozilla.org/mozilla-central/rev/35f5ec149ad5
https://hg.mozilla.org/mozilla-central/rev/38616c7e2da9
https://hg.mozilla.org/mozilla-central/rev/fa0ba58df12f
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
See Also: → bug 861310
(Assignee)

Updated

3 years ago
Depends on: 1109613
(Assignee)

Updated

3 years ago
Depends on: 1095507
I don't think bug 1095507 or any bugs for SSL3-only sites should be linked to *this* bug; instead, they should get linked (only) to the POODLEBITES bug. It's better to keep the dependency list of this bug reserved for TLS 1.2 and TLS 1.1 intolerance bugs. In particular, reverting the patch for this bug won't fix any of those SSL3-only sites.
No longer depends on: 1095507
(Assignee)

Comment 41

3 years ago
www.kredodirect.com.ua is not only SSLv3-only but also TLS 1.1+ intolerant. If the site administrator only enables TLS 1.0 without fixing the intolerance, Nightly still cannot access the site.
Should I file a separate bug even if the site URL happens to be the same?
Flags: needinfo?(brian)
(In reply to Masatoshi Kimura [:emk] from comment #41)
> www.kredodirect.com.ua is not only SSLv3-only but also TLS 1.1+ intolerant.
> If the site administrator only enables TLS 1.0 without fixing the
> intolerance, Nightly still cannot access the site.
> Should I file a separate bug even if the site URL happens to be the same?

I would only file bugs for the case where the site already supports TLS 1.0+ but no longer works due to this patch. i.e. only for sites where backing out this patch would get the site working again.

It is possible that the site admin might fix the bug in a way that makes this bug relevant, but it is unlikely that will happen.
Flags: needinfo?(brian)
(Assignee)

Comment 43

3 years ago
> i.e. only for sites where backing out this patch would get the site working again.

The site will work again if *both* bug 1076983 and this bug are backed out. So I set the dependency on both.
(In reply to Masatoshi Kimura [:emk] from comment #43)
> > i.e. only for sites where backing out this patch would get the site working again.
> 
> The site will work again if *both* bug 1076983 and this bug are backed out.
> So I set the dependency on both.

Yes, but at this point I think it is safe to assume that bug 1076983 won't get backed out, so it doesn't matter.
Created attachment 8535930 [details] [diff] [review]
Part 3: Clean up some bits

When we change a preference in security-prefs.js or firefox.js or elsewhere, we also need to make a change like the one in this patch.
Attachment #8535930 - Flags: review?(dkeeler)
Attachment #8535930 - Flags: review?(VYV03354)
Attachment #8535930 - Attachment description: disable-non-secure-fallback-p3.patch → Part 3: Clean up some bits
David and Masatoshi-san, I think we should uplift these patches to mozilla-aurora and mozilla-beta. I just saw the latest SSL Pulse report at https://www.trustworthyinternet.org/ssl-pulse/ that indicates that 10% of the (approx.) top 200,000 sites have the POODLE vulnerability. That's quite a large percentage.

Obviously, this change has some higher-than-normal compatibility risks. But, the setting is controlled by a pref, so we can revert it if necessary. At the very least, it would be good to get the code (including the new telemetry changes) into Firefox 35 so that there is the *option* of controlling this with minimal risk, should things really get out of hand. WDYT?
Flags: needinfo?(dkeeler)
Flags: needinfo?(VYV03354)
Attachment #8535930 - Flags: review?(dkeeler) → review+
I wouldn't be opposed to it, but I think we need more bake time for the pref change. We could start by uplifting the telemetry changes, see how the numbers look, and go from there.
Flags: needinfo?(dkeeler)
(Assignee)

Updated

3 years ago
Flags: needinfo?(VYV03354)
Attachment #8535930 - Flags: review?(VYV03354) → review+
(Assignee)

Comment 48

3 years ago
Oops, needinfo request was accidentally cleared. I don't disagree with uplifting the change.
(In reply to David Keeler (:keeler) [use needinfo?] from comment #47)
> I wouldn't be opposed to it, but I think we need more bake time for the pref
> change. We could start by uplifting the telemetry changes, see how the
> numbers look, and go from there.

Do you think that a few days on Nightly would suffice?  Or maybe we can get as much time as possible before the next release?  Is this worth a chemspill release?  It seems like it might, almost.

I think that we should consider updating the "SSL version control" add-on to bump this pref, or shipping another similar add-on, so that we can provide some sort of coverage in the meantime.  Richard, see comment 46.
Flags: needinfo?(rlb)
Thanks for the quick reviews!

This is part 3: https://hg.mozilla.org/integration/mozilla-inbound/rev/e6e990808c4f
Comment on attachment 8530480 [details] [diff] [review]
Add telemetry to measure failures due to not falling back, v3

Approval Request Comment
[Feature/regressing bug #]: bug 1076983, the POODLE attack
[User impact if declined]: Increased exposure to the POODLE attack.
[Describe test coverage new/current, TBPL]: This is just telemetry. This has been on mozilla-central for 2014-12-09.

[Risks and why]: This is just telemetry. Seems like low risk. (:emk, :keeler: correct me if I'm wrong). AFAICT, no issues have been reported that have to deal with this *telemetry*.

[String/UUID change made/needed]: none

It is important that Mozilla be prepared to land all the patches in this bug in case the POODLE stuff gets real bad. Some would argue it is already really bad. In the discussion above, we agreed to uplift the lowest-risk part (the telemetry) ahead of a possible decision to uplift the riskier parts (the actual disabling of the non-secure fallback logic). Landing this patch in -aurora and -beta ASAP will reduce risk later.
Attachment #8530480 - Flags: approval-mozilla-beta?
Attachment #8530480 - Flags: approval-mozilla-aurora?
Comment on attachment 8511941 [details] [diff] [review]
Disable insecure version fallback, v2

Approval Request Comment
[Feature/regressing bug #]: See above uplift request for the telemetry.

[User impact if declined]: Less testing of the impact of disabling non-secure version fallback. Without this testing, it will be riskier to decide whether to uplift this to -beta later. Note that :mt has already hinted at uplifting it to -release, even. If so, we need as much testing as possible.

[Describe test coverage new/current, TBPL]: The test coverage from pre-existing tests is good. This landed on -central on 2014-12-09.

[Risks and why]: Note that this uplift request is for -aurora only, not -beta. My hope is that testing on -central and -aurora over the next few days will go well, and then we'll uplift to -beta. The risk for -aurora is low, because it is so easy to undo (just revert the patch, which just flips a pref). Risk for -release is much higher, because there are potentially serious and hard-to-predict compatibility issues that can arise from this. (This bug is disabling a feature that trades security for compatibility, because it turns out the security tradeoff is much larger than we expected.) Firefox would be the first browser to turn off this compatibility feature. Although Mozilla can and should do additionally compatibility testing before uplifting to -beta and -release, lots of end-user testing on -aurora and soon -beta will be needed because of compatibility issues that arise due to anti-virus software and proxies, which cannot easily be tested.

[String/UUID change made/needed]: None
Attachment #8511941 - Flags: approval-mozilla-aurora?
Comment on attachment 8535930 [details] [diff] [review]
Part 3: Clean up some bits

Approval Request Comment: See the most recent, aurora-only, comment.
Attachment #8535930 - Flags: approval-mozilla-aurora?
As you can see from the above uplift requests, my suggestion for handling this is to uplift to -aurora immediately. Concurrently with that, somebody should run the pkix-compatibility-test thing to find known-incompatible servers. The results of the pkix-compatibility-test should be triaged so that compatibility issues that Gecko can work around can be fixed, and so that Mozilla can contact server vendors about compatibility fixes. Then, a decision should be made about uplifting to -beta or, like :mt hinted at, to -release. Because the compatibility issues are due to things that pkix-compatibility-test cannot measure, as much end-user testing as possible is needed. It would be good for Mozilla to make a special public call-to-action to ask volunteers to test this.
(Assignee)

Comment 55

3 years ago
Created attachment 8535985 [details] [diff] [review]
Telemetry patch for aurora

Folded https://hg.mozilla.org/integration/mozilla-inbound/rev/38616c7e2da9 and https://hg.mozilla.org/integration/mozilla-inbound/rev/fa0ba58df12f for ease of uplift.
(Assignee)

Comment 56

3 years ago
Created attachment 8535986 [details] [diff] [review]
Telemetry patch for beta

The nightly/aurora patch did not apply beta cleanly.
Before we get carried away with uplifts to release channels, we have to ask if this is going to fix a real problem (this in particular: http://arstechnica.com/security/2014/12/meaner-poodle-bug-that-bypasses-tls-crypto-bites-10-percent-of-websites/).  That flaw hits a large number of sites, but doesn't get fixed by having a newer TLS version.  For instance, bankofamerica.com negotiates a CBC mode with TLS 1.2.

It might be addressed if an AEAD cipher suite is selected, but I haven't seen any evidence that the affected sites are selecting GCM, even though a number of them support TLS 1.2.  Someone asked if we were willing to mark CBC as "weak" in the same way that we mark RC4, but I suspect that the net effect will be that we slow down 90% of the web to no measurable improvement in security.

(Sorry if I got too enthuasiastic about this.  I thought that it might help.  After looking at the numbers, I'm fairly sure it will have a negligible impact.)
Flags: needinfo?(rlb)
(In reply to Martin Thomson [:mt] from comment #57)
> That flaw hits a large number of
> sites, but doesn't get fixed by having a newer TLS version.  For instance,
> bankofamerica.com negotiates a CBC mode with TLS 1.2.

Good point.

> It might be addressed if an AEAD cipher suite is selected, but I haven't
> seen any evidence that the affected sites are selecting GCM, even though a
> number of them support TLS 1.2.

>  Someone asked if we were willing to mark
> CBC as "weak" in the same way that we mark RC4, but I suspect that the net
> effect will be that we slow down 90% of the web to no measurable improvement
> in security.

I agree.

> (Sorry if I got too enthuasiastic about this.  I thought that it might help.
> After looking at the numbers, I'm fairly sure it will have a negligible
> impact.)

It would be good to see the actual numbers and methodology used to collect them.

Also, peek in mind that Ivan's current number of 0.7% of websites (7/1,000) seems to greatly overestimate the compatibility impact of this change, because when he collected those numbers he put 0x0303 in the ClientHello record layer, but browsers (including Firefox) put 0x0301. I think he's planning on collecting new numbers, but in the meantime the Firefox telemetry numbers are a better indicator of compatibility impact. As I said above, the compatibility impact is about 1/7th the compatibliity impact of disabling SSL 3.0, based on the Firefox telemetry.
Also, I think that uplifting the additional telemetry to -aurora and -beta makes sense, regardless of the final decision on whether to uplift the actual change. And, I think it makes sense to uplift the preference change to -aurora for now, to get (slightly) more testing before a decision is made.
Comment on attachment 8530480 [details] [diff] [review]
Add telemetry to measure failures due to not falling back, v3

Approving the telemetry portion for now.
Attachment #8530480 - Flags: approval-mozilla-beta?
Attachment #8530480 - Flags: approval-mozilla-beta+
Attachment #8530480 - Flags: approval-mozilla-aurora?
Attachment #8530480 - Flags: approval-mozilla-aurora+
(Assignee)

Updated

3 years ago
Keywords: checkin-needed
(Assignee)

Updated

3 years ago
Attachment #8535985 - Flags: checkin?
(Assignee)

Updated

3 years ago
Attachment #8535986 - Flags: checkin?
status-firefox35: --- → affected
status-firefox36: --- → affected
status-firefox37: --- → fixed
tracking-firefox35: --- → +
tracking-firefox36: --- → +
Comment on attachment 8535985 [details] [diff] [review]
Telemetry patch for aurora

We have bug queries specfically for uplifts. Adding flags all over just clutters up other queries.
Attachment #8535985 - Flags: checkin?
Attachment #8535986 - Flags: checkin?
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #58)
> > (Sorry if I got too enthuasiastic about this.  I thought that it might help.
> > After looking at the numbers, I'm fairly sure it will have a negligible
> > impact.)
> 
> It would be good to see the actual numbers and methodology used to collect
> them.

I'm using these numbers, which were derived from telemetry (see slide 12): https://tools.ietf.org/agenda/91/slides/slides-91-saag-3.pdf

The key information I'm looking for is the number of sites that have GCM, but don't otherwise select it.  This indicates that it's better than you'd think.  25% moved off RC4; 1.2% moved from RC4 to GCM.  That's probably the 1.2% that were concerned about BEAST and weren't able to order for <=1.0 differently.  I don't expect that we can do anything at all about the 1.2% who moved to 3DES, and I don't typically see someone picking a CBC mode over GCM if GCM is present.  I don't see any significant moves.

Let's look at what the telemetry tells us about fallback.  I'm still concerned that we aren't buying ourselves much value here.  BEAST mitigation is perhaps the best we can hope for.  Maybe that's enough.
 
> Also, peek in mind that Ivan's current number of 0.7% of websites (7/1,000)
> seems to greatly overestimate the compatibility impact of this change,
> because when he collected those numbers he put 0x0303 in the ClientHello
> record layer, but browsers (including Firefox) put 0x0301. I think he's
> planning on collecting new numbers, but in the meantime the Firefox
> telemetry numbers are a better indicator of compatibility impact. As I said
> above, the compatibility impact is about 1/7th the compatibliity impact of
> disabling SSL 3.0, based on the Firefox telemetry.

Based on nightly, we hit our fallback limit pretty rarely.  I have 122k hitting the fallback (mostly due to TCP RST) against 2008k that encounter a fallback condition.  But we're only falling back now for RC4, so it's hard to tell how many of those might have succeeded if we let them progress.

Most fallbacks fall all the way through, so it's hard to get a good sense of what is going on.
https://hg.mozilla.org/releases/mozilla-aurora/rev/409a02246b8a
https://hg.mozilla.org/releases/mozilla-beta/rev/e07fa09385d5
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/e6e990808c4f

Updated

3 years ago
Depends on: 1111354

Updated

3 years ago
Depends on: 1112110
(Assignee)

Updated

3 years ago
Depends on: 1112399

Updated

3 years ago
Depends on: 1095507
(Assignee)

Comment 65

3 years ago
(In reply to Masatoshi Kimura [:emk] from comment #41)
> www.kredodirect.com.ua is not only SSLv3-only but also TLS 1.1+ intolerant.
> If the site administrator only enables TLS 1.0 without fixing the
> intolerance, Nightly still cannot access the site.

Sadly I got it right :(
Comment on attachment 8511941 [details] [diff] [review]
Disable insecure version fallback, v2

Brian, Can an user override the error page?
Flags: needinfo?(brian)
Attachment #8511941 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8535930 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Comment 67

3 years ago
The user can change the "security.tls.version.fallback-limit" pref value from about:config to bring back the old behavior.
(In reply to Sylvestre Ledru [:sylvestre] from comment #66)
> Comment on attachment 8511941 [details] [diff] [review]
> Disable insecure version fallback, v2
> 
> Brian, Can an user override the error page?

No.

(In reply to Masatoshi Kimura [:emk] from comment #67)
> The user can change the "security.tls.version.fallback-limit" pref value
> from about:config to bring back the old behavior.

Yes. Another (better) option would be to create a pref that is a comma-separated list of domain names for which non-secure fallback to TLS 1.0 is still done. We could even pre-populate this pref with a list of domains we know are TLS 1.2 intolerant. This would help ensure that the feature stays enable for release.
Flags: needinfo?(brian)
(Assignee)

Updated

3 years ago
Depends on: 1114816
(Assignee)

Comment 69

3 years ago
Filed bug 1114816.
Depends on: 1115883
We're too late in 35 to take this, wontfixing.
status-firefox35: affected → wontfix

Updated

3 years ago
Depends on: 1116891
(Assignee)

Updated

3 years ago
No longer depends on: 1116891
status-firefox36: affected → fixed
I think that I would like to see this in aurora for another release before taking this to release.  We have seen a few sites that break already, and we know that release and beta users encounter more legacy sites than on nightly and aurora.  :keeler, :emk, :briansmith, am I being too cautious here? Or could this actually risk losing us users?

:sylvestre, Providing that others agree with me, how do I ensure that this doesn't accidentally ride the trains?
Flags: needinfo?(sledru)
Flags: needinfo?(dkeeler)
Matt is working on an analysis of the impact of this change. With clear numbers, it should be easier to make a call on this for 36.
Flags: needinfo?(sledru) → needinfo?(mwobensmith)
36 is about to go to beta, right? I think it would be worthwhile to see the impact of this on the beta population (given that aurora and nightly are much smaller and different populations). If there's too much breakage, we can back it out of 36 once it's in beta.
Flags: needinfo?(dkeeler)
(Assignee)

Updated

3 years ago
Depends on: 1120942
I added two more sites to bug 1116891 that appear to be TLS intolerant in some form.

In short, only about a dozen sites out of over 200k run seem to be affected by this change, if I'm not mistaken.

I did notice a small amount of sites broken by our recent removal of 1024-bit signed root certs (bug 881553) but that's to be expected.

In summary, I'm not seeing a huge impact.
Flags: needinfo?(mwobensmith)
If you are willing to stand by that, I think that I'm willing to proceed with this.  If we get a catastrophic number of complaints in Beta, that might be a cue for backout, but I think that we're fine.  I should ping the other browser vendors about this.

Comment 76

3 years ago
(In reply to Martin Thomson [:mt] from comment #75)
> I should ping the other browser vendors about this.

Hi. I work on WebKitGTK+, which is used by many desktop applications and smaller browsers. I want to drop insecure TLS version fallback, but hesitate to do so because we already have enough problems with HTTPS sites that Firefox handles and we don't. If you go through with this change, then we will drop insecure fallback from libsoup, affecting both WebKitGTK+ and EFL WebKit.

This change will also be good for the web, since TLS intolerance is a chicken-and-egg problem: servers will likely be fixed much faster if a major browser like Firefox stops connecting, and other major browsers will hesitate to make the change until another one does first.
Martin, what would you consider to be an indicator of whether we ship this or not? Percent of sites affected? Specific sites affected? And - to be sure - the "connection reset" issues (bug 1116891) that only happen in Fx36 are solely due to this change, right?
(In reply to Michael Catanzaro from comment #76)
> If you go through with this change, then we
> will drop insecure fallback from libsoup, 

Thanks for the feedback. I have received positive signals from Microsoft and a "good luck with that" from Google.  Note that we will retain the preference that will re-enable this (that helps us back out the change too); and we will still likely retain the fallback code when we start deploying TLS 1.3.

(In reply to Matt Wobensmith from comment #77)
> Martin, what would you consider to be an indicator of whether we ship this
> or not? Percent of sites affected? Specific sites affected? 

Probably a combination of all of the above.  If we see a noticeable increase in connection failures in telemetry, that's a bad sign.  When we dropped SSL3, the change was hard to detect, though we might have lost somewhere in the order of 0.3%.  If this has a similar effect, then I think we're good.

> And - to be sure
> - the "connection reset" issues (bug 1116891) that only happen in Fx36 are
> solely due to this change, right?

Yes, those sites are intolerant to our handshake in various interesting ways.  Those should be addressed by making our fallback to RC4 more sensitive to different failures though (i.e., :emk's patch on that bug).

Interesting observation:  Removing fallback entirely has an interesting effect on exposing initial page loads to network glitches.  With fallback enabled, an initial page load has a higher probability of success because we get two or more attempts at connecting (assuming the server doesn't support the downgrade SCSV).  The glitch rate seems to be quite high (based on the number of spurious inappropriate_fallback alerts we're seeing).  Therefore, I suspect that it will be very hard to properly distinguish between mere network glitches and real intolerance with the telemetry we have.  That's why your testing is so valuable.
(In reply to Martin Thomson [:mt] from comment #78)

> Probably a combination of all of the above.  If we see a noticeable increase
> in connection failures in telemetry, that's a bad sign.  When we dropped
> SSL3, the change was hard to detect, though we might have lost somewhere in
> the order of 0.3%.  If this has a similar effect, then I think we're good.

I use two sets of data for testing SSL compatibility - the Pulse top sites list (220k) and the Google CT list (1.8 million). The former is more interesting as far as active, popular sites, while the latter contains a lot of outliers.

For Fx36, it appears that when using the Pulse list, we have a failure rate of less than .005% for this specific change. My findings were recorded in bugs 1116891, 1116892 and 1116893. To me, this seems very small. Affected sites are listed in those bugs.

I haven't run the entire Google CT list due to time constraints, but I did a pass yesterday with a smaller random selection of 100k sites. From that sample, I am seeing a failure rate of around .024% (24 out of 100k). I haven't dug into all errors individually, but the majority appear to be the same as bug 1116891. Another data point - many broken sites from that list appear abandoned or often unused (default IIS home page). I have the error site list but it is not conclusive since it is only a subset. If you need a full run with the Google CT list, it will take more time.

Let me know if you need more from me.
(Assignee)

Comment 80

3 years ago
Looks like this was not actually landed on beta:
https://mxr.mozilla.org/mozilla-beta/source/netwerk/base/public/security-prefs.js?rev=e23ba6e29b84&mark=7-7#7
status-firefox36: fixed → affected
Yeah, those approvals got missed due to the other approval noise from this bug (doing things the way this bug did them makes uplift tracking a big PITA, fwiw). Anyway, you'll need to request beta approval on those patches at this point.
Flags: needinfo?(VYV03354)
Also, Part 3 depends on bug 1093724, so presumably you're going to want to request beta approval on that as well.
Indeed, next time, it would be nice to have a bug for the experiment and an other for the change itself.

Anyway, is it a big deal if it is not released with 36 but with 37?
:sylvestre, I don't think the delay harms us much, if at all.  This does have the potential to cause problems.
Martin, any idea then what caused the TLS intolerance regressions in Fx36, if it wasn't this change? Was it bug 861310?
Flags: needinfo?(martin.thomson)
Matt, I don't know what you tested with.  I think that the patch is on Aurora, and it only missed the uplift to Beta.  If you tested on Aurora, then this would still be the cause.  None of the regressions that I've seen are likely to be as a result of false start changes.
Flags: needinfo?(martin.thomson)
(Assignee)

Comment 87

3 years ago
IMO if we land this on beta, we should land some mitigations either (e.g. bug 1112399 and bug 1114816).
Flags: needinfo?(VYV03354)
(Assignee)

Comment 88

3 years ago
(In reply to Martin Thomson [:mt] from comment #86)
> Matt, I don't know what you tested with.  I think that the patch is on
> Aurora, and it only missed the uplift to Beta.  If you tested on Aurora,
> then this would still be the cause.  None of the regressions that I've seen
> are likely to be as a result of false start changes.

Matt refers bug 1116891 to 1116893. And indeed bug 1116891 and 1116892 happen on Firefox 36 (currently Beta, was Aurora when Matt tested that).
I'd suggest that you do try to enable this in Firefox 36 beta, and make a decision whether to ship it in the release in the last beta cycle.

Reasoning:

* The non-secure fallback logic makes it easy for an active attacker to downgrade connections from the secure AEAD cipher suites to RC4 (and CBC-mode) cipher suites, because the AEAD cipher suites only work for TLS 1.2.

* There is a talk [1] scheduled for around March 24th, which is AFTER the Firefox 36 release and BEFORE the Firefox 37 release. Deferring this change until Firefox 37 means that it won't be helpful during the period of time between March 24 and the Firefox 37 release.

* I've heard rumors that another paper on attacking RC4 in TLS that will be released soon--maybe even sooner than the Blackhat paper.

* :emk has already written a patch for adding a whitelist of known-TLS-intolerant sites to Firefox.

* If Matt re-runs his compatibility tests against the entire Google CT dataset, he will be able to report a fairly comprehensive list of sites that can be added to the default value of the whitelist.

* Uplifting the patch for bug 1112399 (and bug 1096197) will allow Beta channel users to report additional websites that are broken through the "tell Mozilla about this" feature on the SSL error page. By monitoring this feedback, Mozilla can expand the whitelist further as necssary.

* Using the Hotfix update, Mozilla can dynamically expand the set of sites in the known-TLS-intolerant whitelist without having to do a chemspill release.

* There is telemetry for measuring this feature, which can be monitored during the next three weeks in order to make a decision whether to ship this in the final release.

* The patches that would need to be uplifted are all low risk and all are easy to undo.

* Even if it is decided to be too risky to ship in Firefox 36, and the feature gets delayed until Firefox 37, Firefox 37 would still benefit from the additional 3 weeks of learning about how the feature works on Firefox 36 beta.

We had a very similar situation with TLS 1.2 around this time last year. Firefox ended up getting some notably bad press due to being 2-3 weeks slower than people who decided to make a huge deal about the fact that it didn't support TLS 1.2 (even though it practically didn't matter). I think it would be good to learn from that experience and be more preemptive here.

[1] https://www.blackhat.com/asia-15/briefings.html#bar-mitzva-attack-breaking-ssl-with-13-year-old-rc4-weakness
This seems like a change that we should list in our release notes. Can you help with the wording? Is "Disabled insecure TLS version fallback" clear enough? Is this specific to a version of TLS or is there any other way that we should further qualify this change?
tracking-firefox37: --- → +
relnote-firefox: --- → ?
Flags: needinfo?(VYV03354)

Comment 91

3 years ago
RC4 has nothing to do with TLS version fallback though.
(In reply to Yuhong Bao from comment #91)
> RC4 has nothing to do with TLS version fallback though.

Yes it does, because many websites will use RC4 when the browser uses a TLS 1.1 or TLS 1.0 ClientHello, but will use AES-GCM when the browser uses a TLS 1.2 ClientHello.
(Assignee)

Comment 93

3 years ago
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #90)
> This seems like a change that we should list in our release notes. Can you
> help with the wording? Is "Disabled insecure TLS version fallback" clear
> enough? Is this specific to a version of TLS or is there any other way that
> we should further qualify this change?

Brian could give much better wording than me.
Flags: needinfo?(VYV03354)
I think that :lmandel's suggestion is fine.
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #89)
> I'd suggest that you do try to enable this in Firefox 36 beta, and make a
> decision whether to ship it in the release in the last beta cycle.
Thanks for the long and impressive rational :)
I would like to see the results of Matt's tests before making a call on this.
Created attachment 8555562 [details]
fx36_tls_fallback_3_errors.txt

I've run site compatibilty for Fx36.0b2, changing the pref "security.tls.version.fallback-limit" from default of 1 to 3. I ran the Pulse top sites list of ~211k sites and diffed the results.

With the fallback limit set to 3, we see 561 uniquely broken sites, or approximately 0.27% rounded up. I'm attaching the list of broken sites to this bug. Of the broken sites:

- About half (284) have "NetworkInterruptError" with error string "pr_end_of_file_error". I've examined half a dozen of these sites and all have TLS version intolerance, so I'm guessing that might be the primary reason.

- The rest - including errors labeled "undefined" - appear to be a grab-bag of issues that are caused when the server is forced to downgrade its TLS version, such as attempting to use SSLv3.  

That's the quantitative part. The qualitative part is that - while many sites on this list would be kindly labeled "inactive" - there are also some high-visibility sites such as "products.geotrust.com" and three properties for "bankofthewest.com".

If you need me to run this test against Google's CT list - 10x the domains - it will be at least another week of work, as I'm out of the office part of this week and couldn't begin until 2015-02-04. In the meantime, I hope the above gives you some insight into feasibility for inclusion into Fx36.

Lastly, please note that previous test results in comment 79 were erroneously thought to be testing this change, but were in fact due to bug 1088915. I've factored those failures out of this specific testing.

Comment 97

3 years ago
Adding Rick Andrews as products.geotrust.com was mentioned in comment 96.

Updated

3 years ago
Depends on: 1126620

Comment 98

3 years ago
(In reply to Matt Wobensmith from comment #96)
> The qualitative part is that - while many
> sites on this list would be kindly labeled "inactive" - there are also some
> high-visibility sites such as "products.geotrust.com" and three properties
> for "bankofthewest.com".

If there is a small set of high-visibility sites like this, then filing a dozen or so TE bugs for them now might be a good idea.

I just created bug 1126620 to start to organize this. Users will undoubtedly file more.
Depends on: 1126652
Depends on: 1126654
(Assignee)

Comment 99

3 years ago
Comment #96 summary:
0x804b0014 / NetworkError / pr_connect_reset_error: 1
0x804b0047 / NetworkInterruptError / pr_end_of_file_error: 282
0x804b0014 / NetworkError / pr_end_of_file_error: 2
0x805a2ff0 / SecurityError / ssl_error_bad_mac_alert: 5
0x805a2ff1 / SecurityError / ssl_error_bad_mac_read: 47
0x805a2fc3 / SecurityError / ssl_error_handshake_failure_alert: 1
0x805a2fc5 / SecurityError / ssl_error_handshake_unexpected_alert: 3
0x805a2ffe / SecurityError / ssl_error_no_cypher_overlap: 137
0x804b0014 / NetworkError / ssl_error_no_cypher_overlap: 1
0x805a2f9e / SecurityError / ssl_error_protocol_version_alert: 26
0x8000ffff / undefined / undefined: 56
I also manually checked "undefined" sites:
	pr_end_of_file_error: 37
	ssl_error_bad_mac_read: 3
	ssl_error_no_cypher_overlap: 15
	(Loading forever): 1
(In reply to Matt Wobensmith from comment #96)
> With the fallback limit set to 3, we see 561 uniquely broken sites, or
> approximately 0.27% rounded up. I'm attaching the list of broken sites to
> this bug.
561 websites is a big number, too big compared to the release date (less than 3 weeks).
I think we should do in 37 (with the agreement of the 37 driver, Lawrence).
status-firefox36: affected → wontfix

Comment 101

3 years ago
I think it is more like 5 weeks than 3 weeks.
(In reply to Yuhong Bao from comment #101)
> I think it is more like 5 weeks than 3 weeks.

If you're referring to the Firefox 36 release date, the release is scheduled for Feb 24, which is 26 days from now.
(Assignee)

Comment 103

3 years ago
Removing evang bugs because bug 1126620 will track them.
Depends on: 1128227
No longer depends on: 1095507, 1111354, 1112110, 1115883, 1126652, 1126654

Comment 104

3 years ago
About products.geotrust.com, here's the server configuration: Weblogic 10.3, using the Certicom SSL stack, with only TLS1 enabled and no support for TLS1.1/1.2. 

Is there something unique about that configuration that causes your logic to fail?

Comment 105

3 years ago
It is more likely a bug in the server software than in the configuration, but thanks for naming the server software affected.

Comment 106

3 years ago
Are you asking us to take any particular action?

Comment 107

3 years ago
See if you can obtain a fix from the vendor.

Comment 108

3 years ago
It's not clear to me from reading this thread exactly what "fix" I would ask for. Can you explain the bug as you see it?

Comment 109

3 years ago
The server should respond with a ServerHello showing the TLS 1.0 (SSL 3.1) version in response to a ClientHello showing any later version (such as TLS 1.2 and TLS 1.3), basically.

Comment 110

3 years ago
(In reply to Rick Andrews from comment #108)

The total fix, however, is to upgrade your SSL/TLS stack to a version which supports TLS 1.2. (preferably also with FS & AES-GCM) TLS 1.0 will eventually be phased out as well and TLS 1.2 provides far better security. It would be best for everyone to start upgrading long before vendors have to do another panic disable for TLS 1.0 like was needed for SSL 3. (of course, nobody knows what the timetables are here, but being complacent is not good) TLS clients which support 1.2 would generally be expected to handle fallbacks correctly.

Qualys SSL Labs server test is the best way to check for TLS version intolerance at the moment:
https://www.ssllabs.com/ssltest/analyze.html?d=products.geotrust.com

The versions it breaks with due to intolerance are listed in the "Protocol Details" section. To not break with current standards, it can't be 1.1 or 1.2 intolerant (or extension intolerant, which is not a problem here). Your server is also intolerant to 1.3, so when the new version comes out you'll have issues there as well.

Further discussion on this topic should be done elsewhere, as this bug is resolved. New TLS intolerance TE bugs can be opened as blocking bug 1126620.

Comment 111

3 years ago
We're already working on upgrading the site, so I'm not going to file any bugs with any vendor.

Updated

3 years ago
Depends on: 1129925
Created attachment 8563649 [details]
fx36_fallback3_googlect_results.txt

I have completed a pass on Fx36 using the Google Certificate Transparency list of ~2.6 million sites, with the same parameters as what I used in comment 96. I am attaching a list of broken sites to this bug.

In summary:

* 7647 uniquely broken sites out of 2678142 sites total, or roughly 0.286% breakage.
* Most errors look like what we see with the previous run against the Pulse list - issues when downgrading to an unsupported TLS protocol or TLS intolerance.
* IMPORTANT: I have not individually verified all ~7k broken sites as legitimately broken. This is too time-consuming to do right now. This testing is imperfect and has noise - especially with a data set this large - so please allow a small degree of error. However, this still helps us understand the scope of the changes we wish to make.

The results of running this list seems pretty close to the previous run - whose results were meticulously confirmed - so I'm reasonably confident that we know how invasive this change will be, should we choose to do it.
relnoted as "Disabled insecure TLS version fallback for site security"
relnote-firefox: ? → 37+
(Assignee)

Comment 114

2 years ago
Looks like api.geotrust.com and products.geotrust.com were fixed, but the following subdomains are still broken:
canadaca.geotrust.com
services.geotrust.com
smarticon.geotrust.com
Duplicate of this bug: 1120942
(Assignee)

Updated

2 years ago
Keywords: site-compat
Blocks: 901695
Blocks: 936818

Comment 116

2 years ago
Folks - while I applaud the technical decision about this - there needs to be better communication to end users about this change.  Forums are absolutely littered with discussion on this topic.  End users don't know or care why their web sites are no longer working, they just want them to work.  

We need to have a detailed article discussing this change and PROPER mitigation strategy should a user encounter a website that no longer works.  Stating "contact the website owner" is not acceptable.  Users will dump Firefox and switch to another web browser at that point which "just works" with the given website.
You need to log in before you can comment on or make changes to this bug.