Closed Bug 1084025 Opened 10 years ago Closed 9 years ago

Disable insecure TLS version fallback

Categories

(Core :: Security: PSM, defect)

33 Branch
defect
Not set
critical

Tracking


RESOLVED FIXED
mozilla37
Tracking Status
firefox35 + wontfix
firefox36 + wontfix
firefox37 + fixed
relnote-firefox --- 37+

People

(Reporter: s.franke, Assigned: emk)

References

Details

(Keywords: site-compat)

Attachments

(7 files, 5 obsolete files)

1.14 KB, patch
keeler
: review+
Details | Diff | Splinter Review
5.65 KB, patch
keeler
: review+
Details | Diff | Splinter Review
1.31 KB, patch
emk
: review+
keeler
: review+
Details | Diff | Splinter Review
6.00 KB, patch
Details | Diff | Splinter Review
3.43 KB, patch
Details | Diff | Splinter Review
38.69 KB, text/plain
Details
528.11 KB, text/plain
Details
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0
Build ID: 20141011015303

Steps to reproduce:

I connect to a server providing TLS 1.0, TLS 1.1 and TLS 1.2.


Actual results:

An attacker blocks the handshake if TLS 1.1 or TLS 1.2 is used.

The browser then downgrades the TLS version to TLS 1.0.


Expected results:

The browser MUST NOT downgrade the security level, since the TLS protocol negotiates the TLS version itself.

The offending code is in nsNSSIOLayer.cpp:

void
nsSSLIOLayerHelpers::adjustForTLSIntolerance(const nsACString& hostName,
                                             int16_t port,
                                             /*in/out*/ SSLVersionRange& range)
{
  IntoleranceEntry entry;

  {
    nsCString key;
    getSiteKey(hostName, port, key);

    MutexAutoLock lock(mutex);
    if (!mTLSIntoleranceInfo.Get(key, &entry)) {
      return;
    }
  }

  entry.AssertInvariant();

  if (entry.intolerant != 0) {
    // We've tried connecting at a higher range but failed, so try at the
    // version we haven't tried yet, unless we have reached the minimum.
    if (range.min < entry.intolerant) {
      range.max = entry.intolerant - 1;
    }
  }
}
OS: Windows 7 → All
Hardware: x86_64 → All
Component: Untriaged → Security
Severity: normal → critical
Mozilla is already aware, see https://blog.mozilla.org/security/2014/10/14/the-poodle-attack-and-the-end-of-ssl-3-0/

"SSLv3 will be disabled by default in Firefox 34, which will be released on Nov 25. The code to disable it is landing today in Nightly, and will be promoted to Aurora and Beta in the next few weeks."
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
This ticket is not related to SSL 3.0.

It's about a bogus downgrade from TLS 1.2 to TLS 1.0.
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
Then how is this bug related to POODLE?
Summary: (POODLE) TLS is downgraded by Browser → TLS is downgraded by Browser
Part of the POODLE name is "On Downgraded Legacy Encryption".

An attacker can still force a connection to use TLS 1.0 instead of TLS 1.2.

And TLS 1.0 is weaker than TLS 1.2.
That applies to downgrade attacks generally. How is it specifically related to POODLE?
Now you can effectively disable the insecure fallback by setting "security.tls.version.fallback-limit" to 3.
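For reference, the pref's value scale matches security.tls.version.min/max (0 = SSL 3.0, 1 = TLS 1.0, 2 = TLS 1.1, 3 = TLS 1.2). A sketch in the security-prefs.js style:

// Cap how far the insecure fallback may downgrade. With the limit at
// 3 (TLS 1.2), no insecure fallback below TLS 1.2 will be attempted.
pref("security.tls.version.fallback-limit", 3);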

I found a discussion about removing the insecure fallback in dev.tech.crypto:
https://groups.google.com/d/msg/mozilla.dev.tech.crypto/Qis_B8b2b7U/CCOouHbcLN4J

FYI, Brian Smith introduced the insecure fallback when enabling TLS 1.1 and later (bug 839310). According to his comment, he wasn't much concerned with making Firefox more secure than browsers that only supported TLS 1.0.
https://bugzilla.mozilla.org/show_bug.cgi?id=839310#c6
Blocks: 839310
Status: UNCONFIRMED → NEW
Component: Security → Security: PSM
Depends on: 1083058
Ever confirmed: true
Product: Firefox → Core
Please don't close bugs that you didn't open, unless you are a module owner or module peer in this module.

Back when I wrote my comments in bug 839310, our goal was to turn TLS 1.2 and TLS 1.1 on in Firefox ASAP. Because we wanted to do so ASAP, and because other browsers were already doing the non-secure fallback, we didn't have much choice except to do the fallback the same way the others do. I always planned on trying to find some way to do so with minimal compatibility impact, but I left Mozilla before I could do so.

Now, you can see on the IETF TLS WG mailing list that I am arguing that all browsers should remove the non-secure fallback. And, I've provided a pretty good argument as to why. See [1]. I believe now is the best time, politically, to disable the non-secure fallback in all browsers, and there are practical security concerns motivating the decision to do so. Mozilla should take the lead in doing so, so that other browsers feel comfortable in joining them. However, that said, I have not done the research to verify how plausible that is, as far as compatibility is concerned, and I haven't talked to Firefox management about whether they're willing to accept the compatibility risk. David Keeler is the PSM module owner now. So, it will be up to other people, at Mozilla, to decide this.

[1] http://www.ietf.org/mail-archive/web/tls/current/msg14072.html
As a former developer of NSS from 2001-2009, I wholeheartedly agree with Brian.
I have always thought that this PSM insecure fallback was crazy.
IMO, it's high time for it to go.
There is mitigation for this: bug 1083058 allows you to control this behaviour now.
(In reply to Martin Thomson [:mt] from comment #9)
> There is mitigation for this: bug 1083058 allows you to control this
> behaviour now.

That's not a real mitigation, because we can't expect end-users to know about that preference. What matters is the default behavior.
According to [1], TLS 1.2-intolerant servers were already under 1% a year ago.
Let's experiment with a fallback-free world.

[1] http://www.ietf.org/mail-archive/web/tls/current/msg10657.html
Attachment #8511585 - Flags: review?(dkeeler)
Attachment #8511585 - Attachment description: bug1084025_disable_insecure_fallback → Disable insecure version fallback (Nightly only)
What information do you plan to collect from this experiment? The screams of frustrated users? We haven't had many as a result of disabling SSL 3.0, even in Aurora, so I'm not sure that we'll be able to definitively say that the experiment succeeded. Failure is easier to detect, but I don't want this experiment to fail.

Here's an experiment that might work, and the only cost is latency (which can be tuned differently).

Leave fallback in place for now.  But - with some probability - ignore any previously learned intolerance information.  Then, record telemetry if a version previously determined to be intolerant turns out to be tolerant, and the converse if the intolerance is re-confirmed.  We can use this to learn the false fallback rate.  From that, we can determine which reasons for fallback can be removed.

With SSL 3.0 out of the picture, this is definitely worth pursuing in one form or other, but I think that the above experiment might be a better first step.
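A minimal sketch of the proposed experiment, with hypothetical names (this is not existing Gecko code):

#include <cstdint>
#include <random>

struct IntoleranceEntry {
  uint16_t intolerant; // lowest version the server seemed intolerant of;
                       // 0 means no intolerance has been recorded
};

// With probability |probeRate|, ignore what we previously learned and let
// the caller retry at the full version range. The caller then records
// telemetry: "re-confirmed" if the handshake fails again at the probed
// version, "false fallback" if it now succeeds.
bool ShouldProbeIntolerance(const IntoleranceEntry& entry, double probeRate,
                            std::mt19937& rng) {
  if (entry.intolerant == 0) {
    return false; // nothing was learned, so there is nothing to re-check
  }
  std::bernoulli_distribution probe(probeRate);
  return probe(rng);
}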
I extrapolated from Mozilla's telemetry data that disabling SSL 3.0 will cause about 7 times as many failures as disabling non-secure fallback from TLS 1.2 to lower versions. I think it is likely that the failure can be fixed in a different way than re-enabling the TLS intolerance fallback logic. Also, many of the things that need to be done to make disabling the TLS intolerance fallback logic tolerable should also be done prior to disabling SSL 3.0, to minimize the compatibility impact of disabling SSL 3.0.

I think it is a good idea to land Masatoshi's patch from bug 1088915 before landing a patch for this bug, because Masatoshi's patch from bug 1088915 is likely to be a foundation for some compatibility workarounds that don't involve version fallback.
Comment on attachment 8511585 [details] [diff] [review]
Disable insecure version fallback (Nightly only)

Review of attachment 8511585 [details] [diff] [review]:
-----------------------------------------------------------------

Please make sure you get David's review before checking this in.

::: netwerk/base/public/security-prefs.js
@@ +3,5 @@
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  pref("security.tls.version.min", 1);
>  pref("security.tls.version.max", 3);
> +#ifdef NIGHTLY_BUILD

I don't think it is a good idea to have these conditions in these patches. Instead, you could just do this unconditionally, and then when Nightly gets uplifted to Aurora, the change can be backed out of Aurora, if it is causing problems. That is the normal way of doing things and it works well.
Attachment #8511585 - Flags: review+
OK, that would be simple enough to flip back in case we have to.
Attachment #8511941 - Flags: review?(dkeeler)
Attachment #8511585 - Flags: review?(dkeeler)
Attachment #8511585 - Attachment is obsolete: true
Summary: TLS is downgraded by Browser → Disable insecure TLS version fallback
Comment on attachment 8511941 [details] [diff] [review]
Disable insecure version fallback, v2

Review of attachment 8511941 [details] [diff] [review]:
-----------------------------------------------------------------

I'm all for trying this, but we need some way of measuring the outcome (e.g. a telemetry count of failures due to not falling back). r=me with that.
Attachment #8511941 - Flags: review?(dkeeler) → review+
(In reply to David Keeler (:keeler) [use needinfo?] from comment #16)
> I'm all for trying this, but we need some way of measuring the outcome (e.g.
> a telemetry count of failures due to not falling back). r=me with that.

SSL_TLS12_INTOLERANCE_REASON_PRE will serve the purpose.
> I'm all for trying this, but we need some way of measuring the outcome (e.g.
> a telemetry count of failures due to not falling back). r=me with that.

Does this telemetry contain information about how many failures are caused by attackers?
(In reply to Stefan Franke from comment #18)
> > I'm all for trying this, but we need some way of measuring the outcome (e.g.
> > a telemetry count of failures due to not falling back). r=me with that.
> 
> Does this telemetry contain information about how many failures are caused
> by attackers?

That's impossible by definition. If it were possible, we could stop the fallback only when an attacker shuts down the connection.
I had to change the logic to reduce false positives, so requesting an additional review.
Attachment #8513857 - Flags: review?(dkeeler)
Comment on attachment 8513857 [details] [diff] [review]
Add telemetry to measure failures due to not falling back

Review of attachment 8513857 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for doing this. In general this is good, but it will need rebasing (see comments). Also, I made a suggestion for a slightly different way of collecting the data that I think will give us more insight. r- for now.

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +879,5 @@
>  
>    MutexAutoLock lock(mutex);
>  
> +  bool fallbackLimitReached =
> +    intolerant <= minVersion || intolerant <= mVersionFallbackLimit;

I was thinking we would want to distinguish between hitting the limit from mVersionFallbackLimit vs. hitting minVersion (and maybe even depending on what minVersion is?)

How about we have a histogram with 8 buckets. The first 4 will be for hitting minVersion, for minVersion=0, 1, 2, 3. The last 4 will be the same, but for mVersionFallbackLimit. Does that sound like useful data to collect?

@@ +892,4 @@
>        // We already know that the server is intolerant at a lower version.
>        return true;
> +    } else if (fallbackLimitReached) {
> +      // We can't fall back any further. Assume that intolerance isn't the issue.

Looks like this will have to be rebased on top of the recent changes from bug 1088950.

@@ +906,5 @@
> +    Telemetry::Accumulate(Telemetry::SSL_FALLBACK_LIMIT_REACHED, true);
> +    return false;
> +  }
> +
> +

nit: unnecessary blank line

::: toolkit/components/telemetry/Histograms.json
@@ +6382,5 @@
>      "n_values": 64,
>      "description": "TLS/SSL version intolerance was falsely detected, server rejected handshake"
>    },
> +  "SSL_FALLBACK_LIMIT_REACHED": {
> +    "expires_in_version": "never",

We should probably have an expiration on this, since this is more or less a temporary experiment, right? Version 40 or maybe 42 is probably good.
Attachment #8513857 - Flags: review?(dkeeler) → review-
Resolved review comments.
Attachment #8513857 - Attachment is obsolete: true
Attachment #8515427 - Flags: review?(dkeeler)
Moved the telemetry call to the caller.
Attachment #8515897 - Flags: review?(dkeeler)
Attachment #8515427 - Attachment is obsolete: true
Attachment #8515427 - Flags: review?(dkeeler)
Comment on attachment 8515897 [details] [diff] [review]
Add telemetry to measure failures due to not falling back

Review of attachment 8515897 [details] [diff] [review]:
-----------------------------------------------------------------

I think we can gather this telemetry with fewer changes to the code. Also, be careful about how you convert version values to histogram buckets. For example, TLS 1.0 is represented by the value 0x0301, not 0x01.

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +879,5 @@
>    MutexAutoLock lock(mutex);
> +  forgetIntolerance(key, lock);
> +}
> +
> +void nsSSLIOLayerHelpers::forgetIntolerance(const nsCString& key,

I'm not sure refactoring this function is useful.

@@ -903,5 @@
>  {
> -  if (intolerant <= minVersion || intolerant <= mVersionFallbackLimit) {
> -    // We can't fall back any further. Assume that intolerance isn't the issue.
> -    forgetIntolerance(hostName, port);
> -    return false;

These changes to rememberIntolerantAtVersion change the control flow in a way that I'm not convinced is equivalent to the original code. Why can't we add the bucket determination/telemetry accumulation here?

@@ +940,5 @@
> +    if (intolerant <= minVersion) {
> +      fallbackLimitBucket += minVersion;
> +    }
> +    if (intolerant <= mVersionFallbackLimit) {
> +      fallbackLimitBucket += (mVersionFallbackLimit << 2);

We should document how this is encoded (i.e. what each bucket in the histogram corresponds to).
And, actually, this won't work as-is, because mVersionFallbackLimit is something like 0x0301, which puts the bucket out of range for just 16 values.
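
A sketch of an encoding that stays within range (hypothetical helper; NSS represents TLS 1.0-1.2 as 0x0301-0x0303, so the constants must first be mapped to small indices):

#include <cstdint>

const uint16_t SSL_LIBRARY_VERSION_TLS_1_0 = 0x0301;

// Map 0x0301..0x0303 to 1..3; 0 is reserved for "this limit not involved".
uint32_t VersionBucket(uint16_t version) {
  return version - SSL_LIBRARY_VERSION_TLS_1_0 + 1;
}

// Low two bits: the minimum version that stopped the fallback (if it did);
// next two bits: the fallback-limit version that stopped it (if it did).
// This yields at most 16 distinct values, fitting an enumerated histogram.
uint32_t FallbackLimitBucket(uint16_t intolerant, uint16_t minVersion,
                             uint16_t versionFallbackLimit) {
  uint32_t bucket = 0;
  if (intolerant <= minVersion) {
    bucket += VersionBucket(minVersion);
  }
  if (intolerant <= versionFallbackLimit) {
    bucket += VersionBucket(versionFallbackLimit) << 2;
  }
  return bucket;
}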

@@ +1296,5 @@
>    if (!socketInfo->SharedState().IOLayerHelpers()
>                   .rememberIntolerantAtVersion(socketInfo->GetHostName(),
>                                                socketInfo->GetPort(),
> +                                              range.min, range.max, err,
> +                                              fallbackLimitBucket)) {

I think it would be best if rememberIntolerantAtVersion accumulated the appropriate telemetry on its own, rather than having an extra output parameter.
Attachment #8515897 - Flags: review?(dkeeler) → review-
(In reply to David Keeler (:keeler) [use needinfo?] from comment #24)
> ::: security/manager/ssl/src/nsNSSIOLayer.cpp
> @@ +879,5 @@
> >    MutexAutoLock lock(mutex);
> > +  forgetIntolerance(key, lock);
> > +}
> > +
> > +void nsSSLIOLayerHelpers::forgetIntolerance(const nsCString& key,
> 
> I'm not sure refactoring this function is useful.

MutexAutoLock is an RAII helper for mozilla::Mutex [1], which is a bare wrapper around PRLock [2], which is not recursive, at least on Linux [3][4]. I moved a forgetIntolerance call into the MutexAutoLock scope, so I had to add a variant of the function that does not take the lock.
[1] https://developer.mozilla.org/ja/docs/Namespace/Mozilla/MutexAutoLock
[2] https://developer.mozilla.org/en-US/docs/Namespace/Mozilla/Mutex
[3] https://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/pthreads/ptsynch.c?rev=3a49b88c5bb0#52
[4] https://stackoverflow.com/questions/19863734/what-is-pthread-mutex-adaptive-np#answer-19863902
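
For illustration, the resulting pattern looks like this (a self-contained sketch using std::mutex; in the actual patch the MutexAutoLock itself is passed as proof that the lock is held):

#include <mutex>
#include <string>

class IOLayerHelpersSketch {
public:
  void forgetIntolerance(const std::string& key) {
    std::lock_guard<std::mutex> lock(mMutex);
    forgetIntolerance(key, lock); // delegate without re-acquiring
  }

private:
  // Callable only with evidence that mMutex is already held, so callers
  // that hold the lock can reuse this without deadlocking on a
  // non-recursive mutex.
  void forgetIntolerance(const std::string& key,
                         const std::lock_guard<std::mutex>& /*proofOfLock*/) {
    // ... erase the intolerance entry for |key| ...
    (void)key;
  }

  std::mutex mMutex;
};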

> @@ -903,5 @@
> >  {
> > -  if (intolerant <= minVersion || intolerant <= mVersionFallbackLimit) {
> > -    // We can't fall back any further. Assume that intolerance isn't the issue.
> > -    forgetIntolerance(hostName, port);
> > -    return false;
> 
> These changes to rememberIntolerantAtVersion change the control flow in a
> way that I'm not convinced is equivalent to the original code. Why can't we
> add the bucket determination/telemetry accumulation here?

I'm attempting to avoid accumulating the telemetry when we already know whether the server is tolerant at the version. I called these cases "false positives" in comment #20. If we don't have to exclude this case, the second patch is unnecessary in the first place, because SSL_TLS12_INTOLERANCE_REASON_PRE would be sufficient, as I said in comment #17. Do you think this patch is not needed?
Flags: needinfo?(dkeeler)
What scenarios cause these false positives?
Assignee: nobody → VYV03354
Flags: needinfo?(dkeeler) → needinfo?(VYV03354)
1. Access example.com with TLS version 1.2. Firefox will remember the server is tolerant at TLS 1.2.
2. Press the reload button. The access fails due to a network glitch.
3. rememberIntolerantAtVersion will find that the version has reached the fallback limit. It will accumulate the new telemetry and return false, even though we already know the server is tolerant at version 1.2. This is what I called a "false positive".

Moreover, retryDueToTLSIntolerance will accumulate SSL_TLS12_INTOLERANCE_REASON_PRE and will not accumulate SSL_TLS12_INTOLERANCE_REASON_POST whenever rememberIntolerantAtVersion returns true. Therefore the new telemetry is redundant.
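
A simplified sketch of that accounting (the probe names are the real ones mentioned above; the surrounding types are stand-ins):

#include <cstdint>
#include <map>

enum Probe {
  SSL_TLS12_INTOLERANCE_REASON_PRE,
  SSL_TLS12_INTOLERANCE_REASON_POST
};

std::map<Probe, std::map<uint32_t, uint32_t>> gHistograms;

void Accumulate(Probe probe, uint32_t reason) {
  ++gHistograms[probe][reason];
}

// Stub: in the real code this returns false at the fallback limit.
bool rememberIntolerantAtVersion() { return true; }

bool retryDueToTLSIntolerance(uint32_t reason) {
  // PRE counts every would-be fallback...
  Accumulate(SSL_TLS12_INTOLERANCE_REASON_PRE, reason);
  if (!rememberIntolerantAtVersion()) {
    return false; // limit reached: no retry, and no POST accumulation
  }
  // ...while POST counts only fallbacks actually performed, so the
  // difference measures handshakes stopped at the fallback limit.
  Accumulate(SSL_TLS12_INTOLERANCE_REASON_POST, reason);
  return true;
}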
Flags: needinfo?(VYV03354)
> whenever rememberIntolerantAtVersion returns true.

Correction: whenever rememberIntolerantAtVersion returns false.
I'm still not convinced of the usefulness of this telemetry, but I can live with it, and I would like to land this early in the cycle.

> We should document how this is encoded (i.e. what each bucket in the
> histogram corresponds to).

I think the new code is self-documenting enough.
Attachment #8515897 - Attachment is obsolete: true
Attachment #8530477 - Flags: review?(dkeeler)
OK, I found a much cleaner way to reduce false positives.
Attachment #8530477 - Attachment is obsolete: true
Attachment #8530477 - Flags: review?(dkeeler)
Attachment #8530480 - Flags: review?(dkeeler)
My feeling is that the best thing would be to set a clear deadline for disabling TLS 1.2 intolerance fallback (I suggest six months to a year) and to announce it widely. TLS 1.3 intolerance will probably take longer to solve, though I recently tried to notify a few sites.
(In reply to Yuhong Bao from comment #32)
> My feeling is that the best thing would be to set a clear deadline to
> disable TLS 1.2 intolerance fallback (I suggest 6 months to a year), and to
> announce it widely.

That cannot be done until more is learned about TLS 1.2/1.1 intolerance. EMK's patch in this bug will help us learn more about it. With EMK's patch, people at Mozilla can use Matt Wobensmith's compatibility testing system to identify some of the most popular servers that would be affected by this change. EMK's patch can always be augmented with improved compatibility logic and/or backed out if it is too disruptive. 

There are many compatibility issues that get treated as "version intolerance" that aren't really version intolerance, so once there is a (partial) list of affected sites, people can triage that list to find out if it is really version intolerance, or intolerance to a specific cipher suite, or intolerance to a specific extension, or intolerance to the specific ordering of extensions in NSS's ClientHello, or something else. Then safer workarounds can be implemented for those other issues (e.g. reordering cipher suites, and/or reordering extensions, and/or asking CAs to give their customers better certificates).
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #33)
> (In reply to Yuhong Bao from comment #32)
> > My feeling is that the best thing would be to set a clear deadline to
> > disable TLS 1.2 intolerance fallback (I suggest 6 months to a year), and to
> > announce it widely.
> 
> That cannot be done until more is learned about TLS 1.2/1.1 intolerance.
> EMK's patch in this bug will help us learn more about it. With EMK's patch,
> people at Mozilla can use Matt Wobensmith's compatibility testing system to
> identify some of the most popular servers that would be affected by this
> change. EMK's patch can always be augmented with improved compatibility
> logic and/or backed out if it is too disruptive. 

Actually, I agree, and the deadline should depend on how many servers are affected.
(In reply to Yuhong Bao from comment #34)
> (In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment
> #33)
> > (In reply to Yuhong Bao from comment #32)
> > > My feeling is that the best thing would be to set a clear deadline to
> > > disable TLS 1.2 intolerance fallback (I suggest 6 months to a year), and to
> > > announce it widely.
> > 
> > That cannot be done until more is learned about TLS 1.2/1.1 intolerance.
> > EMK's patch in this bug will help us learn more about it. With EMK's patch,
> > people at Mozilla can use Matt Wobensmith's compatibility testing system to
> > identify some of the most popular servers that would be affected by this
> > change. EMK's patch can always be augmented with improved compatibility
> > logic and/or backed out if it is too disruptive. 
> 
> I agree actually, and that the deadline set should depend on how many
> servers are affected.

Summary:
Since it is impossible to distinguish between a bad server and an attacked server, attacks will remain possible indefinitely.

The intention of this ticket was to prevent attacks.
Comment on attachment 8530480 [details] [diff] [review]
Add telemetry to measure failures due to not falling back, v3

Review of attachment 8530480 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM with nits addressed.

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +874,1 @@
>                                              int16_t port)

nit: indentation (also, put the return type on its own line)

@@ +917,5 @@
> +    uint32_t fallbackLimitBucket = 0;
> +    // added if the version has reached the min version.
> +    if (intolerant <= minVersion) {
> +      switch (minVersion) {
> +      case SSL_LIBRARY_VERSION_TLS_1_0:

nit: indent the body of the switch statement one more level

@@ +931,5 @@
> +    }
> +    // added if the version has reached the fallback limit.
> +    if (intolerant <= mVersionFallbackLimit) {
> +      switch (mVersionFallbackLimit) {
> +      case SSL_LIBRARY_VERSION_TLS_1_0:

nit: indentation

::: security/manager/ssl/src/nsNSSIOLayer.h
@@ +224,5 @@
>                                     uint16_t intolerant, uint16_t minVersion,
>                                     PRErrorCode intoleranceReason);
>    bool rememberStrongCiphersFailed(const nsACString& hostName, int16_t port,
>                                     PRErrorCode intoleranceReason);
> +  // returns the known tolerant version

nit: add "or 0 if there is no known tolerant version" or equivalent

::: toolkit/components/telemetry/Histograms.json
@@ +6456,5 @@
> +  "SSL_FALLBACK_LIMIT_REACHED": {
> +    "expires_in_version": "default",
> +    "kind": "enumerated",
> +    "n_values": 16,
> +    "description": "TLS/SSL version fallback reached the limit, stopped the fallback"

Which limit? From what I can tell, this telemetry will be collected both when the version fallback limit is reached and when the minimum enabled version is reached. We should document that here.
Attachment #8530480 - Flags: review?(dkeeler) → review+
And a followup to fix the wrong description:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fa0ba58df12f
https://hg.mozilla.org/mozilla-central/rev/35f5ec149ad5
https://hg.mozilla.org/mozilla-central/rev/38616c7e2da9
https://hg.mozilla.org/mozilla-central/rev/fa0ba58df12f
Status: ASSIGNED → RESOLVED
Closed: 10 years ago → 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Depends on: 1109613
Depends on: 1095507
I don't think bug 1095507 or any bugs for SSL3-only sites should be linked to *this* bug; instead, they should get linked (only) to the POODLEBITES bug. It's better to keep the dependency list of this bug reserved for TLS 1.2 and TLS 1.1 intolerance bugs. In particular, reverting the patch for this bug won't fix any of those SSL3-only sites.
No longer depends on: 1095507
www.kredodirect.com.ua is not only SSLv3-only but also TLS 1.1+ intolerant. If the site administrator only enables TLS 1.0 without fixing the intolerance, Nightly still cannot access the site.
Should I file a separate bug even if the site URL happens to be the same?
Flags: needinfo?(brian)
(In reply to Masatoshi Kimura [:emk] from comment #41)
> www.kredodirect.com.ua is not only SSLv3-only but also TLS 1.1+ intolerant.
> If the site administrator only enables TLS 1.0 without fixing the
> intolerance, Nightly still cannot access the site.
> Should I file a separate bug even if the site URL happens to be the same?

I would only file bugs for the case where the site already supports TLS 1.0+ but no longer works due to this patch, i.e. only for sites where backing out this patch would get the site working again.

It is possible that the site admin might fix the bug in a way that makes this bug relevant, but it is unlikely that will happen.
Flags: needinfo?(brian)
> i.e. only for sites where backing out this patch would get the site working again.

The site will work again if *both* bug 1076983 and this bug are backed out. So I set the dependency on both.
(In reply to Masatoshi Kimura [:emk] from comment #43)
> > i.e. only for sites where backing out this patch would get the site working again.
> 
> The site will work again if *both* bug 1076983 and this bug are backed out.
> So I set the dependency on both.

Yes, but at this point I think it is safe to assume that bug 1076983 won't get backed out, so it doesn't matter.
When we change a preference in security-prefs.js or firefox.js or elsewhere, we also need to make a change like the one in this patch.
Attachment #8535930 - Flags: review?(dkeeler)
Attachment #8535930 - Flags: review?(VYV03354)
Attachment #8535930 - Attachment description: disable-non-secure-fallback-p3.patch → Part 3: Clean up some bits
David and Masatoshi-san, I think we should uplift these patches to mozilla-aurora and mozilla-beta. I just saw the latest SSL Pulse report at https://www.trustworthyinternet.org/ssl-pulse/ that indicates that 10% of the (approx.) top 200,000 sites have the POODLE vulnerability. That's quite a large percentage.

Obviously, this change has some higher-than-normal compatibility risks. But, the setting is controlled by a pref, so we can revert it if necessary. At the very least, it would be good to get the code (including the new telemetry changes) into Firefox 35 so that there is the *option* of controlling this with minimal risk, should things really get out of hand. WDYT?
Flags: needinfo?(dkeeler)
Flags: needinfo?(VYV03354)
Attachment #8535930 - Flags: review?(dkeeler) → review+
I wouldn't be opposed to it, but I think we need more bake time for the pref change. We could start by uplifting the telemetry changes, see how the numbers look, and go from there.
Flags: needinfo?(dkeeler)
Flags: needinfo?(VYV03354)
Attachment #8535930 - Flags: review?(VYV03354) → review+
Oops, needinfo request was accidentally cleared. I don't disagree with uplifting the change.
(In reply to David Keeler (:keeler) [use needinfo?] from comment #47)
> I wouldn't be opposed to it, but I think we need more bake time for the pref
> change. We could start by uplifting the telemetry changes, see how the
> numbers look, and go from there.

Do you think that a few days on Nightly would suffice?  Or maybe we can get as much time as possible before the next release?  Is this worth a chemspill release?  It seems like it might be, almost.

I think that we should consider updating the "SSL version control" add-on to bump this pref, or shipping another similar add-on, so that we can provide some sort of coverage in the meantime.  Richard, see comment 46.
Flags: needinfo?(rlb)
Comment on attachment 8530480 [details] [diff] [review]
Add telemetry to measure failures due to not falling back, v3

Approval Request Comment
[Feature/regressing bug #]: bug 1076983, the POODLE attack
[User impact if declined]: Increased exposure to the POODLE attack.
[Describe test coverage new/current, TBPL]: This is just telemetry. This has been on mozilla-central since 2014-12-09.

[Risks and why]: This is just telemetry. Seems like low risk. (:emk, :keeler: correct me if I'm wrong.) AFAICT, no issues related to this *telemetry* have been reported.

[String/UUID change made/needed]: none

It is important that Mozilla be prepared to land all the patches in this bug in case the POODLE stuff gets real bad. Some would argue it is already really bad. In the discussion above, we agreed to uplift the lowest-risk part (the telemetry) ahead of a possible decision to uplift the riskier parts (the actual disabling of the non-secure fallback logic). Landing this patch in -aurora and -beta ASAP will reduce risk later.
Attachment #8530480 - Flags: approval-mozilla-beta?
Attachment #8530480 - Flags: approval-mozilla-aurora?
Comment on attachment 8511941 [details] [diff] [review]
Disable insecure version fallback, v2

Approval Request Comment
[Feature/regressing bug #]: See above uplift request for the telemetry.

[User impact if declined]: Less testing of the impact of disabling non-secure version fallback. Without this testing, it will be riskier to decide whether to uplift this to -beta later. Note that :mt has already hinted at uplifting it to -release, even. If so, we need as much testing as possible.

[Describe test coverage new/current, TBPL]: The test coverage from pre-existing tests is good. This landed on -central on 2014-12-09.

[Risks and why]: Note that this uplift request is for -aurora only, not -beta. My hope is that testing on -central and -aurora over the next few days will go well, and then we'll uplift to -beta. The risk for -aurora is low, because it is so easy to undo (just revert the patch, which just flips a pref). Risk for -release is much higher, because there are potentially serious and hard-to-predict compatibility issues that can arise from this. (This bug is disabling a feature that trades security for compatibility, because it turns out the security tradeoff is much larger than we expected.) Firefox would be the first browser to turn off this compatibility feature. Although Mozilla can and should do additional compatibility testing before uplifting to -beta and -release, lots of end-user testing on -aurora and soon -beta will be needed because of compatibility issues that arise due to anti-virus software and proxies, which cannot easily be tested.

[String/UUID change made/needed]: None
Attachment #8511941 - Flags: approval-mozilla-aurora?
Comment on attachment 8535930 [details] [diff] [review]
Part 3: Clean up some bits

Approval Request Comment: See the most recent, aurora-only, comment.
Attachment #8535930 - Flags: approval-mozilla-aurora?
As you can see from the above uplift requests, my suggestion for handling this is to uplift to -aurora immediately. Concurrently with that, somebody should run the pkix-compatibility-test thing to find known-incompatible servers. The results of the pkix-compatibility-test should be triaged so that compatibility issues that Gecko can work around can be fixed, and so that Mozilla can contact server vendors about compatibility fixes. Then, a decision should be made about uplifting to -beta or, like :mt hinted at, to -release. Because the compatibility issues are due to things that pkix-compatibility-test cannot measure, as much end-user testing as possible is needed. It would be good for Mozilla to make a special public call-to-action to ask volunteers to test this.
The nightly/aurora patch did not apply cleanly to beta.
Before we get carried away with uplifts to release channels, we have to ask if this is going to fix a real problem (this in particular: http://arstechnica.com/security/2014/12/meaner-poodle-bug-that-bypasses-tls-crypto-bites-10-percent-of-websites/).  That flaw hits a large number of sites, but doesn't get fixed by having a newer TLS version.  For instance, bankofamerica.com negotiates a CBC mode with TLS 1.2.

It might be addressed if an AEAD cipher suite is selected, but I haven't seen any evidence that the affected sites are selecting GCM, even though a number of them support TLS 1.2.  Someone asked if we were willing to mark CBC as "weak" in the same way that we mark RC4, but I suspect that the net effect will be that we slow down 90% of the web for no measurable improvement in security.

(Sorry if I got too enthusiastic about this.  I thought that it might help.  After looking at the numbers, I'm fairly sure it will have a negligible impact.)
Flags: needinfo?(rlb)
(In reply to Martin Thomson [:mt] from comment #57)
> That flaw hits a large number of
> sites, but doesn't get fixed by having a newer TLS version.  For instance,
> bankofamerica.com negotiates a CBC mode with TLS 1.2.

Good point.

> It might be addressed if an AEAD cipher suite is selected, but I haven't
> seen any evidence that the affected sites are selecting GCM, even though a
> number of them support TLS 1.2.

>  Someone asked if we were willing to mark
> CBC as "weak" in the same way that we mark RC4, but I suspect that the net
> effect will be that we slow down 90% of the web to no measurable improvement
> in security.

I agree.

> (Sorry if I got too enthusiastic about this.  I thought that it might help.
> After looking at the numbers, I'm fairly sure it will have a negligible
> impact.)

It would be good to see the actual numbers and methodology used to collect them.

Also, keep in mind that Ivan's current number of 0.7% of websites (7/1,000) seems to greatly overestimate the compatibility impact of this change, because when he collected those numbers he put 0x0303 in the ClientHello record layer, but browsers (including Firefox) put 0x0301. I think he's planning on collecting new numbers, but in the meantime the Firefox telemetry numbers are a better indicator of compatibility impact. As I said above, the compatibility impact is about 1/7th the compatibility impact of disabling SSL 3.0, based on the Firefox telemetry.
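
For reference, the two version fields in play (a sketch of the RFC 5246 layout):

//   record header:  16 03 01 <length>   -- type 0x16 (handshake),
//                                          record-layer version 0x0301
//   handshake body: client_version = 03 03  -- TLS 1.2
//
// Firefox puts 0x0301 in the record-layer version and its real maximum in
// client_version; a scan that sets both fields to 0x0303 also counts
// servers that only choke on the record-layer version, overstating
// intolerance.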
Also, I think that uplifting the additional telemetry to -aurora and -beta makes sense, regardless of the final decision on whether to uplift the actual change. And, I think it makes sense to uplift the preference change to -aurora for now, to get (slightly) more testing before a decision is made.
Comment on attachment 8530480 [details] [diff] [review]
Add telemetry to measure failures due to not falling back, v3

Approving the telemetry portion for now.
Attachment #8530480 - Flags: approval-mozilla-beta?
Attachment #8530480 - Flags: approval-mozilla-beta+
Attachment #8530480 - Flags: approval-mozilla-aurora?
Attachment #8530480 - Flags: approval-mozilla-aurora+
Keywords: checkin-needed
Attachment #8535985 - Flags: checkin?
Attachment #8535986 - Flags: checkin?
Comment on attachment 8535985 [details] [diff] [review]
Telemetry patch for aurora

We have bug queries specifically for uplifts. Adding flags all over just clutters up other queries.
Attachment #8535985 - Flags: checkin?
Attachment #8535986 - Flags: checkin?
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #58)
> > (Sorry if I got too enthusiastic about this.  I thought that it might help.
> > After looking at the numbers, I'm fairly sure it will have a negligible
> > impact.)
> 
> It would be good to see the actual numbers and methodology used to collect
> them.

I'm using these numbers, which were derived from telemetry (see slide 12): https://tools.ietf.org/agenda/91/slides/slides-91-saag-3.pdf

The key information I'm looking for is the number of sites that have GCM, but don't otherwise select it.  This indicates that it's better than you'd think.  25% moved off RC4; 1.2% moved from RC4 to GCM.  That's probably the 1.2% that were concerned about BEAST and weren't able to order for <=1.0 differently.  I don't expect that we can do anything at all about the 1.2% who moved to 3DES, and I don't typically see someone picking a CBC mode over GCM if GCM is present.  I don't see any significant moves.

Let's look at what the telemetry tells us about fallback.  I'm still concerned that we aren't buying ourselves much value here.  BEAST mitigation is perhaps the best we can hope for.  Maybe that's enough.
 
> Also, keep in mind that Ivan's current number of 0.7% of websites (7/1,000)
> seems to greatly overestimate the compatibility impact of this change,
> because when he collected those numbers he put 0x0303 in the ClientHello
> record layer, but browsers (including Firefox) put 0x0301. I think he's
> planning on collecting new numbers, but in the meantime the Firefox
> telemetry numbers are a better indicator of compatibility impact. As I said
> above, the compatibility impact is about 1/7th the compatibility impact of
> disabling SSL 3.0, based on the Firefox telemetry.

Based on nightly, we hit our fallback limit pretty rarely.  I have 122k hitting the fallback (mostly due to TCP RST) against 2008k that encounter a fallback condition.  But we're only falling back now for RC4, so it's hard to tell how many of those might have succeeded if we let them progress.

Most fallbacks fall all the way through, so it's hard to get a good sense of what is going on.
Depends on: 1111354
Depends on: 1112110
Depends on: 1112399
Depends on: 1095507
(In reply to Masatoshi Kimura [:emk] from comment #41)
> www.kredodirect.com.ua is not only SSLv3-only but also TLS 1.1+ intolerant.
> If the site administrator only enables TLS 1.0 without fixing the
> intolerance, Nightly still cannot access the site.

Sadly I got it right :(
Comment on attachment 8511941 [details] [diff] [review]
Disable insecure version fallback, v2

Brian, can a user override the error page?
Flags: needinfo?(brian)
Attachment #8511941 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8535930 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
The user can change the "security.tls.version.fallback-limit" pref value from about:config to bring back the old behavior.
(In reply to Sylvestre Ledru [:sylvestre] from comment #66)
> Comment on attachment 8511941 [details] [diff] [review]
> Disable insecure version fallback, v2
> 
> Brian, Can an user override the error page?

No.

(In reply to Masatoshi Kimura [:emk] from comment #67)
> The user can change the "security.tls.version.fallback-limit" pref value
> from about:config to bring back the old behavior.

Yes. Another (better) option would be to create a pref that is a comma-separated list of domain names for which non-secure fallback to TLS 1.0 is still done. We could even pre-populate this pref with a list of domains we know are TLS 1.2 intolerant. This would help ensure that the feature stays enabled for release.
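
A sketch of how such a pref could be consumed (the pref name and helpers here are hypothetical, not an existing Gecko API):

#include <set>
#include <sstream>
#include <string>

// e.g. pref("security.tls.fallback-whitelist",
//           "example.com,legacy.example.net");
std::set<std::string> ParseFallbackWhitelist(const std::string& prefValue) {
  std::set<std::string> hosts;
  std::istringstream stream(prefValue);
  std::string host;
  while (std::getline(stream, host, ',')) {
    if (!host.empty()) {
      hosts.insert(host);
    }
  }
  return hosts;
}

// Non-secure fallback to TLS 1.0 stays allowed only for listed hosts.
bool MayFallBackInsecurely(const std::set<std::string>& whitelist,
                           const std::string& hostName) {
  return whitelist.count(hostName) > 0;
}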
Flags: needinfo?(brian)
Depends on: 1114816
Filed bug 1114816.
We're too late in 35 to take this, wontfixing.
Depends on: 1116891
No longer depends on: 1116891
I think that I would like to see this in aurora for another release before taking this to release.  We have seen a few sites that break already, and we know that release and beta users encounter more legacy sites than on nightly and aurora.  :keeler, :emk, :briansmith, am I being too cautious here? Or could this actually risk losing us users?

:sylvestre, Providing that others agree with me, how do I ensure that this doesn't accidentally ride the trains?
Flags: needinfo?(sledru)
Flags: needinfo?(dkeeler)
Matt is working on an analysis of the impact of this change. With clear numbers, it should be easier to make a call on this for 36.
Flags: needinfo?(sledru) → needinfo?(mwobensmith)
36 is about to go to beta, right? I think it would be worthwhile to see the impact of this on the beta population (given that aurora and nightly are much smaller and different populations). If there's too much breakage, we can back it out of 36 once it's in beta.
Flags: needinfo?(dkeeler)
Depends on: 1120942
I added two more sites to bug 1116891 that appear to be TLS intolerant in some form.

In short, only about a dozen sites out of the 200k+ that were run appear to be affected by this change, if I'm not mistaken.

I did notice a small number of sites broken by our recent removal of 1024-bit signed root certs (bug 881553), but that's to be expected.

In summary, I'm not seeing a huge impact.
Flags: needinfo?(mwobensmith)
If you are willing to stand by that, I think that I'm willing to proceed with this.  If we get a catastrophic number of complaints in Beta, that might be a cue for backout, but I think that we're fine.  I should ping the other browser vendors about this.
(In reply to Martin Thomson [:mt] from comment #75)
> I should ping the other browser vendors about this.

Hi. I work on WebKitGTK+, which is used by many desktop applications and smaller browsers. I want to drop insecure TLS version fallback, but hesitate to do so because we already have enough problems with HTTPS sites that Firefox handles and we don't. If you go through with this change, then we will drop insecure fallback from libsoup, affecting both WebKitGTK+ and EFL WebKit.

This change will also be good for the web, since TLS intolerance is a chicken-and-egg problem: servers will likely be fixed much faster if a major browser like Firefox stops connecting, and other major browsers will hesitate to make the change until another one does first.
Martin, what would you consider to be an indicator of whether we ship this or not? Percent of sites affected? Specific sites affected? And - to be sure - the "connection reset" issues (bug 1116891) that only happen in Fx36 are solely due to this change, right?
(In reply to Michael Catanzaro from comment #76)
> If you go through with this change, then we
> will drop insecure fallback from libsoup, 

Thanks for the feedback. I have received positive signals from Microsoft and a "good luck with that" from Google.  Note that we will retain the preference that will re-enable this (that helps us back out the change too); and we will still likely retain the fallback code when we start deploying TLS 1.3.

(In reply to Matt Wobensmith from comment #77)
> Martin, what would you consider to be an indicator of whether we ship this
> or not? Percent of sites affected? Specific sites affected? 

Probably a combination of all of the above.  If we see a noticeable increase in connection failures in telemetry, that's a bad sign.  When we dropped SSL3, the change was hard to detect, though we might have lost somewhere in the order of 0.3%.  If this has a similar effect, then I think we're good.

> And - to be sure
> - the "connection reset" issues (bug 1116891) that only happen in Fx36 are
> solely due to this change, right?

Yes, those sites are intolerant to our handshake in various interesting ways.  Those should be addressed by making our fallback to RC4 more sensitive to different failures though (i.e., :emk's patch on that bug).

Interesting observation:  Removing fallback entirely exposes initial page loads to network glitches.  With fallback enabled, an initial page load has a higher probability of success because we get two or more attempts at connecting (assuming the server doesn't support the downgrade SCSV).  The glitch rate seems to be quite high (based on the number of spurious inappropriate_fallback alerts we're seeing).  Therefore, I suspect that it will be very hard to properly distinguish between mere network glitches and real intolerance with the telemetry we have.  That's why your testing is so valuable.
(In reply to Martin Thomson [:mt] from comment #78)

> Probably a combination of all of the above.  If we see a noticeable increase
> in connection failures in telemetry, that's a bad sign.  When we dropped
> SSL3, the change was hard to detect, though we might have lost somewhere in
> the order of 0.3%.  If this has a similar effect, then I think we're good.

I use two sets of data for testing SSL compatibility - the Pulse top sites list (220k) and the Google CT list (1.8 million). The former is more interesting as far as active, popular sites, while the latter contains a lot of outliers.

For Fx36, it appears that when using the Pulse list, we have a failure rate of less than .005% for this specific change. My findings were recorded in bugs 1116891, 1116892 and 1116893. To me, this seems very small. Affected sites are listed in those bugs.

I haven't run the entire Google CT list due to time constraints, but I did a pass yesterday with a smaller random selection of 100k sites. From that sample, I am seeing a failure rate of around .024% (24 out of 100k). I haven't dug into all errors individually, but the majority appear to be the same as bug 1116891. Another data point - many broken sites from that list appear abandoned or often unused (default IIS home page). I have the error site list but it is not conclusive since it is only a subset. If you need a full run with the Google CT list, it will take more time.

Let me know if you need more from me.
Yeah, those approvals got missed due to the other approval noise from this bug (doing things the way this bug did them makes uplift tracking a big PITA, fwiw). Anyway, you'll need to request beta approval on those patches at this point.
Flags: needinfo?(VYV03354)
Also, Part 3 depends on bug 1093724, so presumably you're going to want to request beta approval on that as well.
Indeed, next time it would be nice to have one bug for the experiment and another for the change itself.

Anyway, is it a big deal if it is not released with 36 but with 37?
:sylvestre, I don't think the delay harms us much, if at all.  This does have the potential to cause problems.
Martin, any idea then what caused the TLS intolerance regressions in Fx36, if it wasn't this change? Was it bug 861310?
Flags: needinfo?(martin.thomson)
Matt, I don't know what you tested with.  I think that the patch is on Aurora, and it only missed the uplift to Beta.  If you tested on Aurora, then this would still be the cause.  None of the regressions that I've seen are likely to be as a result of false start changes.
Flags: needinfo?(martin.thomson)
IMO if we land this on beta, we should also land some mitigations (e.g. bug 1112399 and bug 1114816).
Flags: needinfo?(VYV03354)
(In reply to Martin Thomson [:mt] from comment #86)
> Matt, I don't know what you tested with.  I think that the patch is on
> Aurora, and it only missed the uplift to Beta.  If you tested on Aurora,
> then this would still be the cause.  None of the regressions that I've seen
> are likely to be as a result of false start changes.

Matt is referring to bugs 1116891 through 1116893. And indeed, bug 1116891 and bug 1116892 happen on Firefox 36 (currently Beta, and Aurora when Matt tested).
I'd suggest that you do try to enable this in Firefox 36 beta, and make a decision whether to ship it in the release in the last beta cycle.

Reasoning:

* The non-secure fallback logic makes it easy for an active attacker to downgrade connections from the secure AEAD cipher suites to RC4 (and CBC-mode) cipher suites, because the AEAD cipher suites only work for TLS 1.2.

* There is a talk [1] scheduled for around March 24th, which is AFTER the Firefox 36 release and BEFORE the Firefox 37 release. Deferring this change until Firefox 37 means that it won't be helpful during the period of time between March 24 and the Firefox 37 release.

* I've heard rumors that another paper on attacking RC4 in TLS will be released soon--maybe even sooner than the Blackhat paper.

* :emk has already written a patch for adding a whitelist of known-TLS-intolerant sites to Firefox.

* If Matt re-runs his compatibility tests against the entire Google CT dataset, he will be able to report a fairly comprehensive list of sites that can be added to the default value of the whitelist.

* Uplifting the patch for bug 1112399 (and bug 1096197) will allow Beta channel users to report additional broken websites through the "tell Mozilla about this" feature on the SSL error page. By monitoring this feedback, Mozilla can expand the whitelist further as necessary.

* Using the Hotfix update, Mozilla can dynamically expand the set of sites in the known-TLS-intolerant whitelist without having to do a chemspill release.

* There is telemetry for measuring this feature, which can be monitored during the next three weeks in order to make a decision whether to ship this in the final release.

* The patches that would need to be uplifted are all low risk and all are easy to undo.

* Even if it is decided to be too risky to ship in Firefox 36, and the feature gets delayed until Firefox 37, Firefox 37 would still benefit from the additional 3 weeks of learning about how the feature works on Firefox 36 beta.

We had a very similar situation with TLS 1.2 around this time last year. Firefox ended up getting some notably bad press for being 2-3 weeks behind, from people who decided to make a huge deal about the fact that it didn't support TLS 1.2 (even though it practically didn't matter). I think it would be good to learn from that experience and be more preemptive here.

[1] https://www.blackhat.com/asia-15/briefings.html#bar-mitzva-attack-breaking-ssl-with-13-year-old-rc4-weakness
This seems like a change that we should list in our release notes. Can you help with the wording? Is "Disabled insecure TLS version fallback" clear enough? Is this specific to a version of TLS or is there any other way that we should further qualify this change?
relnote-firefox: --- → ?
Flags: needinfo?(VYV03354)
RC4 has nothing to do with TLS version fallback though.
(In reply to Yuhong Bao from comment #91)
> RC4 has nothing to do with TLS version fallback though.

Yes it does, because many websites will use RC4 when the browser uses a TLS 1.1 or TLS 1.0 ClientHello, but will use AES-GCM when the browser uses a TLS 1.2 ClientHello.
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #90)
> This seems like a change that we should list in our release notes. Can you
> help with the wording? Is "Disabled insecure TLS version fallback" clear
> enough? Is this specific to a version of TLS or is there any other way that
> we should further qualify this change?

Brian could give much better wording than me.
Flags: needinfo?(VYV03354)
I think that :lmandel's suggestion is fine.
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #89)
> I'd suggest that you do try to enable this in Firefox 36 beta, and make a
> decision whether to ship it in the release in the last beta cycle.
Thanks for the long and impressive rationale :)
I would like to see the results of Matt's tests before making a call on this.
I've run site compatibility testing for Fx36.0b2, changing the pref "security.tls.version.fallback-limit" from its default of 1 to 3. I ran the Pulse top sites list of ~211k sites and diffed the results.

With the fallback limit set to 3, we see 561 uniquely broken sites, or approximately 0.27% rounded up. I'm attaching the list of broken sites to this bug. Of the broken sites:

- About half (284) have "NetworkInterruptError" with error string "pr_end_of_file_error". I've examined half a dozen of these sites and all have TLS version intolerance, so I'm guessing that might be the primary reason.

- The rest - including errors labeled "undefined" - appear to be a grab-bag of issues that are caused when the server is forced to downgrade its TLS version, such as attempting to use SSLv3.  

That's the quantitative part. The qualitative part is that - while many sites on this list would be kindly labeled "inactive" - there are also some high-visibility sites such as "products.geotrust.com" and three properties for "bankofthewest.com".

If you need me to run this test against Google's CT list - 10x the domains - it will be at least another week of work, as I'm out of the office part of this week and couldn't begin until 2015-02-04. In the meantime, I hope the above gives you some insight into feasibility for inclusion into Fx36.

Lastly, please note that previous test results in comment 79 were erroneously thought to be testing this change, but were in fact due to bug 1088915. I've factored those failures out of this specific testing.
Adding Rick Andrews as products.geotrust.com was mentioned in comment 96.
Depends on: TLS-Intolerance
(In reply to Matt Wobensmith from comment #96)
> The qualitative part is that - while many
> sites on this list would be kindly labeled "inactive" - there are also some
> high-visibility sites such as "products.geotrust.com" and three properties
> for "bankofthewest.com".

If there is a small set of high-visibility sites like this, then filing a dozen or so TE bugs for them now might be a good idea.

I just created bug 1126620 to start to organize this. Users will undoubtedly file more.
Comment #96 summary:
0x804b0014 / NetworkError / pr_connect_reset_error: 1
0x804b0047 / NetworkInterruptError / pr_end_of_file_error: 282
0x804b0014 / NetworkError / pr_end_of_file_error: 2
0x805a2ff0 / SecurityError / ssl_error_bad_mac_alert: 5
0x805a2ff1 / SecurityError / ssl_error_bad_mac_read: 47
0x805a2fc3 / SecurityError / ssl_error_handshake_failure_alert: 1
0x805a2fc5 / SecurityError / ssl_error_handshake_unexpected_alert: 3
0x805a2ffe / SecurityError / ssl_error_no_cypher_overlap: 137
0x804b0014 / NetworkError / ssl_error_no_cypher_overlap: 1
0x805a2f9e / SecurityError / ssl_error_protocol_version_alert: 26
0x8000ffff / undefined / undefined: 56
I also manually checked "undefined" sites:
	pr_end_of_file_error: 37
	ssl_error_bad_mac_read: 3
	ssl_error_no_cypher_overlap: 15
	(Loading forever): 1
(In reply to Matt Wobensmith from comment #96)
> With the fallback limit set to 3, we see 561 uniquely broken sites, or
> approximately 0.27% rounded up. I'm attaching the list of broken sites to
> this bug.
561 websites is a big number - too big this close to the release date (less than 3 weeks away).
I think we should do this in 37 (with the agreement of the 37 driver, Lawrence).
I think it is more like 5 weeks than 3 weeks.
(In reply to Yuhong Bao from comment #101)
> I think it is more like 5 weeks than 3 weeks.

If you're referring to the Firefox 36 release date, the release is scheduled for Feb 24, which is 26 days from now.
Removing evang bugs because bug 1126620 will track them.
Depends on: 1128227
No longer depends on: 1095507, 1111354, 1112110, 1115883, 1126652, 1126654
About products.geotrust.com, here's the server configuration: Weblogic 10.3, using the Certicom SSL stack, with only TLS1 enabled and no support for TLS1.1/1.2. 

Is there something unique about that configuration that causes your logic to fail?
It is more likely a bug in the server software than in the configuration, but thanks for naming the server software affected.
Are you asking us to take any particular action?
See if you can obtain a fix from the vendor.
It's not clear to me from reading this thread exactly what "fix" I would ask for. Can you explain the bug as you see it?
The server should respond with a ServerHello carrying the TLS 1.0 (SSL 3.1) version in response to a ClientHello advertising any later version (such as TLS 1.2 or TLS 1.3).
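
A sketch of that negotiation rule (simplified; real servers also enforce a minimum version):

#include <algorithm>
#include <cstdint>

const uint16_t TLS_1_0 = 0x0301; // "SSL 3.1"

// The ServerHello version is the highest version the server supports that
// does not exceed the ClientHello version; an unknown newer client version
// is not a reason to abort the handshake.
uint16_t NegotiateVersion(uint16_t clientHelloVersion, uint16_t serverMax) {
  return std::min(clientHelloVersion, serverMax);
}

// e.g. a TLS-1.0-only server receiving a TLS 1.2 (0x0303) ClientHello:
// NegotiateVersion(0x0303, TLS_1_0) == 0x0301 -- not a handshake failure.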
(In reply to Rick Andrews from comment #108)

The total fix, however, is to upgrade your SSL/TLS stack to a version that supports TLS 1.2 (preferably also with forward secrecy and AES-GCM). TLS 1.0 will eventually be phased out as well, and TLS 1.2 provides far better security. It would be best for everyone to start upgrading long before vendors have to do another panic disable for TLS 1.0 like the one needed for SSL 3. (Of course, nobody knows what the timetables are here, but being complacent is not good.) TLS clients that support 1.2 would generally be expected to handle fallbacks correctly.

Qualys SSL Labs server test is the best way to check for TLS version intolerance at the moment:
https://www.ssllabs.com/ssltest/analyze.html?d=products.geotrust.com

The versions it breaks with due to intolerance are listed in the "Protocol Details" section. To not break with current standards, it can't be 1.1 or 1.2 intolerant (or extension intolerant, which is not a problem here). Your server is also intolerant to 1.3, so when the new version comes out you'll have issues there as well.

Further discussion on this topic should be done elsewhere, as this bug is resolved. New TLS intolerance TE bugs can be opened as blocking bug 1126620.
We're already working on upgrading the site, so I'm not going to file any bugs with any vendor.
Depends on: 1129925
I have completed a pass on Fx36 using the Google Certificate Transparency list of ~2.6 million sites, with the same parameters as what I used in comment 96. I am attaching a list of broken sites to this bug.

In summary:

* 7647 uniquely broken sites out of 2678142 sites total, or roughly 0.286% breakage.
* Most errors look like what we see with the previous run against the Pulse list - issues when downgrading to an unsupported TLS protocol or TLS intolerance.
* IMPORTANT: I have not individually verified all ~7k broken sites as legitimately broken. This is too time-consuming to do right now. This testing is imperfect and has noise - especially with a data set this large - so please allow for a small degree of error. However, this still helps us understand the scope of the changes we wish to make.

The results of running this list seem pretty close to the previous run - whose results were meticulously confirmed - so I'm reasonably confident that we know how invasive this change will be, should we choose to do it.
relnoted as "Disabled insecure TLS version fallback for site security"
Looks like api.geotrust.com and products.geotrust.com were fixed, but the following subdomains are still broken:
canadaca.geotrust.com
services.geotrust.com
smarticon.geotrust.com
Keywords: site-compat
Folks - while I applaud the technical decision here - there needs to be better communication to end users about this change.  Forums are absolutely littered with discussion on this topic.  End users don't know or care why their web sites are no longer working; they just want them to work.

We need a detailed article discussing this change and a PROPER mitigation strategy should a user encounter a website that no longer works.  Stating "contact the website owner" is not acceptable.  Users will dump Firefox and switch to another web browser that "just works" with the given website.