Closed Bug 1153212 Opened 9 years ago Closed 9 years ago

Repair Root Cause of 1148328

Categories

(Core :: Networking: HTTP, defect)

32 Branch
x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox37 --- disabled
firefox38 --- disabled
firefox39 --- fixed
firefox40 --- fixed
firefox-esr31 --- unaffected
firefox-esr38 --- disabled
b2g-v2.0 --- unaffected
b2g-v2.0M --- unaffected
b2g-v2.1 --- unaffected
b2g-v2.1S --- unaffected
b2g-v2.2 --- disabled
b2g-master --- fixed

People

(Reporter: mcmanus, Assigned: mcmanus)

References

Details

(Keywords: sec-other, Whiteboard: must fix sec-critical bug before re-enabling feature)

Attachments

(2 files, 3 obsolete files)

bug 1148328 reported symptoms and Alt-Svc was disabled to solve the immediate problem. This bug is about fixing the root cause to allow re-enabling of the feature.
Depends on: 1154012
Keywords: sec-other
Whiteboard: must fix sec-critical bug before re-enabling feature
dan - what do you think about this use of try for patches that resolve this at this point?
Flags: needinfo?(dveditz)
The STR boils down to this
1] origin https://foo identifies bar:443 as an alternate
2] a TLS connection to bar:443 is made and that handshake is enforced to be valid for foo. It happens that the certificate used is valid for foo but not for bar.
3] a request is made for https://bar. PSM is correctly asked to authenticate it for origin bar, but before it can do so it does SSL session resumption of the connection from step #2. So the validation for bar is effectively bypassed.

This happens because NSS and PSM actually have separate copies of the hostname (NSS actually has 2 of them, and PSM has one) and the way Alt-Svc was setup PSM is bootstrapped with the routed-host name and then the authentication name (i.e. the origin) is set as a attribute on the socket. This made the NSS and PSM copies of the name get out of sync as they were forked at bootstrap time.

I didn't actually realize that NSS did name validation independent of PSM - as the normal handshake callback comes back into PSM and uses its (correct) view of the authentication name. Unfortunately for session resumption that validation is done internal to NSS and used the stale name.

A better structure to Alt-Svc would be to never allow PSM to see anything other than the origin. The challenge to this, and the reason it was originally done otherwise, is that PSM establishes the networking socket for SSL connections and having the destination host around for creating the socket makes intuitive sense. Additionally PSM already copes with having a proxy information available to it, which is conceptually pretty similar in that it is a split between connection and authentication information. 

however, in hindsight, PSM can continue to create the socket without routing information because it does not actually connect the socket - that is done by necko. Necko can insert the relevant routing information when it goes to to the namelookup before the connect. This makes a lot of sense - the alt-svc is basically a SRV record anyhow. That way PSM can be changed to only ever have access to the origin.

Likewise, nsHttpConnectionInfo (the necko structure that will manage the route and the origin) can be changed to more explicitly segment the route and the origin. Right now it just manages them under host() (which could mean either one depending on context) and authenticationHost() (which is always the origin. Much better to sanitize that into strictly origin() and routedHost(). A related problem with coalescing was found based on that confusion.

I will post two patch sets. One is fairly long, but not very complicated - and it deals with the strategy laid out in this comment by removing the routed host information from PSM.. much of its size is involved in renaming things from "host" to "origin" or "routedHost" as apropos.

The second is a more concise patch more suitable for uplift. It just directly addresses the issues of PSM/NSS names getting out of sync and fixes the coalescing issue.
This is the more backportable patch. I would start by looking at this one to internalize the problem that is being resolved by the other patch (which restructures a lot of things).
Assignee: nobody → mcmanus
Attachment #8592898 - Flags: review?(hurley)
Attachment #8592898 - Flags: review?(dkeeler)
Comment on attachment 8592898 [details] [diff] [review]
0001-fix-sni-and-resumption-and-join.patch

Review of attachment 8592898 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +312,5 @@
> +  // we need to call SetCanceled to avoid non-determinstic results
> +
> +  const char *hostName = GetHostNameRaw();
> +  if (SECSuccess != SSL_SetURL(mFd, hostName)) {
> +    // todo log

noted!

@@ +319,5 @@
> +  }
> +
> +  int32_t port = GetPort();
> +  if (NS_FAILED(nsSSLIOLayerSetPeerName(mFd, this, hostName, port))) {
> +    // todo log

noted!
This is the larger patch suitable for (at least?) m-c. Among other things it removes nsHttpConnectionInfo::Host() and replaces that with ::Origin() and ::RoutedHost() so there are lots of callsites to Host() that needed to be updated to be Origin().

From the PSM side I think this is kind of a no brainer - it removes interfaces and information. There's a lot more to look at on the necko side.
Attachment #8592901 - Flags: review?(hurley)
Attachment #8592901 - Flags: review?(dkeeler)
I should clarify the STR.. the security problem happens in step 2 when the NSS copy of the name (which is out of sync - so it is bar instead of foo) is used to make the session resumption cache key. the right information is used in step 3, it just suffers from the cache having poisoned itself in step 2.

SNI is the other user of that information by NSS - so that was incorrect in step 2 as well as is repaired by both patches.
Attached patch altsvc-short (obsolete) — Splinter Review
Attachment #8592997 - Flags: review?(dkeeler)
Attachment #8592997 - Flags: review?(hurley)
Attachment #8592898 - Attachment is obsolete: true
Attachment #8592898 - Flags: review?(hurley)
Attachment #8592898 - Flags: review?(dkeeler)
Attached patch altsvc-longSplinter Review
even better :) (this is the long version intended for (at least) m-c.)
Attachment #8592901 - Attachment is obsolete: true
Attachment #8592901 - Flags: review?(hurley)
Attachment #8592901 - Flags: review?(dkeeler)
Attachment #8592999 - Flags: review?(hurley)
Attachment #8592999 - Flags: review?(dkeeler)
(In reply to Patrick McManus [:mcmanus] from comment #1)
> dan - what do you think about this use of try for patches that resolve this
> at this point?

Please go ahead, additional people have reproduced based on the advisory (after re-enabling the pref of course). You won't be revealing anything new in your patches should someone find them on try.
Flags: needinfo?(dveditz)
Comment on attachment 8592997 [details] [diff] [review]
altsvc-short

Review of attachment 8592997 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM with comments addressed.

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +195,5 @@
>  nsNSSSocketInfo::SetBypassAuthentication(bool arg)
>  {
> +  nsNSSShutDownPreventionLock locker;
> +  if (isAlreadyShutDown())
> +    return NS_ERROR_NOT_AVAILABLE;

nit: we've been moving to using braces around all conditionals in new code in PSM

@@ +233,5 @@
> +          mFd, PromiseFlatCString(GetHostName()).get(),
> +          authenticationName.get()));
> +
> +  nsresult rv = SetHostName(authenticationName.get());
> +  if (NS_SUCCEEDED(rv)) {

nit: let's flip this:

if (NS_FAILED(rv)) {
  return rv;
}

return SyncNSSNames();

@@ +255,5 @@
> +  if (!mFd)
> +    return NS_ERROR_FAILURE;
> +
> +  nsresult rv = SetPort(aAuthenticationPort);
> +  if (NS_SUCCEEDED(rv)) {

nit: same here - error-check first, then call SyncNSSNames()

@@ +305,5 @@
> +
> +nsresult
> +nsNSSSocketInfo::SyncNSSNames()
> +{
> +  // call locked!

We can guarantee that this gets called with an nsNSSShutDownPreventionLock held by passing in an unnamed reference to one as an argument.

@@ +310,5 @@
> +
> +  // I don't know why any of these calls would fail, but if they do
> +  // we need to call SetCanceled to avoid non-determinstic results
> +
> +  const char *hostName = GetHostNameRaw();

nit: 'const char* hostName'

@@ +311,5 @@
> +  // I don't know why any of these calls would fail, but if they do
> +  // we need to call SetCanceled to avoid non-determinstic results
> +
> +  const char *hostName = GetHostNameRaw();
> +  if (SECSuccess != SSL_SetURL(mFd, hostName)) {

What's the purpose of SSL_SetURL?

@@ +2588,5 @@
>    }
>    return nullptr;
>  }
>  
> +// call this function locked!

Same idea with passing in an unnamed reference to an nsNSSShutDownPreventionLock.

@@ +2622,5 @@
>  nsSSLIOLayerSetOptions(PRFileDesc* fd, bool forSTARTTLS,
>                         const char* proxyHost, const char* host, int32_t port,
>                         nsNSSSocketInfo* infoObject)
>  {
>    nsNSSShutDownPreventionLock locker;

Does this function actually check if NSS has already shut down? (If not, we might be able to ask the nsNSSSocketInfo object, since this isn't an nsNSSShutDownObject.)
Attachment #8592997 - Flags: review?(dkeeler) → review+
(In reply to David Keeler [:keeler] (use needinfo?) from comment #10)
> @@ +311,5 @@
> > +  // I don't know why any of these calls would fail, but if they do
> > +  // we need to call SetCanceled to avoid non-determinstic results
> > +
> > +  const char *hostName = GetHostNameRaw();
> > +  if (SECSuccess != SSL_SetURL(mFd, hostName)) {
> 
> What's the purpose of SSL_SetURL?

Oh, wait, I think I figured it out - SNI, right?
Comment on attachment 8592999 [details] [diff] [review]
altsvc-long

Review of attachment 8592999 [details] [diff] [review]:
-----------------------------------------------------------------

I had a look at the necko parts to see how this works, but I didn't review that in depth. The PSM parts look good to me.
Attachment #8592999 - Flags: review?(dkeeler) → review+
(In reply to David Keeler [:keeler] (use needinfo?) from comment #11)
> (In reply to David Keeler [:keeler] (use needinfo?) from comment #10)
> > @@ +311,5 @@
> > > +  // I don't know why any of these calls would fail, but if they do
> > > +  // we need to call SetCanceled to avoid non-determinstic results
> > > +
> > > +  const char *hostName = GetHostNameRaw();
> > > +  if (SECSuccess != SSL_SetURL(mFd, hostName)) {
> > 
> > What's the purpose of SSL_SetURL?
> 
> Oh, wait, I think I figured it out - SNI, right?

yes. thanks
Comment on attachment 8592997 [details] [diff] [review]
altsvc-short

Review of attachment 8592997 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpConnectionInfo.h
@@ +63,5 @@
>      const nsCString &GetAuthenticationHost() const { return mAuthenticationHost; }
>      int32_t GetAuthenticationPort() const { return mAuthenticationPort; }
>  
> +    const nsCString &GetOrigin() const { return mAuthenticationHost.IsEmpty() ? mHost : mAuthenticationHost; }
> +    int32_t OriginPort() const { return mAuthenticationHost.IsEmpty() ? mPort : mAuthenticationPort; }

Let's call the first one GetOriginHost (for parity with GetHost and GetAuthenticationHost). I'm tempted to say call the second GetOriginPort (for parity with GetAuthenticationPort), but then Port exists... perhaps we should think about re-aligning all the port-related names? That's probably a larger change than what we want for this version of the patch, though. Just make sure OriginPort aligns with what we would like to settle on for all the names. (Unless this all changes really wildly in the larger patch - I haven't looked at that one yet.)
Attachment #8592997 - Flags: review?(hurley) → review+
Comment on attachment 8592997 [details] [diff] [review]
altsvc-short

Review of attachment 8592997 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpConnectionInfo.h
@@ +63,5 @@
>      const nsCString &GetAuthenticationHost() const { return mAuthenticationHost; }
>      int32_t GetAuthenticationPort() const { return mAuthenticationPort; }
>  
> +    const nsCString &GetOrigin() const { return mAuthenticationHost.IsEmpty() ? mHost : mAuthenticationHost; }
> +    int32_t OriginPort() const { return mAuthenticationHost.IsEmpty() ? mPort : mAuthenticationPort; }

here's the thinking - an origin is not necessarily a host (though the host is implied) - its more than that - its the identity.

While a routedHost (a term I use in the longer patch for the alt-svc host, which seemed even more obtuse than routedHost) is actually a hostname. The longer patch mercifully gets rid of the ambiguous Host() and Port() alltogether (which is what makes it long :))
Fair enough... I'm working my way through the longer patch now, so let's just Ship It so we can turn altsvc back on :)
Dan, Al, given that this is a disabled feature, can I just land approved patches and re-enable without doing a coordinated sec-landing? Maybe just on nightly to start?
Flags: needinfo?(dveditz)
Yeah. Since it's disabled I've marked this "sec-other" so please just land.
Flags: needinfo?(dveditz)
Comment on attachment 8592999 [details] [diff] [review]
altsvc-long

Review of attachment 8592999 [details] [diff] [review]:
-----------------------------------------------------------------

Looks pretty good - a couple nits, and one thing that needs to change to make the code clearer/easier to understand. r=me with that changed.

::: netwerk/base/nsISocketTransport.idl
@@ +199,5 @@
> +     * This flag is an explicit opt-in that allows a normally secure socket
> +     * provider to use, at its discretion, an insecure algorithm. e.g.
> +     * a TLS socket without authentication.
> +     */
> +    const unsigned long MITM_OK = (1 << 6);

Nice name :)

::: netwerk/protocol/http/nsHttpConnection.cpp
@@ +1768,5 @@
>      }
>      MOZ_ASSERT(ci);
>  
>      mTLSFilter = new TLSFilterTransaction(mTransaction,
> +                                          ci->Origin(), ci->OriginPort(), this, this);

nit: re-wrap these lines as appropriate

@@ +1799,5 @@
>      }
>  
>      nsHttpHandler::GenerateHostPort(
> +        nsDependentCString(trans->ConnectionInfo()->Origin()),
> +                           trans->ConnectionInfo()->OriginPort(), result);

nit (which should've been caught sooner, obviously): de-indent the line with OriginPort & result, 'cause right now it looks like a continuation of the nsDependentCString ctor.

::: netwerk/protocol/http/nsHttpConnectionInfo.h
@@ +47,4 @@
>                           const nsACString &npnToken,
>                           const nsACString &username,
>                           nsProxyInfo *proxyInfo,
> +                         const nsACString &routedHost,

I'm not a fan of calling this "routedHost" (and routedPort, and in other places) - calling it that sounds like this is the hostname that is being routed somewhere else, as opposed to the hostname that you are routing the origin to (I've had to continually remind myself of that while reading the rest of this patch - very confusing!)

My current recommendations are either "hostRoute", "alternateHost", or (like it was previously) "physicalHost", though I'm open to bikeshedding. I *really* don't want "routedHost", though.

@@ +59,5 @@
>      void BuildHashKey();
>  
>  public:
>      const nsAFlatCString &HashKey() const { return mHashKey; }
> + 

nit: trailing whitespace
Attachment #8592999 - Flags: review?(hurley) → review+
Keywords: leave-open
(In reply to Patrick McManus [:mcmanus] from comment #20)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/90d6a38931fa

This is the short fix on nightly without yet re-enabling anything.
A

Approval Request Comment
[Feature/regressing bug #]: Alt-Svc and Opportunistic Encryption
[User impact if declined]: Feature remains disabled for fixed security bug
[Describe test coverage new/current, TreeHerder]: new test coverage on mozilla-central.
[Risks and why]: this is the short version of the patch and was designed for backporting with minimal change.
[String/UUID change made/needed]: none
Attachment #8600320 - Flags: review+
Attachment #8600320 - Flags: approval-mozilla-beta?
Attachment #8600320 - Flags: approval-mozilla-aurora?
re-enable on nightly is tracked in bug 1159971 (pending on m-i opening) and new test coverage was added as 1159944

re-enable for beta/aurora is tracked here.

the long version of the patch should be landed on nightly shortly after things settle down.
Comment on attachment 8600320 [details] [diff] [review]
Alt Svc Fix and Re-Enable for Aurora 39

> [Risks and why]: this is the short version of the patch and was designed for backporting with minimal change.

You are not really answering to the question.
Anyway, looking at the size of the patch, it is too late for 38 as we are building the RC in a couple of days and the feature is disabled by default.
Attachment #8600320 - Flags: approval-mozilla-beta?
Attachment #8600320 - Flags: approval-mozilla-beta-
Attachment #8600320 - Flags: approval-mozilla-aurora?
Attachment #8600320 - Flags: approval-mozilla-aurora+
Keywords: leave-open
Attachment #8592997 - Attachment is obsolete: true
Comment 27 lands the reviewed "long" patch on m-i. We can close this now when it gets to m-c
Target Milestone: --- → mozilla40
Group: network-core-security
Group: core-security → core-security-release
Group: core-security-release
Depends on: 1309809
You need to log in before you can comment on or make changes to this bug.