Closed Bug 644640 Opened 13 years ago Closed 8 years ago

Implement extension point for extensions to influence trust decisions in PSM

Categories

(Core :: Security: PSM, defect)

Tracking

RESOLVED WONTFIX

People

(Reporter: bsterne, Assigned: pde-lists)

References

Details

(Keywords: sec-want, Whiteboard: [sg:want])

Attachments

(2 files, 14 obsolete files)

3.59 KB, text/plain; geekboy: feedback+
96.10 KB, patch; briansmith: review-
I'm filing this feature request on behalf of Dan Kaminsky, who is interested in writing a Firefox add-on that enhances TLS certificate verification.  The request is for a new event (observer topic) that fires after a TLS certificate is fetched, but before the connection is made.  This could be considered similar to the "http-on-examine-response" topic that observers can use to inspect the HTTP server response early (right when we receive it).

Per Dan's email, event observers would be passed: the certificate (ideally fully parsed, though at least parsed to the point that CN and SAN are exposed), the domain being verified, and any other pertinent details.

The observer could then take one of several actions (sketched as constants below):
1. Allow
2. Reject
3. Prompt (with provided document, rendered in non-chrome context)
4. Pass-through (allow legacy X.509 validation)
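
For illustration only, those four dispositions might be modeled as constants along these lines (the names are hypothetical, not from any actual Mozilla interface):

enum SSLCertObserverDecision {
  DECISION_ALLOW,        // accept the certificate as trusted
  DECISION_REJECT,       // fail the connection
  DECISION_PROMPT,       // render a provided document in a non-chrome context
  DECISION_PASS_THROUGH  // fall back to legacy X.509 validation
};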

This seems sort of Firefox-y, so I didn't file it in NSS, though that's where it may belong.  Hopefully bsmith can help route the bug properly.
The goal of this ask is to cause the minimum amount of disruption to Firefox, while we figure out the things *actually required* for DNSSEC in the browser.  I can implement this through the Secure Proxy functionality, but it's a huge mess.

This is a better ask than "heh, give me DNSSEC support" because there are a number of unsolved issues in our domain:

1) How do we handle downgrade attacks?
2) How do we achieve sufficient performance such that DNSSEC doesn't create huge connection delays?
3) How do we interact with X.509 certificates?
4) What do the records actually look like? (This is the one place where everyone is actively experimenting.)

Until there's consensus here, it's better to just let the extensions work themselves out.
I don't see any objections to providing an interface for augmenting the current X.509 validation. But, how can we reduce the risk added by APIs that allow extensions to instruct us to disable security checks and change security indicators?

For example, do we really need to allow the extensions to say "Allow" or do we merely need the extensions to say "This cert chain has a trusted non-X.509 trust anchor?"
Component: Networking → Security: PSM
QA Contact: networking → psm
The idea is that the extension is allowed control over the DV set of security indicators. (I am OK with an independent evaluation for the EV indicators.) DV reflects trust in DNS anyway, so the canonical case is certainly no less safe.

Remember that the extension APIs have full code execution privileges, so it's a little silly to try to restrict them. They can do anything.

I would not object to there being a way to get more details on the nature of the accepted credentials. But the basic context returned should be an Allow, just as a cert with a valid chain returns an Allow. It's just an alternate validation.
Oh, obviously I'd need to know the domain being validated.  So, for example, if we were browsing to www.foo.com and were presented a cert, the event should tell us not only "here's the cert" but "here's a cert presented while browsing to www.foo.com".

This actually doesn't happen in most APIs, so I'm pointing it out here.
Attached file interface proposal (obsolete) —
Attachment #524526 - Flags: review?(bsmith)
Attachment #524526 - Flags: feedback?(dan)
I would prefer to provide a callback, offered by PSM, rather than an observer.

Please have a look at the attached interface proposal.

An addon would implement nsISSLAuthOverrideCallback.

An addon would query for the PSM service to get interface nsISSLAuthOverrideHook and register itself with addAuthOverrideCallback().

Each time PSM wants to verify a certificate related to an SSL connection,
PSM could call:
- nsISSLAuthOverrideCallback.advanceNotification()
- nsISSLAuthOverrideCallback.verify()
  (or cancel)
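
For concreteness, the registration described above might look roughly like this from the addon side (a hypothetical sketch: the contract ID is illustrative, and these interfaces were proposed here but never landed):

// Hypothetical addon-side registration against the proposed hook.
nsresult RegisterMyCallback(nsISSLAuthOverrideCallback* aCallback)
{
  nsCOMPtr<nsISSLAuthOverrideHook> hook =
    do_GetService("@mozilla.org/security/ssl-auth-override-hook;1"); // illustrative
  NS_ENSURE_TRUE(hook, NS_ERROR_NOT_AVAILABLE);
  return hook->AddAuthOverrideCallback(aCallback);
}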
Attachment #524526 - Flags: feedback?(sstamm)
We'll obviously need an nsISSLAuthOverrideHook.removeAuthOverrideCallback, too, but that's trivial; will add it in the next revision.
I'm not sure I understand the use case for both .advanceNotification() and .verify() called one line after the other; couldn't they be effectively wrapped into one function, or is there a slew of slow-ish code between the two?
(In reply to comment #8)
> I'm not sure I understand the use case for both .advanceNotification() and
> .verify() called one line after the other; couldn't they be effectively wrapped
> into one function, or is there a slew of slow-ish code between the two?

This allows an addon to run its own verification code in parallel with PSM's verification code.

The idea is:

- PSM calls advanceNotification()
- PSM does whatever it needs to do to verify on its own,
  which may include OCSP, CRL, AIA
- when PSM is done, it calls callback.verify()
  and will combine the results to make a final decision
... and during the time PSM performs its verification, the addon could run its own code on a separate thread
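
Sketched from PSM's side, the intended call order would be roughly as follows (hypothetical pseudocode; VerifyWithPSM and CombineResults are illustrative stand-ins, not real PSM functions):

void AuthenticateWithHook(nsISSLAuthOverrideCallback* cb, nsIX509Cert* cert,
                          const nsACString& host, PRInt32 port)
{
  cb->AdvanceNotification(host, port);        // addon may kick off its own
                                              // checks on a worker thread
  PRUint32 psmResult = VerifyWithPSM(cert);   // OCSP, CRL, AIA run meanwhile
  PRUint32 addonResult;
  cb->Verify(cert, host, port, &addonResult); // blocks until the addon is done
  CombineResults(psmResult, addonResult);     // PSM makes the final decision
}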
advanceNotification should be called way before we have a cert to pass to it, so it can't have a cert parameter. We should be calling it as soon as we know we are going to open a TLS connection, and preferably at the time that we would do the DNS lookup, and before we've even opened the TCP socket and/or done the TLS handshake.
(In reply to comment #9)
> - PSM calls advanceNotification()
> - PSM does whatever it needs to do to verify on its own,
>   which may include OCSP, CRL, AIA
> - when PSM is done, it calls callback.verify()
>   and will combine the results to make a final decision

For some uses, verify() should be called before we do OCSP/CRL/AIA so that it can tell PSM to fail without even trying any of those requests. (If an addon needs the results of OCSP/CRL/AIA fetching and/or validation, then we need a separate function for that.)
Comment on attachment 524526 [details]
interface proposal

If we want the interface to be able to pre-authorize (before NSS) *or* post-authorize (after NSS), then the way Kai describes it in comment 9 makes sense to me.  Perhaps the comments in the interface for the two functions could explicitly say "this happens before we start checking the chain and revocation" and "this happens after a decision has been made by NSS".  

If we go this way, I recommend renaming the functions too.  Perhaps "preSSLAuthNotify" and "postSSLAuthVerify"?
(In reply to comment #11)
> advanceNotification should be called way before we have a cert to pass to it,
> so it can't have a cert parameter. We should be calling it as soon as we know
> we are going to open a TLS connection,
> and preferably at the time that we would
> do the DNS lookup, and before we've even opened the TCP socket and/or done the
> TLS handshake.

That's fine. We can do that in nsSSLIOLayerAddToSocket.

However, in this scenario, I'd prefer to drop my offer of passing callback-provided-userData around, to keep things simple.
(In reply to comment #14)
> However, in this scenario, I'd prefer to drop my offer of passing
> callback-provided-userData around, to keep things simple.

That sounds fine (superior) to me, as long as there's a way for the extension to somehow look up data it has cached during advanceNotification(), based on the parameters it is given in verify().

Also, I noticed you didn't include the TCP port in verify()'s signature.
(In reply to comment #12)
> For some uses, verify() should be called before we do OCSP/CRL/AIA so that it
> can tell PSM to fail without even trying any of those requests.

I conclude we actually need three stages (see the sketch after this list):

- early-advance-notification
  => cert not yet known, returns void

- pre-psm-auth-verify
  => cert is known, psm has not yet worked on verification
  => the addon decides between
        STOP = "I have results already and PSM should stop"
     and
        PENDING "ok, now that I have the cert,
                 let me start my expensive work in parallel,
                 I'll tell PSM later"

If addon said STOP, PSM should stop processing,
and whatever PSM actually decides to do,
it will not call the addon again

- final-auth-verify
  => if addon had returned PENDING,
     then PSM will call this one after PSM is done with its work
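
The STOP/PENDING choice in pre-psm-auth-verify could be sketched as return values (constant names are illustrative only, not from any patch):

enum PreAuthVerifyResult {
  PRE_AUTH_STOP,    // addon already has a result; PSM stops consulting it
                    // and will not call final-auth-verify
  PRE_AUTH_PENDING  // addon starts its (possibly expensive) work in parallel
                    // and will deliver its verdict in final-auth-verify
};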


> If an addon
> needs the results of OCSP/CRL/AIA fetching and/or validation, then we need a
> separate function for that.

If I understand correctly, we don't need this functionality now, so I propose to skip this for the initial implementation.

However, we can include that in the design now.

final-auth-verify can pass parameter
    nsISupports psmVerificationResults
that we might implement later


(In reply to comment #14)
> Also, I noticed you didn't include the TCP port in verify()'s signature.

Thanks for noticing, fixed.
The very earliest stage mentioned here ("before even commencing a TCP connection") might be appropriately merged with the "http-on-connection" observer notification proposed in bug 354993 comment 89.
Attached file interface proposal v2
Attachment #524526 - Attachment is obsolete: true
Attachment #524526 - Flags: review?(bsmith)
Attachment #524526 - Flags: feedback?(sstamm)
Attachment #524526 - Flags: feedback?(dan)
Attachment #524552 - Flags: review?(bsmith)
Attachment #524552 - Flags: feedback?(sstamm)
While bugzilla had an outage, we continued in email, and Dan said:

"As an architectural thing, I'd love it if browsers could retrieve content from a TLS socket w/o rendering it.  This could have major performance gains, without security implications.

...

This ain't bad.  For my needs, I don't care about the results of OCSP etc.

Out of curiosity, does Moz have the ability to send arbitrary UDP packets, and learn the DNS server of the platform?  My initial implementation will be native code but I could see porting to JS at some point."
The retrieving content from a TLS socket thing is a bad idea -- it means you have to leak the HTTP request, with whatever tokens/cookies are inside.  Can't do that.  So I rescind that suggestion.

I assume advanceNotification would be called on DNS prefetch?

There's some interesting interactions with HSTS to consider, if we want external providers to be able to block HTTP.
(In reply to comment #21)
> I assume advanceNotification would be called on DNS prefetch?

Our code is heavily layered and modularized.

We could easily call it at the time our core networking engine talks to the application module that manages SSL (we call that module PSM). 

PSM doesn't do DNS.

We could call it either at socket construction time (prior to DNS and connect),
or afterwards.
DNS Prefetch is the best time to tell me to go out and collect the TLS state, especially if you want to implement HSTS over DNS.

I don't need PSM to do any DNS work; that's the plugin's job.  In the long run it'd be useful (not required) if the plugin could get UDP and local DNS server identification resources from Moz though.
I would like to get you started with an implementation that is completely inside PSM; this greatly simplifies the review process and increases the likelihood that this can be added immediately.

I propose, for the initial patch, let's do the early-notification at the time we know we want to connect, prior to connecting.

We can work to move this notification to the prefetch code later.
That works.
Comment on attachment 524552 [details]
interface proposal v2

Looks good, Kai.  The comment blocks for preInternalVerify() and finalVerify() could use a bit more detail about *when* they will be called.  With that minor tweak, I think this looks great.
Attachment #524552 - Flags: feedback?(sstamm) → feedback+
Attached patch Patch v1 (obsolete) — Splinter Review
This patch is actually on top of the patch from bug 479393, which I happen to be working on in parallel, so the real patch will have different contexts. Will merge later.
Attached patch Patch v2 (obsolete) — Splinter Review
Assignee: nobody → kaie
Attachment #524573 - Attachment is obsolete: true
Attachment #524590 - Flags: review?(bsmith)
Attachment #524552 - Flags: review?(bsmith)
Oh, please make sure these validators can stack, if at all possible.
(In reply to comment #30)
> Oh, please make sure these validators can stack, if at all possible.

Looks like the patch v2 in attachment 524590 [details] [diff] [review] happily allows multiple validators.
Attachment #524590 - Flags: review?(rrelyea)
I've been working with some folks (Nathan Wilcox, Zooko Wilcox-O'Hearn, David-Sarah Hopwood) on some alternative-cert-validity techniques we call PMAGH, some of which include putting a hash of the expected leaf cert (or some other cert on the chain) in the URL: either in the hostname, a query argument, or the fragment.

We'd love to use this hook to develop an add-on to test these techniques. To pull that off, we'd need to look at the full URL in the validation function (ideally pre-parsed so the function couldn't screw it up). How hard would it be to add the path string to this hook interface?

Some other wishlist items that would help PMAGH (many of which are out-of-scope for this ticket, of course, but I'll mention them for context):

 * ideally the 'cert' argument provides access to the whole cert chain, not just the leaf
 * hook could get some extra information about the TLS negotiation, like any extensions in the ClientHello/ServerHello, and the cipher suite being negotiated
 * the validator's response should be able to override the Larry warning, or replace it with a different one (our needs are probably the same as Dan's here)
 * the response should be able to blank out the URL/location-bar string (e.g. to permit self-signed certs but not make user-facing claims about site identity), and change the resulting page's Same-Origin-Policy origin (perhaps to a nonce, not shared with any other page, again to explore self-signed certs)
oh, and it'd be nice if the hook function could learn something about the specific request for which the connection is being established, so that the add-on could correlate decisions made in the hook with decisions (or informational displays) made later as the resulting content is rendered. I don't know how that connection is represented, however, so I'm not sure what to specifically ask for.
Couple comments:

1) Access to the full chain is somewhat important.  I wouldn't block the patch
without it, but there's a lot of people who want to be able to put a signer in.
2) If it's easy, I can see uses for having the full URL path.
3) I don't really know what I'd want out of the TLS handshake, or anything about the specific request.  Brian -- details?
4) I'm a little nervous about SOP/Address bar changes, but it's possible for
them to be done right.

Attempts at self-certifying URLs go back at least as far as 1999:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.6482

They were in my Phreebird talk too, and Dan Bernstein has a mode with them. 
They *generally* don't scale, but the goal of this ask is neutrality.  Let
people actually work with this layer of the trust model.
(In reply to comment #34)
> 
> 3) I don't really know what I'd want out of the TLS handshake, or anything
>    about the specific request.  Brian -- details?

We don't have anything too specific, just trying to extend the space in which
we can experiment without coming back and asking for a new API. The vague
idea would be that the servers might indicate their willingness to
participate in a new validation scheme via a TLS extension (and the HTTP
response may be too late), and the addon could then see that willingness.

> 4) I'm a little nervous about SOP/Address bar changes, ...
> 
> ... but the goal of this ask is neutrality.

Exactly: the hook makes it possible to run some experiments. There are lots
of bad decisions/UIs that could be made with this power, but that's no reason
to not explore the solution space.
(In reply to comment #32)
> 
> We'd love to use this hook to develop an add-on to test these techniques. To
> pull that off, we'd need to look at the full URL in the validation function
> (ideally pre-parsed so the function couldn't screw it up). How hard would it be
> to add the path string to this hook interface?


The existing interfaces between modules don't allow us to obtain the full URL. We only have host and port from inside the raw socket handling.

Note that the path idea does not apply to SSL in general, but only to http and some other protocols wrapped inside SSL.

So first of all, you would have to find ways to extend or replace the existing interfaces and stick in an additional URL, and change the core to use your updated interfaces.

Furthermore, SSL is designed to open a channel, and because creating the channel (handshake, etc.) is quite expensive, the SSL connection will usually be reused for many transactions.

Which means your hook would only get the initial URL; it wouldn't know anything about all the other transactions that will be executed on the channel afterwards.

So, the short answer is, at this point, I don't see a straightforward way to give you the full URL in the hook.


>  * ideally the 'cert' argument provides access to the whole cert chain, not
> just the leaf


If you want that, the hook must delay its work until PSM has fully completed its own verification checking.

If you want, we could add the chain to the parameter list of the finalVerify call.


>  * hook could get some extra information about the TLS negotiation, like any
> extensions in the ClientHello/ServerHello, and the cipher suite being
> negotiated

I'm not able to get this done immediately (like today or tomorrow for Firefox 5).

Here's my proposal (a code sketch follows the list):
- let's skip this for now
- let's add a generic "nsISupports moreInformation"
  to the preInternalVerify() and finalVerify()
- if anyone wants to make more information available in the future,
  then addons can attempt to QueryInterface moreInformation to
  a specific interface and see if it's available.
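
On the addon side, that probe would use the standard XPCOM pattern (nsIFutureTLSDetails is a hypothetical placeholder interface):

// Hypothetical: probe the opaque parameter for a richer, future interface.
nsCOMPtr<nsIFutureTLSDetails> details = do_QueryInterface(moreInformation);
if (details) {
  // a newer implementation made extra data available; use it
} else {
  // older implementation; proceed without the extra data
}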


>  * the validator's response should be able to override the Larry warning, or
> replace it with a different one (our needs are probably the same as Dan's here)
>  * the response should be able to blank out the URL/location-bar string (e.g.
> to permit self-signed certs but not make user-facing claims about site
> identity), and change the resulting page's Same-Origin-Policy origin (perhaps
> to a nonce, not shared with any other page, again to explore self-signed certs)


This sounds like it would need a lot more careful design, including localization etc. Well, I can't do that now, but I understand there is a need to pass additional information back from the addon to the application.

Again, I propose, let's add another output parameter to the finalVerify function, a generic "nsISupports moreVerificationResults", that the application might use to QueryInterface to potential future interfaces.
(In reply to comment #33)
> oh, and it'd be nice if the hook function could learn something about the
> specific request for which the connection is being established, so that the
> add-on could correlate decisions made in the hook with decisions (or
> informational displays) made later as the resulting content is rendered. I
> don't know how that connection is represented, however, so I'm not sure what to
> specifically ask for.

Request? Do you mean nsIRequest? Again, you would have to enhance the core architecture to enable us to pass those things around; and again, the hook will be called only for the first request on a transport channel, and the transport channel will be reused for many additional requests.
(In reply to comment #34)
> 
> 1) Access to the full chain is somewhat important.  I wouldn't block the patch
> without it, but there's a lot of people who want to be able to put a signer in.


We can make the chain available in finalVerify, but what do you mean by "put in a signer"? Who puts in what, for which purpose? From application to addon, or vice versa, or in both directions? What behaviour do you expect?
Attached patch Patch v3 (obsolete) — Splinter Review
Attachment #524590 - Attachment is obsolete: true
Attachment #524590 - Flags: review?(rrelyea)
Attachment #524590 - Flags: review?(bsmith)
Attachment #524759 - Flags: superreview?(rrelyea)
Attachment #524759 - Flags: review?(bsmith)
> > 1) Access to the full chain is somewhat important. I wouldn't block the
> >    patch without it, but there's a lot of people who want to be able to
> >    put a signer in.
> 
> We can make the chain available in finalVerify, but what do you mean by
> "put in a signer"? Who puts in what, for which purpose? From application to
> addon, or vice versa, or in both directions? What behaviour do you expect?

I don't think he was talking about addons and applications sending signed
messages to each other. If Dan's thinking the same way I am, then by "signer"
he probably meant the web server using an intermediate signing certificate in
its cert chain, so e.g. sites can rotate keys or use short-lived
certificates, such that the "out-of-band" validation information (cert hash
distributed through DNS, or added to the URL) can point at the intermediate
cert instead of the leaf cert.
Mostly what Brian said, except there are a few places with Enterprise CAs that would like them to work globally.  I don't see that as an unreasonable request.
You guys *sure* you can't give me a nudge whenever a DNS prefetch is triggered?  It would do very nice things for perf.
Summary: Create a "tls-certificate-received" topic for cert-validating observers → Implement extension point for extensions to influence trust decisions in PSM
It occurred to me that I had been thinking of this the wrong way. Instead of always asking "Can we establish *this* new SSL/TLS connection for *that* HTTP request?", maybe we have to ask "Given *this* HTTP request, is *that* SSL/TLS connection a valid transport for it?". If I understand correctly Firefox currently has to ask the question that way in order to decide if an extant SSL/TLS connection is the right one to use for a given request (based on the domain name in that request).

If my understanding is correct then there has to be a function which takes a set of extant SSL/TLS connections and an HTTP request (or some information extracted from the full HTTP request--possibly just the domain name) and returns which extant SSL/TLS connection should be used, or else indicates that none of them is suitable.

Is it true that there is such a "selection from the extant SSL/TLS connections" function? If so, is this in *addition* to, or integrated with, the preInternalVerify() and finalVerify(), which I assume mean "given a certain SSL/TLS connection and a certain request, is it okay to proceed and send this request on this connection?"?

Okay, before I post any more speculations about this, I'll go hunt up preInternalVerify() and finalVerify() and read them. If you can give me pointers to any code which might implement the complementary "which connection to use for this request?" logic I would appreciate it.
So this missed the Firefox 5 train, then :-(

Gerv
The entire bug?
Yes, we missed it, because nobody reviewed it in time, and there was a deadline on Tuesday. I hacked the patch on Tuesday as fast as I could, but having a patch is not sufficient, unfortunately.

Dan, we do have the ability to produce a test build for all platforms. If this is simply something you want to play with as early as possible, I could produce such a build and make it available to you. You could test locally.
The functionality provided wasn't sufficient for my needs so I resigned myself to waiting for the Firefox 6 train.
Kai, yesterday I said that this feature should be made independent of libpkix, but now I don't think it is possible to do so:

In order to pass a chain to the extension, we would need to build the chain. But, we need to know what trust anchors to use when constructing the chain. A DANE(-like) extension would be able to determine which trust anchor(s), if any, to use by querying DNS, looking up the matching certs in the cert db and/or the handshake, and then restricting the set of trust anchors to the ones found in DNS (if any). AFAICT, we could do exactly this using the cert_pi_trustAnchors parameter to CERT_PKIXVerifyCert when we are using libpkix.
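
For reference, restricting libpkix to a DNS-derived anchor set via CERT_PKIXVerifyCert would look roughly like this (a sketch; trustAnchorsFromDNS and pinArg are assumed inputs provided by the extension and caller):

// Verify eeCert, allowing only the anchors the DANE lookup produced.
CERTValInParam inParams[2];
inParams[0].type = cert_pi_trustAnchors;
inParams[0].value.pointer.chain = trustAnchorsFromDNS; // CERTCertList*
inParams[1].type = cert_pi_end;

CERTValOutParam outParams[2];
outParams[0].type = cert_po_certList;  // receive the chain libpkix built
outParams[0].value.pointer.chain = NULL;
outParams[1].type = cert_po_end;

SECStatus rv = CERT_PKIXVerifyCert(eeCert, certificateUsageSSLServer,
                                   inParams, outParams, pinArg);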

Accordingly, I think we should change the interface in four ways: (1) pass all the non-EE certs in the handshake, in addition to the EE cert, to preInternalVerify; (2) provide (for preInternalVerify) convenient access for extensions to easily query for (root) certs that aren't in the handshake; (3) provide a way for the preInternalVerify to optionally return a set of trust anchors to be used as the cert_pi_trustAnchors input to libpkix; and (4) pass the full cert chain to finalVerify.

Dan, I agree that it seems like we should provide a hook into DNS pre-fetching but that is a separate bug.

BTW, I don't like the names preInternalVerify and finalVerify. preVerifyCertChain and postVerifyCertChain would be better names.
(In reply to comment #48)
> In order to pass a chain to the extension, we would need to build the chain.
> But, we need to know what trust anchors to use when constructing the chain. A
> DANE(-like) extension would be able to determine which trust anchor(s), if 
> any, to use by querying DNS, looking up the matching certs in the cert db 
> and/or the handshake, and then restricting the set of trust anchors to the 
> ones found in DNS (if any). AFAICT, we could do exactly this using the 
> cert_pi_trustAnchors parameter to CERT_PKIXVerifyCert when we are using 
> libpkix.

...

> (3) provide a way for the preInternalVerify to optionally return
> a set of trust anchors to be used as the cert_pi_trustAnchors
> input to libpkix

This won't work, because the exclusion part of DANE isn't just about trust anchors. I believe the eventual semantics of DANE will be to build the cert chain like normal, then ensure that the chain contains a cert that matches one of the certs identified in the DANE records.
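
That matching step could be sketched like this, assuming the chain libpkix built and SHA-256 hashes extracted from the DANE records (all names here are hypothetical; this is not from any patch):

// Does the validated chain contain a cert named by one of the DANE records?
// Assumes NSS headers cert.h and pk11pub.h, plus <string.h> for memcmp.
bool ChainMatchesDANE(CERTCertList* chain,
                      const unsigned char (*daneHashes)[SHA256_LENGTH],
                      size_t numHashes)
{
  for (CERTCertListNode* node = CERT_LIST_HEAD(chain);
       !CERT_LIST_END(node, chain);
       node = CERT_LIST_NEXT(node)) {
    unsigned char digest[SHA256_LENGTH];
    if (PK11_HashBuf(SEC_OID_SHA256, digest,
                     node->cert->derCert.data,
                     node->cert->derCert.len) != SECSuccess)
      continue;
    for (size_t i = 0; i < numHashes; ++i) {
      if (memcmp(digest, daneHashes[i], SHA256_LENGTH) == 0)
        return true;   // inclusion satisfied
    }
  }
  return false;        // exclusion: no cert in the chain was named by DANE
}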

We need to verify what libpkix does when it encounters an EE cert that is self-signed, and when it encounters a chain that terminates in an untrusted self-signed root, to see how well this interface would be able to handle the inclusion part of DANE-like protocols.
The extension point for enabling extensions to do work at DNS resolution time is bug 652295.
See Also: → 652295
(In reply to comment #48)
> 
> Accordingly, I think we should change the interface in four ways: (1) pass
> all the non-EE certs in the handshake, in addition to the EE cert, to
> preInternalVerify; 


First, I disagree that the interface needs to be changed.

The proposed API already allows us to provide additional information (like additional certs received during the handshake) in a future version of the implementation - by allowing extensions to use QueryInterface on "nsISupports additionalInformation".


Second, the list of certs obtained during the handshake is internal to NSS. I don't know of a way to retrieve this list at the PSM level.

Do you think your proposal should block (and delay) the current work, by waiting for a future version of NSS that makes this data available via a new API in a future NSS release? I think it should not.

I thought the plan of this bug was: let's get something done - now - without introducing additional dependencies. Your proposal opposes that plan.


> (2) provide (for preInternalVerify) convenient access for
> extensions to easily query for (root) certs that aren't in the handshake;

Convenient access can be enabled in the future by using QueryInterface on  "nsISupports additionalInformation".

This depends on someone contributing a patch that carries this information from NSS to PSM and adds it to the data object that we pass to the extension. Future work, unless people think that this extension point is worthless in its current initial state.


> (3) provide a way for the preInternalVerify to optionally return a set of
> trust anchors to be used as the cert_pi_trustAnchors input to libpkix;


Again, this is already possible. Look at patch v3. The extension is able to return a data object, and PSM can use QueryInterface on "nsISupports additionalFeedback" to query for data being returned.

Yes, the initial implementation doesn't do this, but the interface allows for it. Unless you think the initial API and the initial implementation are worthless without this, I propose we start with the first step and add the interface. People can contribute enhancement patches later.
 

> (4) pass the full cert chain to finalVerify.

Patch v3 already allows that, nsIArray certChain.


> BTW, I don't like the names preInternalVerify and finalVerify.
> preVerifyCertChain and postVerifyCertChain would be better names.

I propose we not start a long discussion about the names.
The names are not that important, because we have a detailed API description in the interface file.
I don't like your proposed names, because they imply that our internal verification would be limited to chain checking - but it's not. We do other checks as well, such as cert name and validity checks.

I think the current names are good. The names do not relate to our internal code; they relate to the name of the interface nsISSLAuthOverrideCallback.
If this is urgent, then I would like to see additional contributions to get this done: either in the form of supporting me in getting this done, or in the form of code that implements additional mandatory requirements.
I am willing to help get this done before the migration to Firefox 6 / next Aurora, but because of the Mozilla rules, I can't get it done by myself.
So everyone knows: the Firefox 6 deadline is May 24th, next Tuesday. It would be a great shame if we were to miss another boat with this. Let's not let the best be the enemy of the good. Going to Aurora is not a "no more changes" deadline, but it is a "no more features" deadline, so we have to get _something_ in by then.

Brian: given that we want something by Tuesday, which changes do you think are absolute requirements and which could be discussed after we have committed an initial version? 

Kai: are you referring in comment 53 to the need for review, or something else?

Gerv
Yes, I would like to ask for a constructive review that focuses on the correctness of the attached patch v3; the outcome should be one of the following three options:

(a) A statement that this patch and the API is sufficient for our attempt to
    get an initial extension point implemented.
    A statement that the code that hooks this API into the existing code
    looks sufficient for our initial integration, as proposed in 
    the comments inside the patch.
    If mistakes in the code are identified, please point them out,
    so we can fix them.

or

(b) A statement that the interface looks incomplete and needs rework.
    Even then, the review should not stop at the API,
    because we already have an r+ of the API.
    The review should also make statements about the remainder of the patch.
    If there is need for more than the current patch does,
    then I would appreciate it if the reviewer assists our work
    as much as possible. If there are lots of changes requested,
    then instead of trying to write text, I would prefer if the reviewer
    actively helped this task, by implementing the ideas into code,
    and attaching a modified patch.
    I could then review the differences on top of my patch.

or

(c) A statement that the patch is incomplete and not acceptable 
    and that we should give up our idea of getting this done within the 
    next 4 days.
Nothing I raised should be considered a blocker: any progress on this would be welcome to me, with or without any of the wishlist features. I think having the full chain available to the hook will be important for both Dan's use case and mine, but I'd bet some interesting experiments can be run with just the leaf cert.
I'm new here, so please bear with me if I don't understand how things are done.

I believe the current interface is insufficient to enable my project "PMAGH", but I'll try to help by reviewing this patch.

Why is it insufficient for my project? Mainly, I need the plugin to be given the full URL of the query--not just the hostname/IPaddr--and this implies that the plugin needs to be invoked before every HTTP request, not just during connection establishment. Please see comment 32 and comment 43 for more details.

So, I looked at https://bugzilla.mozilla.org/attachment.cgi?id=524759&action=diff and here are my notes:


"treatDecisionAsBinding" is underspecified and the current implementation ignores it. It should be removed from the interface. If in the future we come up with a semantics for that and an implementation, then that can be encoded into the additionalInformation and additionalFeedback arguments.

Alternately, the implementation should inspect treatDecisionAsBinding, and if it is true then do not perform our own checks.


The comment on line 1097:

	  * NB: This is our only defense against Man-In-The-Middle (MITM) attacks!

Should be changed to:

	  * NB: Unless the auth override callback has already defended against MITM attacks, this is our only defense!


I'm not sure of the implications of this comment:

3437    // TODO: Well, now that we're doing lots of additional checks in
3438    // PSM_SSL_PKIX_AuthCertificate, this no longer covers all failure
3439    // scenarios.

Does "lots of additional checks" refer to auth override callbacks? In the current implementation line 3437 in nsNSSBadCertHandler() will never be reached if the auth override callback returned anything other than nsIX509Cert::VERIFIED_OK, because if it did then PSM_SSL_PKIX_AuthCertificate would have set error to SEC_ERROR_REVOKED_CERTIFICATE and then nsNSSBadCertHandler() would exit at its beginning: http://hg.mozilla.org/mozilla-central/file/1b0de5824042/security/manager/ssl/src/nsNSSIOLayer.cpp#l3349 . Now if someone in the future wants to fix up the "1143 // I'm too lazy right now." part of this patch then nsNSSBadCertHandler() would presumably stop exiting right away, and then the assumption in http://hg.mozilla.org/mozilla-central/file/1b0de5824042/security/manager/ssl/src/nsNSSIOLayer.cpp#l3445 , "We ignore the result code of the cert verification", would probably need to be revisited. So I suggest removing this comment on line 3437 in the patch and instead adding a comment at line 1143 "I'm too lazy right now." saying that the future not-so-lazy person needs to revisit http://hg.mozilla.org/mozilla-central/file/1b0de5824042/security/manager/ssl/src/nsNSSIOLayer.cpp#l3445 as well.


Okay those are all my comments.
(In reply to comment #56)
> Nothing I raised should be considered a blocker: any progress on this would
> be welcome to me, with or without any of the wishlist features.

I'm less sanguine about this. It's not exactly that I consider the good (this patch) to be the enemy of the best (my wishlist)--it's more that I consider the "useful for other people's use cases but not mine" (this patch) to be potentially the enemy of the "useful for my use case" (a patch that calls the hook before each HTTP request and include the full URL in the information made available to the hook). :-)

So, I tried to help by reviewing this patch just in case it helps someone else, in order to look for security problems, or in case a future patch can be added to do what I want, but this patch as it stands is actually not useful for my purposes. :-)
I forgot to include a very important part of code review: positive words and thanks. :-) Thank you Kai Engert for this patch! It is clearly written and I found no bugs, and I especially appreciated the comments which helped guide me through reading it.
(In reply to comment #57)
> Why is it insufficient for my project? Mainly, I need for the plugin to be
> given the full URL of the query--not just the hostname/IPaddr--and this
> implies that I need to plugin to be invoked before every HTTP request, not
> just during connection establishment. Please see comment 32 and comment 43
> for more details.

I'm not all that familiar with this part of the code myself, so possibly there is an obvious reason why this would not work, but would it suffice to have your extension also define a http-on-modify-request observer (see https://developer.mozilla.org/en/Setting_HTTP_request_headers )? That hook is already invoked before every HTTP request.  You would need to communicate between your code for that hook and your code for the new hook manually.
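
For what it's worth, a C++ component can watch that topic through the real nsIObserverService API (the myObserver object here is an assumed nsIObserver implementation):

// Register for the existing http-on-modify-request notification.
nsCOMPtr<nsIObserverService> obs =
  do_GetService("@mozilla.org/observer-service;1");
if (obs)
  obs->AddObserver(myObserver, "http-on-modify-request", PR_FALSE);

// Inside myObserver's Observe(aSubject, aTopic, aData):
//   nsCOMPtr<nsIHttpChannel> channel = do_QueryInterface(aSubject);
//   nsCOMPtr<nsIURI> uri;
//   if (channel)
//     channel->GetURI(getter_AddRefs(uri));  // the full URL is available here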
Zooko, thanks a lot for your feedback. I think this hook actually isn't appropriate for you, because of SSL session reuse. You want something else. Brian is going to explain it to you, he said. Maybe you can use nsIContentPolicy.
(In reply to comment #57)
> 
> "treatDecisionAsBinding" is underspecified and the current implementation
> ignores it.

The only reason it's currently ignored is that we don't have the full implementation yet. This output parameter is one of the primary response values of the hook. Even if we don't use it perfectly yet, I expect that it will be used very soon, probably in the next iteration of the code that reads the results from the hook (after we were able to gain some experience with using the hook).


> It should be removed from the interface. 

I don't see a strong argument to remove it. Having this output parameter available immediately simplifies the future programming. It doesn't hurt to have it, if we agree that this value is a primary output parameter of the hook.


> The comment on line 1097:
> 
> 	  * NB: This is our only defense against Man-In-The-Middle (MITM) attacks!

Actually, this is a very old comment that I had copied. I think it's obsolete and incorrect. Instead of improving it, I suggest I simply remove it. The world is much more complicated, and there's no need to summarize it in a single comment line.


> I'm not sure of the implications of this comment:
> 
> 3437    // TODO: Well, now that we're doing lots of additional checks in
> 3438    // PSM_SSL_PKIX_AuthCertificate, this no longer covers all failure
> 3439    // scenarios.

This code is inside the "bad cert handler", the PSM code that is only called when our cert verification returned a failure, and NSS calls us saying "go deal with the error: report it, or tell NSS that we want to proceed anyway".

Currently, the patch's handling should be sufficient, because all such failures are converted into a single error code (revoked), and we already handle that error code explicitly.

So, in other words, this comment is a reminder: whatever additional failure results we allow (or produce) in PSM_SSL_PKIX_AuthCertificate, here is the place where we must deal with them, obtain them, report them, etc.


> Does "lots of additional checks" refer to auth override callbacks?

I'm referring to this new extension point, which allows for an unknown number of additional checks.


> In the
> current implementation line 3437 in nsNSSBadCertHandler() will never be
> reached if the auth override callback returned anything other than
> nsIX509Cert::VERIFIED_OK, because if it did then
> PSM_SSL_PKIX_AuthCertificate would have set error to
> SEC_ERROR_REVOKED_CERTIFICATE and then nsNSSBadCertHandler() would exit at
> its beginning:

Yeah, correct; in the initial implementation, that's the way it is. But I expect we want to change that. So, in order to make sure that future coders don't have to guess, I wrote my comment as a "code bookmark", so that future coders can find this more easily.


> Now if someone in the future wants to fix up
> the "1143 // I'm too lazy right now." part of this patch then
> nsNSSBadCertHandler() would presumably stop exiting right away, and then the
> assumption in
> http://hg.mozilla.org/mozilla-central/file/1b0de5824042/security/manager/ssl/
> src/nsNSSIOLayer.cpp#l3445 , "We ignore the result code of the cert
> verification", would probably need to be revisited.

If you were able to conclude this from the current code, then future coders should be able to draw that same conclusion.

Both places point to each other by mentioning the respective other function name.

Your referring to line numbers in the non-MXR patched lines is difficult to follow. If you still think that my comments are not sufficient, you could attempt to write better comments. Please quote my full comments that you would like to replace, and write your new proposed version of the comments, to avoid having me guess.
Attached patch Patch v4 (obsolete) — Splinter Review
There have been a lot of context changes.

This is v3 merged to current mozilla-central.
Attachment #524759 - Attachment is obsolete: true
Attachment #524759 - Flags: superreview?(rrelyea)
Attachment #524759 - Flags: review?(bsmith)
Attachment #534112 - Flags: superreview?(rrelyea)
Attachment #534112 - Flags: review?(bsmith)
(In reply to comment #62)
> 
> The only reason it's currently ignored is that we don't have the
> full implementation yet. This output parameter is one of the primary
> response values of the hook. Even if we don't use it perfectly yet, I expect
> that it will be used very soon, probably in the next iteration of the code
> that reads the results from the hook (after we were able to gain some
> experience with using the hook).

I'm not convinced that this means it should be included in this release, but I don't know how Firefox handles the evolution of function arguments over releases. If you're not going to remove it, and you're not going to implement it, then I still suggest that you define it. Currently it seems to be "The hook returns a bool, which the caller may or may not respect, but if it does respect it then it means the caller will use the hook's decision about validity and not use any other checks. In this version the caller never respects the bool.". Is that right? I still don't like these semantics, but at least writing it down might help.

But, this is not in my opinion a blocker for this patch.


> So, in other words, this comment is a reminder: Whatever additional failure
> results we allow (or produce) in PSM_SSL_PKIX_AuthCertificate, here is the
> place were we must deal with it, obtain it, report it, etc.

Yes, but I think that reminder should be located where the code is whose change would make this necessary, not where the new code will need to spring up once this becomes necessary. :-)

> If you were able to conclude this from the current code, then future coders
> should be able to draw that same conclusion.

I would probably not have learned this from reading the code -- I learned it from reading the patch.

> Your refering to line numbers in the non-mxr patched lines is difficult to
> follow.

Sorry--I've learned how to refer precisely to lines of code in the code base (using hyperlinks to hg.mozilla.org) but I do not know how to refer precisely to parts of a patch.

I have no further comments on this patch and I see nothing that should prevent it from being applied to trunk.
I implemented a minimal test addon that attempts to use this callback, and I tested it locally.

Unfortunately I ran into an error:

Attempt to use JS function on a different thread calling nsISSLAuthOverrideCallback.preInternalVerify. JS objects may not be shared across threads.
(In reply to comment #65)
> 
> Attempt to use JS function on a different thread calling
> nsISSLAuthOverrideCallback.preInternalVerify. JS objects may not be shared
> across threads.

This means that it would probably be necessary to implement extensions for this hook using C++, because the SSL transport and the SSL verification are not happening on the primary thread.
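
A skeleton for such a component might look like this (hypothetical: the interface never landed, and the method signature is reconstructed from the discussion in this bug):

// Hypothetical thread-safe C++ implementation of the proposed callback.
#include "nsISSLAuthOverride.h"  // would be generated from the proposed IDL
#include "nsCOMPtr.h"

class MyAuthOverrideCallback : public nsISSLAuthOverrideCallback
{
public:
  NS_DECL_ISUPPORTS

  NS_IMETHOD AdvanceNotification(const nsACString& aHostname,
                                 const nsACString& aIP, PRInt32 aPort)
  {
    // Called off the main thread, before the TCP connect; start any
    // DNS/DANE work here. All state access must be thread-safe.
    return NS_OK;
  }

  // ... PreInternalVerify() and FinalVerify() would follow the same pattern.
};

// The threadsafe ISUPPORTS variant uses atomic refcounting, required because
// PSM invokes the callback from its socket/verification threads.
NS_IMPL_THREADSAFE_ISUPPORTS1(MyAuthOverrideCallback, nsISSLAuthOverrideCallback)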
(In reply to comment #51)

> > (4) pass the full cert chain to finalVerify.
> 
> Patch v3 already allows that, nsIArray certChain.

No, it always passes null. The documentation for this new interface says that the extension can call cert->getChain() to get the chain; however, with libpkix enabled, cert->getChain() doesn't (necessarily) return the correct cert chain because it uses the old cert chain building logic. Instead, we need to get the cert chain that CERT_PKIXVerifyCert outputs and always pass it to the extension. With this, an extension should be able to implement the same kind of exclusion mechanism that Google Chrome is doing.

We must have automated tests that demonstrate that what we are doing actually works--that is, that an extension can actually exclude a cert from being trusted, that an extension can safely act as a no-op and say "keep going", and that multiple extensions have an opportunity to exclude a cert from working.

I am fine with the initial implementation only providing for exclusion. I am also fine with it only working for native C++ extensions. Please remove the documentation about the unimplemented features from the comments in the interface and remove the "scriptable" annotation. Please add a quick note that the extension will always be called off the main thread and thus must be thread-safe. 

I am glad you asked rrelyea to review this, but we need to also have a SR from bz or another official super-reviewer, since it is a new interface, and since this is the first time I've had to review an interface addition.

If you make the above changes and add the tests today or early tomorrow, I will do the full code review. Until then, I am going to work on enabling libpkix by default. Note that this extension point won't work without libpkix turned on.
Comment on attachment 534112 [details] [diff] [review]
Patch v4

Review of attachment 534112 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/manager/ssl/public/nsISSLAuthOverride.idl
@@ +11,5 @@
> +
> +/*
> + * Implementations of this interface MUST be multi-thread-safe
> + */
> +[scriptable, uuid(cc473506-e7f4-402c-a505-dd1b8471d663)]

Remove the scriptable annotation, since it doesn't work from JS due to threading issues.

@@ +25,5 @@
> +   * parameters:
> +   * hostname: optional (might be empty or null)
> +   * IP: optional (might be empty or null)
> +   * port: the port number of the service we're connecting to
> +   * however, at least one of hostname and IP must be non-null.

Replace this with: "For all methods, if a hostname was used to initiate the connection, then hostname will be non-null and IP will be null; otherwise, IP will be the non-null string representation of the IPv4 or IPv6 IP address for the connection and hostname will be null. port is the port number used for the connection." Move this comment to the top of the interface.

@@ +39,5 @@
> +  /*
> +   * bit flags that may be returned by the verify function
> +   */
> +  const unsigned long NO_FLAGS                                    =      0;
> +  const unsigned long FLAG_DENY_MORE_THAN_DOMAIN_VALIDATION       = 1 << 0;

Remove FLAG_DENY_MORE_THAN_DOMAIN_VALIDATION for now, since it isn't implemented, and because EV-ness is something that is inherent in the cert. This is something we can add back in the next version with clearer semantics.

@@ +42,5 @@
> +  const unsigned long NO_FLAGS                                    =      0;
> +  const unsigned long FLAG_DENY_MORE_THAN_DOMAIN_VALIDATION       = 1 << 0;
> +
> +  /*
> +   * hostname, IP, port: exactly as in advanceNotification   

Remove this line.

@@ +43,5 @@
> +  const unsigned long FLAG_DENY_MORE_THAN_DOMAIN_VALIDATION       = 1 << 0;
> +
> +  /*
> +   * hostname, IP, port: exactly as in advanceNotification   
> +   * cert: mandatory

Document that the extension might not be able to construct a chain for this cert, and that if it does construct a chain, it might not match the chain used for the internal verification or the one used by other extensions.

@@ +51,5 @@
> +   *                         decision should be treated as binding,
> +   *                         or if the application should rather treat it as a recommendation, only.
> +   *                         The app will usually follow this recommendation.
> +   *                         However, obviously, the application reserves the right
> +   *                         to make the final decision.

Document that treatDecisionAsBinding is ignored.

@@ +75,5 @@
> +   * the usual application verification, 
> +   * then the callback may block until done and return DONE.
> +   *
> +   * Returns: true = DONE = Callback is already finished with its processing.
> +   *                        Internal code will NOT call finalVerify.

This is wrong. finalVerify still gets called unless the extension tells us to stop.

@@ +99,5 @@
> +   *                              initial implementation passes null
> +   * certChain: If the application has produced the cert chain as part of its verification task,
> +   *            it may pass that chain, as a matter of optimization.
> +   *            This parameter may be null.
> +   *            If the parameter is null, the callback may attempt to call cert->getChain()

We need to change the implementation so that certChain will never be null. The certChain will never be null and it will be the same one constructed and verified by our internal (libpkix) certificate chain validation. The first element of the array will be the end-entity certificate, and the last element of the array will be the trust anchor (root) certificate.

@@ +103,5 @@
> +   *            If the parameter is null, the callback may attempt to call cert->getChain()
> +   * This function is expected to block until the callback has made its final decision.
> +   */
> +  void finalVerify(in nsIX509Cert cert,
> +		   in nsIArray certChain,

Fold cert into certChain, according to the above comment.

@@ +112,5 @@
> +		   in long port,
> +		   in nsISupports additionalInformation,
> +		   in nsISupports internalVerificationResults,
> +		   out unsigned long verificationResult,
> +		   out boolean treatDecisionAsBinding,

Document that treatDecisionAsBinding is ignored.

::: security/manager/ssl/src/nsNSSCallbacks.cpp
@@ +1092,5 @@
> +            PRBool treatDecisionAsBinding;
> +            PRUint32 flags;
> +
> +            cb->FinalVerify(nsnsscert.get(),
> +                            nsnull,

Do not pass null here; see other comments for the IDL.
Attachment #534112 - Flags: review?(bsmith) → review-
(In reply to comment #67)
> 
> No, it always passes null.

Yes, that's a limitation of the current implementation, and it is documented in the API.

In my understanding this bug is about making an API available, not about blocking the API until a perfect implementation is ready.

Patches for a better implementation are welcome.


> The documentation for this new interface says
> that the extension can call cert->getChain() to get the chain; however, with
> libpkix enabled, cert->getChain() doesn't (necessarily) return the correct
> cert chain because it uses the old cert chain building logic.

Yes. Does this need to block making this API available?


> Instead, we
> need to get the cert chain that CERT_PKIXVerifyCert outputs and always pass
> it to the extension.

Yes, does this need to block this API?


> We must have automated tests that demonstrate that what we are doing
> actually works--that is, that an extension can actually exclude a cert from
> being trusted, that an extension can safely act as a no-op and say "keep
> going", that multiple extensions have an opportunity to exclude a cert from
> working.

This hook depends on libPKIX being enabled - but FF 6 will ship with libPKIX disabled by default - so this code will be inactive unless the user explicitly adds the pref security.use_libpkix_verification and sets it to true.

I think in order to make progress on this we must take advantage of the fact that people are interested in using it and experimenting.

Unfortunately, we have not yet seen contributions that make use of the experimental hook, and provide example addons. I had offered to produce a sample build a while ago, but nobody showed interest.

This means we don't have tests yet. I tried to produce a minimal addon in JS (comment 65 + 66). This addon showed me that "call extension" works, and that the "keep going" path works correctly in case the addon is malfunctioning; and I learned that we probably need a C++ XPCOM component to implement the hook, which I don't see as my priority.

My hope is, once people start to use this hook, they will publish their addons, and one of those could be minimized and reused for our automated testing, prior to making this hook enabled by default.


> I am fine with the initial implementation only providing for exclusion. I am
> also fine with it only working for native C++ extensions.

ok


> Please remove the
> documentation about the unimplemented features from the comments in the
> interface and remove the "scriptable" annotation.

I'm not sure which pieces you're referring to, or whether you still want this after my new arguments.
Can you please clarify what you'd like removed? And if so, why can't it be kept, given that we probably intend to complete the implementation in the future?


I agree that we should declare nsISSLAuthOverrideCallback
as non-scriptable.

I assume that nsISSLAuthOverrideHook can still be scriptable,
used from JS to register an addon's C++ component.


> Please add a quick note
> that the extension will always be called off the main thread and thus must
> be thread-safe.

Already there:
 * Implementations of this interface MUST be multi-thread-safe



> I am glad you asked rrelyea to review this, but we need to also have a SR
> from bz or another official super-reviewer, since it is a new interface, and
> since this is the first time I've had to review an interface addition.

Review from rrelyea is optional, not sure he will have time.

Ok, after we have "r+" we can do "sr?".


> If you make the above changes and add the tests today or early tomorrow, 

I will not work on making a C++ XPCOM addon implementation available, from scratch.

If anyone has started doing so already, please make your work available, so we can see if we can reuse it.


> I
> will do the full code review. Until then, I am going to work on enabling
> libpkix by default.

I assume this work targets FF >= 7.


> Note that this extension point won't work without
> libpkix turned on.

Yes, it's targeted at developers who want an early start with their experiments; that's my understanding. Those who do will have to accept the issues currently being tracked by bug 651246.
I will work on comment 68 and produce an updated patch in the next 2-3 hours.
(In reply to comment #68)
> > +[scriptable, uuid(cc473506-e7f4-402c-a505-dd1b8471d663)]
> 
> Remove the scriptable annotation, since it doesn't work from JS due to
> threading issues.

done



> > +   * parameters:
> > +   * hostname: optional (might be empty or null)
> > +   * IP: optional (might be empty or null)
> > +   * port: the port number of the service we're connecting to
> > +   * however, at least one of hostname and IP must be non-null.
> 
> Replace this with: "For all methods, if a hostname was used to initiate the
> connection, then hostname will be non-null and IP will be null; otherwise,
> IP will be the non-null string representation of the IPv4 or IPv6 IP address
> for the connection and hostname will be null. port is the port number used
> for the connection." Move this comment to the top of the interface.

I mostly agree.
My thinking was, if a hostname was initially being used,
then we _might_ have the IP address available after having resolved it in Gecko,
and we _might_ be able to pass an IP address in addition.

So, I propose, instead of saying
  "if hostname non-null, then IP will be null"
we could say
  "if hostname non-null, then IP will be either null or non-null".


> > +  /*
> > +   * bit flags that may be returned by the verify function
> > +   */
> > +  const unsigned long NO_FLAGS                                    =      0;
> > +  const unsigned long FLAG_DENY_MORE_THAN_DOMAIN_VALIDATION       = 1 << 0;
> 
> Remove FLAG_DENY_MORE_THAN_DOMAIN_VALIDATION for now, since it isn't
> implemented, and because EV-ness is something that is inherent in the cert.
> This is something we can add back in the next version with clearer semantics.

ok


> > +   * cert: mandatory
> 
> Document that the extension might not be able to construct a chain for this
> cert, and that if it does construct a chain, it might not match the chain
> used for the internal verification or the one used by other extensions.

ok


> > +   *                         decision should be treated as binding,
> > +   *                         or if the application should rather treat it as a recommendation, only.
> > +   *                         The app will usually follow this recommendation.
> > +   *                         However, obviously, the application reserves the right
> > +   *                         to make the final decision.
> 
> Document that treatDecisionAsBinding is ignored.

In my understanding, my comment already explains that treatDecisionAsBinding may be ignored by the application.

This is about an interface. 
The current implementation follows the choices allowed by the current interface documentation.

If you document a specific implementation behaviour inside an interface description, then you must update the interface description whenever the implementation changes. I think we want to avoid that.

However, I have updated the final sentence of the existing comment and made it stronger.
Instead of saying
  The app will usually follow this recommendation.
  However, obviously, the application reserves the right
  to make the final decision.

it's now saying:
  The app will usually follow this recommendation.
  However, obviously, the application reserves the right
  to make the final decision, or to ignore this value.
  For example, even if the extension decides that a cert is good,
  the application reserves the right to check for other reasons to
  not trust the cert.

(and that's what the current implementation does).


> > +   * Returns: true = DONE = Callback is already finished with its processing.
> > +   *                        Internal code will NOT call finalVerify.
> 
> This is wrong. finalVerify still gets called unless the extension tells us
> to stop.

Thanks for noticing. I've added a "boolean skipFinalCallback" if retval is "done".


> > +   * certChain: If the application has produced the cert chain as part of its verification task,
> > +   *            it may pass that chain, as a matter of optimization.
> > +   *            This parameter may be null.
> > +   *            If the parameter is null, the callback may attempt to call cert->getChain()
> 
> We need to change the implementation so that certChain will never be null.
> The certChain will never be null and it will be the same one constructed and
> verified by our internal (libpkix) certificate chain validation. The first
> element of the array will be the end-entity certificate, and the last
> element of the array will be the trust anchor (root) certificate.

For the sake of getting something done now, I propose to allow null, and work on the implementation to pass a full chain later.

I thought the intention of this hook is to enable extensions to verify a cert using their own strategy. I don't see why giving our internally obtained chain is mandatory. Even if we later make it mandatory, and later promise it will always be passed, this enhancement of the implementation can still be done at a later time.


> > +   *            If the parameter is null, the callback may attempt to call cert->getChain()
> > +   * This function is expected to block until the callback has made its final decision.
> > +   */
> > +  void finalVerify(in nsIX509Cert cert,
> > +		   in nsIArray certChain,
> 
> Fold cert into certChain, according to the above comment.

I'm not sure about the meaning of "fold".
Is your proposal to remove the parameter "cert" from the API?

If yes, this proposal surprises me.

First, "cert" is a primary key, that an extension could use to match an earlier call from preInternalVerify to finalVerify. It would be difficult for an extension to find that primary key in an unsorted list of certs. (At the very least it could require additional logic in an extension, and why not simply avoid that, by keeping cert separate?)

Second, you don't say *why* you are proposing this; please do. Is your desire to introduce an additional semantic? Are you requesting that the extension should do its work for each of the certs in the chain?

The API is designed to verify a given cert in the context of a hostname.

Maybe an extension is able to verify the EE cert in that context, but the extension might not know how to verify an intermediate in the context of a hostname. The extension would have to return different verification results for EE and intermediate. Given that a security validation function is supposed to return the worst decision as the overall decision, it would have to return "fail".

I think we must clearly define what this API is supposed to do.

Is it supposed to say
(a) perform verification of a server's cert using alternative strategies
or is it
(b) perform verification of a list of certificates, which NSS/PSM
    has assembled as the preferred chain of trust,
    and tell us whether all of these certs are valid 
    in the opinion of an extension.

In my understanding, we are trying to do (a).

In my opinion, if we ever want to do (b),
we can do so. PSM can simply call the extension multiple times,
i.e. once for each of the (chain) certs we want to have verified.

In my opinion, calling the API multiple times is cleaner, because it allows
us to look at the results for each of the certs separately,
and it allows us to treat extension verification results differently;
e.g. we might decide that PSM always respects the decision regarding the EE
cert, but that PSM may ignore extension decisions about
our preferred intermediates.


> > +		   in long port,
> > +		   in nsISupports additionalInformation,
> > +		   in nsISupports internalVerificationResults,
> > +		   out unsigned long verificationResult,
> > +		   out boolean treatDecisionAsBinding,
> 
> Document that treatDecisionAsBinding is ignored.

Done, see my changed comment above.


> > +            cb->FinalVerify(nsnsscert.get(),
> > +                            nsnull,
> 
> Do not pass null here; see other comments for the IDL.

I would like to keep the null, as I don't have a quick solution for passing the correct chain; see also my other comments.
Attachment #534112 - Flags: superreview?(rrelyea)
Attached patch Patch v5 (obsolete) — Splinter Review
Addressed some of Brian's requests.
Attachment #534298 - Flags: superreview?
Attachment #534298 - Flags: review?
(In reply to comment #69)
> (In reply to comment #67)
> > 
> > No, it always passes null.
> 
> Yes, that's a limitation of the current implementation, and it is documented
> in the API.
> Yes. Does this need to block making this API available?

Yes, because it affects the interface of the API. It is easy to fix, AFAICT: you pass another output parameter of type cert_po_certList to CERT_PKIXVerifyCert, and it will spit out the cert chain in the parameter's value.pointer.certList. Then, extract those certificates in order and stuff them into the array. You can do this unconditionally (even if no callbacks are registered), because I need the output cert chain for other libpkix-related work.
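
Roughly, a sketch (assuming the CERT_PKIXVerifyCert in/out-param types from certt.h; "cert" and "pinArg" stand for the values PSM already has in hand, and the exact out-param field name should be double-checked):

  CERTValInParam cvin[1];
  CERTValOutParam cvout[3];

  cvin[0].type = cert_pi_end;

  cvout[0].type = cert_po_trustAnchor;
  cvout[0].value.pointer.cert = nsnull;
  cvout[1].type = cert_po_certList;       // request the validated chain
  cvout[1].value.pointer.chain = nsnull;  // filled in on success
  cvout[2].type = cert_po_end;

  SECStatus srv = CERT_PKIXVerifyCert(cert, certificateUsageSSLServer,
                                      cvin, cvout, pinArg);
  if (srv == SECSuccess) {
    CERTCertList *builtChain = cvout[1].value.pointer.chain;
    // walk with CERT_LIST_HEAD / CERT_LIST_NEXT (EE first) to fill the
    // nsIArray, then CERT_DestroyCertList(builtChain);
  }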

> > We must have automated tests that demonstrate that what we are doing
> > actually works--that is, that an extension can actually exclude a cert from
> 
> This hook depends on libPKIX being enabled - but FF 6 will ship with libPKIX
> disabled by default - this code will be inactive unless the user explicitly
> adds the pref security.use_libpkix_verification and sets it to true.

Enabling libPKIX by default should not be blocked by (the lack of) these tests, but that is what your proposal would do.

> My hope is, once people start to use this addon, they will publish their
> addon, and this could be minimized and reused for our automated testing,
> prior to making this hook enabled by default.

They will be able to do this in FF7 Nightly and we will have even more flexibility to modify it based on their feedback that way.

> Can you please clarify what you'd like removed? And if so, why it can't be
> kept, because we probably intend to complete the implementation in the
> future?

I indicated this in the review I did yesterday.

> Ok, after we have "r+" we can do "sr?".

They can be done in either order. I recommend requesting the SR now, as the SRs are busy. But I don't know who the best person to request it from is.

> I will not work on making a C++ XPCOM addon implementation available, from
> scratch.

We need simple automated tests, like we have for every new feature, not an add-on.

> I assume this work targets FF >= 7.

Yes. I am doing the work now that will allow it to be turned on early in the FF7 cycle.
By the way, there's a reason it's important to give our chain to the callback, instead of letting the callback build its own: Most people working on DANE- and DANE-like stuff are assuming that certificate chains are built according to the way the TLS specification recommends. However, we (and other browsers) do not follow the TLS specification regarding certificate chain building, but instead do PKIX(-ish) cert chain building. There can be significant differences between certificate chains built using each of those two methods. A path built using PKIX path building may side-step an exclusion policy/implementation that assumes the TLS path building semantics. By giving our built chain to the extension, the extension can ensure that it has the chain that the rest of Firefox will be operating on.
>   cb->FinalVerify(nsnsscert.get(),

Thanks to Honza for suggesting this should be:
  nsrv = cb->FinalVerify(nsnsscert.get(),
(In reply to comment #73)
> 
> Yes, because it affects the interface of the API. It is easy to fix, AFAICT:
> you pass in another output parameter of type cert_po_certList to
> CERT_PKIXVerifyCert, and it will spit out the cert chain the parameter's
> value.pointer.certList. Then, extract those certificates in order and stuff
> then into the array. You can do this unconditionally (even if no callbacks
> are registered), because I need the output cert chain for other
> libpkix-related work. 

Ok, I will do it.


> > > We must have automated tests that demonstrate that what we are doing
> > > actually works

I have little to no experience writing new tests, so I wouldn't be able to get this done in time. Honza is trying to get a minimal test done. Thanks Honza!


> They will be able to do this in FF7 Nightly and we will have even more
> flexibility to modify it based on their feedback that way.

I understand you're trying to move this to FF7, but my understanding was that we'd like to get it done today.


> > Can you please clarify what you'd like removed? And if so, why it can't be
> > kept, because we probably intend to complete the implementation in the
> > future?
> 
> I indicated this in the review I did yesterday.

Yes, but I'm not sure which pieces you mean; I would appreciate a clarification. Maybe you could simply copy/paste what you still want removed.

I suspect that this is about "treatDecisionAsBinding", but I'm not sure this is what you're referring to, and I had hoped my arguments could convince you.
Attached patch Patch v7 (obsolete) — Splinter Review
Attachment #534298 - Attachment is obsolete: true
Attachment #534298 - Flags: superreview?
Attachment #534298 - Flags: review?
Attachment #534435 - Flags: superreview?
Attachment #534435 - Flags: review?
Attached patch C++ test component (v c1) (obsolete) — Splinter Review
I no longer think that nsISSLAuthOverrideCallback must be non-scriptable.

The interface must be implemented by C++ code, but why not allow that component to be accessed by JS?

It's easiest to implement tests using JS.

I think it should be OK to allow nsISSLAuthOverrideCallback to be scriptable, even if we require it to be implemented in C++.

That way we can use

  var override = Components.classes["@mozilla.org/security/sslauthoverridehook;1"]
                           .getService(Components.interfaces.nsISSLAuthOverrideHook);
  var impl = Components.classes["@mozilla.org/test-sslauthhook-impl;1"]
                       .getService(Components.interfaces.nsISSLAuthOverrideCallback);
  override.addAuthOverrideCallback(impl);

from a test.

Honza thinks getService in JS will fail if the interface is not scriptable.
FYI, Tyler Close told me that, for his YURL/web-key work (where the DNS name
contains a hash of one of the certificates in the chain), it'd be
sufficient for the hook to get just the leaf certificate. Even if the hook
doesn't get the full chain, his system can do additional work on the
certificate-generation side to make the necessary non-leaf certs available.

In his words: "The YURL implementation can get by with just the end entity
certificate. I was talking with Ben Laurie and it's easy enough to stuff the
root public key into the end entity certificate as a certificate extension."


It'd be nice if the hook were usable by pure JS: I'm hoping to build a Jetpack API for this, and we don't currently have a way to use compiled C++ code in jetpack modules. I'm sure we can find a way, but it'd be a drag. Also, am I safe in assuming that this is a synchronous/blocking hook? Async would be easier to use, of course, but I imagine it could be a lot harder to implement.
Brian, you said adding a test is mandatory.

However, the usual tests will never execute the code in question.

In my understanding, our tests are run with the default prefs.
This means our automated tests will always fail if they try to run these tests, right?

Should we make test execution conditional on pref value security.use_libpkix_verification having non-default value true?
I'm pretty sure there's a way for individual tests to toggle prefs (and then put them back again).  Ted would know for sure.
(In reply to comment #82)
> I'm pretty sure there's a way for individual tests to toggle prefs (and then
> put them back again).  Ted would know for sure.

Problem is, this particular pref is evaluated only at startup.

It's a pref to toggle old code vs. new code.
First, thanks a lot to Honza, who has jumped in today and helped out. 

We have a test that works, with restrictions.
The restrictions are:

- because this code is active only with non-default prefs,
  we cannot test it by default

- the automated test only runs if prefs set
    security.use_libpkix_verification = true
  prior to startup

Is that sufficient?

The fact is, we do have a test; it can be run locally, and it will also run automatically as soon as this test is enabled.


The second issue is:
- the callback must be implemented in C++
- this means we need a binary C++ component
- we are using chrome mochitest, because xpcshell cannot be used,
  because SSL testing is not implemented (according to Honza)
- this means the component ends up in dist/bin/components
- this is bad, because we obviously don't want to ship 
  a binary test component in release builds :(
Comment on attachment 534455 [details] [diff] [review]
C++ test component (v c1)

Review of attachment 534455 [details] [diff] [review]:
-----------------------------------------------------------------

In this test, we should also include a sanity check that the correct cert chain is provided to the callback--e.g. by comparing the distinguished names and/or keys of the certs in the passed-in chain to expected values. Also, that chain should include at least three certs (EE, intermediate, and CA).
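
Something like this inside the test callback would do (a sketch only; the expected subject names are placeholders, not values from the test databases):

  #include "nsArrayUtils.h"  // do_QueryElementAt
  #include "nsCOMPtr.h"
  #include "nsIX509Cert.h"
  #include "nsString.h"

  static PRBool
  ChainMatchesExpectation(nsIArray *certChain)
  {
    // placeholder values; use the subjects of the real test certs
    static const char *kExpectedSubjects[] = {
      "CN=pass.authhooktest.com", "CN=Test Intermediate", "CN=Test CA"
    };
    PRUint32 length = 0;
    certChain->GetLength(&length);
    if (length < 3)  // EE, intermediate, root
      return PR_FALSE;
    for (PRUint32 i = 0; i < 3; ++i) {
      nsCOMPtr<nsIX509Cert> cert = do_QueryElementAt(certChain, i);
      if (!cert)
        return PR_FALSE;
      nsAutoString subject;
      cert->GetSubjectName(subject);
      if (!subject.EqualsASCII(kExpectedSubjects[i]))
        return PR_FALSE;
    }
    return PR_TRUE;
  }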
(In reply to comment #84)
> 
> - because this code is active only with non-default prefs,
>   we cannot test it by default
> 
> - the automated test only runs if prefs set
>     security.use_libpkix_verification = true
>   prior to startup
> 
> Is that sufficient?

I concluded it's not sufficient.
I changed the code to allow this pref to be changed at runtime.
Attached patch Patch v14 (obsolete) — Splinter Review
This replaces v7.
This is not testing code, but extended code for PSM.
It allows switching security.use_libpkix_verification at runtime.
Attachment #534435 - Attachment is obsolete: true
Attachment #534435 - Flags: superreview?
Attachment #534435 - Flags: review?
Attached patch C++ test component (c14) (obsolete) — Splinter Review
Unfortunately, this still has the binary module in bin/ and bin/components and mentions it in bin/components/components.manifest.

We haven't (yet) been able to make it work without it living there.
This contains updated certificate databases that make our test certificate valid for 4 additional hostnames, as used in testing.

Our test scripts use the hostname as an agreed value; the component fails/succeeds based on the agreed hostnames.
Attachment #534576 - Attachment description: certificate and hostname infrastructure db14 → certificate and hostname infrastructure (db14) [made by Honza]
I would like to thank Honza for jumping in and being of wonderful help today.
If I understand correctly, all of the remaining work is to produce tests that are acceptable.

I would like to ask for contributions from Mozilla to get this rather complicated setup done (a binary test component that is available from mochitest chrome, but is not shipped). I have no experience with getting this done.

Because of this I'm removing myself as the bug assignee.

I will gladly help out if you need additional code changes in PSM.
Assignee: kaie → nobody
Attached patch FINAL - v14 (obsolete) — Splinter Review
This is a final patch that works for me locally:
- builds after clobber
- the testing C++ component is not distributed (it is registered at runtime)
- the preference for PKIX auth is set (and reset) at runtime just for the tests
Assignee: nobody → honzab.moz
Attachment #534603 - Flags: review?(bsmith)
You can set NO_DIST_INSTALL=1 in a Makefile to ensure that your component doesn't wind up in dist/bin/components. You can then just copy it to the Mochitest folder in your Makefile where you copy the test files, and you ought to be able to register it using the nsIComponentRegistrar APIs.
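
For the registration step, something like this should work (a sketch; how you locate the copied component file is elided):

  #include "nsCOMPtr.h"
  #include "nsIComponentRegistrar.h"
  #include "nsIFile.h"
  #include "nsXPCOM.h"

  static nsresult
  RegisterTestAuthHook(nsIFile *componentFile)
  {
    nsCOMPtr<nsIComponentRegistrar> registrar;
    nsresult rv = NS_GetComponentRegistrar(getter_AddRefs(registrar));
    if (NS_FAILED(rv))
      return rv;
    // loads the binary module and registers the components it exports
    return registrar->AutoRegister(componentFile);
  }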
(In reply to comment #94)
> You can set..

Exactly what we do, thanks.
Attached patch v14 - v15 IDIFF (obsolete) — Splinter Review
Based on bsmiths IRC review.
Attachment #534112 - Attachment is obsolete: true
Attachment #534455 - Attachment is obsolete: true
Attachment #534574 - Attachment is obsolete: true
Attachment #534575 - Attachment is obsolete: true
Attachment #534576 - Attachment is obsolete: true
Attachment #534577 - Attachment is obsolete: true
Attachment #534641 - Flags: review?(bsmith)
Honza and I agreed to remove advancedNotification since it didn't get called, wasn't tested, and wouldn't be called from the DNS (pre-)resolution code like it is eventually intended to be. Also, Honza removed the redundant cert parameter from the finalVerify function as I requested. These changes are in v15.

The most serious problem was that in v14, when any callback's preInternalVerify returned DONE, *all* callbacks' finalVerify calls were skipped. Instead, only the finalVerify calls for the callbacks that returned DONE from preInternalVerify should be skipped. Honza also made this change in v15.

Otherwise, the diff between v14 and v15 is bug fixes found during the v14 review.

v15 introduces some problems: if we connect to a host by IP address, then the "hostname" parameter of the preInternalVerify callback will be the same as the "ip" parameter, instead of NULL. I propose that we leave it this way, rename "hostname" to "hostnameOrIP" in the IDL, and change the documentation to indicate that "hostnameOrIP" is either the hostname or the IP address, that "ip" will always be the (IPv4 or IPv6) IP address, and that if we don't have a hostname, then the "hostnameOrIP" parameter equals the "ip" parameter. Otherwise, callbacks will not be able to use the IP address as a factor during validation (e.g. to verify that the hostname & IP address match according to DNSSEC-signed A/AAAA records) when a hostname is used to connect.

Also, the documentation for the certChain parameter of finalVerify didn't get updated. It should be "certChain: The first element is the end-entity certificate, the last element is the root certificate."

Also, there is this code in nsNSSCallback, which does a null check after using the pointer:
    ...
    registeredAuthCallbacks->Count(&count);
    if (registeredAuthCallbacks) {
    ...

I will fix this before check-in.
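
Presumably the fix is just to reorder, i.e. (a sketch):

    PRUint32 count = 0;
    if (registeredAuthCallbacks) {
      registeredAuthCallbacks->Count(&count);
    }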

Honza pushed v15 to try and it broke (as of now) Linux and Mac builds. I am looking into it: http://tbpl.mozilla.org/?tree=Try&rev=376de52ae6ae
Here is Honza's patch. Follow-up coming.
Attachment #534603 - Attachment is obsolete: true
Attachment #534641 - Attachment is obsolete: true
Attachment #534603 - Flags: review?(bsmith)
Attachment #534641 - Flags: review?(bsmith)
Attachment #534660 - Flags: review+
A few things:

1) I'm not too interested in patches that evaluate the URL.  The problem is that SOP is linked to the domain name specifically; there's way too much risk of a plugin letting https://foo/HASH1 script against https://foo/HASH2.  Sorry Zooko.
2) I have no personal interest in IP validation, but if it's convenient to send to the plugin I'm OK with that.  Won't this eventually cause messes with DNS pinning?
3) I was about to say I can't build HASTLS/HSTS-over-DNSSEC without a PENDING state, but if there's one thing I should be able to do with an extension it's hijack and block all outbound HTTP :)
4) Yeah, we need the entire chain.
Comment on attachment 534660 [details] [diff] [review]
v15 - needs follow-up patch

Review of attachment 534660 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/manager/ssl/src/nsNSSCallbacks.cpp
@@ +1011,5 @@
> +        auth_override->GetRegisteredCallbacks(getter_AddRefs(registeredAuthCallbacks));
> +      }
> +      
> +      PRUint32 count;
> +      registeredAuthCallbacks->Count(&count);

These two lines must go inside the following if (registeredAuthCallbacks) statement.

@@ +1020,5 @@
> +            ip.SetCapacity(64);
> +            PR_NetAddrToString(&peeraddr, ip.BeginWriting(), 64);
> +            ip.SetLength(strlen(ip.BeginReading()));
> +          }
> +        }

The check for hostnameUTF16.IsEmpty() is wrong. hostnameUTF16 will never be empty; when the connection is made with an IP address instead of a hostname, hostnameUTF16 will hold the string representation of the IP address. We need to remove the "if (hostnameUTF16.IsEmpty())" condition so that we always initialize "ip", and we should update the documentation for the callback; see my previous comment in the bug.
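
That is, roughly (reusing the names from the hunk above; I've also added a check of PR_NetAddrToString's status, which the patch omits):

  ip.SetCapacity(64);
  if (PR_NetAddrToString(&peeraddr, ip.BeginWriting(), 64) == PR_SUCCESS) {
    ip.SetLength(strlen(ip.BeginReading()));
  }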

@@ +1033,5 @@
> +          if (cb) {
> +            PRUint32 verificationResult;
> +            PRBool treatDecisionAsBinding;
> +            PRUint32 flags;
> +            PRBool retval_done;

These need to be initialized to sane values, in case the PreInternalVerify function doesn't set them.

@@ +1049,5 @@
> +                                         &flags,
> +                                         nsnull,
> +                                         &retval_done);
> +            
> +            if (NS_SUCCEEDED(nsrv) && retval_done) {

It doesn't make sense to silently ignore failures if the user is counting on the callback for his safety. We must stop and fail validation if the callback failed.

@@ +1065,5 @@
> +                // Store the exact error code, or even multiple errors
> +                // Change the nsNSSBadCertHandler to check for the non-nss-verification-error,
> +                // if present, don't attempt to retrieve error status from NSS
> +                // But for now, let's simply say REVOKED in all scenarios
> +                PR_SetError(SEC_ERROR_REVOKED_CERTIFICATE, 0);

We shouldn't enable this mechanism on Aurora until we have some sane way of reporting errors. "Revoked" has a very specific meaning. During testing even I was confused by the "revoked" certificates that weren't really revoked. We need to design some sane error reporting.

@@ +1066,5 @@
> +                // Change the nsNSSBadCertHandler to check for the non-nss-verification-error,
> +                // if present, don't attempt to retrieve error status from NSS
> +                // But for now, let's simply say REVOKED in all scenarios
> +                PR_SetError(SEC_ERROR_REVOKED_CERTIFICATE, 0);
> +                rv = SECFailure;

There needs to be a break here.

@@ +1139,5 @@
> +                                   &treatDecisionAsBinding,
> +                                   &flags,
> +                                   getter_AddRefs(additionalFeedback));
> +            
> +            if (NS_SUCCEEDED(nsrv)) {

Again, we shouldn't be silently ignoring failures.

@@ +1155,5 @@
> +                // But for now, let's simply say REVOKED in all scenarios.
> +                // (We should find a way to store the full results of our verification
> +                // and avoid to repeat it inside nsNSSBadCertHandler)
> +                PR_SetError(SEC_ERROR_REVOKED_CERTIFICATE, 0);
> +                rv = SECFailure;

There needs to be a break here.

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +3438,5 @@
>        cvout[1].type = cert_po_end;
>  
> +      // TODO: Well, now that we're doing lots of additional checks in
> +      // PSM_SSL_PKIX_AuthCertificate, this no longer covers all failure
> +      // scenarios.

File a follow-up bug to address this TODO comment and include the bug number in the comment.

::: security/manager/ssl/tests/authhook/Makefile.in
@@ +49,5 @@
> +MODULE = sslauthhook
> +XPIDL_MODULE = sslauthhook
> +MODULE_NAME = sslauthhook
> +CPPSRCS	= AuthHookComp.cpp nsTestAuthHook.cpp
> +NO_DIST_INSTALL = 1

In order to get this to build, I had to add:

IS_COMPONENT    = 1
FORCE_SHARED_LIB = 1

@@ +59,5 @@
> +EXTRA_DSO_LDOPTS = \
> +  $(DIST)/lib/$(LIB_PREFIX)xpcomglue_s.$(LIB_SUFFIX) \
> +  $(XPCOM_LIBS) \
> +  $(NSPR_LIBS) \
> +  $(NULL)

I had to change this to:

EXTRA_DSO_LDOPTS = \
  $(XPCOM_STANDALONE_GLUE_LDOPTS) \
  $(MOZALLOC_LIB) \
  $(NSPR_LIBS) \
  $(NULL)

::: security/manager/ssl/tests/authhook/nsTestAuthHook.cpp
@@ +27,5 @@
> +                                                nsISupports **additionalFeedback NS_OUTPARAM, 
> +                                                PRBool *_retval NS_OUTPARAM)
> +{
> +    *_retval = PR_FALSE; // not done, all other out params will be ignored
> +    return NS_OK;

This must cause a verification failure for pre-fail.authhooktest.com.
Attachment #534660 - Flags: review+ → review-
When we started implementing this, it seemed quite straightforward. But I tried to write a useful extension that did something reasonable using this interface, and I found it to be very confusing. Also, given the number of bugs we've found, it seems quite difficult to correctly implement even the subset of the interface we've tried to implement so far, even before we consider how the extension will report errors. I think we need to create a simpler and more complete design, make sure secteam will not shoot it down in security review, and make sure we have somebody allocated to implement it before we (I) continue working on this further. Especially, I want to ask secteam's opinion about what we should do when the extension fails with an error.
brian: fair enough :-)

Thanks very much to kai, honza, brian, dveditz and everyone else who was involved in the push yesterday. Your efforts, and willingness to lose sleep, are very much appreciated :-) 

I hope we'll be able to design and build the better version in time for Firefox 7.

Gerv
Something is better than nothing here.  If the complexity of this patch is out of control from feature creep, then it's OK to cut stuff.  Here's my prioritization:

P0:
   1. Plugin is called on HTTPS verify
   2. Plugin receives hostname
   3. Plugin can return:
      a) Force accept
      b) Force reject
      c) Cede to legacy verification
      d) Error
   4. Error renders at least a generic error page, possibly with option to use to bypass.

P1:
   1. Plugin receives the leaf cert
   2. Legacy verification still executed after plugin, to determine whether to mark with EV
   3. Plugin is called before an actual navigation event occurs
   4. Plugin can return an explicit error page
   5. Plugin is called in advance of actual retrieve, triggered on DNS prefetch

P2:
   1. Plugin receives additional information -- IP, Port
   2. Plugin receives the full chain, as the browser would see it

P3:
   1. Plugin can authenticate an email address?  (Huh?)
   
I specifically do not want the plugin seeing the full URL path, as that drags in the entire same origin policy problem.  We can't solve that yet.

Is there any chance we can cut the existing code to implement just the P0s, and maybe some of the P1s?
I stick to my opinion that your decision to fold the cert into the cert chain is unnecessarily complicating this bug.

I looked at the declaration of the functions that return chains from NSS. I could not find any promises about the order of certs inside the array.

I conclude that your interface shouldn't promise the order, unless you do one of the following:

(a)
- verify that NSS indeed returns the EE cert as the first element in returned
  cert lists.
- add comments to the NSS code about this promise,
  reminding future developers to never break that promise
- ensure that you do this before you deploy the code from this bug

or

(b)
- enhance this patch to assert that the EE is indeed the first one in
  the list. If it's not, you must sort the array, prior to passing it over.
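
For (b), a sketch of such an assertion, using the NSS cert list macros (where exactly this belongs in the patch is up to you):

  #include "cert.h"

  // PR_TRUE iff the first element of the libpkix-built list is the EE cert
  static PRBool
  ChainStartsWithEE(CERTCertList *chain, CERTCertificate *eeCert)
  {
    if (!chain || CERT_LIST_EMPTY(chain))
      return PR_FALSE;
    return CERT_CompareCerts(eeCert, CERT_LIST_HEAD(chain)->cert);
  }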


Also, the comment for parameter certChain no longer matches your new desired semantic; please fix that.
(In reply to comment #99)
> 4) Yeah, we need the entire chain.

Dan, for clarification, which chain are you interested in?

Are you interested in

(a) the chain or full list of certs that was sent by the
    server during the handshake

or

(b) the chain that PSM assumes to be correct,
    after PSM passed its own verification code?

The API currently will pass (b), not (a).
(In reply to comment #104)
> I stick to my opinion that your decision to fold the cert into the cert
> chain is unnecessarily complicating this bug.

Perhaps it would help to make a clear distinction between what would be _useful_ and what is _implementable_. Or, to put it another way, between requirements and code-based reality.

We should listen to the people who want to use the interface, and then design something which is suitable for their needs - an end goal. Then, we look at whether it's possible, how difficult it is and, if we can't get there all in one go, what subset we can implement first which gives as much "bang for the buck" as possible, while still allowing us to move forward to the full design later.

The fact that NSS doesn't promise the order of certificates isn't relevant to the question about whether it would be _useful_ to have the certificates in order. If it would be useful, but it's hard to do, then perhaps the design would say "certificates in order", but the first version would have a note which said "certificates are currently not returned in order; you have to order them yourself; we hope to remove this limitation in a future version".

I think this bug isn't a great place to be hashing out an interface design. Perhaps someone would like to write up a proposal on the wiki, and then post about it in mozilla.dev.security for people to review?

Gerv
I can't speak for Dan or anyone else, but for the uses EFF is planning to make of this API, we will definitely want (b) and not (a).  But in the best of all worlds, of course, extensions would be able to get either (a) or (b) :).
Yikes, I actually got that exactly backwards: we will want (a), the full chain of certificates as presented by the server.
Mozilla Corporation has decided to focus our immediate efforts on other things (e.g. bug 589537). Right now there isn't anybody available (AFAICT) to implement or review patches for this, until at least after USENIX 2011. I am planning to have something good to report around USENIX time, though.
OK, so in the meantime let me explain two reasons why we want the full chain of certificates as presented by the webserver:

1. For the Decentralized SSL Observatory

https://mail1.eff.org/pipermail/https-everywhere/2011-July/000975.html

We want to be able to detect man in the middle attacks like this one from within extensions:

https://www.eff.org/deeplinks/2011/05/syrian-man-middle-against-facebook

without the user having to click through and accept the malicious certificate before we get a callback.

2. For implementing cross-signature mechanisms inside X.509 chains.  

We believe that both the CA ecosystem and DANE/DNSSEC will remain vulnerable to various attacks forever, and the best way to protect high-security sites may be some variant of "certificate pinning" or cross-signatures on the X.509 chain from a key that is known to belong to the operators of this particular site.

(One way to get such a key is discussed here https://www.ietf.org/mail-archive/web/websec/current/msg00281.html but there are others)

The best way to cross-sign an X.509 chain is probably to have a special magic cert in the chain that contains the cross-signature.  But in order for extensions to check this cross-signature, they will need to be able to see the whole chain, regardless of what parts of it NSS thought were valid/relevant.
I'd really like an API like this for simplifying the Convergence (http://convergence.io) implementation.  As it is, I had to jump through some pretty prolific hoops in order to replace the browser's default certificate validation with my own.

It seems to me that the browser is already calling SSL_AuthCertificateHook, and that it'd be fairly straightforward to replace the SSL_AuthCertificate callback prototype with a facade that just distributes that callback across some number of registered listeners.  I can try my hand at a patch, unless Mozilla is fully committed to not supporting an API like this.
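
Something along these lines, assuming NSS's SSL_AuthCertificateHook / SSLAuthCertificate signatures (the fixed-size listener table and the "any listener can veto" policy are simplifications of my own):

  #include "prio.h"
  #include "ssl.h"

  #define MAX_LISTENERS 8
  static SSLAuthCertificate sListeners[MAX_LISTENERS];
  static unsigned sListenerCount = 0;

  static SECStatus
  AuthCertificateFacade(void *arg, PRFileDesc *fd,
                        PRBool checkSig, PRBool isServer)
  {
    for (unsigned i = 0; i < sListenerCount; i++) {
      if (sListeners[i](arg, fd, checkSig, isServer) != SECSuccess)
        return SECFailure;  // any listener can veto
    }
    // cede to NSS's default validation (the legacy path)
    return SSL_AuthCertificate(arg, fd, checkSig, isServer);
  }

  // installed per socket with:
  //   SSL_AuthCertificateHook(fd, AuthCertificateFacade, certDBHandle);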

There is a lot of distrust about DANE, and on a panel about Convergence and trust agility at Defcon this weekend, Whit Diffie agreed that leveraging DNSSEC is not a good idea for SSL authenticity moving forward.  It seems like it would make sense for Mozilla to hedge by supporting an interface that could allow for multiple solutions.
Whiteboard: [sg:want]
bsterne made a good point regarding nsIContentPolicy: we explicitly decided to limit the types of requests that we allow nsIContentPolicy to block. AFAICT, then, it should be possible to use nsIContentPolicy to implement checks that can be done on top of our normal certificate validation. If it isn't, then let's file new bugs for those additional capabilities.

Regarding the ability to disable or replace our normal certificate validation, see bug 686095.
The argument here is similar to the "restrictive or additive" argument which has gone on in the DANE WG for some time.  I think we want to allow extensions to make changes in both directions - that is, both "don't connect to this site that you otherwise would have" and "do connect to this site that you otherwise wouldn't have".  The former offers immediate security benefits to our users (e.g. protecting them from future DigiNotar-like fiascos) and can't realistically do any harm.  But the latter is what will facilitate experimenting with *new* site-authentication models that might *replace* the CA mess.  We need both.
Linking to a thread on mozilla.dev.security.policy where there's discussion of a desire for enterprises to be able to control browser trust decisions:

<https://groups.google.com/group/mozilla.dev.security.policy/browse_frm/thread/aa097c6d99fcf4eb/070b722788eb1fb2?lnk=gst&q=enterprise+advice+replace+the+root+list#070b722788eb1fb2>
Status: NEW → ASSIGNED
See Also: → 692248
Assignee: honzab.moz → nobody
Status: ASSIGNED → NEW
Assignee: nobody → pde
I'm going to get started on dev docs for a full trust control API in the next week or two.
(In reply to Peter Eckersley from comment #115)
> I'm going to get started on dev docs for a full trust control API in the
> next week or two.

any updates?
Yes.  A few of us sat down in January to try to figure out how this API should work, and I now have the task of writing up what we concluded.  I started that task on Thursday, but it's not done yet.
(In reply to Peter Eckersley from comment #117)
> Yes.  A few of us sat down in January to try to figure out how this API
> should work, and I now have the task of writing up what we concluded.  I
> started that task on Thursday, but it's not done yet.

I'm curious to hear if there has been any progress on this.
Way overdue, but here are draft dev docs for this extension API:

https://hackpad.com/PTL7OH5myvk#Draft-TLS-Trust-extension-API-for-Mozilla

they should be world-editable.
(In reply to Peter Eckersley from comment #119)
> Way overdue, but here are draft dev docs for this extension API:
> 
> https://hackpad.com/PTL7OH5myvk#Draft-TLS-Trust-extension-API-for-Mozilla
> 
> they should be world-editable.

Why are you considering that some are going to say "bad" and some are going to say "good", and when one says bad and one says good you default to bad?

This destroys ALL useful capacity of this proposal.  The entire point behind multiple validators is to permit on any good, rather than prevent on any bad.  This is to permit one single non-modifiable piece of evidence and its proof to have multiple potential formats, not to mandate that a single piece of evidence CANNOT pass ANY validator because it cannot pass ALL validators.

For example, I've got a format that will give a "self-signed certificate in chain" error under Mozilla's validator, but which pulls out multiple chains and evaluates them independently.  It is specified so that if any one is valid, the entire set is considered acceptable even if any one or more of the specified chains is signed by a CA which is considered untrustworthy.

The practical upshot: I can't use the instant proposal to implement this format with this mandate.  Thus, it is useless, and I vote NAY.
(An example of why the proposed limitation is bad)

Alice has a certificate chain, current format.  Bob goes to Alice's website, and checks the cert chain.  It validates.

Next, Bob installs the CertValidator3000 extension, which adds a validator that will not pass current standard cert chains.

Bob goes back to Alice's website.  MozValidator says it's valid.  CertValidator3000 says it's not.

The proposal is that (VALID & INVALID) shall be considered invalid.  This means that the instant a second validator is added which doesn't pass current chains, *no chains can pass*.  This shall do nothing but cause more damage to the idea of TLS as a useful or even remotely effective consumer-level protection, when it's the consumer market which most needs protection.

If this can't be safely implemented without CSP, then make CSP block this bug.
(In reply to Kyle Hamilton from comment #120)

> This destroys ALL useful capacity from this proposal.  The entire point
> behind multiple validators is to permit on any good, rather than prevent on
> any bad.  

Actually, the API should support both kinds of extensions.  For instance, the Decentralized SSL Observatory is an extension that can return INVALID for otherwise-valid certs because we have detected cryptographic problems with the keys contained in those certs.  The API should support that as well as something like Convergence which can report that some self-signed certs are VALID.

But I think you are right that there is a problem with the way the first-draft API suggested replacing the built-in validator, that will need to be resolved.  It goes like this:

1. Suppose we wish to write an extension that knows that some set X of self-signed certs are actually trustworthy.  

The way the first draft suggests implementing this is by un-registering the builtin nsICertificateValidator and implementing an alternative nsICertificateValidator whose algorithm is: if cert is in X return VALID, else call the default builtin nsICertificateValidator manually (sketched below).

2. The problem is that two such extensions cannot coexist.  They are clever enough to disable the built-in nsICertificateValidator, but not clever enough to disable each other's internal calling through to that built-in validator, which always returns INVALID for all self-signed non-roots.

I am going to try to revise the draft to fix this issue, while still meeting bsmith's objective of easy flagging for additional AMO review of extensions which return VALID for anything the builtin validator wouldn't have trusted.
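
To make the step-1 pattern concrete, a minimal sketch (plain C++ with placeholder types; none of these names are from the draft):

  #include <functional>
  #include <set>
  #include <string>

  enum ValidatorResult { VALID, INVALID, STAND_ASIDE };

  // "X" modeled as a set of certificate fingerprints the extension trusts
  ValidatorResult
  ValidateWithWhitelist(const std::set<std::string> &trustedFingerprints,
                        const std::string &certFingerprint,
                        const std::function<ValidatorResult()> &callBuiltin)
  {
    if (trustedFingerprints.count(certFingerprint))
      return VALID;
    // Step 2's problem: with two such extensions installed, each one's
    // callBuiltin still reaches the original builtin validator, which
    // returns INVALID for the *other* extension's whitelisted certs.
    return callBuiltin();
  }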
I added additional clarification about the way the first draft was intended to work:

> In addition to the VALID and INVALID responses, Validators can return STAND_ASIDE.  Validators should only return INVALID if they have an active reason to warn the user about some certificate (a clear example would be that the extension knows this key is cryptographically insecure; a more contentious example is that the extension knows this is an abnormal CA to be signing for the domain in question).  If they have no strong reason to distrust a certificate chain, they must return STAND_ASIDE.

Note that the problem I flagged in my previous comment still exists, and will need an actual change to the API's semantics.
Okay, I've updated the draft to version 2.  It now has an explicit flag which allows an extension to say that its VALID responses should override the builtin validator's INVALID responses.  This will allow additional AMO scrutiny.

It remains the case that if the user installs multiple extensions, some say VALID and others say INVALID, they will experience a warning.  This might be the wrong outcome in some cases, but the opposite is also going to be wrong in some cases.
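
In pseudo-code, my reading of the v2 semantics is roughly the following (a sketch; the names are mine, not the draft's):

  #include <utility>
  #include <vector>

  enum ValidatorResult { VALID, INVALID, STAND_ASIDE };

  // each extension: (its answer, whether it registered overrideBuiltin=true)
  ValidatorResult
  Aggregate(ValidatorResult builtin,
            const std::vector<std::pair<ValidatorResult, bool> > &extensions)
  {
    bool validOverride = false;
    for (size_t i = 0; i < extensions.size(); i++) {
      if (extensions[i].first == INVALID)
        return INVALID;  // any extension INVALID => warning
      if (extensions[i].first == VALID && extensions[i].second)
        validOverride = true;  // may rescue a builtin INVALID
    }
    if (builtin == INVALID && validOverride)
      return VALID;
    return builtin;  // STAND_ASIDE responses change nothing
  }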
There's a separate interesting question, which is perhaps worth thinking about but hard to address in this API, about the different reasons and severities of an INVALID response.

In one case, an extension that is warning about something that is likely to be a man-in-the-middle should return INVALIDs that are louder than everything else.  The user definitely needs to get a warning or some other UI response in those cases.

But there are other cases in which an INVALID just means that the connection is less secure than HTTPS normally is, but still more secure than HTTP.  A 512-bit RSA key, or a remotely factorizable key, etc., could be a reason for an extension to issue such INVALID responses.

Perhaps the API could allow extensions to return VALID, INVALID, STAND_ASIDE or INSECURE, with INSECURE being used for things like weak keys that are still preferable to HTTP, but which the site operator needs to know about and fix.  The semantics of the two could initially be the same, but one day Firefox might want to build different UI on top of them (perhaps copy Chrome's crossed-out https, with an explanatory rollover, or restore the broken lock icon, which should probably never have gone away).
Having mulled over this a bit, I think a fourth return value (INSECURE, which causes mixed-content style changes to the HTTPS UI decorations) makes sense.  We might use this for the Decentralized SSL Observatory warnings about insecure keys, or give users a setting about which kind of warning method they prefer.

It could also be used to implement extensions that downgrade certain classes of cert warnings (recent cert expiry? www.example.com cert for example.com?) to UI decoration warnings.

I've added this and bumped the draft version to 3.
I had some productive discussions with Ryan Sleevi about these kinds of APIs over the weekend, raising at least one point that's relevant here:

One use that HTTPS Everywhere will make of this API if/as soon as it is available is to implement workarounds for known important servers that speak TLS, but have the wrong domain names in their certificates.  The HTTPS Everywhere source contains numerous off-by-default rulesets for servers that have these kinds of errors because they are hosted on Akamai and other CDNs.  For instance:

https://gitweb.torproject.org/https-everywhere.git/blob/HEAD:/src/chrome/content/rules/ACM.org-mismatches.xml
https://gitweb.torproject.org/https-everywhere.git/blob/HEAD:/src/chrome/content/rules/AOL-mismatches.xml

There are hundreds of rulesets like this; see all the files with -mismatches in their names here:

https://gitweb.torproject.org/https-everywhere.git/tree/HEAD:/src/chrome/content/rules

Implementation of this API should allow HTTPS Everywhere rulesets to contain something like <target host="deliveryimages.acm.org" accept_cert="*.akamaihd.net">.  However, we should make sure that the API offers a sane way for HTTPS Everywhere to re-invoke the builtin validator with a different hostname, so that we don't end up accepting an /invalid/ Akamai cert for deliveryimages.acm.org.

My current reading is that the v3 API won't need to be changed to make this work, but it will be necessary for extensions to be able to obtain a reference to the builtin validator in order to be able to call it themselves with altered parameters (in this instance, a different domain name).
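
Concretely, such a ruleset-driven validator might look roughly like this (a sketch with placeholder types; the draft API's real signatures may differ):

  #include <functional>
  #include <map>
  #include <string>

  enum ValidatorResult { VALID, INVALID, STAND_ASIDE, INSECURE };

  // maps a target host to the certificate hostname to accept instead,
  // e.g. "deliveryimages.acm.org" -> "*.akamaihd.net"
  typedef std::map<std::string, std::string> MismatchRules;

  ValidatorResult
  ValidateWithMismatchRules(
      const MismatchRules &rules, const std::string &host,
      const std::function<ValidatorResult(const std::string &)> &builtinFor)
  {
    MismatchRules::const_iterator rule = rules.find(host);
    if (rule == rules.end())
      return STAND_ASIDE;  // no rule for this host: don't interfere
    // Re-invoke the builtin validator against the substituted name, so an
    // *invalid* Akamai cert is still rejected.
    return builtinFor(rule->second);
  }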
Another interesting and more hypothetical use of the API would be an extension that replaces cert warnings due to expired certificates with a downgraded UI experience for those domains (Chrome's crossed-out HTTPS would be ideal, any chance that Firefox could implement that? ;).

Such an extension would need to replace the builtin validator's INVALID response with a less-severe INSECURE response.  I therefore revised the proposed API to v4, making it clear that the extension's response trumps the builtin one in that case (if overrideBuiltin=true at registration of course).
Peter, see bug 628312
I think this service should NOT allow overrides to addons.mozilla and aus*.mozilla. The addon could mistakenly prevent connections to update Firefox or even itself.

Worse than that, if the addon has a security failure (or the data from which it calculates trust does), this could be leveraged to install arbitrary software on the user's machine.
(In reply to Camilo Viecco (:cviecco) from comment #130)
> I think this service should NOT allow overrides to addons.mozilla and
> aus*.mozilla. The addon could mistakenly prevent connections to update
> Firefox or even itself.

Good point.  bsmith and I had previously agreed that that was the right thing to do; I've now revised the draft to v5 to make that clear:

https://hackpad.com/PTL7OH5myvk#TLS-Trust-extension-API-for-Mozilla-%28Draft-Version-5%29
(In reply to Peter Eckersley from comment #131)
> (In reply to Camilo Viecco (:cviecco) from comment #130)
> > I think this service should NOT allow overrides to addons.mozilla and
> > aus*.mozilla. The addon could mistakenly prevent connections to update
> > Firefox or even itself.
> 
> Good point.  bsmith and I had previously agreed that that was the right
> thing to do; I've now revised the draft to v5 to make that clear:
> 
> https://hackpad.com/PTL7OH5myvk#TLS-Trust-extension-API-for-Mozilla-%28Draft-
> Version-5%29

As an organization that has been repeatedly targeted by CA compromise and/or state-supported CA-endorsed MITM, the Tor Project disagrees here.

So long as the certs used for addons and browser updates are not pinned to the *specific* leaf cert or to the hash of a public key created by Mozilla itself, we believe that addons striving to improve the CA model should also be allowed to approve or deny update version pings and downloads of either addons or the browser.

If bug 744204 changes the pinning situation for browser updates and addons before this API is implemented, our opinion might change on this specific point.

However: either way, the browser should *always* loudly alert the user if update downloads are failing while other network activity is still succeeding (because any form of update failure could represent a hold-back/"freeze" attack by an MITM intent on exploiting vulnerabilities in the non-updated browser or addon).
> So long as the certs used for addons and browser updates are not pinned to the
> *specific* leaf cert or to the hash of a public key created by Mozilla itself

Totally agreed.
This is exactly what I argued back in bug 471672, and I eventually filed bug 471779 about it.
Bug 643461 is about addon updates and bug 644403 about addon installs.
(In reply to Mike Perry from comment #132)
> So long as the certs used for addons and browser updates are not pinned to
> the *specific* leaf cert or to the hash of a public key created by Mozilla
> itself, we believe that addons striving to improve the CA model should also
> be allowed to approve or deny update version pings and downloads of either
> addons or the browser.

Like I commented in all of those bugs, we should be using digital signatures that are restricted to certificates issued by the Mozilla-owned-and-operated root that we're already using for packaged web apps distributed through the Firefox Marketplace.

> However: either way, the browser should *always* loudly alert the user if
> update downloads are failing while other network activity is still
> succeeding (because any form of update failure could represent a
> hold-back/"freeze" attack by an MITM intent on exploiting vulnerabilities in
> the non-updated browser or addon).

I agree that this is important and that it is missing in Firefox today. Do you know if there is an existing bug on doing that? If so, could you CC me? If not, could you please CC me on a new bug about that? Thanks!
On a side note, someone once pasted me a link to some extension code that would control certificate warnings using existing XPCOM interfaces.  This was a while ago, and unfortunately, I can't find that URL anymore.  So if anyone on this bug knows of such a thing, please send it again!
(It's not pretty code. But it shows you the XPCOM IDL, and you can look for more examples.)
Peter, is this still something you're interested in?
Flags: needinfo?(pde-lists)
David, I think the chances we would use it are lower now. In terms of working around known invalid certs (say, in HTTPS Everywhere), the successful launch of Let's Encrypt and the wider adoption of SNI probably reduce demand somewhat. And in terms of warning about weird ones, we're going to put our efforts into getting a clean WebExtensions API for access to the certs and other connection metadata (it won't be a blocking API though, so in some cases we'd be warning after an attack had succeeded).

Roland Shoemaker has been looking into that API.
Flags: needinfo?(pde-lists)
I'm currently working on implementing the webRequest extension API in Chromium to provide per-request SSL metadata. Once it has been finalized and the API is stable enough, I'll start work on a changeset that implements the same in Firefox, if someone doesn't beat me to it.
Ok - thanks for the feedback. Sounds like this particular route is a wontfix.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX