mozilla::pkix spends too much time attempting to build a valid path when there are many possible paths

Status: NEW
Assignee: Unassigned
Product: Core
Component: Security: PSM
Reported: 3 years ago
Modified: 2 months ago
Reporter: Michael Newton
Version: 32 Branch
Hardware: All
OS: Other
Firefox Tracking Flags: Not tracked
Whiteboard: [psm-backlog]

Attachments

(4 attachments)

(Reporter)

Description

3 years ago
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.78.2 (KHTML, like Gecko) Version/7.0.6 Safari/537.78.2

Steps to reproduce:

Attempt to load web interface of a pfSense router using default (self-signed) HTTPS certificate


Actual results:

Page hangs on loading, using 100% CPU, process has to be terminated


Expected results:

Certificate warning
(Reporter)

Comment 1

3 years ago
Created attachment 8476258 [details]
Sample taken during hang with Activity Monitor
(Reporter)

Comment 2

3 years ago
I don't have an installation open to the internet for testing against, but if this isn't a known issue and testing needs to be done, I can set one up. Setting security.use_mozillapkix_verification to false allows successful page load (with security warning.)

See also https://forum.pfsense.org/index.php?topic=79656
Can you please attach the pfSense certificates being used? (email is OK if you want to keep them private)
Flags: needinfo?(miken32)
Having a public test installation would also be useful, if possible.
(Reporter)

Comment 5

3 years ago
Created attachment 8476332 [details]
Default pfSense certificate and key

Here is the default certificate; I will work on getting a test box with public internet access tomorrow.
Flags: needinfo?(miken32)
The cert contains CA:True, which means we should talk to pfSense about modifying the way they generate their cert.
(In reply to Camilo Viecco (:cviecco) from comment #6)
> The cert contains CA:True, which means we should talk to pfSense about
> modifying the way they generate their cert.

Why is the cert verification spinning rather than just failing?
(Reporter)

Comment 8

3 years ago
Test box is now online at https://184.66.147.162/
Hi Michael - thanks for setting that up. However, I can't connect to that IP (using firefox or otherwise). Is there a firewall that needs to open a port or something?
Flags: needinfo?(miken32)
(Reporter)

Comment 10

3 years ago
Allow all rule has been added to the WAN port, sorry about that.
Flags: needinfo?(miken32)
Michael, I can connect now (using Firefox 32), but all I see is the untrusted connection page. After adding an override, I can visit the site. Is this still an issue?
Flags: needinfo?(miken32)
(Reporter)

Comment 12

3 years ago
Yes, still experiencing the problem on Firefox 33 beta 1, starting with all addons disabled. I will try with a fresh profile to confirm it's still happening in that case as well.

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:33.0) Gecko/20100101 Firefox/33.0
Flags: needinfo?(miken32)
(Reporter)

Comment 13

3 years ago
Ok I have an update on this (and apologies for not testing this before filing!) The problem seems to be happening because I already had accepted the certificate previously. Once I deleted that certificate from my database I was able to access the page, and accept the security certificate. The problem doesn't recur, though the certificate is back in the database.

I have tried creating a new profile in FF 23 and 18, accepting the certificate, and then loading it up in FF 33, but cannot reproduce the problem. However by copying over my existing cert8.db file to a new profile I can reproduce the problem.

I'd be inclined to close it as WORKSFORME and write it off as a corrupt file, except for the fact that it seems to be affecting lots of other pfSense users. I don't want to attach my cert8.db to the report, but please confirm I can email it to dkeeler for testing and I will do so.
Flags: needinfo?(dkeeler)
You can email your cert8.db to me if you feel comfortable with that. My understanding is it will have a list of every intermediate certificate you've encountered while browsing as well as root CAs you've added manually. It will also have the public half of any client certificates you have (the private keys should all be in key3.db, so don't send that, of course).
Flags: needinfo?(dkeeler)
(Reporter)

Comment 15

3 years ago
Email has been sent, thanks.
I think I figured out why that profile takes forever to verify the certificate sent by that server. The server sends a certificate with issuer "E=Email Address,CN="Common Name (eg, YOUR name)",OU="Organizational Unit Name (eg, section)",O=CompanyName,L=Somecity,ST=Somewhere,C=US". There are 16 different certificates in that profile with that same issuer/subject. Since each of these certificates could have in theory issued the others, and since (it looks like) signature verification happens starting from the root backwards, mozilla::pkix basically explores something like every possible ordering of every 7-element subset of the 16 certificates, which is 7! * 16 choose 7 = 57657600. In other words, it's trying to do around 57 million signature verifications, which understandably takes some time. If we checked signatures the other way, I think we would prevent this, but I have an inkling it's not safe to do that for some reason (although I'm not sure why, since intuitively if every signature is checked, it ends up being the same either way).
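For anyone who wants to check the arithmetic above, a minimal sketch follows. The constant names are made up for illustration; the numbers 16 and 7 are the ones given in the comment.

```python
import math

STORED_CERTS = 16  # certificates in the profile sharing one subject/issuer name
PATH_LENGTH = 7    # chain positions filled from those certificates

subsets = math.comb(STORED_CERTS, PATH_LENGTH)  # "16 choose 7"
orderings = math.factorial(PATH_LENGTH)         # 7!
candidate_paths = subsets * orderings

print(f"{subsets} subsets x {orderings} orderings = {candidate_paths} candidate paths")
```

The product is the same as the number of ordered selections of 7 certificates out of 16, i.e. `math.perm(16, 7)`.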
Brian, unless I'm misunderstanding pkixbuild.cpp, it seems like building a chain without first verifying signatures like we're doing is at least inefficient, if not just wrong. In particular, things like comment 16 happen. Also, pkixbuild.cpp will call trustDomain.IsChainValid on a chain where the signatures haven't actually been validated (although it will validate them before returning an ultimate success). Can you shed some light on this? Thanks.
Flags: needinfo?(brian)
(In reply to David Keeler (:keeler) [use needinfo?] from comment #17)
> Brian, unless I'm misunderstanding pkixbuild.cpp, it seems like building a
> chain without first verifying signatures like we're doing is at least
> inefficient, if not just wrong.

The signature verification is the most expensive check, so we try to do it after we've verified everything else is OK. Consider:

     - B <- C <- D
    /
A <--- E <- F <- G
    \
     - H <- I <- J


Let's say that only J is a trusted CA. If you do the signature verification on the way up the tree, then you will do 9 signature verifications. If you do signature verification on the way down the tree, then you will only do 3. 3 < 9.
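A toy model of the tree above may make the 9-versus-3 count concrete. This is only a sketch: the `ISSUERS` map, the `TRUSTED` set, and both function names are made up for illustration, every signature is assumed to be valid, and candidate issuers are tried in the order drawn (B, E, H), so the trusted branch is reached last.

```python
# Candidate issuers for each certificate, following the arrows in the diagram.
ISSUERS = {
    "A": ["B", "E", "H"],
    "B": ["C"], "C": ["D"],
    "E": ["F"], "F": ["G"],
    "H": ["I"], "I": ["J"],
}
TRUSTED = {"J"}  # only J is a trust anchor


def build_verifying_on_the_way_up(cert, counter):
    """Check each signature as soon as a candidate issuer is picked."""
    if cert in TRUSTED:
        return True
    for issuer in ISSUERS.get(cert, []):
        counter["verifications"] += 1  # cert's signature checked against issuer
        if build_verifying_on_the_way_up(issuer, counter):
            return True
    return False


def build_verifying_on_the_way_down(cert, counter, path=()):
    """Find a trust anchor first, then check only the links on that path."""
    path = path + (cert,)
    if cert in TRUSTED:
        counter["verifications"] += len(path) - 1  # one check per link
        return True
    for issuer in ISSUERS.get(cert, []):
        if build_verifying_on_the_way_down(issuer, counter, path):
            return True
    return False


up = {"verifications": 0}
down = {"verifications": 0}
build_verifying_on_the_way_up("A", up)
build_verifying_on_the_way_down("A", down)
print(up["verifications"], down["verifications"])
```

The counts depend on the trusted branch being explored last; if J's branch came first, both strategies would do 3 verifications in this model.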

> In particular, things like comment 16
> happen.

> Also, pkixbuild.cpp will call trustDomain.IsChainValid on a chain
> where the signatures haven't actually been validated (although it will
> validate them before returning an ultimate success).

Again, this is a good thing, right? Because if you reject a key as "not pinned" then you're trading an expensive signature verification for a cheap hash comparison.

Consider this:

A <- A <- A <- A <- A <- A <- A <- A

You have 8 certificates, all with the same subject name. Further, only one of them (the last one) is a trust anchor. If you try really hard, it is possible to put the certificates in the right order so that you get a valid chain. But, trying really hard takes a lot of time, as you saw.

Obviously, it isn't good to have an O(n!) algorithm. But, in fact, there really are O(n!) possible chains sometimes, and sometimes one (and only one) is a valid chain. So, if we want to guarantee that you spend less than O(n!) time in the worst case, we'd need some kind of heuristic.
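To get a feel for that growth, here is a small counting sketch. It assumes a deliberately pessimistic toy model that is not taken from pkixbuild.cpp: every stored certificate has the same name, any certificate not already in the chain is a candidate issuer for the current one, and no path ever reaches a trust anchor. `count_issuer_checks` and `max_depth` are hypothetical names.

```python
import math


def count_issuer_checks(n_certs, max_depth):
    """Count candidate-issuer checks in the worst-case toy model."""

    def explore(used, depth):
        if depth == max_depth:
            return 0
        checks = 0
        for cert in range(n_certs):
            if cert in used:
                continue
            checks += 1  # one candidate issuer considered
            checks += explore(used | {cert}, depth + 1)
        return checks

    # Certificate 0 plays the role of the end-entity certificate.
    return explore(frozenset({0}), 0)


def closed_form(n_certs, max_depth):
    """Same count, as a sum of ordered selections from the other certificates."""
    others = n_certs - 1
    return sum(math.perm(others, k) for k in range(1, min(max_depth, others) + 1))


for n in range(2, 9):
    print(n, count_issuer_checks(n, max_depth=7))
```

Each additional same-name certificate multiplies the work, which is why a profile with a handful of stale certificates is fine and one with 16 is not.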

Anyway, I agree this is a bug. We shouldn't make the normal, good case of a certificate that chains to a trusted CA significantly slower in favor of more quickly handling self-signed certificates, but we should find a way to better handle this case. Not sure it should be a high priority, though.
Flags: needinfo?(brian)
See the discussion here:
https://forum.pfsense.org/index.php?topic=82295.0
OS: Mac OS X → Other
Hardware: x86 → All

Comment 20

3 years ago
We at pfSense have committed changes for the next pending version that will make new certificates in a more proper and unique way so they will work better for new deployments or for those who manually regenerate the self-signed certificate (no CA flag, server set, a unique identifier in the CN field, generated using the configured hostname and so on.)

However, that won't help the very large number of installations already out there with the less-than-ideal certificates that overlap and tickle this bug (well over 200,000), not to mention other products and sites that may have similar dodgy self-signed certificates. Now that the ability to disable PKIX has been stripped out of Firefox, we have to tell people hitting this bug to use other browsers that don't have issues processing the certificates aside from the usual warnings, such as Chrome. At least in the current version of Firefox the CPU usage dies down once the offending tab is closed. 

I feel this should be given some priority if at all possible.
Duplicate of this bug: 1147544

Updated

2 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true

Comment 22

2 years ago
There are test services in bug 1147544, which David has marked as a duplicate of this one.

Comment 23

2 years ago
(I'm making a comment regarding the case in duplicate bug 1147544, I haven't checked if it's exactly the same scenario.)

After a discussion amongst several NSS developers, the recommendation is to stop searching for other issuer certificates if the normalized subject and issuer names of the certificate match.

In all non-pathological scenarios, this should sufficiently indicate that it's a self-signed certificate, and searching for potential other issuers is unnecessary.

If you would like to be extra careful, then in addition to checking for identical subject and issuer names, you could also check that the signature contained in certificate A was indeed created using the key contained in certificate A, and if it was, assume it's fine to stop searching for other issuers.

It was mentioned that there might be other scenarios, but they are seen as impractical for the purposes of Internet PKI, and Firefox shouldn't need to support them.
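The heuristic recommended in this comment could be sketched roughly as follows. Everything here is a toy stand-in and an assumption: `ToyCert`, its fields, and the function names are invented for illustration, `normalize` is far simpler than real distinguished-name normalization, and the `signed_by` field replaces an actual signature verification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCert:
    subject: str
    issuer: str
    key_id: str     # stands in for the certificate's own public key
    signed_by: str  # stands in for "the key that produced the signature"


def normalize(name):
    """Hypothetical normalization: collapse whitespace and ignore case."""
    return " ".join(name.split()).casefold()


def is_self_issued(cert):
    return normalize(cert.subject) == normalize(cert.issuer)


def is_self_signed(cert):
    return is_self_issued(cert) and cert.signed_by == cert.key_id


def should_search_for_other_issuers(cert, extra_careful=False):
    """Return False when the heuristic says to stop looking for issuers."""
    if extra_careful:
        return not is_self_signed(cert)
    return not is_self_issued(cert)


self_signed = ToyCert("CN=router", "cn=router", key_id="k1", signed_by="k1")
rolled_over = ToyCert("CN=Some CA", "CN=Some CA", key_id="k2", signed_by="k1")

print(should_search_for_other_issuers(self_signed))
print(should_search_for_other_issuers(rolled_over))
print(should_search_for_other_issuers(rolled_over, extra_careful=True))
```

The `rolled_over` certificate is self-issued but not self-signed; the name-only variant stops searching for it, while the extra-careful variant keeps searching.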
There are multiple bugs here:

1. When Gecko stores certificate error overrides for a hostname, it stores the certificate for which it added the certificate error override, but it never cleans up any old certificates that were stored for previous certificate error overrides.

2. In general, PKIX path building is a combinatorial problem. Some heuristic is needed to reduce the combinatorial problem to one with a lower upper bound on running time. mozilla::pkix's heuristic is to limit the cert chain length, but this is not sufficient in some situations.

Because of #1, when cert error overrides are used for self-signed/issued certificates, #2 is triggered more commonly. However, self-issued certificates are not the only thing that can cause a large number of possible paths to be traversed. The proposed solutions mentioned in comment 23 would wallpaper over #2 for this most-common case, without actually solving the general problem. That doesn't seem like a good approach. Instead, we should find a solution that works for all cases of #2, that doesn't cause false positive failures for real-world use cases. Additionally, another bug for #1 should be filed, and that bug should be fixed.

(In reply to Kai Engert (:kaie) from comment #23)
> After a discussion amongst several NSS developers, the recommendation is to
> stop searching for other issuer certificates if the normalized subject and
> issuer names of the certificate match.

This would effectively remove the ability for any self-issued certificates to work in Gecko. That seems likely to cause real-world compatibility problems.

> If you would like to be extra careful, then in addition to checking for
> identical subject and issuer names, you could also check that the signature
> contained in certificate A was indeed created using the key contained in
> certificate A, and if it was, assume it's fine to stop searching for
> other issuers.

This is better, but it will not be compatible with (hopefully) upcoming changes to how signature verification is done in mozilla::pkix, so I'd like to avoid it.

Keep in mind that an attacker that is interested in forcing this DoS situation could do so by using self-issued certificates that aren't self-signed, or by sending us certificates where none are self-issued. We need a solution that prevents this DoS attack.

A better solution would be to limit the total number of calls to IssuerChecker::Check() per invocation of BuildCertChain to a certain value. That would be analogous to libpkix's "if (numIterations++ > 250) PKIX_ERROR(PKIX_TIMECONSUMEDEXCEEDSRESOURCELIMITS);" check. See 597618 comment 2 and 597618 comment 7. Considering that the maximum cert chain length in mozilla::pkix is 8 (1 + MAX_SUBCA_COUNT + 1), 250 should be more than enough, and our experience with libpkix shows that 250 seems to be web compatible.
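A rough sketch of what such a budget could look like, in a toy path builder. This is not the mozilla::pkix API: `build_chain`, `PathBudgetExceeded`, and the dictionary-based certificate model are assumptions for illustration. Only the limit of 250 and the maximum chain length of 8 come from the comment above.

```python
MAX_ISSUER_CHECKS = 250  # budget value quoted from libpkix
MAX_CHAIN_LENGTH = 8     # 1 + MAX_SUBCA_COUNT + 1


class PathBudgetExceeded(Exception):
    """Raised when path building gives up instead of exploring further."""


def build_chain(leaf, certs, trusted, budget=MAX_ISSUER_CHECKS):
    """certs maps a cert id to (subject, issuer); trusted is a set of ids.

    Returns (chain or None, number of issuer checks performed).
    """
    checks = 0

    def extend(chain):
        nonlocal checks
        current = chain[-1]
        if current in trusted:
            return chain
        if len(chain) >= MAX_CHAIN_LENGTH:
            return None
        wanted_issuer = certs[current][1]
        for candidate, (subject, _issuer) in certs.items():
            if candidate in chain or subject != wanted_issuer:
                continue
            checks += 1  # one issuer check charged against the budget
            if checks > budget:
                raise PathBudgetExceeded(f"gave up after {budget} issuer checks")
            found = extend(chain + [candidate])
            if found:
                return found
        return None

    return extend([leaf]), checks


# An ordinary three-certificate chain stays far below the budget.
simple = {
    "leaf": ("leaf", "int"),
    "int": ("int", "root"),
    "root": ("root", "root"),
}
print(build_chain("leaf", simple, trusted={"root"}))
```

With 16 untrusted certificates that all share one subject/issuer name, this builder raises `PathBudgetExceeded` after 250 checks instead of exploring millions of orderings, which would surface as a verification failure rather than a hang.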
Summary: pfSense interface hangs with mozilla::pkix enabled → mozilla::pkix spends too much time attempting to build a valid path when there are many possible paths
CC'ing Wan-Teh since he fixed the same (or very similar, at least) bug in libpkix in bug 597618. 

The comment references above should be "bug 597618 comment 2" and "bug 597618 comment 7."
Duplicate of this bug: 1226203
Duplicate of this bug: 1201465
David, in the bug you just dup'd to this one, you said that this is an architectural issue that is unlikely to change. However, above, I mentioned a very simple workaround that is pretty easy to implement and which probably won't cause much harm.
Oh - for some reason I missed that the first time around. That sounds like a reasonable solution.

Comment 30

2 years ago
OK, I can confirm that the problem arises with other self-signed certificates; see https://bugzilla.mozilla.org/show_bug.cgi?id=1240548

Too bad the title of this bug didn't let me find it when I searched for existing bugs before opening that one. Actually, I don't think anybody without a deep understanding of Mozilla internals would search for "mozilla::pkix" and "path"; they would more likely try terms like "self-signed certificate", "hang / stall", etc.

Updated

2 years ago
Duplicate of this bug: 1240548
Whiteboard: [psm-backlog]

Comment 32

10 months ago
Me too!

This bug happens very often here because I test an application that has such certificates.
Every time I do it, I know I have to open Chromium to access the application. With Firefox it is just impossible.

Comment 33

10 months ago
Rather than just "me too!", can you attach a certificate chain that demonstrates this problem? Can you provide details about the product or CA that generates it? The most useful path to finding a good resolution is to include additional debug and diagnostic data about the problem, describing with as much detail as you can the environment that tickles it.

To date, this seems to be exceptional and only when some PKI products have taken long-known suboptimal paths, but if there is room for improvement, the NSS team is happy to explore.

Comment 34

10 months ago
Ryan,

I could attach, but it's very simple to reproduce:

- Generate a self-signed single-hostname certificate (like: webserver1, no FQDN) and use it with a webserver. Point the browser to the webserver.
- Close the browser.
- Generate another certificate with the same data and the same hostname (webserver1), put it on the webserver, and point the browser to the webserver.
- Do this until the browser starts to hang when you try to load that website (I think 5-10 times would be enough).

Comment 35

10 months ago
That's not sufficient, nor does it reproduce. More likely, you're doing something like generating a self-signed cert with the same distinguished name and extensions, but a new key, but that's precisely why I pointed out the need for more info.

Good bugs get fixed.

Comment 36

10 months ago
(In reply to Claudiu N. CISMARU from comment #34)
> I could attach, but it's very simple to reproduce:
> 
> - Generate a self-signed single-hostname certificate (like: webserver1, no
> full FQDN) and use it with a webserver. Point the browser to the webserver.
> - Close the browser
> - Generate another certificate, with the same data, same hostname:
> webserver1, put it on the webserver, point the browser to the webserver.
> - Do this until the browser starts to hang when you try to load that website
> (I think 5-10 times would be enough).


Can you clarify how you generate the certificate?  I see you specify "self-signed", but there are several different ways to generate a certificate.  Alternatively, can you attach two certificates generated using the method you describe?  Please do not include the private keys, just the certificates.

Comment 37

10 months ago
Created attachment 8803436 [details]
dcert1.crt - cert from a device

Comment 38

10 months ago
Created attachment 8803437 [details]
cert from the second device

Comment 39

10 months ago
Uploaded 2 certs from 2 different devices. They are generated during the installation process.

Comment 40

10 months ago
These examples are very helpful. Each of these is a self-signed CA certificate and each uses the exact same Distinguished Name. They have different SPKI and SKID.

Are you importing all of these (one per device) as a trusted CA into Firefox?

Comment 41

10 months ago
Yes. Usually I add exceptions, as these are test devices that I use on a daily basis.

However, I thought you understood where the problem is, which is why I never bothered to add the certs here.
(Reporter)

Comment 42

10 months ago
I think there's no need for further confirmation that the bug exists; this was confirmed more than 2 years ago. Given that nothing has happened since then, I'm not holding my breath for a fix. Clear out your old certificates and ensure that your certs have unique distinguished names and you should be fine.

Comment 43

5 months ago
This is still broken in Firefox 52.0 ESR.

Is there a reason why previous certificates for the same server are kept in cert8.db when a new certificate exception is stored? Shouldn't the new certificate replace the old one? In Firefox's Certificate Manager there is only one entry for each server, even when I've stored multiple exceptions for the same server, as I would have expected. In cert8.db, on the other hand, all previous certificates are still in place.

I'm testing a device with a self-signed certificate that is regenerated every time the configuration of the device is wiped. As a result, I have dozens of certificates with the same name in cert8.db. Even after deleting the exception from Certificate Manager, the certificates remain stored in cert8.db.

This workaround from Red Hat Bugzilla fixes the problem until the certificates pile up again:

https://bugzilla.redhat.com/show_bug.cgi?id=1204670#c34
> certutil -d dbm:. -D -n localhost.localdomain
> multiple times, it will delete one cert with each execution.
> You can run the -L command again to check how many you have left.
> When you have deleted all of them, try firefox again, and it should be quick again.
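
The repeated invocation described in that workaround could be wrapped in a small loop. This is a sketch under assumptions: the `certutil` flags are copied from the quoted command, the function name, the `run` parameter, and the `limit` safety cap are hypothetical, and it assumes `certutil -D` exits with a non-zero status once no certificate with that nickname is left. It has to be run from the profile directory because of `dbm:.`.

```python
import subprocess


def delete_all_by_nickname(nickname, db="dbm:.", run=subprocess.run, limit=1000):
    """Call `certutil -D` until it fails; return how many deletions succeeded."""
    deleted = 0
    while deleted < limit:
        result = run(
            ["certutil", "-d", db, "-D", "-n", nickname],
            capture_output=True,
        )
        if result.returncode != 0:  # assumed to mean "nothing left to delete"
            break
        deleted += 1
    return deleted


# Example call (not executed here), using the nickname from the quoted workaround:
# delete_all_by_nickname("localhost.localdomain")
```

The `run` parameter exists only so the loop can be exercised without touching a real certificate database.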

Comment 44

2 months ago
Still broken in 55.0b2. I just was bitten badly by this (came here via bug 1240548.)
I use a VPN connection into the office. It died while I was in the process of logging in to a system at work. I restarted the VPN connection but Firefox didn't get anywhere. Restarting Firefox didn't help - it hung hard in the TLS handshake (displaying that in the status bar is nice!)
I deleted all certificates that were related to that system, restarted Firefox and I could log in normally after going through the usual security dialog.