Open Bug 1474963 Opened 6 years ago Updated 2 months ago

Firefox stalls after TLS handshake on self signed certificate - bug 1056341 not corrected

Categories

(Core :: Security: PSM, defect, P3)

61 Branch
defect

Tracking

()

UNCONFIRMED

People

(Reporter: fanf42, Unassigned)

Details

(Whiteboard: [psm-backlog])

Attachments

(1 obsolete file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0
Build ID: 20180705213349

Steps to reproduce:

The bug described in ticket 1056341 (https://bugzilla.mozilla.org/show_bug.cgi?id=1056341) is not solved in Firefox 61. When the problem described in that ticket happens (several certificates for the same DNS name), the corresponding site is totally unusable, with a wait of several (tens of) seconds for each request over TLS.

The workaround described in https://bugzilla.mozilla.org/show_bug.cgi?id=1056341#c59 still works. 
The same site under chromium works without performance penalty.

This ticket was opened because that was requested in https://bugzilla.mozilla.org/show_bug.cgi?id=1056341#c60
Needinfo :keeler as per bug 1056341, comment 60.
Component: Untriaged → Security: PSM
Flags: needinfo?(dkeeler)
Product: Firefox → Core
What kind of error output are you seeing?
Flags: needinfo?(dkeeler) → needinfo?(fanf42)
It is all described in the other ticket, but I'm going to give it again here for clarity and to pinpoint what was fixed in the other ticket and what remains.

We are in a situation where we have server certificates for an internal app, say "server.xxx.local", that are self-signed. These certificates are regenerated regularly (it's part of the CI deployment test protocol).

Each time the server.xxx.local app is deployed by CI with a newly generated certificate, we get the "be careful, self-signed cert" exception message and accept it.

But Firefox doesn't overwrite the existing certificate for the corresponding server name; it adds the new one to its certificate database.

After three or four rounds of added certificates, Firefox's TLS validation becomes extremely slow (the little status box in the bottom left says "validating TLS handshake" or something similar while the request is pending). "Slow" means it takes several seconds, and the delay grows exponentially with each exception added for the same server name. That validation is done for each request (of course), so the corresponding server.xxx.local is basically unusable.

When that point is reached, there is no other way than to go to your user profile directory and remove the certificates by hand (trying to delete the certificates in about:preferences does not work).

For example: 

8<------------------------------------------------------------
% cd ~/.mozilla/firefox/xxxxxx.default
% certutil -d sql:. -L | grep server.xxx.local     
server.xxx.local                                          ,,
server.xxx.local                                          ,,
server.xxx.local                                          ,,
server.xxx.local                                          ,,
server.xxx.local                                          ,,
server.xxx.local                                          ,,
server.xxx.local

Here, access to the corresponding site is almost impossible: extremely slow and unusable. Running the following 7 times:

% certutil -d sql:. -D -n server.xxx.local

And now, everything is fast again.
8<------------------------------------------------------------
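The repeated deletion above can be sketched as a small loop. This is only a sketch under assumptions: `purge_nickname` is a hypothetical helper name, certutil (from the NSS tools) is assumed to be on the PATH, and Firefox is assumed to be closed while the database is modified. It relies on the behaviour described above, where each `-D` call removes one entry for the nickname.

```shell
#!/bin/sh
# Hypothetical helper: delete every certificate stored under one nickname
# in a Firefox profile's NSS database.
# $1 = profile directory, $2 = certificate nickname (e.g. server.xxx.local)
purge_nickname() {
    profile_dir=$1
    nickname=$2
    removed=0
    # Repeat the delete until the listing no longer mentions the nickname.
    # Note: grep -F matches substrings, so pick a nickname that is not a
    # prefix of another one you want to keep.
    while certutil -d "sql:${profile_dir}" -L | grep -q -F -- "${nickname}"; do
        # Stop instead of looping forever if a delete fails.
        certutil -d "sql:${profile_dir}" -D -n "${nickname}" || break
        removed=$((removed + 1))
    done
    echo "removed ${removed} entries for ${nickname}"
}
```

Usage would be something like `purge_nickname ~/.mozilla/firefox/xxxxxx.default server.xxx.local`, after backing up the profile's cert9.db.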

Ticket https://bugzilla.mozilla.org/show_bug.cgi?id=1056341 was supposed to introduce a budget in certificate exploration to avoid the exponential latency explosion. I'm sure it did, because now, instead of a total freeze of Firefox, I just have to wait 10s for each request. But this is not a solution. The site is still totally unusable, and it's even pernicious, as users think that the performance degradation is due to the latest site deployment.

The solution seems to be to overwrite the existing cert for the exact same server name in the profile's cert database (or remove the older one and add the new one). If the user agrees to add an exception, they can accept that an older certificate, also added with an exception for the same server name, gets deleted (the worst case is that the user will flip-flop between two certificate exceptions, and that's actually a feature, because it would let the user know that the state of the certificates is really broken, not just evolving with the application life cycle).

Hope it helps,
Flags: needinfo?(fanf42)
Thanks for the details. I don't have time to dig into this immediately, but it's on my radar.
In the meantime, there are some things you can do to avoid this situation:
1. set up a certificate hierarchy with a CA that your users import (with the added bonus that they don't have to click through certificate error warnings every time your service gets updated)
2. give the server certificates unique subject distinguished names
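A minimal sketch of suggestion 2, assuming the certificates are generated with the openssl command-line tool. The helper names `unique_subject` and `gen_unique_cert`, the `OU=deploy-...` component, the RSA key size, and the 30-day validity are all illustrative assumptions, not something prescribed in this bug. The idea is only that each regenerated certificate gets a subject distinguished name that differs from the previous ones.

```shell
#!/bin/sh
# Hypothetical helper: print a subject DN that is different on every call.
# $1 = host name to put in the CN
unique_subject() {
    # 8 random bytes as 16 hex characters; any per-deployment unique
    # string (build number, timestamp) would serve the same purpose.
    suffix=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
    printf '/CN=%s/OU=deploy-%s\n' "$1" "$suffix"
}

# Hypothetical helper: create a self-signed certificate with that subject.
# $1 = host name, $2 = output path prefix (writes $2.key and $2.crt)
gen_unique_cert() {
    openssl req -x509 -new -newkey rsa:2048 -nodes \
        -keyout "$2.key" -out "$2.crt" \
        -subj "$(unique_subject "$1")" -batch -days 30
}
```

For example, `gen_unique_cert server.xxx.local /tmp/server` would write /tmp/server.key and /tmp/server.crt with a fresh subject on each run.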
Priority: -- → P3
Whiteboard: [psm-backlog]
Same problem for me on the latest release version (61.0.1). Same behaviour on Developer version (62.0b8) after importing the cert9.db from my "normal" Firefox. It displays "Performing a TLS handshake to ...." for a very long time (if I wait some minutes it eventually displays the login page of my router, but meanwhile CPU usage is at 100% on one core). Tried on Windows 7 x64.
Issue also encountered in Firefox 62.0 on Fedora 28. Workaround with certutil helps.

I actually have the same issue right now on Firefox Developer Edition version 66.0b10
Working on Mac OS Mojave v10.14.3

I am loading a page from webpack-dev-server using port 8080 with a self signed certificate.
Browsing to https://localhost:8080/ will result in a timeout after waiting for a while on "Performing a TLS handshake to localhost"

As most of the messages here are from a couple of months back, I am curious whether this got resolved or whether people still have this issue.

@bar-jan: I still have the bug, and the workaround I described above in comment #3 still works for me.

Forgot to mention:

I did not update my computer,
I do not use anti virus programs
I did reboot my machine
I do not have any issues with the 'normal' Firefox version 65.0.1 (64-bit)
I did remove Firefox Developer, reinstalled it (also after a reboot)

none of which worked to solve the issue.

(In reply to @fanf42 from comment #8)

@bar-jan: I still have the bug, and the workaround I described above in comment #3 still works for me.

Ah yes!
I actually saw that webpack is going to support a self-signed cert file, which will prevent this issue, as you would keep on using the same file each time.

I'm seeing this as well. Pretty annoying.

Would you consider making the buildForwardCallBudget introduced in https://hg.mozilla.org/mozilla-central/rev/9dcd9f186bbf a configurable flag in about:config?

I believe we are seeing the same thing here on Firefox Quantum 68.4.2esr (64-bit)
Our Dell servers' out of band management (idrac) all have self-signed certificates.
I can connect to none of them using Firefox ("Performing TLS handshake...")
So I am using Chrome instead, but it's not my preferred solution.

I can also confirm the behavior with Dell servers (a HPC cluster of PowerEdge C6525 systems).

Unfortunately, in my experience, it only happens if you access several of them. I.e. you can't reproduce the issue easily with a single server. Login to the iDRAC web UI via https/TLS becomes slower and slower until it times out when you access more and more of the nodes. Both deleting the certificate entries with certutil and starting Firefox with a new profile are (temporary) workarounds.

(I have reported the issue to my Dell contacts as well.)

@jm: the workaround in comment #3 still works well for me.

(In reply to Karsten Weiss from comment #14)

I can also confirm the behavior with Dell servers (a HPC cluster of PowerEdge C6525 systems).
(I have reported the issue to my Dell contacts as well.)

Yep, the CN of the certificate always is idrac-SVCTAG [sic]. Probably they intended to replace the string "SVCTAG" by the actual service tag (in which case the problem would not occur) but failed. What did your contacts report back? I guess I'll open a service request with them. But I fear it will be a long one. Still, they officially support Firefox.

@jm I have not received a reply yet. I hope we'll get a comment here.

However, I flashed a new iDRAC FW today but still need to test whether it makes a difference. (Even if it does, this remains a Firefox issue as well.)

@jm Your guess is probably right because with the iDRAC FW 4.10.10.10 the certificates no longer use "idrac-SVCTAG" but the actual service tag in the name. I've now deleted all accumulated instances of "idrac-SVCTAG" with certutil and hope the issue won't bother me again.

@Karsten Weiss
I chose to engage Dell Enterprise Support and after some discussion they have forwarded it to their developers. I told them to either correctly replace "SVCTAG" by the actual service tag (such that the certificates are unique, using SVCTAG suggests that that's what they intended to do but failed in the process), or that they remove Firefox from their list of compatible browsers (documentation bug) because it's not compatible (no matter if it's a bug in FF or whatever). Probably that got their attention.
In any case it's interesting that you say it's fixed in idrac 4.10.10.10 because we upgraded to that recently (obviously the new firmware did not actively replace existing certificates).
Maybe it's already fixed in the production chain, such that when you get a new server, the SSL cert CN is set correctly on delivery.
A workaround is the following:
racadm>> sslresetcfg
racadm>> racreset
This creates a new self-signed cert and restarts idrac (all settings are kept).
The new cert contains the hostname in my case (not sure what happens if none is configured, maybe it then defaults to svctag).
Maybe we should continue this discussion in Dell Community ^^ if there is anything further to be discussed.

I can confirm this bug for Firefox Quantum 68.8.0esr (64-Bit) on Debian GNU/Linux 10 (buster) against our embedded Linux devices which create their own self-signed certs like this:

gen_cert_one_year_from_now() {
    openssl req -x509 -new \
        -keyout "${TTSSL_KEY_FILE}" -out "${TTSSL_CERT_FILE}" \
        -nodes -subj "/CN=$(hostname)" -utf8 -batch -days 365
}

The hostname on all of these devices is the same. I had contacted only 11 devices when connections became unusably slow and the CPU usage of Firefox very high:

% certutil -d sql:. -L | grep -c [redacted]
11

Only connections to those local devices are affected. TLS connections to servers on the WWW work fine and complete within the expected time.
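To see which names have accumulated in a profile without knowing them in advance, the certutil listing can be grouped by nickname. This is a rough sketch: `list_duplicate_nicknames` is a hypothetical helper name, and it assumes the listing format shown earlier in this bug, where each line is a nickname followed by whitespace and a trust-flags column.

```shell
#!/bin/sh
# Hypothetical helper: print nicknames that occur more than once in a
# profile's NSS database, most frequent first, as "<count> <nickname>".
# $1 = profile directory
list_duplicate_nicknames() {
    certutil -d "sql:$1" -L \
        | sed -E 's/[[:space:]]+[^[:space:]]*$//' \
        | grep -v '^$' \
        | sort | uniq -c | awk '$1 > 1' | sort -rn
}
```

The sed step strips the last whitespace-separated column (the trust flags), so nicknames containing spaces stay intact; header lines appear only once each and are filtered out by the count check.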

Severity: normal → S3
Attachment #9383443 - Attachment is obsolete: true