SSL_ERROR_MISSING_ESNI_EXTENSION occurs occasionally when visiting websites
Categories: Core :: Networking: HTTP, defect, P3
People: Reporter: raphael.mauro, Unassigned
Details: Keywords: regression, Whiteboard: [necko-triaged]
Attachments: 2 files
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0
Steps to reproduce:
I enabled DoH and ESNI in about:config in Mozilla Firefox 68.0 on Windows 10 1903.
I set these values in about:config:
network.trr.mode;3
network.trr.bootstrapAddress;1.1.1.1
network.trr.uri;https://mozilla.cloudflare-dns.com/dns-query
network.security.esni.enabled;true
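For anyone reproducing this setup, the same four prefs could also be placed in a profile's user.js file instead of being set by hand. Below is a minimal sketch in Python that renders them in user_pref(...) form. The helper name render_user_js is hypothetical; only the pref names and values come from this report.

```python
import json

# Pref names and values exactly as listed in this report.
PREFS = {
    "network.trr.mode": 3,
    "network.trr.bootstrapAddress": "1.1.1.1",
    "network.trr.uri": "https://mozilla.cloudflare-dns.com/dns-query",
    "network.security.esni.enabled": True,
}


def render_user_js(prefs):
    """Render a dict of prefs as user.js lines (hypothetical helper)."""
    # json.dumps produces literals that are also valid in user.js:
    # true/false for booleans, bare integers, double-quoted strings.
    return "\n".join(
        f"user_pref({json.dumps(name)}, {json.dumps(value)});"
        for name, value in prefs.items()
    )


print(render_user_js(PREFS))
```

The printed lines can be pasted into user.js in a test profile; using a throwaway profile keeps the experiment away from everyday browsing.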
Actual results:
When visiting websites (sometimes, not always), I get errors of this type:
SSL_ERROR_MISSING_ESNI_EXTENSION
For example, this website produces the error: https://www.frandroid.com/
Some websites work fine with ESNI set to "true" and some don't work at all; I have to set network.security.esni.enabled to "false" in order to access them.
Expected results:
The website should work fine, like the others where I don't get this type of error message.
Please see the attached screenshot of the error that occurred while trying to access community.cloudflare.com.
Also have the same error on nightly (70). The error is NOT permanent. The sites are working just fine and then suddenly throw this error for a few seconds or minutes (and when one site does, then the other sites will too, so it's not site specific) and then start working again.
Comment 2•5 years ago (Reporter)
(In reply to khagaroth from comment #1)
Also have the same error on nightly (70). The error is NOT permanent. The sites are working just fine and then suddenly throw this error for a few seconds or minutes (and when one site does, then the other sites will too, so it's not site specific) and then start working again.
Hello,
What do I have to do to get those sites working? I can't access them at all; I always get this error.
Thanks.
I'm also encountering this. It's inconsistent, and sites may start working 5 minutes after I've received the SSL_ERROR_MISSING_ESNI_EXTENSION error when attempting to connect. It happens with connections over HTTP, as well as websockets. My network.trr.mode is set to 2; otherwise all my settings are the same as Raphael's.
Comment 4•5 years ago
Hi,
I wasn't able to reproduce this issue on Nightly 70.0a1 (2019-07-29) or on 68.0.
Also, could you please check whether it's reproducible on Nightly? Here is the download link: https://www.mozilla.org/en-US/firefox/nightly/all/
Thanks!
Comment 5•5 years ago (Reporter)
(In reply to Luciana queirolo from comment #4)
Hello,
Sorry for the delay. I don't have any ESNI issue with Nightly. Still, I would like to point out that this error is very random: sometimes a website throws this error and then a few seconds later it works fine. For the moment, on Firefox 68.0.1, the error sometimes occurs and sometimes doesn't.
Is there anything I can do to try to diagnose this error when it happens? I've had it happen multiple times over the past few days, I just don't know how to diagnose the problem.
Comment 7•5 years ago
I got the issue on feedly.com on Nightly, but it fixed itself after a few minutes.
The site uses Cloudflare, so perhaps there is an issue with server key rotation.
https://blog.cloudflare.com/encrypted-sni/
Cloudflare’s own SNI encryption implementation rotates the server’s keys every hour to improve forward secrecy, but keeps track of the keys for the previous few hours to allow for DNS caching and replication delays, so that clients with slightly outdated keys can still use ESNI without problems (but eventually all keys are discarded and forgotten).
Does Firefox keep DNS cache entries (or more specifically ESNI records) in a disk cache?
I restarted it less than 30 min ago, so if the cache is only in RAM, then the key must be fresh, unless there is a server-side issue (or the issue isn't related to key rotation).
This just randomly started happening to me as well, on feedly.com and canary.discordapp.com. The site is broken for up to 60 seconds, I just continue to refresh and it goes away eventually. I'm on 70.0b10.
I have found something interesting about this bug: It only happens on full hours.
For example, it happens at 9:00 and keeps happening. Then it starts working again at 9:01.
This is in line with the person above stating it happens for "up to 60 seconds".
For me, this bug happens every day, and at times like 8:00, 11:00, 6:00, you get the idea.
It also only happens on some websites, but it's always the same websites.
There's probably some kind of bug in how you compare the time at full hours during ESNI validation.
I'm no programmer, so I hope someone else will take a close look at the time part of the code and fix the bug.
Comment 10•5 years ago
Sorry for the doublepost, but I could not find an "edit comment" function.
I want to add the following to the above post:
- Changing my computer time (Win10's) to a full hour does not reproduce the bug. It appears that it actually has to be a full hour, not just on your computer.
- The website I experience this regularly on uses cloudflare. This could be a coincidence, but I thought I should mention it, because the other posters also experienced issues on cloudflare websites.
Comment 11•5 years ago
Marking as NEW based on several reports.
Comment 12•5 years ago
(In reply to NoName from comment #10)
This is really interesting.
Could you try to get the http log when this happens?
Thanks.
Comment 14•5 years ago
This looks like a server configuration error... or a DNS server over-caching an old ESNI record ... or a bad local clock at the time that keys roll over. I don't see any problem with these sites, unless I have a very old ESNI record. I'll forward this to our friends at Cloudflare to get a better idea of what is going on, but we might need to build some better diagnostics for this error.
We probably need to log the ESNI record and the system time when this error happens. Ideally, we should also ask the DoH server about what time it thinks that it is. (If we already do these things, that's super.)
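One way to approximate "what time the server thinks it is" is to compare the local clock against the Date header of an HTTP response from the DoH endpoint. The sketch below is only an illustration of that comparison, not how Firefox does or would do it. It assumes the response carries a standard HTTP Date header, and the function name clock_skew_seconds is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(http_date_header, local_now):
    """Return local clock minus server clock, in seconds.

    http_date_header: value of an HTTP Date header, e.g. taken from a
    DoH response (assumption: the server sends one).
    local_now: timezone-aware datetime read from the local system clock.
    """
    server_now = parsedate_to_datetime(http_date_header)
    return (local_now - server_now).total_seconds()


# Example with fixed values so the result does not depend on the real clock.
local = datetime(2019, 10, 27, 1, 1, 30, tzinfo=timezone.utc)
print(clock_skew_seconds("Sun, 27 Oct 2019 01:01:00 GMT", local))
```

A Date header only has one-second resolution and includes network delay, so this would catch clocks that are off by minutes, not subtle drift.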
Comment 15•5 years ago
(In reply to Kershaw Chang [:kershaw] from comment #12)
I'm working on it, though it's rather difficult, because the bug happens somewhat randomly and I need to start logging before it fixes itself. I'll make another post then.
Meanwhile I got some more info:
A few weeks ago I stated that the bug happens from XX:00 to XX:01 every hour for me.
Shortly afterwards, this actually changed to being XX:04 to XX:05. For example, I noticed it happening about 4 more times around times like 6:04, 11:04, and so on. So while probably not connected to a certain number, it still happens somewhat periodically.
And next, I tried logging the http traffic by refreshing around that time. I kept refreshing pages every few seconds between XX:55 and XX:10, but the bug never happened (I did it especially often around XX:04).
So it seems to me that it doesn't happen EVERY hour, but when it does happen, it's always around the same time.
Comment 16•5 years ago
(In reply to raphael.mauro from comment #0)
Created attachment 9078235 [details]
ESNI-Error.png
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0
Steps to reproduce:
I activated DOH and ESNI in about:config from Mozilla Firefox Version 68.0 on Windows 10 1903.
I set those values in about:config :
network.trr.mode;3
network.trr.bootstrapAddress;1.1.1.1
network.trr.uri;https://mozilla.cloudflare-dns.com/dns-query
network.security.esni.enabled;true
I am having this same issue with ESNI support enabled in Firefox 70. My TRR mode is set to 2, if that matters.
Comment 17•5 years ago
(In reply to Kershaw Chang [:kershaw] from comment #12)
This is really interesting.
Could you try to get the http log when this happens?
Thanks.
I managed to get a good HTTP log of this bug now. It's uploaded here:
https://x0.at/Zuw.txt
I did the following:
- started browser
- started logging
- entered https://boards.4channel.org/g/ into the URL bar of a new tab and hit enter, 1 minute past a full hour (This is a cloudflare-backed website, if I'm not mistaken)
- ESNI error appears almost instantly
- stopped logging
- removed cookies with personal information from the log
Note that this bug, once again, occurred 1 minute past a full hour (in this case 1:01AM) and fixed itself just a few seconds later, around 1:02AM. Just like the previous times it occurred to me.
Hope this helps.
Comment 18•5 years ago
The formatting in the previous post got messed up, but it contains a link to a clean log of the bug.
I forgot to add some possibly relevant info:
My system clock showed 2:01AM when I looked at it. The log says 1:01AM UTC, so it's probably just a one-hour time zone difference and the milliseconds should be the same as in the log?
Windows 10, Firefox 70.0, I use the HTTPS Everywhere and uBlock Origin extensions.
Comment 19•5 years ago
I am getting this error in Firefox 71 beta 4 when visiting https://feedly.com/ with ESNI encryption and DoH enabled.
Comment 20•5 years ago
Based on what I'm seeing in the log (https://x0.at/Zuw.txt), we have a valid ESNI record that was generated at 2019-10-26T23:00Z on . I'm guessing, but this is exactly the middle of the notBefore and notAfter fields. The DNS TTL is 3600 (one hour). I need to confirm, but I expect that these records are generated once per hour on the hour. If that is the case, then this record might have entered a cache just before 2019-10-27T00:00Z. That means that the record would be considered usable right up to 2019-10-27T01:00Z, a minute before the failed attempt. It is likely that the ESNI keys at the server were rotated at 2019-10-27T01:00Z or soon afterwards. As a result, the server wasn't able to produce the correct ESNI record and we failed the connection.
This is clearly a server configuration issue. Because Cloudflare tell us that this is valid for 3 days, we can't do anything to safeguard against failures here. That's their prerogative, and they make that call because they don't trust client clocks, which is probably wise. Our experience shows that clocks on clients are sometimes very bad.
In the end, the goal is to have the server ensure that the client won't attempt the ESNI request past when they drop their keys. They do that either by reducing the time that the record is valid (by reducing the TTL on DNS records, for instance) or by extending the time that the server retains the corresponding keys.
If, as I'm guessing, the record was issued at 23:00, replaced at 00:00, and the keys were retired at 01:00, then a TTL of 3600 is too tight. Any delay in provisioning replacement keys at 00:00 would leave a period of exposure. If keys are replaced too soon relative to that, this error occurs. If a DNS server holds on to keys too long, or there are network delays in delivering DNS responses, then the TTL will extend past the 3600 and bad things happen.
Now, I might be wrong and this might be the result of an insane over-extension of the DNS TTL, but I'll check with Cloudflare to confirm this.
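To make the guessed timeline concrete, here is a small sketch that computes how much slack there is between the last moment a cached record may be used and the moment the server retires the matching keys. The timestamps are the guesses from this comment, not confirmed values, and the function name safety_margin_seconds and the extra_hold_seconds parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def safety_margin_seconds(cached_at, ttl_seconds, keys_retired_at,
                          extra_hold_seconds=0):
    """Seconds between the end of a cached record's life and key retirement.

    A negative result means a client may still use the record after the
    server has dropped the corresponding keys.
    extra_hold_seconds models a resolver or network delay that stretches
    the effective lifetime beyond the TTL (hypothetical parameter).
    """
    usable_until = cached_at + timedelta(seconds=ttl_seconds + extra_hold_seconds)
    return (keys_retired_at - usable_until).total_seconds()


utc = timezone.utc
cached_at = datetime(2019, 10, 26, 23, 59, 59, tzinfo=utc)  # "just before 00:00Z" (guess)
retired_at = datetime(2019, 10, 27, 1, 0, 0, tzinfo=utc)    # guessed key retirement

print(safety_margin_seconds(cached_at, 3600, retired_at))      # TTL alone
print(safety_margin_seconds(cached_at, 3600, retired_at, 90))  # plus a 90 s delay
```

With the TTL alone the margin is a single second; any extra delay of more than that pushes it negative, which is the "too tight" situation described above.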
Comment 21•5 years ago
A note:
Necko increases the validity (valid + grace time) to at least 1 minute (the grace period is increased, but in any case we will use the record for at least 1 minute).
This is not the problem here because the record is not from the necko cache.
(We should not increase the TTL or add a grace period here as we do for A and AAAA records; I will file a bug.)
Comment 22•5 years ago
This does not look like our bug. I will wait to get a confirmation from Cloudflare.
Comment 23•5 years ago
I just got this error at https://clark.com/latest (Win7/FF 71.0).
Settings:
network.trr.mode - 2
network.trr.bootstrapAddress - 9.9.9.11
network.trr.uri - https://dns11.quad9.net/dns-query
network.trr.custom_uri - https://dns11.quad9.net/dns-query
network.trr.early-AAAA - true
network.trr.request-timeout - 3000
network.trr.request_timeout_ms - 3000
network.trr.wait-for-portal - true
network.security.esni.enabled - true
Changing only network.security.esni.enabled to false let me in. I then set it back to true and still got in. Will see what happens tomorrow.
Comment 24•5 years ago
(In reply to Dragana Damjanovic [:dragana] from comment #22)
This does not look like our bug. I will wait to get a confirmation from Cloudflare.
Indeed; for me, it's caused by having the AVG Antivirus setting for 'Enable HTTPS scanning' enabled.
Comment 25•5 years ago
Well, I had an AVG update waiting to be installed on reboot so I did it - and the problem has disappeared. Also, my laptop doesn't have the problem and it's still waiting for a reboot to install the AVG update. Don't know what to make of that, as both PCs have the same FF DNS setup as mentioned above :-/
Comment 26•5 years ago
Regarding the comment above, I don't use an antivirus at all, and I still get this problem multiple times every day at regular time intervals.
So possibly the AVG was just a coincidence.
Comment 27•5 years ago (Reporter)
Hello everyone,
As the person who reported this issue, I can confirm that the problem is not caused by AVG antivirus, as I'm using Eset Internet Security (version 13.0.24.0) and still have the issue I reported.
Regards,
Comment 28•5 years ago
Have to confirm - also don't use any antivirus. Just to add a new thing - for about a week now the same sites that give me the ESNI error now also rarely (ie even more rarely than the ESNI error) give me the SSL_ERROR_NO_CYPHER_OVERLAP error instead which also clears itself after few seconds/minutes.
Comment 29•5 years ago
I'm experiencing this issue as well, though it is not nearly as intermittent as others have reported. For me, the error doesn't clear itself until I restart Firefox.
I don't use any antivirus. Firefox 72.0.1 on Linux
Comment 30•5 years ago
Can also confirm I encountered this error on Firefox 73.0b5 (Windows 10). My network.trr.mode is set to 3 and my other settings are the same as those of the users above.
I disabled the protection feature "Web Anti-Virus" in Kaspersky after seeing the comment regarding AVG, and since then it has worked.
Some people still seem to be affected without having an AV, but if you have one, it's worth disabling it to check.
Comment 31•5 years ago
This happens for me regularly at hourly intervals, usually at 59 minutes past the hour, and usually abates at 1 minute or so past the hour. I'm running Linux, using Firefox Developer Edition 73.0b5 (64-bit).
Comment 32•5 years ago
This still happens to me rarely, and I can't seem to reproduce it. The last time it happened was on an older version of Firefox Nightly; I'm currently using version 76.0a1 (2020-03-17) (64-bit).
When the problem occurs it happens on a certain site (Different each time it occurs.) however other sites work.
No anti-virus, ESNI + DOH + DNSSEC are all on, using Windows 10 stable latest version. This also happens right as the time hits XX:00 (A new hour.) which aligns with the previous comment above me. Problem dissipates after about a minute to three minutes.
Comment 33•5 years ago
Bugbug thinks this bug is a regression, but please revert this change in case of error.
Comment 34•5 years ago
I am still getting this on FF 75. I have no antivirus other than Windows Defender, and it is not happening at regular intervals; it happens randomly. Pressing refresh a few times usually gets the page to load. I am using DoH with NextDNS, TRR mode set to 3. I have had to stop using DoH until this is resolved as it is driving me mad. How this is still going on after this much time I don't know.
Comment 35•5 years ago
Experiencing the same randomly today but only on CloudFlare hosted websites, I think comment 14 lays out the real issue with CF.
Comment 36•5 years ago
I got SSL_ERROR_MISSING_ESNI_EXTENSION today, although I didn't get it yesterday.
I captured a network log, which is attached.
The site is https://pastebin.com/
<My Environment>
Firefox 75.0 (64-bit)
Windows 10 (64-bit)
network.trr.bootstrapAddress = 1.1.1.1
network.trr.mode = 3
network.trr.custom_uri = https://mozilla.cloudflare-dns.com/dns-query
network.trr.uri = https://mozilla.cloudflare-dns.com/dns-query
network.security.esni.enabled = true
Comment 37•5 years ago
This now seems to remain broken for longer periods of time; it has been broken for at least 10 minutes (since I updated to today's Nightly):
- https://www.cloudflare.com, https://feedly.com => SSL_ERROR_MISSING_ESNI_EXTENSION
- https://blog.cloudflare.com, https://www.phoronix.com => SSL_ERROR_NO_CYPHER_OVERLAP
Disabling DoH and restarting Firefox avoids the issues.
Comment 38•5 years ago
FWIW, I'm having the 'SSL_ERROR_MISSING_ESNI_EXTENSION' issue with some sites on Firefox Nightly version 77.0a1 (2020-04-15) (64-bit) - Windows 10.
A temporary work-around on my end seems to be setting network.trr.mode to 1 (network.trr.mode = 1).
So have the settings as follows:
network.trr.bootstrapAddress = 1.1.1.1
network.trr.mode = 1
network.trr.custom_uri = https://mozilla.cloudflare-dns.com/dns-query
network.trr.uri = https://mozilla.cloudflare-dns.com/dns-query
network.security.esni.enabled = true
Comment 39•5 years ago
However, this leads the browser to fail the Cloudflare ESNI check:
Comment 40•5 years ago
I have commented on this months ago and stated that it happens for about a minute an hour.
But recently it has happened for 10 minutes an hour and then 15 minutes an hour.
At the time of this writing, it has been happening for about 90 minutes nonstop, and it's still happening.
Comment 41•5 years ago
Hi,
Also getting this issue, only on the Discord website, every other site is loading fine. Running https://www.cloudflare.com/ssl/encrypted-sni/ I pass all tests
https://bin.privacytools.io/?3834a6d573796256#V932WglnnlG29qLklYXN7lQWiodAdYl0crklAqrYEEU=
My network.trr.mode is set to 2. I changed network.trr.uri from https://mozilla.cloudflare-dns.com/dns-query to https://dns.quad9.net/dns-query and it's now working.
Comment 42•5 years ago
Possible issue with Cloudflare itself: https://www.cloudflarestatus.com/
Comment 43•5 years ago
Hi,
A fix has been rolled out at Cloudflare earlier today. The problem was exposed by two separate bugs:
- Key rotation happens every hour, but Cloudflare's authoritative DNS servers could serve stale records for up to 15 minutes. Since the TTL was originally 1 hour, this could mean that a client could continue to use the previous-previous key for 1h15m after the intended key rotation.
- The TLS server was supposed to keep the previous two ESNI keys to counter DNS caches that do not respect TTL. However, due to a bug, only the previous and current keys were kept. That could result in a window of up to 15 minutes where client connections using older ESNI keys would fail.
A fix for the first issue is to reduce the TTL from 1h to 30 minutes; this change seems to have an immediate effect on the failure rates. There are close to no hourly failure spikes anymore.
A fix for the second issue has not been applied yet, but interestingly it does not seem necessary to reduce failure rates for the current deployment. While previous research revealed that resolvers can significantly increase TTL, the current deployment of ESNI being coupled with DoH seems to prevent these TTL modifications. As deployment of ESNI grows, it remains to be seen whether this still holds.
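The interaction of the two bugs can be sketched as simple arithmetic. This is a rough model built only from the numbers in this comment, not Cloudflare's implementation. The function and parameter names are hypothetical, and it assumes a superseded key survives exactly one rotation interval per "previous key" slot the server keeps.

```python
def failure_window_minutes(ttl_min, stale_serve_min, rotation_min,
                           previous_keys_kept):
    """Worst-case minutes in which a client uses a key the server has dropped."""
    # Worst case: a stale record with the old key is still served
    # stale_serve_min after rotation, then cached for a full TTL.
    max_client_use = stale_serve_min + ttl_min
    # Assumption: each retained "previous" slot keeps a superseded key
    # alive for one more rotation interval.
    server_retention = previous_keys_kept * rotation_min
    return max(0, max_client_use - server_retention)


print(failure_window_minutes(60, 15, 60, 1))  # TTL 1h, only one previous key kept
print(failure_window_minutes(30, 15, 60, 1))  # TTL reduced to 30 minutes
print(failure_window_minutes(60, 15, 60, 2))  # TTL 1h, two previous keys kept
```

Under these assumptions the model gives a 15-minute window for the original setup and no window for either fix on its own, which lines up with the "up to 15 minutes" figure and with the TTL reduction being enough for the current deployment.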
As for the reports on 2020-04-15, there was a temporary issue that caused increased failure rates for up to 2 hours.
@mt Do you have any metrics on ESNI failure rates from your side to verify this?
I think that this issue can be closed as it was a problem at our (CF) side that has since been resolved.
Comment 44•5 years ago (Reporter)
(In reply to Peter Wu from comment #43)
Hello,
Thanks for your feedback, though I still have the same error when accessing this URL: https://developers.cloudflare.com/
My settings are the same as my original post.
I don't think we should close this case until most people can confirm that everything's working fine.
Regards,
Comment 45•5 years ago
Raphael,
Can you reproduce the problem after restarting your browser?
If so, please share:
- The info from https://1.1.1.1/help
- On about:networking#dns, do any entries with "TRR" set to "true" exist?
- On about:networking#dns, can you see developers.cloudflare.com after opening it? If so, what does it say?
I am unable to reproduce it locally with the settings you provided. I tried to connect to https://developers.cloudflare.com/, and it works. Wireshark shows the presence of ESNI too in the Encrypted Extensions handshake message.
Comment 46•5 years ago
Firstly, thanks to Peter for following up here.
The bugs regarding key rotation and DNS caching mean that we're finding the hidden operational traps in this. On the one hand, I'm glad that it was just server-side bugs; on the other, as long as these are inherent in the design we'll have these problems. I don't know how much we can feed this into improvements to specifications or documentation, but it would be good to capture this somewhere. Especially the bit about future research.
As far as knowing that problems remain or whether this fixes things, that's difficult. I believe we do track TLS error codes, but that tracking is not public and getting the data is a little difficult (as in, I have made some casual attempts to get data and only ever failed). Also, we are limited in the granularity of the data. We won't be able to tell what part of an hour an observation is from; we can barely tell which day a submission was from. I might be able to look into whether this trends down as a result of these fixes, but I expect that this will end up in the noise: our telemetry has a bunch of noise and as we haven't enabled this by default, the number of people who might encounter this is probably quite small (no, we don't have telemetry on that, and I don't believe we have plans to build that capability: about:config is a little sensitive).
This was always an experimental feature (and it's going to change a LOT soon), so this is valuable experience more than it is a problem that needs to be fixed. I'm going to close this, but feel free to use this bug to coordinate the last little details if that suits.
Updated•5 years ago
Comment 47•5 years ago
The issue seems to be anything but resolved. I just started getting SSL_ERROR_MISSING_ESNI_EXTENSION on sites like Patreon. And the errors don't go away; they're permanent. Firefox 77.0b7 (64-bit)
Comment 48•5 years ago
Can't edit the comment, so forgive double-posting. Turns out, that it was ESET Nod32's doing. Unchecking
Settings > Advanced settings > Web and email > access protection > protocols > HTTPS checking
fixed the issue
Comment 49•5 years ago
Are you using ESNI?
ESNI is/was an experimental feature that had some deployment difficulties, e.g. unexpectedly getting this error. It will be replaced with a new version soon. There is already a draft specification for the new version.
We do not advise using ESNI at the moment.
Comment 50•5 years ago
I probably am using it, yes. That being said, changing this antivirus setting did help.
Comment 51•3 years ago
This same issue happened in IE edge also. https://www.mycashbackreviews.com/post/fluz-app-referral-code/
I have set the network.trr.uri to https://rfc8484.com/dns-query
this error happens randomly.