Open Bug 1566175 Opened 9 months ago Updated 14 days ago

SSL_ERROR_MISSING_ESNI_EXTENSION occurs occasionally when visiting websites

Categories

(Core :: Networking: HTTP, defect, P3)

68 Branch
defect

Tracking

()

Tracking Status
firefox-esr68 --- ?
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- affected
firefox71 --- affected

People

(Reporter: raphael.mauro, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: regression, Whiteboard: [necko-triaged])

Attachments

(1 file)

Attached image ESNI-Error.png

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0

Steps to reproduce:

I activated DoH and ESNI in about:config in Mozilla Firefox version 68.0 on Windows 10 1903.
I set these values in about:config:

network.trr.mode;3
network.trr.bootstrapAddress;1.1.1.1
network.trr.uri;https://mozilla.cloudflare-dns.com/dns-query
network.security.esni.enabled;true

Actual results:

When visiting websites (sometimes, not always), I get errors of this type:
SSL_ERROR_MISSING_ESNI_EXTENSION

For example, this website produces the error: https://www.frandroid.com/
Some websites work fine with ESNI set to "true" and some don't work at all; I have to set network.security.esni.enabled to "false" in order to access them.

Expected results:

The website should work fine, like the others where I don't get this type of error message.
Please see the attached screenshot of the error that occurred while trying to access the community.cloudflare.com site.

Also have the same error on nightly (70). The error is NOT permanent. The sites are working just fine and then suddenly throw this error for a few seconds or minutes (and when one site does, then the other sites will too, so it's not site specific) and then start working again.

(In reply to khagaroth from comment #1)

Also have the same error on nightly (70). The error is NOT permanent. The sites are working just fine and then suddenly throw this error for a few seconds or minutes (and when one site does, then the other sites will too, so it's not site specific) and then start working again.

Hello,

What do I have to do, then, to get those sites working? I can't access them at all; I always get this error.
Thanks.

I'm also encountering this. It's inconsistent, and sites may start working 5 minutes after I've received the error SSL_ERROR_MISSING_ESNI_EXTENSION when attempting to connect. It happens with connections over HTTP, as well as websockets. My network.trr.mode is set to 2, otherwise all my settings are the same as Raphael's.

Hi,
I wasn't able to reproduce this issue on Nightly 70.0a1 (2019-07-29) and on 68.0
Also, Could you please try to see if it's reproducible on Nightly? here is the link for download https://www.mozilla.org/en-US/firefox/nightly/all/

Thanks!

Flags: needinfo?(raphael.mauro)

(In reply to Luciana queirolo from comment #4)

Hi,
I wasn't able to reproduce this issue on Nightly 70.0a1 (2019-07-29) and on 68.0
Also, Could you please try to see if it's reproducible on Nightly? here is the link for download https://www.mozilla.org/en-US/firefox/nightly/all/

Thanks!

Hello,

Sorry for the delay. I don't have any ESNI issue with Nightly. Still, I would like to point out that this error is very random: sometimes a website throws this error and then a few seconds later it works fine. For the moment, on Firefox 68.0.1, the error sometimes occurs and sometimes doesn't.

Flags: needinfo?(raphael.mauro)

Is there anything I can do to try to diagnose this error when it happens? I've had it happen multiple times over the past few days, I just don't know how to diagnose the problem.

I got the issue on feedly.com on Nightly, but it fixed itself after a few minutes.

The site uses Cloudflare, so perhaps there is an issue with server key rotation.
https://blog.cloudflare.com/encrypted-sni/

Cloudflare’s own SNI encryption implementation rotates the server’s keys every hour to improve forward secrecy, but keeps track of the keys for the previous few hours to allow for DNS caching and replication delays, so that clients with slightly outdated keys can still use ESNI without problems (but eventually all keys are discarded and forgotten).

Does Firefox keep DNS cache entries (or more specifically ESNI records) in a disk cache ?
I restarted it less than 30 min ago, so if the cache is only in RAM, then the key must be fresh, unless there is a server-side issue (or the issue isn't related to key rotation).

This just randomly started happening to me as well, on feedly.com and canary.discordapp.com. The site is broken for up to 60 seconds, I just continue to refresh and it goes away eventually. I'm on 70.0b10.

I have found something interesting about this bug: It only happens on full hours.
For example, it happens at 9:00 and keeps happening. Then it starts working again at 9:01.
This is in line with the person above stating it happens for "up to 60 seconds".
For me, this bug happens every day, and at times like 8:00, 11:00, 6:00, you get the idea.
It also only happens on some websites, but it's always the same websites.
There's probably some kind of bug in how you compare the time at full hours during ESNI validation.
I'm no programmer, so I hope someone else will take a close look at the time part of the code and fix the bug.

Sorry for the doublepost, but I could not find an "edit comment" function.
I want to add the following to the above post:

  • Changing my computer time (Win10's) to a full hour does not reproduce the bug. It appears that it actually has to be a full hour, not just on your computer.
  • The website I experience this regularly on uses cloudflare. This could be a coincidence, but I thought I should mention it, because the other posters also experienced issues on cloudflare websites.

Mark to NEW based on several reports.

Status: UNCONFIRMED → NEW
Component: Untriaged → Networking: HTTP
Ever confirmed: true
Product: Firefox → Core
Summary: ESNI error when visiting websites SSL_ERROR_MISSING_ESNI_EXTENSION → SSL_ERROR_MISSING_ESNI_EXTENSION occurs occasionally when visiting websites

(In reply to NoName from comment #10)

Sorry for the doublepost, but I could not find an "edit comment" function.
I want to add the following to the above post:

  • Changing my computer time (Win10's) to a full hour does not reproduce the bug. It appears that it actually has to be a full hour, not just on your computer.
  • The website I experience this regularly on uses cloudflare. This could be a coincidence, but I thought I should mention it, because the other posters also experienced issues on cloudflare websites.

This is really interesting.
Could you try to get the http log when this happens?
Thanks.

Flags: needinfo?(defer.com)

mt, can you look at this or pass it on?

Flags: needinfo?(mt)

This looks like a server configuration error... or a DNS server over-caching an old ESNI record ... or a bad local clock at the time that keys roll over. I don't see any problem with these sites, unless I have a very old ESNI record. I'll forward this to our friends at Cloudflare to get a better idea of what is going on, but we might need to build some better diagnostics for this error.

We probably need to log the ESNI record and the system time when this error happens. Ideally, we should also ask the DoH server about what time it thinks that it is. (If we already do these things, that's super.)
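A minimal sketch of what such a diagnostic could record, assuming the record's notBefore/notAfter values and the local clock are already available as timezone-aware datetimes. The function name, the `assumed_skew` parameter, and the example timestamps are all hypothetical; this is not existing Firefox or NSS code.

```python
from datetime import datetime, timedelta, timezone

def describe_record_window(not_before, not_after, now, assumed_skew=timedelta(0)):
    """Summarize how a client clock relates to a record's validity window.

    All datetimes must be timezone-aware. `assumed_skew` is a hypothetical
    estimate of how far the local clock runs ahead of true time.
    """
    corrected = now - assumed_skew
    return {
        "local_time": now.isoformat(),
        "in_window": not_before <= corrected <= not_after,
        "seconds_since_not_before": (corrected - not_before).total_seconds(),
        "seconds_until_not_after": (not_after - corrected).total_seconds(),
    }

# Illustrative values only; they are not taken from a real ESNI record.
not_before = datetime(2019, 10, 25, 11, 0, tzinfo=timezone.utc)
not_after = datetime(2019, 10, 28, 11, 0, tzinfo=timezone.utc)
now = datetime(2019, 10, 27, 1, 1, tzinfo=timezone.utc)

print(describe_record_window(not_before, not_after, now))
```

Logging both values side by side would help separate a bad local clock (the record falls outside its window) from a server-side key rollover (the record is inside its window but the handshake still fails).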

Flags: needinfo?(mt)

(In reply to Kershaw Chang [:kershaw] from comment #12)

(In reply to NoName from comment #10)

Sorry for the doublepost, but I could not find an "edit comment" function.
I want to add the following to the above post:

  • Changing my computer time (Win10's) to a full hour does not reproduce the bug. It appears that it actually has to be a full hour, not just on your computer.
  • The website I experience this regularly on uses cloudflare. This could be a coincidence, but I thought I should mention it, because the other posters also experienced issues on cloudflare websites.

This is really interesting.
Could you try to get the http log when this happens?
Thanks.

I'm working on it, though it's rather difficult, because the bug happens somewhat randomly and I need to start logging before it fixes itself. I'll make another post then.
Meanwhile I got some more info:
A few weeks ago I stated that the bug happens from XX:00 to XX:01 every hour for me.
Shortly afterwards, this actually changed to being XX:04 to XX:05. For example, I noticed it happening about 4 more times around times like 6:04, 11:04, and so on. So while probably not connected to a certain number, it still happens somewhat periodically.
And next, I tried logging the http traffic by refreshing around that time. I kept refreshing pages every few seconds between XX:55 and XX:10, but the bug never happened (I did it especially often around XX:04).
So it seems to me that it doesn't happen EVERY hour, but when it does happen, it's always around the same time.
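One way to check the periodicity claim is to measure how far each failure is from the nearest full hour. The sketch below is illustrative only: the function names, the 5-minute tolerance, and the timestamps are placeholders, not data from a log.

```python
from datetime import datetime

def seconds_from_nearest_hour(ts: datetime) -> int:
    """Signed offset in seconds from the nearest full hour (negative = before it)."""
    offset = ts.minute * 60 + ts.second
    return offset if offset <= 1800 else offset - 3600

def near_hour_boundary(ts: datetime, tolerance_s: int = 300) -> bool:
    """True if ts lies within tolerance_s seconds of a full hour."""
    return abs(seconds_from_nearest_hour(ts)) <= tolerance_s

# Placeholder timestamps shaped like the reports above (XX:00 and XX:04).
observed = [
    datetime(2019, 10, 1, 9, 0, 30),
    datetime(2019, 10, 2, 11, 4, 0),
    datetime(2019, 10, 3, 6, 4, 15),
]
for ts in observed:
    print(ts.isoformat(), seconds_from_nearest_hour(ts), near_hour_boundary(ts))
```

If the offsets from many independent failures cluster in a narrow band, that points to something that happens once per hour rather than to random network trouble.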

(In reply to raphael.mauro from comment #0)

Created attachment 9078235 [details]
ESNI-Error.png

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0

Steps to reproduce:

I activated DOH and ESNI in about:config from Mozilla Firefox Version 68.0 on Windows 10 1903.
I set those values in about:config :

network.trr.mode;3
network.trr.bootstrapAddress;1.1.1.1
network.trr.uri;https://mozilla.cloudflare-dns.com/dns-query
network.security.esni.enabled;true

I am having this same issue with ESNI support enabled in Firefox 70. My TRR mode is set to 2, if that matters.

(In reply to Kershaw Chang [:kershaw] from comment #12)

This is really interesting.
Could you try to get the http log when this happens?
Thanks.

I managed to get a good HTTP log of this bug now. It's uploaded here:
https://x0.at/Zuw.txt
I did the following:

  • started browser
  • started logging
  • entered https://boards.4channel.org/g/ into the URL bar of a new tab and hit enter, 1 minute past a full hour (This is a cloudflare-backed website, if I'm not mistaken)
  • ESNI error appears almost instantly
  • stopped logging
  • removed cookies with personal information from the log
    Note that this bug, once again, occurred 1 minute past a full hour (in this case 1:01AM) and fixed itself just a few seconds later, around 1:02AM. Just like the previous times it occurred to me.
    Hope this helps.
Flags: needinfo?(defer.com)

The formatting in the previous post got messed up, but it contains a link to a clean log of the bug.
I forgot to add some possibly relevant info:
My system clock showed 2:01AM when I looked at it. The log says 1:01AM UTC, so it's probably just a one-hour timezone difference, and the milliseconds should be the same as in the log?
Windows 10, Firefox 70.0, I use the HTTPS Everywhere and uBlock Origin extension.

I am getting this error in Firefox 71 beta 4 when visiting https://feedly.com/ with ESNI encryption and DoH enabled.

Based on what I'm seeing in the log (https://x0.at/Zuw.txt), we have a valid ESNI record that was generated at 2019-10-26T23:00Z. I'm guessing, but this is exactly the middle of the notBefore and notAfter fields. The DNS TTL is 3600 (one hour). I need to confirm, but I expect that these records are generated once per hour on the hour. If that is the case, then this record might have entered a cache just before 2019-10-27T00:00Z. That means that the record would be considered usable right up to 2019-10-27T01:00Z, a minute before the failed attempt. It is likely that the ESNI keys at the server were rotated at 2019-10-27T01:00Z or soon afterwards. As a result, the server wasn't able to produce the correct ESNI record and we failed the connection.

This is clearly a server configuration issue. Because Cloudflare tell us that this is valid for 3 days, we can't do anything to safeguard against failures here. That's their prerogative, and they make that call because they don't trust client clocks, which is probably wise. Our experience shows that clocks on clients are sometimes very bad.

In the end, the goal is to have the server ensure that the client won't attempt the ESNI request past when they drop their keys. They do that either by reducing the time that the record is valid (by reducing the TTL on DNS records, for instance) or by extending the time that the server retains the corresponding keys.

If, as I'm guessing, the record was issued at 23:00, replaced at 00:00, and the keys were retired at 01:00, then a TTL of 3600 is too tight. Any delay in provisioning replacement keys at 00:00 would leave a period of exposure. If keys are replaced too soon relative to that, this error occurs. If a DNS server holds on to keys too long, or there are network delays in delivering DNS responses, then the TTL will extend past the 3600 and bad things happen.
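The timing argument can be sketched as simple arithmetic. This is a minimal model under the guesses above (record cached just before 00:00, TTL of 3600 seconds, keys retired at 01:00); the function name and the `extra_delay` parameter are hypothetical, and the two-minute delay is an arbitrary illustration, not a measured value.

```python
from datetime import datetime, timedelta, timezone

def exposure_window(cached_at, ttl, keys_retired_at, extra_delay=timedelta(0)):
    """Return (start, end) of the period in which a cached record may still
    be used after the server has dropped its keys, or None if there is none.

    `extra_delay` stands in for anything that stretches the effective TTL,
    such as a resolver holding the record too long or slow DNS delivery.
    """
    usable_until = cached_at + ttl + extra_delay
    if usable_until <= keys_retired_at:
        return None
    return (keys_retired_at, usable_until)

cached_at = datetime(2019, 10, 26, 23, 59, 30, tzinfo=timezone.utc)
keys_retired_at = datetime(2019, 10, 27, 1, 0, tzinfo=timezone.utc)
ttl = timedelta(seconds=3600)

# With no slack, the record expires 30 seconds before the keys are retired.
print(exposure_window(cached_at, ttl, keys_retired_at))
# Two minutes of extra caching or delay opens a window that covers 01:01.
print(exposure_window(cached_at, ttl, keys_retired_at, timedelta(seconds=120)))
```

In this model the margin is only as large as the gap between the TTL and the key retention period, so either a shorter TTL or longer key retention removes the window.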

Now, I might be wrong and this might be the result of an insane over-extension of the DNS TTL, but I'll check with Cloudflare to confirm this.

A note:
Necko increases the validity (validity + grace time) to at least 1 minute (the grace period is increased, so we will use the record for at least 1 minute in any case).
This is not the problem here because the record is not from the necko cache.
(We should not increase the TTL, and we should also not add a grace period as we do for A and AAAA records; I will file a bug.)

This does not look like our bug. I will wait to get a confirmation from Cloudflare.

Priority: -- → P3
Whiteboard: [necko-triaged]

I just got this error at https://clark.com/latest (Win7/FF 71.0).

Settings:
network.trr.mode - 2
network.trr.bootstrapAddress - 9.9.9.11
network.trr.uri - https://dns11.quad9.net/dns-query
network.trr.custom_uri - https://dns11.quad9.net/dns-query
network.trr.early-AAAA - true
network.trr.request-timeout - 3000
network.trr.request_timeout_ms - 3000
network.trr.wait-for-portal - true
network.security.esni.enabled - true

Changing only network.security.esni.enabled to false let me in. I then set it back to true and still got in. Will see what happens tomorrow.

(In reply to Dragana Damjanovic [:dragana] from comment #22)

This does not look like our bug. I will wait to get a confirmation from Cloudflare.

Indeed; for me, it's caused by having the AVG Antivirus setting for 'Enable HTTPS scanning' enabled.

Well, I had an AVG update waiting to be installed on reboot, so I did it, and the problem has disappeared. Also, my laptop doesn't have the problem and it's still waiting for a reboot to install the AVG update. Don't know what to make of that, as both PCs have the same FF DNS setup as mentioned above :-/

Regarding the comment above, I don't use an antivirus at all, and I still get this problem multiple times every day at regular time intervals.
So possibly the AVG was just a coincidence.

Hello everyone,

As the person who reported this issue, I can confirm that the problem is not coming from the AVG antivirus, as I'm using Eset Internet Security (version 13.0.24.0) and still have the issue I reported.

Regards,

Have to confirm: I also don't use any antivirus. Just to add a new thing: for about a week now, the same sites that give me the ESNI error also rarely (i.e., even more rarely than the ESNI error) give me the SSL_ERROR_NO_CYPHER_OVERLAP error instead, which also clears itself after a few seconds or minutes.

I'm experiencing this issue as well, though it is not nearly as intermittent as others have reported. For me, the error doesn't clear itself until I restart Firefox.

I don't use any antivirus. Firefox 72.0.1 on Linux

Can also confirm I encountered this error on Firefox 73.0b5 (Windows 10). My TRR mode is set to 3 and I have the same standard settings as the users above.
I disabled the "Web Anti-Virus" protection feature in Kaspersky after seeing the comment regarding AVG; since then it has worked.

Some people still seem to be affected without having an AV, but if you have one, it's worth disabling it to try.

This happens for me regularly at hourly intervals, usually at 59 minutes past the hour, and usually abates at 1 minute or so past the hour. I'm running Linux, using Firefox Developer Edition 73.0b5 (64-bit).

This still happens to me rarely; I can't seem to reproduce it. The last time it happened was on a different, older version of Firefox Nightly; I'm currently using version 76.0a1 (2020-03-17) (64-bit).
When the problem occurs, it happens on a certain site (a different one each time), while other sites work.
No antivirus; ESNI + DoH + DNSSEC are all on, using the latest stable version of Windows 10. This also happens right as the time hits XX:00 (a new hour), which aligns with the previous comment above mine. The problem dissipates after about one to three minutes.

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression