1716586 - HTTPS-Only Mode: https timeout with multiple simultaneous connections

Reporter

Description

4 years ago

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0

Steps to reproduce:

Fresh browser session. Settings | Privacy and Security | HTTPS-only Mode set to "enable in all windows." Simultaneously opened multiple tabs, six or more, by right-clicking a subfolder in Bookmarks.

Actual results:

Some of the tabs, unpredictably and intermittently, fail HTTPS validation, resulting in a "security warning" page that asks the user whether to go to the HTTP site. Manually clicking the "go to HTTP" button, more often than not, then loads the site over HTTPS anyway (lock icon in the address bar).

Expected results:

The HTTPS connection should have been maintained despite the implied timeout/connection inconsistency.

Suspicion: There is either a timeout problem, or an inability to handle too many (however many that is) handshakes if they all come back at once.

If the former, the timeout should be either relaxed or examined to see if it's necessary at all. As I read the easily available documentation on HTTPS handshaking, the protocol itself does not demand a timeout; I may well be missing something, or the documentation might be incorrect or incomplete.

If the latter (which seems unlikely, as I've had this happen with four new tabs and fail to happen with fifteen), the message the user receives needs to be changed to suggest "Retry HTTPS?" rather than blithely assume that "fail once = persistent failure of HTTPS capability" and presume that any further connection will necessarily be "unsecured."

And if it's something else entirely... that's well above my pay grade.

Timea Cernea [:tbabos][inactive]

Comment 1

4 years ago

Thank you for the report, moving this over to Core:Networking for further triage.

Component: Untriaged → Networking

Product: Firefox → Core

CEP

Reporter

Comment 2

4 years ago

Some further unintentional/inadvertent testing this morning is pointing at timeouts rather than capacity. I got this error in both 89.0 and 89.0.1 with a single tab... to the NYT on a morning on which significant Supreme Court decisions were being announced and the NYT site was being updated with breaking-news-type material. In one instance, it was after a complete shutdown and restart of Firefox. I think that rules out a "too many handshakes" cause for what I'm seeing. It MIGHT point at a server-side problem, but then I'd expect to see it in other browsers and on other machines, and I haven't.

Kershaw Chang [:kershaw]

Comment 3

4 years ago

Hi Reporter,

Does this also happen when HTTPS-only Mode is disabled?
Could you also try to capture an HTTP log with the steps in comment #2?

Thanks.

Flags: needinfo?(ceplaw)

CEP

Reporter

Comment 4

4 years ago

(In reply to Kershaw Chang [:kershaw] from comment #3)

> Hi Reporter,
>
> Does this also happen when HTTPS-only Mode is disabled?

No. This seems unique to HTTPS-only Mode... and I'm not really able to do "normal work" without it, so I'm reluctant to turn it off.

> Could you also try to capture an HTTP log with the steps in comment #2?
>
> Thanks.

I will try, but because this is a bit unpredictable it may be several days. I'm assuming that you only want a log when I have this same issue show up.

Flags: needinfo?(ceplaw)

Kershaw Chang [:kershaw]

Comment 5

4 years ago

Changing the component to DOM: Security, since this only happens with HTTPS-Only Mode enabled.

Component: Networking → DOM: Security

Daniel Veditz [:dveditz]

Comment 6

4 years ago

There's a balance between wanting to time out quickly, so we can "fix" things for users quickly if HTTPS-only is going to fail for that site, and waiting long enough for slow sites. Waiting for the full HTTP timeout value was a bad experience a lot of the time. Maybe the current value is too short often enough to consider lengthening it?

Do we have telemetry for how often people hit the timeout? Could we do an experiment choosing different values and see what happens?
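One way to frame the proposed experiment, as a minimal sketch: given per-connection times until the upgraded HTTPS request produces its first signal (hypothetical input; no such telemetry is described in this bug), count how often each candidate timer value would expire first. `fallback_rate` and `compare_timers` are illustrative names, not Firefox code, and the result is only an upper bound on fallback pages, because the fallback also requires the HTTP background request to answer before HTTPS does.

```python
from typing import Dict, Iterable, Sequence


def fallback_rate(latencies_ms: Sequence[float], timer_ms: float) -> float:
    """Fraction of upgrades still silent when the timer fires.

    These are the connections for which the HTTP background request would be
    sent, so this is an upper bound on how often the fallback page appears.
    """
    if not latencies_ms:
        return 0.0
    late = sum(1 for t in latencies_ms if t > timer_ms)
    return late / len(latencies_ms)


def compare_timers(
    latencies_ms: Sequence[float], candidates: Iterable[float]
) -> Dict[float, float]:
    """Map each candidate timer value (ms) to its upper-bound fallback rate."""
    return {t: fallback_rate(latencies_ms, t) for t in candidates}
```

For instance, with eight made-up latencies `[120, 450, 900, 2100, 3400, 4200, 6100, 8000]`, `compare_timers(samples, [3000, 5000, 7500])` returns `{3000: 0.5, 5000: 0.25, 7500: 0.125}`. Real numbers would have to come from telemetry or an experiment like the one suggested here.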

Flags: needinfo?(ckerschb)

CEP

Reporter

Comment 7

4 years ago

Additional side note that may bear consideration (or quick dismissal, I don't know the code):

Is there any potential relationship to EnablePerformanceEventTiming (https://bugzilla.mozilla.org/show_bug.cgi?id=1701029)? What I've been able to read of this indicates that there MIGHT be a waterfall into other timing/time-out issues.

Christoph Kerschbaumer [:ckerschb, back Sept 8th]

Comment 8

4 years ago

(In reply to Daniel Veditz [:dveditz] from comment #6)

> There's a balance between wanting to time out quickly, so we can "fix" things for users quickly if HTTPS-only is going to fail for that site, and waiting long enough for slow sites. Waiting for the full HTTP timeout value was a bad experience a lot of the time. Maybe the current value is too short often enough to consider lengthening it?

Here is how it works under the hood:

1. HTTPS-Only Mode tries to upgrade the top-level connection from http to https.
2. If the browser has not received any kind of signal after 3000ms, then we send an http background request.
3. If the background request receives a signal faster than the upgraded https request, then it's a strong indicator that the upgraded top-level https request will result in a timeout.

We empirically found that using 3000ms provides the best tradeoff for the end user experience. If interested, there are more details in our Research Paper.
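The three steps above can be sketched as an asyncio race. This is a minimal model under stated assumptions, not Firefox's implementation: requests are simulated with `asyncio.sleep`, `None` stands for a server that never answers, and `fake_request` and `upgrade` are hypothetical names.

```python
import asyncio
from typing import Optional

TIMER_MS = 3000  # default waiting period described above


async def fake_request(delay_ms: Optional[float]) -> str:
    """Simulate a server that answers after delay_ms; None means never."""
    if delay_ms is None:
        await asyncio.Event().wait()  # never set, so this never returns
    else:
        await asyncio.sleep(delay_ms / 1000)
    return "signal"


async def upgrade(
    https_ms: Optional[float],
    http_ms: Optional[float],
    timer_ms: float = TIMER_MS,
) -> str:
    # Step 1: start the upgraded top-level HTTPS request.
    https_task = asyncio.create_task(fake_request(https_ms))
    try:
        # Wait up to timer_ms for a signal; shield() keeps the HTTPS request
        # running when this wait itself times out.
        await asyncio.wait_for(asyncio.shield(https_task), timer_ms / 1000)
        return "https"
    except asyncio.TimeoutError:
        pass
    # Step 2: no signal yet, so send the HTTP background request.
    http_task = asyncio.create_task(fake_request(http_ms))
    # Step 3: whichever answers first decides. If neither server ever answers,
    # this sketch waits forever (the real network timeout is not modelled).
    done, pending = await asyncio.wait(
        {https_task, http_task}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    return "https" if https_task in done else "fallback-page"
```

For example, `asyncio.run(upgrade(https_ms=4000, http_ms=200))` returns `"fallback-page"` after about 3.2 seconds, even though the HTTPS site would have answered at 4 seconds: the slow-server case reported in this bug.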

The pref dom.security.https_only_mode_send_http_background_request allows one to disable sending the background request completely. Disabling the background request has the advantage that slow-responding servers do not trigger the connection timeout, but the disadvantage that an unresponsive server might result in a very long timeout.

Bug 1717797, which landed just a few days ago, makes the waiting period adjustable. The default is 3000ms, but it can be modified by updating the pref dom.security.https_only_fire_http_request_background_timer_ms.
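To make the trade-off between the two prefs concrete, here is a simplified, deterministic model. It is an assumption-laden sketch, not Firefox code: latencies in milliseconds are known up front, `None` means the server never answers, the parameters only mirror the two prefs by name, and `"long-timeout"` stands for waiting on the ordinary network timeout, which is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    result: str             # "https", "fallback-page" or "long-timeout"
    at_ms: Optional[float]  # when the user sees it; None = not modelled


def model(
    https_ms: Optional[float],
    http_ms: Optional[float],
    send_background_request: bool = True,
    timer_ms: float = 3000,
) -> Outcome:
    # HTTPS answered before the timer fired: no background request is needed.
    if https_ms is not None and https_ms <= timer_ms:
        return Outcome("https", https_ms)
    if send_background_request and http_ms is not None:
        # The HTTP background request is only sent once the timer fires.
        http_done = timer_ms + http_ms
        if https_ms is None or http_done < https_ms:
            return Outcome("fallback-page", http_done)
    # Otherwise we keep waiting for the upgraded HTTPS request.
    if https_ms is None:
        return Outcome("long-timeout", None)
    return Outcome("https", https_ms)
```

With a server whose HTTPS answer takes 4000ms and whose HTTP answer takes 200ms, the defaults show the fallback page at 3200ms, while either a 5000ms timer or a disabled background request loads HTTPS at 4000ms. For a server that never answers HTTPS, only the background request avoids the long timeout.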

Flags: needinfo?(ckerschb)

Updated

4 years ago

Blocks: https-only-mode

Severity: -- → S3

Priority: -- → P3

Summary: https timeout with multiple simultaneous connections → HTTPS-Only Mode: https timeout with multiple simultaneous connections

Whiteboard: [domsecurity-backlog1]

CEP

Reporter

Comment 9

4 years ago

(In reply to Christoph Kerschbaumer [:ckerschb] from comment #8)

> Bug 1717797, which landed just a few days ago, makes the waiting period adjustable. The default is 3000ms, but it can be modified by updating the pref dom.security.https_only_fire_http_request_background_timer_ms.

Please explain what you mean by "updating the pref": that is a Boolean preference. Does it need to be changed to numeric, and is that in milliseconds?

Christoph Kerschbaumer [:ckerschb, back Sept 8th]

Comment 10

4 years ago

(In reply to CEP from comment #9)

> Please explain what you mean by "updating the pref": that is a Boolean preference. Does it need to be changed to numeric, and is that in milliseconds?

There are two prefs of importance here:

- dom.security.https_only_mode_send_http_background_request, which is a bool; you can set it to false so that no background request happens at all.
- dom.security.https_only_fire_http_request_background_timer_ms, which is a uint; you can set it to, e.g., 5000 instead of the default 3000, which would give the web server two additional seconds to establish an HTTPS connection.
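If editing about:config by hand is inconvenient, the same two settings can go into a `user.js` file in the Firefox profile directory, whose `user_pref(...)` lines are applied at startup. The helper below is a hypothetical convenience, not part of Firefox; only the pref names and their types come from this comment.

```python
import json


def user_pref_lines(send_background_request: bool = True, timer_ms: int = 3000) -> str:
    """Render user.js lines for the two HTTPS-Only prefs discussed above."""
    if not isinstance(timer_ms, int) or isinstance(timer_ms, bool) or timer_ms < 0:
        raise ValueError("timer_ms must be a non-negative integer (milliseconds)")
    prefs = {
        # bool: false means no HTTP background request is sent at all
        "dom.security.https_only_mode_send_http_background_request": send_background_request,
        # uint: how long to wait for an HTTPS signal before sending that request
        "dom.security.https_only_fire_http_request_background_timer_ms": timer_ms,
    }
    # json.dumps quotes the names and renders True/False as true/false,
    # which matches the JavaScript-like user.js syntax.
    return "\n".join(
        f"user_pref({json.dumps(name)}, {json.dumps(value)});"
        for name, value in prefs.items()
    )
```

For instance, `print(user_pref_lines(timer_ms=5000))` prints:

    user_pref("dom.security.https_only_mode_send_http_background_request", true);
    user_pref("dom.security.https_only_fire_http_request_background_timer_ms", 5000);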

CEP

Reporter

Comment 11

4 years ago

Some updates:

First, the good. Changing the dom.security.https_only_fire_http_request_background_timer_ms parameter to 5000 (from 3000) reduced incidence of this problem by around 75%. This seems to undermine the "empirical data," indicating that perhaps testing was done on connections that are too good and/or too close to major nodes and/or without other system loads (e.g., multiple devices on the same router).

Next, the bad. Changing dom.security.https_only_mode_send_http_background_request to false had no effect, whether the time was set to 3000 or 5000. I therefore question Christoph's note in comment 10...

... and after updating to Firefox 91, with no other system changes, behavior has substantially regressed, to somewhere between what I saw under 90.x with timer=3000 and with timer=5000. This is most noticeable when opening more than half a dozen tabs at once.

Two questions for Christoph:

(1) Can you explain why changing that boolean to false had no effect?

(2) Is there any other timing mechanism (or process) that would make changing the timer all the way up to 7500 milliseconds inadvisable? Remember, I ordinarily see this behavior when opening multiple tabs in the background, behind the top tab (that is, it seems less likely to occur in the active window).

CEP

Reporter

Comment 12

4 years ago

Further update (version 91.0.1):

A trip to the coffee shop this morning may have exposed a third-party-interface issue. Virtually all of my problems have arisen using Comcast/Xfinity connections in one particular region. I went to a coffee shop for a meeting this morning, and while waiting for someone who had been delayed in traffic I did a little surfing... and there were no problems at all. I got back here, and the problems recurred (including warning screen --> click "try http://" --> connection made to https:// twice). I called the coffee shop, and they're using an entirely different provider, one that has welcomed, not discouraged, use of alternate DNS servers. (N.B. This is occurring on multiple, relatively high-speed accounts, both wireless and Cat6 connections, in different cities in the same region, so I'm pretty sure it isn't local hardware. The commonality is Xfinity/Comcast.)

So, that exposes two other potential interaction areas:

(1) Any filtering (or "load-balancing" or any other excuses offered) being done by Xfinity that is causing timeouts

(2) Any delays being caused by DNS requests routed to Cloudflare because Xfinity is specifically trying to discourage customers from using more-privacy-protective third-party DNSs. I will test THIS one over the next week by changing DNS providers a couple of times and NOT clearing the DNS cache as often.

From the coding perspective, this loops back to my item 2 in comment 11.

CEP

Reporter

Comment 13

3 years ago

[tap tap tap] Is this thing on?

One further request/suggestion: The timeout counter should not START until the site is reached; at minimum, this means that the 3000 milliseconds (or whatever value the user has set per Christoph's note) should exclude time waiting for a DNS response. This may not be directly possible, so perhaps building two stages into the timer's default value would make sense. (I'm approximately 5km from a Cloudflare node, so it's Cloudflare's response time and not actual communication delay at issue here; but that won't be true for all users.)

My understanding of the current flow is:
Request sent | Timer starts | DNS responds | Site responds | https compliance determined

And I'm suggesting either
Request sent | DNS responds | Timer starts | Site responds | https compliance determined
or
Request sent | If {localDNSentrynull} then {addXtotimer} | Timer starts | Site responds | https compliance determined
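The three flows above can be compared with a simplified model. This is a sketch under assumptions, not Firefox's timer code: `dns_ms` is the time to resolve the host, `server_ms` the time from resolution to the site's first HTTPS signal, `extra_ms` stands in for the unspecified X, and each function returns True when the timer would expire first (that is, when the HTTP background request would be sent). All names are hypothetical.

```python
def fires_current(dns_ms: float, server_ms: float, timer_ms: float = 3000) -> bool:
    """Current flow: the timer starts with the request, so DNS time counts."""
    return dns_ms + server_ms > timer_ms


def fires_after_dns(dns_ms: float, server_ms: float, timer_ms: float = 3000) -> bool:
    """First proposal: the timer starts only once DNS has responded.

    dns_ms is accepted for a uniform signature but no longer matters.
    """
    return server_ms > timer_ms


def fires_with_dns_allowance(
    dns_ms: float,
    server_ms: float,
    cached: bool,
    extra_ms: float,
    timer_ms: float = 3000,
) -> bool:
    """Second proposal: add extra_ms (the "X") when there is no local DNS entry."""
    budget = timer_ms if cached else timer_ms + extra_ms
    return dns_ms + server_ms > budget
```

With a 1500ms DNS lookup and a server that answers 2000ms later, the current flow fires the 3000ms timer, while both proposals (taking X = 1000ms for the second) let the HTTPS request complete first.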

CEP

Reporter

Comment 14

3 years ago

(minor refresh and note concerning "unconfirmed" status)

This problem has actually gotten worse in version 105. It appears to (potentially) be related to delays imposed by choosing a DNS provider different from the service carrier (e.g., Cloudflare DNS in Settings | General | Network Settings | Enable DNS Over HTTPS), but I do not have a large-enough data set to draw any conclusions. It remains unpredictable in either proportion or extent, but DOES appear related to network-level caching (e.g., a network-level cache of the front page of newyorktimes.com).

So I'm suggesting, perhaps a bit more focused, that the timer should not start until there's a valid DNS for the target page.

Categories

(Core :: DOM: Security, defect, P3)

People

(Reporter: ceplaw, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [domsecurity-backlog1])

Security

(public)
