Open Bug 2040268 Opened 25 days ago Updated 1 day ago

Tabs take a long time to load after restarting

Categories

(Core :: Networking, defect, P2)

Firefox 152
Desktop
Windows
defect

Tracking


Tracking Status
firefox-esr115 --- unaffected
firefox-esr140 --- unaffected
firefox151 --- unaffected
firefox152 --- disabled
firefox153 + affected

People

(Reporter: alice0775, Assigned: leggert, NeedInfo)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: nightly-community, perf, regression, Whiteboard: [necko-triaged])

Attachments

(8 files)

Steps to reproduce:

  1. Enable "Open previous windows and tabs" in about:preferences#general.
  2. Load a page, for example https://kakuyomu.jp/works/16817330661959774004/episodes/16817330664784275099
  3. Restart the browser.
  4. Repeat step 3 if needed.

Actual results:
Tabs take a long time to load after restarting.
The status bar shows "Performing a TLS handshake to kakuyomu.jp...".

Expected results:
It should take no more than a second or two.

Attached file about:support

:leggert, since you are the author of the regressor, bug 2035453, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

Flags: needinfo?(leggert)
Assignee: nobody → leggert
Flags: needinfo?(leggert)

Setting network.ssl_tokens_cache_persistence=false does NOT help.

However, the following workaround helps 100%.
It is a truly IDIOTIC workaround (it prevents the browser from creating any ssl_tokens_cache file at shutdown):

  1. Delete ssl_tokens_cache.bin and ssl_tokens_cache.tmp (if present) from the profile folder.
  2. Create folders named ssl_tokens_cache.bin and ssl_tokens_cache.tmp in the profile folder.
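
The two steps above can be scripted. This is a minimal Python sketch, not a supported tool: the function name block_token_cache is hypothetical, the profile folder path has to be supplied by the user (about:support lists it), and it assumes Firefox is closed while it runs.

```python
from __future__ import annotations

from pathlib import Path

# File names taken from the workaround above.
TOKEN_CACHE_NAMES = ("ssl_tokens_cache.bin", "ssl_tokens_cache.tmp")


def block_token_cache(profile_dir: Path) -> list[Path]:
    """Replace the token cache files with empty folders of the same name.

    Hypothetical helper; run it only while Firefox is closed. With a folder
    occupying each path, a regular file of that name cannot be created there.
    Returns the paths of the placeholder folders.
    """
    placeholders = []
    for name in TOKEN_CACHE_NAMES:
        path = profile_dir / name
        if path.is_file():
            path.unlink()          # step 1: delete the existing file, if present
        path.mkdir(exist_ok=True)  # step 2: create a folder with the same name
        placeholders.append(path)
    return placeholders
```

Removing the two empty folders again (for example with Path.rmdir) undoes the workaround.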
Blocks: 2024146
Component: Session Restore → Networking
Product: Firefox → Core

This seems to be a server-side issue with AWS CloudFront: a server at IP address A takes 10-30s to reject a PSK issued by a server at IP address B. (Reusing the PSK with the server at the same address works, but because of DNS round-robin, which server the browser reaches is not deterministic.) Needless to say, the RFC says "rejection should be fast..."

The fix would probably be a fallback timer of sorts.

Attached file test_psk_stall.sh

:alice, you can also disable network.ssl_tokens_cache_persistence (after removing the token file), but then you obviously lose the caching benefits.

(In reply to Lars Eggert [:lars] from comment #8)

> :alice, you can also disable network.ssl_tokens_cache_persistence (after removing the token file), but then you obviously lose the caching benefits.

Even if I set network.ssl_tokens_cache_persistence to false and delete the ssl_tokens_cache.bin file before launching the browser, the problem occurs again when the browser is restarted.

The ssl_tokens_cache.bin file is still created automatically when network.ssl_tokens_cache_persistence is set to false.

Some CDN edges take 10–30s to reject a TLS 1.3 session ticket they
cannot decrypt (observed on AWS CloudFront after a browser restart
connects to a different edge than the one that issued the ticket).

Add network.http.psk_fallback_timeout (default 3s): if a TLS handshake
using a PSK without early data does not complete in time, close with
NS_ERROR_NET_RESET so the existing resumption-error path evicts the
stale token and retries with a fresh handshake.
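
The fallback described in the patch message above can be modelled outside Gecko. This is a minimal Python asyncio sketch, not the actual Necko change: connect_with_psk_fallback, PSK_FALLBACK_TIMEOUT, the tickets dictionary and the injected handshake coroutine are hypothetical stand-ins, and abandoning the attempt on timeout stands in for closing with NS_ERROR_NET_RESET.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

# Hypothetical stand-in for network.http.psk_fallback_timeout (default 3 s).
PSK_FALLBACK_TIMEOUT = 3.0

# A handshake coroutine: (host, ticket or None) -> new session ticket.
Handshake = Callable[[str, Optional[bytes]], Awaitable[bytes]]


async def connect_with_psk_fallback(
    host: str,
    tickets: Dict[str, bytes],
    handshake: Handshake,
    use_early_data: bool = False,
    timeout: float = PSK_FALLBACK_TIMEOUT,
) -> bytes:
    """Resume with a cached ticket, but give up on it if the server stalls.

    The timer only applies to a PSK handshake without early data. When it
    expires, the attempt is cancelled (standing in for closing with
    NS_ERROR_NET_RESET), the stale ticket is evicted, and a full handshake
    without a PSK is made instead.
    """
    ticket = tickets.get(host)
    if ticket is None or use_early_data:
        # No PSK offered, or early data in flight: no fallback timer.
        new_ticket = await handshake(host, ticket)
    else:
        try:
            new_ticket = await asyncio.wait_for(handshake(host, ticket), timeout)
        except asyncio.TimeoutError:
            tickets.pop(host, None)                   # evict the stale token
            new_ticket = await handshake(host, None)  # fresh full handshake
    tickets[host] = new_ticket  # cache the ticket from this handshake
    return new_ticket
```

With a fake handshake that sleeps whenever a ticket is offered and timeout=0.1, the call returns after roughly 0.1 s with the ticket from the full handshake instead of waiting out a 10-30s stall. Arming the timer only when no early data is sent follows the patch description.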

Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged]
Attached file test_psk_stall_fast.sh

As far as I can tell, this is an issue with AWS CloudFront, and I am in contact with them. If I run test_psk_stall_fast.sh a bunch of times, I see timeouts fairly regularly.

# /tmp/test_psk_stall_fast.sh kakuyomu.jp
IPs: 13.249.8.95 13.249.8.36 13.249.8.112 13.249.8.111  sleep=2
Getting ticket from 13.249.8.95...
Connecting to 13.249.8.95
depth=2 C=US, O=Amazon, CN=Amazon Root CA 1
verify return:1
depth=1 C=US, O=Amazon, CN=Amazon RSA 2048 M04
verify return:1
depth=0 CN=kakuyomu.jp
verify return:1
DONE
ticket:     2461 bytes

 2  13.249.8.111    118ms  resumed
 3  13.249.8.111    119ms  resumed
 4  13.249.8.112    121ms  resumed
 1  13.249.8.36  10068ms  TIMEOUT

That script uses openssl, no Gecko involved.
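
To tally results across many runs of test_psk_stall_fast.sh, a small parser can count outcomes per edge IP from the result lines. This sketch assumes every result line has the four columns shown above (index, IP, latency with an "ms" suffix, status); the function name tally_outcomes is hypothetical and not part of the script.

```python
import re
from collections import Counter, defaultdict
from typing import Dict

# Matches result lines such as " 1  13.249.8.36  10068ms  TIMEOUT".
RESULT_LINE = re.compile(r"^\s*\d+\s+(\d{1,3}(?:\.\d{1,3}){3})\s+\d+ms\s+(\S+)\s*$")


def tally_outcomes(output: str) -> Dict[str, Counter]:
    """Count outcomes (e.g. 'resumed', 'TIMEOUT') per edge IP address.

    Lines that are not per-connection results (openssl chatter, the IP list,
    the ticket size) do not match the pattern and are skipped.
    """
    per_ip: Dict[str, Counter] = defaultdict(Counter)
    for line in output.splitlines():
        m = RESULT_LINE.match(line)
        if m:
            ip, status = m.groups()
            per_ip[ip][status] += 1
    return dict(per_ip)
```

Feeding it the concatenated output of repeated runs shows which edges tend to stall on a ticket issued elsewhere.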

Attached file kaku.pcap
Attached file kaku.keys.log

Here's a pcap and a key log file, captured with a fresh browser profile. They contain two consecutive connections to kakuyomu.jp. The first loads fine. The second presents the token obtained during the first handshake, stalls, and is closed about 20s later, after some TLS alerts.

The bug is marked as tracked for firefox152 (beta) and tracked for firefox153 (nightly). However, the bug still has low severity.

:ghess, could you please increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.


Flags: needinfo?(ghess)

New pcap and key log file. Packet 545 is our ClientHello at ~120s, and packet 550 is the AWS TLS alert at ~136s, so the rejection arrives about 16s after the ClientHello.

Attached file kaku.pcap
Attached file kaku.keys.log
Blocks: 2046812
Regressed by: 2046859