Closed Bug 2014416 Opened 3 months ago Closed 2 months ago

H3 connection to google-analytics.com is hanging (h3 early data?)

Categories

(Core :: Networking: HTTP, defect, P1)

Tracking

()

RESOLVED FIXED
149 Branch
Tracking Status
firefox149 --- fixed

People

(Reporter: valentin, Assigned: kershaw)

References

Details

(Whiteboard: [necko-triaged][necko-priority-queue])

Attachments

(1 file)

2026-02-04 00:49:00.051 ⁃ nsHttpTransaction ⁃ 37ab67500 ⁃ created ⁃ url=https://ssl.google-analytics.com/__utm.gif?utmwv=5.7.2&utms=20&utmn=1446592607&utmhn=mozilla.greenhouse.io&utmcs=UTF-8&utmsr=3440x1440&utmvp=1805x1250&utmsc=30-bit&utmul=en-us&utmje=0&utmfl=-&utmdt=Interview%20Kit%20%7C%20Greenhouse%20Recruiting&utmhid=1173164778&utmr=0&utmp=%2Fscorecards%2F95587105&utmht=1770166139838&utmac=UA-31511427-2&utmcc=__utma%3D44269810.2116914909.1769018471.1769745579.1770163876.9%3B%2B__utmz%3D44269810.1770163876.9.4.utmcsr%3Dauth.mozilla.auth0.com%7Cutmccn%3D(referral)%7Cutmcmd%3Dreferral%7Cutmcct%3D%2F%3B&utmjid=&utmu=qBAAAAAAAAAAAAAAAAAAAAAE~
-82026-02-04 00:49:00.051986328 UTC - [Parent Process 25235: GeckoMain]: V/nsHttp Creating nsHttpTransaction @37ab67500
-82026-02-04 00:49:00.051989257 UTC - [Parent Process 25235: GeckoMain]: E/nsHttp nsHttpChannel 37e939000 created nsHttpTransaction 37ab67510
nsHttpChannel @37e939000 --> nsHttpTransaction @37ab67500
-82026-02-04 00:49:00.051991455 UTC - [Parent Process 25235: GeckoMain]: E/nsHttp nsHttpTransaction::Init [this=37ab67500 caps=400001]
-82026-02-04 00:49:00.051992919 UTC - [Parent Process 25235: GeckoMain]: E/nsHttp nsHttpTransaction 37ab67500 SetRequestContext 174dcea40
?:754 @174dcea40 --> nsHttpTransaction @37ab67500
-82026-02-04 00:49:00.051997802 UTC - [Parent Process 25235: GeckoMain]: E/nsHttp http request [
-7GET /__utm.gif?utmwv=5.7.2&utms=20&utmn=1446592607&utmhn=mozilla.greenhouse.io&utmcs=UTF-8&utmsr=3440x1440&utmvp=1805x1250&utmsc=30-bit&utmul=en-us&utmje=0&utmfl=-&utmdt=Interview%20Kit%20%7C%20Greenhouse%20Recruiting&utmhid=1173164778&utmr=0&utmp=%2Fscorecards%2F95587105&utmht=1770166139838&utmac=UA-31511427-2&utmcc=__utma%3D44269810.2116914909.1769018471.1769745579.1770163876.9%3B%2B__utmz%3D44269810.1770163876.9.4.utmcsr%3Dauth.mozilla.auth0.com%7Cutmccn%3D(referral)%7Cutmcmd%3Dreferral%7Cutmcct%3D%2F%3B&utmjid=&utmu=qBAAAAAAAAAAAAAAAAAAAAAE~ HTTP/1.1
-7Host: ssl.google-analytics.com
-7User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:149.0) Gecko/20100101 Firefox/149.0
-7Accept: image/avif,image/webp,image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5
-7Accept-Language: en-US,en;q=0.9
-7Accept-Encoding: gzip, deflate, br, zstd
-7Referer: https://mozilla.greenhouse.io/
-7Sec-Fetch-Storage-Access: none
-7Alt-Used: ssl.google-analytics.com
-7Connection: keep-alive
-7Sec-Fetch-Dest: image
-7Sec-Fetch-Mode: no-cors
-7Sec-Fetch-Site: cross-site
-7Priority: u=6, i
-72026-02-04 00:49:00.052048095 UTC - [Parent Process 25235: GeckoMain]: E/nsHttp 
-7]
-72026-02-04 00:49:00.052196533 UTC - [Parent Process 25235: GeckoMain]: D/nsHttp nsHttpChannel::TriggerNetwork [this=37e939000]
-72026-02-04 00:49:00.052199462 UTC - [Parent Process 25235: GeckoMain]: D/nsHttp network already triggered. Returning.
-62026-02-04 00:49:00.053091064 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpTransaction::OnHTTPSRRAvailable [this=37ab67500] mActivated=0
-62026-02-04 00:49:00.053099853 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpConnectionMgr::OnMsgNewTransaction [trans=37ab67500]
-62026-02-04 00:49:00.053104492 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpConnectionMgr::TryDispatchTransaction without conn [trans=37ab67500 ci=37e255400 ci=.S........[tlsflags0x00000000]ssl.google-analytics.com:443 <ROUTE-via ssl.google-analytics.com:443> {NPN-TOKEN h3}^partitionKey=%28https%2Cgreenhouse.io%29 caps=1 onlyreused=0 active=1 idle=0]
-62026-02-04 00:49:00.053109375 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpTransaction::RemoveDispatchedAsBlocking this=37ab67500 not blocking
-62026-02-04 00:49:00.053110839 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpConnectionMgr::DispatchTransaction [ent-ci=.S........[tlsflags0x00000000]ssl.google-analytics.com:443 <ROUTE-via ssl.google-analytics.com:443> {NPN-TOKEN h3}^partitionKey=%28https%2Cgreenhouse.io%29 17261a200 trans=37ab67500 caps=1 conn=362546ed0 priority=10 isHttp2=0 isHttp3=1]
-62026-02-04 00:49:00.053113769 UTC - [Parent Process 25235: Socket Thread]: E/nsHttp HttpConnectionUDP::Activate [this=362546ed0 trans=37ab67500 caps=1]
-62026-02-04 00:49:00.053119873 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpConnectionMgr::AddActiveTransaction    t=37ab67500 tabid=137(1) thr=0
-62026-02-04 00:49:00.053127197 UTC - [Parent Process 25235: Socket Thread]: I/nsHttp Http3Session::AddStream 35ad9c400 atrans=37ab67500.
-62026-02-04 00:49:00.053141845 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp ProcessNewTransaction Dispatch Immediately trans=37ab67500
02026-02-04 00:49:00.059660156 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpConnectionMgr::ProcessPendingQForEntry [ci=.S........[tlsflags0x00000000]bugzilla.mozilla.org:443^partitionKey=%28https%2Catlassian.net%29 ent=34a2f58a0 active=0 idle=0 urgent-start-queue=0 queued=0]
02026-02-04 00:49:00.059661132 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp urgent queue [
02026-02-04 00:49:00.059662109 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp ]
02026-02-04 00:49:00.059662841 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp active conns [
02026-02-04 00:49:00.059663818 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp ] idle conns [
02026-02-04 00:49:00.059664550 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp ]
02026-02-04 00:49:00.059666015 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpConnectionMgr::ProcessPendingQForEntry [ci=.S........[tlsflags0x00000000]profiler.firefox.com:443^partitionKey=%28https%2Cfirefox.com%29 ent=351a3ac00 active=0 idle=0 urgent-start-queue=0 queued=0]
02026-02-04 00:49:00.059666748 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp urgent queue [
02026-02-04 00:49:00.059667724 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp ]
02026-02-04 00:49:00.059668457 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp active conns [
02026-02-04 00:49:00.059669433 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp ] idle conns [
02026-02-04 00:49:00.059670166 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp ]
02026-02-04 00:49:00.059671386 UTC - [Parent Process 25235: Socket Thread]: D/nsHttp Destroying nsHttpConnectionInfo @36235b400
02026-02-04 00:49:00.059673583 UTC - [Parent Process 25235: Socket Thread]: D/nsHttp Http3Session::SendData [this=35ad9c400]
02026-02-04 00:49:00.059675048 UTC - [Parent Process 25235: Socket Thread]: D/nsHttp Http3Session::SendData call ReadSegments from stream=13d0a5fd0 [this=35ad9c400]
02026-02-04 00:49:00.059676513 UTC - [Parent Process 25235: Socket Thread]: D/nsHttp Http3Stream::ReadSegments state=0 [this=13d0a5fc0]
02026-02-04 00:49:00.059677490 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpTransaction::ReadSegments 37ab67500
02026-02-04 00:49:00.059679199 UTC - [Parent Process 25235: Socket Thread]: D/nsHttp Http3Stream::OnReadSegment count=1075 state=0 [this=13d0a5fc0]
02026-02-04 00:49:00.059680419 UTC - [Parent Process 25235: Socket Thread]: I/nsHttp Http3Stream::GetHeadersString 13d0a5fc0 avail=1075.
02026-02-04 00:49:00.059682128 UTC - [Parent Process 25235: Socket Thread]: D/nsHttp Http3Stream::TryActivating [this=13d0a5fc0]
02026-02-04 00:49:00.059684326 UTC - [Parent Process 25235: Socket Thread]: D/nsHttp Http3Session::TryActivating [stream=13d0a5fd0, this=35ad9c400 state=1]
02026-02-04 00:49:00.059685791 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpTransaction::Do0RTT
02026-02-04 00:49:00.059712646 UTC - [Parent Process 25235: Socket Thread]: D/nsHttp Http3Session::TryActivating streamId=0x44 for stream=13d0a5fd0 [this=35ad9c400].
02026-02-04 00:49:00.059714355 UTC - [Parent Process 25235: Socket Thread]: E/nsHttp nsHttpTransaction::OnTransportStatus [this=37ab67500 status=4b0005 progress=1075]
02026-02-04 00:49:00.059716064 UTC - [Parent Process 25235: Socket Thread]: E/nsHttp nsHttpTransaction::OnTransportStatus 37ab67500 SENDING_TO without request body
02026-02-04 00:49:00.059717041 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpTransaction::ReadRequestSegment 37ab67500 read=1075
02026-02-04 00:49:00.059718261 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp mEarlyDataDisposition = EARLY_SENT
02026-02-04 00:49:00.059719482 UTC - [Parent Process 25235: Socket Thread]: D/nsHttp Http3Stream::ReadSegments rv=0x0 read=1075 sock-cond=0 again=1 [this=13d0a5fc0]
02026-02-04 00:49:00.059720458 UTC - [Parent Process 25235: Socket Thread]: D/nsHttp Http3Stream::ReadSegments state=2 [this=13d0a5fc0]
02026-02-04 00:49:00.059721679 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpTransaction::ReadSegments 37ab67500
02026-02-04 00:49:00.059722656 UTC - [Parent Process 25235: Socket Thread]: D/nsHttp Http3Stream::ReadSegments rv=0x0 read=0 sock-cond=0 again=1 [this=13d0a5fc0]
02026-02-04 00:49:00.059723876 UTC - [Parent Process 25235: Socket Thread]: E/nsHttp nsHttpTransaction::OnTransportStatus [this=37ab67500 status=4b000a progress=0]
02026-02-04 00:49:00.059731689 UTC - [Parent Process 25235: Socket Thread]: D/nsHttp Http3Session::ProcessOutput reader=362546ed0, [this=35ad9c400]
02026-02-04 00:49:00.059734375 UTC - [Parent Process 25235: GeckoMain]: D/nsHttp sending progress and status notification [this=37e939000 status=4b000a progress=0/0]

My reading of the logs is that we dispatch the transaction to an H3 connection, send the early data, and then the connection just hangs.
I suspect this might be related to, or fixed by, bug 2012485.
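
To follow a single transaction through a log like the one above, a small filter helps. This is a minimal sketch, not part of any Firefox tooling: the regular expression is an assumption based only on the line layout visible in this report, and `events_for` is a hypothetical helper name.

```python
import re

# Layout assumed from the lines pasted in this bug (not an official grammar):
# "<timestamp> UTC - [<process>: <thread>]: <level>/<module> <message>"
LOG_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) UTC - "
    r"\[(?P<proc>[^:\]]+): (?P<thread>[^\]]+)\]: "
    r"(?P<level>[A-Z])/(?P<module>\S+) (?P<msg>.*)"
)


def events_for(lines, needle):
    """Return (timestamp, thread, message) for each parsed line whose message contains needle."""
    events = []
    for line in lines:
        # search() rather than match(), so a short prefix fused in front of
        # the timestamp (as in the paste above) does not break parsing.
        m = LOG_LINE.search(line)
        if m and needle in m.group("msg"):
            events.append((m.group("ts"), m.group("thread"), m.group("msg")))
    return events


# Three lines copied from the log in this report.
SAMPLE = [
    "-62026-02-04 00:49:00.053113769 UTC - [Parent Process 25235: Socket Thread]: E/nsHttp HttpConnectionUDP::Activate [this=362546ed0 trans=37ab67500 caps=1]",
    "02026-02-04 00:49:00.059685791 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpTransaction::Do0RTT",
    "02026-02-04 00:49:00.059717041 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpTransaction::ReadRequestSegment 37ab67500 read=1075",
]

for ts, thread, msg in events_for(SAMPLE, "37ab67500"):
    print(ts, thread, msg)
```

Filtering on the transaction pointer (`37ab67500`) keeps the dispatch and early-data lines together and drops unrelated connection manager output.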

Hello,

I was wondering if there are any workarounds or resolutions for this, or if it's something that is going to be prioritized soon. I like using Nightly, but at this point it is getting so bad that it's making the browser impossible to use. After 4 years of only using Nightly, I might have to switch to Beta or Release :(

Flags: needinfo?(valentin.gosu)

Try setting network.http.http3.enable to false in about:config.

Flags: needinfo?(valentin.gosu)

Unfortunately, the log file is incomplete. It does not include the beginning of the stalled h3 connection, so it’s difficult to determine why it became stuck.
Looking at the stream ID in the log below, many transactions were already dispatched to the problematic h3 connection:

2026-02-04 00:49:00.059685791 UTC - [Parent Process 25235: Socket Thread]: V/nsHttp nsHttpTransaction::Do0RTT
2026-02-04 00:49:00.059712646 UTC - [Parent Process 25235: Socket Thread]: D/nsHttp Http3Session::TryActivating streamId=0x44 for stream=13d0a5fd0 [this=35ad9c400].
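
For readers unfamiliar with QUIC stream numbering, here is a short worked example of why streamId=0x44 implies earlier activity. It relies on the QUIC convention that the two low bits of a stream ID encode the initiator and direction, so client-initiated bidirectional streams (the kind HTTP/3 requests use) are numbered 0, 4, 8, and so on. The function name is illustrative only.

```python
def describe_quic_stream_id(stream_id: int) -> dict:
    """Split a QUIC stream ID into its type bits and its per-type index."""
    # Bit 0: who opened the stream (0 = client, 1 = server).
    initiator = "server" if stream_id & 0x1 else "client"
    # Bit 1: stream direction (0 = bidirectional, 1 = unidirectional).
    direction = "unidirectional" if stream_id & 0x2 else "bidirectional"
    # The remaining bits count streams of that type, starting from zero.
    index = stream_id >> 2
    return {"initiator": initiator, "direction": direction, "index": index}


info = describe_quic_stream_id(0x44)
print(info)
print("earlier streams of the same type:", info["index"])
```

Stream 0x44 is decimal 68, so its zero-based index among client-initiated bidirectional streams is 17. In other words, 17 request streams had already been opened on this connection before the one in the log.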

Our current fallback behavior does not work in this case because we assume that an h3 connection in the 0-RTT state is usable. As a result, we continue dispatching transactions to it.
I believe this will be addressed by the Happy Eyeballs v3 project, since we will consider a connection established only after the TLS handshake completes.
However, before Happy Eyeballs v3 is implemented, it may be reasonable to add a mitigation. For example, if an h3 connection remains in the 0-RTT state for too long, we could trigger a fallback to HTTP/2.
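
The mitigation idea can be sketched roughly as follows. This is an illustration of the logic only, not the patch that landed: the class, the function names, and the 3-second threshold are all hypothetical, and the real timeout value is not stated in this bug.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold, chosen only for the example.
ZERO_RTT_STUCK_TIMEOUT_S = 3.0


@dataclass
class H3Session:
    """Toy model of an HTTP/3 session that may be sending 0-RTT data."""
    zero_rtt_started_at: Optional[float] = None
    handshake_done: bool = False

    def enter_zero_rtt(self, now: float) -> None:
        # Record only the first time the session started sending early data.
        if self.zero_rtt_started_at is None:
            self.zero_rtt_started_at = now

    def complete_handshake(self) -> None:
        self.handshake_done = True

    def is_stuck_in_zero_rtt(self, now: float) -> bool:
        # A session is "stuck" if it began 0-RTT, never finished the
        # handshake, and the timeout has elapsed since 0-RTT began.
        if self.handshake_done or self.zero_rtt_started_at is None:
            return False
        return now - self.zero_rtt_started_at >= ZERO_RTT_STUCK_TIMEOUT_S


def choose_protocol(session: H3Session, now: float) -> str:
    """Keep using h3 unless the session looks stuck, then fall back to h2."""
    return "h2" if session.is_stuck_in_zero_rtt(now) else "h3"


session = H3Session()
session.enter_zero_rtt(now=0.0)
print(choose_protocol(session, now=1.0))  # still within the timeout
print(choose_protocol(session, now=5.0))  # past the timeout, handshake never completed
```

The key point is that new transactions stop being dispatched to the session once it has spent too long in 0-RTT without the handshake completing.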

Assignee: nobody → kershaw
Pushed by kjang@mozilla.com:
https://github.com/mozilla-firefox/firefox/commit/db9b775c203a
https://hg.mozilla.org/integration/autoland/rev/f3aefa8acfaa
Add timeout mitigation for stuck HTTP/3 0-RTT sessions, r=necko-reviewers,valentin

While the above patch should resolve the issue, I believe it is still worth getting to the root of why the 0-RTT connection could not be established in the first place.

@Gela, do I understand correctly that you can consistently reproduce the issue? If so, would you mind providing another Firefox profile and a Wireshark pcap, using a Firefox version without the patch above?

For the Firefox profile:

  1. Go to about:logging.
  2. Select the HTTP/3 logging preset.
  3. Select "Logging to the Firefox Profiler".
  4. Start logging.
  5. Reproduce the issue at hand.
  6. Stop the profile.

For the Wireshark pcap:

  1. Install Wireshark.
  2. Follow these instructions to expose the TLS keys.
  3. Start the Wireshark capture.
  4. Reproduce the issue at hand.
  5. Stop the capture.

If either is too much work, one alone is already helpful. Happy to help you help us here or on Slack. Thank you.

Flags: needinfo?(gmalekpour)

Hi Max,

The issue comes in waves. For example, on Friday I was experiencing the problem consistently, and today I haven't seen it at all. Should I wait until the issue happens again to collect the logs you asked for? I'm happy to share them now, but I'm not sure how helpful they would be if I'm not seeing the issue.

Flags: needinfo?(gmalekpour) → needinfo?(mail)

Thanks for the update. We only need the logs when the issue is actually happening. If you’re not seeing the problem right now, there’s no need to capture logs yet.
Also note that we've just landed the mitigation in Nightly, so you might want to use Firefox Release to reproduce this. Thanks.

Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 149 Branch

(In reply to :Gela from comment #11)

Which version was the fix landed in? I am still seeing this in Nightly 149.0a1

Firefox profile: https://share.firefox.dev/46NEO8D

This log shows a different symptom from before. I don’t see any 0-RTT failures in this profile.

It looks like the problematic request is this one:

2026-02-18 00:23:43.547648681 UTC - [Parent Process 29637: GeckoMain]: D/nsHttp nsHttpChannel::PrepareToConnect [this=35c036f00]
2026-02-18 00:23:43.547649658 UTC - [Parent Process 29637: GeckoMain]: D/nsHttp Adding Dictionary headers
2026-02-18 00:23:43.547860595 UTC - [Parent Process 29637: GeckoMain]: D/nsHttp nsHttpChannel::Connect 35c036f00 AwaitingCacheCallbacks forces async

This request appears to be waiting for a cache callback indefinitely.

Gela, could you check whether network.http.dictionaries.enable is true? If so, could you try disabling it and see whether the issue can still be reproduced? Thanks.

Flags: needinfo?(gmalekpour)

Looking at about:config, it looks like network.http.dictionaries.enable is already set to false.

Flags: needinfo?(gmalekpour)
QA Whiteboard: [qa-triage-done-c150/b149]
Flags: needinfo?(mail)

FYI I still experience this intermittently on Nightly 151.0a1 (2026-04-14) (aarch64)

Profile Summary

Firefox 149 on macOS 15.7.2, 50s capture. The user had Google Drive open along with many other tabs (ChatGPT, Atlassian/Jira,
Bugzilla, etc.). The profile captures a navigation to drive.google.com/drive/u/0/my-drive.

Key Findings

  1. Massive logging overhead: 2.2 million log markers

The profile contains 2,217,766 log entries from verbose nsHttp/nsSocketTransport logging (likely MOZ_LOG was enabled at a high level).
This alone generates enormous profiler marker data and may introduce overhead during capture. If this logging was active during the
user's normal browsing, it would add main-thread overhead.

  2. Connection table proliferation for Google services

The connection manager has 170+ unique connection entries. Google's Dynamite messaging infrastructure creates numbered shard
subdomains (0-prod-dynamite-prod-02-us-signaler-pa.clients6.google.com through
9-prod-dynamite-prod-02-us-signaler-pa.clients6.google.com), each generating multiple connection entries (H2/H3, with and without
Allow0Rtt). Combined with the many *.clients6.google.com API endpoints Drive uses (drivefrontend-pa, appsgenaiserver-pa,
calendarsuggest, peoplestack-pa, ogads-pa, waa-pa, espresso-pa, etc.), this creates significant connection management overhead.

  3. Connection coalescing failures

FindCoalescableConnectionByHashKey ... join failed appears 12,000+ times. Firefox repeatedly tries to coalesce connections for Google
subdomains that share IPs but fails, likely due to H3/QUIC connections where coalescing has stricter requirements, or cert SAN
mismatches. Signaler-pa connections to H2 sessions DO coalesce successfully (join ok), but many H3 paths fail. This forces more
separate connections than necessary.

  4. Signaler channel churn

Three simultaneous signaler-pa.clients6.google.com long-polling sessions are active (each with different gsessionid), being cancelled
and re-established during navigations. The cancellations are all NS_BINDING_ABORTED with reason nsDocLoader::Stop, meaning page
navigations cancel in-flight signaler requests which are then immediately re-created.

  5. Post-load jank from Drive JS + IndexedDB

After the page load completes, Drive continues to cause 21 jank periods (up to 263ms) and 4 BHR-detected hangs (up to 246ms). The
worst post-load jank at 22.9s (188ms) is caused by Drive's JavaScript processing an IndexedDB response, specifically JSON.stringify
on large objects in functions like Hda, B_g, _.D.prototype.Dc. This is synchronous main-thread work from Drive's own code.

  6. Heavy GC pressure

7 GCMajor events in the content process, the worst being 530ms. With 22+ DOM Workers active and Drive's heavy JS, GC pressure is
significant.

  7. Page load itself is reasonable

The Drive navigation timings are actually within normal range: TTFB=771ms, FCP=1.2s, LCP=2.8s, TTFI=7.9s. The 8 jank periods during
load are from JS parsing (Drive loads ~2MB+ of JS). One anomaly: a 1.6KB cached PNG (drive_2022q3_32dp.png) took 3 seconds; it was
likely deprioritized behind other network activity.

Assessment

The "problems loading multiple google websites" are most likely caused by:

  1. The verbose nsHttp logging — if MOZ_LOG was active during normal use, this is a major self-imposed overhead. The user should
    disable it when not actively debugging.
  2. Google's own app complexity — Drive spawns 22+ workers, maintains multiple signaler channels, uses dozens of API endpoints, and
    does heavy synchronous JS work (JSON serialization of IndexedDB data). This is inherent to Google's web apps.
  3. Connection proliferation — the sharded signaler subdomains and many Google API endpoints create a very large connection table, with
    failed coalescing attempts adding processing overhead on the Socket Thread during connection lookups.

The Firefox networking stack itself appears healthy — DNS resolves fine (TRR via Cloudflare works), H2/H3 connections establish
properly, and the coalescing that CAN succeed does. The main actionable issue from a Firefox perspective would be investigating why H3
connection coalescing fails for Google domains that share the same IP and whether the cost of repeatedly attempting and failing
coalescing is significant.
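
As a starting point for quantifying the coalescing attempts, a log could be tallied along these lines. This is a rough sketch under stated assumptions: it assumes the function name and the phrases "join failed" / "join ok" appear on the same log line, and the sample lines are placeholders (the middle of the real message is elided in the summary above), not real log output.

```python
from collections import Counter


def count_coalescing_results(lines):
    """Tally coalescing attempts by outcome, keyed as 'failed' and 'ok'."""
    counts = Counter()
    for line in lines:
        # Only consider lines that mention the coalescing lookup.
        if "FindCoalescableConnectionByHashKey" not in line:
            continue
        if "join failed" in line:
            counts["failed"] += 1
        elif "join ok" in line:
            counts["ok"] += 1
    return counts


# Placeholder lines for illustration only; "<elided>" stands in for the
# part of the message that the summary does not show.
SAMPLE = [
    "FindCoalescableConnectionByHashKey <elided> join failed",
    "FindCoalescableConnectionByHashKey <elided> join failed",
    "FindCoalescableConnectionByHashKey <elided> join ok",
    "some unrelated log line",
]

counts = count_coalescing_results(SAMPLE)
print("failed:", counts["failed"], "ok:", counts["ok"])
```

Comparing the failure count against the total number of connection lookups on the Socket Thread would give a first indication of whether the repeated failures are costly.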
