Closed Bug 1868070 Opened 5 months ago Closed 5 months ago

Intermittently the socket thread uses 100% of a single logical core and the browser drops all connections

Tracking

Status:

VERIFIED FIXED

Milestone:

122 Branch

Tracking Flags:

Tracking

Status

firefox-esr115

---

unaffected

firefox120

---

unaffected

firefox121

---

unaffected

firefox122

fixed

People

(Reporter: tgnff242, Assigned: kershaw)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: nightly-community, regression, Whiteboard: [necko-triaged][necko-priority-queue])

Attachments

(1 file)

Bug 1868070 - Make sure we exit the loop when neqo returns a zero duration, r=#necko (5 months ago, Kershaw Chang [:kershaw], 48 bytes, text/x-phabricator-request)

tgn-ff

Reporter

Description

•

5 months ago

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0

Steps to reproduce:

This is an intermittent issue; I don't have STR. It might be triggered by YouTube (I'm never logged in and I use uBO).

Actual results:

Suddenly, the browser starts using a lot of CPU. about:processes shows most of the CPU being used by the socket thread. When that happens, all connections are dropped. Exiting Firefox doesn't terminate its processes.

Expected results:

The first time I experienced this issue was in BuildID 20231201095335.

Attempting to capture an HTTP log results in an empty log.

Capturing a performance profile with the Firefox Profiler is impossible without an internet connection, since saving the profile locally requires opening it on the profiler site first. Should I report this somewhere?

This is a profile I captured with perf: https://share.firefox.dev/46Gd42T

These are the crash reports produced by killing the main process, which hangs on exit: https://crash-stats.mozilla.org/report/index/752f9a78-caed-46e0-b719-7cd890231202 https://crash-stats.mozilla.org/report/index/0525b25a-2682-41f5-a3e9-c4e460231204

tgn-ff

Reporter

Updated

•

5 months ago

Has STR: --- → no

Keywords: nightly-community

tgn-ff

Reporter

Updated

•

5 months ago

Component: Networking → Networking: HTTP

tgn-ff

Reporter

Updated

•

5 months ago

Updated

•

5 months ago

Blocks: necko-perf

Severity: -- → S3

Priority: -- → P2

Whiteboard: [necko-triaged]

tgn-ff

Reporter

Comment 1

•

5 months ago

It happened again. This time when doing a Google search through the urlbar (suggestions disabled).

Just to be clear, this is not a performance issue: Firefox is unusable whenever this happens and needs a restart; until then it cannot load any site.

https://crash-stats.mozilla.org/report/index/fc029fce-439f-41ad-ac4f-dfda00231206

tgn-ff

Reporter

Updated

•

5 months ago

Crash Signature: shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::net::nsHttpConnectionMgr::Shutdown

tgn-ff

Reporter

Updated

•

5 months ago

Crash Signature: shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::net::nsHttpConnectionMgr::Shutdown → [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::net::nsHttpConnectionMgr::Shutdown ]

Kershaw Chang [:kershaw]

Assignee

Comment 2

•

5 months ago

Hi Reporter,

Could you set this pref network.http.http3.max_accumlated_time_ms to 0 and see if this happens again?

Thanks.

Flags: needinfo?(tgnff242)

BugBot [:suhaib / :marco/ :calixte]

Comment 3

•

5 months ago

The bug has a crash signature, thus the bug will be considered confirmed.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Kershaw Chang [:kershaw]

Assignee

Comment 4

•

5 months ago

The crash signature is not related to this bug.

Assignee: nobody → kershaw

Severity: S3 → S2

Crash Signature: [@ shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::net::nsHttpConnectionMgr::Shutdown ]

Priority: P2 → P1

Whiteboard: [necko-triaged] → [necko-triaged][necko-priority-queue]

Kershaw Chang [:kershaw]

Assignee

Comment 5

•

5 months ago

Attached file Bug 1868070 - Make sure we exit the loop when neqo returns a zero duration, r=#necko

Kershaw Chang [:kershaw]

Assignee

Comment 6

•

5 months ago

I think the problem here is that neqo returns a zero duration from this line. When this happens, accumulated_time never increases, so we are stuck in this loop forever.
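To make the failure mode above concrete, here is a minimal Rust sketch of a loop that only exits once an accumulated delay reaches a budget. It is not the actual Firefox or neqo code: `Output`, `Exit`, `drive`, the `exit_on_zero` flag, the inclusive comparison against the budget, and the iteration cap (which stands in for "spins forever" so the demo terminates) are all assumptions made for illustration.

```rust
use std::time::Duration;

/// Hypothetical stand-in for one result of asking the QUIC stack for output.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Output {
    /// A packet is ready to send.
    Datagram,
    /// Nothing to send right now; ask again after this delay.
    Callback(Duration),
}

/// Why the loop stopped (illustrative names only).
#[derive(Debug, PartialEq)]
pub enum Exit {
    /// The accumulated delay reached the configured budget.
    BudgetReached,
    /// The stack asked for a zero-length wait, so the loop bailed out.
    ZeroDuration,
    /// Demo-only safety cap; in a real loop this case would spin forever.
    IterationCap,
}

/// Drives the output loop until one of the exit conditions is met.
/// `exit_on_zero` toggles the kind of guard the patch title describes.
pub fn drive(
    next_output: impl Fn() -> Output,
    max_accumulated: Duration,
    exit_on_zero: bool,
    iteration_cap: usize,
) -> Exit {
    let mut accumulated = Duration::ZERO;
    for _ in 0..iteration_cap {
        match next_output() {
            Output::Datagram => {
                // A real loop would hand the packet to the socket here.
            }
            Output::Callback(wait) => {
                if exit_on_zero && wait.is_zero() {
                    // Adding zero can never move `accumulated`, so stop now.
                    return Exit::ZeroDuration;
                }
                accumulated += wait;
                if accumulated >= max_accumulated {
                    return Exit::BudgetReached;
                }
            }
        }
    }
    Exit::IterationCap
}
```

With a source that always returns `Output::Callback(Duration::ZERO)` and a non-zero budget, the unguarded variant only stops at the demo cap, while the guarded variant returns `Exit::ZeroDuration` on the first pass. In this sketch a zero budget also ends the loop at the first callback because the comparison is inclusive; whether the real code compares the same way is an assumption.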

Donal Meehan [:dmeehan]

Comment 7

•

5 months ago

Tracking this for Fx122 as it was triaged as an S2.
:kershaw is this a regression introduced by Bug 1852924?

status-firefox122: --- → affected

tracking-firefox122: --- → +

Flags: needinfo?(kershaw)

Kershaw Chang [:kershaw]

Assignee

Comment 8

•

5 months ago

(In reply to Donal Meehan [:dmeehan] from comment #7)

Tracking this for Fx122 as it was triaged as an S2.
:kershaw is this a regression introduced by Bug 1852924?

Yes, this is introduced by Bug 1852924.

Flags: needinfo?(kershaw)

Keywords: regression

Regressed by: 1852924

Donal Meehan [:dmeehan]

Updated

•

5 months ago

status-firefox120: --- → unaffected

status-firefox121: --- → unaffected

status-firefox-esr115: --- → unaffected

tgn-ff

Reporter

Comment 9

•

5 months ago

(In reply to Kershaw Chang [:kershaw] from comment #2)

Could you set this pref network.http.http3.max_accumlated_time_ms to 0 and see if this happens again?

I set the pref yesterday and haven't encountered the issue since. However, on some days it occurred three or more times and on others not at all, so I can't yet tell whether the pref made a difference. I'll report back in a week unless it happens again sooner.

Flags: needinfo?(tgnff242)

Pulsebot

Comment 10

•

5 months ago

Pushed by kjang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7fc200daf13a
Make sure we exit the loop when neqo returns a zero duration, r=necko-reviewers,valentin

Cosmin Sabou [:CosminS]

Comment 11

•

5 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/7fc200daf13a

Status: NEW → RESOLVED

Closed: 5 months ago

status-firefox122: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → 122 Branch

tgn-ff

Reporter

Comment 12

•

5 months ago

I haven't encountered the bug since. I reset the pref as soon as the patch landed. It seems it's fixed.

Status: RESOLVED → VERIFIED

Kershaw Chang [:kershaw]

Assignee

Comment 13

•

5 months ago

(In reply to tgn-ff from comment #12)

I haven't encountered the bug since. I reset the pref as soon as the patch landed. It seems it's fixed.

Thanks for verifying.
