Closed Bug 1729466 Opened 3 years ago Closed 3 years ago

Async logger messages are enqueued even when the async logger isn't running, resulting in a massive memory leak

Categories

(Core :: Audio/Video: cubeb, defect, P2)

Tracking

RESOLVED FIXED
94 Branch
Tracking Status
firefox94 --- fixed

People

(Reporter: jib, Assigned: padenot)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

Attached image ACrawl.png

STR:

  1. Open https://jsfiddle.net/jib1/6tbvwgu3/20/ (creates 2000 RTCPeerConnection objects + SRD)

Expected results:

  • Log spew

Actual results:

  • Spew plus Firefox slowly grinds to a halt.
  • Doesn't recover even after closing the tab, likely due to about:webrtc
  • Opening about:webrtc after this seems to stall forever.

Profile doesn't reveal much, but Firefox is near beach ball: https://share.firefox.dev/3zTr6yr

I think we know what's going on here.

  • Allowing 2000+ peer connection objects to be created is costly
  • about:webrtc collects snapshots from all open AND CLOSED peer connections in Firefox, until the user opens about:webrtc, waits for it to load, and clicks the "Clear History" button.

(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #0)

> Profile doesn't reveal much, but Firefox is near beach ball: https://share.firefox.dev/3zTr6yr

The relevant content process is not included in the profile (it likely was too busy to respond to the profiler's IPC), so indeed the profile isn't useful.

about:processes can show a bit of what's going on. We create a large number of threads (not sure if it's infinite or if my content process became unresponsive before it was done creating threads), more than 2000. In about:processes, I saw that the threads we repeatedly create are: PacerThread, ModuleProcessThread, VoiceProcessThread. All of them are active, using 0.3% to 0.5% of a CPU core each.

Changing severity to S3 until we have info that this is super common. It is something we definitely need to look at, but probably after the libwebrtc uplift is complete.

Severity: -- → S3
Priority: -- → P2

(In reply to Michael Froman [:mjf] from comment #2)

> Changing severity to S3 until we have info that this is super common. It is something we definitely need to look at, but probably after the libwebrtc uplift is complete.

(In reply to Florian Quèze [:florian] from comment #1)

> (In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #0)
> > Profile doesn't reveal much, but Firefox is near beach ball: https://share.firefox.dev/3zTr6yr
>
> The relevant content process is not included in the profile (it likely was too busy to respond to the profiler's IPC), so indeed the profile isn't useful.
>
> about:processes can show a bit of what's going on. We create a large number of threads (not sure if it's infinite or if my content process became unresponsive before it was done creating threads), more than 2000. In about:processes, I saw that the threads we repeatedly create are: PacerThread, ModuleProcessThread, VoiceProcessThread. All of them are active, using 0.3% to 0.5% of a CPU core each.

Given that part of the libwebrtc update (bug 1654112) includes using a shared thread for a lot of this libwebrtc work, it may improve things considerably.

Depends on: 1654112

If you want to test on a WIP build with the new threading model you can use this.

(In reply to Andreas Pehrson [:pehrsons] from comment #4)

> If you want to test on a WIP build with the new threading model you can use this.

I tested and it's a dramatic improvement:

  • no more thread explosion (the relevant content process had a maximum of 45 threads, only 10 of them active)
  • total CPU use was between 1 and 2 cores.
  • I managed to capture a profile: https://share.firefox.dev/2VwvBQO
  • GraphRunner is using about 50% of a core (about 2/3 of it being in mozilla::AsyncLogger::Log, did you enable some debug logging in your try run?)
  • the MainThread, Socket Thread and IPC I/O Child thread together use about 50% of a core, doing activity triggered by the EverySecondTelemetryCallback_m runnable (i.e. bug 1724417).
  • 3 WebRTCPD threads use about 2% of a core each. Most of this time seems to be spent allocating memory and firing/creating timers (NS_NewTimerWithCallback).

Also, there is a massive memory leak when using this build. Just during the time of my profile from comment 5, the memory track shows memory use increased by 1.5 GB. The system monitor shows the process using 29.6 GB of memory. about:memory shows:

29,787.47 MB (100.0%) -- explicit
├──28,952.53 MB (97.20%) ── heap-unclassified

After several attempts, I managed to capture a profile with the nativeallocations feature enabled: https://share.firefox.dev/2X1UXpZ
It shows that 99% of the retained memory is from the mozilla::AsyncLogger::Log calls (based on the stacks, my guess is they are from the GraphRunner thread).

Just looking at the code, I guess this happens when you're profiling while audio is capturing/playing (so audio callbacks are firing), but you're not profiling the audio callbacks themselves. We generate log entries without consuming them, because the audio callback logging (the consumer side) was never started.

This should happen on Nightly too.

Paul, could you take a look?

Flags: needinfo?(padenot)
Assignee: nobody → padenot
Flags: needinfo?(padenot)

Reading comment 5, I'm turning this bug into the issue described in comment 7, because the uplift fixes the original issue.

Component: WebRTC: Signaling → Audio/Video: cubeb
Summary: Opening 1000s of RTCPeerConnections brings Firefox to a crawl → Async logger messages are enqueued even when the async logger isn't running, resulting in a massive memory leak
Pushed by padenot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/58b23100d739
Don't enqueue async logging messages when the async logger is not running. r=pehrsons
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch