Async logger messages are enqueued even when the async logger isn't running, resulting in a massive memory leak
Categories
(Core :: Audio/Video: cubeb, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox94 | --- | fixed |
People
(Reporter: jib, Assigned: padenot)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
STR:
- Open https://jsfiddle.net/jib1/6tbvwgu3/20/ (creates 2000 RTCPeerConnection objects + SRD)
Expected results:
- Log spew
Actual results:
- Spew plus Firefox slowly grinds to a halt.
- Doesn't recover even after closing the tab, likely due to about:webrtc
- Opening about:webrtc after this seems to stall forever.
Profile doesn't reveal much, but Firefox is near beach ball: https://share.firefox.dev/3zTr6yr
I think we know what's going on here.
- Allowing 2000+ peer connection objects to be created is costly
- about:webrtc collects snapshots from all open AND CLOSED peer connections in Firefox, until the user opens about:webrtc waits for it to load and clicks the "Clear History" button.
Comment 1•3 years ago
|
||
(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #0)
Profile doesn't reveal much, but Firefox is near beach ball: https://share.firefox.dev/3zTr6yr
The relevant content process is not included in the profile (it likely was too busy to respond to the profiler's IPC), so indeed the profile isn't useful.
about:processes can show a bit of what's going on. We create a large number of threads (not sure if it's infinite or if my content process became unresponsive before it was done creating threads), more than 2000. In about:processes, I saw that the threads we repeatedly create are: PacerThread, ModuleProcessThread, VoiceProcessThread. All of them are active, using 0.3% to 0.5% of a CPU core each.
Comment 2•3 years ago
|
||
Changing severity to S3 until we have info that this is super common. It is something we definitely need to look at, but probably after the libwebrtc uplift is complete.
Comment 3•3 years ago
|
||
(In reply to Michael Froman [:mjf] from comment #2)
Changing severity to S3 until we have info that this is super common. It is something we definitely need to look at, but probably after the libwebrtc uplift is complete.
(In reply to Florian Quèze [:florian] from comment #1)
(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #0)
Profile doesn't reveal much, but Firefox is near beach ball: https://share.firefox.dev/3zTr6yr
The relevant content process is not included in the profile (it likely was too busy to respond to the profiler's IPC), so indeed the profile isn't useful.
about:processes can show a bit of what's going on. We create a large number of threads (not sure if it's infinite or if my content process became unresponsive before it was done creating threads), more than 2000. In about:processes, I saw that the threads we repeatedly create are: PacerThread, ModuleProcessThread, VoiceProcessThread. All of them are active, using 0.3% to 0.5% of a CPU core each.
Given that part of the libwebrtc update (bug 1654112) includes using a shared thread for a lot of this libwebrtc stuff, it may improve things considerably.
Comment 4•3 years ago
|
||
If you want to test on a WIP build with the new threading model you can use this.
Comment 5•3 years ago
|
||
(In reply to Andreas Pehrson [:pehrsons] from comment #4)
If you want to test on a WIP build with the new threading model you can use this.
I tested and it's a dramatic improvement:
- no more thread explosion (the relevant content process had a maximum of 45 threads, only 10 of them active)
- total CPU use was between 1 and 2 cores.
- I managed to capture a profile: https://share.firefox.dev/2VwvBQO
- GraphRunner is using about 50% of a core (about 2/3 of it being in mozilla::AsyncLogger::Log, did you enable some debug logging in your try run?)
- the MainThread, Socket Thread and IPC I/O Child thread together use about 50% of a core, doing activity triggered by the
EverySecondTelemetryCallback_m
runnable (ie. bug 1724417). - 3 WebRTCPD threads use about 2% of a core each. Most of this time seems to be memory allocation, firing, and creating timers (
NS_NewTimerWithCallback
).
Comment 6•3 years ago
|
||
Also, there is a massive memory leak when using this build. Just during the time of my profile from comment 5, the memory track shows the memory use increased by 1.5GB. The system monitor shows the process as using 29.6GB of memory. About:memory shows:
29,787.47 MB (100.0%) -- explicit
├──28,952.53 MB (97.20%) ── heap-unclassified
After several attempts, I managed to capture a profile with the nativeallocations feature enabled: https://share.firefox.dev/2X1UXpZ
It shows that 99% of the retained memory is from the mozilla::AsyncLogger::Log
calls (based on the stacks, my guess is they are from the GraphRunner thread).
Comment 7•3 years ago
|
||
Just looking at code I guess this happens when you're profiling and have audio capturing/playing (audio callbacks) -- but you're not profiling audio callbacks. So we're generating log entries but not consuming them (we didn't start the audio callback logging).
This should happen on nightly too.
Paul, could you take a look?
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 8•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 9•3 years ago
|
||
Comment 10•3 years ago
|
||
Pushed by padenot@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/58b23100d739 Don't enqueue async logging messages when the async logger is not running. r=pehrsons
Comment 11•3 years ago
|
||
bugherder |
Description
•