Closed Bug 1329104 Opened 8 years ago Closed 8 years ago

Thread leak

Categories

(Core :: Audio/Video: Playback, defect)

Platform: x86_64, Windows 10
Type: defect
Priority: Not set
Severity: normal

Tracking

RESOLVED FIXED
mozilla53
Tracking Status
firefox50 --- unaffected
firefox51 --- unaffected
firefox52 --- unaffected
firefox53 + fixed

People

(Reporter: bugzilla.mozilla.org, Assigned: kkoorts)

Details

(Keywords: regression, Whiteboard: [MemShrink])

Attachments

(2 files)

Attached image allocations.png
Build ID 	20170105030229

Nightly seems to be leaking threads in content processes under some circumstances. After 3 hours of browsing with 3 content processes it has accumulated about 3000-4000 threads in each (growing over time).

This leads to large amounts of wasted memory in the form of thread stacks (see attached VMMap screenshot):

4,970.67 MB (100.0%) -- explicit
├──4,291.42 MB (86.33%) ── heap-unclassified
├────265.90 MB (05.35%) -- window-objects

The number of JS compartments is far lower, so those threads probably do not represent workers:

227 (100.0%) -- js-main-runtime-compartments
├──169 (74.45%) -- system
│  ├──156 (68.72%) ++ (156 tiny)
│  ├────9 (03.96%) ── [System Principal], outOfProcessTabChildGlobal [9]
│  └────4 (01.76%) ── [System Principal], Addon-SDK (from: resource://gre/modules/commonjs/toolkit/loader.js:414) [4]
└───58 (25.55%) ++ user

Another curious number in the parent process is

├────218.93 MB (06.87%) -- dom
│    ├──218.11 MB (06.84%) -- memory-file-data
│    │  ├──217.08 MB (06.81%) ── stream [6543]

The number of streams does not add up to the total amount of leaked threads, but it's still suspiciously large. Could the threads belong to some background IO?
Component: General → DOM
Product: Firefox → Core
Andrea, can you comment on comment 0?
Flags: needinfo?(amarchesini)
I left a restarted browser open for several hours with just 1 tab loaded and threads did not accumulate.

After some regular browsing and addon feature use threads started to accumulate in 2 of 3 content processes again. So maybe a small leak in web pages or an addon is entraining a larger one in the form of threads.
This leak remains even after closing all tabs but a blank one.

6,309.29 MB (100.0%) -- explicit
├──6,180.52 MB (97.96%) ── heap-unclassified
├─────87.45 MB (01.39%) ++ heap-overhead
└─────41.32 MB (00.65%) ++ (18 tiny)

The unclassified amount seems to be consistent with the leaked thread stacks. Would that show up in a DMD build? If so, I'd need a Windows 64-bit DMD build.
Does this reproduce in safe mode? If not can you try selectively re-enabling add-ons?
Flags: needinfo?(bugzilla.mozilla.org)
Whiteboard: [MemShrink]
I copied my tabs over to a profile without addon and it still happens, is that good enough?
Flags: needinfo?(bugzilla.mozilla.org)
(In reply to The 8472 from comment #5)
> I copied my tabs over to a profile without addon and it still happens, is
> that good enough?

Safe mode will also disable some prefs (graphics mostly I believe), so it's possible something is going on there. Also, if you feel comfortable, it would be *really* helpful to get a list of sites that reproduce this issue.
Flags: needinfo?(bugzilla.mozilla.org)
The issue persists in safe-mode. And no, I don't want to share my tabs.
Flags: needinfo?(bugzilla.mozilla.org)
(In reply to The 8472 from comment #7)
> The issue persists in safe-mode. And no, I don't want to share my tabs.

Can you elaborate on the technologies used in the tabs, e.g. HTML5 video, web workers, etc.?
- no audio
- webm, mp4 and gif are used
- about:debugging shows no workers
- about:memory shows no wasm guard pages, thus no wasm
- none of the opt-in features (camera, screen capture, webrtc) are used
Any chance you have the Visual Studio debugger (or another debugger) available and can run Firefox with the debugger attached so you can view the list of threads with names? Unfortunately, thread names are reported on Windows by raising a specially constructed exception that the debugger listens for and uses to annotate the threads. Attaching to Firefox after the threads have already spawned is too late for the existing threads, but for an actively growing leak, the new threads should have names. (See http://searchfox.org/mozilla-central/source/nsprpub/pr/src/md/windows/ntthread.c#292 for the implementation likely in use.)
I have windbg installed
Okay, so I don't want to pretend I'm a Windows super-debugging expert, but I was able to do this and it seems promising:

- Firefox can already be running!
- Run WinDbg. I used the x86 version because I've got a 32-bit Nightly, but I assume/hope x64 works fine.
- Use about:memory to determine the PID of the content process I'm interested in.
- Use "File... Attach to process (F6)" to locate the PID and attach.
- This suspends the process.
- Type "g" and hit return to cause the process to resume execution.
- Wait a bit for stuff to happen/leak.
- Use "Debug... Break" or press Ctrl-break in the debugger window to suspend the debuggee.
- Type "~" and hit return.  I see names for the new threads.  In my case, "EncodingRunnable #1" and "DOM Worker".  I've also seen stream transport threads.
- Type "g" to resume afterwards until you want to Ctrl-break again.

From https://developer.mozilla.org/en-US/docs/Mozilla/How_to_get_a_stacktrace_with_WinDbg it looks like there are commands to log what gets printed to disk, but I would hope/presume it should be fairly obvious from scrolling through the list of threads what went wrong.
It does not show thread names for me, but after fetching symbols I was able to get thread stacks:


 2560  Id: 1b290.1feb4 Suspend: 1 Teb: 000000e0`18148000 Unfrozen
Child-SP          RetAddr           Call Site
000000e0`29dff6b8 00007ff9`146a75ff ntdll!NtWaitForSingleObject+0x14
000000e0`29dff6c0 00007ff8`c8bf8d60 KERNELBASE!WaitForSingleObjectEx+0x8f
000000e0`29dff760 00007ff9`14b2cab0 xul!thread_decoding_proc(void * p_data = <Value unavailable error>)+0x44 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\media\libvpx\vp8\decoder\threading.c @ 637]
000000e0`29dff7b0 00007ff9`17498364 ucrtbase!o__realloc_base+0x60
000000e0`29dff7e0 00007ff9`177070d1 KERNEL32!BaseThreadInitThunk+0x14
000000e0`29dff810 00000000`00000000 ntdll!RtlUserThreadStart+0x21

 2566  Id: 1b290.1a7c0 Suspend: 1 Teb: 000000e0`180d0000 Unfrozen
Child-SP          RetAddr           Call Site
000000e0`225ff928 00007ff9`146a75ff ntdll!NtWaitForSingleObject+0x14
000000e0`225ff930 00007ff8`c8bf8d60 KERNELBASE!WaitForSingleObjectEx+0x8f
000000e0`225ff9d0 00007ff9`14b2cab0 xul!thread_decoding_proc(void * p_data = <Value unavailable error>)+0x44 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\media\libvpx\vp8\decoder\threading.c @ 637]
000000e0`225ffa20 00007ff9`17498364 ucrtbase!o__realloc_base+0x60
000000e0`225ffa50 00007ff9`177070d1 KERNEL32!BaseThreadInitThunk+0x14
000000e0`225ffa80 00000000`00000000 ntdll!RtlUserThreadStart+0x21

 2567  Id: 1b290.1dfe4 Suspend: 1 Teb: 000000e0`180d2000 Unfrozen
Child-SP          RetAddr           Call Site
000000e0`227ffc08 00007ff9`146a75ff ntdll!NtWaitForSingleObject+0x14
000000e0`227ffc10 00007ff8`c8bf8d60 KERNELBASE!WaitForSingleObjectEx+0x8f
000000e0`227ffcb0 00007ff9`14b2cab0 xul!thread_decoding_proc(void * p_data = <Value unavailable error>)+0x44 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\media\libvpx\vp8\decoder\threading.c @ 637]
000000e0`227ffd00 00007ff9`17498364 ucrtbase!o__realloc_base+0x60
000000e0`227ffd30 00007ff9`177070d1 KERNEL32!BaseThreadInitThunk+0x14
000000e0`227ffd60 00000000`00000000 ntdll!RtlUserThreadStart+0x21

 2581  Id: 1b290.1f870 Suspend: 1 Teb: 000000e0`18134000 Unfrozen
Child-SP          RetAddr           Call Site
000000e0`289ffc68 00007ff9`146a75ff ntdll!NtWaitForSingleObject+0x14
000000e0`289ffc70 00007ff8`c8bf8d60 KERNELBASE!WaitForSingleObjectEx+0x8f
000000e0`289ffd10 00007ff9`14b2cab0 xul!thread_decoding_proc(void * p_data = <Value unavailable error>)+0x44 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\media\libvpx\vp8\decoder\threading.c @ 637]
000000e0`289ffd60 00007ff9`17498364 ucrtbase!o__realloc_base+0x60
000000e0`289ffd90 00007ff9`177070d1 KERNEL32!BaseThreadInitThunk+0x14
000000e0`289ffdc0 00000000`00000000 ntdll!RtlUserThreadStart+0x21
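
With thousands of threads in a dump like this, tallying how many stacks contain a given frame is quicker than scrolling. A minimal sketch, assuming the debugger output was saved as text and that every thread begins with a header line containing " Id: " as in the stacks above. `CountThreadsWithFrame` is a hypothetical helper written for illustration, not part of WinDbg or Firefox:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: counts the threads in a saved stack dump whose
// stack contains `symbol` in at least one frame. Assumes each thread
// starts with a header line containing " Id: ".
int CountThreadsWithFrame(const std::string& dump, const std::string& symbol) {
  assert(!symbol.empty());  // an empty symbol would match every line

  std::istringstream in(dump);
  std::string line;
  bool in_thread = false;  // true once the first thread header was seen
  bool counted = false;    // true if the current thread already matched
  int matches = 0;

  while (std::getline(in, line)) {
    if (line.find(" Id: ") != std::string::npos) {
      // A new thread header: reset the per-thread state.
      in_thread = true;
      counted = false;
      continue;
    }
    if (in_thread && !counted && line.find(symbol) != std::string::npos) {
      ++matches;       // count each thread at most once
      counted = true;
    }
  }
  return matches;
}
```

Calling it with the symbol `xul!thread_decoding_proc` on the saved dump would give the number of threads parked in the VP8 decoding worker, which can then be compared with the total thread count.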
How long does it take for this to reproduce for you?  Any chance you could use mozregression to narrow down when the problem started?  I believe it has options to use an existing profile, etc.

http://mozilla.github.io/mozregression/
2017-01-14T02:35:44: INFO : Narrowed inbound regression window from [7ce2094b, ce672399] (4 revisions) to [073d993c, ce672399] (2 revisions) (~1 steps left)
2017-01-14T02:35:44: DEBUG : Starting merge handling...
2017-01-14T02:35:44: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=ce67239948a0319df63deef8fccaf023731f6a29&full=1
2017-01-14T02:35:45: DEBUG : Found commit message:
Bug 1321076 - In the case of alpha, VPXDecoder uses overloaded CreateAndCopy that takes alpha plane. r=jya

MozReview-Commit-ID: AIJxPRjGvrg

2017-01-14T02:35:45: INFO : The bisection is done.
2017-01-14T02:35:45: INFO : Stopped
It would appear bug 1321076 added a decoder, |mVPXAlpha|, but doesn't clean it up [1].

[1] http://searchfox.org/mozilla-central/rev/0aed9484bd3e97206fd1949ee4a4992ef300a81f/dom/media/platforms/agnostic/VPXDecoder.cpp#87-91
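
The pattern described here can be illustrated with a self-contained mock. This is a sketch, not the actual patch: `MockCodecCtx`, `mock_codec_init`, `mock_codec_destroy` and `MockDecoder` are hypothetical stand-ins (not the libvpx API). It assumes that each initialized context owns worker threads that only exit when that context is destroyed. Only the idea of a second context for the alpha plane comes from this bug.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Worker threads currently owned by any mock context. Updated on the
// caller's thread, so reads after init/destroy are deterministic.
static std::atomic<int> g_live_workers{0};

// Hypothetical stand-in for a codec context; NOT the libvpx API.
struct MockCodecCtx {
  struct Priv {
    std::mutex lock;
    std::condition_variable wake;
    bool stop = false;
    std::vector<std::thread> workers;
  };
  Priv* priv = nullptr;  // null: never initialized or already destroyed
};

// Spawns `worker_count` threads that sleep until the context is destroyed.
void mock_codec_init(MockCodecCtx* ctx, int worker_count) {
  assert(ctx->priv == nullptr);
  ctx->priv = new MockCodecCtx::Priv();
  MockCodecCtx::Priv* p = ctx->priv;
  for (int i = 0; i < worker_count; ++i) {
    p->workers.emplace_back([p] {
      std::unique_lock<std::mutex> guard(p->lock);
      p->wake.wait(guard, [p] { return p->stop; });
    });
    ++g_live_workers;
  }
}

// Wakes and joins the workers. A no-op on an uninitialized context
// (an assumption of this mock).
void mock_codec_destroy(MockCodecCtx* ctx) {
  if (!ctx->priv) return;
  {
    std::lock_guard<std::mutex> guard(ctx->priv->lock);
    ctx->priv->stop = true;
  }
  ctx->priv->wake.notify_all();
  for (std::thread& t : ctx->priv->workers) {
    t.join();
    --g_live_workers;
  }
  delete ctx->priv;
  ctx->priv = nullptr;
}

// Mock decoder with a color context and an optional alpha context.
class MockDecoder {
 public:
  void Init(bool has_alpha) {
    mock_codec_init(&mColor, 2);
    if (has_alpha) mock_codec_init(&mAlpha, 2);
  }
  // Leaky variant: forgets the alpha context, so its workers stay alive.
  void ShutdownLeaky() { mock_codec_destroy(&mColor); }
  // Fixed variant: destroys both contexts.
  void ShutdownFixed() {
    mock_codec_destroy(&mColor);
    mock_codec_destroy(&mAlpha);
  }

 private:
  MockCodecCtx mColor;
  MockCodecCtx mAlpha;
};
```

After `Init(true)` the counter reads 4, `ShutdownLeaky()` leaves 2 workers alive, and a subsequent `ShutdownFixed()` brings it to 0. In the mock, destroying a context that was never initialized does nothing, so the second destroy call needs no guard. Whether the real codec call behaves the same way is an assumption here.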
Blocks: 1321076
Component: DOM → Audio/Video: Playback
Flags: needinfo?(amarchesini) → needinfo?(kkoorts)
[Tracking Requested - why for this release]:
Memory leak regression impacting UX.
Keywords: regression
By the way, thank you for your help and persistence tracking this down!
I think another issue is that thread stacks only show up as heap-unclassified. If about:memory had provided more information this would have been easier to track down.
Comment on attachment 8827282 [details]
Bug 1329104 - Shutdown context used for WebM alpha decoding.

https://reviewboard.mozilla.org/r/104996/#review105786

::: dom/media/platforms/agnostic/VPXDecoder.cpp:91
(Diff revision 1)
>  
>  void
>  VPXDecoder::Shutdown()
>  {
>    vpx_codec_destroy(&mVPX);
> +  if (mInfo.HasAlpha()) {

Seeing that we don't test the return values, the test appears unnecessary.
Attachment #8827282 - Flags: review?(jyavenard) → review+
Tracking 53+ for this memory leak.
Comment on attachment 8827282 [details]
Bug 1329104 - Shutdown context used for WebM alpha decoding.

https://reviewboard.mozilla.org/r/104996/#review106018
Flags: needinfo?(kkoorts)
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/5d4f34a2196c
Shutdown context used for WebM alpha decoding. r=jya
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/5d4f34a2196c
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53