Closed Bug 857538 Opened 11 years ago Closed 10 years ago

Intermittent browser-chrome leakcheck | 156 bytes leaked (Mutex, ReentrantMonitor, nsRunnable, nsTArray_base, nsThread)

Categories

(Core :: Gecko Profiler, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: emorley, Unassigned)

Details

(Keywords: intermittent-failure, memory-leak, Whiteboard: [leave open])

Attachments

(2 files)

Broken out from bug 833769...

jesup: the nsThread leak should be a new bug; that's browsertest-chrome, which has no webrtc tests in it.  Perhaps someone else tripped on the the same XPCOM bug, or maybe it's a totally unrelated  was to leak an nsThread
jesup: If someone checked in new use of DISPATCH_SYNC recently that might be the cause.  Or if they didn't clean up their event queue before shutting down

--

Rev3 WINNT 5.1 mozilla-inbound debug test mochitest-browser-chrome on 2013-04-02 19:18:49 PDT for push 4d1787e7e3cf

slave: talos-r3-xp-086

https://tbpl.mozilla.org/php/getParsedLog.php?id=21380426&tree=Mozilla-Inbound

{
03:53:10     INFO -  == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, default process 3584
03:53:10     INFO -       |<----------------Class--------------->|<-----Bytes------>|<----------------Objects---------------->|<--------------References-------------->|
03:53:10     INFO -                                                Per-Inst   Leaked    Total      Rem      Mean       StdDev     Total      Rem      Mean       StdDev
03:53:10     INFO -     0 TOTAL                                          16      156 687149874        5 (39306.27 +/- 69385.20) 718888335        2 (30538.33 +/- 61721.33)
03:53:10     INFO -   291 Mutex                                          12       12    87443        1 (  541.75 +/-    86.07)        0        0 (    0.00 +/-     0.00)
03:53:10     INFO -   354 ReentrantMonitor                               16       16    26332        1 (  134.56 +/-    28.40)        0        0 (    0.00 +/-     0.00)
03:53:10     INFO -  1170 nsRunnable                                     12       12  2136113        1 (  274.74 +/-  2055.06)  5153838        1 (  234.83 +/-  1873.23)
03:53:10     INFO -  1286 nsTArray_base                                   4        4 148765346        1 (156894.97 +/- 61328.56)        0        0 (    0.00 +/-     0.00)
03:53:10     INFO -  1298 nsThread                                      112      112      424        1 (   27.87 +/-     5.21)  1008502        1 (  297.33 +/-   124.99)
03:53:10     INFO -  nsTraceRefcntImpl::DumpStatistics: 1456 entries
03:53:10     INFO -  TEST-INFO | leakcheck | leaked 1 Mutex (12 bytes)
03:53:10     INFO -  TEST-INFO | leakcheck | leaked 1 ReentrantMonitor (16 bytes)
03:53:10     INFO -  TEST-INFO | leakcheck | leaked 1 nsRunnable (12 bytes)
03:53:10     INFO -  TEST-INFO | leakcheck | leaked 1 nsTArray_base (4 bytes)
03:53:10     INFO -  TEST-INFO | leakcheck | leaked 1 nsThread (112 bytes)
03:53:10  WARNING -  TEST-UNEXPECTED-FAIL | leakcheck | 156 bytes leaked (Mutex, ReentrantMonitor, nsRunnable, nsTArray_base, nsThread)
}
The fix to avoid the XPCOM bug was in bug 853998 (avoid DISPATCH_SYNC from a thread that doesn't normally spin the event loop).
I can try to figure out what test is to blame for this leak.
Assignee: nobody → continuation
(In reply to Andrew McCreight [:mccr8] from comment #8)
> I can try to figure out what test is to blame for this leak.

Thank you :-)
Try run with logging: https://tbpl.mozilla.org/?tree=Try&rev=b7f5850e39f2

So far this has only happened 13 times, all on XP, according to Orange Factor.  We'll see if I can manage to capture it.
Are there any debugs you could commit that would dump useful info to the logs until we can track it down (Without overloading the logs).  We did this for working on some of the timeout problems in WebRTC.
Yeah, that was going to be my next step if I didn't find anything with the try run.
I got lucky and managed to catch one of the leaks.

In a run without leaks, I got this:
Perspicacity:bcnoleak amccreight$ python ../ThreadLeak.py noleak.txt 
Leaked thread 0204BBF8 with a nameless mThread 01FF6BD8
Leaked thread 02EE2C20 with name Timer
Leaked thread 020C4358 with name Socket Thread
Leaked thread 020C0FF8 with name Link Monitor

In a run with leaks, I got this:
Perspicacity:bcnoleak amccreight$ python ../ThreadLeak.py ../bc/leaks.txt
Leaked thread 0204BBF8 with a nameless mThread 01FF6BD8
Leaked thread 02EE2C20 with name Timer
Leaked thread 020C0FF8 with name Link Monitor
Leaked thread 14B374A8 with name Event Tracer
Leaked thread 020C4358 with name Socket Thread

These are all created very early in the log.  I'm not sure why the destructors don't show up, but anyways, this suggests that the leaking thread is named "Event Tracer".

The only patches that have touched toolkit/xre/EventTracer.cpp recently are bug 853896 and bug 851611, which landed on 3/27.

The first reappearance of this leak in bug 833769 was from 4/02, which is 5 days later, and seems a little late to have caused it, but I don't know how long it takes for new leaks to get picked up.

Event Tracer may be innocent, and may just not have its destructor logged for the same reason those other threads don't.  I'll try to look at some other of my logs to see if they find anything useful.
(My logging just prints stuff out in the ctor and dtor, plus some stuff to log the mThread to try to get a name.)
I looked at another log that leaks, and another log that doesn't, and as before the leaking one has a leaking nsThread named Event Tracer, and the non-leaking one doesn't.

I'm not sure what I was looking at in comment 22, but the leaking nsThread is actually created during the test devtools/profiler/test/browser_profiler_bug_830664_multiple_profiles.js, not at the very beginning, like I said.  Bug 830664 landed at the beginning of February, so it doesn't seem like it could be the cause.  It does involve the profiler, so the test at least seems related.

In the log that doesn't leak, the Event Tracer thread is killed late during shutdown.
Here's excerpts from a leaking and non-leaking log, for browser_profiler_bug_830664_multiple_profiles.
Component: XPCOM → Gecko Profiler
I don't think I'll be able to make any further progress here, given that I don't really know anything about nsThread.  Maybe BenWa can investigate.
Assignee: continuation → nobody
(In reply to Andrew McCreight [:mccr8] from comment #25)
> I looked at another log that leaks, and another log that doesn't, and as
> before the leaking one has a leaking nsThread named Event Tracer, and the
> non-leaking one doesn't.
> 
> I'm not sure what I was looking at in comment 22, but the leaking nsThread
> is actually created during the test
> devtools/profiler/test/browser_profiler_bug_830664_multiple_profiles.js, not
> at the very beginning, like I said.  Bug 830664 landed at the beginning of
> February, so it doesn't seem like it could be the cause.  It does involve
> the profiler, so the test at least seems related.
> 
> In the log that doesn't leak, the Event Tracer thread is killed late during
> shutdown.

(In reply to Andrew McCreight [:mccr8] from comment #26)
> Created attachment 734684 [details]
> excerpts from leaking and nonleaking logs
> 
> Here's excerpts from a leaking and non-leaking log, for
> browser_profiler_bug_830664_multiple_profiles.

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #27)
> The tracer thread code lives here, FWIW:
> http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/EventTracer.cpp

(In reply to Andrew McCreight [:mccr8] from comment #30)
> I don't think I'll be able to make any further progress here, given that I
> don't really know anything about nsThread.  Maybe BenWa can investigate.

This is a top-orange - Benoit, would you be able to take a look? :-)
Flags: needinfo?(bgirard)
Ted, the problem only happens on xp slaves so perhaps the problem is specific to the windows backend of event tracing:

Here it looks like we dispatch an event to the main event loop but we don't clean it up if it's pending:
http://mxr.mozilla.org/mozilla-central/source/widget/windows/WidgetTraceEvent.cpp#78
Flags: needinfo?(bgirard) → needinfo?(ted)
Attached patch diagnosticSplinter Review
Attachment #750049 - Flags: review?(ted)
Attachment #750049 - Flags: review?(ted) → review+
Flags: needinfo?(ted)
Keywords: checkin-needed
Whiteboard: [leave open]
This seems pretty plausible. Sorry, this event tracer code wound up being a lot more fragile than I ever thought. :-/
(In reply to TinderboxPushlog Robot from comment #157)
> RyanVM
> https://tbpl.mozilla.org/php/getParsedLog.php?id=23038196&tree=Mozilla-
> Inbound
> Rev3 WINNT 5.1 mozilla-inbound debug test mochitest-browser-chrome on
> 2013-05-16 09:23:05
> slave: talos-r3-xp-109
> 
> 10:50:14  WARNING -  TEST-UNEXPECTED-FAIL | leakcheck | 156 bytes leaked
> (Mutex, ReentrantMonitor, nsRunnable, nsTArray_base, nsThread)

This is post-comment 155.
Looks like it wasn't it. I'll take another look.
So I guess this patch showed us that we're not actually leaking this thing?
No sign of this for a year.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: