Closed
Bug 401433
Opened 17 years ago
Closed 17 years ago
nsHTMLFormElement's nsFormControlList causes nsTArray to leak its mHdr
Categories
(Core :: XPCOM, defect, P2)
Tracking
()
RESOLVED
FIXED
People
(Reporter: Dolske, Assigned: Dolske)
References
()
Details
(Keywords: memory-leak, testcase, top100)
Attachments
(1 file)
859 bytes,
patch
|
sicking
:
review+
sicking
:
superreview+
|
Details | Diff | Splinter Review |
1. Start browser on Solaris with libumem logging enabled. 2. Visit http://digg.com 3. Attach mdb with a _exit sysbreakpoint 4. Quit browser When the _exit breakpoint triggers, ::findleaks reports a pile of libX11 and libfreetype leaks, and some leaks through libnspr4.so's' PR_[CM]alloc... Most of those are NSS, but two have a signature like this: > 0d3be540::bufctl_audit ADDR BUFADDR TIMESTAMP THREAD CACHE LASTLOG CONTENTS d3be540 d3bd500 3d0646b807860 1 a07dc90 856ef50 0 libumem.so.1`umem_cache_alloc_debug+0x14f libumem.so.1`umem_cache_alloc+0x180 libumem.so.1`umem_alloc+0xc5 libumem.so.1`malloc+0x27 libnspr4.so`PR_Malloc+0x37 libxul.so`NS_Alloc_P+0x2f libxul.so`int nsTArray_base::EnsureCapacity+0xff libxul.so`nsFormControlList::nsFormControlList #Nvariant 1+0x5a libxul.so`unsigned nsHTMLFormElement::Init+0x3a libxul.so`nsGenericHTMLElement*NS_NewHTMLFormElement+0x4b libxul.so`already_AddRefed<nsGenericHTMLElement>CreateHTMLElement+0x28 libxul.so`unsigned NS_NewHTMLElement+0x47 libxul.so`unsigned NS_NewElement+0xd7 libxul.so`unsigned nsDocument::CreateElem+0x81 libxul.so`unsigned nsHTMLDocument::CreateElement+0x213 libxul.so`NS_InvokeByIndex_P+0x51 libxul.so`int XPCWrappedNative::CallMethod+0x100b libxul.so`int XPC_WN_CallMethod+0xf0 libmozjs.so`js_Invoke+0x882 libmozjs.so`js_Interpret+0x59a1 libmozjs.so`js_Execute+0x2ab libmozjs.so`JS_EvaluateUCScriptForPrincipals+0x61 libxul.so`unsigned nsJSContext::EvaluateString+0x358 libxul.so`unsigned nsScriptLoader::EvaluateScript+0x311 libxul.so`unsigned nsScriptLoader::ProcessRequest+0xb9 libxul.so`void nsScriptLoader::ProcessPendingRequests+0xc4 libxul.so`unsigned nsScriptLoader::OnStreamComplete+0x68 libxul.so`unsigned nsStreamLoader::OnStopRequest+0x5f libxul.so`unsigned nsHTTPCompressConv::OnStopRequest+0x1d libxul.so`unsigned nsHttpChannel::OnStopRequest+0x343 libxul.so`unsigned nsInputStreamPump::OnStateStop+0x91 libxul.so`unsigned nsInputStreamPump::OnInputStreamReady+0x56 libxul.so`unsigned nsInputStreamReadyEvent::Run+0x33 libxul.so`unsigned nsThread::ProcessNextEvent+0x160 libxul.so`int NS_ProcessNextEvent_P+0x3a libxul.so`unsigned nsBaseAppShell::Run+0x3a libxul.so`unsigned nsAppStartup::Run+0x2a libxul.so`XRE_main+0x37fd main+0x281 _start+0x7d The code in question seems to be: content/src/nsHTMLFormElement.cpp#1943 1939 nsFormControlList::nsFormControlList(nsHTMLFormElement* aForm) : 1940 mForm(aForm), 1941 // Initialize the elements list to have an initial capacity 1942 // of 8 to reduce allocations on small forms. 1943 mElements(8) 1944 { 1945 } AIUI, mElements' destructor should be called when nsFormControlList is destroyed. So either nsTArray isn't deallocating what what it got in this stack, or this is a secondary leak if the form element itself is leaking... But I'm not seeing the element leak (I think?). [I thought this might be a symptom of bug 384230, but this profile doesn't have any Digg logins stored to the leak-causing code in pwmgr shouldn't be getting called.]
Updated•17 years ago
|
URL: http://digg.com
Keywords: top100
Assignee | ||
Comment 1•17 years ago
|
||
[Actually, now that I think about it, the "secondary leak" thing is mostly nonsense, ::findleaks reports it as a leak because nothing in the process is pointing to that address.]
Updated•17 years ago
|
Flags: blocking1.9?
Updated•17 years ago
|
Flags: blocking1.9? → blocking1.9+
Priority: -- → P3
Comment 2•17 years ago
|
||
I don't see any leaks with trace-refcnt on Mac when I load digg. Dolske, are you still seeing this (on Solaris with libumem)?
Assignee | ||
Comment 3•17 years ago
|
||
Hmm, no, I still see this. CCing dwitte, since he knows nsTArray and was just looking into SetCapacity()... [solaris] $ mdb -p `pgrep firefox-bin` Loading modules: [ ld.so.1 libumem.so.1 libc.so.1 libuutil.so.1 ] > $G C++ symbol demangling enabled > ::sysbp _exit > :c mdb: stop on entry to _exit mdb: target stopped at: 0xfee78648: addb %al,(%eax) mdb: You've got symbols! > ::findleaks ... CACHE LEAKED BUFCTL CALLER ... 0a07ec90 1 0dfe52b0 libnspr4.so`PR_Malloc+0x63 ... ------------------------------------------------------------------------ Total 490 buffers, 53176 bytes > 0dfe52b0::bufctl_audit ADDR BUFADDR TIMESTAMP THREAD CACHE LASTLOG CONTENTS dfe52b0 dfe4440 1e1b098374fcb 1 a07ec90 8f01248 0 libumem.so.1`umem_cache_alloc_debug+0x14f libumem.so.1`umem_cache_alloc+0x180 libumem.so.1`umem_alloc+0xc5 libumem.so.1`malloc+0x27 libnspr4.so`PR_Malloc+0x63 libxpcom_core.so`NS_Alloc_P+0x47 libxpcom_core.so`int nsTArray_base::EnsureCapacity+0xae libgklayout.so`int nsTArray<nsIFormControl*>::SetCapacity+0x30 libgklayout.so`nsTArray<nsIFormControl*>::nsTArray #Nvariant 1+0x45 libgklayout.so`nsFormControlList::nsFormControlList #Nvariant 1+0xb8 libgklayout.so`unsigned nsHTMLFormElement::Init+0x46 libgklayout.so`nsGenericHTMLElement*NS_NewHTMLFormElement+0x73 libgklayout.so`already_AddRefed<nsGenericHTMLElement>CreateHTMLElement+0xa6 libgklayout.so`unsigned NS_NewHTMLElement+0x1bd libgklayout.so`unsigned NS_NewElement+0x36 libgklayout.so`unsigned nsDocument::CreateElem+0x2aa libgklayout.so`unsigned nsHTMLDocument::CreateElem+0x17e libgklayout.so`unsigned nsHTMLDocument::CreateElement+0x2fd libxpcom_core.so`NS_InvokeByIndex_P+0x51 libxpconnect.so`int XPCWrappedNative::CallMethod+0x1b7a libxpconnect.so`int XPC_WN_CallMethod+0x20d libmozjs.so`js_Invoke+0xcb1 libmozjs.so`js_Interpret+0xd9ff libmozjs.so`js_Execute+0x505 libmozjs.so`JS_EvaluateUCScriptForPrincipals+0xbd libgklayout.so`unsigned nsJSContext::EvaluateString+0x485 libgklayout.so`unsigned nsScriptLoader::EvaluateScript+0x48f libgklayout.so`unsigned nsScriptLoader::ProcessRequest+0x144 libgklayout.so`void nsScriptLoader::ProcessPendingRequests+0xe4 libgklayout.so`unsigned nsScriptLoader::OnStreamComplete+0xea libnecko.so`unsigned nsStreamLoader::OnStopRequest+0xbf libnecko.so`unsigned nsHTTPCompressConv::OnStopRequest+0x46 libnecko.so`unsigned nsHttpChannel::OnStopRequest+0x534 libnecko.so`unsigned nsInputStreamPump::OnStateStop+0x137 libnecko.so`unsigned nsInputStreamPump::OnInputStreamReady+0xa3 libxpcom_core.so`unsigned nsInputStreamReadyEvent::Run+0x85 libxpcom_core.so`unsigned nsThread::ProcessNextEvent+0x332 libxpcom_core.so`int NS_ProcessNextEvent_P+0x7b libwidget_gtk2.so`unsigned nsBaseAppShell::Run+0x82 libtoolkitcomps.so`unsigned nsAppStartup::Run+0x9a libxul.so`XRE_main+0x521e main+0x3e5 _start+0x7d
Comment 4•17 years ago
|
||
Do you have to be logged into Digg, or have passwords saved for it, or anything like that? Can you make a reduced testcase? Even if we can figure out the bug from the stack trace, it would be nice to have a good testcase to stick into the crashtest suite.
Assignee | ||
Comment 5•17 years ago
|
||
I'm testing on a profile without a Digg login stored, and have not logged in. I just open the browser, enter "digg.com" in the URL bar, wait a bit to ensure it loaded, and then quit.
Comment 6•17 years ago
|
||
digg.com never finishes loading for me. Firefox has "waiting for ad.doubleclick.net" in the status bar forever.
Justin, do you see nsTArray showing up as leaked if you test trace-refcnt? I really don't see how we could leak nsTArrays without leaking nsFormControlLists.
Assignee | ||
Comment 9•17 years ago
|
||
Reduced test case... I added a printf to nsFormControlList's constructor and destructor. Loading Digg fires the constructor 3 times and the destructor 1 time. After closing the tab and waiting a moment, the destructor fires 2 times. 3 == 1+2, so no refcount leak of this. Digg has 2 <form> elements in their HTML (login and search). The other form element comes from http://digg.com//js/60/prototype.js... It's comparing two element prototypes for inequality to set |SpecificElementExtensions|. I dunno why it's doing that, but the test there is sufficient to cause the leak. If I start the browser, enter the following JS, and close the tab, I see a leak with basically the same stack: javascript:document.createElement('div').__proto__!==document.createElement('form').__proto__; The mElements related code has a comments about weak references due to fixing bug 36639, but I don't see an obvious (to me :-) reason why this would leak. I suppose the next step is to look at who's getting references to mElements during its brief life.
Comment 10•17 years ago
|
||
I still don't see the leak using trace-refcnt on Mac, even using the reduced testcase. Did you ever get trace-refcnt working on Solaris, or are you still only using libumum?
Keywords: testcase
Assignee | ||
Comment 11•17 years ago
|
||
Hmm, so, I think I see what's happening here. Skip to the end of this comment if you just want the conclusion. :-)
I added some extra logging, and grabbed the leak with libumem again:
> 0ccc9178::bufctl_audit
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
ccc9178 b275d80 5d7500a8bcdd0 1
a07ec90 87707e8 0
The BUFADDR (b275d80) is the address of the leaked buffer. libumem stores some metadata in the first 8 bytes, so the malloc caller actually gets b275d88. For this leak, that's nsTArray_base's mHdr. The array in question is nsFormControlList's mElements [mElements.DebugGetHeader() == mHdr]. So, that's enough to firmly associate the nsTArray with the nsFormControlList, so sanity's sake.
Here's the extra logging I generated, filtered to the relevant bits. The "TTT" are in nsTArray.cpp/h (with |"/%p",this| appended), the XXX lines are in the nsFormControlList.
TTT_base/cc4332c ctor
TTT_base/cc4332c EnsureCapacity(8) init mHdr to b275d88 (was sEmptyHdr/fec60fb4)
XXX +++ nsFormControlList/cc43318 created (mHdr == b275d88)
XXX ... nsFormControlList/cc43318 ::Clear clearing mElements...
TTT/cc4332c Clear()
TTT/cc4332c RemoveElementsAt start=0, count=0
TTT_base/cc4332c ShiftData called, start=0, oldLen=0, newLen=0
XXX --- nsFormControlList/cc43318 destroyed (mHdr == b275d88)
XXX ... nsFormControlList/cc43318 ::Clear clearing mElements...
TTT/cc4332c Clear()
TTT/cc4332c RemoveElementsAt start=0, count=0
TTT_base/cc4332c ShiftData called, start=0, oldLen=0, newLen=0
TTT/cc4332c nsTArray dtor...
TTT/cc4332c Clear()
TTT/cc4332c RemoveElementsAt start=0, count=0
TTT_base/cc4332c ShiftData called, start=0, oldLen=0, newLen=0
TTT_base/cc4332c dtor
Yadda, yadda, yadda, the code seems to assume that memory is deallocated from the nsTArray's dtor when it calls Clear() --> RemoveElementsAt() --> DestructRange()/ShiftData().
Unfortunately, the beginning of ShiftData() decides it doesn't have to do anything when oldLen == newLen, and immediately returns. The array never had anything stored in it (even though it's capacity was set to 8 elements when we made it), so 0 == oldLen == newLen. Thus, it never calls ShrinkCapacity() to deallocate mHdr. Ta-da, leak.
Assignee | ||
Comment 12•17 years ago
|
||
Oh, and: the leak wouldn't should up with trace-refcount because it's not refcounted. Strictly speaking, it's not the nsTArray itself leaking; it's a allocation in the implementation that's leaked.
Comment 13•17 years ago
|
||
(In reply to comment #12) > Oh, and: the leak wouldn't should up with trace-refcount because it's not > refcounted. Strictly speaking, it's not the nsTArray itself leaking; it's a > allocation in the implementation that's leaked. > So what's the soln?
Comment 14•17 years ago
|
||
Should ~nsTArray() call ShrinkCapacity(0) after Clear()?
Component: DOM: HTML → XPCOM
Comment 15•17 years ago
|
||
No, ShrinkCapacity does more work than necessary. nsTArray_base::~nsTArray_base should have something like: if (mHdr != &sEmptyHdr && !IsAutoArray()) NS_Free(mHdr);
Comment 16•17 years ago
|
||
s/!IsAutoArray()/!UsesAutoArrayBuffer()/ ?
Comment 17•17 years ago
|
||
Yeah, that's what I meant ;-)
Comment 18•17 years ago
|
||
Justin, want to verify this fixes the leak?
Assignee | ||
Comment 19•17 years ago
|
||
Yep, with attachment 298524 [details] [diff] [review] the leak goes away.
Updated•17 years ago
|
Attachment #298524 -
Flags: review?(benjamin)
Comment on attachment 298524 [details] [diff] [review] proposed patch Excellent catch! It's silly that we free the array in two places though. Could you completely remove the ShrinkCapacity call from ShiftData as well? It might actually help with fragmentation too since we'll be better at reusing buffers.
Priority: P3 → P2
Comment 21•17 years ago
|
||
I think ShrinkCapacity should be there in ShiftData to keep expected Clear() behavior. If some nsTArray grows very large, Clear() should delete also the internal buffer.
I suspect in most cases we'd benefit from not freeing the buffer since we're likely to fill up about the same amount of data again. If there are places where we really want to free the memory they can call Compact() after calling Clear(). As it is now there is no way to remove all elements while being able to recycle the buffer. However, I guess we can always do that in a separate bug.
Comment 23•17 years ago
|
||
(In reply to comment #22) > However, I guess we can always do that in a separate bug. Yup, that is better, since some testing is needed to check if Compact() is needed somewhere.
Comment on attachment 298524 [details] [diff] [review] proposed patch An alternative solution is to fix this in ShiftData instead. That way we could inline the nsTArray_base dtor. r/sr=me either way
Attachment #298524 -
Flags: superreview+
Attachment #298524 -
Flags: review?(benjamin)
Attachment #298524 -
Flags: review+
Comment 25•17 years ago
|
||
Comment on attachment 298524 [details] [diff] [review] proposed patch Checking in xpcom/glue/nsTArray.cpp; /cvsroot/mozilla/xpcom/glue/nsTArray.cpp,v <-- nsTArray.cpp new revision: 1.16; previous revision: 1.15 done
Updated•17 years ago
|
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Updated•17 years ago
|
Summary: nsHTMLFormElement's nsFormControlList causes nsTArray to leak → nsHTMLFormElement's nsFormControlList causes nsTArray to leak its mHdr
Comment 26•17 years ago
|
||
Should the form consumer be using an nsAutoTArray here and paying that 8-slot price to avoid the extra allocations if we do go over 8?
Generally having long lived nsAutoTArrays on the heap sounds like a bad idea. It almost always wastes memory. If we do want to do it to save cycles I'd say we should get some data first.
Comment 28•17 years ago
|
||
Yeah, we definitely want data. The win is not just cycles, but also fragmentation is lower if we usually stay within the 8 number.
You need to log in
before you can comment on or make changes to this bug.
Description
•