Closed Bug 401433 Opened 17 years ago Closed 17 years ago

nsHTMLFormElement's nsFormControlList causes nsTArray to leak its mHdr

Categories

(Core :: XPCOM, defect, P2)

x86
OpenSolaris
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: Dolske, Assigned: Dolske)

References

()

Details

(Keywords: memory-leak, testcase, top100)

Attachments

(1 file)

1. Start browser on Solaris with libumem logging enabled.
2. Visit http://digg.com
3. Attach mdb with a _exit sysbreakpoint
4. Quit browser

When the _exit breakpoint triggers, ::findleaks reports a pile of libX11 and libfreetype leaks, and some leaks through libnspr4.so's' PR_[CM]alloc... Most of those are NSS, but two have a signature like this:

> 0d3be540::bufctl_audit
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
         d3be540          d3bd500    3d0646b807860                1
                          a07dc90          856ef50                0
                 libumem.so.1`umem_cache_alloc_debug+0x14f
                 libumem.so.1`umem_cache_alloc+0x180
                 libumem.so.1`umem_alloc+0xc5
                 libumem.so.1`malloc+0x27
                 libnspr4.so`PR_Malloc+0x37
                 libxul.so`NS_Alloc_P+0x2f
                 libxul.so`int nsTArray_base::EnsureCapacity+0xff
                 libxul.so`nsFormControlList::nsFormControlList #Nvariant 1+0x5a
                 libxul.so`unsigned nsHTMLFormElement::Init+0x3a
                 libxul.so`nsGenericHTMLElement*NS_NewHTMLFormElement+0x4b
                 libxul.so`already_AddRefed<nsGenericHTMLElement>CreateHTMLElement+0x28
                 libxul.so`unsigned NS_NewHTMLElement+0x47
                 libxul.so`unsigned NS_NewElement+0xd7
                 libxul.so`unsigned nsDocument::CreateElem+0x81
                 libxul.so`unsigned nsHTMLDocument::CreateElement+0x213
                 libxul.so`NS_InvokeByIndex_P+0x51
                 libxul.so`int XPCWrappedNative::CallMethod+0x100b
                 libxul.so`int XPC_WN_CallMethod+0xf0
                 libmozjs.so`js_Invoke+0x882
                 libmozjs.so`js_Interpret+0x59a1
                 libmozjs.so`js_Execute+0x2ab
                 libmozjs.so`JS_EvaluateUCScriptForPrincipals+0x61
                 libxul.so`unsigned nsJSContext::EvaluateString+0x358
                 libxul.so`unsigned nsScriptLoader::EvaluateScript+0x311
                 libxul.so`unsigned nsScriptLoader::ProcessRequest+0xb9
                 libxul.so`void nsScriptLoader::ProcessPendingRequests+0xc4
                 libxul.so`unsigned nsScriptLoader::OnStreamComplete+0x68
                 libxul.so`unsigned nsStreamLoader::OnStopRequest+0x5f
                 libxul.so`unsigned nsHTTPCompressConv::OnStopRequest+0x1d
                 libxul.so`unsigned nsHttpChannel::OnStopRequest+0x343
                 libxul.so`unsigned nsInputStreamPump::OnStateStop+0x91
                 libxul.so`unsigned nsInputStreamPump::OnInputStreamReady+0x56
                 libxul.so`unsigned nsInputStreamReadyEvent::Run+0x33
                 libxul.so`unsigned nsThread::ProcessNextEvent+0x160
                 libxul.so`int NS_ProcessNextEvent_P+0x3a
                 libxul.so`unsigned nsBaseAppShell::Run+0x3a
                 libxul.so`unsigned nsAppStartup::Run+0x2a
                 libxul.so`XRE_main+0x37fd
                 main+0x281
                 _start+0x7d


The code in question seems to be:

content/src/nsHTMLFormElement.cpp#1943

1939 nsFormControlList::nsFormControlList(nsHTMLFormElement* aForm) :
1940   mForm(aForm),
1941   // Initialize the elements list to have an initial capacity
1942   // of 8 to reduce allocations on small forms.
1943   mElements(8)
1944 {
1945 }

AIUI, mElements' destructor should be called when nsFormControlList is destroyed. So either nsTArray isn't deallocating what what it got in this stack, or this is a secondary leak if the form element itself is leaking... But I'm not seeing the element leak (I think?).

[I thought this might be a symptom of bug 384230, but this profile doesn't have any Digg logins stored to the leak-causing code in pwmgr shouldn't be getting called.]
Keywords: top100
[Actually, now that I think about it, the "secondary leak" thing is mostly nonsense, ::findleaks reports it as a leak because nothing in the process is pointing to that address.]
Flags: blocking1.9?
Flags: blocking1.9? → blocking1.9+
Priority: -- → P3
I don't see any leaks with trace-refcnt on Mac when I load digg.  Dolske, are you still seeing this (on Solaris with libumem)?
Hmm, no, I still see this. CCing dwitte, since he knows nsTArray and was just looking into SetCapacity()...

[solaris] $ mdb -p `pgrep firefox-bin`
Loading modules: [ ld.so.1 libumem.so.1 libc.so.1 libuutil.so.1 ]
> $G
C++ symbol demangling enabled
> ::sysbp _exit
> :c
mdb: stop on entry to _exit
mdb: target stopped at:
0xfee78648:     addb   %al,(%eax)
mdb: You've got symbols!

> ::findleaks
...
CACHE     LEAKED   BUFCTL CALLER
...
0a07ec90       1 0dfe52b0 libnspr4.so`PR_Malloc+0x63
...
------------------------------------------------------------------------
   Total     490 buffers, 53176 bytes

> 0dfe52b0::bufctl_audit
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
         dfe52b0          dfe4440    1e1b098374fcb                1
                          a07ec90          8f01248                0
  libumem.so.1`umem_cache_alloc_debug+0x14f
  libumem.so.1`umem_cache_alloc+0x180
  libumem.so.1`umem_alloc+0xc5
  libumem.so.1`malloc+0x27
  libnspr4.so`PR_Malloc+0x63
  libxpcom_core.so`NS_Alloc_P+0x47
  libxpcom_core.so`int nsTArray_base::EnsureCapacity+0xae
  libgklayout.so`int nsTArray<nsIFormControl*>::SetCapacity+0x30
  libgklayout.so`nsTArray<nsIFormControl*>::nsTArray #Nvariant 1+0x45
  libgklayout.so`nsFormControlList::nsFormControlList #Nvariant 1+0xb8
  libgklayout.so`unsigned nsHTMLFormElement::Init+0x46
  libgklayout.so`nsGenericHTMLElement*NS_NewHTMLFormElement+0x73
  libgklayout.so`already_AddRefed<nsGenericHTMLElement>CreateHTMLElement+0xa6
  libgklayout.so`unsigned NS_NewHTMLElement+0x1bd
  libgklayout.so`unsigned NS_NewElement+0x36
  libgklayout.so`unsigned nsDocument::CreateElem+0x2aa
  libgklayout.so`unsigned nsHTMLDocument::CreateElem+0x17e
  libgklayout.so`unsigned nsHTMLDocument::CreateElement+0x2fd
  libxpcom_core.so`NS_InvokeByIndex_P+0x51
  libxpconnect.so`int XPCWrappedNative::CallMethod+0x1b7a
  libxpconnect.so`int XPC_WN_CallMethod+0x20d
  libmozjs.so`js_Invoke+0xcb1
  libmozjs.so`js_Interpret+0xd9ff
  libmozjs.so`js_Execute+0x505
  libmozjs.so`JS_EvaluateUCScriptForPrincipals+0xbd
  libgklayout.so`unsigned nsJSContext::EvaluateString+0x485
  libgklayout.so`unsigned nsScriptLoader::EvaluateScript+0x48f
  libgklayout.so`unsigned nsScriptLoader::ProcessRequest+0x144
  libgklayout.so`void nsScriptLoader::ProcessPendingRequests+0xe4
  libgklayout.so`unsigned nsScriptLoader::OnStreamComplete+0xea
  libnecko.so`unsigned nsStreamLoader::OnStopRequest+0xbf
  libnecko.so`unsigned nsHTTPCompressConv::OnStopRequest+0x46
  libnecko.so`unsigned nsHttpChannel::OnStopRequest+0x534
  libnecko.so`unsigned nsInputStreamPump::OnStateStop+0x137
  libnecko.so`unsigned nsInputStreamPump::OnInputStreamReady+0xa3
  libxpcom_core.so`unsigned nsInputStreamReadyEvent::Run+0x85
  libxpcom_core.so`unsigned nsThread::ProcessNextEvent+0x332
  libxpcom_core.so`int NS_ProcessNextEvent_P+0x7b
  libwidget_gtk2.so`unsigned nsBaseAppShell::Run+0x82
  libtoolkitcomps.so`unsigned nsAppStartup::Run+0x9a
  libxul.so`XRE_main+0x521e
  main+0x3e5     
  _start+0x7d    
Do you have to be logged into Digg, or have passwords saved for it, or anything like that?

Can you make a reduced testcase?  Even if we can figure out the bug from the stack trace, it would be nice to have a good testcase to stick into the crashtest suite.
I'm testing on a profile without a Digg login stored, and have not logged in. I just open the browser, enter "digg.com" in the URL bar, wait a bit to ensure it loaded, and then quit.
digg.com never finishes loading for me.  Firefox has "waiting for ad.doubleclick.net" in the status bar forever.
Justin, do you see nsTArray showing up as leaked if you test trace-refcnt? I really don't see how we could leak nsTArrays without leaking nsFormControlLists.

--> dolske
Assignee: nobody → dolske
Reduced test case...

I added a printf to nsFormControlList's constructor and destructor. Loading Digg fires the constructor 3 times and the destructor 1 time. After closing the tab and waiting a moment, the destructor fires 2 times. 3 == 1+2, so no refcount leak of this.

Digg has 2 <form> elements in their HTML (login and search). The other form element comes from http://digg.com//js/60/prototype.js... It's comparing two element prototypes for inequality to set |SpecificElementExtensions|. I dunno why it's doing that, but the test there is sufficient to cause the leak. If I start the browser, enter the following JS, and close the tab, I see a leak with basically the same stack:

javascript:document.createElement('div').__proto__!==document.createElement('form').__proto__;

The mElements related code has a comments about weak references due to fixing bug 36639, but I don't see an obvious (to me :-) reason why this would leak. I suppose the next step is to look at who's getting references to mElements during its brief life.
I still don't see the leak using trace-refcnt on Mac, even using the reduced testcase.  Did you ever get trace-refcnt working on Solaris, or are you still only using libumum?
Keywords: testcase
Hmm, so, I think I see what's happening here. Skip to the end of this comment if you just want the conclusion. :-)

I added some extra logging, and grabbed the leak with libumem again:

> 0ccc9178::bufctl_audit
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
         ccc9178          b275d80    5d7500a8bcdd0                1
                          a07ec90          87707e8                0

The BUFADDR (b275d80) is the address of the leaked buffer. libumem stores some metadata in the first 8 bytes, so the malloc caller actually gets b275d88. For this leak, that's nsTArray_base's mHdr. The array in question is nsFormControlList's mElements [mElements.DebugGetHeader() == mHdr]. So, that's enough to firmly associate the nsTArray with the nsFormControlList, so sanity's sake.

Here's the extra logging I generated, filtered to the relevant bits. The "TTT" are in nsTArray.cpp/h (with |"/%p",this| appended), the XXX lines are in the nsFormControlList.

TTT_base/cc4332c ctor
TTT_base/cc4332c EnsureCapacity(8) init mHdr to b275d88 (was sEmptyHdr/fec60fb4)
XXX +++ nsFormControlList/cc43318 created (mHdr == b275d88)

XXX ... nsFormControlList/cc43318 ::Clear clearing mElements...
TTT/cc4332c Clear()
TTT/cc4332c RemoveElementsAt start=0, count=0
TTT_base/cc4332c ShiftData called, start=0, oldLen=0, newLen=0

XXX --- nsFormControlList/cc43318 destroyed (mHdr == b275d88)
XXX ... nsFormControlList/cc43318 ::Clear clearing mElements...
TTT/cc4332c Clear()
TTT/cc4332c RemoveElementsAt start=0, count=0
TTT_base/cc4332c ShiftData called, start=0, oldLen=0, newLen=0

TTT/cc4332c nsTArray dtor...
TTT/cc4332c Clear()
TTT/cc4332c RemoveElementsAt start=0, count=0
TTT_base/cc4332c ShiftData called, start=0, oldLen=0, newLen=0
TTT_base/cc4332c dtor


Yadda, yadda, yadda, the code seems to assume that memory is deallocated from the nsTArray's dtor when it calls Clear() --> RemoveElementsAt() --> DestructRange()/ShiftData().

Unfortunately, the beginning of ShiftData() decides it doesn't have to do anything when oldLen == newLen, and immediately returns. The array never had anything stored in it (even though it's capacity was set to 8 elements when we made it), so 0 == oldLen == newLen. Thus, it never calls ShrinkCapacity() to deallocate mHdr. Ta-da, leak.
Oh, and: the leak wouldn't should up with trace-refcount because it's not refcounted. Strictly speaking, it's not the nsTArray itself leaking; it's a allocation in the implementation that's leaked.
(In reply to comment #12)
> Oh, and: the leak wouldn't should up with trace-refcount because it's not
> refcounted. Strictly speaking, it's not the nsTArray itself leaking; it's a
> allocation in the implementation that's leaked.
> 

So what's the soln?
Should ~nsTArray() call ShrinkCapacity(0) after Clear()?
Component: DOM: HTML → XPCOM
No, ShrinkCapacity does more work than necessary. nsTArray_base::~nsTArray_base should have something like:

if (mHdr != &sEmptyHdr && !IsAutoArray())
  NS_Free(mHdr);
s/!IsAutoArray()/!UsesAutoArrayBuffer()/ ?
Yeah, that's what I meant ;-)
Attached patch proposed patchSplinter Review
Justin, want to verify this fixes the leak?
Yep, with attachment 298524 [details] [diff] [review] the leak goes away.
Attachment #298524 - Flags: review?(benjamin)
Comment on attachment 298524 [details] [diff] [review]
proposed patch

Excellent catch!

It's silly that we free the array in two places though. Could you completely remove the ShrinkCapacity call from ShiftData as well? It might actually help with fragmentation too since we'll be better at reusing buffers.
I think ShrinkCapacity should be there in ShiftData to keep expected Clear() 
behavior. If some nsTArray grows very large, Clear() should delete also
the internal buffer.
I suspect in most cases we'd benefit from not freeing the buffer since we're likely to fill up about the same amount of data again. If there are places where we really want to free the memory they can call Compact() after calling Clear().

As it is now there is no way to remove all elements while being able to recycle the buffer.

However, I guess we can always do that in a separate bug.
(In reply to comment #22)
> However, I guess we can always do that in a separate bug.
Yup, that is better, since some testing is needed to check if
Compact() is needed somewhere.

Comment on attachment 298524 [details] [diff] [review]
proposed patch

An alternative solution is to fix this in ShiftData instead. That way we could inline the nsTArray_base dtor.

r/sr=me either way
Attachment #298524 - Flags: superreview+
Attachment #298524 - Flags: review?(benjamin)
Attachment #298524 - Flags: review+
Comment on attachment 298524 [details] [diff] [review]
proposed patch

Checking in xpcom/glue/nsTArray.cpp;
/cvsroot/mozilla/xpcom/glue/nsTArray.cpp,v  <--  nsTArray.cpp
new revision: 1.16; previous revision: 1.15
done
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Summary: nsHTMLFormElement's nsFormControlList causes nsTArray to leak → nsHTMLFormElement's nsFormControlList causes nsTArray to leak its mHdr
Should the form consumer be using an nsAutoTArray here and paying that 8-slot price to avoid the extra allocations if we do go over 8?
Generally having long lived nsAutoTArrays on the heap sounds like a bad idea. It almost always wastes memory. If we do want to do it to save cycles I'd say we should get some data first.
Yeah, we definitely want data.  The win is not just cycles, but also fragmentation is lower if we usually stay within the 8 number.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: