leak global object after hovering over image in scrollbar

RESOLVED FIXED in M18

Status

()

Core
XUL
P1
major
RESOLVED FIXED
18 years ago
18 years ago

People

(Reporter: dbaron, Assigned: Chris Waterson)

Tracking

({memory-leak})

Trunk
x86
Linux
memory-leak
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [nsbeta3+])

Attachments

(4 attachments)

DESCRIPTION:  We leak two GlobalWindowImpl if we send an event to one of the
images in a GFX scrollbar.

STEPS TO REPRODUCE:
 * setenv XPCOM_MEM_LEAK_LOG 1
 * load a page (from the command line, if you don't want typing leaks!) that's
long enough to have a scrollbar
 * hover the mouse pointer for a short time over one of the images in the GFX
scrollbar (either the arrow image or the one in the middle of the slider)
 * click the X in the window manager to close

ACTUAL RESULTS:
 * leak about 1 meg, including 2 GlobalWindowImpl

DOES NOT WORK CORRECTLY ON:
 * Linux, mozilla, my heavily modified (leak fixes, I hope) build of the morning
of 2000-07-17 tree closure

I'll try to investigate this later today using js_LiveThingToFind.

The initial component (XBL) is a total guess - since I believe this stuff is
constructed by XBL.  The problem could be anywhere.
Keywords: mlk, nsbeta3
The leaked GC root is an nsXULElement::mScriptObject .  The element in question
is a "scrollbarbutton" if I hover over the arrow but an "image" if I hover over
the slider image.

Any ideas?
evaughan:  Are the scrollbar frames made using XBL?

Comment 4

18 years ago
However, the GFX scrollbars for GFX scroll frames are created using C++.  
Therefore this is the ENder problem all over again, i.e., script objects won't 
get cleaned up because the scroll frame doesn't clean them up.

dbaron, you should be able to patch nsGFXScrollFrame to do the same thing for 
its scrollbars that Ender does to make sure its <div> gets cleaned up.

Comment 5

18 years ago
This could also be fixed by making the GFX scroll frame really use XBL.
I'll take a look at just fixing the leak, assuming I can find it.

But I wonder why this only happens if I hover over the images, and not the stuff
next to them.
Assignee: hyatt → dbaron
Component: XP Toolkit/Widgets: XBL → XP Toolkit/Widgets
I'm thinking maybe the right way to fix this (and the inputs leak) is to have an
'UnrootAnonymousContent' (or something) method in nsIAnonymousContentCreator. 
The question is - if we do that, how do we know to call it?  The simplest way
seems to be to QI for it in nsGenericElement::SetDocument().  That seems painful
to me.  However, I think we QI for it during frame construction, so I guess it
shouldn't be too bad.  Also, do we know that the frames holding the anonymous
content are still around?  Should the frames also know to UnrootAnonymousContent
when they are destroyed if it hasn't been done yet?

It also bugs me to see frames (not refcounted) held in nsCOMPtrs.  I'm not sure
why that doesn't assert...

This bug probably could exist for every case where nsIAnonymousContentCreator is
used.  However, I think many of the form control users may have special
GetFrameForPoint methods that prevent events from getting through to the
anonymous content.
Priority: P3 → P1
Created attachment 11646 [details] [diff] [review]
patch that doesn't work because nsGfxScrollFrame::SetDocumentForAnonymousContent is never called
I bet I know why... the GfxScrollFrame probably isn't the *primary* frame for
any element.  I need to get *all* the frames... so I need to figure out how...
I'm stuck.  Some of these frames aren't associated with content, and I don't
even know how to get multiple frames for content that has multiple frames.  It
doesn't seem to me like it's possible to walk the entire frame tree -- and it's
not a pleasant idea either.

Anybody else want to take this bug?
(Assignee)

Comment 11

18 years ago
Can we make this unrooting part of the nsFrame::Destroy() method?
You tried that for the input leak.  At that time, the document no longer had a
reference to the global object so the root couldn't be removed.  At least, I
don't think it could be.  Is there a better way to write
nsGenericElement::SetDocument()?

I'm beginning to wonder whether this method of unrooting the script objects will
be able to hold up.  Another thing that scares me is removing content from the
document:  not only will it get destroyed, but when?  Are there cases where
removing content could cause a leak?  Will the memory be freed *when* the
content is removed?  But that's another issue...
Some correspondance for the permanent record (from hyatt):

"L. David Baron" wrote:
> On Thu, 20 Jul 2000 21:26:42 -0700, hyatt@netscape.com (David Hyatt)
> wrote:
> > Convert GFXScrollFrame to really use XBL, and these leaks would probably go
> > away.
> Is it possible to use XBL to create anonymous content for a frame
> that doesn't have a corresponding content node?
D'oh.  No.  Darn.
(Assignee)

Comment 14

18 years ago
Taking for now, as dbaron has better things to do. hyatt: prolly me and you and 
maybe evaughan are going to have to sit down to crack this chestnut. warren: 
this is a significant leak that I think you should nsbeta3+.
Assignee: dbaron → waterson

Comment 15

18 years ago
My newfound power: nsbeta3+!
Whiteboard: nsbeta3+
Whiteboard: nsbeta3+ → [nsbeta3+]
(Assignee)

Updated

18 years ago
Status: NEW → ASSIGNED
Target Milestone: --- → M18
(Assignee)

Comment 16

18 years ago
dbaron: I applied this patch as a basis to start working from and changed it
like this:

- removed the tag check in nsGenericElement.cpp, so we *always* get the frame,
and then QI() it.

- changed your hack to nsGfxScrollFrame to deal with the fact that sometimes
m[H|V]ScrollBox can be null (fixes crash, otherwise).

Unfortunately, I *still* see the leak.

I'm now experimenting with a minimal test case, running "./mozilla -chrome
file:/tmp/foo.xul", where foo.xul is

  <window xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
    <browser type="content-primary"
             src="resource:/res/samples/test0.html"
             flex="1" />
  </window>

In this case, I cannot reproduce the leak (with or without your patches). If
your theory is correct (that trying to handle an event on the scrollbar button
or images is causing the leak), shouldn't we expect this case to leak, as well?

I'm going to start blindly ripping stuff out of navigator.xul. Presumably, at
some point, i'll have minimized navigator.xul so much as to be indistinguishable
from foo.xul. So somewhere along the way, I should be able to detect the feature
that's introducing the leak.

Hoping you have a better idea...
(Assignee)

Comment 17

18 years ago
It's the "tooltip='aToolTip'" attribute on the content area that's "causing" the
leak. Digging further...
(Assignee)

Comment 18

18 years ago
Created attachment 12463 [details]
minimized test case, save as /tmp/foo.xul; run ./mozilla -chrome file:/tmp/foo.xul
(Assignee)

Comment 19

18 years ago
My guess is that the circularity evil is coming from document.tooltipNode; how
is it that this will ever be released?

Comment 20

18 years ago
Tooltips have scroll frames right now.  I have an nsbeta3+ bug assigned against 
me to get rid of scroll frames inside tooltips, so that may provide a good 
workaround if we can verify that my fix corrects the leak.

Comment 21

18 years ago
document.tooltipNode should be released when the tooltip goes away.  It's done 
in nsXULPopupListener.cpp.

Comment 22

18 years ago
Wait wait wait.  I believe you may just be seeing a leak because you're actually 
getting the script object for the anonymous scrollbar node.  In simple 
navigation you won't end up doing that.  

Comment 23

18 years ago
See, the tooltip listener's JS code in navigator.xul will actually call 
event.target and then put up a tooltip depending on the type of the node.  So 
you'll end up creating a script object for the anonymous scrollbar.

So my guess is that the leak hasn't been fixed.

Comment 24

18 years ago
Remember, you will only leak here if you create a script object for the 
anonymous scrollbar or its children.
(Assignee)

Comment 25

18 years ago
I wonder if we could just not root anonymous content?
(Assignee)

Comment 26

18 years ago
Remind me why navtive DOM elements have to root their JS objects at all? 
Couldn't the Finalize() hook just clear the mScriptObject?
waterson:  Were you actually seeing the crash where m[H|V]ScrollBox were null? 
I never got that code to run...
(Assignee)

Comment 28

18 years ago
Yeah, but only after removing the tag check ("this is here for performance 
reasons"). I GetPrimaryFrameFor() *every* element, and then QI() each frame to 
nsIAnonymousContentCreator. (Evil, I know. I'm just thrashing to try to get 
something to work here...)
Waterson: what happens if the GC runs in between a DOM operation that cuts the 
content graph into disconnected components, only one of which (the original doc) 
is rooted, and before such a script reconnects the cut set to a new place in the 
document (or into another doc)?  That's why each DOM element roots its script 
object pointer, IIRC.

The alternative could be made safe as you suppose, by clearing mScriptObject in 
the finalizer -- but then any user-decorated properties in script objects would 
be lost, even if properties reflecting attribute values could be re-reflected 
from the native data structures later.

/be
I don't see why removing the tag check should help.  The anonymous content is
only created if the element meets the tag check (in nsCSSFrameConstructor).  At
least, I think so...

What do the cases look like where the tag check fails and the QI succeeds? 
(What is the content node?)
(Assignee)

Comment 31

18 years ago
It's a <div>.
I was discussing this with Hixie on IRC, and he suggested printf debugging, so:

With the modifications you described, plus some printfs, I found that we seem to
be successfully unrooting many scrollbars.  However, I think we never unroot the
scrollbars around the main document.  (No associated element.)  I see one printf
saying there is a scrollframe with no scrollbar boxes all the time when exiting
(probably URLbar).  I see lots more when leaving a test page like
http://www.bath.ac.uk/%7Epy8ieh/internet/eviltests/floatgrowth.html but none
when leaving a normal scrolling document.
(Assignee)

Comment 33

18 years ago
OH MY GOD I AM GOING INSANE.
Waterson fixed this in his tree by having a hack to check for HTML.  I copied
his idea, and put it in nsDocument::SetScriptGlobalObject.  Patch coming, since
waterson's tree is a mess right now.

Performance effects seem minimal.  I did a jprof of loading 3 pages (3d was
my.netscape.com, first two were my own pages), and the time spent in
nsDocument::SetScriptGlobalObject was 3 hits (with hits at 1ms intervals).
Created attachment 12513 [details] [diff] [review]
patch that fixes bug (adds waterson's changes)
This patch still needs some major cleanup.  If the #if 0 stuff goes (which I
think it has to), then the #include "nsXULAtoms.h" and the makefile changes can
also go, but it should be commented as a minor performance problem...
Created attachment 12514 [details] [diff] [review]
patch, cleaned up a bit
(Assignee)

Comment 38

18 years ago
Couple of questions....

- why did you add the '&& aDeep' to the unravelling code in
nsGenericElement::SetDocument()?

- do we need to file bugs for all the unimplemented
SetDocumentForAnonymousContent() calls?

- hyatt: at this point, XUL elements *never* create frames that require us to do
this anonymous content unrooting hackery, correct? Or do we have parallel tags
in the XUL universe that end up re-using frames like combobox, GFX scroll frame,
ender lite? If so, we probably need to add the unroot hackery to
nsXULElement::SetDocument()

- Similarly, is it possible that an XBL binding could create "doubly anonymous"
content (and would also need this hackery)? My guess is the answer is "no",
because we'll be covered by the nsGenericElement::SetDocument() (or
nsXULElement:SetDocument, if we determine that's necessary), but somebody sanity
check me.

Other than that, this patch looks great.
> - why did you add the '&& aDeep' to the unravelling code in
> nsGenericElement::SetDocument()?

aDeep says that we should SetDocument for the children, too.  I don't know who
ever sets it to false (or why it exists), but we may as well support it
correctly, and not touch the children (including anonymous ones) unless told to
do so.  But then again, I don't see the point of it, and I don't think anyone
ever uses it.

> - do we need to file bugs for all the unimplemented
> SetDocumentForAnonymousContent() calls?

I'm not sure.  We may be OK because some of the form frames have
GetFrameForPoint methods that prevent the anonymous content from receiving
events.  However, we may want to implement them anyway.  We certainly need to
investigate.
Fix checked in before the tree closed this morning (2000-08-09 05:51 PDT). 
Marking fixed.
Status: ASSIGNED → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED

Comment 41

18 years ago
David - I'm verifying this bug, but I'm a little confused about reproducing this
bug.  Do I use the DOS prompt to load from the command line and where will I see
the 1 meg leak?
You need to log in before you can comment on or make changes to this bug.