Closed Bug 64819 Opened 24 years ago Closed 2 years ago

nsDOMEvent objects held too long in mozilla, not in viewer, causes bloat

Categories

(Core :: DOM: UI Events & Focus Handling, defect, P3)

x86
Windows NT
defect

Tracking

()

RESOLVED WORKSFORME
Future

People

(Reporter: buster, Unassigned)

References

()

Details

(Keywords: memory-leak, perf)

1. use your favorite technique for detecting leaks.  In playing with the idea of
an event recycler, I put printf's in the constructor and destructor of nsDOMEvent
2. load the page in viewer
3. use mouse-moves to generate events.  you should see a few hundred within a
very short time of waving the mouse around in the page.  none of these leak.
4. load the page in mozilla (if it matters, I'm using classic skin.)
5. put your mouse in the air, and wave it around like you just don't care
6. every event leaks.  this includes mouse moves over the html content area, or
just over the chrome (menus, buttons, etc.)
7. load a different page
8. you'll notice all the events get free'd at the page transition

64 bytes per event (more really....) times hundreds or thousands of events for a
realistic interaction with a page....that's a bunch of memory!
although the memory is eventually reclaimed, I think we have to consider this a
leak because I can't imagine we actually need those events to persist for the
life of a document.  They're just bloat.
Keywords: mlk, perf
Priority: -- → P2
This bug blocks bug 64696.  A recycler is useless if the objects it's trying to
recycle are leaked.
Blocks: 64696
Wow, that's a doozy.
> and wave it around like you just don't care

come on,  we could release not this...
  "all mouse movements should be made with thought and care" ;-)

joki, any ideas on how we could fix this?
In mozilla, I think we examine events in javascript for tooltips (right?) --
this would create JS objects for the events that wouldn't be collected until
the next GC.  What happens if you trigger a JS garbage collection?  Do they go
away?
buster: have you verified that these really are leaks? i.e.,

  set XPCOM_MEM_REFCOUNT_LOG=c:\tmp\refcnt.log
  run
  waggle your mouse
  shutdown
  see ns*Event objects in refcnt.log

they aren't "leaks" in the sense that eventually they are reclaimed, but they
are pure bloat.  the lifetime of an event is very short, and JS is holding onto
these events for a relatively long time.
Summary: nsDOMEvent objects leaked in mozilla, not in viewer → nsDOMEvent objects held too long in mozilla, not in viewer, causes bloat
GC is bloaty compared to ref-counting -- no news there.  This bug is really
asking for better GC scheduling, if necessary (see bug 64696).  Hard to say what
necessary is without looking at all intra-GC bloat.  Here's an idea:

- Extend the JS API with a JS_AdviseGC call that allows one to make claims to
the JS GC such as:
 - JS_ADVISE_GC_LIKELY_GARBAGE: An object is likely to be garbage now.
 - JS_ADVISE_GC_LIKELY_THRESHOLD: Run a GC after N such likely garbage pieces of
advice, or perhaps N bytes containing such likely garbage objects.
 - Additional per-object size information (private data, entrained out of line
allocations owned by the private data) for JS_ADVISE_GC_LIKELY_GARBAGE.
 - Additional per-object size threshold at which to run the GC.  This could be
combined with JS_ADVISE_GC_LIKELY_THRESHOLD so that the single threshold is a
byte count, and the engine sums intrinsic (JS engine data structure) object size
with additional (private data) size, and compares to the threshold.

- Call JS_AdviseGC(cx, JS_ADVISE_GC_LIKELY_THRESHOLD, 10*1024) or whatever bloat
limit is justified by measurement of all intra-GC bloat, when creating a DOM
script context.
- Call JS_AdviseGC(cx, JS_ADVISE_GC_LIKELY_GARBAGE, sizeof(nsDOMEvent)) after
each event handler returns.

Having sketched the above, I'm now loath to add anything to the JS API.  All of
the above could be implemented in dom or layout code, using the existing JS API.
The byte counter and byte threshold could be stored in the nsJSContext.  The
intrinsic size of a JS object can be estimated or hardwired, or we could add a
JS API entry point JS_SizeofObject.

Before any of this happens, though, it seems necessary to get a total measure of
intra-GC bloat, and to compare it to the worst case from mousing around.  The
current code will run a GC after enough mouse motion and 50% growth in nominal
live object count.

/be
Cc'ing jst for comments.

/be
s/intra-GC/inter-GC/ -- I'm tired.

Another good to optimize is garbage collected / time spent in GC.  That will
take some inside-the-JS-engine work, and I'll look into that as part of bug
66381 (just filed).  The goal is to make JS_MaybeGC fast enough to call whenever
a JS embedding believes it has created garbage (no matter how little).

I am not sure how this will pan out, and without a JS_AdviseGC hook there is no
way for the engine to size private data subtrees hanging off of JS objects.  So
don't wait for me: if the inter-GC bloat due to nsDOMEvents is too great, force
a GC from dom or layout code every N event handler dispatches.

/be
I'd like to see some numbers here too before spending any time on this,
currently the DOM code calls JS_MaybeGC() after every 20:th script evaluation
(i.e. after every 20:th or so event handler is run) so we should be calling
JS_MaybeGC() often enough IMO but it seems like the events and event handlers
don't generate enough garbage for JS_MaybeGC() to actually run the GC.
be: I appreciate your efforts here, but I really hope we don't need to pull out
such a big jackhammer to solve this iddy biddy problem.
Call me a fat lover, but I don't think two new member variables per DOM context,
and some logic to add to one and check against the other before calling JS_GC,
is bigger than "iddy biddy".  OTOH, I'm not sure that you have swallowed the
whole garbage collection pill, practice as well as theory.  GC means more bloat,
until the next GC runs (incremental of not), than an ideal system, or than a
ref-counted system.

That bloat need not be large compared to the working set, and well-tuned GC
scheduling will keep it down.  But the problem is not as small as its smallest
ref-counted analogue.  Ref-counting will always be less bloaty than GC, but it
has its own downside (leaks, cycles, code bloat, dangling dumb pointers), with
which we are well acquainted.

/be
well, I'm going to move forward with the patch in 64696 as is, because it does
help quite a bit in many situations (embedded gecko, applications other than
browser), and has almost no additional cost in the case of the browser chrome.

We can continue this thread here, or move it to a newsgroup.  My main point
remains:  in the case where we know the lifetime of an object, there is a
significant cost to having that object migrate into javascript, where the GC
memory model renders recyclers much less effective.
What would we say in the newsgroup?  Argue about who should measure the bloat
and compare it to other concurrent bloat in order to back up the claims about
this bug's severity?  I'll save the newsgroups the thread and try some
trace-malloc fu, as soon as I upgrade to a version of GCC that can handle the
latest xpcom/base/nsTypeInfo.cpp.

Clearly, recyclers are a means to an end.  The one you added saves cycles in
viewer, but cycles are not bloat, and don't even correlate to bloat in many
cases.  You have never quantified the transient bloat problem that this bug
reports, and compared it to other inter-GC or intra-page-load bloats, so I think
your use of "significant" is unjustified.

Again, if the problem of Mozilla chrome using JS to capture events, leading to
event bloat between GC runs, *is* significant (and I agree, obviously it could
be, with enough mousing around between GCs -- but how bad is it in the real
world?), then we have a GC scheduling bug, and not an excuse or pretext to
badmouth JS, or to rewrite the UI in C++.  The former is just wrong, the latter
would add a ton of C++ to the system (source brainprint and compiled footprint),
raising the bar for programmers, introducing leak and bad pointer bugs, etc. etc.

More repeating of what I wrote elsewhere: say we do a hybrid C++/JS scheme that
rewrites event listeners in JS to take only the public members of the nsDOMEvent
instance, including methods, but never to get a JS object wrapping the C++
instance.  This approach still adds otherwise unnecessary C++ to the system (a
fixed code footprint cost).  It also obfuscates the JS code, so much so that it
will probably lead to a JS object being created to hold all the related members
and methods.  Then for each nsDOMEvent, which on the upside would be recycled
after the handler returns, you'd still get a JS object that would not be
recycled until the next GC.

But here I'm guessing at exactly what you propose as a practical improvement
over the current state of affairs where JS can implement the same listener
interfaces that C++ viewer, layout, and DOM code implement and call through. 
Did you have something other than the hybrid approach, or rewrite-it-all-in-C++,
in mind?

JS and languages like it have win over C++ at the higher layers, and we'll see
more of them, including more garbage-collected languages, in Mozilla.  Their
embeddings will have bugs that need to be fixed, such as GC scheduling (if it's
not handled magically by the language's engine).  By all means, let's call a
spade a spade, but please don't try to puff up the significance of GC's downside
without any apples-to-apples quantification.  It's not as if refcounting has no
downside, in different (oranges to apples, and usually worse, judging from
Mozilla experience) ways than transient bloat.

Can we let this bug stand as the GC scheduling one?  Should joki own it?

/be
Reassigning QA Contact for all open and unverified bugs previously under Lorca's
care to Gerardo as per phone conversation this morning.
QA Contact: lorca → gerardok
Setting milestone
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9
I think the title of this bug is misleading, I get permanent threadsafety
assertions for the nsDOMEvent when leaving the viewer. As this is under Win98 it
usually indicates that an object is held past xpcom shutdown. It makes table
regression tests  a really funny exercise.
I'll continue to hold onto this bug but I don't know that we've decided on a 
course of action to do anything about it.  For the moment I'm moving it to 0.9.1 
since nothing is going to be done on it before tomorrow but we should agree what 
needs to be done.
Target Milestone: mozilla0.9 → mozilla0.9.1
Moving this to 1.0 as I don't think we have a plan of action to do anything 
about it for 0.9.1
Target Milestone: mozilla0.9.1 → mozilla1.0
Shaver may prefer to dup this against his incremental/generational GC bug.

/be
QA contact updated
QA Contact: gerardok → madhur
Blocks: 92580
No longer blocks: 92580
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1 
(you can query for this string to delete spam or retrieve the list of bugs I've 
moved)
Target Milestone: mozilla1.0 → mozilla1.0.1
QA Contact: madhur → rakeshmishra
QA Contact: rakeshmishra → trix
.
Assignee: joki → saari
Status: ASSIGNED → NEW
QA Contact: trix → ian
retargeting
Target Milestone: mozilla1.0.1 → Future
Assignee: saari → nobody
QA Contact: ian → events
Moving to p3 because no activity for at least 1 year(s).
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3
Component: Event Handling → User events and focus handling

Hi Brendan, do you believe this issue is still relevant today or it can be closed?

Flags: needinfo?(brendan)

Redirect a needinfo that is pending on an inactive user to the triage owner.
:peterv, since the bug has high severity, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(brendan) → needinfo?(peterv)

This is some ancient bug from pre-cycle-collector era.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(peterv)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.