Closed Bug 296987 Opened 15 years ago Closed 14 years ago

Increased leakage on Linux balsa Dep GTK1 (gcc 3.4)

Categories

(Toolkit :: Application Update, defect, major)

x86
Linux
defect
Not set
major

Tracking

()

RESOLVED FIXED

People

(Reporter: Peter6, Assigned: bugs)

References

Details

(Keywords: memory-leak, perf, regression, Whiteboard: [ETA ?])

Attachments

(1 file, 1 obsolete file)

Landing of bug 296868 may have cause memleak on Balsa

20050606 14:11:00 pdt , before landing
RLk:816B
Lk:278KB
MH:8.28MB
A:216K 

20050606 15:23:00 pdt , after landing
RLk:75.2KB <--
Lk:1.46MB <--
MH:8.22MB
A:215K

also minor Ts regr. on Pacifica
OS -> Linux
Severity -> Major
Requesting blocking aviary1.1
Severity: normal → major
Flags: blocking-aviary1.1?
OS: Windows 2000 → Linux
brendan: Any chance you can look into this (i'll ask because you also debugged
the leak in Bug 294893 together with some other people), graph of the leak can
be found under
http://build-graphs.mozilla.org/graph/query.cgi?tbox=balsa&testname=trace_malloc_leaks&autoscale=1&size=&units=bytes&ltype=&points=&showpoint=&avg=1&days=30
(the leak is even a bit higher than the one fixed).
Assignee: nobody → bugs
Has this been fixed by Bug 294893?
Leakage is almost back to the level noted in Comment #0.

RLk:6.84KB
Lk:346KB
MH:7.64MB
A:213K
(In reply to comment #4)
> Has this been fixed by Bug 294893?
> Leakage is almost back to the level noted in Comment #0.

No it's not been fixed.

This leak was introduced by something else (the software update stuff).
Checkins that might have fixed this leak (also there is still something left):
Bug 299302 and Bug 85207, Bug 299302 is more likely so (i skiped the other
checkins in the timeframe since those didn't fix the leak to 99,9%).
The reason this is leaking is that the timer manager creates a reference cycle:

function TimerManager() {
  const nsITimer = Components.interfaces.nsITimer;
  this._timer = Components.classes["@mozilla.org/timer;1"]
                          .createInstance(nsITimer);
  var timerInterval = getPref("getIntPref", PREF_APP_UPDATE_TIMER, 5000);
  this._timer.initWithCallback(this, timerInterval, 
                               nsITimer.TYPE_REPEATING_SLACK);
}

Now the TimerManager JS object keeps alive the XPCWrappedNative for the nsITimer
object, which keeps alive the C++ timer, which keeps alive the nsXPCWrappedJS
that reflects the TimerManager into C++, which make the TimerManager JS object a
GC root so the whole thing can't go away.

If the TimerManager is a service, this code could use a third object
implementing nsITimerCallback and getting the manager via getService as needed;
that would break the cycle...

That all said, there is other toolkit code that does similar things -- both
toolkit/mozapps/extensions/content/update.xml and
toolkit/content/widgets/preferences.xml are XBL bindings that keep a timer in a
member variable and pass the binding itself as the nsITimerCallback; those
bindings, together with whatever other resources they hold on to would also
leak, but that requires opening up the update or preferences UI, which balsa
doesn't do.
One other note.  Fixing the timer cycle doesn't fix all the leaks -- the
UpdateService is somehow leaked via the TimerManager, probably because the last
GC we do GCs the TimerManager and just drops the nsXPCWrappedJS to the
UpdateService; the UpdateService doesn't get GCed at that point yet.  Ideally,
the TimerManager would observe shutdown and drop its pointers to timer callbacks
when it happens.
(In reply to comment #8)
> Ideally,
> the TimerManager would observe shutdown and drop its pointers to timer
> callbacks when it happens.

Would that make real leaks not show up in the leak logs?
Not sure... At the moment, what's happening is that TimerManager holds a JS ref
to the UpdateService.  Both are services implemented in JS.  At shutdown, we
shut down the service manager, then we're able to GC the TimerManager.  But it
was holding a ref to an XPCWrappedNative for an nsXPCWrappedJS for the
UpdateService.  So UpdateService was a GC root for this GC cycle, and didn't get
GC'ed.  At least as far as shaver and I can tell this is what's happening, since
the TimerManager itself is not listed as being leaked.

Oddly, the nsXPCWrappedJS for the UpdateService _is_ leaked according to the
refcount log...
If a finalizer or GC callback removes a root, rt->gcLevel will be bumped and the
GC will restart.  It sounds like some XPConnect root created using GC callbacks
is being removed or cleared without rt->gcLevel being bumped.  bz, does that
make sense at all?

/be
So comment 8 after the "--" part is wrong.  The reason the UpdateService leaks
is the following:

We leak an nsXPCWrappedJS.  The ptr to this is 0x082CD890.  The log output from
my build says:

  Created wrapped native [xpconnect wrapped nsITimerCallback @ 0x827df40
  (native @ 0x82cd890)], flat JSObject is 0x82d9dc8

This is the nsXPCWrappedJS getting wrapped in an XPCWrappedNative to go back out
to JS; as long as the JSObject at 0x82d9dc8 is alive, this nsXPCWrappedJS object
will stay alive.  The JSObject is kept alive via the following GC path:

082d9dc8 object 0x827df40 XPCWrappedNative_NoHelper via
nsXPCWrappedJS::mJSObj[nsIObserver,0x82cd890,0x82d9a30](Object @
0x082d9a30).isDownloading getter(Function @ 0x082861c0).__proto__(Function @
0x08285778).__proto__(Object @ 0x082857a8).__parent__(BackstagePass @
0x08285770).TimerManager(Function @ 0x082d9428).prototype(Object @
0x082d9890)._timers(Object @ 0x082d9898).background-update-timer(Object @
0x082da1e0).callback(XPCWrappedNative_NoHelper @ 0x082d9dc8).

So we do in fact have a cycle here: TimerManager.prototype -> _timer -> callback
-> XPCWrappedNative JSObject -> XPCWrappedNative -> nsXPCWrappedJS ->
UpdateService jsobject -> BackstagePass (scope object?) -> TimerManager ->
TimerManager.prototype.
I filed bug 300023 on the issue starting with comment 8 so that this bug can
stay focused on the timer cycle.
Flags: blocking-aviary1.1? → blocking1.8b4+
TimerManager not only holds the timer, but a whole bucket o' objs impl'ing
nsITimerCallback (including the UpdateService)
Comment on attachment 189707 [details] [diff] [review]
remove the xpcom-shutdown listener, too

>+      gOS.removeObserver(this, "xpcom-shutdown");

Removing the observer seems a little unusual; are you sure the observer service
handles removal during notification correctly?	(I'd hope so, but...)
Whiteboard: SWAG: 2d
It's safe to remove an observer during the notification (from Darin).

This patch has been checked in... it looks like Lk on Balsa went down around 7/9
and went up again on 7/23... not sure why on 7/23, that's probably unrelated. 
I don't see this bug# in the CVS log... when was this patch checked in?  Note
that the Rlk number on balsa as of today is still about 8 times higher than it
was before the bug 296868 landing (3556 vs 816)....  But perhaps someone else
checked something leaky in in the meantime, and we just didn't notice.
OK, I bet the remaining Rlk is bug 300833
Depends on: 300833
(In reply to comment #19)
> OK, I bet the remaining Rlk is bug 300833

Also see bug 297312 comment 30.

I checked the tinderbox logs of the build that showed the increased memory
usage, and I did not see a NEW-LEAKS table. (bsmedberg also said on IRC that he
doesn't see any potential code that could leak)

Although a few days later, these 2 checkins:
http://tinderbox.mozilla.org/bonsai/cvsquery.cgi?module=PhoenixTinderbox&date=explicit&mindate=1122300300&maxdate=1122304319

Showed this in the log:
 --NEW-LEAKS-----------------------------------leaks------leaks%-----------------------
nsVoidArray                                      24      50.00%
bug 297312 caused a bug during startup and bug 302206 fixed that bug... I
suspect bug 297312 for the increase in leakage on the day it was checked in and
bug 302206 for the corresponding decrease when it was checked in. It appears
there may have been a slight increase as well between the two.
Blocks: 290390
Whiteboard: SWAG: 2d → [ETA ?]
Flags: blocking1.8b5+ → blocking1.8b5-
This got fixed.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: Firefox → Toolkit
You need to log in before you can comment on or make changes to this bug.