Closed Bug 191195 Opened 22 years ago Closed 22 years ago

trace malloc leaks up 50% due to checkin on 22nd Jan

Categories

(SeaMonkey :: General, defect)

Platform: x86 Linux
Importance: Not set (priority), critical (severity)

Tracking

(Not tracked)

Status: RESOLVED FIXED
Target Milestone: mozilla1.3final

People

(Reporter: gerv, Assigned: ccarlen)

Details

(Keywords: memory-leak, perf, topembed+, Whiteboard: edt_b3)

Attachments

(1 file)

[I don't know where this bug should live; hopefully someone on the CC list will be able to place it.]

http://tegu.mozilla.org/graph/query.cgi?tbox=backupboy&testname=trace_malloc_leaks&autoscale=1&size=&units=bytes&ltype=&points=&showpoint=&avg=1&days=31

This graph has a large upward jump in it. That's probably not good. Tinderbox only keeps data for a week, so I can't see which checkin it was, but looking at the graph, it seems to be between 9pm on the 21st and midnight:

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=01%2F21%2F2003+21%3A00&maxdate=01%2F22%2F2003+00%3A00&cvsroot=%2Fcvsroot

The obvious culprit (to my untrained eye) is: "bug 186703 - fix moz-ns installers to build and run with GRE. got verbal r=sgehani,sr=dveditz" by ssu. CCing him.

Gerv
Keywords: mlk
Keywords: perf
Over to ssu to take a look.
Assignee: asa → ssu
Interesting that my checkin was for win32 only, and even then it only touched the native win32 installer code, its perl build scripts, and packages file lists, not browser code. Adding more people to the cc: who might know something.
It looks to me like ssu's stuff landed well after this jump... That checkin was at 01/21/2003 23:59, while the leak data points are:

23096 2003:01:21:20:36:40 660873 -- 209.189.198.101 backupboy
23097 2003:01:21:21:24:12 660745 -- 209.189.198.101 backupboy
23098 2003:01:21:22:18:53 833315 -- 209.189.198.101 backupboy
23099 2003:01:21:23:17:45 902262 -- 209.189.198.101 backupboy
23100 2003:01:22:00:14:07 900565 -- 209.189.198.101 backupboy

So the jump seems to be rather earlier than ssu's checkin... Patches in that time period that could be guilty, imo:

bug 179798, bug 97622 for the first jump (660k -> 833k)
bug 180384 for the second jump (833k -> 900k)

Not sure what to think of the fact that those are two distinct jumps...
Going over the checkin for bug 179798 by eye, I can't see any problems there... but without knowing what specific tests backupboy is running, or installing valgrind, I can't be sure. Might give valgrind a try... Does anyone else have valgrind handy (ccarlen, timeless?) to make sure it's not the checkins to bug 97622 or bug 180384?
valgrind? No. There is a doc for leak tests here, though: http://mozilla.org/performance/leak-brownbag.html#tbox

Dan, I doubt it's your checkin. One thing the checkins for bug 97622 and bug 180384 have in common: they both register objects (one time only, since they're both services) with the observer service for the lifetime of the app.

This bug shouldn't belong to ssu. It looks to be either me or timeless.
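For context, here is a minimal sketch of the registration pattern those two checkins share. It is illustrative only: the service object and the topic below are assumptions, not taken from the actual patches for bug 97622 or bug 180384.

  // Hypothetical example: a singleton service registers itself with the
  // observer service once, so the observer service holds a reference to it
  // for the lifetime of the app.
  var observerService = Components.classes["@mozilla.org/observer-service;1"]
                                  .getService(Components.interfaces.nsIObserverService);

  // "xpcom-shutdown" is only an example topic; the real patches may differ.
  // The third argument (false) requests a strong (non-weak) reference.
  observerService.addObserver(mySingletonService, "xpcom-shutdown", false);

Because the reference is strong and never removed, the registered object (and anything it keeps alive) stays reachable until the observer service itself goes away at shutdown.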
Assignee: ssu → ccarlen
Target Milestone: --- → mozilla1.3final
It looks like we're leaking the hidden window and everything it holds onto, perhaps due to JS rooting issues with XPCWrappedNatives. Specifically, if I make the closeAll method in nsCloseAllWindows.js just return true, I don't see the leak. If, however, I have it get the window mediator service, and nothing else, the leak returns. I'm not familiar enough with this code to know what's happening... maybe we're ill-prepared for instantiating JS components at this point in the shutdown process?
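A rough sketch of the experiment described above, assuming a stand-in for the closeAll method in nsCloseAllWindows.js (the real method's signature and the surrounding component boilerplate are omitted):

  // Hypothetical stand-in for closeAll in nsCloseAllWindows.js.
  function closeAll(aAskUser) {
    // Variant 1: bail out immediately -- the shutdown leak goes away.
    // return true;

    // Variant 2: only acquire the window mediator service, nothing else --
    // the leak returns, pointing at the JS component / XPCWrappedNative
    // machinery rather than the windows themselves.
    var windowMediator = Components.classes["@mozilla.org/appshell/window-mediator;1"]
                                   .getService(Components.interfaces.nsIWindowMediator);
    return true;
  }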
Brian, I assume the getService is not just throwing an exception?
I don't see any exceptions printed to stdout.
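One way to double-check (a hypothetical snippet, not taken from the actual component) is to wrap the getService call and dump any exception explicitly rather than relying on it reaching stdout on its own:

  try {
    var windowMediator = Components.classes["@mozilla.org/appshell/window-mediator;1"]
                                   .getService(Components.interfaces.nsIWindowMediator);
  } catch (e) {
    // dump() writes to stdout in builds where console output is enabled.
    dump("getService(nsIWindowMediator) threw: " + e + "\n");
  }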
Can someone try this: set GRE_BUILD=0 in xpfe/bootstrap/Makefile.in and do a clobber rebuild of xpfe/bootstrap.
dougt: that doesn't seem to have any impact.
I'm not seeing a leak of the mediator service. It gets properly destructed on shutdown, and I verified I was going through closeAll. This was using a trunk build from 3/3.
If this is a big regression in 1.3 over 1.2.1, maybe we should hold 1.3 for a day and make a concerted effort to fix it? We should definitely fix for 1.4a. /be
Flags: blocking1.4a?
Flags: blocking1.3?
Keywords: topembed+
dbradley: I didn't say the mediator service was leaking. However, I believe there may be an XPCWrappedNative associated with it that is leaking somehow.
brendan: While this badly skews our leak tracking (and is therefore important to fix), it would have minimal end-user impact for 1.3, as the leak occurs at shutdown. (though perhaps you could hit this code path multiple times if you're using turbo mode? I'm not sure.)
From the "Show the raw data for this plot", I can see the increase this bug is about: 23096 2003:01:21:20:36:40 660873 -- 209.189.198.101 backupboy 23097 2003:01:21:21:24:12 660745 -- 209.189.198.101 backupboy 23098 2003:01:21:22:18:53 833315 -- 209.189.198.101 backupboy 23099 2003:01:21:23:17:45 902262 -- 209.189.198.101 backupboy (looks like two increases, the first roughly between 20:36 and 21:24, and the second between 21:24 and 22:18, both on 2003-01-21) and a more recent drop: 23323 2003:02:03:21:20:12 902317 -- 209.189.198.101 backupboy 23324 2003:02:03:22:42:43 902317 -- 209.189.198.101 backupboy 23325 2003:02:04:00:06:08 742456 -- 209.189.198.101 backupboy (roughly between 21:20 and 22:42 on 2003-02-03) (I wish these logs stored the pull times instead of the report times.)
(Note, also, that the log where the increase is has a lot of info about the allocation stacks of the new objects leaked, so it's good to paste that information in when it's available. That log is long gone now, though.)
Cc'ing mcafee on account of dbaron's comment #16. Can we record cvs pull times? /be
Flags: blocking1.3? → blocking1.3-
Whiteboard: edt_a3
Whiteboard: edt_a3 → edt_b3
I heard that brendan & bryner had found the cause of this. If so, thanks, and can one of you put up a patch or take the bug?
ccarlen, brendan and dbaron think this is fixed. Can you confirm that things are better?
Looking at the graph, the original problem is solved, but there has been a recent jump just today. It's unclear yet whether it's come back down again. Gerv
This is fixed. Gerv
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Flags: blocking1.4a? → blocking1.4a-
Product: Browser → Seamonkey