Closed Bug 28413 Opened 20 years ago Closed 20 years ago

bloat stats leak total incorrect

Categories

(Core :: XPCOM, defect, P3, blocker)

x86
Linux
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: dbaron, Assigned: warrensomebody)

Details

(Keywords: verifyme)

Attachments

(1 file)

In the bloats stats totals on tinderbox (and, I assume, elsewhere), the TOTAL
for bytes leaked listed at the top is not the total of the numbers of bytes
listed below it.  This is most painfully obvious for some of the builds that are
showing 133K of leaks lately, where there are 145K of StyleContextImpl leaks. 
However, the TOTAL is definitely wrong in other lists as well.
I just pulled some bloat/leak numbers off of tinderbox and added them up in 
excel and sure enough, they're wrong again. Excel showed the Leaked column 
adding up to 116216 bytes, whereas the total reported was 89324. When I added 
things up before this way, these totals used to be the same, so I'm not sure 
what happened. Looking at the code in nsTraceRefcnt.cpp, it's not obvious.

However, I have no idea where you get this number that 145k StyleContextImpls 
are being leaked. I only see 64148 in my stats.
The bloat stats on Tinderbox have been oscillating wildly lately, and when the
"total" was 133K on one of the builds this morning (that was the highest it
hit), there were 145K of DeviceContextImpl leaks.  It was one of the autobahn
builds - I think the ones around 7AM, although there was one at 10AM with about
130K of leaks.
Severity: normal → blocker
Could this be something along the lines of multiplying the number of objects
leaked by the average size of the objects leaked?  I ask because
StyleContextImpl's, which are very large, seem to be undercounted a bit in the
leak totals.  I think the StyleContextImpl leak was introduced on Feb 10 (see
bug 28555), and this bug may have masked its appearance.

Raising severity to blocker because I think this blocks effective leak
monitoring significantly.
Hmmm... was revision 1.46 of nsTraceRefcnt.cpp:
http://bonsai.mozilla.org/cvslog.cgi?file=mozilla/xpcom/base/nsTraceRefcnt.cpp
an attempt to correct the bloat stats?  Could it have fixed the bloat stats and
broken the leak stats?  I might try undoing it to see what happens...
I think it's impossible to get both the bloat stats and the leak stats to have
the correct totals by manipulating the mClassSize of the total, which seems to
be what you're trying to do.  I think you need to just add things up (and
perhaps use a different structure for the TOTAL).
Does the patch I just attached look good?  (It's a little bit bloaty itself,
since it uses an extra 32 bit int for each class...)  Since the bloat totals are
calculated using the class size listed, I kept the class size the same and
computed the leaks separately.

However, I'm not sure the bloat totals are right, either.  It could be a
spreadsheet error, though :-).    I'll investigate further...

If you review this I could check it in when the tree opens again (or before, if
you think it would be good to get approval and do it), although I think it would
be good to check it in at a time when there's not much other activity in the
tree, and you might be in a better position to do that...
Never mind about that comment on the bloat stats - I was looking at a
spreadsheet that only listed leaks!

However, it might not be a bad idea to print the total Bloat in the classSize
field for total, because right now the bloat statistics are very insensitive to
bloat changes caused by changes in class size (due to the rounding).  That
change would involve:
 * commenting out the one line in DumpTotal()
 * changing Tinderbox (or whatever produces the diffs) so it doesn't multiply.
Should I file a separate bug for making the bloat stats more accurate?  It would
require the change I mentioned above and a check for $name=="TOTAL" at
http://lxr.mozilla.org/mozilla/source/tools/tinderbox/bloatdiff.pl#23 .
dbaron: I'm just catching up on this... I don't understand why you said "I think 
it's impossible to get both the bloat stats and the leak stats to have the 
correct totals by manipulating the mClassSize of the total." All I was trying to 
do here was use the mClassSize field of the record for TOTAL to hold the average  
instance size. That only gets printed out in the leak report, and isn't used in 
any other way that I can remember. 

The real problem here is that the bloat column (total count allocated for a 
class times instance size for that class) isn't getting summed up properly.
Tinderbox is making the bloat stats (see the link above to the tinderbox source)
by multiplying the average size listed for TOTAL by the object count.  A month
or two ago, you corrected the average size so that calculation is correct. 
However, internally you are computing the leak stats the same way, but the
change in the computation of the average size messed those up.

However, because of the rounding of the average size to the nearest integer,
tinderbox bloat stats are very insensitive to changes in object size, rather
than object count.  That's why I'm suggesting printing the total bloat for total
rather than the average size (which isn't all that meaningful).
But all of this has nothing to do with what I discovered -- that the total bytes 
leaked column when added up isn't correct. 

I agree that the average instance size is truncated and will end up reporting a 
smaller value for the bloat stat than is the actual value, but this amount will 
be less than the count of remaining objects which is only around 3081 right now 
(i.e. an average object size off by at most one byte times 3081 objects = 3081 
bytes).

What I'm seeing is that if I sum up the leaked bytes column in the latest bloat 
view log, I get 209876, but the number reported is 144115. Also, if I sum up the 
objects times their instance size (which comes to 24895500) and divide by total 
number of objects I get an average size of 46.77556634. Multiplying this by the 
number remaining (3081) gives me 144115.5199 which *is* close to the number 
reported. I don't quite see what's going wrong here.
Now that I think about this, I guess it's not valid to compute the total bytes 
leaked by multiplying the average object size by the number of objects leaked 
because the average object size is not the average for the leaked classes.

I guess we should add an extra field to the struct as you suggest. However I 
want to change your diffs a little. I'll submit another patch.

Sorry, when you said "tinderbox" was doing something, I thought you meant 
bloatdiff.pl, not nsTraceRefcnt.cpp.
Ok, I've decided that your patch looks good after all. I was worried about the 
ability to view deltas with DumpNewEntries, but I think everything's ok. Go 
ahead and check it in (or let me know if you need help with getting approval, 
etc.) Thanks for spotting and fixing this.
I did mean tinderbox, at least at some points.  I'll open up a new bug on that
issue once the original problem here is closed, and there I'll explain more
clearly what I think is wrong.
Target Milestone: M15
OK... I checked in this fix, and I'll file a separate bug on the other issues
later.
I split off the other issues into bug 28804, and I'm marking this bug fixed.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Summary: bloat stats total incorrect → bloat stats leak total incorrect
Adding verifyme keyword.
Keywords: verifyme
Marking this bug Verified due to split.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.