Closed Bug 28413 Opened 25 years ago Closed 25 years ago

bloat stats leak total incorrect

Categories

(Core :: XPCOM, defect, P3)

x86
Linux
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: dbaron, Assigned: warrensomebody)

Details

(Keywords: verifyme)

Attachments

(1 file)

In the bloats stats totals on tinderbox (and, I assume, elsewhere), the TOTAL for bytes leaked listed at the top is not the total of the numbers of bytes listed below it. This is most painfully obvious for some of the builds that are showing 133K of leaks lately, where there are 145K of StyleContextImpl leaks. However, the TOTAL is definitely wrong in other lists as well.
I just pulled some bloat/leak numbers off of tinderbox and added them up in excel and sure enough, they're wrong again. Excel showed the Leaked column adding up to 116216 bytes, whereas the total reported was 89324. When I added things up before this way, these totals used to be the same, so I'm not sure what happened. Looking at the code in nsTraceRefcnt.cpp, it's not obvious. However, I have no idea where you get this number that 145k StyleContextImpls are being leaked. I only see 64148 in my stats.
The bloat stats on Tinderbox have been oscillating wildly lately, and when the "total" was 133K on one of the builds this morning (that was the highest it hit), there were 145K of DeviceContextImpl leaks. It was one of the autobahn builds - I think the ones around 7AM, although there was one at 10AM with about 130K of leaks.
Severity: normal → blocker
Could this be something along the lines of multiplying the number of objects leaked by the average size of the objects leaked? I ask because StyleContextImpl's, which are very large, seem to be undercounted a bit in the leak totals. I think the StyleContextImpl leak was introduced on Feb 10 (see bug 28555), and this bug may have masked its appearance. Raising severity to blocker because I think this blocks effective leak monitoring significantly.
Hmmm... was revision 1.46 of nsTraceRefcnt.cpp: http://bonsai.mozilla.org/cvslog.cgi?file=mozilla/xpcom/base/nsTraceRefcnt.cpp an attempt to correct the bloat stats? Could it have fixed the bloat stats and broken the leak stats? I might try undoing it to see what happens...
I think it's impossible to get both the bloat stats and the leak stats to have the correct totals by manipulating the mClassSize of the total, which seems to be what you're trying to do. I think you need to just add things up (and perhaps use a different structure for the TOTAL).
Does the patch I just attached look good? (It's a little bit bloaty itself, since it uses an extra 32 bit int for each class...) Since the bloat totals are calculated using the class size listed, I kept the class size the same and computed the leaks separately. However, I'm not sure the bloat totals are right, either. It could be a spreadsheet error, though :-). I'll investigate further... If you review this I could check it in when the tree opens again (or before, if you think it would be good to get approval and do it), although I think it would be good to check it in at a time when there's not much other activity in the tree, and you might be in a better position to do that...
Never mind about that comment on the bloat stats - I was looking at a spreadsheet that only listed leaks! However, it might not be a bad idea to print the total Bloat in the classSize field for total, because right now the bloat statistics are very insensitive to bloat changes caused by changes in class size (due to the rounding). That change would involve: * commenting out the one line in DumpTotal() * changing Tinderbox (or whatever produces the diffs) so it doesn't multiply.
Should I file a separate bug for making the bloat stats more accurate? It would require the change I mentioned above and a check for $name=="TOTAL" at http://lxr.mozilla.org/mozilla/source/tools/tinderbox/bloatdiff.pl#23 .
dbaron: I'm just catching up on this... I don't understand why you said "I think it's impossible to get both the bloat stats and the leak stats to have the correct totals by manipulating the mClassSize of the total." All I was trying to do here was use the mClassSize field of the record for TOTAL to hold the average instance size. That only gets printed out in the leak report, and isn't used in any other way that I can remember. The real problem here is that the bloat column (total count allocated for a class times instance size for that class) isn't getting summed up properly.
Tinderbox is making the bloat stats (see the link above to the tinderbox source) by multiplying the average size listed for TOTAL by the object count. A month or two ago, you corrected the average size so that calculation is correct. However, internally you are computing the leak stats the same way, but the change in the computation of the average size messed those up. However, because of the rounding of the average size to the nearest integer, tinderbox bloat stats are very insensitive to changes in object size, rather than object count. That's why I'm suggesting printing the total bloat for total rather than the average size (which isn't all that meaningful).
But all of this has nothing to do with what I discovered -- that the total bytes leaked column when added up isn't correct. I agree that the average instance size is truncated and will end up reporting a smaller value for the bloat stat than is the actual value, but this amount will be less than the count of remaining objects which is only around 3081 right now (i.e. an average object size off by at most one byte times 3081 objects = 3081 bytes). What I'm seeing is that if I sum up the leaked bytes column in the latest bloat view log, I get 209876, but the number reported is 144115. Also, if I sum up the objects times their instance size (which comes to 24895500) and divide by total number of objects I get an average size of 46.77556634. Multiplying this by the number remaining (3081) gives me 144115.5199 which *is* close to the number reported. I don't quite see what's going wrong here.
Now that I think about this, I guess it's not valid to compute the total bytes leaked by multiplying the average object size by the number of objects leaked because the average object size is not the average for the leaked classes. I guess we should add an extra field to the struct as you suggest. However I want to change your diffs a little. I'll submit another patch. Sorry, when you said "tinderbox" was doing something, I thought you meant bloatdiff.pl, not nsTraceRefcnt.cpp.
Ok, I've decided that your patch looks good after all. I was worried about the ability to view deltas with DumpNewEntries, but I think everything's ok. Go ahead and check it in (or let me know if you need help with getting approval, etc.) Thanks for spotting and fixing this.
I did mean tinderbox, at least at some points. I'll open up a new bug on that issue once the original problem here is closed, and there I'll explain more clearly what I think is wrong.
Target Milestone: M15
OK... I checked in this fix, and I'll file a separate bug on the other issues later.
I split off the other issues into bug 28804, and I'm marking this bug fixed.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Summary: bloat stats total incorrect → bloat stats leak total incorrect
Adding verifyme keyword.
Keywords: verifyme
Marking this bug Verified due to split.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: