The default bug view has changed. See this FAQ.

Find classes that are wasting memory due to alignment

NEW
Unassigned

Status

()

Core
Rewriting and Analysis
8 years ago
3 years ago

People

(Reporter: Swatinem, Unassigned)

Tracking

(Blocks: 2 bugs, {footprint})

Trunk
footprint
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink:P3])

Attachments

(1 attachment, 10 obsolete attachments)

(Reporter)

Description

8 years ago
Created attachment 376545 [details]
dehydra script

I have been hacking up a dehydra script that is calculating a classes "real" sizeof and comparing it to the final sizeof to determine if that class is wasting memory due to alignment.
The script works quite well as far as I can tell.
(Reporter)

Comment 1

8 years ago
Created attachment 376546 [details]
List of badly aligned classes (ods)

This is a filtered list of badly aligned classes. I will skim through it and try to fix some of the classes.

Comment 2

8 years ago
This awesome stuff. Who would've thought that getting this number is as simple as subtracting sizes of members from the struct size :) Whenever I thought about this issue I forgot that dehydra provides size_of.

/**
 * Size of a pointer, assume as 8, TODO: how to get the *real* pointer size?
 */
var pointerSize = 8;

I presume this is 64bit

	if(type.isPointer)
	{
		size.real = pointerSize;
I think you can do size.real = type.precision/8

For lazy bums like me, csv files are easier to open.

As I said before, awesome!

David, I believe you wanted to see some static analysis results on accessibility, this spreadsheet looks like a goldmine.
Blocks: 430328
(Reporter)

Comment 3

8 years ago
>         size.real = pointerSize;
> I think you can do size.real = type.precision/8

Yes indeed. I use pointerSize mainly for adding the size of the vtable pointer to classes.
Will see what else I can do there.

I wonder how much freedom I have in reordering members. I suppose I can't change anything in NSPR/NSS/frozen linkage due to ABI compatibility. Other classes should be fine though?

A quick look over some classes also revealed more optimization opportunities by changing away from PRBool (4byte) to PRPackedBool (1byte) or even straight to bool, as that is now used in some places in the codebase already.

Comment 4

8 years ago
> I wonder how much freedom I have in reordering members. I suppose I can't
> change anything in NSPR/NSS/frozen linkage due to ABI compatibility. Other
> classes should be fine though?

Think so.

> 
> A quick look over some classes also revealed more optimization opportunities by
> changing away from PRBool (4byte) to PRPackedBool (1byte) or even straight to
> bool, as that is now used in some places in the codebase already.

PRPackedBool is probably easier since you don't have to add casts everywhere to switch to bool. I'm also not sure if all modules allow bool usage.

While you are at it could you do a check similar to the one mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=376434

Basically scan for structs that have both signed and unsigned bitfields and warn because MSVC packs them wrong.
(In reply to comment #2)
> David, I believe you wanted to see some static analysis results on
> accessibility, this spreadsheet looks like a goldmine.

Wow, crazy! Great work. What happens next?

Comment 6

8 years ago
(In reply to comment #5)
> (In reply to comment #2)
> > David, I believe you wanted to see some static analysis results on
> > accessibility, this spreadsheet looks like a goldmine.
> 
> Wow, crazy! Great work. What happens next?

Would be nice if you could go through accessibility ones to verify the results are correct and maybe pick a couple that are frequently allocated as candidates to fixing.
Arpad: excellent work -- any clues from your tool how JSRuntime loses 92 bytes to internal fragmentation?

/be
(Reporter)

Comment 8

8 years ago
(In reply to comment #7)
> Arpad: excellent work -- any clues from your tool how JSRuntime loses 92 bytes
> to internal fragmentation?
> 
> /be


My tool currently doesn't print a list of members, but that would be easy to add.
Looking at the code, I can see some possible problems:

>    uint32              gcMallocBytes;
>    JSGCArenaInfo       *gcUntracedArenaStackTop;

>    JSBool              rngInitialized;
>    int64               rngMultiplier;

>    uint32              requestCount;
>    JSThread            *gcThread;

And quite a few more... I'm on x86_64, and every time a 4byte member is followed by a 8byte member (or pointer) its wasting 4 bytes.
(Reporter)

Comment 9

8 years ago
Started a thread on dev.platform: http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/52bc09157428dadb#
(Reporter)

Updated

8 years ago
Depends on: 492257
(Reporter)

Comment 10

8 years ago
Created attachment 376620 [details]
dehydra script, v2
Attachment #376545 - Attachment is obsolete: true

Comment 11

8 years ago
On 32-bit x86, misaligned doubles are allowed but slow down the program considerably.  It might be useful to run this script with pointerSize=4 to find those as well.
(Reporter)

Comment 12

8 years ago
Created attachment 376682 [details]
list of badly aligned classes (csv)

This list includes a possible size if using C++ bools (which are 1byte as are PRPackedBools) and a list of direct members with their sizes (does not include inherited members from parent classes).
Attachment #376546 - Attachment is obsolete: true
(Reporter)

Comment 13

8 years ago
Created attachment 376683 [details]
dehydra script, v3
Attachment #376620 - Attachment is obsolete: true
(Reporter)

Comment 14

8 years ago
Created attachment 376728 [details]
list of badly aligned classes (csv)

The script had some issues with inheritance and vtables, probably still has, I'm not quite sure.
Attachment #376682 - Attachment is obsolete: true
(Reporter)

Updated

8 years ago
Attachment #376728 - Attachment mime type: application/octet-stream → text/plain
(Reporter)

Updated

8 years ago
Blocks: 492376
You might find this document helpful, if you haven't seen it already:

http://www.codesourcery.com/public/cxx-abi/abi.html#layout

It lays out things like the rules for where the vtable pointer appears and under what circumstances a derived class's members can be packed into the base class's tail padding.  (Ignore mention of Itanium; GCC uses this ABI on all architectures nowadays, modulo things like pointer size and minimum required alignment.)
(Reporter)

Comment 16

8 years ago
Created attachment 376742 [details]
dehydra script

Still has problems with inheritance and vtable pointers though :(
Attachment #376683 - Attachment is obsolete: true
Quick comments on looking at the list:

1)  A number of these are services or one-per-document objects.  Savings from
    those will be small no matter what.  It might not be worth making the
    changes if it decreases readability by separating related members.
2)  You might want to get an NSS person to look at those SEC* things near the
    top of the list (as well as the other ones on it).
3)  A bunch of per-element accessibility stuff here that could really use
    fixing, if the numbers are right.
4)  The frame and element classes in that list should be a high priority.
5)  I'm not sure where some of these differences are coming from.  For example,
    The one for nsHTMLFormElement seems way too big; I would have expected
    about 2 bytes of difference there.  The class does have 5 vtable pointers
    in addition to the ones inherited from the superclass; is that where the
    other 40 bytes come from?
(Reporter)

Updated

8 years ago
Blocks: 492400
(Reporter)

Updated

8 years ago
Blocks: 492510
(Reporter)

Updated

8 years ago
Blocks: 492520
(Reporter)

Updated

8 years ago
Blocks: 492531
Minor output nit:  can you include a percentage, or better yet just stick the results in a google spreadsheet and link to them?

Tying these results to trace-refcnt output from a Tp run would also be interesting, as far as looking at the maximum number of bytes saved during runtime.
(Reporter)

Comment 19

8 years ago
This time I tried a different approach. Instead of taking the compiled sizeof() and messing around with vtable pointers, I rather look at the members and see how many GAPs are between them. I then calculate how much one can win by reordering the members or using 1byte bools.

This time, there aren't that many wins to be made. More wins could be made by cutting down the inheritance tree, consolidating Interfaces and remove unnecessary ones.

Here is the spreadsheet: http://spreadsheets.google.com/ccc?key=rVIrgklQt9Y2H1hnS87L1HA
(Reporter)

Updated

8 years ago
Attachment #376728 - Attachment is obsolete: true
The desire for a cheap, always-on memory profiler that puts allocations in coarse, non-hierarchical buckets (JS, DOM, HTML, XUL, CSS, ...) came up again, Ben Galbraith is leading the charge.

This goal reminded me of SGI's Power Bloat Vision(tm) tool, a graphical line-chart or (default UI) bar with color-coded segments jumping around as different kernel identified resource users of memory changed their shares. PBV really helped users zero in on bloat in their apps, as well as in Motif/X, the SGI graphical desktop, and Irix ;-).

Thinking about how to do it super-cheaply, I thought of this bug. We don't want to add malloc interception by default, but we use jemalloc on Windows and Linux, and may on Mac. We can hack jemalloc. Problem is we don't want to change malloc, new, etc. signatures to carry "bucket ids", nor do we wish to walk the stack on every allocation, even by one frame.

But suppose each bucket (subsystem) happened to allocate unique sizes? Then we could statically analyze the sizes and produce a mapping to burn into the build.

Ok, I'm sure different buckets have their own structs or classes that happen to have the same size. Still, we could find such types with static analysis, and even generate a patch to farble the size of one or the other to differ.

If this wastes space, we could do something trickier, like farble the size passed to malloc in a way that did not waste space (e.g., subtract from 0 to 3 to help distinguish 4 buckets' uses of the same size).

What say you static analysis folks? We need your northern-style kung fu!

/be
I will file a new bug if needed, I didn't mean to hijack this one -- sorry. Want to get a quick sanity check, figured this bug could use some action ;-).

/be
(Reporter)

Updated

8 years ago
Depends on: 509380
(Reporter)

Comment 22

7 years ago
Created attachment 466989 [details]
dehydra script
Attachment #376742 - Attachment is obsolete: true
(Reporter)

Comment 23

7 years ago
Created attachment 466990 [details]
Vava prog for parsing output into cvs

I have been running this again after a long time, there are quite a few offenders left, see the bug url field.
Assuming we fixed everything and every structure was tightly packed, what's the total net savings?  (Both absolute and %)

Comment 25

7 years ago
(In reply to comment #24)
> Assuming we fixed everything and every structure was tightly packed, what's the
> total net savings?  (Both absolute and %)

Savings would require instrumenting the build to figure out how often these datastructures are used.
Can we get the list sorted by win size or win%? I suspect that most of these classes are very infrequently used, but a few seem like they could be big wins. Some suggestions, if you want to do tryserver runs on a limited set:

nsHtml5StackNode (not actually stack-allocated, despite its name)
nsStyleVisibility
mozilla::dom::Link (there can be thousands of these per page, might improve Tp times)
nsHtml5TreeOperation
nsHtml5ElementName
nsHTMLElement (!)

For all of the IPDL-generated classes (P*), you should probably just file a bug on the code generator to emit the members in a better order.

It also looks like you're listing typedefs of classes, which adds a lot of duplicate data.
(Reporter)

Comment 27

7 years ago
(In reply to comment #26)
> Can we get the list sorted by win size or win%?

"usingbool" means wins if PRBools are changed to PRPackedBools (or C++ bools) and is thus a better sorting column, especially for bool heavy Classes like nsHtml5StackNode. In extreme cases, using a bitfield may yield even better wins. What is the policy regarding such changes?

> It also looks like you're listing typedefs of classes, which adds a lot of
> duplicate data.

Yes, the script is far from being perfect :(
We're not going to switch to real bool until after FF4. Can we fix the easy wins I mentioned now?

Comment 29

7 years ago
Just FYI, Julian has a decent list of distributions of allocated objects from bug 551477. I'm gonna try cross-referencing that with this.
(Reporter)

Comment 30

7 years ago
Created attachment 467717 [details] [diff] [review]
fix classes wasting memory

(In reply to comment #28)
> We're not going to switch to real bool until after FF4. Can we fix the easy
> wins I mentioned now?

mozilla::dom::Link does already use bool, and according to dehydra, they are 1 bit types instead of 1 byte, so there are possibly more wins to have. Will update my script accordingly.

The easy cases you mentioned are included in the patch, except for nsHTMLElement, which would require a rewrite of all the static initialization.
Attachment #467717 - Flags: review?
(Reporter)

Comment 31

7 years ago
(In reply to comment #30)
> mozilla::dom::Link does already use bool, and according to dehydra, they are 1
> bit types instead of 1 byte, so there are possibly more wins to have. Will
> update my script accordingly.

Dehydra reports a precision of 1bit for C++ bools, but GCC does not automatically fold adjacent bools into bitfields, would have to be done manually. I updated the table to assume bitfield packing for bools, also I hope to have fixed the duplicate typedefs.
You cannot and should not fold bools into bitfields because you can take their address. In some cases it can also be a performance hit because it involves a full read-write.

Updated

7 years ago
Blocks: 583074

Updated

6 years ago
Keywords: footprint
Whiteboard: [MemShrink:P3]
OS: Linux → All
Hardware: x86_64 → All
Comment on attachment 467717 [details] [diff] [review]
fix classes wasting memory

I'm going through unassigned review -- and am going to take the liberty of r-ing this based on comment 32.

Please update patch and request review again if this is still an issue.
Attachment #467717 - Flags: review? → review-
Depends on: 745758
(Reporter)

Comment 34

5 years ago
Last time I tried, like half a year ago, neither my original dehydra script worked with the latest gcc/dehydra, nor pahole (http://lwn.net/Articles/206805/).
If someone else wants to take over, feel free to do so.
Depends on: 747516
(Reporter)

Comment 35

3 years ago
Created attachment 8378185 [details]
Python script to aggregate clang -Wpadded warnings

So apparently compilers have had `-Wpadded` warnings for some time. GCCs warnings are not that useful, those from clang are more so, they look like this:

(this is from `mach warnings-list`)
```
js/src/vm/Runtime.h:693:25 [-Wpadded] padding struct 'JSRuntime' with 7 bytes to align 'operationCallback'
js/src/vm/Runtime.h:764:12 [-Wpadded] padding struct 'JSRuntime' with 6 bytes to align 'numExclusiveThreads'
js/src/vm/Runtime.h:814:11 [-Wpadded] padding struct 'JSRuntime' with 4 bytes to align 'ownerThread_'
js/src/vm/Runtime.h:958:25 [-Wpadded] padding struct 'JSRuntime' with 3 bytes to align 'gcChunkSet'
js/src/vm/Runtime.h:983:25 [-Wpadded] padding struct 'JSRuntime' with 4 bytes to align 'gcMarker'
[…]
```

So I just wrote a small python script (attached) to sum up all those numbers, they then look like this:
```
144.0	JSRuntime	js/src/vm/Runtime.h
92.0	icu_52::DateFormatSymbols	intl/icu/source/i18n/unicode/dtfmtsym.h
58.625	mozilla::net::nsHttpHandler	netwerk/protocol/http/nsHttpHandler.h
52.0	webrtc::acm1::AudioCodingModuleImpl	media/webrtc/trunk/webrtc/modules/audio_coding/main/source/audio_coding_module_impl.h
48.0	mozilla::net::WebSocketChannel	netwerk/protocol/websocket/WebSocketChannel.h
47.0	fsmdef_dcb_t	media/webrtc/signaling/src/sipcc/core/gsm/h/fsm.h
42.0	nsFtpState	netwerk/protocol/ftp/nsFtpConnectionThread.h
42.0	mozilla::WebGLContext	content/canvas/src/WebGLContext.h
42.0	mozilla::net::BackgroundFileSaver	netwerk/base/src/BackgroundFileSaver.h
40.0	AffixMgr	extensions/spellcheck/hunspell/src/affixmgr.hxx
```

I have no idea how clang handles unions in this case though, so this might have some double counting and false positives. Haven’t done any research on that yet.
Attachment #466989 - Attachment is obsolete: true
Attachment #466990 - Attachment is obsolete: true
Arpad: nice! I suggest that you post the full list to dev-platform to get feedback on which of these objects are likely to matter. For example, I know that the padding in JSRuntime isn't important, because we have 1+N JSRuntime objects live at any one time, where N is the number of workers, which is typically 3. But I don't know about any of those others.
(Reporter)

Updated

3 years ago
(Reporter)

Updated

3 years ago
Attachment #467717 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.