Closed Bug 962262 Opened 6 years ago Closed 5 years ago

Use external structure padding to save ~30% of circular buffer space

Categories

(Core :: Gecko Profiler, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla31

People

(Reporter: BenWa, Assigned: vikstrous)

Details

Attachments

(1 file, 3 obsolete files)

See Nathan' blog on structure padding:
https://blog.mozilla.org/nfroyd/2014/01/17/packing-structures-for-fun-and-profit/

Currently in ProfileEntry we have a pointer and a char. This means that we're leaving 3 bytes in 32-bit and 7 bytes on 64-bit. We should define ProfileEntry to use 8 bytes on both 32-bit and 64-bit platforms.

On 32-bit b2g this should give us a 37.5% theoretical improvement. Realistically I think we can get 30% improvements.

This will be easier to do then the compression we're considering. We should do this first.
(In reply to Benoit Girard (:BenWa) from comment #0)
> Currently in ProfileEntry we have a pointer and a char.

It looks like your union is actually going to be 8 bytes, because you have a double as a member:

http://mxr.mozilla.org/mozilla-central/source/tools/profiler/ProfileEntry.h#52

So ProfileEntry is going to be 12 bytes on 32-bit platforms and 16 bytes on 64-bit platforms.

Maybe you should use some wild NaN-boxing scheme like the JavaScript engine uses for compressing everything down to 8 bytes everywhere.
Good point. We care the most about mobile memory while profiling so that would be the 32-bit case.

An array of ProfileEntry will consume 16 bytes per element then with external padding I imagine. Thus we could really be saving 50%.
(In reply to Benoit Girard (:BenWa) from comment #2)
> Good point. We care the most about mobile memory while profiling so that
> would be the 32-bit case.
> 
> An array of ProfileEntry will consume 16 bytes per element then with
> external padding I imagine. Thus we could really be saving 50%.

Only 12 bytes on 32-bit platforms; you only pad to the next closest multiple of the word size.

If you could stand an extra indirection for doubles, which don't appear to be used that often, you could just have a |double*| member and do an extra allocation for entries that require doubles.  That would immediately get you down to 8 bytes for the 32-bit case,
The sizes of the struct members on 64 bit (arch linux):
Total: 16
8 mTagData
8 mTagChars
8 mTagPtr
8 mTagMarker
8 mTagFloat
8 mTagAddress
8 mTagOffset
4 mTagLine
1 mTagChar

The sizes of the struct members on 32 bit (linux mint vm):
Total: 12
4 mTagData
4 mTagChars
4 mTagPtr
4 mTagMarker
8 mTagFloat
4 mTagAddress
4 mTagOffset
4 mTagLine
1 mTagChar


It seems to me that the best way to save space on 32 bit is by changing the double to a float.

That would be 12 -> 8 (33%) on 32 bit only.

There isn't anything we can do about the extra char in the general case if we keep a structure like this. Instead we can pack 4 tag names together on 32 bit and 8 on 64 bit. That might give us the easiest win. As long as the API remains the same we don't lose too much in code complexity. As long as the chunks are small (they are) we don't lose too many samples when we wrap.

This would give us 4*8=32 -> 4*4+4=20 (37%) on 32 bit and 8*16=128 -> 8*8+8=72 (43.75%) on 64 bit

The total for 32 bit is 4*12=48 -> 4*4+4=20 (41.67%)

After that, we should be able to still do compression on top of this.

What do you think, Benoit?
Flags: needinfo?(bgirard)
Oops I did the calculation wrong for the total on 32 bit. It's (48-20)/48 = 58.33%
Attached patch profiler_memory_float (obsolete) — Splinter Review
Here's the patch for changing the double to a float.

https://tbpl.mozilla.org/?tree=Try&rev=06dd900e5bb0
Attachment #8395061 - Flags: review?(bgirard)
I realized that it would give us the same space improvement if we group tags in groups of 8 on 32 bit as well. This will make the code simpler and we won't have to check the word size (which I'm not even sure is easily doable). The ability to lose samples in groups of 4 instead of 8 when we wrap is not worth the added complexity IMO.
Attachment #8395061 - Flags: review?(bgirard) → review+
The calculation look nice. As we discussed in bug 987297 attribute packed might prevent us from having to do a significant code change.
Flags: needinfo?(bgirard)
Attached patch profiler_memory_float2 (obsolete) — Splinter Review
Fixed the Windows issue

https://tbpl.mozilla.org/?tree=Try&rev=fe8f7edd0494
Attachment #8395061 - Attachment is obsolete: true
Flags: needinfo?(bgirard)
https://hg.mozilla.org/mozilla-central/rev/cd140d715b37
https://hg.mozilla.org/mozilla-central/rev/8ce79b93a998
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Reopening because this was backed out.
My last try push didn't work for some reason. Here's another one: https://tbpl.mozilla.org/?tree=Try&rev=c06320c7d8da
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: nobody → vstanchev
Flags: needinfo?(bgirard)
Looks like you forgot to upload the lastest patch from try.
Attachment #8403479 - Attachment is obsolete: true
Flags: needinfo?(bgirard)
Actually I can't see what failed and what was fixed? We should check that to make sure we don't break the same thing twice.
Flags: needinfo?(me)
Oh, oops. The change is in ProfileEntry.cpp:476

addTag(ProfileEntry('t', static_cast<float>((mozilla::TimeStamp::Now() - sStartTime).ToMilliseconds())));
Flags: needinfo?(me)
https://hg.mozilla.org/mozilla-central/rev/4743c6a40048
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.