Closed Bug 1525223 Opened 8 months ago Closed 7 months ago

ProfileBufferEntry wastes 44% of its memory space on ARM

Categories

(Core :: Gecko Profiler, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox67 --- fixed

People

(Reporter: gerald, Assigned: mstange)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Each ProfileBufferEntry object contains one byte for its kind, and a union (of all the possible kinds of values) that is 8 bytes long and also 8-byte aligned.

Because of the union size and alignment, the whole ProfileBufferEntry structure takes 16 bytes, including a 7-byte padding between the first byte and the union.

7 out of 16 bytes is about 44% wasted. Or another way, we're storing 9 bytes in 16 bytes, so that's an extra 78% used by padding.

On top of the wasted memory, this may also have an impact on running time, as the CPU probably spends 78% more time reading&writing this useless padding from/to memory.

I'm not too sure what the best solution is. Some thoughts:
A. IIRC, the JS engine uses some of the value bits to encode the kind of data stored in JS objects. Not sure it's possible with our kinds of data, which include uint64_t's.
B. Or we could lose the union alignment, but this would require careful reading and writing. This may be fine, as typically the buffer is filled continuously (as opposed to randomly), and then read sequentially a few times when producing JSON.

Option B should be easy-enough to try, in which case we should carefully measure whether the extra work dealing with non-aligned data doesn't negate the gain of reading&writing less memory for the same amount of actual data.

Assignee: nobody → mstange
Status: NEW → ASSIGNED
Summary: ProfileBufferEntry wastes 44% of its memory space (and maybe runtime?) → ProfileBufferEntry wastes 44% of its memory space on ARM

I've tried to implement option B, by doing a memcpy whenever an entry is written or read. The reading performance doesn't matter much. The writing performance might be regressed by this approach, and that might hurt a bit. I haven't measured whether it actually has any impact.

For completeness sake, I should mention that the 44% wastage currently does not occur on x86: The #pragma pack in ProfileBufferEntry.h eliminates the padding. It also puts the union members at unaligned addresses. x86 doesn't seem to care about those unaligned reads / writes, but ARM does, so the packing is disabled on ARM. However, this also means that the buffer size values we display in the UI are currently incorrect for ARM targets: If the UI displays a buffer size of 90MB, the buffer is really taking up 160MB. The patch in this bug changes that and will make the buffer actually take up 90MB.

Thank you for the clarification, I had not noticed the #pragma pack.

Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/27ab5f5e1fd4
Pack ProfileBufferEntry storage without the use of unaligned union members. r=gerald
Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
You need to log in before you can comment on or make changes to this bug.