Closed Bug 681161 Opened 13 years ago Closed 13 years ago

Shrink nsCSSCompressedDataBlock on 64-bit platforms

Categories

(Core :: CSS Parsing and Computation, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla9

People

(Reporter: n.nethercote, Assigned: n.nethercote)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(1 file, 1 obsolete file)

This is a spin-off from bug 671299.  Basically, nsCSSCompressedDataBlock can be made smaller on 64-bit platforms.  Ideas:

- Swap mStyleBits and mBlockEnd.

- Get rid of block_chars;  it's confusing and may be wasting space.

- Can mBlockEnd be changed to a 32-bit mDataSize?
David?
Assignee: nnethercote → nobody
Component: Layout → Style System (CSS)
QA Contact: layout → style-system
Oh, and some questions I had:

1)  Do we need to ensure that the various pointers inside nsCSSValue are aligned on a word boundary (8 bytes for 64-bit, specifically)?

2)  Given the answer to question 1, how can we best handle the packing in both nsCSSCompressedDataBlock itself and in CDBValueStorage?  Right now CDBValueStorage on a 64-bit system is 50% bigger than it really needs to be...
If I can convert mBlockEnd to a 32-bit mDataSize, we'll get 8-byte alignment naturally :)
Assignee: nobody → nnethercote
Well, for nsCSSCompressedDataBlock.  The CDMValueStorage issue can be spun off into a separate bug, I guess.
Attached patch patch (obsolete) — Splinter Review
This patch removes mBlock_ and changes word-sized mBlockEnd to a 32-bit mDataSize.  It seems to work, I'll do a try server run shortly.

On 64-bit platforms, this change reduces the useful size of nsCSSDataBlock by 12 bytes, but in practice we usually save 16 due to less slop.  I haven't measured on 32-bit platforms, in theory it won't make any difference.

I'm assuming 32-bits is big enough for mDataSize.  The biggest size I've seen in practice is around 2000 bytes so I think I'm safe.

On Linux64, when I start-up the browser and open 5 gmails, the cumulative memory allocations in nsCSSCompressedDataBlock::'operator new' drop from 14,048,000 bytes to 13,029,864, which is a 7.2% drop.  Judging from the prior DMD results, close to all of those allocations are still live once the tabs finish loading.  
Not a huge improvement, but not bad for a small patch.

Here are the top 30 size requests with the patch applied:

( 1)  16928 (29.5%, 29.5%): CSS(8, 24): 32 -> 32 (0)
( 2)   6839 (11.9%, 41.4%): CSS(8, 192): 200 -> 208 (8)
( 3)   5013 ( 8.7%, 50.2%): CSS(8, 48): 56 -> 64 (8)
( 4)   4655 ( 8.1%, 58.3%): CSS(8, 72): 80 -> 80 (0)
( 5)   3109 ( 5.4%, 63.7%): CSS(8, 216): 224 -> 224 (0)
( 6)   2514 ( 4.4%, 68.1%): CSS(8, 240): 248 -> 256 (8)
( 7)   2317 ( 4.0%, 72.1%): CSS(8, 96): 104 -> 112 (8)
( 8)   1785 ( 3.1%, 75.2%): CSS(8, 0): 8 -> 8 (0)
( 9)   1707 ( 3.0%, 78.2%): CSS(8, 120): 128 -> 128 (0)
(10)   1297 ( 2.3%, 80.5%): CSS(8, 144): 152 -> 160 (8)
(11)   1162 ( 2.0%, 82.5%): CSS(8, 264): 272 -> 272 (0)
(12)    757 ( 1.3%, 83.8%): CSS(8, 168): 176 -> 176 (0)
(13)    705 ( 1.2%, 85.0%): CSS(8, 288): 296 -> 304 (8)
(14)    601 ( 1.0%, 86.1%): CSS(8, 384): 392 -> 400 (8)
(15)    561 ( 1.0%, 87.1%): CSS(8, 312): 320 -> 320 (0)
(16)    470 ( 0.8%, 87.9%): CSS(8, 408): 416 -> 416 (0)
(17)    425 ( 0.7%, 88.6%): CSS(8, 336): 344 -> 352 (8)
(18)    416 ( 0.7%, 89.4%): CSS(8, 696): 704 -> 1024 (320)
(19)    384 ( 0.7%, 90.0%): CSS(8, 432): 440 -> 448 (8)
(20)    371 ( 0.6%, 90.7%): CSS(8, 360): 368 -> 368 (0)
(21)    330 ( 0.6%, 91.3%): CSS(8, 456): 464 -> 464 (0)
(22)    314 ( 0.5%, 91.8%): CSS(8, 888): 896 -> 1024 (128)
(23)    314 ( 0.5%, 92.3%): CSS(8, 720): 728 -> 1024 (296)
(24)    304 ( 0.5%, 92.9%): CSS(8, 744): 752 -> 1024 (272)
(25)    281 ( 0.5%, 93.4%): CSS(8, 576): 584 -> 1024 (440)
(26)    266 ( 0.5%, 93.8%): CSS(8, 912): 920 -> 1024 (104)
(27)    227 ( 0.4%, 94.2%): CSS(8, 480): 488 -> 496 (8)
(28)    190 ( 0.3%, 94.6%): CSS(8, 528): 536 -> 1024 (488)
(29)    172 ( 0.3%, 94.9%): CSS(8, 1128): 1136 -> 2048 (912)
(30)    171 ( 0.3%, 95.2%): CSS(8, 768): 776 -> 1024 (248)

The top 30 without the patch look like this (copied from bug 671299 comment 10):

( 1)  16990 (29.5%, 29.5%): CSS(24, 24): 44 -> 48 (4)
( 2)   6839 (11.9%, 41.4%): CSS(24, 192): 212 -> 224 (12)
( 3)   5044 ( 8.8%, 50.2%): CSS(24, 48): 68 -> 80 (12)
( 4)   4665 ( 8.1%, 58.3%): CSS(24, 72): 92 -> 96 (4)
( 5)   3109 ( 5.4%, 63.7%): CSS(24, 216): 236 -> 240 (4)
( 6)   2514 ( 4.4%, 68.1%): CSS(24, 240): 260 -> 272 (12)
( 7)   2319 ( 4.0%, 72.1%): CSS(24, 96): 116 -> 128 (12)
( 8)   1832 ( 3.2%, 75.3%): CSS(24, 0): 20 -> 32 (12)
( 9)   1707 ( 3.0%, 78.2%): CSS(24, 120): 140 -> 144 (4)
(10)   1307 ( 2.3%, 80.5%): CSS(24, 144): 164 -> 176 (12)
(11)   1162 ( 2.0%, 82.5%): CSS(24, 264): 284 -> 288 (4)
(12)    757 ( 1.3%, 83.8%): CSS(24, 168): 188 -> 192 (4)
(13)    705 ( 1.2%, 85.1%): CSS(24, 288): 308 -> 320 (12)
(14)    601 ( 1.0%, 86.1%): CSS(24, 384): 404 -> 416 (12)
(15)    561 ( 1.0%, 87.1%): CSS(24, 312): 332 -> 336 (4)
(16)    470 ( 0.8%, 87.9%): CSS(24, 408): 428 -> 432 (4)
(17)    425 ( 0.7%, 88.6%): CSS(24, 336): 356 -> 368 (12)
(18)    416 ( 0.7%, 89.4%): CSS(24, 696): 716 -> 1024 (308)
(19)    384 ( 0.7%, 90.0%): CSS(24, 432): 452 -> 464 (12)
(20)    371 ( 0.6%, 90.7%): CSS(24, 360): 380 -> 384 (4)
(21)    330 ( 0.6%, 91.3%): CSS(24, 456): 476 -> 480 (4)
(22)    314 ( 0.5%, 91.8%): CSS(24, 888): 908 -> 1024 (116)
(23)    314 ( 0.5%, 92.3%): CSS(24, 720): 740 -> 1024 (284)
(24)    304 ( 0.5%, 92.9%): CSS(24, 744): 764 -> 1024 (260)
(25)    281 ( 0.5%, 93.4%): CSS(24, 576): 596 -> 1024 (428)
(26)    266 ( 0.5%, 93.8%): CSS(24, 912): 932 -> 1024 (92)
(27)    228 ( 0.4%, 94.2%): CSS(24, 480): 500 -> 512 (12)
(28)    190 ( 0.3%, 94.6%): CSS(24, 528): 548 -> 1024 (476)
(29)    172 ( 0.3%, 94.9%): CSS(24, 1128): 1148 -> 2048 (900)
(30)    171 ( 0.3%, 95.1%): CSS(24, 768): 788 -> 1024 (236)

I haven't looked at the CDBValueStorage at all.
Attachment #555023 - Flags: review?(dbaron)
So we've decided to take the most space-optimized structure in the entire style system (xref bug 125246, bug 107270) and optimize it some more, eh?  (Seriously: we should really worry about selectors at some point, by which I mean a redesign, not just tweaking.  But here, I think tweaking is the right approach.)

(In reply to Nicholas Nethercote [:njn] from comment #0)
> This is a spin-off from bug 671299.  Basically, nsCSSCompressedDataBlock can
> be made smaller on 64-bit platforms.  Ideas:
> 
> - Swap mStyleBits and mBlockEnd.

That might lead to potential alignment issues on some 64-bit platforms.

> - Get rid of block_chars;  it's confusing and may be wasting space.

I think the technique of using a stub data chunk for dealing with structures with storage at the end is preferable for getting the alignment issues right (though using char[] isn't; given that there's now only one type in the block I think we should just have a CDBValueStorage[1] instead of the char[4]).

> - Can mBlockEnd be changed to a 32-bit mDataSize?

Yes.

(In reply to Boris Zbarsky (:bz) from comment #2)
> Oh, and some questions I had:
> 
> 1)  Do we need to ensure that the various pointers inside nsCSSValue are
> aligned on a word boundary (8 bytes for 64-bit, specifically)?

This may be platform-dependent, but the compiler should deal with it just fine as long as the starting point is properly aligned, since CDBValueStorage will include the necessary padding.  Hence my preference for switching the char[4] to CDBValueStorage[1].

> 2)  Given the answer to question 1, how can we best handle the packing in
> both nsCSSCompressedDataBlock itself and in CDBValueStorage?  Right now
> CDBValueStorage on a 64-bit system is 50% bigger than it really needs to
> be...

We could store the array of properties and then the array of values.  Since most data blocks are small, most of the time this won't cause cache problems, although it might in a few cases.

(In reply to Nicholas Nethercote [:njn] from comment #3)
> If I can convert mBlockEnd to a 32-bit mDataSize, we'll get 8-byte alignment
> naturally :)
(In reply to Nicholas Nethercote [:njn] from comment #5)
> This patch removes mBlock_ and changes word-sized mBlockEnd to a 32-bit
> mDataSize.  It seems to work, I'll do a try server run shortly.

I'd prefer not to do this, but instead to make mBlock a CDBValueStorage[1], as I described in the previous comment (which I wrote before seeing comment 5).
> We could store the array of properties and then the array of values.

Hmm... that will help some when we have a bunch of properties, yes.  We'll still waste 4 bytes per property because of internal padding in nsCSSValue, but avoiding that sounds pretty painful.
Comment on attachment 555023 [details] [diff] [review]
patch

In addition to my previous comment (the main reason I want to see a new patch):

 * DataSize() should return size_t and ptrdiff_t

 * You should add a comment noting that if, when we change nsCSSDeclaration to store the declarations in order (or at any later time), we also store repeated declarations of the same property, then we need to worry about checking for integer overflow with SetDataSize.
Attachment #555023 - Flags: review?(dbaron) → review-
(In reply to David Baron [:dbaron] from comment #9)
>  * DataSize() should return size_t and ptrdiff_t

er, "size_t, not ptrdiff_t"
(In reply to David Baron [:dbaron] from comment #6)
> So we've decided to take the most space-optimized structure in the entire
> style system (xref bug 125246, bug 107270) and optimize it some more, eh? 

Hey, I'm just going where the memory profilers are telling me to go :)
(In reply to David Baron [:dbaron] from comment #6)
> 
> I think the technique of using a stub data chunk for dealing with structures
> with storage at the end is preferable for getting the alignment issues right
> (though using char[] isn't; given that there's now only one type in the
> block I think we should just have a CDBValueStorage[1] instead of the
> char[4]).

That turns out to be a bad idea:  it causes crashes.  The reason is that CDBValueStorage contains an nsCSSValue which has a constructor that initializes some fields.  So in the case where we have *zero* CDBValueStorage elements (which occurs 3% of the time), we end up constructing that first CDBValueStorage element even though we haven't allocated space for it, i.e. we write past the end of the memory allocated by |operator new|.

Is the approach in my patch so bad?  If the nsCSSCompressedDataBlock is 8-aligned then the first CDBValueStorage will clearly also be 8-aligned.  And the existing char[4] doesn't provide much in the way of alignment guarantees so we're no worse off than before, AFAICT.
Well, we're a little worse off than before because we used to have a guarantee of meeting pointer alignment requirements and now we just have a guarantee of meeting 32-bit integer alignment requirements.

The alternative solution is to define a constant (preferably in an enum, so it's clearly compile-time) that represents the extra space needed for alignment of CDBValueStorage.  I think this should be one of (depending on whether you think PR_ROUNDUP makes things clearer or not):

PR_ROUNDUP(sizeof(nsCSSCompressedDataBlock), NS_ALIGNMENT_OF(CDBValueStorage)) - sizeof(nsCSSCompressedDataBlock)

(NS_ALIGNMENT_OF(CDBValueStorage) - (sizeof(nsCSSCompressedDataBlock) % NS_ALIGNMENT_OF(CDBValueStorage))) % NS_ALIGNMENT_OF(CDBValueStorage)

Then you can use this constant in both operator new and in Block().  This may require moving some inline methods out of the class definition (since I suspect you can't call sizeof(nsCSSCompressedDataBlock) inside the class definition... although it actually might be possible).


(To be clear, if I knew what I now know about alignment, I'd have review-'d the existing code in this file for not handling alignment correctly.)
Whiteboard: [MemShrink] → [MemShrink:P2]
Attached patch patch, v2Splinter Review
Here's a slightly different approach.  I added some static assertions that check that nsCSSCompressedBlock has the expected size, and that CDBValueStorage's alignment is compatible.

This approach has the advantage that it's brittle, i.e. if anything changes about the size/alignment, we'll get build failures, rather than silently adapting to new sizes that might cause wasted space.  It's also a lot simpler :)
Attachment #555023 - Attachment is obsolete: true
Attachment #555334 - Flags: review?(dbaron)
Comment on attachment 555334 [details] [diff] [review]
patch, v2

r=dbaron, with two more runtime assertions (both NS_ABORT_IF_FALSE):

(1) in the operator new, assert that aBaseSize == sizeof(nsCSSCompressedDataBlock)

(2) in SetBlockEnd, assert that size_t(blockEnd - Block()) <= size_t(PR_UINT32_MAX) (and then move the comment to right above that assertion)


But if somebody reports the static asserts as firing on some platform, please back out and take the approach I previously suggested.
Attachment #555334 - Flags: review?(dbaron) → review+
Summary: Shrink nsCSSCompressedDataBlock → Shrink nsCSSCompressedDataBlock on 64-bit platforms
http://hg.mozilla.org/integration/mozilla-inbound/rev/c70c539f2e93
Whiteboard: [MemShrink:P2] → [MemShrink:P2][inbound]
http://hg.mozilla.org/mozilla-central/rev/c70c539f2e93
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [MemShrink:P2][inbound] → [MemShrink:P2]
Target Milestone: --- → mozilla9
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: