Last Comment Bug 681161 - Shrink nsCSSCompressedDataBlock on 64-bit platforms
: Shrink nsCSSCompressedDataBlock on 64-bit platforms
Status: RESOLVED FIXED
[MemShrink:P2]
:
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla9
Assigned To: Nicholas Nethercote [:njn] (on vacation until July 11)
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-08-22 18:23 PDT by Nicholas Nethercote [:njn] (on vacation until July 11)
Modified: 2011-08-26 09:27 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (5.16 KB, patch)
2011-08-22 20:26 PDT, Nicholas Nethercote [:njn] (on vacation until July 11)
dbaron: review-
Details | Diff | Review
patch, v2 (7.31 KB, patch)
2011-08-23 23:39 PDT, Nicholas Nethercote [:njn] (on vacation until July 11)
dbaron: review+
Details | Diff | Review

Description Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-22 18:23:49 PDT
This is a spin-off from bug 671299.  Basically, nsCSSCompressedDataBlock can be made smaller on 64-bit platforms.  Ideas:

- Swap mStyleBits and mBlockEnd.

- Get rid of block_chars;  it's confusing and may be wasting space.

- Can mBlockEnd be changed to a 32-bit mDataSize?
Comment 1 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-08-22 19:13:08 PDT
David?
Comment 2 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-08-22 19:17:54 PDT
Oh, and some questions I had:

1)  Do we need to ensure that the various pointers inside nsCSSValue are aligned on a word boundary (8 bytes for 64-bit, specifically)?

2)  Given the answer to question 1, how can we best handle the packing in both nsCSSCompressedDataBlock itself and in CDBValueStorage?  Right now CDBValueStorage on a 64-bit system is 50% bigger than it really needs to be...
Comment 3 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-22 19:23:28 PDT
If I can convert mBlockEnd to a 32-bit mDataSize, we'll get 8-byte alignment naturally :)
Comment 4 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-08-22 19:26:40 PDT
Well, for nsCSSCompressedDataBlock.  The CDMValueStorage issue can be spun off into a separate bug, I guess.
Comment 5 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-22 20:26:12 PDT
Created attachment 555023 [details] [diff] [review]
patch

This patch removes mBlock_ and changes word-sized mBlockEnd to a 32-bit mDataSize.  It seems to work, I'll do a try server run shortly.

On 64-bit platforms, this change reduces the useful size of nsCSSDataBlock by 12 bytes, but in practice we usually save 16 due to less slop.  I haven't measured on 32-bit platforms, in theory it won't make any difference.

I'm assuming 32-bits is big enough for mDataSize.  The biggest size I've seen in practice is around 2000 bytes so I think I'm safe.

On Linux64, when I start-up the browser and open 5 gmails, the cumulative memory allocations in nsCSSCompressedDataBlock::'operator new' drop from 14,048,000 bytes to 13,029,864, which is a 7.2% drop.  Judging from the prior DMD results, close to all of those allocations are still live once the tabs finish loading.  
Not a huge improvement, but not bad for a small patch.

Here are the top 30 size requests with the patch applied:

( 1)  16928 (29.5%, 29.5%): CSS(8, 24): 32 -> 32 (0)
( 2)   6839 (11.9%, 41.4%): CSS(8, 192): 200 -> 208 (8)
( 3)   5013 ( 8.7%, 50.2%): CSS(8, 48): 56 -> 64 (8)
( 4)   4655 ( 8.1%, 58.3%): CSS(8, 72): 80 -> 80 (0)
( 5)   3109 ( 5.4%, 63.7%): CSS(8, 216): 224 -> 224 (0)
( 6)   2514 ( 4.4%, 68.1%): CSS(8, 240): 248 -> 256 (8)
( 7)   2317 ( 4.0%, 72.1%): CSS(8, 96): 104 -> 112 (8)
( 8)   1785 ( 3.1%, 75.2%): CSS(8, 0): 8 -> 8 (0)
( 9)   1707 ( 3.0%, 78.2%): CSS(8, 120): 128 -> 128 (0)
(10)   1297 ( 2.3%, 80.5%): CSS(8, 144): 152 -> 160 (8)
(11)   1162 ( 2.0%, 82.5%): CSS(8, 264): 272 -> 272 (0)
(12)    757 ( 1.3%, 83.8%): CSS(8, 168): 176 -> 176 (0)
(13)    705 ( 1.2%, 85.0%): CSS(8, 288): 296 -> 304 (8)
(14)    601 ( 1.0%, 86.1%): CSS(8, 384): 392 -> 400 (8)
(15)    561 ( 1.0%, 87.1%): CSS(8, 312): 320 -> 320 (0)
(16)    470 ( 0.8%, 87.9%): CSS(8, 408): 416 -> 416 (0)
(17)    425 ( 0.7%, 88.6%): CSS(8, 336): 344 -> 352 (8)
(18)    416 ( 0.7%, 89.4%): CSS(8, 696): 704 -> 1024 (320)
(19)    384 ( 0.7%, 90.0%): CSS(8, 432): 440 -> 448 (8)
(20)    371 ( 0.6%, 90.7%): CSS(8, 360): 368 -> 368 (0)
(21)    330 ( 0.6%, 91.3%): CSS(8, 456): 464 -> 464 (0)
(22)    314 ( 0.5%, 91.8%): CSS(8, 888): 896 -> 1024 (128)
(23)    314 ( 0.5%, 92.3%): CSS(8, 720): 728 -> 1024 (296)
(24)    304 ( 0.5%, 92.9%): CSS(8, 744): 752 -> 1024 (272)
(25)    281 ( 0.5%, 93.4%): CSS(8, 576): 584 -> 1024 (440)
(26)    266 ( 0.5%, 93.8%): CSS(8, 912): 920 -> 1024 (104)
(27)    227 ( 0.4%, 94.2%): CSS(8, 480): 488 -> 496 (8)
(28)    190 ( 0.3%, 94.6%): CSS(8, 528): 536 -> 1024 (488)
(29)    172 ( 0.3%, 94.9%): CSS(8, 1128): 1136 -> 2048 (912)
(30)    171 ( 0.3%, 95.2%): CSS(8, 768): 776 -> 1024 (248)

The top 30 without the patch look like this (copied from bug 671299 comment 10):

( 1)  16990 (29.5%, 29.5%): CSS(24, 24): 44 -> 48 (4)
( 2)   6839 (11.9%, 41.4%): CSS(24, 192): 212 -> 224 (12)
( 3)   5044 ( 8.8%, 50.2%): CSS(24, 48): 68 -> 80 (12)
( 4)   4665 ( 8.1%, 58.3%): CSS(24, 72): 92 -> 96 (4)
( 5)   3109 ( 5.4%, 63.7%): CSS(24, 216): 236 -> 240 (4)
( 6)   2514 ( 4.4%, 68.1%): CSS(24, 240): 260 -> 272 (12)
( 7)   2319 ( 4.0%, 72.1%): CSS(24, 96): 116 -> 128 (12)
( 8)   1832 ( 3.2%, 75.3%): CSS(24, 0): 20 -> 32 (12)
( 9)   1707 ( 3.0%, 78.2%): CSS(24, 120): 140 -> 144 (4)
(10)   1307 ( 2.3%, 80.5%): CSS(24, 144): 164 -> 176 (12)
(11)   1162 ( 2.0%, 82.5%): CSS(24, 264): 284 -> 288 (4)
(12)    757 ( 1.3%, 83.8%): CSS(24, 168): 188 -> 192 (4)
(13)    705 ( 1.2%, 85.1%): CSS(24, 288): 308 -> 320 (12)
(14)    601 ( 1.0%, 86.1%): CSS(24, 384): 404 -> 416 (12)
(15)    561 ( 1.0%, 87.1%): CSS(24, 312): 332 -> 336 (4)
(16)    470 ( 0.8%, 87.9%): CSS(24, 408): 428 -> 432 (4)
(17)    425 ( 0.7%, 88.6%): CSS(24, 336): 356 -> 368 (12)
(18)    416 ( 0.7%, 89.4%): CSS(24, 696): 716 -> 1024 (308)
(19)    384 ( 0.7%, 90.0%): CSS(24, 432): 452 -> 464 (12)
(20)    371 ( 0.6%, 90.7%): CSS(24, 360): 380 -> 384 (4)
(21)    330 ( 0.6%, 91.3%): CSS(24, 456): 476 -> 480 (4)
(22)    314 ( 0.5%, 91.8%): CSS(24, 888): 908 -> 1024 (116)
(23)    314 ( 0.5%, 92.3%): CSS(24, 720): 740 -> 1024 (284)
(24)    304 ( 0.5%, 92.9%): CSS(24, 744): 764 -> 1024 (260)
(25)    281 ( 0.5%, 93.4%): CSS(24, 576): 596 -> 1024 (428)
(26)    266 ( 0.5%, 93.8%): CSS(24, 912): 932 -> 1024 (92)
(27)    228 ( 0.4%, 94.2%): CSS(24, 480): 500 -> 512 (12)
(28)    190 ( 0.3%, 94.6%): CSS(24, 528): 548 -> 1024 (476)
(29)    172 ( 0.3%, 94.9%): CSS(24, 1128): 1148 -> 2048 (900)
(30)    171 ( 0.3%, 95.1%): CSS(24, 768): 788 -> 1024 (236)

I haven't looked at the CDBValueStorage at all.
Comment 6 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2011-08-22 20:36:43 PDT
So we've decided to take the most space-optimized structure in the entire style system (xref bug 125246, bug 107270) and optimize it some more, eh?  (Seriously: we should really worry about selectors at some point, by which I mean a redesign, not just tweaking.  But here, I think tweaking is the right approach.)

(In reply to Nicholas Nethercote [:njn] from comment #0)
> This is a spin-off from bug 671299.  Basically, nsCSSCompressedDataBlock can
> be made smaller on 64-bit platforms.  Ideas:
> 
> - Swap mStyleBits and mBlockEnd.

That might lead to potential alignment issues on some 64-bit platforms.

> - Get rid of block_chars;  it's confusing and may be wasting space.

I think the technique of using a stub data chunk for dealing with structures with storage at the end is preferable for getting the alignment issues right (though using char[] isn't; given that there's now only one type in the block I think we should just have a CDBValueStorage[1] instead of the char[4]).

> - Can mBlockEnd be changed to a 32-bit mDataSize?

Yes.

(In reply to Boris Zbarsky (:bz) from comment #2)
> Oh, and some questions I had:
> 
> 1)  Do we need to ensure that the various pointers inside nsCSSValue are
> aligned on a word boundary (8 bytes for 64-bit, specifically)?

This may be platform-dependent, but the compiler should deal with it just fine as long as the starting point is properly aligned, since CDBValueStorage will include the necessary padding.  Hence my preference for switching the char[4] to CDBValueStorage[1].

> 2)  Given the answer to question 1, how can we best handle the packing in
> both nsCSSCompressedDataBlock itself and in CDBValueStorage?  Right now
> CDBValueStorage on a 64-bit system is 50% bigger than it really needs to
> be...

We could store the array of properties and then the array of values.  Since most data blocks are small, most of the time this won't cause cache problems, although it might in a few cases.

(In reply to Nicholas Nethercote [:njn] from comment #3)
> If I can convert mBlockEnd to a 32-bit mDataSize, we'll get 8-byte alignment
> naturally :)
Comment 7 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2011-08-22 20:37:46 PDT
(In reply to Nicholas Nethercote [:njn] from comment #5)
> This patch removes mBlock_ and changes word-sized mBlockEnd to a 32-bit
> mDataSize.  It seems to work, I'll do a try server run shortly.

I'd prefer not to do this, but instead to make mBlock a CDBValueStorage[1], as I described in the previous comment (which I wrote before seeing comment 5).
Comment 8 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-08-22 20:41:35 PDT
> We could store the array of properties and then the array of values.

Hmm... that will help some when we have a bunch of properties, yes.  We'll still waste 4 bytes per property because of internal padding in nsCSSValue, but avoiding that sounds pretty painful.
Comment 9 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2011-08-22 20:43:37 PDT
Comment on attachment 555023 [details] [diff] [review]
patch

In addition to my previous comment (the main reason I want to see a new patch):

 * DataSize() should return size_t and ptrdiff_t

 * You should add a comment noting that if, when we change nsCSSDeclaration to store the declarations in order (or at any later time), we also store repeated declarations of the same property, then we need to worry about checking for integer overflow with SetDataSize.
Comment 10 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2011-08-22 20:44:54 PDT
(In reply to David Baron [:dbaron] from comment #9)
>  * DataSize() should return size_t and ptrdiff_t

er, "size_t, not ptrdiff_t"
Comment 11 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-22 21:02:25 PDT
(In reply to David Baron [:dbaron] from comment #6)
> So we've decided to take the most space-optimized structure in the entire
> style system (xref bug 125246, bug 107270) and optimize it some more, eh? 

Hey, I'm just going where the memory profilers are telling me to go :)
Comment 12 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-22 22:59:20 PDT
(In reply to David Baron [:dbaron] from comment #6)
> 
> I think the technique of using a stub data chunk for dealing with structures
> with storage at the end is preferable for getting the alignment issues right
> (though using char[] isn't; given that there's now only one type in the
> block I think we should just have a CDBValueStorage[1] instead of the
> char[4]).

That turns out to be a bad idea:  it causes crashes.  The reason is that CDBValueStorage contains an nsCSSValue which has a constructor that initializes some fields.  So in the case where we have *zero* CDBValueStorage elements (which occurs 3% of the time), we end up constructing that first CDBValueStorage element even though we haven't allocated space for it, i.e. we write past the end of the memory allocated by |operator new|.

Is the approach in my patch so bad?  If the nsCSSCompressedDataBlock is 8-aligned then the first CDBValueStorage will clearly also be 8-aligned.  And the existing char[4] doesn't provide much in the way of alignment guarantees so we're no worse off than before, AFAICT.
Comment 13 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2011-08-23 06:07:21 PDT
Well, we're a little worse off than before because we used to have a guarantee of meeting pointer alignment requirements and now we just have a guarantee of meeting 32-bit integer alignment requirements.

The alternative solution is to define a constant (preferably in an enum, so it's clearly compile-time) that represents the extra space needed for alignment of CDBValueStorage.  I think this should be one of (depending on whether you think PR_ROUNDUP makes things clearer or not):

PR_ROUNDUP(sizeof(nsCSSCompressedDataBlock), NS_ALIGNMENT_OF(CDBValueStorage)) - sizeof(nsCSSCompressedDataBlock)

(NS_ALIGNMENT_OF(CDBValueStorage) - (sizeof(nsCSSCompressedDataBlock) % NS_ALIGNMENT_OF(CDBValueStorage))) % NS_ALIGNMENT_OF(CDBValueStorage)

Then you can use this constant in both operator new and in Block().  This may require moving some inline methods out of the class definition (since I suspect you can't call sizeof(nsCSSCompressedDataBlock) inside the class definition... although it actually might be possible).


(To be clear, if I knew what I now know about alignment, I'd have review-'d the existing code in this file for not handling alignment correctly.)
Comment 14 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-23 23:39:37 PDT
Created attachment 555334 [details] [diff] [review]
patch, v2

Here's a slightly different approach.  I added some static assertions that check that nsCSSCompressedBlock has the expected size, and that CDBValueStorage's alignment is compatible.

This approach has the advantage that it's brittle, i.e. if anything changes about the size/alignment, we'll get build failures, rather than silently adapting to new sizes that might cause wasted space.  It's also a lot simpler :)
Comment 15 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2011-08-24 07:41:05 PDT
Comment on attachment 555334 [details] [diff] [review]
patch, v2

r=dbaron, with two more runtime assertions (both NS_ABORT_IF_FALSE):

(1) in the operator new, assert that aBaseSize == sizeof(nsCSSCompressedDataBlock)

(2) in SetBlockEnd, assert that size_t(blockEnd - Block()) <= size_t(PR_UINT32_MAX) (and then move the comment to right above that assertion)


But if somebody reports the static asserts as firing on some platform, please back out and take the approach I previously suggested.
Comment 16 Nicholas Nethercote [:njn] (on vacation until July 11) 2011-08-25 23:16:57 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/c70c539f2e93
Comment 17 Matt Brubeck (:mbrubeck) 2011-08-26 09:27:44 PDT
http://hg.mozilla.org/mozilla-central/rev/c70c539f2e93

Note You need to log in before you can comment on or make changes to this bug.