Last Comment Bug 684180 - Avoid wasted space caused by AssemblerBuffer's growth strategy
: Avoid wasted space caused by AssemblerBuffer's growth strategy
Status: RESOLVED FIXED
[inbound]
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: x86_64 Linux
: -- normal (vote)
: mozilla9
Assigned To: Nicholas Nethercote [:njn]
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-09-02 00:15 PDT by Nicholas Nethercote [:njn]
Modified: 2011-09-05 05:06 PDT (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (1.96 KB, patch)
2011-09-02 00:15 PDT, Nicholas Nethercote [:njn]
dvander: review+
Details | Diff | Review

Description Nicholas Nethercote [:njn] 2011-09-02 00:15:12 PDT
Created attachment 557774 [details] [diff] [review]
patch

AssemblerBuffer starts with an inline buffer of 256 bytes.  If that's not enough, it heap allocates progressively larger buffers using the following growth formula:

    int newCapacity = m_capacity + m_capacity / 2 + extraCapacity;

In practice, extraCapacity is always zero so we're multiplying by 1.5 each time.  But heap allocators work much better with requests that are powers-of-two;  multiplying by 1.5 gives request sizes that get rounded up a lot, leading to wasted memory.  In particular, any request in the range 513..8191 bytes gets rounded up by jemalloc to the nearest power of two.  (Beyond that, jemalloc rounds to the next 4KB until we get to 1MB, then it rounds to the nearest MB.)

I instrumented AssemblerBuffer::grow() to print out on every grow() call a line like this:

  m_capacity * 1.5 + extraCapacity == newCapacity --> usable_size (waste)

Firing up the browser and opening gmail, I get this frequency of lines:

21652 lines:
( 1)   4775 (22.1%, 22.1%): 256 * 1.5 + 0 == 384 --> 384 (0)
( 2)   3821 (17.6%, 39.7%): 384 * 1.5 + 0 == 576 --> 1024 (448)
( 3)   3419 (15.8%, 55.5%): 576 * 1.5 + 0 == 864 --> 1024 (160)
( 4)   2896 (13.4%, 68.9%): 864 * 1.5 + 0 == 1296 --> 2048 (752)
( 5)   2275 (10.5%, 79.4%): 1296 * 1.5 + 0 == 1944 --> 2048 (104)
( 6)   1621 ( 7.5%, 86.9%): 1944 * 1.5 + 0 == 2916 --> 4096 (1180)
( 7)   1114 ( 5.1%, 92.0%): 2916 * 1.5 + 0 == 4374 --> 8192 (3818)
( 8)    758 ( 3.5%, 95.5%): 4374 * 1.5 + 0 == 6561 --> 8192 (1631)
( 9)    475 ( 2.2%, 97.7%): 6561 * 1.5 + 0 == 9841 --> 12288 (2447)
(10)    262 ( 1.2%, 98.9%): 9841 * 1.5 + 0 == 14761 --> 16384 (1623)
(11)    138 ( 0.6%, 99.5%): 14761 * 1.5 + 0 == 22141 --> 24576 (2435)
(12)     62 ( 0.3%, 99.8%): 22141 * 1.5 + 0 == 33211 --> 36864 (3653)
(13)     18 ( 0.1%, 99.9%): 33211 * 1.5 + 0 == 49816 --> 53248 (3432)
(14)      9 ( 0.0%,100.0%): 49816 * 1.5 + 0 == 74724 --> 77824 (3100)
(15)      3 ( 0.0%,100.0%): 74724 * 1.5 + 0 == 112086 --> 114688 (2602)
(16)      3 ( 0.0%,100.0%): 112086 * 1.5 + 0 == 168129 --> 172032 (3903)
(17)      2 ( 0.0%,100.0%): 168129 * 1.5 + 0 == 252193 --> 253952 (1759)
(18)      1 ( 0.0%,100.0%): 252193 * 1.5 + 0 == 378289 --> 380928 (2639)

The most common case, 256->384, results in no waste, but all the others give lots of waste.  The cumulative waste total is 14,340,985 bytes.  I think AssemblerBuffer is fairly short-lived so not many of these bytes are live at any one time but it's still worth fixing this.

If we just change the growth factor from 1.5 to 2, we get this:

13417 lines:
( 1)   4645 (34.6%, 34.6%): 256 * 2 + 0 == 512 --> 512 (0)
( 2)   3424 (25.5%, 60.1%): 512 * 2 + 0 == 1024 --> 1024 (0)
( 3)   2562 (19.1%, 79.2%): 1024 * 2 + 0 == 2048 --> 2048 (0)
( 4)   1517 (11.3%, 90.5%): 2048 * 2 + 0 == 4096 --> 4096 (0)
( 5)    784 ( 5.8%, 96.4%): 4096 * 2 + 0 == 8192 --> 8192 (0)
( 6)    351 ( 2.6%, 99.0%): 8192 * 2 + 0 == 16384 --> 16384 (0)
( 7)    107 ( 0.8%, 99.8%): 16384 * 2 + 0 == 32768 --> 32768 (0)
( 8)     18 ( 0.1%, 99.9%): 32768 * 2 + 0 == 65536 --> 65536 (0)
( 9)      6 ( 0.0%,100.0%): 65536 * 2 + 0 == 131072 --> 131072 (0)
(10)      2 ( 0.0%,100.0%): 131072 * 2 + 0 == 262144 --> 262144 (0)
(11)      1 ( 0.0%,100.0%): 262144 * 2 + 0 == 524288 --> 524288 (0)

Zero waste, and roughly 40% fewer malloc/realloc calls because of (a) less waste, and (b) faster growth rate.

The attached patch makes this change to the growth rate.  It also removes append(), which is unused, and this allowed me to remove extraCapacity, thus guaranteeing that the allocation amounts will always be powers of two.

On Sunspider the patch gives a small but clear reduction in instruction counts (measured on a 64-bit shell with '-j -m -n -p'), so it'll definitely help on the much larger codebases that we compile on real sites like gmail:

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|    53.397    53.295 (1.002x) |    19.438    19.438 (------) | 3d-cube
|    43.655    43.642 (------) |    26.796    26.795 (------) | 3d-morph
|    57.449    57.393 (1.001x) |    24.341    24.341 (------) | 3d-raytrace
|    15.170    15.156 (1.001x) |     8.104     8.104 (------) | access-binary-
|    50.813    50.795 (------) |    43.132    43.132 (------) | access-fannkuc
|    19.241    19.227 (1.001x) |    11.912    11.912 (------) | access-nbody
|    18.924    18.921 (------) |    14.900    14.900 (------) | access-nsieve
|     8.540     8.537 (------) |     5.335     5.335 (------) | bitops-3bit-bi
|    29.286    29.282 (------) |    26.222    26.222 (------) | bitops-bits-in
|    24.985    24.985 (------) |    22.200    22.200 (------) | bitops-bitwise
|    29.055    29.053 (------) |    25.415    25.415 (------) | bitops-nsieve-
|    15.460    15.449 (1.001x) |    11.625    11.625 (------) | controlflow-re
|    26.264    26.234 (1.001x) |     7.856     7.856 (------) | crypto-md5
|    19.451    19.428 (1.001x) |     8.383     8.383 (------) | crypto-sha1
|    61.176    61.182 (------) |    13.092    13.092 (------) | date-format-to
|    61.241    61.229 (------) |     9.534     9.534 (------) | date-format-xp
|    24.562    24.561 (------) |    20.721    20.721 (------) | math-cordic
|    34.751    34.746 (------) |    12.459    12.459 (------) | math-partial-s
|    14.987    14.979 (1.001x) |     9.372     9.372 (------) | math-spectral-
|    45.492    45.494 (------) |    34.486    34.486 (------) | regexp-dna
|    36.756    36.739 (------) |    10.449    10.449 (------) | string-base64
|    41.397    41.371 (1.001x) |    24.435    24.435 (------) | string-fasta
|    92.143    92.128 (------) |    17.511    17.511 (------) | string-tagclou
|   113.711   113.473 (1.002x) |    12.147    12.147 (------) | string-unpack-
|    39.128    39.119 (------) |    10.305    10.305 (------) | string-validat
-------
|   977.046   976.427 (1.001x) |   430.181   430.180 (------) | all
Comment 1 Nicholas Nethercote [:njn] 2011-09-02 03:50:21 PDT
> The cumulative waste total is 14,340,985 bytes.

And the cumulative non-waste total is 46,685,959 bytes.  So the waste proportion is 23.5%.
Comment 2 Nicholas Nethercote [:njn] 2011-09-04 16:11:32 PDT
It turns out that append() is used in AssemblerBufferWithConstantPool.h, so I didn't remove it.

http://hg.mozilla.org/integration/mozilla-inbound/rev/f31743731039
Comment 3 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-09-05 05:06:22 PDT
http://hg.mozilla.org/mozilla-central/rev/f31743731039

Note You need to log in before you can comment on or make changes to this bug.