Last Comment Bug 611400 - mjit::Compiler::finishThisUp: pc->ncode map wastes space
: mjit::Compiler::finishThisUp: pc->ncode map wastes space
Status: RESOLVED FIXED
[jsperf][fixed-in-tracemonkey]
: footprint
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- normal with 2 votes (vote)
: ---
Assigned To: Julian Seward [:jseward]
:
:
Mentors:
Depends on:
Blocks: 598466 JaegerShrink
  Show dependency treegraph
 
Reported: 2010-11-11 11:31 PST by Julian Seward [:jseward]
Modified: 2011-01-04 15:32 PST (History)
13 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
betaN+


Attachments
change to non-sparse mapping w/ binary search (10.43 KB, patch)
2010-11-12 09:54 PST, Julian Seward [:jseward]
no flags Details | Diff | Splinter Review
remove partially redundant load (10.43 KB, patch)
2010-11-15 08:19 PST, Julian Seward [:jseward]
no flags Details | Diff | Splinter Review
respin, addressing c7 feedback (10.96 KB, patch)
2010-12-02 08:53 PST, Julian Seward [:jseward]
dvander: review+
Details | Diff | Splinter Review

Description Julian Seward [:jseward] 2010-11-11 11:31:06 PST
mjit::Compiler::finishThisUp callocs space to put a bunch of stuff
in, including a pc->ncode map.  The map is mostly empty.

An extreme case is on kraken-1.1/imaging-darkroom, which for a 32-bit build
has a max-heap-liveness of 102MB.  Of that, 12.2MB is occupied by
data calloc'd here, and that space is completely unused:

  -------------------- 4 of 100 --------------------
  max-live:    12,197,056 in 1 blocks
  tot-alloc:   12,197,056 in 1 blocks (avg size 12197056.00)
  deaths:      1, at avg age 1,432,667,675 (42.98% of prog lifetime)
  acc-ratios:  0.00 rd, 0.00 wr  (59 b-read, 129 b-written)
      at 0x47B44FF: calloc (vg_replace_malloc.c:467)
      by 0x81FF24D: js::mjit::Compiler::finishThisUp(js::mjit::JITSc
      by 0x8215E47: js::mjit::Compiler::performCompilation(js::mjit:
      by 0xC2F402F: ???

In this case a total of 3050540 map slots are allocated, of which
just 18 are non-zero.  I guess this is because of the huge tables of
constants in this test (others behave similarly).

------

Problem is more general though.  Running a browser with 5 reasonably
complex tabs open, there are in total 350498 slots allocated, of which
7049 are non-zero.

Estimating (from other data) that 3/4 of those slots are
simultaneously live in the browser, that's 3/4 * (350498 - 7049) *
sizeof(void*) = 2.06 MB of zeroes on a 64-bit target.  That's more
than 2% of the total C++ heap size in this run (89.2MB).

------

Could we instead use an array of word-pairs, and binary-search as
necessary?  This would remove around 95% of the wasted space.
Comment 1 Julian Seward [:jseward] 2010-11-12 09:54:29 PST
Created attachment 490131 [details] [diff] [review]
change to non-sparse mapping w/ binary search

Implements non sparse mapping w/ binary search, as per comment #0.

Reduces max live heap running kraken-1.1/imaging-darkroom from 102.5MB
to 90.0MB (32-bit Linux).

When running all of kraken-1.1 on 32-bit linux, there are no obvious
performance penalties.  Insn count falls marginally from 35,187
million to 35,172 million.  Miss count on a 4MB L3 cache falls from
35.5 million to 35.0 million.  There are about 220000 binary searches
performed.
Comment 2 Julian Seward [:jseward] 2010-11-12 16:21:57 PST
(In reply to comment #1)
> Created attachment 490131 [details] [diff] [review]

SS numbers (ss 0.9.1 from
svn.webkit.org/repository/webkit/trunk/SunSpider + pbiggar's
make-it-deterministic patch in 580532 comment 32)

on 32-bit x86 linux

In short, a 0.0193% increase in instruction count.  OTOH it's a
nano-win on kraken in both insn counts and L3 misses, and it does
measurably reduce the heap size of the entire browser.


------------------------------------------------------------------
instructions executed (nb: "on-trace" may overestimate slightly)
------------------------------------------------------------------
3d-cube:
  total          80,989,452      81,015,351     -- 
  on-trace       45,103,763      45,103,789     -- 
3d-morph:
  total          40,352,790      40,353,703     -- 
  on-trace       23,498,398      23,498,392     -- 
3d-raytrace:
  total         104,305,853     104,360,555    *1.001x worse*
  on-trace       41,235,219      41,235,247     -- 
access-binary-trees:
  total          26,369,799      26,372,751     -- 
  on-trace       12,223,384      12,223,376     -- 
access-fannkuch:
  total          89,080,367      89,101,292     -- 
  on-trace       79,049,944      79,049,964     -- 
access-nbody:
  total          29,204,526      29,219,975    *1.001x worse*
  on-trace       15,532,117      15,532,109     -- 
access-nsieve:
  total          36,560,446      36,558,001     -- 
  on-trace       25,245,377      25,245,369     -- 
bitops-3bit-bits-in-byte:
  total           7,463,084       7,463,382     -- 
  on-trace        3,250,323       3,250,315     -- 
bitops-bits-in-byte:
  total          36,858,840      36,861,468     -- 
  on-trace       32,524,253      32,524,245     -- 
bitops-bitwise-and:
  total          15,871,627      15,871,542     -- 
  on-trace       12,016,982      12,016,974     -- 
bitops-nsieve-bits:
  total          38,737,937      38,736,131     -- 
  on-trace       33,392,857      33,392,853     -- 
controlflow-recursive:
  total          17,387,529      17,387,196     -- 
  on-trace       13,169,276      13,169,268     -- 
crypto-aes:
  total          69,908,532      69,964,419    *1.001x worse*
  on-trace       27,946,704      27,946,834     -- 
crypto-md5:
  total          32,603,170      32,602,828     -- 
  on-trace        4,787,562       4,787,560     -- 
crypto-sha1:
  total          17,228,155      17,236,699     -- 
  on-trace        6,623,850       6,623,852     -- 
date-format-tofte:
  total          69,431,577      69,439,492     -- 
  on-trace       21,307,289      21,307,281     -- 
date-format-xparb:
  total          69,658,521      69,711,965    *1.001x worse*
  on-trace        9,670,251       9,670,243     -- 
math-cordic:
  total          43,487,414      43,487,827     -- 
  on-trace       29,662,897      29,662,889     -- 
math-partial-sums:
  total          22,765,364      22,770,643     -- 
  on-trace        6,270,213       6,270,205     -- 
math-spectral-norm:
  total          22,422,499      22,430,152     -- 
  on-trace       13,634,579      13,634,577     -- 
regexp-dna:
  total          49,520,074      49,523,108     -- 
  on-trace       34,587,322      34,587,314     -- 
string-base64:
  total          30,244,170      30,253,351     -- 
  on-trace        9,282,553       9,282,545     -- 
string-fasta:
  total          85,531,280      85,531,893     -- 
  on-trace       24,495,021      24,495,023     -- 
string-tagcloud:
  total         106,235,981     106,248,739     -- 
  on-trace       17,118,481      17,118,473     -- 
string-unpack-code:
  total         135,209,381     135,166,286     -- 
  on-trace       20,853,993      20,853,985     -- 
string-validate-input:
  total          43,909,353      43,924,497     -- 
  on-trace        8,480,687       8,480,683     -- 
-------
all:
  total       1,321,337,721   1,321,593,246     -- 
  on-trace      570,963,295     570,963,365     --
Comment 3 Julian Seward [:jseward] 2010-11-12 16:36:29 PST
Perf numbers in comments 1 and 2 are with "-j -m", I should add.
Comment 4 Nicholas Nethercote [:njn] 2010-11-12 17:37:08 PST
(In reply to comment #3)
> Perf numbers in comments 1 and 2 are with "-j -m", I should add.

Running with '-j -m -p' is best;  that's what the browser effectively does.  Not that it'll make a skerrick of difference in this case, I imagine.
Comment 5 Julian Seward [:jseward] 2010-11-12 23:36:19 PST
(In reply to comment #4)
> Running with '-j -m -p' is best; 

I'm not sure what a skerrick is.  Perhaps a small boat, a bit like a
coracle?

Anyway, with -j -m -p the increase in instruction count on SS is
reduced, from 0.0193% (-j -m) to 0.0139%, or about 1 part in 7200.
The space savings on imaging-darkroom are unchanged.


------------------------------------------------------------------
instructions executed (nb: "on-trace" may overestimate slightly)
------------------------------------------------------------------
3d-cube:
  total          69,741,119      69,786,418    *1.001x worse*
  on-trace       44,497,984      44,497,992     -- 
3d-morph:
  total          39,541,147      39,546,921     -- 
  on-trace       24,587,583      24,587,575     -- 
3d-raytrace:
  total          70,410,616      70,432,055     -- 
  on-trace       41,270,450      41,270,442     -- 
access-binary-trees:
  total          25,595,255      25,597,805     -- 
  on-trace       12,242,688      12,242,680     -- 
access-fannkuch:
  total          93,605,442      93,610,511     -- 
  on-trace       88,475,968      88,475,960     -- 
access-nbody:
  total          28,391,512      28,397,098     -- 
  on-trace       16,274,177      16,274,169     -- 
access-nsieve:
  total          36,615,173      36,618,067     -- 
  on-trace       30,207,428      30,207,420     -- 
bitops-3bit-bits-in-byte:
  total           7,489,837       7,490,205     -- 
  on-trace        3,256,081       3,256,073     -- 
bitops-bits-in-byte:
  total          35,331,790      35,332,687     -- 
  on-trace       30,401,150      30,401,142     -- 
bitops-bitwise-and:
  total          15,885,196      15,885,166     -- 
  on-trace       12,019,216      12,019,208     -- 
bitops-nsieve-bits:
  total          38,515,088      38,518,480     -- 
  on-trace       33,394,926      33,394,922     -- 
controlflow-recursive:
  total          17,387,939      17,387,650     -- 
  on-trace       13,169,288      13,169,280     -- 
crypto-aes:
  total          56,004,894      56,017,085     -- 
  on-trace       32,159,499      32,159,491     -- 
crypto-md5:
  total          24,715,625      24,736,064    *1.001x worse*
  on-trace       12,371,649      12,371,647     -- 
crypto-sha1:
  total          20,144,832      20,151,979     -- 
  on-trace        8,772,674       8,772,676     -- 
date-format-tofte:
  total          68,374,916      68,381,923     -- 
  on-trace       21,500,778      21,500,770     -- 
date-format-xparb:
  total          68,297,325      68,304,002     -- 
  on-trace       10,329,790      10,329,782     -- 
math-cordic:
  total          43,610,475      43,610,511     -- 
  on-trace       29,515,359      29,515,351     -- 
math-partial-sums:
  total          22,881,300      22,887,020     -- 
  on-trace        6,310,886       6,310,878     -- 
math-spectral-norm:
  total          32,735,683      32,736,644     -- 
  on-trace       27,545,614      27,545,606     -- 
regexp-dna:
  total          49,156,069      49,158,150     -- 
  on-trace       34,581,182      34,581,174     -- 
string-base64:
  total          30,431,623      30,434,918     -- 
  on-trace        9,320,547       9,320,539     -- 
string-fasta:
  total          64,711,451      64,714,231     -- 
  on-trace       32,459,728      32,459,730     -- 
string-tagcloud:
  total         105,565,938     105,569,290     -- 
  on-trace       17,153,732      17,153,724     -- 
string-unpack-code:
  total         133,481,056     133,481,308     -- 
  on-trace       20,864,422      20,864,414     -- 
string-validate-input:
  total          44,023,802      44,031,979     -- 
  on-trace        8,583,556       8,583,552     -- 
-------
all:
  total       1,242,645,103   1,242,818,167     -- 
  on-trace      621,266,355     621,266,197     -- 


(gdb) p ((1.0 / (1242645103.0 / 1242818167.0))-1.0) * 100
$4 = 0.01392706570702007
Comment 6 Julian Seward [:jseward] 2010-11-15 08:19:49 PST
Created attachment 490575 [details] [diff] [review]
remove partially redundant load

1-line change compared to previous version.  Reduces instruction
count overhead on sunspider down to 1 part in 7500.
Comment 7 David Anderson [:dvander] 2010-11-23 11:48:29 PST
Comment on attachment 490575 [details] [diff] [review]
remove partially redundant load

Sorry for the delay here. Thanks for doing this, I didn't realize nmap was so memory hungry. Patch looks good but would like to see some style changes before r+'ing, I promise faster turnaround :)

>+    if (nNmapLive > 0) {
>+        for (size_t i = 0; i < script->length; i++) {
>+            analyze::Bytecode *opinfo = analysis->maybeCode(i);
>+            if (opinfo && opinfo->safePoint) {
>+                Label L = jumpMap[i];
>+                JS_ASSERT(L.isValid());
>+                nmap[ix++] = (void*)(uintptr_t)i;
>+                nmap[ix++] = (uint8 *)(result + masm.distanceOf(L));
>+            }

Could this be something like:
  struct NativeMapEntry {
    jsbytecode *pc;
    void       *ncode;
  };

instead?

>+inline void * bsearch_nmap(void** map, size_t nPairs, void* key)
>+{
>+    ssize_t lo = 0, hi = nPairs-1; /* lo, hi, mid must be signed! */
>+    while (1) {
>+        /* current unsearched space is from lo to hi, inclusive. */
>+        if (lo > hi) return NULL; /* not found */
>+        ssize_t mid     = (lo + hi) / 2;
>+        void*   key_mid = map[2 * mid];
>+        if (key < key_mid) { hi = mid-1; continue; } 
>+        if (key > key_mid) { lo = mid+1; continue; }
>+        return map[2 * mid + 1];
>+    }

SpiderMonkey style wants 

>+        if (lo > hi) return NULL; /* not found */

To be

>         if (low > hi)
>             return NULL;

And 

>+        if (key > key_mid) { lo = mid+1; continue; }

To be

>         if (key > key_mid) {
>             lo = mid + 1;
>             continue;
>         }

Etc          

> inline void *
> JSScript::nativeCodeForPC(bool constructing, jsbytecode *pc)
> {
>-    void **nmap = nativeMap(constructing);
>+    js::mjit::JITScript *jit = getJIT(constructing);
>+    JS_ASSERT(jit);

You can remove this assert, since it'll null-deref pretty quickly.

>+    bool isCon = f.fp()->isConstructing();

Prefer "ctor" to "isCon" here and elsewhere
Comment 8 Julian Seward [:jseward] 2010-12-02 08:53:19 PST
Created attachment 494730 [details] [diff] [review]
respin, addressing c7 feedback

Addresses all the c7 issues (thx for feedback).  Also gets
rid of use of 'ssize_t' in bsearch_nmap as MSVC9 doesn't
take kindly to it.
Comment 9 David Anderson [:dvander] 2010-12-08 12:13:21 PST
http://hg.mozilla.org/tracemonkey/rev/92a5b1438bae
Comment 11 christian 2011-01-04 15:32:53 PST
As per today's meeting, beta 9 will be a time-based release. Marking these all betaN+. Please move it back to beta9+ if  you believe it MUST be in the next beta (ie: trunk is in an unshippable state without this)

Note You need to log in before you can comment on or make changes to this bug.