Closed Bug 611400 Opened 14 years ago Closed 14 years ago

mjit::Compiler::finishThisUp: pc->ncode map wastes space

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: jseward, Assigned: jseward)

References

Details

(Keywords: memory-footprint, Whiteboard: [jsperf][fixed-in-tracemonkey])

Attachments

(1 file, 2 obsolete files)

mjit::Compiler::finishThisUp callocs space to put a bunch of stuff in, including a pc->ncode map. The map is mostly empty. An extreme case is on kraken-1.1/imaging-darkroom, which for a 32-bit build has a max-heap-liveness of 102MB. Of that, 12.2MB is occupied by data calloc'd here, and that space is completely unused: -------------------- 4 of 100 -------------------- max-live: 12,197,056 in 1 blocks tot-alloc: 12,197,056 in 1 blocks (avg size 12197056.00) deaths: 1, at avg age 1,432,667,675 (42.98% of prog lifetime) acc-ratios: 0.00 rd, 0.00 wr (59 b-read, 129 b-written) at 0x47B44FF: calloc (vg_replace_malloc.c:467) by 0x81FF24D: js::mjit::Compiler::finishThisUp(js::mjit::JITSc by 0x8215E47: js::mjit::Compiler::performCompilation(js::mjit: by 0xC2F402F: ??? In this case a total of 3050540 map slots are allocated, of which just 18 are non-zero. I guess this is because of the huge tables of constants in this test (others behave similarly). ------ Problem is more general though. Running a browser with 5 reasonably complex tabs open, there are in total 350498 slots allocated, of which 7049 are non-zero. Estimating (from other data) that 3/4 of those slots are simultaneously live in the browser, that's 3/4 * (350498 - 7049) * sizeof(void*) = 2.06 MB of zeroes on a 64-bit target. That's more than 2% of the total C++ heap size in this run (89.2MB). ------ Could we instead use an array of word-pairs, and binary-search as necessary? This would remove around 95% of the wasted space.
Whiteboard: [jsperf]
Implements non sparse mapping w/ binary search, as per comment #0. Reduces max live heap running kraken-1.1/imaging-darkroom from 102.5MB to 90.0MB (32-bit Linux). When running all of kraken-1.1 on 32-bit linux, there are no obvious performance penalties. Insn count falls marginally from 35,187 million to 35,172 million. Miss count on a 4MB L3 cache falls from 35.5 million to 35.0 million. There are about 220000 binary searches performed.
(In reply to comment #1) > Created attachment 490131 [details] [diff] [review] SS numbers (ss 0.9.1 from svn.webkit.org/repository/webkit/trunk/SunSpider + pbiggar's make-it-deterministic patch in 580532 comment 32) on 32-bit x86 linux In short, a 0.0193% increase in instruction count. OTOH it's a nano-win on kraken in both insn counts and L3 misses, and it does measurably reduce the heap size of the entire browser. ------------------------------------------------------------------ instructions executed (nb: "on-trace" may overestimate slightly) ------------------------------------------------------------------ 3d-cube: total 80,989,452 81,015,351 -- on-trace 45,103,763 45,103,789 -- 3d-morph: total 40,352,790 40,353,703 -- on-trace 23,498,398 23,498,392 -- 3d-raytrace: total 104,305,853 104,360,555 *1.001x worse* on-trace 41,235,219 41,235,247 -- access-binary-trees: total 26,369,799 26,372,751 -- on-trace 12,223,384 12,223,376 -- access-fannkuch: total 89,080,367 89,101,292 -- on-trace 79,049,944 79,049,964 -- access-nbody: total 29,204,526 29,219,975 *1.001x worse* on-trace 15,532,117 15,532,109 -- access-nsieve: total 36,560,446 36,558,001 -- on-trace 25,245,377 25,245,369 -- bitops-3bit-bits-in-byte: total 7,463,084 7,463,382 -- on-trace 3,250,323 3,250,315 -- bitops-bits-in-byte: total 36,858,840 36,861,468 -- on-trace 32,524,253 32,524,245 -- bitops-bitwise-and: total 15,871,627 15,871,542 -- on-trace 12,016,982 12,016,974 -- bitops-nsieve-bits: total 38,737,937 38,736,131 -- on-trace 33,392,857 33,392,853 -- controlflow-recursive: total 17,387,529 17,387,196 -- on-trace 13,169,276 13,169,268 -- crypto-aes: total 69,908,532 69,964,419 *1.001x worse* on-trace 27,946,704 27,946,834 -- crypto-md5: total 32,603,170 32,602,828 -- on-trace 4,787,562 4,787,560 -- crypto-sha1: total 17,228,155 17,236,699 -- on-trace 6,623,850 6,623,852 -- date-format-tofte: total 69,431,577 69,439,492 -- on-trace 21,307,289 21,307,281 -- date-format-xparb: total 69,658,521 69,711,965 *1.001x worse* on-trace 9,670,251 9,670,243 -- math-cordic: total 43,487,414 43,487,827 -- on-trace 29,662,897 29,662,889 -- math-partial-sums: total 22,765,364 22,770,643 -- on-trace 6,270,213 6,270,205 -- math-spectral-norm: total 22,422,499 22,430,152 -- on-trace 13,634,579 13,634,577 -- regexp-dna: total 49,520,074 49,523,108 -- on-trace 34,587,322 34,587,314 -- string-base64: total 30,244,170 30,253,351 -- on-trace 9,282,553 9,282,545 -- string-fasta: total 85,531,280 85,531,893 -- on-trace 24,495,021 24,495,023 -- string-tagcloud: total 106,235,981 106,248,739 -- on-trace 17,118,481 17,118,473 -- string-unpack-code: total 135,209,381 135,166,286 -- on-trace 20,853,993 20,853,985 -- string-validate-input: total 43,909,353 43,924,497 -- on-trace 8,480,687 8,480,683 -- ------- all: total 1,321,337,721 1,321,593,246 -- on-trace 570,963,295 570,963,365 --
Perf numbers in comments 1 and 2 are with "-j -m", I should add.
Attachment #490131 - Flags: review?(dvander)
(In reply to comment #3) > Perf numbers in comments 1 and 2 are with "-j -m", I should add. Running with '-j -m -p' is best; that's what the browser effectively does. Not that it'll make a skerrick of difference in this case, I imagine.
(In reply to comment #4) > Running with '-j -m -p' is best; I'm not sure what a skerrick is. Perhaps a small boat, a bit like a coracle? Anyway, with -j -m -p the increase in instruction count on SS is reduced, from 0.0193% (-j -m) to 0.0139%, or about 1 part in 7200. The space savings on imaging-darkroom are unchanged. ------------------------------------------------------------------ instructions executed (nb: "on-trace" may overestimate slightly) ------------------------------------------------------------------ 3d-cube: total 69,741,119 69,786,418 *1.001x worse* on-trace 44,497,984 44,497,992 -- 3d-morph: total 39,541,147 39,546,921 -- on-trace 24,587,583 24,587,575 -- 3d-raytrace: total 70,410,616 70,432,055 -- on-trace 41,270,450 41,270,442 -- access-binary-trees: total 25,595,255 25,597,805 -- on-trace 12,242,688 12,242,680 -- access-fannkuch: total 93,605,442 93,610,511 -- on-trace 88,475,968 88,475,960 -- access-nbody: total 28,391,512 28,397,098 -- on-trace 16,274,177 16,274,169 -- access-nsieve: total 36,615,173 36,618,067 -- on-trace 30,207,428 30,207,420 -- bitops-3bit-bits-in-byte: total 7,489,837 7,490,205 -- on-trace 3,256,081 3,256,073 -- bitops-bits-in-byte: total 35,331,790 35,332,687 -- on-trace 30,401,150 30,401,142 -- bitops-bitwise-and: total 15,885,196 15,885,166 -- on-trace 12,019,216 12,019,208 -- bitops-nsieve-bits: total 38,515,088 38,518,480 -- on-trace 33,394,926 33,394,922 -- controlflow-recursive: total 17,387,939 17,387,650 -- on-trace 13,169,288 13,169,280 -- crypto-aes: total 56,004,894 56,017,085 -- on-trace 32,159,499 32,159,491 -- crypto-md5: total 24,715,625 24,736,064 *1.001x worse* on-trace 12,371,649 12,371,647 -- crypto-sha1: total 20,144,832 20,151,979 -- on-trace 8,772,674 8,772,676 -- date-format-tofte: total 68,374,916 68,381,923 -- on-trace 21,500,778 21,500,770 -- date-format-xparb: total 68,297,325 68,304,002 -- on-trace 10,329,790 10,329,782 -- math-cordic: total 43,610,475 43,610,511 -- on-trace 29,515,359 29,515,351 -- math-partial-sums: total 22,881,300 22,887,020 -- on-trace 6,310,886 6,310,878 -- math-spectral-norm: total 32,735,683 32,736,644 -- on-trace 27,545,614 27,545,606 -- regexp-dna: total 49,156,069 49,158,150 -- on-trace 34,581,182 34,581,174 -- string-base64: total 30,431,623 30,434,918 -- on-trace 9,320,547 9,320,539 -- string-fasta: total 64,711,451 64,714,231 -- on-trace 32,459,728 32,459,730 -- string-tagcloud: total 105,565,938 105,569,290 -- on-trace 17,153,732 17,153,724 -- string-unpack-code: total 133,481,056 133,481,308 -- on-trace 20,864,422 20,864,414 -- string-validate-input: total 44,023,802 44,031,979 -- on-trace 8,583,556 8,583,552 -- ------- all: total 1,242,645,103 1,242,818,167 -- on-trace 621,266,355 621,266,197 -- (gdb) p ((1.0 / (1242645103.0 / 1242818167.0))-1.0) * 100 $4 = 0.01392706570702007
Attached patch remove partially redundant load (obsolete) — Splinter Review
1-line change compared to previous version. Reduces instruction count overhead on sunspider down to 1 part in 7500.
Attachment #490131 - Attachment is obsolete: true
Attachment #490575 - Flags: review?(dvander)
Attachment #490131 - Flags: review?(dvander)
Comment on attachment 490575 [details] [diff] [review] remove partially redundant load Sorry for the delay here. Thanks for doing this, I didn't realize nmap was so memory hungry. Patch looks good but would like to see some style changes before r+'ing, I promise faster turnaround :) >+ if (nNmapLive > 0) { >+ for (size_t i = 0; i < script->length; i++) { >+ analyze::Bytecode *opinfo = analysis->maybeCode(i); >+ if (opinfo && opinfo->safePoint) { >+ Label L = jumpMap[i]; >+ JS_ASSERT(L.isValid()); >+ nmap[ix++] = (void*)(uintptr_t)i; >+ nmap[ix++] = (uint8 *)(result + masm.distanceOf(L)); >+ } Could this be something like: struct NativeMapEntry { jsbytecode *pc; void *ncode; }; instead? >+inline void * bsearch_nmap(void** map, size_t nPairs, void* key) >+{ >+ ssize_t lo = 0, hi = nPairs-1; /* lo, hi, mid must be signed! */ >+ while (1) { >+ /* current unsearched space is from lo to hi, inclusive. */ >+ if (lo > hi) return NULL; /* not found */ >+ ssize_t mid = (lo + hi) / 2; >+ void* key_mid = map[2 * mid]; >+ if (key < key_mid) { hi = mid-1; continue; } >+ if (key > key_mid) { lo = mid+1; continue; } >+ return map[2 * mid + 1]; >+ } SpiderMonkey style wants >+ if (lo > hi) return NULL; /* not found */ To be > if (low > hi) > return NULL; And >+ if (key > key_mid) { lo = mid+1; continue; } To be > if (key > key_mid) { > lo = mid + 1; > continue; > } Etc > inline void * > JSScript::nativeCodeForPC(bool constructing, jsbytecode *pc) > { >- void **nmap = nativeMap(constructing); >+ js::mjit::JITScript *jit = getJIT(constructing); >+ JS_ASSERT(jit); You can remove this assert, since it'll null-deref pretty quickly. >+ bool isCon = f.fp()->isConstructing(); Prefer "ctor" to "isCon" here and elsewhere
Attachment #490575 - Flags: review?(dvander)
Addresses all the c7 issues (thx for feedback). Also gets rid of use of 'ssize_t' in bsearch_nmap as MSVC9 doesn't take kindly to it.
Attachment #490575 - Attachment is obsolete: true
Attachment #494730 - Flags: review?(dvander)
Attachment #494730 - Flags: review?(dvander) → review+
Keywords: checkin-needed
Blocks: 598466
blocking2.0: --- → beta9+
Assignee: general → jseward
Status: NEW → ASSIGNED
Whiteboard: [jsperf] → [jsperf][fixed-in-tracemonkey]
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Keywords: footprint
As per today's meeting, beta 9 will be a time-based release. Marking these all betaN+. Please move it back to beta9+ if you believe it MUST be in the next beta (ie: trunk is in an unshippable state without this)
blocking2.0: beta9+ → betaN+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: