Closed
Bug 492673
Opened 15 years ago
Closed 15 years ago
TM: Fix memory pressure measurement
Categories
(Core :: JavaScript Engine, defect, P2)
RESOLVED
FIXED
People
(Reporter: Natch, Assigned: graydon)
Details
(Keywords: fixed1.9.1, regression, Whiteboard: fixed-in-tracemonkey)
Attachments
(2 files)
patch, 739 bytes — Details | Diff | Splinter Review
patch, 728 bytes — brendan: review+, gal: review+ — Details | Diff | Splinter Review
Gonna try to nail down a regression range for this, but in the past I would get numbers in the mid-800's, now it's down to about 150.
Comment 1 (Reporter) • 15 years ago
Bug is on 1.9.1 as well.

Regression range:
20090414: works
20090415: regressed

pushlog: http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=49643dfd3008&tochange=78a21b8efe1b

I don't have a tracemonkey tree for building, so I can't easily bisect, but I hope this is a start. Make sure you purge the browser cache before attempting to bisect, as it seems to play with the results.
Flags: blocking1.9.1?
Keywords: regression
Updated • 15 years ago
Assignee: general → gal
Comment 2 • 15 years ago
This is easy to reproduce in the shell. We get a 1000+ score if run separately, but if run as part of the other tests we get hammered (280).

h-243:v8 gal$ ../Darwin_OPT.OBJ/js -j run-crypto.js
Crypto: 1054
----
Score: 1054

h-243:v8 gal$ ../Darwin_OPT.OBJ/js -j run.js
Richards: 1758
DeltaBlue: 169
Crypto: 284
RayTrace: 244
EarleyBoyer: 390
----
Score: 381
Comment 3 • 15 years ago
I think we should block on this until it's known what causes it, mostly to make sure we get someone to look at it in time to identify the severity. sayrer?
Comment 4 • 15 years ago
(I wouldn't strictly block on this if we are just slow for some low-risk, soon to be fixed-on-trunk reason, but I would like to be sure)
Updated • 15 years ago
Flags: blocking1.9.1? → blocking1.9.1+
Comment 5 • 15 years ago
Comment 6 • 15 years ago
Ok, this is a GC problem. Crypto hits the GC while running, causing performance to suck. This only happens when other stuff runs first, not when run solo.
Comment 7 • 15 years ago
(the attached patch makes the slowdown disappear by causing a gc between benchmarks)
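The workaround described in this comment can be sketched as follows. This is hypothetical harness code, not the actual v8 benchmark driver; `maybeGC` and `runSuite` are made-up names, and the real patch simply inserts a gc() call between benchmark runs in the shell harness:

```javascript
// Force a collection between benchmark runs so one benchmark's garbage
// cannot trigger a GC pause in the middle of the next one's timed loop.
// maybeGC falls back to a no-op when no gc() hook is exposed (the
// SpiderMonkey shell exposes gc(); node needs --expose-gc).
function maybeGC() {
  if (typeof gc === "function") gc();
}

function runSuite(benchmarks) {
  const scores = {};
  for (const bench of benchmarks) {
    maybeGC();               // drain garbage left by the previous benchmark
    scores[bench.name] = bench.run();
  }
  return scores;
}

// Toy stand-ins for the real suite entries.
const suite = [
  { name: "DeltaBlue", run: () => 14 },
  { name: "Crypto", run: () => 277 },
];
console.log(runSuite(suite));
```

With this tweak in place, the Crypto score in the report no longer depends on which benchmarks ran before it.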
Comment 8 • 15 years ago
I am just going to have to make a snippy comment about this benchmark program. People who manage to implement a crypto algorithm (integer data read from an array manipulated using arithmetic operations and stored into an array as integer data) in a way that causes excessive GC pressure shouldn't be allowed near computers. I will try to find out where most of the garbage is coming from, but I suspect the OO-ish BigNumber implementation.
Flags: blocking1.9.1+ → blocking1.9.1?
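The allocation-heavy style suspected in comment 8 can be illustrated like this (a hypothetical sketch, not the actual v8-crypto or BigNumber source): an OO-ish bignum implementation that returns a fresh digit array from every arithmetic operation generates garbage on each call, while an in-place style reusing a caller-supplied buffer generates none.

```javascript
// Allocation-heavy flavor: one new array per operation -> GC pressure.
function addFresh(a, b) {
  const out = new Array(a.length);
  let carry = 0;
  for (let i = 0; i < a.length; i++) {
    const s = a[i] + b[i] + carry;
    out[i] = s % 10;           // store one base-10 digit
    carry = (s / 10) | 0;
  }
  return out;
}

// In-place flavor: caller-supplied buffer, no allocation per call.
function addInPlace(a, b, out) {
  let carry = 0;
  for (let i = 0; i < a.length; i++) {
    const s = a[i] + b[i] + carry;
    out[i] = s % 10;
    carry = (s / 10) | 0;
  }
  return out;
}
```

Run in a tight loop over big numbers, the first flavor churns out short-lived arrays and can push a benchmark into a mid-run GC; the second keeps the heap flat.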
Comment 9 • 15 years ago
still a blocker. why should it have gotten slower?
Flags: blocking1.9.1? → blocking1.9.1+
Comment 10 • 15 years ago
It didn't get slower. It's just as fast as before. We are GCing during the benchmark, which kills the score. Previously the sun and the stars aligned slightly differently, causing the GC somewhere else.
Comment 11 (Reporter) • 15 years ago
Andreas: the same benchmark page was being used to test the speed; if there was a change in GC, that should (IMHO) be investigated:

1.9.1 (2009-04-14 build): mid-500s
trunk (2009-04-14 build): mid-800s
current 1.9.1: 70-90
current trunk: mid-100s
Comment 12 • 15 years ago
Yes, this definitely needs to be investigated. I am trying to hack together some annotations to better observe what happens. I think this can be fixed for 1.9.1.
Comment 13 • 15 years ago
(If someone wants to bisect down to the exact patch that caused this behavior first, that would be great).
Whiteboard: regression-window-wanted
Comment 14 (Reporter) • 15 years ago
hg bisect says:

The first bad revision is:
changeset: 27205:78a21b8efe1b
user: Brendan Eich <brendan@mozilla.org>
date: Wed Apr 15 01:57:13 2009 -0700
summary: Bug 488015 - Crash [@ js_GetUpvar ] (also bogus JS errors, also probably Crash [@ js_Interpret]) (future r=mrbkap, see bug).
Blocks: 488015
Updated (Reporter) • 15 years ago
Whiteboard: regression-window-wanted
Comment 15 • 15 years ago
I bisected the shell slowdown. It flags the following patch:

The first bad revision is:
changeset: 28170:05ea1afb04f8
user: Nicholas Nethercote <nnethercote@mozilla.com>
date: Sat May 09 14:38:34 2009 -0400
summary: Bug 488775 - TM: widen LIR instructions. r=graydon,edwsmith
Comment 16 • 15 years ago
Yeah, something is definitely wrong with the LIR patch:

h-243:v8 gal$ hg up -r 28169
9 files updated, 0 files merged, 0 files removed, 0 files unresolved
h-243:v8 gal$ TRACEMONKEY=stats ../Darwin_DBG.OBJ/js -j slow.js
DeltaBlue: 14
Crypto: 277
----
Score: 62
recorder: started(186), aborted(102), completed(321), different header(3), trees trashed(2), slot promoted(0), unstable loop variable(132), breaks(0), returns(2), unstableInnerCalls(30)
monitor: triggered(21250), exits(21250), type mismatch(0), global mismatch(1)

h-243:v8 gal$ hg up -r 28170
9 files updated, 0 files merged, 0 files removed, 0 files unresolved
h-243:v8 gal$ TRACEMONKEY=stats ../Darwin_DBG.OBJ/js -j slow.js
DeltaBlue: 14
Crypto: 177
----
Score: 50
recorder: started(678), aborted(91), completed(289), different header(2), trees trashed(2), slot promoted(0), unstable loop variable(128), breaks(0), returns(2), unstableInnerCalls(27)
monitor: triggered(21315), exits(21315), type mismatch(0), global mismatch(1)
Comment 17 • 15 years ago
I can replicate this, although the numbers are quite variable. But post the widen-LIR patch, crypto has worse performance when run after DeltaBlue.

The most interesting thing I've found so far is this: I've been working on a patch to narrow the LIR again, by compacting it more (but not as much as it was prior to the widening). I just tried the pre/post measurements with it, and crypto's speed rating jumps from around 200 to around 800 or 900! (That's a debug build.) This would seem to corroborate the theory that the increased LIR size is somehow causing the problem.
Comment 18 • 15 years ago
The increased LIR size is definitely the trigger, but we have to figure out what exactly happens. A 4x drop is scary.
Comment 19 • 15 years ago
Does the change bump it over a cache size?
Comment 20 • 15 years ago
What cache size?
Comment 21 • 15 years ago
I don't know. What CPU is being used? At least some Core 2's have 32 KB of L1 data cache and 32 KB of L1 instruction cache.
Comment 22 • 15 years ago
LIR is write once read once, both in quick succession. I think icache and dcache are irrelevant here, despite the possible increase in L2 write misses (which is probably simply due to higher write throughput).
Comment 23 • 15 years ago
Increasing the code cache size doesn't solve the problem. This bug isn't caused by the larger LIR patch. That merely triggers something else.

This is 28170 with 32MB code cache.
DeltaBlue: 14
Crypto: 158
----
Score: 47
recorder: started(677), aborted(90), completed(286), different header(3), trees trashed(2), slot promoted(0), unstable loop variable(128), breaks(0), returns(2), unstableInnerCalls(25)
monitor: triggered(21133), exits(21133), type mismatch(0), global mismatch(1)

This is 28170 with 64MB code cache.
DeltaBlue: 14
Crypto: 123
----
Score: 42
recorder: started(672), aborted(91), completed(270), different header(2), trees trashed(2), slot promoted(0), unstable loop variable(128), breaks(0), returns(2), unstableInnerCalls(24)
monitor: triggered(20101), exits(20101), type mismatch(0), global mismatch(1)

This is 28170 with 128MB code cache.
h-243:v8 gal$ TRACEMONKEY=stats ../Darwin_DBG.OBJ/js -j slow.js
DeltaBlue: 14
Crypto: 122
----
Score: 41
recorder: started(673), aborted(91), completed(270), different header(2), trees trashed(2), slot promoted(0), unstable loop variable(128), breaks(0), returns(2), unstableInnerCalls(24)
monitor: triggered(20101), exits(20101), type mismatch(0), global mismatch(1)
Comment 24 • 15 years ago
This is the slow revision with 16MB code cache (default):

h-243:v8 gal$ TRACEMONKEY=verbose ../Darwin_DBG.OBJ/js -j slow.js | grep "lush"
No global slotlist for global shape 1151, flushing cache.
Flushing cache.

We only flush once early on and never again.
Comment 25 • 15 years ago
The slow version allocates another 16MB slab for the code cache (one large mmap, 16MB). It probably costs some, but doesn't explain the stats difference.
Comment 26 • 15 years ago
The trace output just visually looks bad:

recording starting from slow.js:729@26
recording completed at slow.js:729@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
[the slow.js:682@26 record/complete pair repeats many more times]
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:682@26: No compatible inner tree.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
[this complete/abort cycle repeats many more times]
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
Comment 27 (Reporter) • 15 years ago
Andreas: this slowdown is visible on 1.9.1, where the LIR patch has *not* landed yet, so it cannot be the cause AFAICT. Was my bisection in comment 14 wrong? I don't see how, but I'll retest tonight...
Comment 28 • 15 years ago
I can't reproduce the slowdown with your revision. We have pretty much proven at this point that the patch I bisected triggers a bug, but doesn't cause it. I am working on the assumption that both your and my rev merely trigger and don't cause the same issue.
Comment 29 • 15 years ago
On my machine I get lots of variation in the timings. I wouldn't trust a single run of each version to give reliable numbers.
Comment 30 (Reporter) • 15 years ago
(In reply to comment #28) Err, I fail :( . I was sure the regression-range was right, turns out the original regression range I posted was totally off. /me tries to corroborate the regression-range again...
Comment 31 • 15 years ago
Some updated results:

- Running run.js (i.e. Richards, then DeltaBlue, then Crypto), I see the slowdown caused by widen-LIR: Crypto goes from 600-700 to about 250.
- Running just DeltaBlue then Crypto, I don't see a slowdown after widen-LIR: in both cases it's 250-ish.
- Inserting the gc() between benchmarks makes the pre/post-widen-LIR differences go away in both the above cases.

Conclusion? GC weirdness that just happens to manifest most strongly with run.js with the widen-LIR patch. But if you bisected on a different benchmark (e.g. just DeltaBlue then Crypto), I suspect you'd get a different patch to blame.
Comment 32 (Reporter) • 15 years ago
So the problem is that I can reproduce that regression range 100% with tracemonkey nightlies; however, when I build that revision myself, I can't reproduce it. The reason bisect gave me that revision is that it was the last one left: I hadn't tested it until now, and it does not regress for me. However, local builds of trunk and 3.5 are 100% reproducible for me, as are nightlies from tracemonkey. I'm gonna build tracemonkey tip and see how it goes for me...
No longer blocks: 488015
Comment 33 • 15 years ago
To clarify comment 31: it seems that the order in which benchmarks are run makes a huge difference, for whatever reason. That makes benchmarking very dubious, because the numbers we get within a run of v8, for example, are quite different from those when running a subset of v8, or v8 in a different order... it's quite worrying, really.
Comment 34 • 15 years ago
The shell harness doesn't GC and we might be seeing some oracle poisoning (which shouldn't happen since the oracle is supposed to be precise these days).
Comment 35 (Assignee) • 15 years ago
(In reply to comment #34)
> The shell harness doesn't GC and we might be seeing some oracle poisoning
> (which shouldn't happen since the oracle is supposed to be precise these days).

The oracle is *not* precise. You keep saying this; I don't know where you got that impression. I also don't have a completed patch to make it so long term, only one small debugging patch that makes a precise oracle using an STL map, which is entirely inappropriate for shipping. We'll need something more carefully done for the long term.

I can reproduce the GC-sensitivity of crypto in general now, definitely. On practically any rev. Something in deltablue kills it.

There are a few loops in deltablue that wind up somewhat odd. For example, the one anchored at line 741 winds up having 33 side exits. It's a 3-line loop with 1 branch (I guess 2, given the short-circuit eval in it) that only calls 4 functions, each of which is a one-liner that delegates to a relatively plain JS object method (length, push, etc.). 33 seems a bit high. Otherwise I don't really get why deltablue is thrashing so badly. Will keep digging.
Comment 36 (Assignee) • 15 years ago
Not the oracle (checked with precise-oracle debugging patch from 487337, also counted oracle entries and there are only a few). On to blacklisting!
Comment 37 (Assignee) • 15 years ago
Not related to blacklisting. Turning off compilation-backoff blacklisting makes the total blacklisting count drop to zero, but perf difference is still there.
Comment 38 (Assignee) • 15 years ago
Scratch that; it *may* be slightly related to blacklisting. Or rather, to the possibility of crypto needing to do some blacklisting, but not being able to. Or doing too much when it should only do a bit. I've spent several hours fiddling with the blacklist knob and it appears to be sensitive to it but not in an obvious fashion. Also curious fact discovered: it's quite sensitive to the setting of HOTLOOP, as is richards in the inverse direction (richards gets faster as HOTLOOP is decreased, crypto gets faster as it's increased). This is silly.
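The HOTLOOP sensitivity mentioned in comment 38 comes from how a tracing JIT decides when a loop is hot enough to record. This is an illustrative-only model, not the real TraceMonkey logic; `makeLoopMonitor` and its threshold argument are made-up names standing in for the engine's backedge counting:

```javascript
// A loop's backedge counter must cross the threshold before recording
// starts, so raising or lowering the HOTLOOP-style constant shifts
// which loops get traced and when. That is one way two benchmarks can
// be sensitive to the same knob in opposite directions.
function makeLoopMonitor(hotloop) {
  const counts = new Map();  // loop id -> backedges observed
  return function shouldRecord(loopId) {
    const n = (counts.get(loopId) || 0) + 1;
    counts.set(loopId, n);
    return n >= hotloop;     // start recording once the loop is "hot"
  };
}

const shouldRecord = makeLoopMonitor(2);
shouldRecord("outer");  // first backedge: not hot yet
shouldRecord("outer");  // second backedge: record now
```

A lower threshold traces short-lived loops sooner (helping Richards-style code); a higher one avoids compiling loops that never pay off (helping Crypto-style code), which matches the opposing sensitivities reported above.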
Updated • 15 years ago
Priority: -- → P2
Comment 40 (Assignee) • 15 years ago
I cannot believe this survived the way it is as long as it did. We *don't measure the right thing* for memory pressure! Deltablue happens to get close to the ceiling; with LIR-widen it actually uses up enough space that crypto tips over the line. Thereafter *every* recording attempt in crypto fails and attempts to flush the cache. That's why it "slows down".

Guh. Another day, another horrible bug I introduced. Sorry.
Attachment #377550 - Flags: review?(gal)
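The failure mode described in comment 40 can be sketched like this (all names here are illustrative, not the real TraceMonkey code): if the pressure check counts pages ever allocated instead of pages currently in use, the measured figure only ever grows, and once it crosses the ceiling every recording attempt "fails" and flushes the cache, producing a slowdown with no real speed regression.

```javascript
const PAGE_SIZE = 4096;
const CEILING = 16 << 20; // hypothetical 16MB cache ceiling

// Buggy flavor: ignores the free list, so usage is over-counted.
function usedWrong(cache) {
  return cache.pagesAllocated * PAGE_SIZE;
}

// Fixed flavor: pages allocated minus pages returned to the free list.
function usedRight(cache) {
  return (cache.pagesAllocated - cache.pagesFree) * PAGE_SIZE;
}

function overPressure(cache, fixed) {
  return (fixed ? usedRight(cache) : usedWrong(cache)) > CEILING;
}

// ~20MB ever allocated, but ~12MB of it has been freed:
const cache = { pagesAllocated: 5000, pagesFree: 3000 };
overPressure(cache, false); // buggy check: over the ceiling, flush every time
overPressure(cache, true);  // fixed check: well under the ceiling
```

Under the buggy measurement, a benchmark that merely churns through cache pages looks like it is permanently over the limit, which matches the "every recording attempt fails and flushes" behavior described above.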
Comment 41 • 15 years ago
Comment on attachment 377550 [details] [diff] [review]
Bozotic bug fix

Based on grep and reading, and my well-known ability to do a dead-on Andreas Impression ;-), I can r+ this.

nanojit/Fragmento.h:
    uint32_t cacheUsed() const { return (_stats.pages-_freePages.size())<<NJ_LOG2_PAGE_SIZE; }

/be
Attachment #377550 - Flags: review+
Comment 42 • 15 years ago
Wanted to facilitate getting this landed on tm tip... /be
Updated (Assignee) • 15 years ago
Summary: TM: Recent regression in v8's crypto module → TM: Fix memory pressure measurement
Comment 43 • 15 years ago
I'm glad my widen-LIR patch has proven useful ;) Nice hunting, Graydon.
Comment 44 (Assignee) • 15 years ago
http://hg.mozilla.org/tracemonkey/rev/a75d552b0c64
Whiteboard: fixed-in-tracemonkey
Comment 45 (Reporter) • 15 years ago
Woohoo:

Score: 367
Richards: 1506
DeltaBlue: 116
Crypto: 850
RayTrace: 231
EarleyBoyer: 302
RegExp: 236

In the browser version, every time now! Thanks for the fix!!
Comment 46 • 15 years ago
http://hg.mozilla.org/mozilla-central/rev/a75d552b0c64
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Updated • 15 years ago
Attachment #377550 - Flags: review+
Updated (Reporter) • 15 years ago
Attachment #377550 - Flags: review?(gal)
Comment 47 • 15 years ago
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/d25abc1080d2
Keywords: fixed1.9.1