Closed Bug 492673 Opened 15 years ago Closed 15 years ago

TM: Fix memory pressure measurement

Categories

(Core :: JavaScript Engine, defect, P2)

x86
Windows Vista
defect

Tracking

RESOLVED FIXED

People

(Reporter: Natch, Assigned: graydon)

Details

(Keywords: fixed1.9.1, regression, Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files)

Gonna try to nail down a regression range for this, but in the past I would get V8 benchmark scores in the mid-800s; now it's down to about 150.
The bug is present on 1.9.1 as well.

Regression range:
20090414: works
20090415: regressed
pushlog: http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=49643dfd3008&tochange=78a21b8efe1b

I don't have a tracemonkey tree for building, so I can't easily bisect, but I hope this is a start. Make sure you purge the browser cache before attempting to bisect, as it seems to affect the results.
Flags: blocking1.9.1?
Keywords: regression
Assignee: general → gal
This is easy to reproduce in the shell. We get a 1000+ score if Crypto is run separately, but if it runs as part of the other tests we get hammered (280).

h-243:v8 gal$ ../Darwin_OPT.OBJ/js -j run-crypto.js 
Crypto: 1054
----
Score: 1054
h-243:v8 gal$ ../Darwin_OPT.OBJ/js -j run.js 
Richards: 1758
DeltaBlue: 169
Crypto: 284
RayTrace: 244
EarleyBoyer: 390
----
Score: 381
I think we should block on this until it's known what causes it, mostly to make sure we get someone to look at this in time to identify the severity. sayrer?
(I wouldn't strictly block on this if we are just slow for some low-risk, soon-to-be-fixed-on-trunk reason, but I would like to be sure.)
Flags: blocking1.9.1? → blocking1.9.1+
Ok, this is a GC problem. Crypto hits the GC while running, causing performance to suck. This only happens when other stuff runs first, not when run solo.
(The attached patch makes the slowdown disappear by forcing a GC between benchmarks.)
I am just going to have to make a snippy comment about this benchmark program. People who manage to implement a crypto algorithm (integer data read from an array manipulated using arithmetic operations and stored into an array as integer data) in a way that causes excessive GC pressure shouldn't be allowed near computers.

I will try to find out where most of the garbage is coming from, but I suspect the OO-ish BigNumber implementation.
Flags: blocking1.9.1+ → blocking1.9.1?
still a blocker. why should it have gotten slower?
Flags: blocking1.9.1? → blocking1.9.1+
It didn't get slower. It's just as fast as before. We are GCing during the benchmark, which kills the score. Previously the sun and the stars aligned slightly differently, causing the GC to happen somewhere else.
Andreas: the same benchmark page was being used to test the speed; if there was a change in GC behavior, that should (IMHO) be investigated:

1.9.1 (2009-04-14 build): mid-500s
trunk (2009-04-14 build): mid-800s

current 1.9.1: 70-90
current trunk: mid-100s
Yes, this most definitely needs to be investigated. I am trying to hack together some annotations to better observe what happens. I think this can be fixed for 1.9.1.
(If someone wants to bisect down to the exact patch that caused this behavior first, that would be great).
Whiteboard: regression-window-wanted
hg bisect says:

The first bad revision is:
changeset:   27205:78a21b8efe1b
user:        Brendan Eich <brendan@mozilla.org>
date:        Wed Apr 15 01:57:13 2009 -0700
summary:     Bug 488015 - Crash [@ js_GetUpvar ] (also bogus JS errors, also probably Crash [@js_Interpret]) (future r=mrbkap, see bug).
Blocks: 488015
Whiteboard: regression-window-wanted
I bisected the shell slowdown. It flags the following patch:

The first bad revision is:
changeset:   28170:05ea1afb04f8
user:        Nicholas Nethercote <nnethercote@mozilla.com>
date:        Sat May 09 14:38:34 2009 -0400
summary:     Bug 488775 - TM: widen LIR instructions. r=graydon,edwsmith
Yeah, something is definitely wrong with the LIR patch:

h-243:v8 gal$ hg up -r 28169
9 files updated, 0 files merged, 0 files removed, 0 files unresolved
h-243:v8 gal$ TRACEMONKEY=stats ../Darwin_DBG.OBJ/js -j slow.js
DeltaBlue: 14
Crypto: 277
----
Score: 62
recorder: started(186), aborted(102), completed(321), different header(3), trees trashed(2), slot promoted(0), unstable loop variable(132), breaks(0), returns(2), unstableInnerCalls(30)
monitor: triggered(21250), exits(21250), type mismatch(0), global mismatch(1)
h-243:v8 gal$ hg up -r 28170
9 files updated, 0 files merged, 0 files removed, 0 files unresolved
h-243:v8 gal$ TRACEMONKEY=stats ../Darwin_DBG.OBJ/js -j slow.js
DeltaBlue: 14
Crypto: 177
----
Score: 50
recorder: started(678), aborted(91), completed(289), different header(2), trees trashed(2), slot promoted(0), unstable loop variable(128), breaks(0), returns(2), unstableInnerCalls(27)
monitor: triggered(21315), exits(21315), type mismatch(0), global mismatch(1)
I can replicate this, although the numbers are quite variable.  But post-widen-LIR, crypto performance is worse when run after DeltaBlue.

The most interesting thing I've found so far is this:  I've been working on a patch to narrow the LIR again, by compacting it more (but not as much as it was prior to the widening).  I just tried the pre/post measurements with it, and crypto's speed rating jumps from around 200 to around 800 or 900!  (That's a debug build.)  This would seem to corroborate the theory that the increased LIR size is somehow causing the problem.
The increased LIR size is definitely the trigger, but we have to figure out what exactly is happening. A 4x drop is scary.
Does the change bump it over a cache size?
What cache size?
I don't know.  What CPU is being used?  At least some Core 2's have 32 KB of L1 data cache and 32 KB of L1 instruction cache.
LIR is write-once, read-once, both in quick succession. I think the icache and dcache are irrelevant here, despite the possible increase in L2 write misses (which is probably simply due to higher write throughput).
Increasing the code cache size doesn't solve the problem. This bug isn't caused by the larger LIR patch. That merely triggers something else.

This is 28170 with 32MB code cache.

DeltaBlue: 14
Crypto: 158
----
Score: 47
recorder: started(677), aborted(90), completed(286), different header(3), trees trashed(2), slot promoted(0), unstable loop variable(128), breaks(0), returns(2), unstableInnerCalls(25)
monitor: triggered(21133), exits(21133), type mismatch(0), global mismatch(1)

This is 28170 with 64MB code cache.

DeltaBlue: 14
Crypto: 123
----
Score: 42
recorder: started(672), aborted(91), completed(270), different header(2), trees trashed(2), slot promoted(0), unstable loop variable(128), breaks(0), returns(2), unstableInnerCalls(24)
monitor: triggered(20101), exits(20101), type mismatch(0), global mismatch(1)

This is 28170 with 128MB code cache.

h-243:v8 gal$ TRACEMONKEY=stats ../Darwin_DBG.OBJ/js -j slow.js
DeltaBlue: 14
Crypto: 122
----
Score: 41
recorder: started(673), aborted(91), completed(270), different header(2), trees trashed(2), slot promoted(0), unstable loop variable(128), breaks(0), returns(2), unstableInnerCalls(24)
monitor: triggered(20101), exits(20101), type mismatch(0), global mismatch(1)
This is the slow revision with 16MB code cache (default):

h-243:v8 gal$ TRACEMONKEY=verbose ../Darwin_DBG.OBJ/js -j slow.js | grep "lush"
No global slotlist for global shape 1151, flushing cache.
Flushing cache.

We only flush once early on and never again.
The slow version allocates another 16MB slab for the code cache (one large mmap, 16MB). It probably costs something, but it doesn't explain the stats difference.
The trace output just visually looks bad:

recording starting from slow.js:729@26
recording completed at slow.js:729@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:682@26: No compatible inner tree.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
recording starting from slow.js:790@36
Abort recording of tree slow.js:790@36 at slow.js:684@43: Inner tree is trying to grow, abort outer recording.
recording starting from slow.js:682@26
recording completed at slow.js:682@26 via closeLoop
Andreas: this slowdown is visible on 1.9.1, where the LIR patch has *not* landed yet, so it cannot be the cause AFAICT. Was my bisection in comment 14 wrong? I don't see how, but I'll retest tonight...
I can't reproduce the slowdown with your revision.

We have pretty much proven at this point that the patch I bisected triggers a bug but doesn't cause it. I am working on the assumption that both your rev and mine merely trigger, rather than cause, the same issue.
On my machine I get lots of variation in the timings.  I wouldn't trust a single run of each version to give reliable numbers.
(In reply to comment #28)

Err, I fail :(. I was sure the regression range was right; turns out the original regression range I posted was totally off.

/me tries to corroborate the regression-range again...
Some updated results:

- Running run.js (i.e. Richards, then DeltaBlue, then Crypto) I see the slowdown
  caused by widen-LIR: Crypto goes from 600-700 to about 250

- Running just DeltaBlue then Crypto, I don't see a slowdown after widen-LIR:
  in both cases it's 250-ish

- Inserting the gc() between benchmarks causes the pre/post-widen-LIR differences
  to go away in both the above cases.

Conclusion?  GC weirdness that just happens to manifest most strongly with run.js with the widen-LIR patch.  But if you bisected on a different benchmark (e.g. just DeltaBlue then Crypto) I suspect you'd get a different patch to blame.
So the problem is that I can reproduce that regression range 100% with tracemonkey nightlies; however, when I build that revision myself I can't reproduce it. The reason bisect gave me that revision is that it was the last one left: I hadn't tested it until now, and it does not show the regression for me. However, local builds of trunk and 3.5 reproduce the slowdown 100% for me, as do nightlies from tracemonkey. I'm gonna build tracemonkey tip and see how it goes for me...
No longer blocks: 488015
To clarify comment 31, it seems that the order in which benchmarks are run makes a huge difference, for whatever reason.  That makes benchmarking very dubious, because the numbers we get within a run of v8, for example, are quite different from those when running a subset of v8, or v8 in a different order... it's quite worrying, really.
The shell harness doesn't GC and we might be seeing some oracle poisoning (which shouldn't happen since the oracle is supposed to be precise these days).
(In reply to comment #34)
> The shell harness doesn't GC and we might be seeing some oracle poisoning
> (which shouldn't happen since the oracle is supposed to be precise these days).

The oracle is *not* precise. You keep saying this; I don't know where you got that impression. I also don't have a completed patch to make it precise long term, only one small debugging patch that makes a precise oracle using an STL map -- entirely inappropriate for shipping. We'll need something more carefully done for the long term.

I can reproduce the GC-sensitivity of crypto in general now, definitely, on practically any rev. Something in deltablue kills it. There are a few loops in deltablue that wind up somewhat odd. For example, the one anchored at line 741 winds up having 33 side exits. It's a 3-line loop with 1 branch (I guess 2 given the short-circuit eval in it) that only calls 4 functions, each of which is a one-liner that delegates to a relatively plain JS object method (length, push, etc.). 33 seems a bit high. Otherwise I don't really get why deltablue is thrashing so badly. Will keep digging.
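(A minimal illustration of the "precise oracle via STL map" idea mentioned above, purely as an assumed sketch; the real TraceMonkey oracle and the debugging patch from bug 487337 are structured differently. The shipping oracle hashes (location, slot) pairs into a fixed-size table, so unrelated entries can collide and "poison" each other, whereas a precise oracle records the exact pairs.)

// Assumed illustration of a map-backed "precise oracle"; not the actual
// TraceMonkey oracle or the debugging patch from bug 487337.
#include <map>
#include <utility>

class PreciseOracle {
    // (bytecode location, slot index) -> "do not speculate this slot as int".
    std::map<std::pair<const void*, unsigned>, bool> undemotable_;

public:
    void markUndemotable(const void* pc, unsigned slot) {
        undemotable_[std::make_pair(pc, slot)] = true;
    }

    // Exact lookups: no hashing, so one benchmark's entries can never
    // "poison" speculation decisions for an unrelated loop.
    bool isUndemotable(const void* pc, unsigned slot) const {
        return undemotable_.count(std::make_pair(pc, slot)) != 0;
    }
};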
Not the oracle (checked with precise-oracle debugging patch from 487337, also counted oracle entries and there are only a few).

On to blacklisting!
Not related to blacklisting. Turning off compilation-backoff blacklisting makes the total blacklisting count drop to zero, but perf difference is still there.
Scratch that; it *may* be slightly related to blacklisting. Or rather, to the possibility of crypto needing to do some blacklisting, but not being able to. Or doing too much when it should only do a bit. I've spent several hours fiddling with the blacklist knob and it appears to be sensitive to it but not in an obvious fashion.

Also curious fact discovered: it's quite sensitive to the setting of HOTLOOP, as is richards in the inverse direction (richards gets faster as HOTLOOP is decreased, crypto gets faster as it's increased).

This is silly.
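(As context for the HOTLOOP observation above, here is a minimal sketch of what that knob controls; the names and the constant's value are assumptions for illustration, not the actual jstracer.cpp code. A loop header must be hit this many times in the interpreter before the tracer starts recording it, which is why lowering the threshold helps loops that run briefly, like richards, while raising it helps loops that need more warm-up before they stabilize, like crypto.)

// Illustrative sketch only; HOTLOOP's value and these names are assumptions,
// not the actual TraceMonkey sources.
#include <cstdint>
#include <unordered_map>

static const uint32_t HOTLOOP = 2;  // assumed recording threshold

class LoopHitCounters {
    // Keyed by the loop header's bytecode address.
    std::unordered_map<const void*, uint32_t> hits_;

public:
    // Called each time the interpreter crosses a loop header; returns true
    // once the loop is "hot" enough that the tracer should start recording.
    bool hitAndCheck(const void* loopHeaderPc) {
        return ++hits_[loopHeaderPc] >= HOTLOOP;
    }
};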
Priority: -- → P2
Graydon is working on this one right now.
Assignee: gal → graydon
Attached patch: Bozotic bug fix (Splinter Review)
I cannot believe this survived in this state for as long as it did. We *don't measure the right thing* for memory pressure! DeltaBlue happens to get close to the ceiling; with LIR-widen it actually uses up enough space that crypto tips over the line. Thereafter *every* recording attempt in crypto fails and attempts to flush the cache. That's why it "slows down". Guh.

Another day, another horrible bug I introduced. Sorry.
Attachment #377550 - Flags: review?(gal)
Comment on attachment 377550 [details] [diff] [review]
Bozotic bug fix

Based on grep and reading, and my well-known ability to do a dead-on Andreas Impression ;-), I can r+ this.

nanojit/Fragmento.h:			uint32_t cacheUsed() const { return (_stats.pages-_freePages.size())<<NJ_LOG2_PAGE_SIZE; }

/be
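(For readers following along, a minimal, self-contained sketch of the measurement the quoted cacheUsed() line performs and the overflow check it feeds into. The names, the page size, and the overfull() helper are assumptions for illustration, not the actual nanojit/TraceMonkey sources; the point is that pressure has to be computed from pages actually in use, otherwise a heavy run like DeltaBlue leaves the measurement pinned above the ceiling and every later recording attempt flushes instead of compiling.)

// Illustrative sketch only; names and constants are assumptions, not the
// real nanojit sources. It mirrors the quoted cacheUsed(): bytes in use =
// (allocated pages - free pages) << log2(page size).
#include <cstddef>
#include <iostream>

static const std::size_t kLog2PageSize = 12;  // assumed 4 KB code-cache pages

struct CodeCacheStats {
    std::size_t pagesAllocated;  // pages ever handed out by the allocator
    std::size_t pagesFree;       // pages currently sitting on the free list
};

// Correct pressure measurement: only pages currently in use count.
inline std::size_t cacheUsed(const CodeCacheStats& s) {
    return (s.pagesAllocated - s.pagesFree) << kLog2PageSize;
}

// The flush decision described in the bug: if this check were fed the wrong
// quantity (e.g. pages ever allocated, ignoring the free list), it could stay
// true permanently, and every later recording attempt would abort and try to
// flush the cache instead of compiling a trace.
inline bool overfull(const CodeCacheStats& s, std::size_t maxBytes) {
    return cacheUsed(s) > maxBytes;
}

int main() {
    CodeCacheStats s = { 5000, 2000 };  // 3000 pages (~12 MB) actually in use
    std::cout << cacheUsed(s) << " bytes in use; overfull at 16 MB? "
              << (overfull(s, 16u << 20) ? "yes" : "no") << "\n";
    return 0;
}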
Attachment #377550 - Flags: review+
Wanted to facilitate getting this landed on tm tip...

/be
Summary: TM: Recent regression in v8's crypto module → TM: Fix memory pressure measurement
I'm glad my widen-LIR patch has proven useful ;)  Nice hunting, Graydon.
http://hg.mozilla.org/tracemonkey/rev/a75d552b0c64
Whiteboard: fixed-in-tracemonkey
Woohoo:

Score: 367
Richards: 1506
DeltaBlue: 116
Crypto: 850
RayTrace: 231
EarleyBoyer: 302
RegExp: 236

In the browser version, every time now! Thanks for the fix!!
http://hg.mozilla.org/mozilla-central/rev/a75d552b0c64
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Attachment #377550 - Flags: review+
Attachment #377550 - Flags: review?(gal)
Blocks: 492694