Closed Bug 479090 (Opened 17 years ago, Closed 15 years ago)
TM: Box2D performance regression in JIT
Categories: Core :: JavaScript Engine, defect
Status: RESOLVED WORKSFORME
People: Reporter: benjamin.lerner, Assigned: dmandelin
Keywords: perf
Attachments: 1 file (116.19 KB, application/zip)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3pre) Gecko/20090217 Shiretoko/3.1b3pre
I saw the URL mentioned above on reddit, and thought I'd give it a try. On Firefox 3.0.6, it worked decently well, using ~60% of the CPU and giving mostly smooth performance. I thought I'd try it on a nightly of Fx3.1 (that had user and chrome JIT enabled), and it started using >85% of the CPU, with much slower performance overall. I tried turning off chrome JIT, and then turning off all JITting. In the last case, performance improved enormously -- <15% CPU, and subjectively faster and smoother than Fx3.0.
Reproducible: Always
I don't see this as a bug, per se, but as a useful test page to evaluate TraceMonkey's performance improvements. If there's a tracking bug or something for that, just tack this URL on to it, and mark this bug as invalid. Hope this helps...
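For reference when reproducing this, the JIT toggles the reporter mentions are about:config prefs; the names below are from 3.1-era builds generally and are not stated in this report:

// TraceMonkey JIT prefs in Firefox 3.1 betas:
//   javascript.options.jit.content   -- JIT for web-page JS
//   javascript.options.jit.chrome    -- JIT for browser-UI JS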
Comment 1 • 17 years ago
I got this to crash:
http://crash-stats.mozilla.com/report/index/77b0ecf3-39a4-4d72-b675-756ca2090218?p=1
Comment 2 • 17 years ago
This looks like an OOM crash. David has seen similar ones in the past.
Comment 3 (Assignee) • 17 years ago
(In reply to comment #2)
> This looks like an OOM crash. David has seen similar ones in the past.
Yes. I wish I could remember exactly what happened the last time we saw a jump to zero, but I think it was from an OOM that sneaked past the checks.
Assignee: general → dmandelin
Comment 4 (Assignee) • 17 years ago
I haven't been able to duplicate any crash.
My visualizations show that the perf loss has a couple of different sources. First, we have segments of execution time where we pay the overhead of calling js_MonitorLoopEdge but don't do any tracing. Second, we have segments where we do a bunch of trace recording and subsequently call traces, but we fall off trace very quickly and frequently, so we pay the compilation and trace-entry overhead for not much gain. I'll look into the causes now.
So far the perf loss is visually apparent, but I don't have a way of quantifying it yet. It looks like the application runs on some kind of ticks, so it should be possible to modify it to add a progress metric.
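A minimal sketch of the kind of progress metric that last sentence suggests, assuming the app exposes a per-tick hook (the names here are hypothetical; the real tick function isn't identified in this bug):

// Hypothetical progress metric: report simulation ticks per wall-clock second.
function makeTickCounter(report) {
  var ticks = 0;
  setInterval(function () {
    report("ticks/sec: " + ticks);       // a steady rate means smooth playback
    ticks = 0;
  }, 1000);
  return function onTick() { ticks++; }; // call this from the app's step loop
}
// Usage (hypothetical): have the app's step function call the returned onTick.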
Comment 5 (Assignee) • 17 years ago
Here are the causes for each time we enter the interpreter (see bug 465773 comment 40 for an explanation of the terms and columns):
   seconds     count   ms/entry   reason
 10.203056    187932   0.054291   cold
  3.342768      2193   1.524290   none
  3.032017     56801   0.053380   loop2
  1.084147     54646   0.019839   branch
  0.900984       728   1.237615   oracle1
  0.175597      6399   0.027441   badexit
  0.173031       777   0.222691   record
  0.001967       174   0.011307   innerPC
  0.000769        32   0.024029   unstable
  0.000061         1   0.060975   callback
"cold" means we called js_MonitorLoopEdge but the hit count on the loop header pc has not yet surpassed HOTLOOP. So on these loops we're racking up some overhead but not getting to do any tracing. Since HOTLOOP is 2, this suggests the app has many different loops, or that there are too many cache flushes, which reset the counter. If we have a lot of code, that could be what's happening.
"loop2" and "branch" are the frequent exits after running a trace for a bit. "branch" means we're hitting the MAX_BRANCHES limit. "loop2" means we exited a loop, which is a pretty normal thing. It's weird that it takes us so long (50 us on average) to find another loop edge.
"none" is also weird--why does it take us so long to start tracing on each call into the interpreter.
Comment 6 (Assignee) • 17 years ago
Heh, too bad I'm not a bigger expert on tracing: all the "cold" exits to the interpreter should have been a clue that we are blacklisting some hot loops, which appears to be the biggest problem here. Here are the aborts from a short run:
1060 abort: 6233: can't trace arguments yet
253 abort: 5683: failed to find property
69 abort: 6671: untraceable native
21 abort: 4808: fp->scopeChain is not global or active call object
10 abort: 6151: returned out of a loop we started tracing
6 abort: 6069: elem op hit direct and slotless getter or setter
5 abort: 8231: JSOP_BINDNAME crosses global scopes
5 abort: 7735: non-stub getter
3 abort: 6067: elem op hit prototype property, can't shape-guard
2 abort: 9391: only dense arrays supported
1 abort: 7219: non-string, non-int JSOP_GETELEM index
1 abort: 5788: non-stub getter
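For illustration, a hypothetical function of the kind that hits the top abort (this is not actual Box2D code; at the time, any use of |arguments| inside code being recorded aborted the trace):

// Hypothetical example: touching |arguments| aborts trace recording.
function totalMass() {
  var sum = 0;
  for (var i = 0; i < arguments.length; i++)  // |arguments| use here kills
    sum += arguments[i].mass;                 // recording of the hot loop
  return sum;
}
totalMass({ mass: 1 }, { mass: 2 });          // example call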
Updated (Assignee) • 16 years ago
Summary: The URL above shows a strong performance regression with JIT turned on than with it off → TM: Box2D performance regression in JIT
Comment 8 • 16 years ago
We should really trace arguments. It's almost the same thing as upvar on trace.
Comment 9 (Assignee) • 16 years ago
On trunk, the main problem right now is that we flush the JIT code cache frequently; see my TraceVis visualization [1]. The red circles are calls to FlushJITCache, and the ones marked "O" are caused by code-cache OOM. In this case, the tracer records some traces but goes OOM and flushes them before really getting to run any, so it spends about 1/3 of its time creating traces that it hardly gets to run at all.
If I increase the code cache size to 64 MB (vs. the standard 16 MB), things are somewhat better, with hardly any OOMs. But in that case we do a "B" flush every 8 seconds or so; a "B" flush is just a flush that was triggered in a deep bail and processed later. I don't know exactly why those flushes occur. The other bad thing here is that we don't trace much after the first burst of trace recording; I think that's caused by a variety of aborts, primarily involving |arguments|.
I think the next thing to do for this program is to learn why we use up so much code-cache memory. Then we can either try to reduce the usage, or add something that turns off tracing for loops/programs that blow out the code cache too often.
[1] http://people.mozilla.org/~dmandelin/box2d.png
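A sketch of the kind of "turn off tracing for cache-blowing loops" heuristic proposed above (hypothetical names and threshold; nothing like this existed in the tree at the time of this comment):

// Hypothetical heuristic: blacklist loops that repeatedly blow the code cache.
var MAX_FLUSH_BLAME = 3;                  // made-up threshold
var flushBlame = {};                      // loop-header pc -> flush count

function blacklist(pc) { /* mark pc so the monitor never records it again */ }
function flushJITCache() { /* drop all compiled traces and reset counters */ }

function onCodeCacheOOM(recordingPC) {
  var n = (flushBlame[recordingPC] || 0) + 1;
  flushBlame[recordingPC] = n;
  if (n >= MAX_FLUSH_BLAME)
    blacklist(recordingPC);               // stop paying to re-record it
  flushJITCache();                        // the flush still has to happen
}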
Updated•16 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 10 (Assignee) • 16 years ago
Memory profiling shows that just before each OOM, we've compiled about 2.7 MB of native code and generated about 13 MB of LIR. So if we can free the LIR (bug 497009), we'll be able to do a lot better.
We should probably also find a way to detect repeated OOMs.
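For scale (my arithmetic from the numbers above, not a separate measurement): LIR is 13 / (13 + 2.7) ≈ 83% of what fills the cache, so discarding the LIR once a trace is assembled would let each fill of the 16 MB cache hold roughly 15.7 / 2.7 ≈ 5-6x as many compiled traces before hitting OOM.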
Depends on: 497009
Comment 11 • 16 years ago
Packaged up Box2D with a shell driver. run.js loads a specific test and runs a fixed number of steps (see the sketch after this list for a hypothetical shape of the driver).
It needs more work to be usable as a benchmark:
- stack.js still has slightly random placement of objects.
- The driver loads more than it needs and doesn't cycle through the other tests.
- The demos were designed as visual demos and may not have enough varied work to exercise different parts of the engine. For example, stack loads a scene with 3 stacked columns of squares and lets them fall and collide. There was also a demo that waits for user input and doesn't test the collision stuff that was hurting on traces.
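A hypothetical skeleton of the driver described above (run.js and stack.js are named in this comment, but the attachment's contents aren't quoted here, so everything below is an assumed shape, including the API and the step count):

// Hypothetical run.js skeleton (invented API; see lead-in above).
// In the real driver, load("stack.js") would pull in the scene; a stub is
// used here so the sketch runs standalone in the js shell.
function createStackScene() {
  return { Step: function (dt) { /* physics tick */ } };
}

var world = createStackScene();     // 3 stacked columns of squares (comment 11)
var STEPS = 500;                    // fixed number of steps (made-up value)

for (var i = 0; i < STEPS; i++)
  world.Step(1 / 60);               // Box2D-style fixed timestep

print("done " + STEPS + " steps"); // print() is a js shell builtin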
Comment 12 • 15 years ago
box2d sure does like fatvals and other improvements!
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME