Closed Bug 479090 (Opened 17 years ago, Closed 15 years ago)
TM: Box2D performance regression in JIT
Categories: Core :: JavaScript Engine, defect
Status: RESOLVED WORKSFORME
People: Reporter: benjamin.lerner, Assigned: dmandelin
Keywords: perf
Attachments: 1 file (116.19 KB, application/zip)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3pre) Gecko/20090217 Shiretoko/3.1b3pre
I saw the URL mentioned above on reddit, and thought I'd give it a try. On Firefox 3.0.6, it worked decently well, using ~60% of the CPU and giving mostly smooth performance. I thought I'd try it on a nightly of Fx3.1 (that had user and chrome JIT enabled), and it started using >85% of the CPU, with much slower performance overall. I tried turning off chrome JIT, and then turning off all JITting. In the last case, performance improved enormously -- <15% CPU, and subjectively faster and smoother than Fx3.0.
Reproducible: Always
I don't see this as a bug, per se, but as a useful test page to evaluate TraceMonkey's performance improvements. If there's a tracking bug or something for that, just tack this URL on to it, and mark this bug as invalid. Hope this helps...
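For reference when reproducing this, the JIT toggles the reporter mentions are about:config prefs; the names below are from 3.1-era builds generally and are not stated in this report:

// TraceMonkey JIT prefs in Firefox 3.1 betas:
//   javascript.options.jit.content   -- JIT for web-page JS
//   javascript.options.jit.chrome    -- JIT for browser-UI JS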
Comment 1 • 17 years ago
I got this to crash:
http://crash-stats.mozilla.com/report/index/77b0ecf3-39a4-4d72-b675-756ca2090218?p=1
Comment 2 • 17 years ago
This looks like an OOM crash. David has seen similar ones in the past.
Comment 3 (Assignee) • 17 years ago
(In reply to comment #2)
> This looks like an OOM crash. David has seen similar ones in the past.
Yes. I wish I could remember exactly what happened the last time we saw a jump to zero, but I think it was from an OOM that sneaked past the checks.
Assignee: general → dmandelin
Comment 4 (Assignee) • 17 years ago
I haven't been able to duplicate any crash.
My visualizations show that the perf loss has a couple of different sources. First, we have segments of execution time where we pay the overhead of calling js_MonitorLoopEdge but don't do any tracing. Second, we have segments where we do a bunch of trace recording and subsequently call traces, but we fall off trace very quickly and frequently, so we pay the compilation and trace-entry overhead for not much gain. I'll look into the causes now.
So far the perf loss is visually apparent, but I don't have a way of quantifying it yet. It looks like the application runs on some kind of ticks, so it should be possible to modify it to add a progress metric.
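A minimal sketch of the kind of progress metric that last sentence suggests, assuming the app exposes a per-tick hook (the names here are hypothetical; the real tick function isn't identified in this bug):

// Hypothetical progress metric: report simulation ticks per wall-clock second.
function makeTickCounter(report) {
  var ticks = 0;
  setInterval(function () {
    report("ticks/sec: " + ticks);       // a steady rate means smooth playback
    ticks = 0;
  }, 1000);
  return function onTick() { ticks++; }; // call this from the app's step loop
}
// Usage (hypothetical): have the app's step function call the returned onTick.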
Comment 5 (Assignee) • 17 years ago
Here are the causes for each time we enter the interpreter (see bug 465773 comment 40 for an explanation of the terms and columns):
   seconds     count   ms/entry   reason
 10.203056    187932   0.054291   cold
  3.342768      2193   1.524290   none
  3.032017     56801   0.053380   loop2
  1.084147     54646   0.019839   branch
  0.900984       728   1.237615   oracle1
  0.175597      6399   0.027441   badexit
  0.173031       777   0.222691   record
  0.001967       174   0.011307   innerPC
  0.000769        32   0.024029   unstable
  0.000061         1   0.060975   callback
"cold" means we called js_MonitorLoopEdge but the hit count on the loop header pc has not yet surpassed HOTLOOP. So on these loops we're racking up some overhead but not getting to do any tracing. Since HOTLOOP is 2, this suggests the app has many different loops, or that there are too many cache flushes, which reset the counter. If we have a lot of code, that could be what's happening.
"loop2" and "branch" are the frequent exits after running a trace for a bit. "branch" means we're hitting the MAX_BRANCHES limit. "loop2" means we exited a loop, which is a pretty normal thing. It's weird that it takes us so long (50 us on average) to find another loop edge.
"none" is also weird--why does it take us so long to start tracing on each call into the interpreter.
Comment 6 (Assignee) • 17 years ago
Heh, too bad I'm not a bigger expert on tracing: all the "cold" exits to the interpreter should have been a clue that we are blacklisting some hot loops, which appears to be the biggest problem here. Here are the aborts from a short run:
1060 abort: 6233: can't trace arguments yet
253 abort: 5683: failed to find property
69 abort: 6671: untraceable native
21 abort: 4808: fp->scopeChain is not global or active call object
10 abort: 6151: returned out of a loop we started tracing
6 abort: 6069: elem op hit direct and slotless getter or setter
5 abort: 8231: JSOP_BINDNAME crosses global scopes
5 abort: 7735: non-stub getter
3 abort: 6067: elem op hit prototype property, can't shape-guard
2 abort: 9391: only dense arrays supported
1 abort: 7219: non-string, non-int JSOP_GETELEM index
1 abort: 5788: non-stub getter
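For illustration, a hypothetical function of the kind that hits the top abort (this is not actual Box2D code; at the time, any use of |arguments| inside code being recorded aborted the trace):

// Hypothetical example: touching |arguments| aborts trace recording.
function totalMass() {
  var sum = 0;
  for (var i = 0; i < arguments.length; i++)  // |arguments| use here kills
    sum += arguments[i].mass;                 // recording of the hot loop
  return sum;
}
totalMass({ mass: 1 }, { mass: 2 });          // example call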
Updated (Assignee) • 16 years ago
Summary: The URL above shows a strong performance regression with JIT turned on than with it off → TM: Box2D performance regression in JIT
Comment 8 • 16 years ago
We should really trace arguments. It's almost the same thing as upvar on trace.
Comment 9 (Assignee) • 16 years ago
On trunk, the main problem right now is that we flush the JIT code cache frequently; see my TraceVis visualization [1]. The red circles are calls to FlushJITCache, and the ones marked "O" are caused by code-cache OOM. In this case, the tracer records some traces but goes OOM and flushes them before really getting to run any, so it spends about 1/3 of its time creating traces that it hardly gets to run at all.
If I increase the code cache size to 64 MB (vs. the standard 16 MB), things are somewhat better, with hardly any OOMs. But in that case we do a "B" flush every 8 seconds or so; a "B" flush is just a flush that was triggered in a deep bail and processed later. I don't know exactly why those flushes occur. The other bad thing here is that we don't trace much after the first burst of trace recording; I think that's caused by a variety of aborts, primarily involving |arguments|.
I think the next thing to do for this program is to learn why we use up so much code-cache memory. Then we can either try to reduce the usage, or add something that turns off tracing for loops/programs that blow out the code cache too often.
[1] http://people.mozilla.org/~dmandelin/box2d.png
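A sketch of the kind of "turn off tracing for cache-blowing loops" heuristic proposed above (hypothetical names and threshold; nothing like this existed in the tree at the time of this comment):

// Hypothetical heuristic: blacklist loops that repeatedly blow the code cache.
var MAX_FLUSH_BLAME = 3;                  // made-up threshold
var flushBlame = {};                      // loop-header pc -> flush count

function blacklist(pc) { /* mark pc so the monitor never records it again */ }
function flushJITCache() { /* drop all compiled traces and reset counters */ }

function onCodeCacheOOM(recordingPC) {
  var n = (flushBlame[recordingPC] || 0) + 1;
  flushBlame[recordingPC] = n;
  if (n >= MAX_FLUSH_BLAME)
    blacklist(recordingPC);               // stop paying to re-record it
  flushJITCache();                        // the flush still has to happen
}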
Updated•16 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 10 (Assignee) • 16 years ago
Memory profiling shows that just before each OOM, we've compiled about 2.7 MB of native code and generated about 13 MB of LIR. So if we can free the LIR (bug 497009), we'll be able to do a lot better.
We should probably also find a way to detect repeated OOMs.
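For scale (my arithmetic from the numbers above, not a separate measurement): LIR is 13 / (13 + 2.7) ≈ 83% of what fills the cache, so discarding the LIR once a trace is assembled would let each fill of the 16 MB cache hold roughly 15.7 / 2.7 ≈ 5-6x as many compiled traces before hitting OOM.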
Depends on: 497009
Comment 11 • 16 years ago
Packaged up Box2D with a shell driver. run.js loads a specific test and runs a fixed number of steps (see the sketch after this list for a hypothetical shape of the driver).
It needs more work to be usable as a benchmark:
- stack.js still has slightly random placement of objects.
- The driver loads more than it needs and doesn't cycle through the other tests.
- The demos were designed as visual demos and may not have enough varied work to exercise different parts of the engine. For example, stack loads a scene with 3 stacked columns of squares and lets them fall and collide. There was also a demo that waits for user input and doesn't test the collision stuff that was hurting on traces.
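A hypothetical skeleton of the driver described above (run.js and stack.js are named in this comment, but the attachment's contents aren't quoted here, so everything below is an assumed shape, including the API and the step count):

// Hypothetical run.js skeleton (invented API; see lead-in above).
// In the real driver, load("stack.js") would pull in the scene; a stub is
// used here so the sketch runs standalone in the js shell.
function createStackScene() {
  return { Step: function (dt) { /* physics tick */ } };
}

var world = createStackScene();     // 3 stacked columns of squares (comment 11)
var STEPS = 500;                    // fixed number of steps (made-up value)

for (var i = 0; i < STEPS; i++)
  world.Step(1 / 60);               // Box2D-style fixed timestep

print("done " + STEPS + " steps"); // print() is a js shell builtin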
Comment 12 • 15 years ago
box2d sure does like fatvals and other improvements!
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME