Bug 483457 (Closed)
Opened 16 years ago · Closed 7 years ago
IonMonkey: Maybe slower than baseline, probably slower than v8 and jsc on attached testcase
Categories: Core :: JavaScript Engine, defect, P3
Status: RESOLVED WORKSFORME
People: Reporter: duncan.loveday, Assigned: Waldo
Keywords: perf, regression, testcase
Attachments: 4 files
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090312 Minefield/3.2a1pre
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090312 Minefield/3.2a1pre
The attached runs much more slowly with JIT enabled than with it disabled. My results are 3-4 ms without JIT and anywhere from 20-50ms with JIT.
Reproducible: Always
Steps to Reproduce:
1. Load the attached and press the button. Note the timing.
2. Do step 1 with "JIT content" disabled and enabled.
3. The biggest difference is observed after reloading the page (e.g. press ctrl-R).
Actual Results:
Performance with JIT is slower
Expected Results:
Performance with JIT should be at least as fast as without.
This code is an extract from a much larger application. When it's running "in situ" I get much larger variation in the timings with JIT; I've even seen times of over a second. Possibly there's more scope for GC to interrupt it and produce much longer than usual timings?
Reporter | Comment 1 • 16 years ago
Reporter | Updated • 16 years ago
Comment 2 • 16 years ago
I see a certain average increase in time (2x to 3x) after these checkins:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f3d5f4a980a0&tochange=859496f90e66
Tested on Windows XP.
Reporter | Comment 3 • 16 years ago
I'm not very happy with this test case; I've just been trying to improve on it, but without success.
In my full app, in addition to the slower running of the function itself, what also happens is that after the function in the test case is run, the code goes on to execute "setTimeout('nextBitOfWork();', 0)". With JIT off, the 'nextBitOfWork()' function starts running pretty much immediately. But with JIT on there are intermittent pauses in the range 500-1500 ms between issuing "setTimeout()" and the named function running. It's the pauses I really want to get to the bottom of; they are killing performance for me. But when I do the same thing in a small test case I don't get any pauses.
FWIW, the use of setTimeout() is via a variant of Neil Mix's JavaScript 1.7 trampolining library using generator/iterator loops.
Anyone got a suggestion as to why JIT would cause a delay in the execution of a function scheduled with setTimeout() in this way, and on how it could be reproduced systematically? E.g. if it's to do with more GC activity, is there a way of forcing GC to run?
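For context, here is a minimal sketch of the kind of generator-based trampoline being described; it is illustrative only (the original uses JavaScript 1.7 generators, and every name below is invented):

// Minimal sketch of a generator-based trampoline driven by setTimeout().
// Illustrative only; not the library referred to above.
function run(makeGenerator) {
  var it = makeGenerator();
  (function step() {
    if (!it.next().done)
      setTimeout(step, 0);   // hand control back to the event loop between work slices
  })();
}

run(function* nextBitOfWork() {
  for (var i = 0; i < 5; i++) {
    // ...do one slice of work here...
    yield;                   // pause; the trampoline reschedules us via setTimeout
  }
});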
There are some scary numbers in my full app for certain functions, e.g.
non JIT => JIT
4 => 8.4
2.9 => 9.4
2.9 => 13.0
10.4 => 37.6
0.6 => 7.8
1.5 => 17.4
but they're not reproducible in small test programs.
Comment 4 • 16 years ago
(In reply to comment #2)
> I see a certain average increase in time (2x to 3x) after these checkins:
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f3d5f4a980a0&tochange=859496f90e66
I meant that the problem suddenly worsened significantly.
Comment 5 • 16 years ago
With the latest trunk nightly, JIT disabled, I get an average of 10 ms.
With JIT enabled before 28 Feb 2009 it was about 22 ms.
With JIT enabled now it is about 55 ms.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Updated • 16 years ago
Flags: blocking1.9.1?
Comment 6 • 16 years ago
The slowdown of the test case appears to be due to the problem bug 478525 is trying to fix. The problem is that there are loops that grow objects while also accessing them. The growing part causes the shape to change. The accessing part causes side exits on a shape guard (the one generated at the very end of guardElemOp). Those shape guards are supposed to be unnecessary for this kind of access, and bug 478525 is going to remove them.
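As a rough illustration of that pattern (not the attached testcase; the object, property names and iteration count are invented), a loop of this shape grows an object while also reading from it, so the shape guard compiled for the read keeps failing:

// Illustrative only: the loop both grows the object and accesses it.
var obj = {};
for (var i = 0; i < 1000; i++) {
  obj["p" + i] = i;            // adds a property, changing the object's shape
  var latest = obj["p" + i];   // read compiled behind a shape guard that no longer holds
}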
The time variance appears to be related to blacklisting. After a few runs, it gets faster, because the problematic object-building loops get blacklisted, so we get close to interpreter performance. Occasionally, the time blows up to a really big number. It seems to be related to tracer stuff and not GC. I want to wait until bug 478525 is fixed before trying to figure that out.
Depends on: 478525
Reporter | Comment 7 • 16 years ago
(In reply to comment #6)
> Occasionally, the time blows up to a
> really big number. It seems to be related to tracer stuff and not GC.
Interesting... how big is a really big number? I haven't seen more than 50 ms or so with the attached, but with the same code running as part of a bigger program I sometimes see much bigger numbers, a second or more.
Comment 8 • 16 years ago
(In reply to comment #7)
> (In reply to comment #6)
> > Occasionally, the time blows up to a
> > really big number. It seems to be related to tracer stuff and not GC.
>
> Interesting...how big is a really big number ? I haven't seen more than 50ms or
> so with the attached but with the same code running as part of a bigger program
> I sometimes see much bigger numbers, a second or more.
I got a 500ms run of the attached test case.
Reporter | Comment 9 • 16 years ago
Thanks, strange as it may sound, that's a relief. As in, I'm glad someone other than me has seen that happen.
It'd be nice if there was a systematic way of reproducing those exceptionally high times just so we can be sure they've gone away when the various patches land.
Updated • 16 years ago
Flags: blocking1.9.1? → blocking1.9.1+
Comment 10 • 16 years ago
This seems to be a lot better after changeset 9ae617e1d224 from bug 478525. Can you try it again once this gets out to nightlies?
Comment 11 • 16 years ago
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090318 Minefield/3.6a1pre ID:20090318053518
I can still reproduce the bug with the latest nightly and hourly on Windows.
Reporter | Comment 12 • 16 years ago
I'd say there's a slight improvement:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090312 Minefield/3.2a1pre
JIT off => 4 ms; JIT on => 40-45 ms
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090318 Minefield/3.2a1pre
JIT off => 4 ms; JIT on => 35-40 ms
Some way to go, but there is progress.
Reporter | Comment 13 • 16 years ago
(In reply to comment #10)
> This seems to be a lot better after changeset 9ae617e1d224 from bug 478525. Can
> you try it again once this gets out to nightlies?
Actually I don't see that changeset on the bug history for 478525. Has it definitely landed?
Reporter | Comment 14 • 16 years ago
> Actually I don't see that changeset on the bug history for 478525. Has it
> definitely landed ?
Yes I do. Revision != changeset.
Reporter | Comment 15 • 16 years ago
The amount of actual execution, as in number of source-level statements executed, in this test case is *tiny*. That's why it runs so fast in the interpreter.
Is it just that the implementation is over-eager to start tracing? Could a solution be as simple as saying "don't start tracing until you've executed some threshold amount of code" (assuming the interpreter could compute this statement count cheaply)? Or possibly until you've been running for a period of time?
I know Sun's JVM has thresholds below which JIT is not used; they can even be tweaked with command-line options.
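Purely to illustrate the suggestion (this is not SpiderMonkey code; the hook and the threshold value are invented), the heuristic would look something like:

// Hypothetical sketch of "don't trace until the loop is provably hot".
// onLoopEdge and startRecordingTrace are invented names; 200 is an assumed threshold.
var HOT_THRESHOLD = 200;

function onLoopEdge(loopRecord) {
  loopRecord.hits = (loopRecord.hits || 0) + 1;
  if (loopRecord.hits === HOT_THRESHOLD) {
    // startRecordingTrace(loopRecord);   // only now is compilation likely to pay for itself
  }
}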
Updated • 16 years ago
Priority: -- → P3
Assignee | Updated • 16 years ago
Assignee: general → jwalden+bmo
Reporter | Comment 16 • 16 years ago
Sadly, there's no obvious improvement here after all the hard work that landed recently.
Assignee | Comment 17 • 16 years ago
Definitely still around, with numbers roughly approximating these in practice:
find-waldo-now:~/moz/js-tm/js/src jwalden$ ../../obj-i386-apple-darwin8.11.1/dist/bin/js ~/moz/inflight/10x.js
Time=6
find-waldo-now:~/moz/js-tm/js/src jwalden$ ../../obj-i386-apple-darwin8.11.1/dist/bin/js -j ~/moz/inflight/10x.js
Time=136
Reporter | Comment 18 • 16 years ago
(In reply to comment #15)
> The amount of actual execution, as in number of source-level statements
> executed, in this test case is *tiny*.......
This is the same thing with some counts added to the various loops. The hottest loop in it is only traversed about 1500 times, and each traversal does a handful of assignments, comparisons, and property and array accesses.
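For a sense of scale, a loop of roughly that character looks like the following (illustrative only; the real loop bodies, data and counter wiring are in the attachment):

// Illustrative only: a cnt14-style counter on a ~1500-iteration loop whose body is just
// a few array/property accesses, comparisons and assignments. The data is made up.
var items = [], results = [], threshold = 50, cnt14 = 0;
for (var i = 0; i < 1524; i++)
  items.push({ value: i % 100 });

for (var j = 0; j < items.length; j++) {
  cnt14++;                       // count traversals of the hottest loop
  var v = items[j].value;        // array access + property access
  if (v > threshold)             // comparison
    results.push(v);             // a little assignment work per pass
}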
Comment 19 • 16 years ago
There is a lot of compilation going on, with a short runtime that doesn't leave a lot of time to recoup the compilation time. For a longer run the overhead drops to 4x (still too high of course).
Comment 20 • 16 years ago
This comment got long, so I'll start with a short take: this is probably unfixable in the short term without taking a 100ms+ hit on SunSpider. I recommend removing blocking status. Details:
One thing I keep forgetting to say is that tests are a lot easier for me to work with if they output results in HTML instead of via an alert. Alerts mess up my detailed timers because the entire time the alert is up looks like JS execution time. Also, you can't copy and paste the output.
Anyway, here's a version of the last test done that way. Also, I run the test 10 times so you can see that the JIT runs faster than the interpreter after a few runs.
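Here is a minimal sketch of that reporting style, appending per-run times to the page instead of calling alert(); the element id, function names and the commented-out entry point are illustrative, not copied from the attachment:

// Illustrative only: collect per-run times and append them to a <pre id="results"> element.
function report(times) {
  var total = times.reduce(function (a, b) { return a + b; }, 0);
  document.getElementById("results").textContent +=
      "Times: " + times.join(" ") + "\n" +
      "Total: " + total + " (avg " + total / times.length + ")\n";
}

var times = [];
for (var run = 0; run < 10; run++) {
  var start = Date.now();
  // runTest();                  // the testcase's entry point would go here (name assumed)
  times.push(Date.now() - start);
}
report(times);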
The initial slow runs are due to trace recording and compilation overhead. Recording and compilation of a trace takes about 200x as long as interpreter execution of the same bytecodes. Here are the loop counts:
cnt1=7, cnt2=109, cnt3=109, cnt4=170, cnt5=27, cnt6=107, cnt7=18, cnt8=56,
cnt9=109, cnt10=2, cnt11=2, cnt12=2, cnt13=2, cnt14=1524, cnt15=0, cnt16=0
Given the 200x cost of the second loop iteration in the tracer, most of these loops should be expected to run 2-100x more slowly in a single test run in the tracer. Thus the 10x slowdown for a single run.
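As a back-of-the-envelope version of that estimate (using the ~200x recording/compilation figure above and an assumed 3x speedup for compiled iterations; both knobs are rough, illustrative numbers, and print() is the JS shell's output function):

// Illustrative arithmetic only. Costs are measured in units of "one interpreted iteration".
function singleRunCostRatio(iterations, compileCost, compiledSpeedup) {
  var interpreted = iterations;
  var traced = compileCost + (iterations - 1) / compiledSpeedup;  // compile once, then run compiled
  return traced / interpreted;
}
print(singleRunCostRatio(109, 200, 3).toFixed(2));   // ~2.17: a 109-iteration loop loses in one run
print(singleRunCostRatio(1524, 200, 3).toFixed(2));  // ~0.46: only the 1524-iteration loop wins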
Runs 2-4 are also a bit slow, probably because branch traces get added to some of the short loops, so they take longer to finish compiling.
The only solutions I see are:
- Increase HOTLOOP and HOTEXIT. I found 200 to be a better value. Of course, this may hurt certain other benchmarks.
- Make the recorder and compiler faster. This is a long-term project, though. Running them in a thread would probably help a lot, too.
My raw times:
no-jit 5 4 4 4 4 3 3 3 2 2 3 2 3 2 2 3 2 3 2 2 (avg 2.9)
jit 29 19 7 4 1 2 5 2 1 1 1 1 1 1 1 1 1 1 1 1 (avg 4.1)
retuned jit(*) 8 4 8 13 5 2 2 2 2 2 2 2 3 1 1 2 1 1 3 1 (avg 3.3)
(*) HOTLOOP=200, HOTEXIT=200, MAXEXIT=205. This takes a 200 ms hit on SunSpider, about 140 ms of which is on fannkuch, with the rest spread out in crypto, etc.
Reporter | Comment 21 • 16 years ago
Point taken regarding HTML output instead of alert.
No matter how fast the recorder and compiler, there'll be some threshold of execution time below which their cost exceeds the potential gain, which I'm guessing is what your HOTLOOP/HOTEXIT tuning is about. I suppose the trick is to be able to make a reasonable guess as to where the cost/benefit threshold is, which I'm guessing is your point about other benchmarks where tracing from the beginning is optimal. There might be a third way forward to add to the two you've given, namely to be smarter about when to start tracing and compiling. Not that I'm saying you aren't smart about it already, just that you might get even smarter!
Did you find anything to explain the occasional very long times noted in comments 7 and 8?
Aside: I ran your latest test case a few times - very nice - then did right click->view page source to see what you'd done - OK - then closed the view page source window and ran the test a few more times and the performance was way slower. Ctrl-R and it's back to normal. Repeat the whole cycle and the same thing happens. Is that another issue ?
Reporter | Comment 22 • 16 years ago
FWIW, I can copy and paste from an alert on Windows. If you can't, I suggest that should be filed as a bug.
Reporter | Comment 23 • 16 years ago
(In reply to comment #21)
> Aside: I ran your latest test case a few times - very nice - then did right
> click->view page source to see what you'd done - OK - then closed the view page
> source window and ran the test a few more times and the performance was way
> slower. Ctrl-R and it's back to normal. Repeat the whole cycle and the same
> thing happens. Is that another issue ?
Actually, View Page Source has nothing to do with it. It's just that sometimes you get consistently slower runs; look at these two examples. All I did in between was press Ctrl-R.
Times: 82 60 25 17 15 13 13 13 13 13 32 18 14 15 14 14 14 14 14 15 14 14 14 14 14 14 62 14 14 14 14 14 14 14 14 13 13 14 14 14 14 14 14 13 14 14 13 14 14 13 14 14 15 14 13 13 14 14 14 14 14 14 14 14 14 14 14 14 15 14 14 13 14 14 14 14 13 13 13 13 13 13 14 14 14 14 14 13 13 14 16 15 15 15 15 14 15 16 16 16 65 33 21 17 16 17 15 15 14 16 15 12 13 13 13 13 14 13 13 13 13 13 14 15 13 13 13 13 13 13 13 13 14 13 13 13 13 13 13 14 13 13 13 14 13 12 13 13 14 14 13 14 13 13 14 13 14 14 12 7
Total: 2480 (avg 15.5)
Counts: cnt1=7, cnt2=109, cnt3=109, cnt4=170, cnt5=27, cnt6=107, cnt7=18, cnt8=56, cnt9=109, cnt10=2, cnt11=2, cnt12=2, cnt13=2, cnt14=1524, cnt15=0, cnt16=0
Times: 38 29 11 7 3 2 8 3 1 1 1 2 2 2 2 1 1 2 2 2 27 9 6 2 2 2 2 1 1 1 1 1 1 2 2 2 3 2 1 1 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 1 1 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 2 2 1 2 2 2 2 2 2 2 1 2 2 2 2 1 1 1 2 2 2 2 3 2 2 2 1 2 2 2 2 2 3 2 2 2 2 2 1 2 2 2 2 2 1 1 49 2 2 2 2 2 1 2 2 2 2 2 1 2 2 2 2 2 1 1 2 2 2 1 2 2 2 2 1 2 2 2 2 2 1
Total: 469 (avg 2.6055555555555556)
Counts: cnt1=7, cnt2=109, cnt3=109, cnt4=170, cnt5=27, cnt6=107, cnt7=18, cnt8=56, cnt9=109, cnt10=2, cnt11=2, cnt12=2, cnt13=2, cnt14=1524, cnt15=0, cnt16=0
Reporter | Comment 24 • 16 years ago
...Here's an example where it seemed to spontaneously slow down. I just kept on clicking.
Times: 34 27 9 6 2 3 3 1 3 2 1 1 2 1 1 1 2 2 2 1 27 10 7 3 2 4 1 1 2 2 1 1 1 2 2 1 2 2 2 1 2 2 2 1 1 1 2 2 1 1 1 2 2 2 2 2 1 1 1 1 28 9 6 2 3 3 1 2 2 2 1 1 1 1 1 2 2 2 1 1 2 1 1 1 1 2 2 2 1 1 1 1 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 1 1 1 1 1 1 2 1 1 2 1 1 2 2 2 2 2 2 2 2 2 2 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 2 3 1 2 2 2 2 2 2 2 3 88 29 19 17 15 14 13 15 13 14 13 13 13 13 15 14 14 14 14 14 13 13 14 13 13 13 13 14 14 14 13 13 14 14 14 13 14 14 13 14 13 13 13 14 14 13 13 13 13 13 13 13 14 13 13 13 14 14 14 14 13 14 14 13 13 14 14 62 13 13 14 14 14 14 13 14 14 14 14 13 14 13 13 13 14 14 14 14 14 14 14 14 14 13 14 14 14 13 14 14 13 14 13 14 14 14 14 14 14 14 15 14 14 13 13 14 14 14 14 14 15 14 14 14 14 14 14 14 14 14 14 14 14 14 15 14 14 14 14 14 14 14 15 14 15 14 14 14 14 15 14 14 14 14 14 14 15 14 14 14 14 14 14 15 14 15 14 16 15 14 15 16 10 8 9 9 9 8 9 8 7 7 7 8 8 8 7 11 14 14 14 14 15 15 15 16 16 13 14 13 15 14 14 14 15 15 16 15 16 16 16 15 16 16 16 16 15 15 16 16 16 15 14 15 14 14 14 14 16 16 15 14 14 14 16 16 16 14 14 14
Total: 3846 (avg 9.615)
Counts: cnt1=7, cnt2=109, cnt3=109, cnt4=170, cnt5=27, cnt6=107, cnt7=18, cnt8=56, cnt9=109, cnt10=2, cnt11=2, cnt12=2, cnt13=2, cnt14=1524, cnt15=0, cnt16=0
Assignee | Comment 25 • 16 years ago
I reran the shell test through Shark again, and the 10x number still roughly holds. Shark says that 85% of the time with -j is spent in TraceRecorder::monitorRecording and js_MonitorLoopEdge; merely speeding up tracing and recording isn't going to make this slowdown go away. Couple that with the lateness in the cycle, and I think we have to punt this. The difference between 1-2ms and 25ms is negligible overall. Speeding that up will have no perceptible impact on browser speed unless this test's code is repeated, and if it's repeated we should start to win as we can avoid interpretation in favor of running the previously-JITted code. It would be better to be faster, but schedule and the nature of the test counsel leaving further work on this for later.
Flags: blocking1.9.1+ → blocking1.9.1?
Updated • 16 years ago
Flags: blocking1.9.1? → wanted1.9.2+
Reporter | Comment 26 • 16 years ago
I find the somewhat unpredictable nature of the slowdown more troubling than the slowdown itself. E.g. comment #19 suggests a slowdown of a factor of 4 even with longer run times, but that's not what we see in comment #20. And the occasional big numbers noted in comment #7 and comment #8 remain unexplained, as do my observations of variable response times in comment #23 and comment #24.
Not disagreeing that this should be left until later, just making an observation.
Comment 27 • 14 years ago
Current JS shell numbers for the attached testcase:
Interp: 14.988 ms
-j: 12.501 ms
-m: 18.920 ms
-m -n: 23.949 ms
Looks like the tracejit actually got ahead of interp by a bit, but JM and JM+TI are still slower.
Blocks: 467263
Summary: TM: Slowdown by factor of up to 10 in this example → JM+TI: 2x slowdown of attached testcase compared to interpreter
Comment 28 • 13 years ago
Using the testcase attached by David Mandelin:
Firefox 19 (Ion+JM+TI) (20121022)
Times: 24 8 2 6 3 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Total: 75 (avg 3.75)
Counts: cnt1=7, cnt2=109, cnt3=109, cnt4=170, cnt5=27, cnt6=107, cnt7=18, cnt8=56, cnt9=109, cnt10=2, cnt11=2, cnt12=2, cnt13=2, cnt14=1524, cnt15=0, cnt16=0
Firefox 19 (Interpreter) (20121022)
Times: 9 8 8 8 8 7 6 7 6 6 6 6 7 6 6 6 7 6 7 6
Total: 136 (avg 6.8)
Counts: cnt1=7, cnt2=109, cnt3=109, cnt4=170, cnt5=27, cnt6=107, cnt7=18, cnt8=56, cnt9=109, cnt10=2, cnt11=2, cnt12=2, cnt13=2, cnt14=1524, cnt15=0, cnt16=0
Firefox 17 (JM+TI)
Times: 23 8 7 2 1 1 1 1 1 1 1 1 5 1 1 1 1 1 1 1
Total: 60 (avg 3)
Counts: cnt1=7, cnt2=109, cnt3=109, cnt4=170, cnt5=27, cnt6=107, cnt7=18, cnt8=56, cnt9=109, cnt10=2, cnt11=2, cnt12=2, cnt13=2, cnt14=1524, cnt15=0, cnt16=0
Chrome 22
Times: 0 16 0 0 0 0 0 0 0 16 0 0 0 0 0 0 0 0 0 0
Total: 32 (avg 1.6)
Counts: cnt1=7, cnt2=109, cnt3=109, cnt4=170, cnt5=27, cnt6=107, cnt7=18, cnt8=56, cnt9=109, cnt10=2, cnt11=2, cnt12=2, cnt13=2, cnt14=1524, cnt15=0, cnt16=0
Reporter | Comment 29 • 13 years ago
Taken at face value, Fx 19 is slower than Fx 17, but the question is whether that is repeatable, given the variation we've seen in the numbers from one test run to another.
Comment 30 • 13 years ago
I always get something around 60 on Fx17 and 70 on Fx19. When I disable Ion in about:config, Fx19 gets similar numbers to Fx17.
Comment 31 • 12 years ago
Mmh, this is kinda hard to judge. Apparently, all browsers made quite a jump here, causing many runs to result in 0 or 1.
Still, numbers:
Firefox 25 (2013-07-18 Nightly)
Times: 4 3 1 1 1 1 1 1 1 1 1 0 0 1 1 1 0 0 1 0
Total: 20 (avg 1)
Counts: cnt1=7, cnt2=109, cnt3=109, cnt4=170, cnt5=27, cnt6=107, cnt7=18, cnt8=56, cnt9=109, cnt10=2, cnt11=2, cnt12=2, cnt13=2, cnt14=1524, cnt15=0, cnt16=0
Chrome 28:
Times: 1 1 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 0
Total: 8 (avg 0.4)
Counts: cnt1=7, cnt2=109, cnt3=109, cnt4=170, cnt5=27, cnt6=107, cnt7=18, cnt8=56, cnt9=109, cnt10=2, cnt11=2, cnt12=2, cnt13=2, cnt14=1524, cnt15=0, cnt16=0
Safari 6.0.5:
Times: 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0
Total: 5 (avg 0.25)
Counts: cnt1=7, cnt2=109, cnt3=109, cnt4=170, cnt5=27, cnt6=107, cnt7=18, cnt8=56, cnt9=109, cnt10=2, cnt11=2, cnt12=2, cnt13=2, cnt14=1524, cnt15=0, cnt16=0
So, it looks like we take more time to warm up, and even then, our times *seem* to be slightly higher, judging by the more common occurrence of ones in the results.
Also, it *looks* like we might be faster in baseline than in ion: the shell test always prints 2 or 3 without args or with --no-baseline, and always 1 with --no-ion. Of course, that could just be an artifact of baseline compilation being faster and ion not helping much with a very short-running test like this.
Comment 32 • 12 years ago
Forgot to update the subject to something slightly more fitting, if very nebulous.
Summary: JM+TI: 2x slowdown of attached testcase compared to interpreter → IonMonkey: Maybe slower than baseline, probably slower than v8 and jsc on attached testcase
Comment 33 • 7 years ago
When repeating the loop in |test()| 1000 times (see the sketch after the numbers below), I get the following results.
andre@VBdev:~$ mozjs /tmp/t.js
Time=796
andre@VBdev:~$ mozjs --no-ion /tmp/t.js
Time=1078
andre@VBdev:~$ mozjs --no-baseline /tmp/t.js
Time=2262
Therefore resolving as WFM.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME