The IE9 fish demo runs faster with Web JIT disabled.

RESOLVED WORKSFORME

Status


defect
P1
major
RESOLVED WORKSFORME
9 years ago
4 years ago

People

(Reporter: streetwolf, Assigned: dvander)

Tracking

(Blocks 1 bug)

Trunk
x86_64
Windows 7

Firefox Tracking Flags

(blocking2.0 final+)

Details

(Whiteboard: [painting-perf], ietestdrive, )

Reporter

Description

9 years ago
User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b4pre) Gecko/20100810 Minefield/4.0b4pre
Build Identifier: 20100811040838

The IE9 fish demo runs about 25% faster (according to FPS) when I disable web JIT in Prefs.  

Reproducible: Always




I originally filed this as bug report https://bugzilla.mozilla.org/show_bug.cgi?id=586259.
Reporter

Updated

9 years ago
Version: unspecified → Trunk
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter

Updated

9 years ago
Severity: normal → major
Priority: -- → P1
Reporter

Comment 1

9 years ago
The prefs option in question is javascript.options.jit.content
1-250 - 60fps with JIT on and D2D on (not tested with JIT off)

500 - 53fps with JIT on and D2D on
500 - 58/59fps with JIT off and D2D on

1000 - 26/27fps with JIT on and D2D on
1000 - 31/32fps with JIT off and D2D on

So I am seeing a slowdown with JIT on compared with JIT off.
Reporter

Comment 3

9 years ago
My results running IE9 Fish Demo at a resolution of 1920x1075.

JIT = on
FISH  FPS
----  -----
500   28-30
1000  19-21

JIT = off
FISH  FPS
----  -----
500   39-41
1000  25-27
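
For reference, the relative difference in the figures above can be worked out as follows. This is only a rough sketch: it assumes the midpoint of each reported FPS range is representative, and the names `midpoint`, `jit_off_gain`, `RESULTS` and `GAINS` are made up for illustration.

```python
def midpoint(fps_range):
    """Middle of a reported (low, high) FPS range."""
    low, high = fps_range
    return (low + high) / 2


def jit_off_gain(jit_on, jit_off):
    """Relative FPS gain from disabling the JIT, as a fraction."""
    return midpoint(jit_off) / midpoint(jit_on) - 1


# FPS ranges copied from the two tables above, keyed by fish count.
RESULTS = {
    500: {"jit_on": (28, 30), "jit_off": (39, 41)},
    1000: {"jit_on": (19, 21), "jit_off": (25, 27)},
}

GAINS = {
    fish: jit_off_gain(r["jit_on"], r["jit_off"])
    for fish, r in RESULTS.items()
}

for fish, gain in GAINS.items():
    print(f"{fish} fish: {gain:.0%} faster with JIT off")
```

On these midpoints, JIT off comes out roughly 38% faster at 500 fish and 30% faster at 1000 fish.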

Updated

9 years ago
Assignee: general → dvander

Comment 4

9 years ago
we'll have dvander tune for this on JM.
Has anyone looked at jit stats? We need to file and fix TM bugs even if JM is the cure for this bug, if there are easy-to-fix TM bugs.

/be
Hmm.  On Mac this runs like crap no matter what I do with jit (1fps for any number of fish greater than one, and about 8fps for 1 fish).  All the time is under img_data_lock under canvas drawImage (which calls img_interpolate_read, which calls resample_band, which calls either resample_byte_v_Nccp_af or resample_byte_h_4cpp_vector, and those resamples are where the time is spent).  roc, jeff, any idea what's going on there?  Happy to file a separate bug for it.

Still looking into the trace aborts.
OK, so for the trace stuff, with 20 fish and running for a minute or so, methodjit disabled, tracejit enabled, on TM build from this morning or so:

recorder: started(58), aborted(36), completed(35), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(11), breaks(3), returns(1), merged loop exits(3), unstableInnerCalls(5), blacklisted(5)
monitor: exits(155), timeouts(0), type mismatch(0), triggered(155), global mismatch(5), flushed(5)
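
A quick way to turn a stats line like the recorder one into numbers, in case anyone wants to compare runs. This is a sketch only: it assumes the `name(count)` layout shown above, and `parse_counters` is a hypothetical helper, not something in the tree.

```python
import re

# Recorder line copied verbatim from the stats output above.
RECORDER_LINE = (
    "recorder: started(58), aborted(36), completed(35), different header(0), "
    "trees trashed(0), slot promoted(0), unstable loop variable(11), breaks(3), "
    "returns(1), merged loop exits(3), unstableInnerCalls(5), blacklisted(5)"
)


def parse_counters(line):
    """Map each 'name(count)' pair after the 'recorder:' prefix to an int."""
    _, _, body = line.partition(":")
    return {
        match.group(1).strip(): int(match.group(2))
        for match in re.finditer(r"([^,()]+)\((\d+)\)", body)
    }


counters = parse_counters(RECORDER_LINE)
abort_ratio = counters["aborted"] / counters["started"]
print(f"{counters['aborted']} of {counters['started']} recordings aborted "
      f"({abort_ratio:.0%})")
```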

That really doesn't look terrible all things considered.  Only 25 of those aborts are actually on the test page:

   1 Abort recording of tree http://ie.microsoft.com/testdrive/Performance/FishIE%20tank/Default.html:117@32 at http://ie.microsoft.com/testdrive/Performance/FishIE%20tank/Default.html:219@48: DEFLOCALFUN for closure.
   1 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:311@207 at http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:307@26: getargprop.
   1 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:320@1026 at http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:320@1099: length.
   1 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:320@169 at http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:320@19: No compatible inner tree.
   1 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:320@169 at http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:320@47: Inner tree is trying to grow, abort outer recording.
   1 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:7@24 at http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:8@35: getprop.
   3 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:320@169 at http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:307@26: getargprop.
   4 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/fpsometer.js:393@35 at http://ie.microsoft.com/testdrive/includes/script/fpsometer.js:396@114: length.
   6 Abort recording of tree http://ie.microsoft.com/testdrive/Performance/FishIE%20tank/Default.html:180@136 at http://ie.microsoft.com/testdrive/Performance/FishIE%20tank/Default.html:225@70: getgname.
   6 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/fpsometer.js:396@58 at http://ie.microsoft.com/testdrive/includes/script/fpsometer.js:396@114: length.
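
Grouping the list above by abort reason makes it easier to see which ones matter. A small sketch, assuming each line has the form "count Abort recording of tree ... at ...: reason." as shown; `tally_aborts` is a name invented here for illustration.

```python
from collections import Counter

# Abort lines copied verbatim from the listing above.
ABORT_LOG = """\
   1 Abort recording of tree http://ie.microsoft.com/testdrive/Performance/FishIE%20tank/Default.html:117@32 at http://ie.microsoft.com/testdrive/Performance/FishIE%20tank/Default.html:219@48: DEFLOCALFUN for closure.
   1 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:311@207 at http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:307@26: getargprop.
   1 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:320@1026 at http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:320@1099: length.
   1 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:320@169 at http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:320@19: No compatible inner tree.
   1 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:320@169 at http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:320@47: Inner tree is trying to grow, abort outer recording.
   1 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:7@24 at http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:8@35: getprop.
   3 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:320@169 at http://ie.microsoft.com/testdrive/includes/script/s_code_ie9td.js:307@26: getargprop.
   4 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/fpsometer.js:393@35 at http://ie.microsoft.com/testdrive/includes/script/fpsometer.js:396@114: length.
   6 Abort recording of tree http://ie.microsoft.com/testdrive/Performance/FishIE%20tank/Default.html:180@136 at http://ie.microsoft.com/testdrive/Performance/FishIE%20tank/Default.html:225@70: getgname.
   6 Abort recording of tree http://ie.microsoft.com/testdrive/includes/script/fpsometer.js:396@58 at http://ie.microsoft.com/testdrive/includes/script/fpsometer.js:396@114: length.
"""


def tally_aborts(log):
    """Sum the leading counts per abort reason (the text after the last ': ')."""
    totals = Counter()
    for line in log.splitlines():
        line = line.strip()
        if not line:
            continue
        count, rest = line.split(" ", 1)
        reason = rest.rsplit(": ", 1)[1].rstrip(".")
        totals[reason] += int(count)
    return totals


totals = tally_aborts(ABORT_LOG)
for reason, n in totals.most_common():
    print(f"{n:3d}  {reason}")
```

That gives 11 length aborts, 6 getgname and 4 getargprop out of the 25, with the remaining reasons at one each.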

The length aborts are preceded by:

  trace stopped: 12287: cannot trace script getter for this opcode

The opcode is, of course, JSOP_LENGTH, with this stack:

#0  js::TraceRecorder::getPropertyWithScriptGetter (this=0x1779a00, obj=0x201db000, obj_ins=0x17b3260, shape=0x17a65c8) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/js/src/jstracer.cpp:12287
#1  0x007256b3 in js::TraceRecorder::propTail (this=0x1779a00, obj=0x201db000, obj_ins=0x17b3260, obj2=0x201daf30, pcval={v = 24798666}, slotp=0x0, v_insp=0x0, outp=0x15a9b220) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/js/src/jstracer.cpp:13567
#2  0x00725fd7 in js::TraceRecorder::prop (this=0x1779a00, obj=0x201db000, obj_ins=0x17b3260, slotp=0x0, v_insp=0x0, outp=0x15a9b220) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/js/src/jstracer.cpp:13540
#3  0x00726096 in js::TraceRecorder::getProp (this=0x1779a00, obj=0x201db000, obj_ins=0x17b3260) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/js/src/jstracer.cpp:13810
#4  0x007265ed in js::TraceRecorder::record_JSOP_LENGTH (this=0x1779a00) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/js/src/jstracer.cpp:15760

Seems like adding JSOP_LENGTH to the whitelist in getPropertyWithScriptGetter should be the right thing.

The getgname aborts are preceded by:

  trace stopped: 9428: hitting the global object via a prototype chain

When this happens, we're in guardPropertyCacheHit.  aobj == obj2 == globalObj and is an InnerWindow.  entry->vcapTag() is 16, so we bail out.  So scopeIndex() is 1.  At this point the page is trying to do Math.abs; I assume we lose on Math.

I have no idea yet how we end up with a nonzero scopeIndex for the global object.
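
To spell out the arithmetic behind "vcapTag() is 16, so scopeIndex() is 1": the sketch below assumes the tag packs the prototype-chain index into the low 4 bits with the scope index above it. That layout is an assumption on my part, chosen because it matches the values quoted above; `decode_vcap_tag` is a hypothetical helper, not the engine's code.

```python
# Assumption: low 4 bits hold the prototype-chain index, the rest the scope index.
PROTO_BITS = 4
PROTO_MASK = (1 << PROTO_BITS) - 1


def decode_vcap_tag(tag):
    """Split a tag into (scope_index, proto_index) under the assumed layout."""
    return tag >> PROTO_BITS, tag & PROTO_MASK


scope_index, proto_index = decode_vcap_tag(16)
print(f"scopeIndex={scope_index}, protoIndex={proto_index}")
```

Under that layout a hit directly on the global object would have a tag of 0, and a tag of 16 means one hop along the scope chain with no prototype hops.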
(In reply to comment #6)
> Hmm.  On Mac this runs like crap no matter what I do with jit (1fps for any
> number of fish greater than one, and about 8fps for 1 fish).  All the time is
> under img_data_lock under canvas drawImage (which calls img_interpolate_read,
> which calls resample_band, which calls either resample_byte_v_Nccp_af or
> resample_byte_h_4cpp_vector, and those resamples are where the time is spent). 
> roc, jeff, any idea what's going on there?  Happy to file a separate bug for
> it.

I think basically FishIE is just blitting scaled images a lot. GPUs eat that up, but it's slow on CPUs. If we're significantly slower than Safari it might be worth filing a bug, otherwise we probably just need a better 2D graphics subsystem on Mac, rather than there being any specific bug we can fix. (We're normally fast on FishIE with D2D, and we're even pretty fast on it on my Linux box with a good XRender driver.)
> GPUs eat that up, but it's slow on CPUs.

I did try the GL-accelerated build, but no change there (not surprisingly).

> If we're significantly slower than Safari

We're at 1fps with terrible UI lag (e.g. typing in the url bar takes multiple seconds per character).  Safari is at 5fps, with almost no UI lag.  This is for 20 fish.  I'll get a bug filed when I have a bit more time.

And yeah, running my Linux build over remote X with the server on my mac gives me 50fps or so...  ;)
Hmm, IIRC Safari keeps a cache of scaled images at various sizes. That could be helping them here.

If that is it, I'm not sure I want to go down that route. It would feel like a temporary improvement that would no longer be necessary if we had something like cairo-gl to render into the canvas. (And using cairo-gl would actually be appealing for canvas, if it was fast, because integration with native themes and text drawing would be much less of an issue.)
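
For anyone unfamiliar with the idea, a cache of scaled images keyed by size might look roughly like the following. This is purely illustrative: `ScaledImageCache`, its `scale` callback and the capacity are all invented here, and nothing is known about how Safari actually does it.

```python
from collections import OrderedDict


class ScaledImageCache:
    """Keep recently used scaled copies, keyed by (image_id, width, height)."""

    def __init__(self, scale, capacity=64):
        self._scale = scale          # hypothetical callback doing the real resample
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, image_id, width, height):
        key = (image_id, width, height)
        if key in self._entries:
            self._entries.move_to_end(key)     # mark as most recently used
            return self._entries[key]
        scaled = self._scale(image_id, width, height)
        self._entries[key] = scaled
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return scaled


resample_calls = []


def fake_scale(image_id, width, height):
    """Stand-in for an expensive resample; just records that it ran."""
    resample_calls.append((image_id, width, height))
    return f"{image_id}@{width}x{height}"


cache = ScaledImageCache(fake_scale, capacity=2)
cache.get("fish", 64, 32)
cache.get("fish", 64, 32)   # served from the cache, no second resample
cache.get("fish", 128, 64)
print(len(resample_calls))
```

The payoff depends on how often the demo reuses the same target sizes; if every fish is drawn at a slightly different scale each frame, a cache like this would mostly miss.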
Whiteboard: [painting-perf]
OK.  I assume cairo-gl is not in the cards for 2.0, right?
OK.  That just means that I can't really easily profile this thing on mac...

I wonder whether someone can put together a copy of this demo that rips all the painting out (so just skips the actual canvas ops, but leaves everything else in).  That would be pretty helpful for sorting out the JS end of this.
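
The shape of what I mean, sketched in Python rather than the demo's JavaScript: swap the drawing context for one that swallows every drawing call, and leave the per-frame logic alone. `NullContext`, `animate` and the fish fields are all hypothetical stand-ins, not taken from the demo's source.

```python
class NullContext:
    """Stand-in for a 2D drawing context: accepts any drawing call, paints nothing."""

    def __init__(self):
        self.skipped_calls = 0

    def __getattr__(self, name):
        # Only reached for attributes that are not set, i.e. drawing methods.
        def skip(*args, **kwargs):
            self.skipped_calls += 1
        return skip


def animate(fish, ctx, frames):
    """Per-frame update logic stays in; the drawing calls go to ctx."""
    for _ in range(frames):
        for f in fish:
            f["x"] = (f["x"] + f["vx"]) % 800
            ctx.drawImage("fish-sprite", f["x"], f["y"])


fish = [{"x": float(i), "y": 10.0 * i, "vx": 1.5} for i in range(20)]
ctx = NullContext()
animate(fish, ctx, frames=60)
print(ctx.skipped_calls)
```

Timing a run like this against the normal one would separate script cost from painting cost.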
Is there a bug filed for cairo-gl? I can't seem to find one.

Updated

9 years ago
Whiteboard: [painting-perf] → [painting-perf], ietestdrive
A lot of tuning has happened in the past two months; I can't reproduce a problem here on either of my Windows machines. I get a solid 55FPS for 500 fish no matter what JIT configuration is used. With 1000 fish I get 33fps.

Starting in safe mode is around 3X slower, so that probably disabled some graphics stuff.

If you can still see a significant difference by toggling the JIT prefs in 4.0b7 or later, please reopen.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → WORKSFORME
Reporter

Comment 16

9 years ago
I've been getting 60fps with 1000 fish for quite some time now.
Issue is resolved - clearing old keywords - qa-wanted clean-up
Keywords: qawanted