Closed Bug 497458 Opened 10 years ago Closed 9 years ago

[meta]Performance on pixastic desaturate is slower than Safari 4

Categories

(Core :: JavaScript Engine, defect)

1.8 Branch
x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: bzbarsky, Unassigned)

References

(Blocks 1 open bug, )

Details

(Keywords: meta)

The url field points to an existing attachment that has this test.  The test is currently also available at <http://www.galbraiths.org/benchmarks/pixastic.html>.

On my machine we're almost 2x slower than Safari 4 on this test.

Bug 495499 will help a good bit.  Once that's done, the profile looks more or less like this:

55% of the time is spent on trace.  This breaks down as:
  34% actual jitted code
  9% under js_BoxDouble
  9% js_Array_dense_setelem
  3% js_UnboxDouble

20% PutImageData, breaking down as:
  5% self (at least half converting to premultiplied alpha)
 13% js_CoerceArrayToCanvasImageData (better than the 35% without bug 495499).
  2% cairo_fill_preserve

19% GetImageData breaking down as:
  5.5% self (at least half converting to non-premultiplied alpha)
  7% JS_NewArrayObject (mostly under memcpy)
  6% JS_NewObject (which triggered gc, which is where all the time is)
  0.5% cairo_paint_with_alpha  

2% DrawImage
3% Setting the height of the canvas and the resulting offscreen surface
   creation.

There are XXX comments about using SSE for the premultiplied conversions in the canvas code.  Using type-specialized arrays (bug 486356) might help with all the boxing/unboxing of doubles, and for that matter with speeding up js_CoerceArrayToCanvasImageData.  Shark also suggests we should -ftree-vectorize or use ICC on that loop, since it's doing the same thing to a bunch of doubles in a row...  Might not be a bad idea.

The bug about not boxing double props in general (I seem to recall we were thinking about that) might help too, but I don't have the bug# offhand.

js_Array_dense_setelem is almost entirely spent in the function prologue and epilogue, looks like.  Which makes sense: the array is already of the right size; all it needs on all those sets is to stick the value in dslots (and do the various checks in setelem).  If we really want, we could come up with some sort of specialized type for this sort of situation, I suppose...  Might not be worth it.

Do we have a bug on being able to have the array take ownership of the jsval* it's initialized with instead of copying?  That would save 5% off the bat on this testcase, and should be pretty easy with some more API...  We'd just need to make sure to JS_malloc or whatever the vector in canvas code.
Keywords: meta
js_Array_dense_setelem would benefit from bug 497173, which seems to me to be the "right" way to fix that portion of the slowness.
Depends on: 497173
Depends on: 497794
Filed bug 497794 on maybe having a way to not copy the data around when creatingthe array.  Note that on this testcase that's a 65MB (yes, 65e6 bytes) memcpy.
Depends on: 497795
Nowadays this traces very well and Minefield beats all others:

JM+TM: 380 ms
Safari 5: 470 ms
Chrome 8: 970 ms
Opera 10.63: 1040 ms
JM: 1190 ms

JM spends at least 10% in GetElem/SetElem stubs, that's bug 594247.
Depends on: 594247
Yeah, typed array goodness made us blow everyone else out of the water here.

I'd be just as happy to just resolve this, or wait till we get bug 594247 fixed if you prefer.
@bz: this does not trace anymore with jit profiling enabled. Once bug 594247 is fixed this may still be as fast as chrome/opera though..
Depends on: 580468
> bz: this does not trace anymore with jit profiling enabled.

That's a problem, no?  Filed bug 606483. Generally, going from "3x faster" to "maybe we'll be as fast, not sure" is ... bad.  ;)
(In reply to comment #6)
> > bz: this does not trace anymore with jit profiling enabled.
> 
> That's a problem, no?

True; thanks for filing the follow up bug.
(In reply to comment #4)
> I'd be just as happy to just resolve this, or wait till we get bug 594247 fixed
> if you prefer.

The profiling issue is fixed and typed arrays helped JM a lot. Some numbers for the attachment:

JM+TM+prof: 261 ms
Safari 5: 510 ms
Chrome 10: 1362 ms
JM: 466 ms
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 467263
You need to log in before you can comment on or make changes to this bug.