Closed Bug 523298 Opened 15 years ago Closed 1 year ago

Much slower than Chrome in demo due to temporary surfaces in background image painting

Categories

(Core :: Graphics, defect)

defect

RESOLVED WORKSFORME

People

(Reporter: sicking, Unassigned)

Details

(Whiteboard: chromeexperiments)

Attachments

(1 file)

On the following demo

http://mrdoob.com/projects/chromeexperiments/depth_of_field/

we are getting badly beaten by Chrome. I'm not actually sure whether this is due to JavaScript issues or canvas issues. Guessing canvas for now, but we really need profiles.

However, I think drawImage currently always makes us fall off trace because of the way its optional arguments work. Bug 459452 might fix that.

Actually, this isn't canvas at all, since the demo doesn't use canvas. It seems to simply be a load of absolutely positioned elements combined with CSS sprites and transforms (to do the scaling).

So it is also not bug 459452, as there are no drawImage calls.
Component: Graphics → Layout
No longer depends on: 459452
QA Contact: thebes → layout
Summary: Much slower that chrome in canvas-heavy demo → Much slower that chrome in demo
Blocks: chromex
Basic breakdown from a shark profile:

52% under background painting (and yes, transforms are involved).
20% (!) vm_fault

If I exclude the supervisor callstacks, painting is at 80%, 10% is the usual AppKit/HIToolbox event mess, and 7% is under js_Interpret.  In fact, we don't seem to JIT this, but given the painting numbers that's not the bottleneck.  Might still want a bug on that, of course.
Summary: Much slower that chrome in demo → Much slower that chrome in demo due to painting being slow
Hmm.  So I just did a malloc trace in Shark.  Over 10s, the testcase allocated about 11MB.  2MB of that was painting; the rest was JS.
The painting allocations are from ripl_Create called from ripc_GetColor, called from ripc_Render, called from ripc_DrawRects, called from CGContextFillRects, called from CGContextFillRect, called from CGContextDrawTiledImage, called from _cairo_quartz_surface_paint, called via some cairo stuff from imgFrame::Draw.

The JS allocations are from js_NewStringFromCharBuffer (3.5MB), js_ValueToString on numbers (2.5MB), and js_ConcatStrings (1.6MB).  Also about 500KB under quickstub conversions to string, and 140KB under ExecuteTree (running regexps).  88KB under js_GetMutableScope (JSScope::create), 70KB under js_SetProperty (JSScope::changeTable).  Other allocations are pretty small all around.
The painting code spends 39% of total testcase time in (not under) sseCGSBlendXXXX8888 in the CoreGraphics library; this is under CGContextDrawTiledImage called from moz_cairo_paint_with_alpha called from imgFrame::Draw.  The moz_cairo_fill_preserve call that Draw() makes accounts for the other 35+% there; this has no single chokepoint (though resample_band in CoreGraphics is a lot of it).
(In reply to comment #4)
> The painting allocations are from ripl_Create called from ripc_GetColor, called
> from ripc_Render, called from ripc_DrawRects, called from CGContextFillRects,
> called from CGContextFillRect,called from CGContextDrawTiledImage, called from
> _cairo_quartz_surface_paint, called via some cairo stuff from imgFrame::Draw.

Instruments showed 95% of allocations coming from here.
Attached image Instruments screenshot
The Safari GFX call stacks look pretty different:

One of the big differences is that we use CGContextDrawTiledImage whereas Safari uses CGContextDrawImage. Also interesting is that sseCGSBlendXXXX8888 doesn't even show up in Safari.
The testcase is using background-position and transforms, so I suspect image drawing is going through the path where we create a temporary image which is the piece of the image we need to render, so we can EXTEND_PAD it and not sample the wrong pixels.

We could test that hypothesis by replacing !subimage.Contains(imageRect) with PR_FALSE here:
http://mxr.mozilla.org/mozilla-central/source/modules/libpr0n/src/imgFrame.cpp#535
If that is a big performance issue, then we probably need to cache extracted subimages; exactly how we store them depends on how to maximize performance of the native APIs cairo is using. Or alternatively we could bite the bullet and move forward with adding some kind of subimage API to cairo.
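A rough idea of what "cache extracted subimages" could mean, as a C++ sketch. Everything here is hypothetical: `SubimageKey`, `SubimageCache`, and the `extract` callback stand in for whatever imgFrame would really use, and the cached value is a plain pixel vector rather than a native surface. The only point illustrated is that repeated paints of the same source rectangle reuse one extracted copy instead of allocating a temporary every frame.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical key: which image, and which source rectangle of it.
struct SubimageKey {
    std::uintptr_t imageId;
    int x, y, width, height;
    bool operator<(const SubimageKey& o) const {
        return std::tie(imageId, x, y, width, height) <
               std::tie(o.imageId, o.x, o.y, o.width, o.height);
    }
};

using Pixels = std::vector<std::uint32_t>;

// Hypothetical cache: extract a subimage once, then reuse it on later paints.
class SubimageCache {
public:
    explicit SubimageCache(std::function<Pixels(const SubimageKey&)> extract)
        : extract_(std::move(extract)) {
        assert(extract_ && "an extraction callback is required");
    }

    const Pixels& get(const SubimageKey& key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++misses_;  // only a miss pays for extraction (allocation + copy)
            it = cache_.emplace(key, extract_(key)).first;
        }
        return it->second;
    }

    int misses() const { return misses_; }

private:
    std::function<Pixels(const SubimageKey&)> extract_;
    std::map<SubimageKey, Pixels> cache_;
    int misses_ = 0;
};
```

This sketch never evicts entries; a real cache would also need a size bound and invalidation when the underlying image data changes.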
(In reply to comment #6)
> 
> Instruments showed 95% of allocations coming from here.

Shark malloc trace had something similar to say with "Record Only Active Blocks" unchecked.
Filed bug 523452 on the JSeng part here.
Depends on: 523452
> with "Record Only Active Blocks" unchecked.

Oh, without that it only records the blocks that are still alive after the profile ends, it seems?  Bah!
(In reply to comment #9)
> The testcase is using background-position and transforms, so I suspect image
> drawing is going through the path where we create a temporary image which is
> the piece of the image we need to render, so we can EXTEND_PAD it and not
> sample the wrong pixels.

Yeah, that's my guess as well.
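To make the "sample the wrong pixels" concern concrete: when a sprite is one region of a larger sheet and is drawn scaled, a bilinear filter at the sprite's edge also reads texels that belong to the neighbouring sprite. The following is a minimal, dependency-free C++ sketch, assuming a single row of grayscale texels and linear filtering; `sampleLinear` and its clamp bounds are hypothetical and are not imgFrame or cairo code. Clamping the filter taps to the sprite's own range is the effect that EXTEND_PAD on an extracted temporary surface provides.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical illustration: linearly filter a row of texels at continuous
// coordinate x (texel centres sit at i + 0.5), clamping both taps to
// [lo, hi]. Clamping to the whole row mimics sampling the full sprite sheet;
// clamping to the sprite's range mimics EXTEND_PAD on a temporary surface
// that holds only the sprite.
double sampleLinear(const std::vector<double>& row, double x, int lo, int hi) {
    assert(0 <= lo && lo <= hi && hi < static_cast<int>(row.size()));
    double t = x - 0.5;  // shift so texel centres are at integer positions
    int i0 = static_cast<int>(std::floor(t));
    double frac = t - i0;
    auto clampIdx = [&](int i) { return i < lo ? lo : (i > hi ? hi : i); };
    double a = row[clampIdx(i0)];
    double b = row[clampIdx(i0 + 1)];
    return a * (1.0 - frac) + b * frac;
}
```

For example, with the row {1, 1, 1, 1, 0, 0, 0, 0} and the sprite occupying texels 0 to 3, sampling at the sprite's right edge (x = 4.0) blends in the neighbour and gives 0.5 when clamped to the whole row, but stays 1.0 when clamped to [0, 3].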
I tried the suggestion in comment 9 paragraph 2.  That dropped the paint time from 80% to 53% with supervisor callstacks hidden, and dropped vm_fault from 20% to 2%.  The animation also looks much smoother.  There seems to be no more moz_cairo_paint_with_alpha under imgFrame::Draw.
WebKit is drawing the images as background tiles at 1:1 scale, then somehow scaling down the rendered image according to the transform. (They also clip the destination image when they're drawing sprites, rather than using source clipping.)
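The difference between destination and source clipping can be shown with a tiny software blitter. This is a sketch assuming 1-D rows of pixels and no scaling; `blitWithDestClip` is a hypothetical helper, not WebKit code. Instead of copying the sprite's source rectangle into a temporary, the whole sheet is positioned so that the wanted sprite lands on the destination, and only destination pixels inside the clip are written, so no intermediate surface is allocated.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical destination-clipping blit for 1-D rows: the entire sprite
// sheet is drawn with its origin at `offset` in destination coordinates,
// but only destination pixels in [clipBegin, clipEnd) are written.
void blitWithDestClip(const std::vector<int>& sheet, std::vector<int>& dest,
                      int offset, int clipBegin, int clipEnd) {
    assert(clipBegin <= clipEnd);
    int begin = std::max({clipBegin, offset, 0});
    int end = std::min({clipEnd, offset + static_cast<int>(sheet.size()),
                        static_cast<int>(dest.size())});
    for (int d = begin; d < end; ++d) {
        dest[d] = sheet[d - offset];  // source index follows from the offset
    }
}
```

For instance, to place the sprite stored at sheet indices 2..3 at destination position 5, the sheet is drawn at offset 5 - 2 = 3 with the clip [5, 7); no pixels outside the clip are touched.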
FWIW,

1. Open http://mrdoob.com/projects/chromeexperiments/depth_of_field/ with Opera...
2. ???
3. Profit!

It's fast!
(In reply to comment #14)
> I tried the suggestion in comment 9 paragraph 2.  That dropped the paint time
> from 80% to 53%

Or, put another way, it made the testcase about 2.5 times faster.  It seems like the tail end of comment 9 might be worth looking into.
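A back-of-the-envelope check of that figure, assuming the non-painting work (20% of the profile before the change, 47% after, with supervisor callstacks hidden) takes the same absolute time in both runs:

$$
T_{\text{other}} = 0.20\,T_{\text{before}} = 0.47\,T_{\text{after}}
\quad\Rightarrow\quad
\frac{T_{\text{before}}}{T_{\text{after}}} = \frac{0.47}{0.20} \approx 2.35
$$

This is only a rough consistency check with "about 2.5 times faster", since the assumption of constant non-painting time is approximate.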
Summary: Much slower that chrome in demo due to painting being slow → Much slower than Chrome in demo due to temporary surfaces in background image painting
Whiteboard: chromeexperiments
Still much faster in Chrome than Firefox trunk. Testing on Win7 w/ D2D enabled.
Component: Layout → Graphics
Much smoother in IE10 than in Chrome.
Severity: normal → S3

This demo performs great now.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WORKSFORME