Open Bug 599616 Opened 14 years ago Updated 2 years ago

Chrome Experiment: Depth of Field runs slower in Minefield than Chromium

Categories

(Core :: Graphics, defect)

1.9.2 Branch
x86
Windows 7
defect

Tracking

()

People

(Reporter: kpangilinan, Unassigned)

References

()

Details

(Keywords: perf, Whiteboard: [painting-perf][chromeexperiments] )

User-Agent:       Mozilla/5.0 (Windows NT 6.1; rv:2.0b7pre) Gecko/20100916 Firefox/4.0b7pre
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10 

There is a significant performance issue with Minefield: Ball movements are very slow. Changing the angle with the mouse is unresponsive. Moving the window around will stall computer for a few seconds.

Reproducible: Always

Steps to Reproduce:
1. Open Depth of Field experiment
2. Try changing the view angle
3. Try moving the window around
Actual Results:  
1. Balls are very slow
2. Unresponsive to angle change
3. Computer stalls for a few seconds

Expected Results:  
Smooth ball movements, responsive to angle change, and computer should not stall/freeze when moving the window around.

In Chromium this experiement works perfectly.
Does this happen with Firefox trunk? JS has been majorly improved there. ftp://ftp.mozilla.org/pub/firefox/nightly/latest-mozilla-central/
Version: unspecified → 1.9.2 Branch
I *think* Ken was using the sept 16 build, from the User-Agent indicator, but the Build Identifier line is a little confusing in that context, it's true.

I certainly see it with Firefox trunk as of yesterday's nightly.  Tyler: do you not see it with your builds?
I am seeing variable performance. Some spots, such as the end, run at full speed, others don't run very well. However, I has several tabs and windows open, and this is not a fresh profile, so it's not the best platform for testing ATM
Shark shows that a large part of this is painting. 49.1% under gfxSurfaceDrawable::Draw.15% of total time in sseCGSBlendXXXX8888. Can somebody verify this?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Whiteboard: [chromeexperiments]
Moving to Core:Graphics, as painting seems to take at least 60% on OS X. This demo runs much faster (smoother) in both Chrome and Safari.
Assignee: general → nobody
Component: JavaScript Engine → Graphics
QA Contact: general → thebes
Whiteboard: [chromeexperiments] → [painting-perf][chromeexperiments]
On OS X this is also a bit jerky in Chrome, but in Safari it's very fast. I wonder how this performs on Direct2D/IE9...
Performs poorly with D3D9 layers. It performs reasonable using D2D in Firefox, but has some artifacts. It performs better in IE9. I'll investigate this.
I might note IE9 does not oscillate the ball sizes, so it's not completely correct.
With Direct2D with D3D10, this performs 'alright' but not smooth! About 50-60% of the time is spent in JS, particularly in js_NativeSet(some 20%).

Some 20% in total is spent inside nsCSSFrameConstructor::RestyleElement, of which the majority in ComputeStyleChangeFor.

Some 30% is spent in painting, over half of which is spent in BuildContainerLayerFor.
> particularly in js_NativeSet(some 20%).

In or under?  Over here I see time under it, in particular under nsDOMCSSDeclaration::ParsePropertyValue.  That would be consistent with the page, say, setting style.top and style.left on DOM nodes.  No obvious hotspots under here other than the silly before/after attr change notifications we have to send here and the actual parsing of the number, as usual.

Given that, the style reresolution you're seeing also makes sense....

The remaining "JS" time seems to be stubs::add (usual thing of doing |n + "px"| which involves number to string conversions), stubs::Mul (calling js::ValueToNumberSlow which calls js_strtod), getting .style off nodes, and some stuff like floor() and the like.

For me, (mac, testing without GL layers on for various reasons) painting is over 60% of the total time.
Note that that inline style stuff is something we've been optimizing for a while now.  Webkit can do some of it faster because it just does it wrong in edg-y cases (see "attr change notifications" above).
(In reply to comment #10)
> > particularly in js_NativeSet(some 20%).
> 
> In or under?  Over here I see time under it, in particular under
> nsDOMCSSDeclaration::ParsePropertyValue.  That would be consistent with the
> page, say, setting style.top and style.left on DOM nodes.  No obvious hotspots
> under here other than the silly before/after attr change notifications we have
> to send here and the actual parsing of the number, as usual.
> 
> Given that, the style reresolution you're seeing also makes sense....
> 
> The remaining "JS" time seems to be stubs::add (usual thing of doing |n + "px"|
> which involves number to string conversions), stubs::Mul (calling
> js::ValueToNumberSlow which calls js_strtod), getting .style off nodes, and
> some stuff like floor() and the like.
> 
> For me, (mac, testing without GL layers on for various reasons) painting is
> over 60% of the total time.

What you're seeing seems to agree largely with what I'm seeing. The reason for the higher painting weight in your case is probably just D2D.
Keywords: perf
I profiled this again today, it seems that in current D2D builds this is pretty slow, we're spending a relatively high amount of time creating intermediate surfaces(and so textures) in something called 'CreateSamplingRestrictedDrawable' (gfxUtils.cpp:394).

This probably needs some looking into if there's something better we can do there. That codepath is quite bad for any accelerated system.
If I comment out that codepath (using 'incorrect' sampling) I get much better perf, but it's still worse than IE9 or Chromium.
The other observation here is that we spend a lot of time creating new textures and copying stuff around, because rather than a ball being retained as a layer, it seems to get redrawn all the time (more often than I believe needed due to different sampling sizes), and we're getting different texture rects every frame causing texture recreation and copying around lots of data.

See ThebesLayerD3D10.cpp:222, which is the path being hit.
This demo is using a single images with multiple pictures of the ball, each with varying levels of focus.

It's then setting the style.background to this image and using style.backgroundPosition and style.width/height to only show one of the ball images.

I think if we could get this to result in an ImageLayer, with the image only being a single texture, and just drawn with different texture coordinates, we would get a massive performance improvement.

FrameLayerBuilder.cpp:ContainerState::ThebesLayerData::Accumulate and FrameLayerBuilder.cpp:ContainerState::ThebesLayerData::CanOptimizeImageLayer handle doing something similar to this for nsDisplayImage, and we'd need to extend this to also handle nsDisplayBackground.

This could also probably be implemented by letting nsDisplayBackground directly create it's own layer, but I'm not sure how to detect that it is the only item in the ThebesLayer.
We could extend the nsDisplayImage code to handle other kinds of images. Or we could create an nsDisplayImage (or an extended version thereof) directly instead of nsDisplayBackground in certain cases.

However, we'll run into the problem again that we need to restrict sampling of the source image to a particular rectangle and we don't have a good way to do that in layers :-(.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #17)
> However, we'll run into the problem again that we need to restrict sampling
> of the source image to a particular rectangle and we don't have a good way
> to do that in layers :-(.

What's the complication here? As far as I know modifying ImageLayer::SetCurrentImage to include a source rectangle parameter should be easy.

Unless I'm missing something, we already do this for PlanerYCbCrImages, where we upload the entire frame and only sample from the picture rect.
*Planar

Never mind, forgot about resamping entirely :)
(In reply to Matt Woodrow (:mattwoodrow) from comment #18)
> Unless I'm missing something, we already do this for PlanerYCbCrImages,
> where we upload the entire frame and only sample from the picture rect.

The difference is that with video images, we can assume that the pixels outside the sample rect are "pretty close" to the pixels inside the sample rect. With arbitrary CSS background images, we can't.
We could do something hairy like add a source rectangle parameter to SetCurrentImage and keep a cache of extracted subrectangles, by suballocating areas of a single large texture with thick EXTEND_PAD borders. Or something like that.

Or, Bas could think of some magical way to make the GPU handle this for us :-).
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.