599616 - Chrome Experiment: Depth of Field runs slower in Minefield than Chromium

Reporter

Description

•

15 years ago

User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:2.0b7pre) Gecko/20100916 Firefox/4.0b7pre Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10 There is a significant performance issue with Minefield: Ball movements are very slow. Changing the angle with the mouse is unresponsive. Moving the window around will stall computer for a few seconds. Reproducible: Always Steps to Reproduce: 1. Open Depth of Field experiment 2. Try changing the view angle 3. Try moving the window around Actual Results: 1. Balls are very slow 2. Unresponsive to angle change 3. Computer stalls for a few seconds Expected Results: Smooth ball movements, responsive to angle change, and computer should not stall/freeze when moving the window around. In Chromium this experiement works perfectly.

Tyler Downer [He/Him]

Comment 1

•

15 years ago

Does this happen with Firefox trunk? JS has been majorly improved there. ftp://ftp.mozilla.org/pub/firefox/nightly/latest-mozilla-central/

Version: unspecified → 1.9.2 Branch

Mike Shaver (:shaver emeritus)

Comment 2

•

15 years ago

I *think* Ken was using the sept 16 build, from the User-Agent indicator, but the Build Identifier line is a little confusing in that context, it's true. I certainly see it with Firefox trunk as of yesterday's nightly. Tyler: do you not see it with your builds?

Tyler Downer [He/Him]

Comment 3

•

15 years ago

I am seeing variable performance. Some spots, such as the end, run at full speed, others don't run very well. However, I has several tabs and windows open, and this is not a fresh profile, so it's not the best platform for testing ATM

Jan de Mooij [:jandem]

Comment 4

•

15 years ago

Shark shows that a large part of this is painting. 49.1% under gfxSurfaceDrawable::Draw.15% of total time in sseCGSBlendXXXX8888. Can somebody verify this?

David Humphrey (:humph)

Updated

•

15 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Whiteboard: [chromeexperiments]

Jan de Mooij [:jandem]

Comment 5

•

15 years ago

Moving to Core:Graphics, as painting seems to take at least 60% on OS X. This demo runs much faster (smoother) in both Chrome and Safari.

Assignee: general → nobody

Component: JavaScript Engine → Graphics

QA Contact: general → thebes

Whiteboard: [chromeexperiments] → [painting-perf][chromeexperiments]

Jan de Mooij [:jandem]

Comment 6

•

15 years ago

On OS X this is also a bit jerky in Chrome, but in Safari it's very fast. I wonder how this performs on Direct2D/IE9...

Bas Schouten (:bas.schouten)

Comment 7

•

15 years ago

Performs poorly with D3D9 layers. It performs reasonable using D2D in Firefox, but has some artifacts. It performs better in IE9. I'll investigate this.

Bas Schouten (:bas.schouten)

Comment 8

•

15 years ago

I might note IE9 does not oscillate the ball sizes, so it's not completely correct.

Bas Schouten (:bas.schouten)

Comment 9

•

15 years ago

With Direct2D with D3D10, this performs 'alright' but not smooth! About 50-60% of the time is spent in JS, particularly in js_NativeSet(some 20%). Some 20% in total is spent inside nsCSSFrameConstructor::RestyleElement, of which the majority in ComputeStyleChangeFor. Some 30% is spent in painting, over half of which is spent in BuildContainerLayerFor.

Boris Zbarsky [:bzbarsky]

Comment 10

•

15 years ago

> particularly in js_NativeSet(some 20%). In or under? Over here I see time under it, in particular under nsDOMCSSDeclaration::ParsePropertyValue. That would be consistent with the page, say, setting style.top and style.left on DOM nodes. No obvious hotspots under here other than the silly before/after attr change notifications we have to send here and the actual parsing of the number, as usual. Given that, the style reresolution you're seeing also makes sense.... The remaining "JS" time seems to be stubs::add (usual thing of doing |n + "px"| which involves number to string conversions), stubs::Mul (calling js::ValueToNumberSlow which calls js_strtod), getting .style off nodes, and some stuff like floor() and the like. For me, (mac, testing without GL layers on for various reasons) painting is over 60% of the total time.

Boris Zbarsky [:bzbarsky]

Comment 11

•

15 years ago

Note that that inline style stuff is something we've been optimizing for a while now. Webkit can do some of it faster because it just does it wrong in edg-y cases (see "attr change notifications" above).

Bas Schouten (:bas.schouten)

Comment 12

•

15 years ago

(In reply to comment #10) > > particularly in js_NativeSet(some 20%). > > In or under? Over here I see time under it, in particular under > nsDOMCSSDeclaration::ParsePropertyValue. That would be consistent with the > page, say, setting style.top and style.left on DOM nodes. No obvious hotspots > under here other than the silly before/after attr change notifications we have > to send here and the actual parsing of the number, as usual. > > Given that, the style reresolution you're seeing also makes sense.... > > The remaining "JS" time seems to be stubs::add (usual thing of doing |n + "px"| > which involves number to string conversions), stubs::Mul (calling > js::ValueToNumberSlow which calls js_strtod), getting .style off nodes, and > some stuff like floor() and the like. > > For me, (mac, testing without GL layers on for various reasons) painting is > over 60% of the total time. What you're seeing seems to agree largely with what I'm seeing. The reason for the higher painting weight in your case is probably just D2D.

Dão Gottwald [:dao]

Updated

•

14 years ago

Keywords: perf

Bas Schouten (:bas.schouten)

Comment 13

•

14 years ago

I profiled this again today, it seems that in current D2D builds this is pretty slow, we're spending a relatively high amount of time creating intermediate surfaces(and so textures) in something called 'CreateSamplingRestrictedDrawable' (gfxUtils.cpp:394). This probably needs some looking into if there's something better we can do there. That codepath is quite bad for any accelerated system.

Bas Schouten (:bas.schouten)

Comment 14

•

14 years ago

If I comment out that codepath (using 'incorrect' sampling) I get much better perf, but it's still worse than IE9 or Chromium.

Bas Schouten (:bas.schouten)

Comment 15

•

14 years ago

The other observation here is that we spend a lot of time creating new textures and copying stuff around, because rather than a ball being retained as a layer, it seems to get redrawn all the time (more often than I believe needed due to different sampling sizes), and we're getting different texture rects every frame causing texture recreation and copying around lots of data. See ThebesLayerD3D10.cpp:222, which is the path being hit.

Matt Woodrow (:mattwoodrow)

Comment 16

•

14 years ago

This demo is using a single images with multiple pictures of the ball, each with varying levels of focus. It's then setting the style.background to this image and using style.backgroundPosition and style.width/height to only show one of the ball images. I think if we could get this to result in an ImageLayer, with the image only being a single texture, and just drawn with different texture coordinates, we would get a massive performance improvement. FrameLayerBuilder.cpp:ContainerState::ThebesLayerData::Accumulate and FrameLayerBuilder.cpp:ContainerState::ThebesLayerData::CanOptimizeImageLayer handle doing something similar to this for nsDisplayImage, and we'd need to extend this to also handle nsDisplayBackground. This could also probably be implemented by letting nsDisplayBackground directly create it's own layer, but I'm not sure how to detect that it is the only item in the ThebesLayer.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 17

•

14 years ago

We could extend the nsDisplayImage code to handle other kinds of images. Or we could create an nsDisplayImage (or an extended version thereof) directly instead of nsDisplayBackground in certain cases. However, we'll run into the problem again that we need to restrict sampling of the source image to a particular rectangle and we don't have a good way to do that in layers :-(.

Matt Woodrow (:mattwoodrow)

Comment 18

•

14 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #17) > However, we'll run into the problem again that we need to restrict sampling > of the source image to a particular rectangle and we don't have a good way > to do that in layers :-(. What's the complication here? As far as I know modifying ImageLayer::SetCurrentImage to include a source rectangle parameter should be easy. Unless I'm missing something, we already do this for PlanerYCbCrImages, where we upload the entire frame and only sample from the picture rect.

Matt Woodrow (:mattwoodrow)

Comment 19

•

14 years ago

*Planar Never mind, forgot about resamping entirely :)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 20

•

14 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #18) > Unless I'm missing something, we already do this for PlanerYCbCrImages, > where we upload the entire frame and only sample from the picture rect. The difference is that with video images, we can assume that the pixels outside the sample rect are "pretty close" to the pixels inside the sample rect. With arbitrary CSS background images, we can't.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 21

•

14 years ago

We could do something hairy like add a source rectangle parameter to SetCurrentImage and keep a cache of extracted subrectangles, by suballocating areas of a single large texture with thick EXTEND_PAD borders. Or something like that. Or, Bas could think of some magical way to make the GPU handle this for us :-).

BMO Automation

Updated

•

3 years ago

Severity: normal → S3