
Animated Harmonograph Chrome Experiment runs slower in Minefield than in Chromium

Status: NEW
Assignee: Unassigned
Product: Core
Component: Canvas: 2D
Opened: 7 years ago
Updated: 7 years ago
Reporter: Steven
Keywords: perf
Firefox Tracking Flags: Not tracked
Whiteboard: [chromeexperiments] [painting-perf]
Attachments: 1 attachment, 1 obsolete attachment (195.52 KB, image/png)
(Reporter)

Description

7 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.13) Gecko/20100914 Firefox/3.5.13
Build Identifier: Mozilla/5.0 (Windows NT 6.1; rv:2.0b7pre) Gecko/20100921 Firefox/4.0b7pre

The image "rotation speed" is determined by how quickly the browser can render each still image. Chromium can render the harmonograph quicker, and thus the image spins quicker.
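
A minimal sketch of the relationship described above, assuming the demo turns the image by a fixed angle once per rendered frame (the `step_deg` parameter and the frame times below are hypothetical, not taken from the experiment's source or from measurements in this bug):

```python
def rotation_speed_deg_per_s(frame_ms: float, step_deg: float = 1.0) -> float:
    """Angular speed when the image turns step_deg once per rendered frame."""
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    frames_per_second = 1000.0 / frame_ms
    return step_deg * frames_per_second


# Hypothetical frame times, chosen only to illustrate the effect.
for label, frame_ms in [("faster renderer", 20.0), ("slower renderer", 40.0)]:
    speed = rotation_speed_deg_per_s(frame_ms)
    print(f"{label}: {speed:.1f} deg/s")
```

Under that assumption, halving the time per frame doubles the apparent rotation speed, which is why the spin rate works as a rough rendering benchmark.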

Reproducible: Always

Steps to Reproduce:
1. Visit http://www.chromeexperiments.com/detail/animated-harmonograph/
2. Observe the rotation speed of the harmonograph
Actual Results:  
Rotation of the image in Minefield is slower than in Chromium

Expected Results:  
Image rendering should be at similar speeds

Minefield cannot catch up with Chromium, with or without hardware acceleration.
(Reporter)

Updated

7 years ago
Whiteboard: [chromeexperiments]
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Reporter)

Comment 1

7 years ago
Seems I made an oversight and didn't put enough information in. My graphics card information:

Adapter Description     NVIDIA GeForce 9300 GE
Vendor ID               10de
Device ID               06e0
Adapter RAM             256
Adapter Drivers         nvd3dum nvwgf2um,nvwgf2um
Driver Version          8.17.12.5896
Driver Date             7-9-2010
Direct2D Enabled        true
DirectWrite Enabled     true
GPU Accelerated Windows 1/1 Direct3D 9
(Reporter)

Comment 2

7 years ago
Created attachment 478179 [details]
CPU Profiling

If there's room for optimization in the _moz_cairo_matrix_transform_point function, it seems to be a determining factor of speed in both this and another Chrome experiment (Bug 598834).

Comment 3

I don't understand, sorry. What does "Weight" mean in this profiler screenshot? Is it the % of the total time spent in this function? If so, why do you think that 0.70% is "a determining factor"?

Adding these people in CC although they probably already read canvas bugs:
 * jmuizelaar --> CC him in every cairo-related bug
 * bas --> CC him in every Direct2D-related bug

Comment 4

(In reply to comment #2)
> Created attachment 478179 [details]
> CPU Profiling
> 
> If there's room for optimization in the _moz_cairo_matrix_transform_point
> function, it seems be a determining factor of speed in both this and another
> chrome experiment (Bug 598834)

I had a quick look at your profile.

You need to get xperf to show you the weight by stack (enable the stack column under 'Columns'). Right now you're just seeing that individual function taking a relatively high time (but less than 2.6% of the total firefox CPU time). It'd be interesting to see which functions in d2d1.dll get hit hard too, since three times as much time is spent in d2d1.dll as in xul.dll. The most interesting clues as to why it is slow for you will probably be found there!
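
To illustrate why a flat per-function weight and a weight by stack can tell different stories, here is a toy model of a sampling profiler. It is only a sketch: the sample stacks are hypothetical, the function names are shortened, and this is not how xperf itself is implemented.

```python
from collections import Counter


def weights(samples):
    """Return (self_weight, inclusive_weight) counters from sampled call stacks.

    Each sample is a tuple of frames ordered from the outermost caller to the
    function that was executing when the sample was taken.
    """
    self_w, incl_w = Counter(), Counter()
    for stack in samples:
        if not stack:
            continue
        self_w[stack[-1]] += 1        # only the leaf was actually running
        for frame in set(stack):      # count each function once per sample
            incl_w[frame] += 1
    return self_w, incl_w


# Hypothetical samples, not data from the attached profile.
samples = [
    ("Stroke", "StrokePath::Execute", "RasterizeEdges"),
    ("Stroke", "StrokePath::Execute", "RasterizeEdges", "SortActiveEdges"),
    ("Stroke", "matrix_transform_point"),
    ("Fill", "matrix_transform_point"),
]
self_w, incl_w = weights(samples)
print("self:", dict(self_w))
print("inclusive:", dict(incl_w))
```

In this toy data the transform function has the highest self weight of any single function, yet the stroke path accounts for more samples once callers are included, which is the kind of picture the stack column is meant to reveal.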

Perhaps all these fine lines make tessellation a bit complex. I should note on my machine there's no visible performance difference between chromium and Fx4 with D2D.

Comment 5

(In reply to comment #4)
> (In reply to comment #2)
> Perhaps all these fine lines make tessellation a bit complex. I should note on
> my machine there's no visible performance difference between chromium and Fx4
> with D2D.

Actually, it is a little faster in Chromium.
(Reporter)

Comment 6

7 years ago
(In reply to comment #3)
> I don't understand, sorry. What does "Weight" mean in this profiler screenshot?
> It is % of the total time spent in this function? If yes, why do you think that
> 0.70% is "a determining factor" ?
> 
> Adding these people in CC although they probably already read canvas bugs:
>  * jmuizelaar --> CC him in every cairo-related bug
>  * bas --> CC him in every Direct2D-related bug

I'm new to profiling, but my understanding is that in xperf "Weight" is an approximation of how many milliseconds of CPU time a given function was on the call stack. Though as bas has led me to realize, I may've been a little over-zealous in suggesting it. It just seems as though it's on the call stack the most of any function in xul.dll. Thanks for CC'ing bas and jmuizelaar on this bug!

(In reply to comment #4)
> (In reply to comment #2)
> > Created attachment 478179 [details]
> > CPU Profiling
> > 
> > If there's room for optimization in the _moz_cairo_matrix_transform_point
> > function, it seems be a determining factor of speed in both this and another
> > chrome experiment (Bug 598834)
> 
> I had a quick look at your profile.
> 
> You need to get xperf to show you the weight by stack (enable the stack column
> under 'Columns'). Right now you're just seeing that individual function taking
> a relatively high time (but less than 2.6% of the total firefox CPU time). It'd
> be interesting to see what functions in d2d1.dll get hit hard too (as 3 times
> as much time is spent in d2d1.dll as is in xul.dll). The most interesting clues
> as too why it is slow for you will probably be found there!
> 
> Perhaps all these fine lines make tessellation a bit complex. I should note on
> my machine there's no visible performance difference between chromium and Fx4
> with D2D.

Thanks for the suggestions, you were right. In flat view, 21.35% of total firefox CPU time (36.6% of total time in d2d1.dll) is spent in CHwRasterizer::RasterizeEdges. SortActiveEdges and InsertNewActiveEdge are the next most common functions on the stack, though they consume only 33% and 25%, respectively, of the CPU time that CHwRasterizer::RasterizeEdges does.

After showing weight by stack, I don't see any further function calls from within _moz_cairo_matrix_transform_point; it just seems to be called from a large number of places.
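
The percentages in this comment can be combined with a little arithmetic. The sketch below assumes that the 33% and 25% figures are relative to the weight of CHwRasterizer::RasterizeEdges, which is one reading of the comment; the helper name `share_of_total` is made up for illustration.

```python
def share_of_total(part_of_total_pct: float, part_of_module_pct: float) -> float:
    """Module's share of total time, given one function's share of both."""
    return part_of_total_pct / part_of_module_pct * 100.0


rasterize_of_total = 21.35  # % of total firefox CPU time (from the comment)
rasterize_of_d2d1 = 36.6    # % of time inside d2d1.dll (from the comment)

# Implied share of d2d1.dll as a whole.
d2d1_of_total = share_of_total(rasterize_of_total, rasterize_of_d2d1)

# Assumption: 33% and 25% are fractions of RasterizeEdges' own weight.
sort_of_total = rasterize_of_total * 0.33
insert_of_total = rasterize_of_total * 0.25

print(f"d2d1.dll: {d2d1_of_total:.1f}% of total")
print(f"SortActiveEdges: {sort_of_total:.1f}% of total")
print(f"InsertNewActiveEdge: {insert_of_total:.1f}% of total")
```

On that reading, d2d1.dll as a whole accounts for roughly 58% of firefox CPU time, with SortActiveEdges and InsertNewActiveEdge at about 7% and 5% of the total.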
(Reporter)

Comment 7

7 years ago
Created attachment 478593 [details]
CPU Profiling

An updated CPU profile, this time showing all modules and stack information for select functions.
Attachment #478179 - Attachment is obsolete: true

Comment 8

(In reply to comment #6)
> (In reply to comment #3)
> (In reply to comment #4)
> > (In reply to comment #2)
> > > Created attachment 478179 [details]
> > > CPU Profiling
> > > 
> > > If there's room for optimization in the _moz_cairo_matrix_transform_point
> > > function, it seems be a determining factor of speed in both this and another
> > > chrome experiment (Bug 598834)
> > 
> > I had a quick look at your profile.
> > 
> > You need to get xperf to show you the weight by stack (enable the stack column
> > under 'Columns'). Right now you're just seeing that individual function taking
> > a relatively high time (but less than 2.6% of the total firefox CPU time). It'd
> > be interesting to see what functions in d2d1.dll get hit hard too (as 3 times
> > as much time is spent in d2d1.dll as is in xul.dll). The most interesting clues
> > as too why it is slow for you will probably be found there!
> > 
> > Perhaps all these fine lines make tessellation a bit complex. I should note on
> > my machine there's no visible performance difference between chromium and Fx4
> > with D2D.
> 
> Thanks for the suggestions, you were right. In flat view, 21.35% of total
> firefox CPU time (36.6% of total time in d2d1.dll) is spent in
> CHwRasterizer::RasterizeEdges, with SortActiveEdges and InsertNewActiveEdge
> being on the stack the next most commonly, though consuming 33% and 25% of CPU
> time CHwRasterizer::RasterizeEdges does.
> 

Probably intersection finding and similar work, which gets pretty complex. I'm not sure there's a lot we can do about that 21% (we can still have a look, of course!), but I think most of the win is to be gained in the other 80%. The view is a bit oddly sorted, though (column ordering does wonders in xperfview! Also play with moving the yellowish divider). In this case we also need a profile without PGO so we can do better code path analysis.
Whiteboard: [chromeexperiments] → [chromeexperiments] [painting-perf]

Comment 9

Some more extensive profiling shows 70% of firefox execution time being spent in d2d1.dll!CCommand_StrokePath::Execute and its children, of which a very large part goes to d2d1.dll!CHwRasterizer::RasterizePath and its children.

This presumably confirms my earlier suspicion that this is all about the complexity of stroking fine lines on hardware. I'm not sure there's a whole lot we can do about this. Cairo software rendering doesn't seem to do any better here than D2D. Presumably Skia's software rasterizer does a better job at this kind of fine-line work.
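
For a sense of how much fine-line geometry a harmonograph frame can involve, here is a rough sketch that samples a damped Lissajous-style curve (the classic harmonograph shape) and counts the line segments a stroke would have to cover. The frequencies, damping, and sample count are hypothetical defaults, not values from the Chrome Experiment.

```python
import math


def harmonograph_points(n=2000, t_max=100.0, fx=2.0, fy=3.0,
                        damping=0.01, phase=math.pi / 2):
    """Sample n points of a damped two-pendulum curve within [-1, 1] x [-1, 1]."""
    if n < 2:
        raise ValueError("need at least two points to form a segment")
    points = []
    for i in range(n):
        t = t_max * i / (n - 1)
        decay = math.exp(-damping * t)   # amplitude shrinks over time
        x = decay * math.sin(fx * t + phase)
        y = decay * math.sin(fy * t)
        points.append((x, y))
    return points


points = harmonograph_points()
segments = len(points) - 1
print(f"{segments} line segments to stroke per frame")
```

Every one of those thin, repeatedly crossing segments has to be stroked again on each frame, which is consistent with the stroking path dominating the profile.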

Updated

7 years ago
Keywords: perf