Closed Bug 598584 Opened 12 years ago Closed 5 years ago

CPU usage with test case is 18% on 4.0b8pre with D2D disabled vs 6% on 3.6.10

Categories

(Core :: Widget: Win32, defect)

Hardware: x86_64
OS: Windows 7
Type: defect
Priority: Not set
Severity: normal

Tracking

RESOLVED WORKSFORME
Tracking Status
status2.0 --- wanted

People

(Reporter: scoobidiver, Unassigned)

Details

(Keywords: regression)

Build : Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b7pre) Gecko/20100917
Firefox/4.0b7pre

This bug is cloned from bug 597416, which is only about HW acceleration.

All tests were done with a new profile.
At the ref URL, without playing any video, on my Pentium T4300, the CPU usage is:
* 45% for FF 4.0b7pre without HW acceleration
* 12% for FF 3.6.10
blocking2.0: --- → ?
Which processes are using the CPU?
                          FF 4.0 b7pre without HW accel    FF 3.6.10
firefox.exe                          36%                       5%
plugin-container.exe                  5%                       5%
dwm.exe                               3%                       3%

Maybe it is not a Core plugins issue, but a Core layout issue?
The regression range is :
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e1d55bbd1d1d&tochange=6e3f6d18c124
Component: Plug-ins → Layout
Keywords: regression
QA Contact: plugins → layout
roc, tn, any idea what's up here?  Looks like a possible regression from bug 130078...
This is wmode=transparent Flash. We have known problems with that, and a fix is underway.

I tried a build before 130078 (2010-08-27) and a current nightly, the cpu usage was similar.

Also in that range, the prefs that enable/disable hardware acceleration were renamed: the render-mode pref was changed to a direct2d enabled/disabled pref. That may be why you got that regression range.
No longer depends on: 597416
I set mozilla.widget.render-mode to 0 and verified on the "about:support" page that D2D is disabled.
            b5pre/20100827      b5pre/20100828
CPU usage         20%                50%
So the regression range in comment 3 is right.
What happens if you disable/enable OOP plugins (using the pref dom.ipc.plugins.enabled)?
I set dom.ipc.plugins.enabled to false and mozilla.widget.render-mode to 0
            b5pre/20100827         b5pre/20100828
CPU usage         2%         Fluctuation between 5 to 18%
What if you try Aero Basic (no translucent glass) and/or the classic theme?
I set dom.ipc.plugins.enabled to false and mozilla.widget.render-mode to 0.
               Classic menu    Button menu     Classic menu      Button menu
               Windows aero    Windows aero    Windows basic     Windows basic
b5pre/20100827     2%               2%               1%               1%
b5pre/20100828    5-30%           5-30%              1%               1%
Ok, so the transparent widget is the problem. I assume OOP plugins show the same thing? Thanks for doing this testing.
Summary: CPU usage with flash video in pause is 45% on 4.0b7pre vs 12% on 3.6.10 in safe mode → [Windows7 Aero] CPU usage with a flash video in pause is 45% on 4.0b7pre with D2D disabled vs 12% on 3.6.10
Bug 591554 and bug 591558 are good candidates for the regression.
Blocks: 591554, 591558
Component: Layout → Widget: Win32
QA Contact: layout → win32
Blocks: slowui
(In reply to comment #12)
> Bug 591554 and bug 591558 are good candidates for the regression.

What's your hardware again, scoobidiver? I'm guessing we're hitting alpha extraction here, potentially the slow path version.
> What's your hardware again scoobidiver?
CPU name : Intel Pentium T4300 (2.1 GHz, 800MHz FSB)
CPU info : GenuineIntel family 6 model 23 stepping 10

Adapter Description : Mobile Intel(R) 4 Series Express Chipset Family
Vendor ID : 8086
Device ID : 2a42
Adapter RAM : 1800
Adapter Drivers : igdumd64 igd10umd64 igdumdx32 igd10umd32
Driver Version : 8.15.10.2202
Driver Date : 8-25-2010
Direct2D Enabled : true
DirectWrite Enabled : true
GPU Accelerated Windows : 1/1 Direct3D 9
Assignee: nobody → tnikkel
Now, with 4.0b8pre/20101106 build, here are the results:
* 30% for FF 4.0b8pre without HW acceleration
* 12% for FF 3.6.10
I think the fix for bug 545892 has improved CPU usage.
Depends on: 545892
Why do you suspect bug 545892?
> Why do you suspect bug 545892?
It is the only bug I know that changed Firefox behavior towards aero glass.
But I didn't look for the improvement range, so it could have been due to another fix.
Assignee: tnikkel → nobody
Numbers for me on this page have improved quite a bit since we landed async plugin painting. I see plugin: 6%, fx: 2% currently.
Please reopen if you're still seeing problems.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
With aero theme:
4.0b8pre/20101119 without HW acceleration: 45%
FF 3.6.12:                                 22%
In relative terms, it is better, but only because FF 3.6.12 performance became worse than that of FF 3.6.10.
So I am reopening this bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Taking to investigate further.
Assignee: nobody → jmathies
(In reply to comment #20)
> With aero theme:
> 4.0b8pre/20101119 without HW acceleration: 45%
> FF 3.6.12:                                 22%
> In relative terms, it is better, but only because FF 3.6.12 performance
> became worse than that of FF 3.6.10.
> So I am reopening this bug.

Were those numbers for firefox.exe or the plugin-container?

I did some profiling / debugging on this flash applet. It's really pretty poorly designed. They trigger invalidations on a rectangle with dims of about 750 x 240 constantly despite a lack of any visible animation.

In the container, we mitigate this through async painting in some respects, by accumulating every 3 or 4 invalidations into a single operation. We then spend some time copying over our back buffer in ReadbackDifferenceRect and then shipping that over. Firefox proper then has to composite the page together.
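
The accumulation step described above can be illustrated with a minimal sketch. It is not the Gecko implementation: `Rect`, `unionRect`, `InvalidationAccumulator`, and the count-based `flushEvery` trigger are all hypothetical names and assumptions, chosen only to show how several invalidations can be folded into one paint of their bounding rectangle.

```cpp
#include <algorithm>
#include <cassert>

// Axis-aligned rectangle in plugin-surface pixels. Hypothetical type for
// illustration; it is not Gecko's nsIntRect.
struct Rect {
    int x, y, w, h;
    bool empty() const { return w <= 0 || h <= 0; }
};

// Smallest rectangle that contains both inputs.
inline Rect unionRect(const Rect& a, const Rect& b) {
    if (a.empty()) return b;
    if (b.empty()) return a;
    const int x0 = std::min(a.x, b.x);
    const int y0 = std::min(a.y, b.y);
    const int x1 = std::max(a.x + a.w, b.x + b.w);
    const int y1 = std::max(a.y + a.h, b.y + b.h);
    return Rect{x0, y0, x1 - x0, y1 - y0};
}

// Folds incoming invalidations into one pending rectangle and signals when a
// paint should be issued. flushEvery is an assumed tuning knob; the comment
// above only says "every 3 or 4 invalidations".
class InvalidationAccumulator {
public:
    explicit InvalidationAccumulator(int flushEvery) : flushEvery_(flushEvery) {
        assert(flushEvery > 0);
    }

    // Records one invalidation. Returns true once enough have accumulated
    // that the caller should paint pending() and then call reset().
    bool invalidate(const Rect& r) {
        pending_ = unionRect(pending_, r);
        ++count_;
        return count_ >= flushEvery_;
    }

    const Rect& pending() const { return pending_; }

    void reset() {
        pending_ = Rect{0, 0, 0, 0};
        count_ = 0;
    }

private:
    int flushEvery_;
    int count_ = 0;
    Rect pending_ = Rect{0, 0, 0, 0};
};
```

With an applet that keeps invalidating the same 750 x 240 area, every batch collapses to one paint of that rectangle, so the saving comes from the number of paint and ship operations rather than from the area painted.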

I'm not seeing any obvious ways to optimize this in the container.
I also didn't see any differences between aero glass and aero basic. Not sure if that's still an issue here.
How much would we save by speeding up ReadbackDifferenceRect?

One thing we could do is https://bugzilla.mozilla.org/show_bug.cgi?id=596451#c46. If the front and back buffers were kept in D3D on the plugin-container side, ReadbackDifferenceRect (which really should be called CopyDifferenceRect) could happen on the GPU. This would also have the nice effect of the D3D upload overhead being lower (since we only need to upload the plugin area that's changed) and moving from the browser process to the plugin process. Of course this all only helps where hardware acceleration is available.

Probably quite a bit of work to implement that though. But, it also paves the way for NPAPI for plugin rendering through D3D. Also, if we decide we need to fall back to alpha recovery in some cases, this approach would let us use the GPU to do it.
> Were those numbers for firefox.exe or the plugin-container?
> I also didn't see any differences between aero glass and aero basic. Not sure
> if that's still an issue here.
Here are my new results (with a new profile):
                                    4.0b8pre/20101201 
                                      w/o HW accel.           3.6.12
                             aero glass         basic     aero glass/basic
firefox.exe                      18              12              6
plugin-container.exe             12              12             12
dwm.exe                           4               0              4
(In reply to comment #25)
> > Were those numbers for firefox.exe or the plugin-container?
> > I also didn't see any differences between aero glass and aero basic. Not sure
> > if that's still an issue here.
> Here are my new results (with a new profile):
>                                     4.0b8pre/20101201 
>                                       w/o HW accel.           3.6.12
>                              aero glass         basic     aero glass/basic
> firefox.exe                      18              12              6
> plugin-container.exe             12              12             12
> dwm.exe                           4               0              4

Let's split the aero glass / aero basic issue out into another bug. From these numbers and your numbers in bug 597416, you're seeing a 30-50% difference regardless of your acceleration settings between the two.

Let's make this bug about getting that 12% plugin-container / firefox value down when hardware acceleration is disabled. (bug 597416 is the same for hardware acceleration.)

Do you have accelerated layers enabled? (layers.accelerate-all = ?)
(In reply to comment #24)
> How much would we save by speeding up ReadbackDifferenceRect?
> 
> One thing we could do is
> https://bugzilla.mozilla.org/show_bug.cgi?id=596451#c46. If the front and back
> buffers were kept in D3D on the plugin-container side, ReadbackDifferenceRect
> (which really should be called CopyDifferenceRect) could happen on the GPU.
> This would also have the nice effect of the D3D upload overhead being lower
> (since we only need to upload the plugin area that's changed) and moving from
> the browser process to the plugin process. Of course this all only helps where
> hardware acceleration is available.
> 
> Probably quite a bit of work to implement that though. But, it also paves the
> way for NPAPI for plugin rendering through D3D. Also, if we decide we need to
> fall back to alpha recovery in some cases, this approach would let us use the GPU
> to do it.

That sounds like a fun experiment. :) I'll look at it. I think though that would land in bug 597416.
(In reply to comment #26)
> (In reply to comment #25)
> > Here are my new results (with a new profile):
> >                                     4.0b8pre/20101201 
> >                                       w/o HW accel.           3.6.12
> >                              aero glass         basic     aero glass/basic
> > firefox.exe                      18              12              6
> > plugin-container.exe             12              12             12
> > dwm.exe                           4               0              4
> 
> Let's make this bug about getting that 12% plugin-container / firefox value down
> when hardware acceleration is disabled. (bug 597416 is the same for hardware
> acceleration.)

Actually, plugin-container is 12% across the board and the numbers are way down from your original filing. I guess maybe this bug can be about the difference in aero glass / aero basic in firefox.exe. I'm not sure this bug should block anymore though. We've made substantial improvements since you filed the original bug.
Fixing up title and removing blocking status. We can re-nom if the differences between 3.6 and 4.0 warrant it.
blocking2.0: final+ → ---
Summary: [Windows7 Aero] CPU usage with a flash video in pause is 45% on 4.0b7pre with D2D disabled vs 12% on 3.6.10 → CPU usage with test case is 18% on 4.0b8pre with D2D disabled vs 6% on 3.6.10
> Do you have accelerated layers enabled? (layers.accelerate-all = ?)
I disabled HW acceleration in the Options window, so accelerated layers are disabled.
No longer depends on: 545892
Jim says that in ReadbackDifferenceRect we are spending time in Cairo-land: http://hg.mozilla.org/mozilla-central/annotate/2b44a6a3bfd8/dom/plugins/PluginInstanceChild.cpp#l2793

This sounds really unnecessary, at least on Windows: we have both surfaces as Shared DIB surfaces, and so all of the operations should just be BitBlt with SRCCOPY. ReadbackDifferenceRect is just a performance optimization: if it's going to be expensive, we should just skip it and repaint the entire combined dirty area.

The current performance is good enough, so this is not a blocker, but I'd really like to verify or fix this so we just bitblt.
status2.0: --- → wanted
ReadbackDifferenceRect already tries to optimize the area copied, by the look of it. If the entire surface is invalid, its 'result' region should be empty and we won't copy anything.
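
To make the "empty result region" point concrete, here is a rough sketch of rectangle subtraction. It assumes the region to read back is the previously valid area minus the newly dirty area; `Rect` and `subtractRect` are hypothetical helpers, not the code in PluginInstanceChild.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical rectangle type for illustration (not Gecko's nsIntRect).
struct Rect {
    int x, y, w, h;
    int right() const { return x + w; }
    int bottom() const { return y + h; }
    bool empty() const { return w <= 0 || h <= 0; }
};

// Returns the parts of `keep` not covered by `dirty`, as up to four
// non-overlapping rectangles: a top band, a bottom band, and left and right
// strips beside the overlap.
std::vector<Rect> subtractRect(const Rect& keep, const Rect& dirty) {
    std::vector<Rect> out;
    if (keep.empty()) return out;

    // Intersection of keep and dirty.
    const int ix0 = std::max(keep.x, dirty.x);
    const int iy0 = std::max(keep.y, dirty.y);
    const int ix1 = std::min(keep.right(), dirty.right());
    const int iy1 = std::min(keep.bottom(), dirty.bottom());

    if (ix0 >= ix1 || iy0 >= iy1) {
        // No overlap: all of keep still has to be copied.
        out.push_back(keep);
        return out;
    }
    if (iy0 > keep.y)
        out.push_back(Rect{keep.x, keep.y, keep.w, iy0 - keep.y});
    if (iy1 < keep.bottom())
        out.push_back(Rect{keep.x, iy1, keep.w, keep.bottom() - iy1});
    if (ix0 > keep.x)
        out.push_back(Rect{keep.x, iy0, ix0 - keep.x, iy1 - iy0});
    if (ix1 < keep.right())
        out.push_back(Rect{ix1, iy0, keep.right() - ix1, iy1 - iy0});
    return out;
}
```

When the dirty rectangle covers the whole surface, as with an applet that invalidates its full 750 x 240 area, the result is empty and nothing is copied.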
I'm not talking about the area copied, but rather the code used to do the actual copy. Windows BitBlt is probably cheaper than anything in Cairo, for SharedDIB surfaces.
Maybe, although I'd hope that pixman is good enough to be limited only by memory bandwidth for simple rectangle copies.

I sure hope it's better than asking Flash to repaint the entire surface.
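
For reference, the kind of "simple rectangle copy" being discussed amounts to one memcpy per row. The sketch below is neither BitBlt nor pixman; `copyRect32` is a hypothetical helper that assumes two 32-bit-per-pixel buffers with identical dimensions and row stride.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies a w x h pixel rectangle at (x, y) from src to dst. Both buffers are
// assumed to be 32 bits per pixel and to share the same row stride in bytes
// (an illustrative layout, not taken from the Gecko surfaces).
void copyRect32(std::uint8_t* dst, const std::uint8_t* src,
                std::size_t strideBytes, int x, int y, int w, int h) {
    assert(x >= 0 && y >= 0 && w >= 0 && h >= 0);
    const std::size_t bytesPerPixel = 4;
    const std::size_t rowBytes = static_cast<std::size_t>(w) * bytesPerPixel;
    // The rectangle must fit inside one row of the buffers.
    assert(static_cast<std::size_t>(x) * bytesPerPixel + rowBytes <= strideBytes);

    for (int row = 0; row < h; ++row) {
        const std::size_t offset =
            static_cast<std::size_t>(y + row) * strideBytes +
            static_cast<std::size_t>(x) * bytesPerPixel;
        // One contiguous copy per row; no per-pixel work is done.
        std::memcpy(dst + offset, src + offset, rowBytes);
    }
}
```

A copy of this shape touches each byte once on read and once on write, which is why memory bandwidth is the expected ceiling regardless of which library performs it.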
For reference, this is a screen cap of the paint call tree generated by AQTime. Unfortunately I can't export these to html. I can post the aqtime data files if anyone wants them.

http://i53.tinypic.com/vzzot0.png
Assignee: jmathies → nobody
Status: REOPENED → RESOLVED
Closed: 12 years ago → 5 years ago
Resolution: --- → WORKSFORME