30-50ms Ts regression on Sept 6 2010 after turning on D3D9 accelerated layer composition

RESOLVED FIXED in mozilla2.0b8

Status

()

RESOLVED FIXED
8 years ago
7 years ago

People

(Reporter: joe, Assigned: bas.schouten)

Tracking

Trunk
mozilla2.0b8
x86
Windows XP
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 final+)

Details

Attachments

(7 attachments, 7 obsolete attachments)

3.59 KB, patch
roc
: review+
Details | Diff | Splinter Review
16.01 KB, patch
roc
: review+
Details | Diff | Splinter Review
2.20 KB, patch
roc
: review+
Details | Diff | Splinter Review
11.43 KB, patch
roc
: review+
Details | Diff | Splinter Review
3.62 KB, patch
roc
: review+
Details | Diff | Splinter Review
5.01 KB, patch
roc
: review+
Details | Diff | Splinter Review
1.01 KB, patch
roc
: review+
Details | Diff | Splinter Review
(Reporter)

Description

8 years ago
We saw a reasonably large Ts regression from turning on D3D9 accelerated layer composition by default. The explanation for this is that we are simply loading and initializing a library and objects from that library, respectively, and there's not a lot that can be done about it directly.

However, it should be possible to mitigate this Ts loss by doing some things off the main thread; either doing the D3D9 device initialization off the main thread, or (preferably) doing something else asynchronously, because using D3D9 on another thread would imply a lot of things about how we access DirectX would need to change to that thread.
Marking blocking; we need to make a decision, even if it's to WONTFIX this, though I really hope that isn't the case!
blocking2.0: --- → final+
(Assignee)

Comment 3

8 years ago
Shaver's suggestion was to move some other IO intensive stuff off the main thread. I do not know much about what things we could do their and how to do those things on another thread (depending on what they are). We should probably talk about what can be moved and who's the best person to help on that.

Comment 4

8 years ago
We can move the initial read of omnijar and/or startup cache off the main thread pretty easily, I think, to parallelize that with D3D9 initialization. Who wants to take a crack at that?

Comment 5

8 years ago
(In reply to comment #4)
> We can move the initial read of omnijar and/or startup cache off the main
> thread pretty easily, I think, to parallelize that with D3D9 initialization.
> Who wants to take a crack at that?

Is it possible to move d3d initialization instead?
If we initialize D3D off the main thread, we have to proxy all D3D access to that other thread as well.  Pretty gross, and pervasive cost.

Comment 7

8 years ago
(In reply to comment #6)
> If we initialize D3D off the main thread, we have to proxy all D3D access to
> that other thread as well.  Pretty gross, and pervasive cost.

I was thinking of doing it from main() and then doing a thread.join() on first d3d access as a way to cut down latency, but if that doesn't work we can try to gain this back elsewhere. Problem is that the sum of those jars is < 30ms on warm startup.
(In reply to comment #6)
> If we initialize D3D off the main thread, we have to proxy all D3D access to
> that other thread as well.  Pretty gross, and pervasive cost.

Is that really true?  I thought D3D was multithread-happy, so it seems like we could create the initial device and stuff on a thread.  We'd have to synchronize the first time we draw to make sure that initialization is done...

It would be useful to know how long the d3d initialization actually takes -- function timer stuff can help there, or even just a simple time around the platform Init() or the layer manager's init function.
(Assignee)

Comment 9

8 years ago
(In reply to comment #8)
> (In reply to comment #6)
> > If we initialize D3D off the main thread, we have to proxy all D3D access to
> > that other thread as well.  Pretty gross, and pervasive cost.
> 
> Is that really true?  I thought D3D was multithread-happy, so it seems like we
> could create the initial device and stuff on a thread.  We'd have to
> synchronize the first time we draw to make sure that initialization is done...
> 
> It would be useful to know how long the d3d initialization actually takes --
> function timer stuff can help there, or even just a simple time around the
> platform Init() or the layer manager's init function.

It takes 40ms in a completely bare stand-alone application. Pretty much all that time is spent in CreateDevice and D3D9Create. The application to test it is on my machine back home, I'll try to upload it here sometime soon and make it report the time to something more accessible than the Debug console so people can try it for themselves if they're interested.

Resource access and drawing wise D3D9 is multithread-happy. However device creation and reset wise it's not. The device is basically 'owned' by a thread, and more fundamental access of a device (resetting it, which is a frequent scenario in XP) and some other things like that need to occur on the owning thread. This makes it all complex.
Summary: 30-50ms Ts regression after turning on D3D9 accelerated layer composition → 30-50ms Ts regression on Sept 6 2010 after turning on D3D9 accelerated layer composition
(Reporter)

Updated

8 years ago
Assignee: nobody → bas.schouten
(Assignee)

Comment 10

8 years ago
The D3D9 layers startup time cannot be improved in any way I can see. It will be avoided soon for D2D capable machines. On other machines we'll have to execute on our idea of moving something else to another thread I suppose. Somebody please re-open if they disagree strongly.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → WONTFIX
I don't think you can WONTFIX a 30-50ms Ts regression without a discussion about the tradeoffs.

For example: I don't know what D3D9 is buying us which is worth a the Ts regression.I don't know if D3D10 buys us those same things without the regression. I don't know if 64-bit makes this better, or if there's something architectural we can do to make it better like e10s.

We seem to have followed this process:

 - build D3D9 code on assumption it will make things faster
 - turned on D3D9 in nightlies and betas to evaluate
 - discovered a measurable Ts regression
 - decided to keep it 

We're missing the step where we ensure that D3D9 actually makes things faster.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
(Assignee)

Comment 12

8 years ago
(In reply to comment #11)
> I don't think you can WONTFIX a 30-50ms Ts regression without a discussion
> about the tradeoffs.
> 
> For example: I don't know what D3D9 is buying us which is worth a the Ts
> regression.I don't know if D3D10 buys us those same things without the
> regression. I don't know if 64-bit makes this better, or if there's something
> architectural we can do to make it better like e10s.
> 
> We seem to have followed this process:
> 
>  - build D3D9 code on assumption it will make things faster
>  - turned on D3D9 in nightlies and betas to evaluate
>  - discovered a measurable Ts regression
>  - decided to keep it 
> 
> We're missing the step where we ensure that D3D9 actually makes things faster.

Well, there's a couple of demos I can list that definitely perform quite a bit faster with D3D9. That's cherry picking though, whether the 'general browsing experience' becomes a lot faster, is very hard to say. I don't know of anything in the browser experience that becomes slower, but something might exist! The question I suppose is what kind of performance data you're interested in.

As for D3D10, all the evidence we have shows pretty much everything is better with D3D10 (although it's only available for our D2D users).
(Assignee)

Comment 13

8 years ago
It should be noted for Win7 the regression is gone due to landing of D3D10 layers by default.
Based on Joe's previous email to dev-planning [1] and blog post [2] I understand that we should see general speedups in page responsiveness in terms of scrolling, etc. I can tell you that - anecdotally speaking - I didn't feel it as much on Windows as I did on OSX, but that might be due to limitations of my Windows hardware.

Bas asked what sort of performance data I'd be interested in. Well, for one, I'd be interested in comparisons of scrolling on pages where we *know* we should be seeing an improvement. That would, at the least, give me some handle of how to trade off the best case scenario. The worst case scenario, I suspect, would be that there would be little improvement at the cost of the Ts regression.

Another option we have would be to not do D3D9 by default, but do D3D10 by default.

[1]: http://groups.google.com/group/mozilla.dev.planning/browse_thread/thread/90658cdda3c0e063
[2]: http://blog.mozilla.com/joe/2010/05/25/hardware-accelerating-firefox/

Comment 15

8 years ago
Beltzner wanted me to mention that Ts only measures warm startup and that there is a bigger impact on cold startup(which is the startup that users actually hate). I haven't measured it specifically, but from xperf logs flying by it's somewhere over 200ms for cold startup just to load the d3d-related dlls.
(Reporter)

Comment 16

8 years ago
(In reply to comment #14)
> Another option we have would be to not do D3D9 by default, but do D3D10 by
> default.

I'd expect to see a similar regression from D3D10 as D3D9. The main problem was that we doubled the regression by using both - switching to D3D10 removed one of the regressions from newer machines capable of both 10 and 9.

Also, if we use D3D10 only by default, all our Windows XP users get no hardware acceleration at all.
(In reply to comment #16)
> I'd expect to see a similar regression from D3D10 as D3D9. The main problem was
> that we doubled the regression by using both - switching to D3D10 removed one
> of the regressions from newer machines capable of both 10 and 9.

Wait - now I'm confused again. I thought you had said to me that you were able to remove the Ts regression with D3D10, but wouldn't be able to do so for D3D9. Is it the case that D3D10 (so, Windows 7) users are also still seeing a Ts regression?

> Also, if we use D3D10 only by default, all our Windows XP users get no hardware
> acceleration at all.

Yes, I understand that part. That's part of the "trade off" that I'm proposing we analyse.
(Assignee)

Comment 18

8 years ago
I did some measurements as requested. I did not cherry pick the pages for the scrolling/redraw test, both tests were ran using a bookmarklet that scrolls/redraws a bunch of times. Redraw should -not- be faster for D3D9 layers (since most of the time will be spent redrawing everything in GDI) unless the page happens to be doing something that accelerates well (it looks like Google's frontpage does). The scrolling advantages are apparent. Do note that bigger than the measured timing difference, is the UXP difference, in the D3D9 case the pages moves by slowly, in the no-D3D9 case the page has faster and slower frames, causing a 'stuttery' feeling. I added the transform stress test page as an example of something that accelerates -really really- well. Additionally it should be noted that due to a bug in D3D9 layers the 25fps it achieves is fairly limited. With a patch we have in the queue it goes to 60 fps. But I did the tests with clean nightlies.

GDI + D3D9:

Scrolling - http://frontpage.fok.nl/ - 4050 ms
Redraw - http://frontpage.fok.nl/ - 5250 ms
Redraw - http://www.google.com/ - 3780 ms
Scrolling - http://www.slashdot.org/ - 3816 ms
Redraw - http://www.slashdot.org/ - 5500 ms

Transform stresstest - https://bugzilla.mozilla.org/attachment.cgi?id=463418 - 25 fps

GDI:

Scrolling - http://frontpage.fok.nl/ - 4700 ms
Redraw - http://frontpage.fok.nl/ - 5500 ms
Redraw - http://www.google.com/ - 5500 ms
Scrolling - http://www.slashdot.org/ - 5400 ms
Redraw - http://www.slashdot.org/ - 5500 ms

Transform stresstest - https://bugzilla.mozilla.org/attachment.cgi?id=463418 - 1-2 fps
(Assignee)

Comment 19

8 years ago
(In reply to comment #17)
> (In reply to comment #16)
> > I'd expect to see a similar regression from D3D10 as D3D9. The main problem was
> > that we doubled the regression by using both - switching to D3D10 removed one
> > of the regressions from newer machines capable of both 10 and 9.
> 
> Wait - now I'm confused again. I thought you had said to me that you were able
> to remove the Ts regression with D3D10, but wouldn't be able to do so for D3D9.
> Is it the case that D3D10 (so, Windows 7) users are also still seeing a Ts
> regression?

They saw one with D2D, that's the one Joe's referring to. For D2D we already had to initialize the D3D10 device/libraries. The D3D10 layers backend uses that device created for D2D, so incurs no performance penalty. This is only true because we use it for users who are using D2D anyway though.
Also, any page that has video embedded as part of page content will be improved with d3d9 layers -- even if the performance is "acceptable" without it, it will use much less CPU with d3d9.
(In reply to comment #20)
> Also, any page that has video embedded as part of page content will be improved
> with d3d9 layers -- even if the performance is "acceptable" without it, it will
> use much less CPU with d3d9.

Is that Flash video, HTML <video>, or both?
(Reporter)

Comment 22

8 years ago
<video>.
(Assignee)

Comment 23

8 years ago
I'm working on doing an initial paint using BasicLayers, and then once the browser is shown switching to D3D9 (and blocking for 50ms), presumably those 50 ms are acceptable once the browser is visible. And will have less of an affect on UXP.
(Assignee)

Comment 24

8 years ago
http://graphs.mozilla.org/#tests=[[16,23,389]] shows the results of a test run from Try where I used my delayed D3D9 code.
(Assignee)

Comment 25

8 years ago
Created attachment 493125 [details] [diff] [review]
Part 1: Add method to get all windows
Attachment #493125 - Flags: review?(roc)
(Assignee)

Comment 26

8 years ago
Created attachment 493126 [details] [diff] [review]
Part 2: Add parameter to request permanent layer manager
Attachment #493126 - Flags: review?(roc)
(Assignee)

Comment 27

8 years ago
Created attachment 493127 [details] [diff] [review]
Part 3: Make Canvas and Video request the permanent layer manager
Attachment #493127 - Flags: review?(roc)
(Assignee)

Updated

8 years ago
Attachment #493127 - Attachment is patch: true
Attachment #493127 - Attachment mime type: application/octet-stream → text/plain
(Assignee)

Updated

8 years ago
Attachment #493125 - Attachment description: Add method to get all windows → Part 1: Add method to get all windows
(Assignee)

Comment 28

8 years ago
Created attachment 493128 [details] [diff] [review]
Part 4: Delay Direct3D 9 usage when possible
Attachment #493128 - Flags: review?(roc)
+    virtual LayerManager* GetLayerManager(bool* aAllowRetaining = nsnull,
+                                          bool aRequirePermanent = false) = 0;

These parameters should go in the other order so the out param is last.

Here we run into the ugly problem that we're not supposed to change nsIWidget anymore until the FF4 release. So you have to actually add to nsIWidget_MOZILLA_2_0_BRANCH. You can static_cast from nsIWidget to nsIWidget_MOZILLA_2_0_BRANCH though.

It probably does make sense to add a separate method instead of a bool parameter though, say GetPermanentLayerManager. Or maybe GetPersistentLayerManager since the layer manager is never really permanent due to device resets.
(In reply to comment #25)
> Created attachment 493125 [details] [diff] [review]
> Part 1: Add method to get all windows

As discussed, instead of building an array it's slightly simpler and more efficient here to just pass a callback function that gets called on each window.
(Assignee)

Comment 31

8 years ago
Created attachment 493138 [details] [diff] [review]
Part 1: Add method to enumerate all windows

Hopefully this does roughly what you wanted.
Attachment #493125 - Attachment is obsolete: true
Attachment #493138 - Flags: review?(roc)
Attachment #493125 - Flags: review?(roc)
(Assignee)

Comment 32

8 years ago
Created attachment 493149 [details] [diff] [review]
Part 2: Add parameter to request persistent layer manager
Attachment #493126 - Attachment is obsolete: true
Attachment #493149 - Flags: review?(roc)
Attachment #493126 - Flags: review?(roc)
(Assignee)

Updated

8 years ago
Attachment #493149 - Attachment is patch: true
Attachment #493149 - Attachment mime type: application/octet-stream → text/plain
(Assignee)

Comment 33

8 years ago
Created attachment 493150 [details] [diff] [review]
Part 3: Make Canvas and Video request the permanent layer manager v2
Attachment #493127 - Attachment is obsolete: true
Attachment #493150 - Flags: review?(roc)
(Assignee)

Comment 34

8 years ago
Created attachment 493151 [details] [diff] [review]
Part 4: Delay Direct3D 9 usage when possible v2
Attachment #493128 - Attachment is obsolete: true
Attachment #493151 - Flags: review?(roc)
Attachment #493128 - Flags: review?(roc)
Comment on attachment 493149 [details] [diff] [review]
Part 2: Add parameter to request persistent layer manager

+   * layer manager. In addition to the normal layer manager lookup this will
+   * specifically request a persistent layer manager.

You should explain what "persistent layer manager" means.

+  LayerManagerForDocumentInternal(nsIDocument *aDoc,
+                                  bool aRequirePersistent = false);

Don't need a default parameter here. In fact this doesn't need to be in nsContentUtils.h, it can just be a static function in nsLayoutUtils.cpp.

+    enum LayerManagerType
+    {
+      Current = 0,
+      Persistent
+    };

I think LAYER_MANAGER_CURRENT and LAYER_MANAGER_PERSISTENT. Also, LayerManagerType is ambiguous with the backend type, so how about LayerManagerPersistence.

Otherwise looks good.
Attachment #493149 - Flags: review?(roc) → review+
+  nsCOMPtr<nsITimer>    mD3D9Timer;

This is associated with the first widget, then. I think we should cancel this timer if we shut down before it expires. But we can't cancel it when the window is destroyed since what if someone closes the first window and opens a new one?

How about just recording a timestamp at the creation of the first window, then in GetLayerManager if !sAllowD3D9, check the current time to see if more than five seconds have elapsed? Then we don't need a timer.
(In reply to comment #36)
> How about just recording a timestamp at the creation of the first window, then
> in GetLayerManager if !sAllowD3D9, check the current time to see if more than
> five seconds have elapsed? Then we don't need a timer.

If more than five seconds have elapsed, then I suppose we should use an asynchronous XPCOM event to actually do the layer manager switch. We don't want to change layer managers while we're painting or anything like that.
(Assignee)

Comment 38

8 years ago
Created attachment 493265 [details] [diff] [review]
Delay Direct3D 9 usage when possible v3

Updated as per IRC discussion.
Attachment #493265 - Flags: review?(roc)
Comment on attachment 493265 [details] [diff] [review]
Delay Direct3D 9 usage when possible v3

+nsCOMPtr<nsITimer> nsToolkit::mD3D9Timer;

You can't use a global nsCOMPtr ... we have to avoid global objects with constructors and destructors. Just make it a non-static member of nsToolkit.
Attachment #493265 - Flags: review?(roc) → review+
(Assignee)

Comment 40

8 years ago
(In reply to comment #39)
> Comment on attachment 493265 [details] [diff] [review]
> Delay Direct3D 9 usage when possible v3
> 
> +nsCOMPtr<nsITimer> nsToolkit::mD3D9Timer;
> 
> You can't use a global nsCOMPtr ... we have to avoid global objects with
> constructors and destructors. Just make it a non-static member of nsToolkit.

Is it fair to assume then that we always have a single instance of nsToolkit on the main thread (i.e. I can get is using NS_GetCurrentToolkit and be safe)? The code in nsToolkit seems to suggest other scenarios are possible from a design perspective.
(In reply to comment #40)
> Is it fair to assume then that we always have a single instance of nsToolkit on
> the main thread (i.e. I can get is using NS_GetCurrentToolkit and be safe)?

Yes.
(Assignee)

Updated

8 years ago
Attachment #493151 - Attachment is obsolete: true
Attachment #493151 - Flags: review?(roc)
(Assignee)

Comment 42

8 years ago
Created attachment 495437 [details] [diff] [review]
Part 0: Clear out user data on destroy.

There's some trouble with the patches on TryServer. Roc pointing me in this direction and I think this will fix the issue. I'm still waiting on the tryserver run to confirm.
Attachment #495437 - Flags: review?(roc)
(Assignee)

Comment 43

8 years ago
Created attachment 495662 [details] [diff] [review]
Part 4: Delay Direct3D 9 usage when possible v4

Two minor changes here compared to previous:

- Addressed review comment of using non-static nsCOMPtr.
- Call Destroy() on mLayerManager before setting to NULL to trigger clearing user data.
Attachment #493265 - Attachment is obsolete: true
Attachment #495662 - Flags: review?(roc)
(Assignee)

Comment 44

8 years ago
Created attachment 495776 [details] [diff] [review]
Part -1: Keep LayerManager reference inside LayerManagerData.

This was checked in but then backed out due to some leaks.

These were caused by part 0 clearing out UserData on destroy. In some scenarios this could lead to the UserData being removed by Destroy rather than by InternalDestroyDisplayItemData. The latter would then never release the reference it took in WillEndTransaction. This patch keeps the reference inside the user data, making sure it will always be destroyed.
Attachment #495776 - Flags: review?(roc)
Hmm, this would actually cause a leak by itself, due to a cycle between the userdata and the layer manager, but I guess with the extra changes in this patch it shouldn't leak.
(Assignee)

Comment 46

8 years ago
Created attachment 495965 [details] [diff] [review]
Part 0: Clear out user data on destroy. v2

This makes sure everywhere in widget where we null our layer manager we actually call destroy first.
Attachment #495437 - Attachment is obsolete: true
Attachment #495965 - Flags: review?(roc)
Comment on attachment 493149 [details] [diff] [review]
Part 2: Add parameter to request persistent layer manager

>diff --git a/widget/public/nsIWidget.h b/widget/public/nsIWidget.h
>+    virtual LayerManager *GetLayerManager(LayerManagerType aType = Current,
>+                                          bool* aAllowRetaining = nsnull) = 0;

The above change introduced 707 instances of GCC warnings like this:
{
In file included from ../../dist/include/nsGUIEvent.h:57,
                 from ../../dist/include/IPC/nsGUIEventIPC.h:43,
                 from ../../ipc/ipdl/_ipdlheaders/mozilla/dom/PBrowser.h:22,
                 from ../../ipc/ipdl/_ipdlheaders/mozilla/dom/PBrowserParent.h:9,
                 from PBrowserParent.cpp:7:
../../dist/include/nsIWidget.h:883: warning: 'virtual mozilla::layers::LayerManager* nsIWidget::GetLayerManager(bool*)' was hidden
../../dist/include/nsIWidget.h:1394: warning:   by 'virtual mozilla::layers::LayerManager* nsIWidget_MOZILLA_2_0_BRANCH::GetLayerManager(nsIWidget_MOZILLA_2_0_BRANCH::LayerManagerPersistence, bool*)'
PContentParent.cpp
}

...which is approximately 5000 new lines of build-warning output (assuming ~7 lines of output per warning).

(based on http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1291851874.1291857776.22630.gz )

This makes build output super-unreadable, since yesterday. :) Could this method be renamed to something different, perchance?
> This makes build output super-unreadable, since yesterday
(er, s/yesterday/this morning/, I guess.)
Created attachment 496429 [details] [diff] [review]
followup to fix build warning about nsIWidget::GetLayerManager

(In reply to comment #48)
> Could this method
> be renamed to something different, perchance?

In IRC, Bas suggested just adding "using nsIWidget::GetLayerManager;" to the nsIWidget_MOZILLA_2_0_BRANCH class definition.  That fixes the warnings on my system.

Here's a patch to make that change.
Attachment #496429 - Flags: review?(roc)
Target Milestone: --- → mozilla2.0b8
Depends on: 617860

Updated

7 years ago
Depends on: 692122
You need to log in before you can comment on or make changes to this bug.