Closed Bug 588451 Opened 14 years ago Closed 13 years ago

> 100ms Ts regressions from D2D on by default

Categories

(Core :: Graphics, defect)

x86
Windows 7
defect
Not set
normal

RESOLVED INCOMPLETE

People

(Reporter: joe, Unassigned)

Details

We saw a lot of big Ts regressions from turning D2D on by default (why we only saw it the third time it was on by default I don't know). Unfortunately the very first time we start D2D on any machine, there is a significantly larger Ts penalty because of DLL mechanics that Windows does the first time you load a DLL.

We should ensure that the Ts regressions we get long-term are much lower than this, in addition to mitigating the startup costs we are able to control.
(In reply to comment #0)
> We saw a lot of big Ts regressions from turning D2D on by default (why we only
> saw it the third time it was on by default I don't know). Unfortunately the
> very first time we start D2D on any machine, there is a significantly larger Ts
> penalty because of DLL mechanics that Windows does the first time you load a
> DLL.

Can you link to a description of these mechanics?
(In reply to comment #1)
> (In reply to comment #0)
> > We saw a lot of big Ts regressions from turning D2D on by default (why we only
> > saw it the third time it was on by default I don't know). Unfortunately the
> > very first time we start D2D on any machine, there is a significantly larger Ts
> > penalty because of DLL mechanics that Windows does the first time you load a
> > DLL.
> 
> Can you link to a description of these mechanics?

I have no idea! -But- I'm sure you remember the test application I sent you, and the difference between the first run and subsequent runs, even after rebooting your VM. I don't remember the exact numbers, but I seem to recall it was something like 180 ms vs 30 ms?
That was not the first time I'd loaded D3D10 or D2D before. I had run IE9 which does both of those things.
(In reply to comment #3)
> That was not the first time I'd loaded D3D10 or D2D before. I had run IE9 which
> does both of those things.

Well, you didn't say that then! :)

But yes, I was seeing the same thing on my machine (which has loaded them many times before) when I first ran it. I think it's more a matter of loading the DLLs from our process. I'm not sure why this is; another option is that the graphics driver UMDs (user-mode drivers) are initializing their profile for the process. A lot of graphics drivers maintain profiles for the applications using them. I'm just guessing here; it's going to be interesting to see if subsequent Ts hits will be smaller.
Look at the graph - http://bit.ly/cZjYFy - it is indeed going down. We'll see what the hit ends up being within an hour or two.
(In reply to comment #5)
> Look at the graph - http://bit.ly/cZjYFy - it is indeed going down. We'll see
> what the hit ends up being within an hour or two.

Not really; we can only see that once all the Win7 boxes have run it at least once.
http://bit.ly/bTK0ty - We may have reached a new level at about 150 ms higher than we started.
(In reply to comment #0)
> We saw a lot of big Ts regressions from turning D2D on by default (why we only
> saw it the third time it was on by default I don't know). Unfortunately the
> very first time we start D2D on any machine, there is a significantly larger Ts
> penalty because of DLL mechanics that Windows does the first time you load a
> DLL.

What DLL mechanics are you mentioning exactly?
We don't know, honestly. I took a Talos machine out for a spin with Bas's d3d10 timer[1], and saw nothing obvious in a registry diff. I'm more or less out of ideas for now; when I install Win7 on my laptop, I'll give it another go.
(In reply to comment #9)
> We don't know, honestly. I took a Talos machine out for a spin with Bas's d3d10
> timer[1], and saw nothing obvious in a registry diff. I'm more or less out of
> ideas for now; when I install Win7 on my laptop, I'll give it another go.

Looking into file system access might also be useful here.
I need to create a test app that loads the D2D1 DLLs as well, to make that app more indicative of our actual minimum overhead (i.e. the amount of time we absolutely cannot avoid without doing asynchronous loads or other tricks). We currently have some extra overhead on our test boxes because they support DX 10.1: we initialize 10.0 first to see if we support it, find out we do, then try DX 10.1, succeed, and throw away the DX 10 version. This is pretty bad! But we currently don't know an easy way to get around it and still get the maximum performance we can.
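For illustration only, here is a rough sketch of what "asynchronous loads" could look like in general: start the expensive initialization on a worker thread early, carry on with other startup work, and only block when the result is actually needed. This is plain Python, not our code; `slow_graphics_init`, `BackgroundInit` and `startup` are hypothetical names, and the sleep merely stands in for the DLL load and device creation.

```python
import threading
import time


def slow_graphics_init():
    """Hypothetical stand-in for the blocking D3D10/D2D load; not a real API."""
    time.sleep(0.05)  # pretend the DLL load and device creation take 50 ms
    return "device"


class BackgroundInit:
    """Runs an expensive initializer on a worker thread and hands back its result."""

    def __init__(self, init_fn):
        self._result = None
        self._error = None
        self._thread = threading.Thread(target=self._run, args=(init_fn,), daemon=True)
        self._thread.start()

    def _run(self, init_fn):
        try:
            self._result = init_fn()
        except Exception as exc:  # keep the failure so get() can re-raise it
            self._error = exc

    def get(self):
        """Block until initialization has finished, then return or re-raise."""
        self._thread.join()
        if self._error is not None:
            raise self._error
        return self._result


def startup():
    pending = BackgroundInit(slow_graphics_init)  # kick off the load early
    time.sleep(0.05)  # other startup work that does not need graphics yet
    return pending.get()  # wait here only if the load is still running
```

Whether this helps Ts at all depends on how much startup work can really proceed before the first paint needs the device, which is not something this sketch can tell us.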
I'm hopeful Bug 585817 will mean some improvement here.
So, sadly it was backed out again, but the numbers we're seeing currently are certainly pointing at an improvement due to that bug landing (will re-land soon; the issue it had was resolved):

    Previous results:
        555.947 from build 20100819123813 of revision 55ef0e0529bc at 2010-08-19 16:11:12 on talos-r3-w7-044 run # 0
    New results:
        448.316 from build 20100819130317 of revision 90ad165ae21b at 2010-08-19 14:34:43 on talos-r3-w7-022 run # 0
    http://mzl.la/9RUMvS
    http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=55ef0e05... 

Let's not cheer too soon though, these numbers seem to have been erratic before :)
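For reference, the arithmetic on the two Ts values quoted above works out to a drop of roughly 108 ms, or about 19%. The snippet below just does that subtraction; the variable names are made up for the example, and since the two results come from different Talos machines (talos-r3-w7-044 and talos-r3-w7-022), this is not a controlled comparison.

```python
# Ts values (ms) quoted in the comment above. They come from different
# Talos machines, so treat the difference as indicative only.
previous_ts_ms = 555.947
new_ts_ms = 448.316

delta_ms = previous_ts_ms - new_ts_ms
relative_drop = delta_ms / previous_ts_ms

print(f"Ts dropped by {delta_ms:.3f} ms ({relative_drop:.1%})")
```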
(In reply to comment #12)
> We currently have some extra overhead on our test boxes because they support DX
> 10.1, we initialize 10.0 first to see if we support it, find out we do, then
> try DX 10.1, succeed and throw away the DX 10 version, this is pretty bad! But
> we currently don't know an easy way to get around it and still get the max
> performance we can do.

Dumb question: can we try DX 10.1 first?
(In reply to comment #15)
> (In reply to comment #12)
> > We currently have some extra overhead on our test boxes because they support DX
> > 10.1, we initialize 10.0 first to see if we support it, find out we do, then
> > try DX 10.1, succeed and throw away the DX 10 version, this is pretty bad! But
> > we currently don't know an easy way to get around it and still get the max
> > performance we can do.
> 
> Dumb question: can we try DX 10.1 first?

That would mean we have to try DX 10 next, and try -twice- for people who don't get D2D at all. That is, we'd have twice the Ts hit for people who don't benefit at all.
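To make the trade-off concrete, here is a small sketch that counts device-creation attempts for each probing order. It assumes every attempt costs about the same and that a machine which fails DX 10.0 cannot do DX 10.1 either; the function names are hypothetical and this only models the logic described in this thread, not the actual code.

```python
def attempts_dx10_first(supports_10_0, supports_10_1):
    """Current order: probe 10.0, and only probe 10.1 if 10.0 worked."""
    attempts = 1              # always try DX 10.0
    if supports_10_0:
        attempts += 1         # then try to upgrade to DX 10.1
    return attempts


def attempts_dx10_1_first(supports_10_0, supports_10_1):
    """Proposed order: probe 10.1, and fall back to 10.0 if that fails."""
    attempts = 1              # always try DX 10.1
    if not supports_10_1:
        attempts += 1         # fall back to DX 10.0
    return attempts


# (label, supports 10.0, supports 10.1)
machines = [
    ("no D2D at all", False, False),
    ("DX 10.0 only", True, False),
    ("DX 10.1", True, True),
]

for label, has_10_0, has_10_1 in machines:
    current = attempts_dx10_first(has_10_0, has_10_1)
    proposed = attempts_dx10_1_first(has_10_0, has_10_1)
    print(f"{label}: 10.0 first = {current}, 10.1 first = {proposed}")
```

Under those assumptions, trying 10.1 first saves one attempt on 10.1 hardware but adds one for machines that get no D2D, which is the cost described above.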
(In reply to comment #14)
> So, sadly it was backed out again, but numbers we're seeing currently are
> certainly pointing at an improvement due to that bug landing(will re-land soon,
> the issue it had was resolved):
> 
>     Previous results:
>         555.947 from build 20100819123813 of revision 55ef0e0529bc at
> 2010-08-19 16:11:12 on talos-r3-w7-044 run # 0
>     New results:
>         448.316 from build 20100819130317 of revision 90ad165ae21b at
> 2010-08-19 14:34:43 on talos-r3-w7-022 run # 0
>     http://mzl.la/9RUMvS
>     http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=55ef0e05... 
> 
> Let's not cheer too soon though, these numbers seem to have been erratic before
> :)

For the moment it looks like we get similar test results with the relanding. So it seems we've indeed corrected a lot of the Ts impact by fixing Bug 585817. There's still a small Ts increase, but it's a lot more like the predicted regression for D2D users.
It looks like perhaps the first run long initialization isn't easily reproducible. I just installed Windows 7, and here are my results from Bas's d3d10loadspeed tool:

VERY FIRST TIME


joe@JOE-LAPTOP ~/Downloads
$ ./d3d10loadspeed.exe
D3D10 loading and device creation SUCCESS took: 327.232 ms

joe@JOE-LAPTOP ~/Downloads
$ ./d3d10loadspeed.exe
D3D10 loading and device creation SUCCESS took: 43.405 ms

joe@JOE-LAPTOP ~/Downloads
$ ./d3d10loadspeed.exe
D3D10 loading and device creation SUCCESS took: 24.904 ms

joe@JOE-LAPTOP ~/Downloads
$


----REBOOT----


joe@JOE-LAPTOP ~/Downloads
$ ./d3d10loadspeed.exe
D3D10 loading and device creation SUCCESS took: 474.669 ms

joe@JOE-LAPTOP ~/Downloads
$ ./d3d10loadspeed.exe
D3D10 loading and device creation SUCCESS took: 22.736 ms

joe@JOE-LAPTOP ~/Downloads
$ ./d3d10loadspeed.exe
D3D10 loading and device creation SUCCESS took: 23.994 ms

joe@JOE-LAPTOP ~/Downloads
$ ./d3d10loadspeed.exe
D3D10 loading and device creation SUCCESS took: 23.459 ms

joe@JOE-LAPTOP ~/Downloads
$ ./d3d10loadspeed.exe
D3D10 loading and device creation SUCCESS took: 23.690 ms


The first run time is totally reproducible after a restart, meaning we can pin it on DLL loading from disk.
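A quick way to summarize the numbers above is to treat the first run in each group as "cold" and take the median of the rest as "warm". The helper name `summarize` is just for this example; the timings are copied from the output above.

```python
from statistics import median

# Timings (ms) copied from the d3d10loadspeed output above.
fresh_install = [327.232, 43.405, 24.904]
after_reboot = [474.669, 22.736, 23.994, 23.459, 23.690]


def summarize(runs):
    """Return (cold, warm_median): the first run and the median of the rest."""
    cold, warm = runs[0], runs[1:]
    return cold, median(warm)


for label, runs in (("fresh install", fresh_install), ("after reboot", after_reboot)):
    cold, warm_median = summarize(runs)
    print(f"{label}: cold {cold:.1f} ms, warm median {warm_median:.1f} ms, "
          f"ratio {cold / warm_median:.1f}x")
```

After the reboot the cold run is about 20 times slower than the warm median, though with so few samples that ratio is only a rough indication.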
Hrm, it wasn't reproducible on my Mac Mini! :-(
Depends on: 595365
No longer depends on: 595365
It's not clear if there's anything left to do here. If there indeed is anything, please reopen this bug.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → INCOMPLETE