Closed Bug 906147 Opened 11 years ago Closed 11 years ago

The metro browser randomly hangs while running mochitest-metro-chrome in xul.dll!mozilla::layers::DeprecatedTextureClientD3D11::LockTexture()

Categories

(Core :: Graphics, defect)

x86_64
Windows 8.1
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla26

People

(Reporter: jimm, Assigned: nrc)

References

Details

Attachments

(2 files, 1 obsolete file)

The main intermittent bug is bug 891892.

After a lot of triggers I think we've nailed this down to a set of patches that landed last weekend:

3d4fa3ab3108 Nicholas Cameron – Bug 899435. Part 4 - smooth resizing with OMTC. r=roc
d232d5d67b92 Nicholas Cameron – Bug 901382. Don't fall back to basic OMTC. r=mattwoodrow
cc3ab8b0a16c Nicholas Cameron – Bug 902330. Initialise shmem texture clients properly. r=mattwoodrow
0428fd3a0d3a Nicholas Cameron – Bug 902330. Fix the SupportsAzureContent mess. r=mattwoodrow
8d8f82c911f6 Nicholas Cameron – Bug 901722. Part 2 - changes to generated files. r=bas
00d371b69457 Nicholas Cameron – Bug 901722. Implement component alpha for d3d11 compositor. r=Bas
198e81ab1bb9 Nicholas Cameron – Bug 901722. Tweek to d3d9 compositor blend modes. r=bas
7ffdc2d559d7 Nicholas Cameron – Bug 902329. Implement component alpha thebes layers with Azure. r=mattwoodrow
c0274b4c2499 Nicholas Cameron – Bug 901722. Fix up component alpha/sub-pixel AA enablement. r=mattwoodrow
04d8c8834642 Nicholas Cameron – Bug 901404. Support Azure with single buffered thebes layers. r=mattwoodrow
407e267aa4ed Nicholas Cameron – Bug 902528. Check we still have a render target in BeginFrame. r=Bas 

This push with one timeout:

https://tbpl.mozilla.org/?tree=Mozilla-Inbound&showall=1&jobname=winnt%206.2&rev=3d4fa3ab3108

Previous push, no timeouts:

https://tbpl.mozilla.org/?tree=Mozilla-Inbound&showall=1&jobname=winnt%206.2&rev=18236f0722de

Note, we should be getting stacks with these hangs, but automation is currently broken with the metro browser. I'm sorting this out in bug 905628. 

Cameron, any idea which patch potentially could have caused this?
Full listing for the time period:

https://tbpl.mozilla.org/?tree=Mozilla-Inbound&showall=1&jobname=winnt 6.2&fromchange=e177c3d6ffc2&tochange=782e74f1c4c6
On whim I pushed a patch to try with the componentalpha pref disabled. Test runs are a little backed up so results probably won't be available till tomorrow.
Hmm, it is pretty hard to say without a stack trace or warnings/assertions. Do we run metro chrome with debug modes? Or can it be repro'd locally? At a very rough guess I would say component alpha is the most likely reason for this to happen. But a lot of the others could possibly do it too. Hard to say without further info, sorry.
(In reply to Nick Cameron [:nrc] from comment #3)
> Hmm, it is pretty hard to say without a stack trace or warnings/assertions.
> Do we run metro chrome with debug modes? Or can it be repro'd locally? At a
> very rough guess I would say component alpha is the most likely reason for
> this to happen. But a lot of the others could possibly do it too. Hard to
> say without further info, sorry.

Stacks!

https://tbpl.mozilla.org/php/getParsedLog.php?id=26665941&tree=Try&full=1
Also, disabling those componentalpha patches seems to fix it - 

https://tbpl.mozilla.org/?tree=Try&showall=0&rev=ad42acc3e052
(In reply to Jim Mathies [:jimm] from comment #4)
> (In reply to Nick Cameron [:nrc] from comment #3)
> > Hmm, it is pretty hard to say without a stack trace or warnings/assertions.
> > Do we run metro chrome with debug modes? Or can it be repro'd locally? At a
> > very rough guess I would say component alpha is the most likely reason for
> > this to happen. But a lot of the others could possibly do it too. Hard to
> > say without further info, sorry.
> 
> Stacks!
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=26665941&tree=Try&full=1

Actually, I need to confirm this, I backed out some extended test timeouts with that push. Will post back after another try run.
Ok, confirmed I was catching the hang in question. Here's the try push and a hang stack - 

https://tbpl.mozilla.org/?tree=Try&showall=0&rev=198287b1324a

https://tbpl.mozilla.org/php/getParsedLog.php?id=26683736&tree=Try&full=1#error0
Summary: The metro browser randomly hangs while running mochitest-metro-chrome → The metro browser randomly hangs while running mochitest-metro-chrome in xul.dll!mozilla::layers::DeprecatedTextureClientD3D11::LockTexture()
> This push with one timeout:
> 
> https://tbpl.mozilla.org/?tree=Mozilla-Inbound&showall=1&jobname=winnt%206.
> 2&rev=3d4fa3ab3108

I haven't had much luck confirming these pushes were the cause. After a number of retriggers we only see the hang once.

> Also, disabling those componentalpha patches seems to fix it - 
> 
> https://tbpl.mozilla.org/?tree=Try&showall=0&rev=ad42acc3e052

This does seem to address it, in a try push of tip with this disabled w/ ~100 retriggers we haven't reproduced the hang. Comparing this to another push of tip w/ the automation fixes - 

https://tbpl.mozilla.org/?tree=Try&showall=0&rev=198287b1324a

In ~30 retriggers we see the hang three times.

It's also pretty clear that this problem has gotten worse over the last week.
I bet we are never returning from http://dxr.mozilla.org/mozilla-central/source/gfx/layers/d3d11/TextureD3D11.cpp#l228. I don't really have insight why the lock is not being released. Presumably the compositor has not released it, but I don't know why that should happen.

Is it usual for us to be painting from a thread other than the main thread? That doesn't happen on desktop, I believe.

Actually, we should not be using component alpha on Metro, that was an oversight when I landed the component alpha patches. So that should make this go away.

We should have a non-infinite timeout when we try to lock the texture and we handle failure more gracefully.

We should also try and figure out why this is happening in the first place, because I bet it will happen on desktop once we turn OMTC on.
Assignee: nobody → ncameron
Attachment #791787 - Flags: review?(jmathies)
With component alpha, we're generally trying to take ownership of locks on two separate textures.

If we're not guaranteeing which order we take these locks, then we have the potential to deadlock.

Parameter evaluation order is undefined, but it looks like msvc will be evaluating parameters right-to-left here: 
http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/ContentClient.cpp#475

That would mean we'd lock the on-white texture first, which is the reverse order to what we do in ContentHost::Composite.
CC'ing dvander, since he had a deadlock on win7+e10s that is probably a duplicate of this bug.
(In reply to Nick Cameron [:nrc] from comment #9)
> I bet we are never returning from
> http://dxr.mozilla.org/mozilla-central/source/gfx/layers/d3d11/TextureD3D11.
> cpp#l228. I don't really have insight why the lock is not being released.
> Presumably the compositor has not released it, but I don't know why that
> should happen.
> 
> Is it usual for us to be painting from a thread other than the main thread?
> That doesn't happen on desktop, I believe.

Yes that's normal. The main thread drops into winrt land early in startup, we get back a new thread we treat as main.

> We should also try and figure out why this is happening in the first place,
> because I bet it will happen on desktop once we turn OMTC on.

I think I've reproduced this using a test in the metro browser. I haven't had a chance to conform it yet though, will tomorrow.
Comment on attachment 791787 [details] [diff] [review]
don't use component alpha on Metro

Review of attachment 791787 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxPlatform.cpp
@@ +1908,1 @@
>      sComponentAlphaEnabled = Preferences::GetBool("layers.componentalpha.enabled", true);

Do we need this hardcoded? We could just set layers.componentalpha.enabled to false in metro.js.
Attached patch prefSplinter Review
This way we can still test with it when we want to.
Attachment #791831 - Flags: review?(ncameron)
Comment on attachment 791831 [details] [diff] [review]
pref

Review of attachment 791831 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, this is a better way of doing it.

::: browser/metro/profile/metro.js
@@ +34,5 @@
>  
>  // Enable off main thread compositing
>  pref("layers.offmainthreadcomposition.enabled", true);
>  pref("layers.async-pan-zoom.enabled", false);
> +// componentalpha disabled due to a threading issue (bug 906147)

nits: s/componentalpha/component alpha

We're disabling it because it should be disabled (component alpha looks horrible when we zoom without re-rendering which I believe we do on Metro - we do that on Android and b2g). It is a happy coincidence that it fixes this bug.
Attachment #791831 - Flags: review?(ncameron) → review+
Attachment #791787 - Attachment is obsolete: true
Attachment #791787 - Flags: review?(jmathies)
I just took the comment out, blame will get people back to this bug.

https://hg.mozilla.org/integration/mozilla-inbound/rev/2f7dbf83cad3
https://hg.mozilla.org/mozilla-central/rev/2f7dbf83cad3
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
Attachment #792511 - Flags: review?(ncameron)
Comment on attachment 792511 [details] [diff] [review]
Fix the actual bug

Review of attachment 792511 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comments

::: gfx/layers/client/ContentClient.cpp
@@ +471,5 @@
>   
>    AutoDeprecatedTextureClient autoTextureFront;
>    AutoDeprecatedTextureClient autoTextureFrontOnWhite;
>    if (SupportsAzureContent()) {
> +    DrawTarget* dt = autoTextureFront.GetDrawTarget(mFrontClient);

You should add some comments here to explain why we are doing this. Also probably somewhere obvious in the header file for ContentClientDoubleBuffered to explain that there can be this issue.
Attachment #792511 - Flags: review?(ncameron) → review+
OS: Windows 8 Metro → Windows 8.1
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: