Closed Bug 950235 Opened 6 years ago Closed 6 years ago

Crashes at mozilla::layers::TextureParent::RecvInit() or mozilla::layers::TextureHost::Create(mozilla::layers::SurfaceDescriptor const&, mozilla::layers::ISurfaceAllocator*, unsigned int)

Categories

(Core :: Graphics: Layers, defect, critical)

29 Branch
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla30
Tracking Status
firefox28 --- unaffected
firefox29 + verified
firefox30 --- verified

People

(Reporter: smichaud, Assigned: nical)

References

Details

(Keywords: reproducible, topcrash-mac)

Crash Data

Attachments

(1 file)

Crash Signature: [@ mozilla::layers::TextureParent::RecvInit(mozilla::layers::SurfaceDescriptor const&, unsigned int const&) ]
Oops, sorry for the mis-classification.
Component: JavaScript Engine → Graphics: Layers
Here's the implied regression range:

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1ad9af3a2ab8&tochange=8b5875dc7e31

I'd guess the trigger is one of the patches for bug 897452.
Blocks: PTexture
I get this crash reproducibly (on 10.9) when I visit:
  http://www.ifixit.com/Teardown/Mac+Pro+Late+2013+Teardown/20778
Nical, these crashes all seem to be this assertion:

http://hg.mozilla.org/mozilla-central/annotate/8b5875dc7e31/gfx/layers/composite/TextureHost.cpp#l173

MOZ_CRASH("Couldn't create texture host");

They're hitting it on Mac; I can't reproduce on Linux. Do you have an idea of what needs to be done?
Flags: needinfo?(nical.bugzilla)
I'm not seeing this on 29.0a1 (2013-12-31) on 10.9.1.
I can reproduce this on my 2010-era MacBook Pro, but not my 2013 one.  Both are on 10.9.1.  One difference between the two is that I've disabled hardware acceleration on my 2010 one (I unchecked "Use hardware acceleration when available" because I've been getting GPU panics.)
That sort of makes sense, then. It looks like the switch statement that has this assertion needs to be updated to account for the kind of TextureHost involved when disabling GL layers on Mac.
When I re-enable hardware acceleration on my 2010 MBP and then restart the browser, I don't get the crash any more on the URL in comment 3.  Likewise, when I disable hardware acceleration on my 2013 MBP, I get the crash on that URL
Is basic layers + OMTC a supported config (or somewhat semi-supported) my understanding is that at the moment basic-OMTC is not shippable quality yet or wasn't not too long ago. We had the same problem on windows very recently and the fix was to make sure we don't get OMTC+basic (by disabling OMTC when layers acceleration is off iirc) on windows.
Flags: needinfo?(nical.bugzilla)
(In reply to Nicolas Silva [:nical] from comment #9)
> Is basic layers + OMTC a supported config (or somewhat semi-supported) my
> understanding is that at the moment basic-OMTC is not shippable quality yet
> or wasn't not too long ago. We had the same problem on windows very recently
> and the fix was to make sure we don't get OMTC+basic (by disabling OMTC when
> layers acceleration is off iirc) on windows.

You are right, OMTC/basic is not shipping yet and should not get turned on anywhere unless it is overridden with a pref or we are using e10s.
Another page that causes this crash is: http://www.nytimes.com/redesign/
Mine's running macosx 10.8.5, so Mavericks isn't necessary. I have hardware acceleration turned off, due to a bug in rendering BMC Remedy pages (which is currently a problem with or without hardware acceleration, but BMC's support suggested turning it off which helped on a previous version). http://cloud.feedly.com/#latest reliably crashes (presumably with my login cookies needed).
Summary: Crashes at mozilla::layers::TextureParent::RecvInit() on OS X 10.9 → Crashes at mozilla::layers::TextureParent::RecvInit() on OS X with hardware acceleration disabled
Hey, I would really like this bug fixed too, but here is likely why it hasn't been taken care of yet: the two 'core' people who worked on the PTexture patches which caused this bug are :nical and me, and neither of us have a Mac dev environment ready. This bug could likely get fixed very fast if someone with a Mac worked on it with input from :nical. Milan, I believe that it would be worth finding someone to take care of this.
Flags: needinfo?(milan)
Sold! to Jeff, and we'll all blame Benoit if it turns out to be a complicated mess.
Assignee: nobody → jmuizelaar
Flags: needinfo?(milan)
Also reproducible just by opening flickr.com - BOOM!
Crash Signature: [@ mozilla::layers::TextureParent::RecvInit(mozilla::layers::SurfaceDescriptor const&, unsigned int const&) ] → [@ mozilla::layers::TextureParent::RecvInit(mozilla::layers::SurfaceDescriptor const&, unsigned int const&) ] [@ mozilla::layers::TextureParent::Init(mozilla::layers::SurfaceDescriptor const&, unsigned int const&) ]
Jeff, this is caused by our assumption that when we get to create a TextureHost, we have already created a Compositor. Creating the compositor is what sets the sBackend global on the compositor side. Somehow we run into the case where ImageBridge is setup and starts sending frames before we created a compositor.
My suggestion is that in TextureParent::RecvInit we just select the backend depending on the SurfaceDescriptor type. This used to not be possible when each backend had a shmem texture, but now this is better decoupled.
Except in the case of SurfaceDescriptorD3D10 for which both D3D9 and D3D11 will have an implementation after Matt lands one of his patches. To handle that case we can probably just check gfxPlatform::GetPrefLayersPreferD3D9.
Assignee: jmuizelaar → nical.bugzilla
This should solve the problem in most cases, although MacIOSurface is getting in the way because it has both a basic TextureHost and a GL TextureHost.

I'll look into that. Either we have only one TextureHost for the two cases and move the backend specific bits into the TextureSource as we did for shmem, or we create have a separate descriptor type for the two cases (provided we know on the client side which backend we are using on the compositor side)
At the moment I prefer the first solution.
Attachment #8368641 - Flags: review?(matt.woodrow)
Comment on attachment 8368641 [details] [diff] [review]
Pick the backend in function of the descriptor type

Review of attachment 8368641 [details] [diff] [review]:
-----------------------------------------------------------------

I don't particularly like this. As you said, there are already multiple surface descriptor types that have more than one possible TextureHost. I suspect this number is going to grow too, for things like SurfaceStream.

I really think we need to catch this on the client side, setting up IPDL objects when they'll never be used is a waste.
Attachment #8368641 - Flags: review?(matt.woodrow) → review-
I share your concern, although with ImageBridge being independent of the main thread, it's hard to know if a Compositor has already been created. I'll try to think of a way. There should not be descriptor types with several TextureHosts though. The separation between TextureHost and TextureSource lets us decouple the TextureHost and the backend when the descriptor type is usable on several backends.
Matt, considering that this is a top crasher and already made it into aurora, could we revisit the review decision?
I don't think this can be efficiently caught on the client side, because ImageBridge is by design independent of the main thread so we can't efficiently get it to wait for the first layer transaction to have been complete before sending messages.
The IPDL object we are sending before we have a compositor here is most likely to end up on the screen because it arrives just before the creation of the Compositor, so it will be in the next composition. If we are creating a video element and the layer tree at the same time there isn't much we can do to prevent ImageBridge to be faster than main thread transaction, and actually we kind want it to be, otherwise we may composite the first screen before we get a video frame which is not correct.
Since we do have the information about the backend in the surface descriptor type 90% of the time there is no harm in using it, even if we may need the Compositor::GetBackend() to disambiguate some edge cases that should not be there (I still think we should have a 1-1 relation between TextureHost and SurfaceDescriptor).
Flags: needinfo?(matt.woodrow)
Ok, in the interest of fixing a topcrasher on aurora I think we can probably take this.

So we're making the assertion that TextureHosts cannot rely on anything backend specific. While this patch fixes this for now (or at least the case that we hit for now), it still seems rather non-obvious and fragile.

What I think we should do then, is make TextureHosts entirely independent of backend code. That would involve moving them all out of opengl/, d3d11/ etc (and dropping the backend specific suffixes). Then we get a very clear division of responsibilities. TextureHost only manages the liftetime of a shared object, and has nothing backend dependent. When we first render one, we lazily allocate the correct TextureSource for the Compositor type we have.

I guess we could have a problem where we require the compositor in order to do resource deallocation (needing a device/gl context etc). I think we already have this problem though, if we start async video but don't have OMTC enabled (or close the window before the compositor is initialised). This change just make the issue much more visible (which seems like a good thing).
Flags: needinfo?(matt.woodrow)
Comment on attachment 8368641 [details] [diff] [review]
Pick the backend in function of the descriptor type

Review of attachment 8368641 [details] [diff] [review]:
-----------------------------------------------------------------

I don't particularly like this. As you said, there are already multiple surface descriptor types that have more than one possible TextureHost. I suspect this number is going to grow too, for things like SurfaceStream.

I really think we need to catch this on the client side, setting up IPDL objects when they'll never be used is a waste.
Attachment #8368641 - Flags: review- → review+
(In reply to Matt Woodrow (:mattwoodrow) from comment #23)
> Ok, in the interest of fixing a topcrasher on aurora I think we can probably
> take this.

Thanks!

> 
> So we're making the assertion that TextureHosts cannot rely on anything
> backend specific. While this patch fixes this for now (or at least the case
> that we hit for now), it still seems rather non-obvious and fragile.

Most of the surface descriptor types imply a backend, and for shmem and memory which are backend-agnostic, you don't need a compositor right away (you will need one to composite eventually). So the information that is encoded in the descriptor type is something we can use.

> 
> What I think we should do then, is make TextureHosts entirely independent of
> backend code.

Well, i'd say make them all dependent on the shared memory type: ShmemTextureClient and MemoryTextureClient are backend-agostic which is great since all or most backends need them at least as a fallback.

> That would involve moving them all out of opengl/, d3d11/ etc
> (and dropping the backend specific suffixes).

I like having the backend suffix because "SharedTextureHost" could be seen as a D3D or GL thing by the name whereas with "SharedTextureHostOGL" you get what you expect.

> Then we get a very clear
> division of responsibilities. TextureHost only manages the liftetime of a
> shared object, and has nothing backend dependent.

Yeah, the backend specific stuff for drawing really needs to go into TextureSource (it's tru for most existing TextureClients). There can be a bit of backend specificities in TextureHost when needed, for instance when using D3D APIs to lock the texture's memory, but the rest should be in the TextureSource.

> When we first render one,
> we lazily allocate the correct TextureSource for the Compositor type we have.

In some cases like shmem, creating the TextureSource means doing the texture upload so we may want to control doing it in the transaction to avoid double buffering

> 
> I guess we could have a problem where we require the compositor in order to
> do resource deallocation (needing a device/gl context etc). I think we
> already have this problem though, if we start async video but don't have
> OMTC enabled (or close the window before the compositor is initialised).
> This change just make the issue much more visible (which seems like a good
> thing).

I don't know if it's doable to destroy a window before it gets its first layer transaction. The code that kicks the transaction is the same that caused the first frame to be sent through async-video, the latter was just faster at getting to the compositor side and I don't think we have anything that will interrupt the former.
TextureHosts often need access to platform headers (gl.h, d3d.h, gfxWindowsPlatform.h etc), but they should never need access to our backend headers (TextureHostD3D11.h, CompositorD3D11.h etc).

That's why I'd prefer to have them moved outside of the platform specific folders, because we don't export the headers, and it guarantees we don't break that.

As for the naming, I think your example still needs GL in the name, because the underlying object is a GL texture. Not because it has anything to do with our CompositorOGL code. But, for example, MacIOSurfaceTextureHostOGL, could drop the suffix and move out of opengl/.
I can also reproduce by setting:
  user_pref("layers.acceleration.disabled", true);
and loading:
  layout/reftests/ogg-video/black100x100-aspect3to2.ogv
(on mozilla-central)
Backed out because bug 947045 was backed out and it appeared that these landings were intertwined a bit.
https://hg.mozilla.org/integration/mozilla-inbound/rev/a150cbbc959b
https://hg.mozilla.org/mozilla-central/rev/61c1351849d8
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Comment on attachment 8368641 [details] [diff] [review]
Pick the backend in function of the descriptor type

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: crashes
Testing completed (on m-c, etc.): a few days
Risk to taking this patch (and alternatives if risky): low risk
String or IDL/UUID changes made by this patch:
Attachment #8368641 - Flags: approval-mozilla-aurora?
Attachment #8368641 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Crash Signature: [@ mozilla::layers::TextureParent::RecvInit(mozilla::layers::SurfaceDescriptor const&, unsigned int const&) ] [@ mozilla::layers::TextureParent::Init(mozilla::layers::SurfaceDescriptor const&, unsigned int const&) ] → [@ mozilla::layers::TextureParent::RecvInit(mozilla::layers::SurfaceDescriptor const&, unsigned int const&) ] [@ mozilla::layers::TextureParent::Init(mozilla::layers::SurfaceDescriptor const&, unsigned int const&) ] [@ mozilla::layers::TextureHost::Crea…
OS: Mac OS X → All
Summary: Crashes at mozilla::layers::TextureParent::RecvInit() on OS X with hardware acceleration disabled → Crashes at mozilla::layers::TextureParent::RecvInit() or mozilla::layers::TextureHost::Create(mozilla::layers::SurfaceDescriptor const&, mozilla::layers::ISurfaceAllocator*, unsigned int)
Reproduced in nightly 2014-02-03, Mac OS X 10.8.5, HWA disabled, by opening:
http://www.ifixit.com/Teardown/Mac+Pro+Late+2013+Teardown/20778
http://www.nytimes.com/redesign/
http://www.visagetechnologies.com/HTML5/Samples/VirtualEyewearTryOn/eyewear.html
http://www.flickr.com/

Verified fixed in FF 29.0a2(2014-02-11), 30.0a1(2014-02-11), Mac OS X 10.8.5.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.