All users were logged out of Bugzilla on October 13th, 2018

Crash in mozilla::layers::LockD3DTexture<T>

RESOLVED FIXED in Firefox 55

Status

()

--
critical
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: philipp, Assigned: vliu)

Tracking

({crash, regression})

unspecified
mozilla55
All
Windows
crash, regression
Points:
---

Firefox Tracking Flags

(firefox-esr52 wontfix, firefox53 wontfix, firefox54- wontfix, firefox55+ fixed)

Details

(crash signature)

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

2 years ago
[Tracking Requested - why for this release]:
this signature is the #6 browser crash on nightly currently and is regressing in volume in 55.0a1.

This bug was filed from the Socorro interface and is 
report bp-ec46cfdb-e252-4bd0-af22-0dc8b0170505.
=============================================================
Crashing Thread (3)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::layers::LockD3DTexture<ID3D11Texture2D> 	gfx/layers/d3d11/TextureD3D11.cpp:216
1 	xul.dll 	mozilla::layers::DXGITextureHostD3D11::LockInternal() 	gfx/layers/d3d11/TextureD3D11.cpp:819
2 	xul.dll 	mozilla::layers::ImageHost::Lock() 	gfx/layers/composite/ImageHost.cpp:403
3 	xul.dll 	mozilla::layers::AutoLockCompositableHost::AutoLockCompositableHost(mozilla::layers::CompositableHost*) 	obj-firefox/dist/include/CompositableHost.h:259
4 	xul.dll 	mozilla::layers::ImageHost::Composite(mozilla::layers::Compositor*, mozilla::layers::LayerComposite*, mozilla::layers::EffectChain&, float, mozilla::gfx::Matrix4x4Typed<mozilla::gfx::UnknownUnits, mozilla::gfx::UnknownUnits> const&, mozilla::gfx::SamplingFilter, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const*, mozilla::Maybe<mozilla::gfx::PolygonTyped<mozilla::gfx::UnknownUnits> > const&) 	gfx/layers/composite/ImageHost.cpp:217
5 	xul.dll 	<lambda_900b9b3618a2532715a35ddaee6fe7a4>::operator() 	gfx/layers/composite/ImageLayerComposite.cpp:105
6 	xul.dll 	mozilla::layers::RenderWithAllMasks<<lambda_900b9b3618a2532715a35ddaee6fe7a4> >(mozilla::layers::Layer*, mozilla::layers::Compositor*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, <lambda_900b9b3618a2532715a35ddaee6fe7a4>) 	obj-firefox/dist/include/mozilla/layers/LayerManagerComposite.h:753
7 	xul.dll 	mozilla::layers::ImageLayerComposite::RenderLayer(mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::Maybe<mozilla::gfx::PolygonTyped<mozilla::gfx::UnknownUnits> > const&) 	gfx/layers/composite/ImageLayerComposite.cpp:102
8 	xul.dll 	mozilla::layers::RenderLayers<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::RenderTargetPixel> const&, mozilla::Maybe<mozilla::gfx::PolygonTyped<mozilla::gfx::UnknownUnits> > const&) 	gfx/layers/composite/ContainerLayerComposite.cpp:471
9 	xul.dll 	mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::Maybe<mozilla::gfx::PolygonTyped<mozilla::gfx::UnknownUnits> > const&) 	gfx/layers/composite/ContainerLayerComposite.cpp:636
10 	xul.dll 	mozilla::layers::RenderLayers<mozilla::layers::RefLayerComposite>(mozilla::layers::RefLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::RenderTargetPixel> const&, mozilla::Maybe<mozilla::gfx::PolygonTyped<mozilla::gfx::UnknownUnits> > const&) 	gfx/layers/composite/ContainerLayerComposite.cpp:471
11 	xul.dll 	mozilla::layers::ContainerRender<mozilla::layers::RefLayerComposite>(mozilla::layers::RefLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::Maybe<mozilla::gfx::PolygonTyped<mozilla::gfx::UnknownUnits> > const&) 	gfx/layers/composite/ContainerLayerComposite.cpp:636
12 	xul.dll 	mozilla::layers::RenderLayers<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::RenderTargetPixel> const&, mozilla::Maybe<mozilla::gfx::PolygonTyped<mozilla::gfx::UnknownUnits> > const&) 	gfx/layers/composite/ContainerLayerComposite.cpp:471
13 	xul.dll 	mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>(mozilla::layers::ContainerLayerComposite*, mozilla::layers::LayerManagerComposite*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::Maybe<mozilla::gfx::PolygonTyped<mozilla::gfx::UnknownUnits> > const&) 	gfx/layers/composite/ContainerLayerComposite.cpp:636
14 	xul.dll 	mozilla::layers::LayerManagerComposite::Render(mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::IntRegionTyped<mozilla::gfx::UnknownUnits> const&) 	gfx/layers/composite/LayerManagerComposite.cpp:928
15 	xul.dll 	mozilla::layers::LayerManagerComposite::UpdateAndRender() 	gfx/layers/composite/LayerManagerComposite.cpp:515
16 	xul.dll 	mozilla::layers::LayerManagerComposite::EndTransaction(mozilla::TimeStamp const&, mozilla::layers::LayerManager::EndTransactionFlags) 	gfx/layers/composite/LayerManagerComposite.cpp:436
17 	xul.dll 	mozilla::layers::CompositorBridgeParent::CompositeToTarget(mozilla::gfx::DrawTarget*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const*) 	gfx/layers/ipc/CompositorBridgeParent.cpp:1008
18 	xul.dll 	mozilla::layers::CompositorVsyncScheduler::Composite(mozilla::TimeStamp) 	gfx/layers/ipc/CompositorVsyncScheduler.cpp:258
19 	xul.dll 	mozilla::detail::RunnableMethodImpl<SoftwareDisplay* const, void ( SoftwareDisplay::*)(mozilla::TimeStamp), 1, 1, mozilla::TimeStamp>::Run() 	obj-firefox/dist/include/nsThreadUtils.h:909
20 	xul.dll 	MessageLoop::RunTask(already_AddRefed<mozilla::Runnable>) 	ipc/chromium/src/base/message_loop.cc:361
21 	xul.dll 	MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) 	ipc/chromium/src/base/message_loop.cc:369
22 	xul.dll 	MessageLoop::DoWork() 	ipc/chromium/src/base/message_loop.cc:444
23 	xul.dll 	base::MessagePumpForUI::DoRunLoop() 	ipc/chromium/src/base/message_pump_win.cc:212
24 	xul.dll 	base::MessagePumpWin::Run(base::MessagePump::Delegate*) 	ipc/chromium/src/base/message_pump_win.h:80
25 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:231
26 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:211
27 	xul.dll 	base::Thread::ThreadMain() 	ipc/chromium/src/base/thread.cc:179
28 	xul.dll 	`anonymous namespace'::ThreadFunc 	ipc/chromium/src/base/platform_thread_win.cc:28
29 	kernel32.dll 	BaseThreadInitThunk 	
30 	ntdll.dll 	RtlUserThreadStart

filing a new bug for the signature to get some traction on it. the signature has been around for a while but on nightly it is jumping up in frequency since build 20170323030203. this would be the changelog to the day before: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=201231223cd4354a450c3e5d80959f35b8e4cf0c&tochange=7513b3f42058e9bcf9950d4acf4647d4ad2240f0
out of that, perhaps bug 1343814 is related?

Correlations for Firefox Nightly
(22.05% in signature vs 00.17% overall) GFX_ERROR "(nsWindow) Detected device reset: " = true [100.0% vs 00.22% if startup_crash = 0]
(22.05% in signature vs 00.17% overall) GFX_ERROR "(nsWindow) Finished device reset." = true [100.0% vs 00.22% if startup_crash = 0]
(100.0% in signature vs 04.79% overall) address = 0x0
(100.0% in signature vs 12.06% overall) reason = EXCEPTION_ACCESS_VIOLATION_READ
(100.0% in signature vs 31.48% overall) moz_crash_reason = null
(100.0% in signature vs 36.63% overall) abort_message = null
(53.54% in signature vs 01.26% overall) Module "msvproc.dll" = true [69.88% vs 02.55% if platform_pretty_version = Windows 10]
(16.54% in signature vs 00.59% overall) GFX_ERROR "(gfxWindowsPlatform) Finished device reset." = true [75.00% vs 00.66% if startup_crash = 0]
(16.54% in signature vs 00.61% overall) GFX_ERROR "(gfxWindowsPlatform) Detected device reset: " = true [75.00% vs 00.69% if startup_crash = 0]
(89.76% in signature vs 00.70% overall) GFX_ERROR "GFX: D3D11 skip BeginFrame with device-removed." = true [100.0% vs 38.51% if process_type = gpu]
(Assignee)

Comment 1

2 years ago
Crash report showed that EXCEPTION_ACCESS_VIOLATION_READ happens in [1]. In mini dump, aTexture in [1] was nullptr so we crashed in accessing QueryInterface.

[1]: https://searchfox.org/mozilla-central/rev/8b70b0a5038ef9472fe7c53e04a70c978bb06aed/gfx/layers/d3d11/TextureD3D11.cpp#216

But more looked into it, device-removed happens many times before hitting the crash. Maybe that is why we don't have D3D11Texture in mTextureSource to lock it.

[G0][GFX1-]: GFX: D3D11 skip BeginFrame with device-removed.
225 crashes in the last week on nightly - that is a lot. 54 also seems to be affected.
Vincent, is this something you are going to work on further? Or should we try to find someone else?
status-firefox54: --- → affected
tracking-firefox55: ? → +
Flags: needinfo?(vliu)
(Assignee)

Comment 3

2 years ago
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #2)
> 225 crashes in the last week on nightly - that is a lot. 54 also seems to be
> affected.
> Vincent, is this something you are going to work on further? Or should we
> try to find someone else?

Currently I don't have any further thought to figure out the possible reason why this crash happens. Maybe David can have any idea with that.
Flags: needinfo?(vliu) → needinfo?(dvander)
No idea, unfortunately. Looking at crash-stats it's almost exclusively on Windows 8+10, which means GPU process isolation is kicking in. Is this crash correlated with device resets?
Flags: needinfo?(dvander)
(Assignee)

Comment 5

2 years ago
I found there was code we are missing checking which may need to have a patch to fix. When We have a calling in [1], it will call into [2] to construct DataTextureSourceD3D11. In this case, it seems that aProvider->GetD3D11Device() doesn't have any check to make sure whether it is valid or not. Is that missing relative to this issue?

[1]: https://searchfox.org/mozilla-central/rev/224cc663d54085994a4871ef464b7662e0721e83/gfx/layers/d3d11/TextureD3D11.cpp#828
[2]: https://searchfox.org/mozilla-central/rev/224cc663d54085994a4871ef464b7662e0721e83/gfx/layers/d3d11/TextureD3D11.cpp#196
Flags: needinfo?(dvander)
Probably not, though a correlation with device resets could make a better argument. Or you could MOZ_RELEASE_ASSERT that mProvider->IsValid is true.
Flags: needinfo?(dvander)
(Assignee)

Updated

a year ago
See Also: → bug 1363675
Assignee: nobody → vliu
(Assignee)

Comment 7

a year ago
Created attachment 8867681 [details] [diff] [review]
0001-Bug-1362366-Set-mTextureSource-to-nullptr-when-ID3D1.patch

Hi David,

From back trace in mini-dump, it might have chance to call [1] to NULL mTexture in mTextureSource[2]. If it did, Gecko might hit access violation in [3] when Lock D3DTexture. The above case might happens when Content device was changed[4].


[1]: http://searchfox.org/mozilla-central/rev/484d2b7f51b7aed035147bbb4a565061659d9278/gfx/layers/composite/ImageHost.cpp#213
[2]: http://searchfox.org/mozilla-central/rev/484d2b7f51b7aed035147bbb4a565061659d9278/gfx/layers/d3d11/TextureD3D11.cpp#1144
[3]: http://searchfox.org/mozilla-central/rev/484d2b7f51b7aed035147bbb4a565061659d9278/gfx/layers/d3d11/TextureD3D11.cpp#212
[4]: http://searchfox.org/mozilla-central/rev/484d2b7f51b7aed035147bbb4a565061659d9278/gfx/layers/d3d11/TextureD3D11.cpp#1174

To avoid this, the attached patch intends to NULL mTextureSource when device change was detected. After that, we may create a new mTextureSource when Lock D3DTexture is called. Could you please have a review? Really thanks.
Attachment #8867681 - Flags: review?(dvander)
Comment on attachment 8867681 [details] [diff] [review]
0001-Bug-1362366-Set-mTextureSource-to-nullptr-when-ID3D1.patch

Review of attachment 8867681 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/d3d11/TextureD3D11.cpp
@@ +766,5 @@
>      mTextureSource = nullptr;
>      return;
>    }
>  
>    mProvider = aProvider;

If mDevice is changed, I think you need to change mTextureSource in line 770, why do you want to cleanup everything in line 762?

@@ +1166,5 @@
>  }
>  
>  void
>  DataTextureSourceD3D11::SetTextureSourceProvider(TextureSourceProvider* aProvider)
>  {

I think it's better to add assertion if we detect mDevice changed.
There's a couple instances of this on ESR52 as well, but low volume enough that I think we can wontfix. Feel free to set the status back to affected if you want to consider this for backport there as well.
status-firefox53: --- → wontfix
status-firefox-esr52: --- → wontfix
tracking-firefox54: --- → ?
Comment on attachment 8867681 [details] [diff] [review]
0001-Bug-1362366-Set-mTextureSource-to-nullptr-when-ID3D1.patch

Okay, this looks good. I agree with Peter re: adding an assert in DataTextureSourceD3D11, but actually you could just remove the whole function and override. It doesn't do anything after your patch, and it should have no other callers.

Whichever you do, I would also add this before you set mTextureSource = nullptr:

> if (mTextureSource) {
>   mTextureSource->Reset();
> }

Just in case some layer or cache is retaining it.
Attachment #8867681 - Flags: review?(dvander) → review+
Track 54- as the volume of crashes in Beta54 is low.
tracking-firefox54: ? → -
(Assignee)

Comment 12

a year ago
(In reply to David Anderson [:dvander] from comment #10)
> Comment on attachment 8867681 [details] [diff] [review]
> 0001-Bug-1362366-Set-mTextureSource-to-nullptr-when-ID3D1.patch
> 
> Okay, this looks good. I agree with Peter re: adding an assert in
> DataTextureSourceD3D11, but actually you could just remove the whole
> function and override. It doesn't do anything after your patch, and it
> should have no other callers.
> 

May I clarify that the function I could just remove the whole function and override was [1] pointed to? 

[1]: http://searchfox.org/mozilla-central/source/gfx/layers/d3d11/TextureD3D11.cpp#1185-1199

I have two questions.

1. DXGIYCbCrTextureHostD3D11 also use this override. Since it called SetNextSibling() in [2], mNextSibling would be valid. 
   In this case, should we still keep this function for DXGIYCbCrTextureHostD3D11?

   [2]: http://searchfox.org/mozilla-central/source/gfx/layers/d3d11/TextureD3D11.cpp#968-969
  
2. if the override function [1] was removed, [3] would call into its base case (class TextSource). 
   The behavior would be like other TextureSouce (MacIOSurfaceTextureSourceBasic is a case). Based on this, does it have any concern for 
   DXGIYCbCrTextureHostD3D11 and DXGITextureHostD3D11.


      [3]: http://searchfox.org/mozilla-central/rev/79f625ac56d936ef5d17ebe13a123b25efee4241/gfx/layers/d3d11/TextureD3D11.cpp#1197


Please correct me if I got anything wrong. Really thanks.
Flags: needinfo?(dvander)
IMO, DXGIYCbCrTextureHostD3D11 and DXGITextureHostD3D11 are basically the same thing. Probably in both cases we should actually revoke the device on the sources, but just doing nothing is probably fine too (compositing should re-acquire the source every frame, and therefore get no TextureSource until the device reset is handled).

Does that answer your question?
Flags: needinfo?(dvander)
That is: the code for handling a different TextureSourceProvider should look the same for both.
(Assignee)

Comment 15

a year ago
Created attachment 8870741 [details] [diff] [review]
0001-Bug-1362366-Set-DataTextureSourceD3D11-to-nullptr-on.patch

(In reply to David Anderson [:dvander] from comment #13)
> IMO, DXGIYCbCrTextureHostD3D11 and DXGITextureHostD3D11 are basically the
> same thing. Probably in both cases we should actually revoke the device on
> the sources, but just doing nothing is probably fine too (compositing should
> re-acquire the source every frame, and therefore get no TextureSource until
> the device reset is handled).
> 
> Does that answer your question?

Thanks for your reply. Based on above, I drafted a patch to fix this issue. Could you please review again my patch to make sure the patch get the thing right. Really thanks
Attachment #8867681 - Attachment is obsolete: true
Attachment #8870741 - Flags: review?(dvander)
Attachment #8870741 - Flags: review?(dvander) → review+

Comment 17

a year ago
Pushed by vliu@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e22128ea91bf
Set DataTextureSourceD3D11 to nullptr once devce change was detected. r=dvander

Comment 18

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/e22128ea91bf
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox55: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Crash volume seems pretty low on 54. With the RC build happening on Monday, can we let this ride the 55 train at this point or did you want to request Beta approval on it? If yes, please do so ASAP :).
Flags: needinfo?(vliu)
(Assignee)

Comment 20

a year ago
(In reply to Ryan VanderMeulen [:RyanVM] from comment #19)
> Crash volume seems pretty low on 54. With the RC build happening on Monday,
> can we let this ride the 55 train at this point or did you want to request
> Beta approval on it? If yes, please do so ASAP :).

We just let this ride the 55 train and I won't request Beta approval since the Crash volume pretty low. Please have comment if anyone need this Beta approval.
Flags: needinfo?(vliu)
status-firefox54: affected → wontfix
You need to log in before you can comment on or make changes to this bug.