Closed Bug 1131370 Opened 5 years ago Closed 5 years ago

crash in mozilla::layers::SyncObjectD3D11::FinalizeFrame()

Categories

(Core :: Graphics, defect, critical)

x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox36 --- unaffected
firefox37 + fixed
firefox38 + fixed
firefox39 --- fixed

People

(Reporter: dmajor, Assigned: bas.schouten)

References

Details

(Keywords: crash, Whiteboard: gfx-noted)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-516db8b3-ba27-4aee-a9dc-cf7ff2150206.
=============================================================

Many, but not all, say things like:
Failed to OpenSharedResource: 0x887a0005
or
Failed to OpenSharedResource: 0x80070057 

0 	xul.dll 	mozilla::layers::SyncObjectD3D11::FinalizeFrame() 	gfx/layers/d3d11/TextureD3D11.cpp
1 	xul.dll 	mozilla::layers::ClientLayerManager::ForwardTransaction(bool) 	gfx/layers/client/ClientLayerManager.cpp
2 	xul.dll 	mozilla::layers::ClientLayerManager::EndTransaction(void (*)(mozilla::layers::PaintedLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) 	gfx/layers/client/ClientLayerManager.cpp
3 	xul.dll 	nsDisplayList::PaintRoot(nsDisplayListBuilder*, nsRenderingContext*, unsigned int) 	layout/base/nsDisplayList.cpp
4 	xul.dll 	nsLayoutUtils::PaintFrame(nsRenderingContext*, nsIFrame*, nsRegion const&, unsigned int, unsigned int) 	layout/base/nsLayoutUtils.cpp
5 	xul.dll 	PresShell::Paint(nsView*, nsRegion const&, unsigned int) 	layout/base/nsPresShell.cpp
[Tracking Requested - why for this release]: #6 crash in recent auroras (aurorae?)
Is this more error-consolidation or a legitimate bug?
Flags: needinfo?(bas)
Whiteboard: gfx-noted
Tracking Aurora topcrash in 37+.
I got a crash (bp-f8e2ddce-627f-457e-afb9-f1dad2150217) with this signature shortly after updating gfx drivers, although FF remained useable for a minute or two before it crashed.
I feel this might be a genuine timeout that occurs more frequently during TDR (which would explain the messages we're seeing). I'd like to give raising the timeout a shot if only to see if the numbers go down, so that we know it's a genuine timeout or whether WAIT_TIMEOUT is just returned due to some other type of error state.
Assignee: nobody → bas
Status: NEW → ASSIGNED
Flags: needinfo?(bas)
Attachment #8567286 - Flags: review?(jmuizelaar)
Attachment #8567286 - Flags: review?(jmuizelaar) → review+
https://hg.mozilla.org/mozilla-central/rev/814042ff9c7d
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
We don't expect this to actually fix anything, it may just make the crash less likely.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Did it?  I see we have crashes from nightly builds after March 24th, but could somebody sort out the stats and see if those numbers changed post this landing? Also, do we want to uplift this timeout change either way?
Flags: needinfo?(dmajor)
Flags: needinfo?(bas)
Flags: needinfo?(bas)
Attachment #8571452 - Flags: review?(milan)
Comment on attachment 8571452 [details] [diff] [review]
Try to ignore transient errors and increase D3D11 timeout as well.

Review of attachment 8571452 [details] [diff] [review]:
-----------------------------------------------------------------

Nice.  We may eventually want to pull out the timeout value into a #define or a const.

::: gfx/layers/d3d11/TextureD3D11.cpp
@@ +873,5 @@
> +
> +    if (!dev) {
> +      if (gfxWindowsPlatform::GetPlatform()->DidRenderingDeviceReset()) {
> +        return;
> +      }

Should we just put MOZ_CRASH() here if we didn't return because of the reset, rather than dereference a nullptr below?  Same result, perhaps easier to show intent.
Attachment #8571452 - Flags: review?(milan) → review+
(In reply to Milan Sreckovic [:milan] from comment #9)
> Did it?  I see we have crashes from nightly builds after March 24th, but
> could somebody sort out the stats and see if those numbers changed post this
> landing? Also, do we want to uplift this timeout change either way?

Doesn't seem to have gone down in volume since the first patch landed.
Flags: needinfo?(dmajor)
https://hg.mozilla.org/mozilla-central/rev/23f0b767c77e
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: mozilla38 → mozilla39
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Anyone who understands how to analyze the data properly, is the volume of this crash any better with the patches we've added?
Flags: needinfo?(dmajor)
(In reply to Bas Schouten (:bas.schouten) from comment #14)
> Anyone who understands how to analyze the data properly, is the volume of
> this crash any better with the patches we've added?

Yes, this improved with the second landing at comment 13. The volume is now roughly 25-50% of the previous volume (there is some fluctuation).
Flags: needinfo?(dmajor)
(In reply to David Major [:dmajor] (UTC+13) from comment #15)
> (In reply to Bas Schouten (:bas.schouten) from comment #14)
> > Anyone who understands how to analyze the data properly, is the volume of
> > this crash any better with the patches we've added?
> 
> Yes, this improved with the second landing at comment 13. The volume is now
> roughly 25-50% of the previous volume (there is some fluctuation).

Do we feel the reduction is sufficient to untrack this? I believe the timeouts should be the only crashes we have left here. We could attempt to ignore them but there's a real risk that these users will just no crash but instead end up with a broken browser without us knowing about it.
Flags: needinfo?(lmandel)
We should try to gather some information about what's happening on the other side of the lock when this happens. i.e. Have a global that we set when we take the lock and unset otherwise. We can then print the status of this on the other side.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #17)
> We should try to gather some information about what's happening on the other
> side of the lock when this happens. i.e. Have a global that we set when we
> take the lock and unset otherwise. We can then print the status of this on
> the other side.

I've been looking at this, it's not easy, since both sides are in different processes (potentially) we'd need to somehow create an area of shmem to sort of store this information. I've played around with a patch but I have some doubts as to whether we want to introduce this sort of messiness.
Flags: needinfo?(jmuizelaar)
37 is marked as affected. Can we uplift what we have in order to improve the situation for 37?

In terms of tracking, if either the volume has been reduced significantly enough or there is nothing else to do in this bug, then yes, we should stop tracking. Should we rather resolve this bug and file a follow up for any additional work?
Flags: needinfo?(lmandel)
Flags: needinfo?(bugmail.mozilla)
Flags: needinfo?(bas)
I agree, we should close this bug and uplift the patches, and file a follow-up for any additional work. I'm marking 38 as affected since the second patch landed on 39 and was never uplifted. The first patch (which according to comment 15 didn't help) landed on 38.

Bas, can you request uplift to 38 and 37 for the second patch (and 37 for the first patch, if you think we might as well uplift that too)? I'll clone this bug to a new one that we can use for any remaining work.
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Flags: needinfo?(bugmail.mozilla)
Resolution: --- → FIXED
Attachment #8567286 - Flags: approval-mozilla-beta?
Attachment #8571452 - Flags: approval-mozilla-beta?
Attachment #8571452 - Flags: approval-mozilla-aurora?
Comment on attachment 8571452 [details] [diff] [review]
Try to ignore transient errors and increase D3D11 timeout as well.

From speaking with Jeff, the patches on this bug haven't really fixed the problem but have reduced the incidence of the crash associated with it. There is very little risk associated with the changes, which have now been on m-c for some time. We're going to take these changes in Beta 6. Beta+ Aurora+
Flags: needinfo?(jmuizelaar)
Attachment #8571452 - Flags: approval-mozilla-beta?
Attachment #8571452 - Flags: approval-mozilla-beta+
Attachment #8571452 - Flags: approval-mozilla-aurora?
Attachment #8571452 - Flags: approval-mozilla-aurora+
Attachment #8567286 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: needinfo?(bas)
You need to log in before you can comment on or make changes to this bug.