Crash in mozilla::layers::BasicLayerManager::PushGroupForLayer

VERIFIED FIXED in Firefox 48

Status

()

defect
--
critical
VERIFIED FIXED
3 years ago
3 years ago

People

(Reporter: mccr8, Assigned: jerry)

Tracking

(Blocks 1 bug, {crash})

Trunk
mozilla50
Unspecified
Windows 10
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox47 affected, firefox48 fixed, firefox49 fixed, firefox-esr45 affected, firefox50 verified)

Details

(Whiteboard: [gfx-noted], crash signature)

Attachments

(3 attachments, 2 obsolete attachments)

Reporter

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-95a626c7-a8c1-49e1-becb-fa7c82160526.
=============================================================

I see 3 crashes with this signature on 05-25 Nightly Windows builds. They all look like they are from a single install time, so normally I wouldn't file a bug, but it looks like this is an intentional crash Milan landed in bug 1213007, so maybe they'll be useful to somebody, and there's no bug on file yet.
Reporter

Comment 1

3 years ago
ni? Milan in case there's something useful he can get out of these couple of crashes.
Flags: needinfo?(milan)
Yorgos, you seem to be able to reproduce this crash - do you have the steps to reproduce?
Flags: needinfo?(milan)
Whiteboard: [gfx-noted]
This call:

RefPtr<DrawTarget> dt = aContext->GetDrawTarget()->CreateSimilarDrawTarget(surfRect.Size(), SurfaceFormat::B8G8R8A8);

at this line: https://dxr.mozilla.org/mozilla-central/source/gfx/layers/basic/BasicLayerManager.cpp#111

returns a null draw target.  For this crash: https://crash-stats.mozilla.com/report/index/796b2c1b-d6ac-4d49-9d40-dc5ff2160526 we fail on a reasonably sized 348x80, with D2D1.1 backend here:

mDC->CreateBitmap(D2DIntSize(aSize), nullptr, 0, props, (ID2D1Bitmap1**)getter_AddRefs(mBitmap));

here: https://dxr.mozilla.org/mozilla-central/source/gfx/2d/DrawTargetD2D1.cpp#1033

getting a D2DERR_RECREATE_TARGET error.

Comment 8

3 years ago
(In reply to Milan Sreckovic [:milan] from comment #5)
> Yorgos, you seem to be able to reproduce this crash - do you have the steps
> to reproduce?

yes,
scrolling a facebook photoalbum with many photos.
Bas, any thoughts?  Lots of images, "recreate target" error - leaking resources?
Flags: needinfo?(bas)

Comment 10

3 years ago
((anyway I should experience crashes at 2GB, or above, of Firefox ram, but at 900MB - 1GB perhaps the crash comes is too early; ok if I want 10 tabs with FB albums, that were an error in my behaviour; but I should be able to scroll a big FB photoalbum without issues when the total ram usage is 60% )) .
(In reply to Milan Sreckovic [:milan] from comment #9)
> Bas, any thoughts?  Lots of images, "recreate target" error - leaking
> resources?

That's a device reset error. So this is probably a place we're not guarding correctly for device resets occurring.
Assignee: nobody → bas
Status: NEW → ASSIGNED
Flags: needinfo?(bas)
So, there is not a lot of these crashes, but perhaps we can do something about it.  This particular crash will only show up in nightly and aurora, as it's a "gfxDevCrash".

And, yes, it looks to be a device reset related, as we have D2D and basic layer manager together.  For example: https://crash-stats.mozilla.com/report/index/21b37ed2-d8a8-4221-ba63-3873c2160602

[C0][GFX1]: [D2D1.1] 3CreateBitmap failure Size(1903,72) Code: 0x8899000c format 0 (t=38.5349) 
[C1][GFX1-]: Invalid target in gfxContext::ForDrawTarget 0000000000000000 (t=38.5349)
[C2][GFX1 28]: BasicLayerManager context problem 0000000000000000 (t=38.5349) 

The first one is from DrawTargetD2D1.cpp, the second/third from BasicLayerManager.cpp.  Something isn't getting reset somewhere.
Flags: needinfo?(hshih)
Flags: needinfo?(dvander)

Comment 13

3 years ago
I seem to be able to reproduce this literally every time I try to load this page, or any other page on the Overwatch subreddit. https://www.reddit.com/r/Overwatch/comments/451gn3/slicrossfire_support_has_been_added/

It doesn't happen on other subreddits.
Lily, could you give us the graphics section of about:support?  I'd like to see what kind of graphics setup you have.
Flags: needinfo?(thesweetlilycake)
(In reply to Milan Sreckovic [:milan] from comment #14)
> Lily, could you give us the graphics section of about:support?  I'd like to
> see what kind of graphics setup you have.

Alternatively, go to about:crashes and send us one of the crash links that is related to this crash.  The crash report, if submitted, will contain your graphics configuration.
A couple of crashes with the extra information all have DeviceResetReason::DRIVER_ERROR as the reset cause:
https://crash-stats.mozilla.com/report/index/5d3c9b99-8a02-4137-846f-594172160603
https://crash-stats.mozilla.com/report/index/48280ea4-691d-4212-a77b-df96c2160606

These have messages coming from SourceSurfaceD2D1::GetDataSurface().  Canvas?

Comment 17

3 years ago
Here you go:

Graphics
Features
Compositing	Direct3D 11
Asynchronous Pan/Zoom	wheel input enabled; touch input enabled
WebGL Renderer	Google Inc. -- ANGLE (AMD Radeon R9 200 / HD 7900 Series Direct3D11 vs_5_0 ps_5_0)
Hardware H264 Decoding	Yes; Using D3D11 API
Direct2D	true
DirectWrite	true (10.0.10586.0)
GPU #1
Active	Yes
Description	AMD Radeon R9 200 / HD 7900 Series
Vendor ID	0x1002
Device ID	0x6798
Driver Version	16.200.1025.0
Driver Date	6-2-2016
Drivers	aticfx64 aticfx64 aticfx64 amdxc64 aticfx32 aticfx32 aticfx32 amdxc32 atiumd64 atidxx64 atidxx64 atiumdag atidxx32 atidxx32 atiumdva atiumd6a atitmm64
Subsys ID	32111682
RAM	3072
Diagnostics
ClearType Parameters	D [ Gamma: 2200 Pixel Structure: R ClearType Level: 100 Enhanced Contrast: 0 ] D [ Gamma: 2200 Pixel Structure: R ClearType Level: 0 Enhanced Contrast: 0 ]
AzureCanvasAccelerated	0
AzureCanvasBackend	direct2d 1.1
AzureContentBackend	direct2d 1.1
AzureFallbackCanvasBackend	cairo
ClearType Parameters	D [ Gamma: 2200 Pixel Structure: R ClearType Level: 100 Enhanced Contrast: 0 ] D [ Gamma: 2200 Pixel Structure: R ClearType Level: 0 Enhanced Contrast: 0 ] 

These should be two crash reports: 
https://crash-stats.mozilla.com/report/index/ef0806e1-1f64-4340-a777-d886a2160606
https://crash-stats.mozilla.com/report/index/30641768-d8e6-4205-b5ea-659dd2160606

____

Oddly enough I cannot seem to reproduce this problem anymore today.
Flags: needinfo?(thesweetlilycake)
(In reply to Milan Sreckovic [:milan] from comment #12)
> So, there is not a lot of these crashes, but perhaps we can do something
> about it.  This particular crash will only show up in nightly and aurora, as
> it's a "gfxDevCrash".
> 
> And, yes, it looks to be a device reset related, as we have D2D and basic
> layer manager together.  For example:
> https://crash-stats.mozilla.com/report/index/21b37ed2-d8a8-4221-ba63-
> 3873c2160602
> 
> [C0][GFX1]: [D2D1.1] 3CreateBitmap failure Size(1903,72) Code: 0x8899000c
> format 0 (t=38.5349) 
> [C1][GFX1-]: Invalid target in gfxContext::ForDrawTarget 0000000000000000
> (t=38.5349)
> [C2][GFX1 28]: BasicLayerManager context problem 0000000000000000
> (t=38.5349) 
> 
> The first one is from DrawTargetD2D1.cpp, the second/third from
> BasicLayerManager.cpp.  Something isn't getting reset somewhere.

Never mind this.  I fell for the "basic in BasicCompositor means software, where basic in BasicLayerManager means simple".  This is a valid combination, it just means inactive layers.

Comment 19

3 years ago
Hi!

I seem to have the same issue, especially with the Overwatch subreddit.

My GPU is AMD Radeon R9 270X

This was happening with the 16.5.3 AMD GPU Driver and also with the 16.6.1 Driver that I installed yesterday.


Graphics
--------

Features
Compositing: Direct3D 11
Asynchronous Pan/Zoom: none
WebGL Renderer: Google Inc. -- ANGLE (AMD Radeon R9 200 Series Direct3D11 vs_5_0 ps_5_0)
Hardware H264 Decoding: Yes; Using D3D11 API
Direct2D: true
DirectWrite: true (10.0.10586.0)
GPU #1
Active: Yes
Description: AMD Radeon R9 200 Series
Vendor ID: 0x1002
Device ID: 0x6810
Driver Version: 16.200.1025.0
Driver Date: 6-2-2016
Drivers: aticfx64 aticfx64 aticfx64 amdxc64 aticfx32 aticfx32 aticfx32 amdxc32 atiumd64 atidxx64 atidxx64 atiumdag atidxx32 atidxx32 atiumdva atiumd6a atitmm64
Subsys ID: 22721458
RAM: 2048

Diagnostics
ClearType Parameters: Gamma: 2200 Pixel Structure: R ClearType Level: 100 Enhanced Contrast: 50
AzureCanvasAccelerated: 0
AzureCanvasBackend: direct2d 1.1
AzureContentBackend: direct2d 1.1
AzureFallbackCanvasBackend: cairo
ClearType Parameters: Gamma: 2200 Pixel Structure: R ClearType Level: 100 Enhanced Contrast: 50

Some crashes:

https://crash-stats.mozilla.com/report/index/bp-ccc54c8b-c789-4336-8c55-867b92160608
https://crash-stats.mozilla.com/report/index/bp-a4885a7e-f3f2-404f-bc40-d377a2160607
https://crash-stats.mozilla.com/report/index/bp-d4e794b3-071f-4bea-98d4-f62782160607
https://crash-stats.mozilla.com/report/index/bp-48280ea4-691d-4212-a77b-df96c2160606
Agree with comment #11, looks like we're not null checking something that could TDR.
Flags: needinfo?(dvander)
I'm checking this crash that is necessary or not.

At content side, I only see that we will reset the d2d device after the RecvCompositorUpdated() ipc message.
https://dxr.mozilla.org/mozilla-central/search?q=%22mD2D1Device%22&redirect=false

So it's weird that the DT creation is failed.
https://hg.mozilla.org/mozilla-central/annotate/5f95858f8ddf/gfx/layers/basic/BasicLayerManager.cpp#l110
Assignee: bas → hshih
Flags: needinfo?(hshih)
Finally, I understood the crash flow.

In crash report:
|[0][GFX1]: [D2D1.1] 3CreateBitmap failure Size(1903,863) Code: 0x8899000c format 0 (t=11649.8)
|[1][GFX1]: [D2D1.1] 3CreateBitmap failure Size(1903,863) Code: 0x8899000c format 0 (t=11649.8) |[2][GFX1-]: Invalid target in gfxContext::CreateOrNull 0000000000000000 (t=11649.8)
|[3][GFX1 28]: BasicLayerManager context problem 0000000000000000 (t=11649.8) 

The "CreateBitmap failure Size(1903,863) Code: 0x8899000c" message shows that the bitmap is unavailable because of driver-reset or other reason.

I will upload a patch to remove the gfxCriticalError and its related callers.

[1] From msdn:
D2DERR_RECREATE_TARGET 0x8899000C
A presentation error has occurred that may be recoverable. The caller needs to re-create the render target then attempt to render the frame again.
Comment on attachment 8767105 [details] [diff] [review]
P2: obtain the group creation result from PushGroupForLayer(). v1

Review of attachment 8767105 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/basic/BasicLayerManager.cpp
@@ +108,5 @@
>        RefPtr<DrawTarget> dt = aContext->GetDrawTarget()->CreateSimilarDrawTarget(surfRect.Size(), SurfaceFormat::B8G8R8A8);
>  
>        RefPtr<gfxContext> ctx =
>          gfxContext::CreateOrNull(dt, ToRect(rect).TopLeft());
>        if (!ctx) {

This nullptr of dt is due to the bitmap creation failed in DrawTargetD2D1::init().
Please reference the message in crash report for message "CreateBitmap failure Size(1903,863) Code: 0x8899000c".

So, change gfxDevCrash to gfxCriticalNote here

::: gfx/layers/basic/BasicLayers.h
@@ +153,5 @@
>  
> +  // Construct a PushedGroup for a specific layer.
> +  // Return false if it has some errors in PushGroupForLayer(). Then, the
> +  // "aGroupResult" is unavailable for future using.
> +  bool PushGroupForLayer(gfxContext* aContext, Layer* aLayerContext, const nsIntRegion& aRegion, PushedGroup& aGroupResult);

Since it might has driver-reset in this call, return a boolean status to obtain the failed.
Comment on attachment 8767106 [details] [diff] [review]
P3: update PushGroupForLayer() usage. v1

Review of attachment 8767106 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/basic/BasicLayerManager.cpp
@@ +901,5 @@
>  
>    if (is2D) {
>      if (needsGroup) {
> +      PushedGroup pushedGroup;
> +      if (PushGroupForLayer(aTarget, aLayer, aLayer->GetLocalVisibleRegion().ToUnknownRegion(), pushedGroup)) {

If there is an error in PushGropForLayer(), skip this painting.

::: gfx/layers/basic/BasicPaintedLayer.cpp
@@ +84,3 @@
>        if (needsGroup) {
> +        availableGroup =
> +            BasicManager()->PushGroupForLayer(aContext, this, toDraw, group);

If there is an error in PushGropForLayer(), skip this painting.

Comment 28

3 years ago
Mh, I've noticed that after updating my driver to the latest hotfix (Crimson Edition 16.6.2 Hotfix) I no longer seem to have this issue.
Attachment #8767104 - Flags: review?(bas) → review+
Comment on attachment 8767106 [details] [diff] [review]
P3: update PushGroupForLayer() usage. v1

Review of attachment 8767106 [details] [diff] [review]:
-----------------------------------------------------------------

This and pt2 should be a single changeset as they won't compile individually.
Attachment #8767106 - Flags: review?(bas) → review+
Comment on attachment 8767105 [details] [diff] [review]
P2: obtain the group creation result from PushGroupForLayer(). v1

Review of attachment 8767105 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/basic/BasicLayerManager.cpp
@@ +109,5 @@
>  
>        RefPtr<gfxContext> ctx =
>          gfxContext::CreateOrNull(dt, ToRect(rect).TopLeft());
>        if (!ctx) {
> +        gfxCriticalNote << "BasicLayerManager context problem " << gfx::hexa(dt);

Be a little more explicit here, i.e.: gfxContext::CreateOrNull failed or something along those lines.
Attachment #8767105 - Flags: review?(bas) → review+
Attachment #8767105 - Attachment is obsolete: true
Attachment #8767106 - Attachment is obsolete: true
Duplicate of this bug: 1283975

Comment 36

3 years ago
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/deb3ad46e030
show a log if there is no device for DrawTargetD2D1::Init(). r=bas
https://hg.mozilla.org/integration/mozilla-inbound/rev/018db940b995
obtain the group creation result from PushGroupForLayer(). r=bas
Keywords: checkin-needed
Comment on attachment 8767104 [details] [diff] [review]
P1: show a log if there is no device for DrawTargetD2D1::Init(). v1

Review of attachment 8767104 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/DrawTargetD2D1.cpp
@@ +1008,5 @@
>    HRESULT hr;
>  
>    ID2D1Device* device = Factory::GetD2D1Device();
>    if (!device) {
> +    gfxCriticalNote << "[D2D1.1] Failed to obtain a device for DrawTargetD2D1::Init().";

May not be important in this case, but it is convenient when the messages are different, so that we can tell where they come from.  This one matches the message in the other Init function, so we if we get it in the crash report, we won't know which version was hit.  However, in this case, we're reporting the device being null, so it probably doesn't matter which of the two we've hit.
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #35)
> There are a lot of BasicPaintedLayer::PaintThebes crash for 48.
> And I think the reason is the same as this bug. The context is nullptr
> because of driver-reset.

Yes, let's consider this for uplift.

Comment 39

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/deb3ad46e030
https://hg.mozilla.org/mozilla-central/rev/018db940b995
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Comment on attachment 8767827 [details] [diff] [review]
P2: obtain the group creation result from PushGroupForLayer(). v2. r=bas

Approval Request Comment
[Feature/regressing bug #]:
bug 1283975

[User impact if declined]:
crash as:
https://crash-stats.mozilla.org/signature/?signature=mozilla%3A%3Alayers%3A%3ABasicPaintedLayer%3A%3APaintThebes&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&_sort=&page=1

[Describe test coverage new/current, TreeHerder]:
try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6f1cb0418fea

already merged to m-c

[Risks and why]: 
If we have driver reset during the DrawTarget creation, we will de-reference a nullptr context in BasicPaintedLayer::PaintThebes().
https://hg.mozilla.org/releases/mozilla-beta/annotate/dd7af1fa4ece/gfx/layers/basic/BasicPaintedLayer.cpp#l90

I think the risk is low. This patch skip the rendering if we hit the driver reset. User might see the old content in that moment, but gecko will recover later.

[String/UUID change made/needed]:
none
Attachment #8767827 - Flags: approval-mozilla-beta?
And I don't know whether ff49 needs this or not.
Flags: needinfo?(milan)
update for comment 37.
just a simple update for error message.
Attachment #8768280 - Flags: review+
Attachment #8768280 - Attachment description: P3: update DrawTargetD2D1::Init() log. → P3: update DrawTargetD2D1::Init() log. r=me
Hi Jerry,
Yes, please also file another uplift request for 49.
Flags: needinfo?(milan)
As described in comment #43.
Flags: needinfo?(hshih)
Comment on attachment 8767827 [details] [diff] [review]
P2: obtain the group creation result from PushGroupForLayer(). v2. r=bas

Comment on attachment 8767827 [details] [diff] [review] [diff] [review]
P2: obtain the group creation result from PushGroupForLayer(). v2. r=bas

Approval Request Comment
[Feature/regressing bug #]:
bug 1276062

[User impact if declined]:
Gecko will crash at https://dxr.mozilla.org/mozilla-aurora/source/gfx/layers/basic/BasicLayerManager.cpp?q=BasicLayerManager%3A%3APushGroupForLayer&redirect_type=single#115
as bp-95a626c7-a8c1-49e1-becb-fa7c82160526

It is not the same as the crash in ff48:
https://crash-stats.mozilla.org/signature/?signature=mozilla%3A%3Alayers%3A%3ABasicPaintedLayer%3A%3APaintThebes&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&_sort=&page=1

We will have an assert with gfxDevCrash() for nightly and aurora, but not at beta and release.

[Describe test coverage new/current, TreeHerder]:
try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6f1cb0418fea

already merged to m-c

[Risks and why]: 
If we have driver reset during the DrawTarget creation, we will have an assert in BasicLayerManager::PushGroupForLayer()
https://dxr.mozilla.org/mozilla-aurora/source/gfx/layers/basic/BasicLayerManager.cpp?q=BasicLayerManager%3A%3APushGroupForLayer&redirect_type=single#115

I think the risk is low. This patch skip the rendering if we hit the driver reset. User might see the old content in that moment, but gecko will recover later.

[String/UUID change made/needed]:
none
Flags: needinfo?(hshih)
Attachment #8767827 - Flags: approval-mozilla-aurora?
Comment on attachment 8767827 [details] [diff] [review]
P2: obtain the group creation result from PushGroupForLayer(). v2. r=bas

Review of attachment 8767827 [details] [diff] [review]:
-----------------------------------------------------------------

After discussing with Jerry, the fix is small and low risk. Take it in 48 beta 6 and aurora.
Attachment #8767827 - Flags: approval-mozilla-beta?
Attachment #8767827 - Flags: approval-mozilla-beta+
Attachment #8767827 - Flags: approval-mozilla-aurora?
Attachment #8767827 - Flags: approval-mozilla-aurora+

Comment 50

3 years ago
Pushed by hshih@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d4203db17100
update DrawTargetD2D1::Init() log. r=hshih
Crash volume for signature 'mozilla::layers::BasicLayerManager::PushGroupForLayer':
 - nightly (version 50): 241 crashes from 2016-06-06.
 - aurora  (version 49): 96 crashes from 2016-06-07.
 - beta    (version 48): 5 crashes from 2016-06-06.
 - release (version 47): 5 crashes from 2016-05-31.
 - esr     (version 45): 1 crash from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          2          0          3         19        129         53         35
 - aurora           2          0         10         26         14         38          6
 - beta             2          0          1          0          0          2          0
 - release          1          4          0          0          0          0          0
 - esr              0          0          1          0          0          0          0

Affected platform: Windows
I believe we can safely mark this verified fixed on Fx50, based on the crash
stats available for the last 3 months.

  SIGNATURE   | mozilla::layers::BasicLayerManager::PushGroupForLayer
  -------------------------------------------------------------------
  CRASH STATS | http://tinyurl.com/zahv29k
  -------------------------------------------------------------------
  OVERVIEW    | 5 crashes on nightly 50
		0 crashes on aurora 50
		0 crashes on beta 50
Status: RESOLVED → VERIFIED

Updated

3 years ago
Blocks: 1297204
You need to log in before you can comment on or make changes to this bug.