Closed Bug 1276062 Opened 8 years ago Closed 8 years ago

Crash in mozilla::layers::BasicLayerManager::PushGroupForLayer

Categories

(Core :: Graphics, defect)

Unspecified
Windows 10
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla50
Tracking Status
firefox47 --- affected
firefox48 --- fixed
firefox49 --- fixed
firefox-esr45 --- affected
firefox50 --- verified

People

(Reporter: mccr8, Assigned: jerry)

References

(Blocks 1 open bug)

Details

(Keywords: crash, Whiteboard: [gfx-noted])

Crash Data

Attachments

(3 files, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-95a626c7-a8c1-49e1-becb-fa7c82160526.
=============================================================

I see 3 crashes with this signature on 05-25 Nightly Windows builds. They all look like they are from a single install time, so normally I wouldn't file a bug, but it looks like this is an intentional crash Milan landed in bug 1213007, so maybe they'll be useful to somebody, and there's no bug on file yet.
ni? Milan in case there's something useful he can get out of these couple of crashes.
Flags: needinfo?(milan)
Yorgos, you seem to be able to reproduce this crash - do you have the steps to reproduce?
Flags: needinfo?(milan)
Whiteboard: [gfx-noted]
This call:

RefPtr<DrawTarget> dt = aContext->GetDrawTarget()->CreateSimilarDrawTarget(surfRect.Size(), SurfaceFormat::B8G8R8A8);

at this line: https://dxr.mozilla.org/mozilla-central/source/gfx/layers/basic/BasicLayerManager.cpp#111

returns a null draw target.  For this crash: https://crash-stats.mozilla.com/report/index/796b2c1b-d6ac-4d49-9d40-dc5ff2160526 we fail on a reasonably sized 348x80, with D2D1.1 backend here:

mDC->CreateBitmap(D2DIntSize(aSize), nullptr, 0, props, (ID2D1Bitmap1**)getter_AddRefs(mBitmap));

here: https://dxr.mozilla.org/mozilla-central/source/gfx/2d/DrawTargetD2D1.cpp#1033

getting a D2DERR_RECREATE_TARGET error.
(In reply to Milan Sreckovic [:milan] from comment #5)
> Yorgos, you seem to be able to reproduce this crash - do you have the steps
> to reproduce?

yes,
scrolling a facebook photoalbum with many photos.
Bas, any thoughts?  Lots of images, "recreate target" error - leaking resources?
Flags: needinfo?(bas)
((anyway I should experience crashes at 2GB, or above, of Firefox ram, but at 900MB - 1GB perhaps the crash comes is too early; ok if I want 10 tabs with FB albums, that were an error in my behaviour; but I should be able to scroll a big FB photoalbum without issues when the total ram usage is 60% )) .
(In reply to Milan Sreckovic [:milan] from comment #9)
> Bas, any thoughts?  Lots of images, "recreate target" error - leaking
> resources?

That's a device reset error. So this is probably a place we're not guarding correctly for device resets occurring.
Assignee: nobody → bas
Status: NEW → ASSIGNED
Flags: needinfo?(bas)
So, there is not a lot of these crashes, but perhaps we can do something about it.  This particular crash will only show up in nightly and aurora, as it's a "gfxDevCrash".

And, yes, it looks to be a device reset related, as we have D2D and basic layer manager together.  For example: https://crash-stats.mozilla.com/report/index/21b37ed2-d8a8-4221-ba63-3873c2160602

[C0][GFX1]: [D2D1.1] 3CreateBitmap failure Size(1903,72) Code: 0x8899000c format 0 (t=38.5349) 
[C1][GFX1-]: Invalid target in gfxContext::ForDrawTarget 0000000000000000 (t=38.5349)
[C2][GFX1 28]: BasicLayerManager context problem 0000000000000000 (t=38.5349) 

The first one is from DrawTargetD2D1.cpp, the second/third from BasicLayerManager.cpp.  Something isn't getting reset somewhere.
Flags: needinfo?(hshih)
Flags: needinfo?(dvander)
I seem to be able to reproduce this literally every time I try to load this page, or any other page on the Overwatch subreddit. https://www.reddit.com/r/Overwatch/comments/451gn3/slicrossfire_support_has_been_added/

It doesn't happen on other subreddits.
Lily, could you give us the graphics section of about:support?  I'd like to see what kind of graphics setup you have.
Flags: needinfo?(thesweetlilycake)
(In reply to Milan Sreckovic [:milan] from comment #14)
> Lily, could you give us the graphics section of about:support?  I'd like to
> see what kind of graphics setup you have.

Alternatively, go to about:crashes and send us one of the crash links that is related to this crash.  The crash report, if submitted, will contain your graphics configuration.
A couple of crashes with the extra information all have DeviceResetReason::DRIVER_ERROR as the reset cause:
https://crash-stats.mozilla.com/report/index/5d3c9b99-8a02-4137-846f-594172160603
https://crash-stats.mozilla.com/report/index/48280ea4-691d-4212-a77b-df96c2160606

These have messages coming from SourceSurfaceD2D1::GetDataSurface().  Canvas?
Here you go:

Graphics
Features
Compositing	Direct3D 11
Asynchronous Pan/Zoom	wheel input enabled; touch input enabled
WebGL Renderer	Google Inc. -- ANGLE (AMD Radeon R9 200 / HD 7900 Series Direct3D11 vs_5_0 ps_5_0)
Hardware H264 Decoding	Yes; Using D3D11 API
Direct2D	true
DirectWrite	true (10.0.10586.0)
GPU #1
Active	Yes
Description	AMD Radeon R9 200 / HD 7900 Series
Vendor ID	0x1002
Device ID	0x6798
Driver Version	16.200.1025.0
Driver Date	6-2-2016
Drivers	aticfx64 aticfx64 aticfx64 amdxc64 aticfx32 aticfx32 aticfx32 amdxc32 atiumd64 atidxx64 atidxx64 atiumdag atidxx32 atidxx32 atiumdva atiumd6a atitmm64
Subsys ID	32111682
RAM	3072
Diagnostics
ClearType Parameters	D [ Gamma: 2200 Pixel Structure: R ClearType Level: 100 Enhanced Contrast: 0 ] D [ Gamma: 2200 Pixel Structure: R ClearType Level: 0 Enhanced Contrast: 0 ]
AzureCanvasAccelerated	0
AzureCanvasBackend	direct2d 1.1
AzureContentBackend	direct2d 1.1
AzureFallbackCanvasBackend	cairo
ClearType Parameters	D [ Gamma: 2200 Pixel Structure: R ClearType Level: 100 Enhanced Contrast: 0 ] D [ Gamma: 2200 Pixel Structure: R ClearType Level: 0 Enhanced Contrast: 0 ] 

These should be two crash reports: 
https://crash-stats.mozilla.com/report/index/ef0806e1-1f64-4340-a777-d886a2160606
https://crash-stats.mozilla.com/report/index/30641768-d8e6-4205-b5ea-659dd2160606

____

Oddly enough I cannot seem to reproduce this problem anymore today.
Flags: needinfo?(thesweetlilycake)
(In reply to Milan Sreckovic [:milan] from comment #12)
> So, there is not a lot of these crashes, but perhaps we can do something
> about it.  This particular crash will only show up in nightly and aurora, as
> it's a "gfxDevCrash".
> 
> And, yes, it looks to be a device reset related, as we have D2D and basic
> layer manager together.  For example:
> https://crash-stats.mozilla.com/report/index/21b37ed2-d8a8-4221-ba63-
> 3873c2160602
> 
> [C0][GFX1]: [D2D1.1] 3CreateBitmap failure Size(1903,72) Code: 0x8899000c
> format 0 (t=38.5349) 
> [C1][GFX1-]: Invalid target in gfxContext::ForDrawTarget 0000000000000000
> (t=38.5349)
> [C2][GFX1 28]: BasicLayerManager context problem 0000000000000000
> (t=38.5349) 
> 
> The first one is from DrawTargetD2D1.cpp, the second/third from
> BasicLayerManager.cpp.  Something isn't getting reset somewhere.

Never mind this.  I fell for the "basic in BasicCompositor means software, where basic in BasicLayerManager means simple".  This is a valid combination, it just means inactive layers.
Hi!

I seem to have the same issue, especially with the Overwatch subreddit.

My GPU is AMD Radeon R9 270X

This was happening with the 16.5.3 AMD GPU Driver and also with the 16.6.1 Driver that I installed yesterday.


Graphics
--------

Features
Compositing: Direct3D 11
Asynchronous Pan/Zoom: none
WebGL Renderer: Google Inc. -- ANGLE (AMD Radeon R9 200 Series Direct3D11 vs_5_0 ps_5_0)
Hardware H264 Decoding: Yes; Using D3D11 API
Direct2D: true
DirectWrite: true (10.0.10586.0)
GPU #1
Active: Yes
Description: AMD Radeon R9 200 Series
Vendor ID: 0x1002
Device ID: 0x6810
Driver Version: 16.200.1025.0
Driver Date: 6-2-2016
Drivers: aticfx64 aticfx64 aticfx64 amdxc64 aticfx32 aticfx32 aticfx32 amdxc32 atiumd64 atidxx64 atidxx64 atiumdag atidxx32 atidxx32 atiumdva atiumd6a atitmm64
Subsys ID: 22721458
RAM: 2048

Diagnostics
ClearType Parameters: Gamma: 2200 Pixel Structure: R ClearType Level: 100 Enhanced Contrast: 50
AzureCanvasAccelerated: 0
AzureCanvasBackend: direct2d 1.1
AzureContentBackend: direct2d 1.1
AzureFallbackCanvasBackend: cairo
ClearType Parameters: Gamma: 2200 Pixel Structure: R ClearType Level: 100 Enhanced Contrast: 50

Some crashes:

https://crash-stats.mozilla.com/report/index/bp-ccc54c8b-c789-4336-8c55-867b92160608
https://crash-stats.mozilla.com/report/index/bp-a4885a7e-f3f2-404f-bc40-d377a2160607
https://crash-stats.mozilla.com/report/index/bp-d4e794b3-071f-4bea-98d4-f62782160607
https://crash-stats.mozilla.com/report/index/bp-48280ea4-691d-4212-a77b-df96c2160606
Agree with comment #11, looks like we're not null checking something that could TDR.
Flags: needinfo?(dvander)
I'm checking this crash that is necessary or not.

At content side, I only see that we will reset the d2d device after the RecvCompositorUpdated() ipc message.
https://dxr.mozilla.org/mozilla-central/search?q=%22mD2D1Device%22&redirect=false

So it's weird that the DT creation is failed.
https://hg.mozilla.org/mozilla-central/annotate/5f95858f8ddf/gfx/layers/basic/BasicLayerManager.cpp#l110
Assignee: bas → hshih
Flags: needinfo?(hshih)
Finally, I understood the crash flow.

In crash report:
|[0][GFX1]: [D2D1.1] 3CreateBitmap failure Size(1903,863) Code: 0x8899000c format 0 (t=11649.8)
|[1][GFX1]: [D2D1.1] 3CreateBitmap failure Size(1903,863) Code: 0x8899000c format 0 (t=11649.8) |[2][GFX1-]: Invalid target in gfxContext::CreateOrNull 0000000000000000 (t=11649.8)
|[3][GFX1 28]: BasicLayerManager context problem 0000000000000000 (t=11649.8) 

The "CreateBitmap failure Size(1903,863) Code: 0x8899000c" message shows that the bitmap is unavailable because of driver-reset or other reason.

I will upload a patch to remove the gfxCriticalError and its related callers.

[1] From msdn:
D2DERR_RECREATE_TARGET 0x8899000C
A presentation error has occurred that may be recoverable. The caller needs to re-create the render target then attempt to render the frame again.
Attachment #8767104 - Flags: review?(bas)
Attachment #8767105 - Flags: review?(bas)
Attached patch P3: update PushGroupForLayer() usage. v1 (obsolete) β€” β€” Splinter Review
Attachment #8767106 - Flags: review?(bas)
Comment on attachment 8767105 [details] [diff] [review]
P2: obtain the group creation result from PushGroupForLayer(). v1

Review of attachment 8767105 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/basic/BasicLayerManager.cpp
@@ +108,5 @@
>        RefPtr<DrawTarget> dt = aContext->GetDrawTarget()->CreateSimilarDrawTarget(surfRect.Size(), SurfaceFormat::B8G8R8A8);
>  
>        RefPtr<gfxContext> ctx =
>          gfxContext::CreateOrNull(dt, ToRect(rect).TopLeft());
>        if (!ctx) {

This nullptr of dt is due to the bitmap creation failed in DrawTargetD2D1::init().
Please reference the message in crash report for message "CreateBitmap failure Size(1903,863) Code: 0x8899000c".

So, change gfxDevCrash to gfxCriticalNote here

::: gfx/layers/basic/BasicLayers.h
@@ +153,5 @@
>  
> +  // Construct a PushedGroup for a specific layer.
> +  // Return false if it has some errors in PushGroupForLayer(). Then, the
> +  // "aGroupResult" is unavailable for future using.
> +  bool PushGroupForLayer(gfxContext* aContext, Layer* aLayerContext, const nsIntRegion& aRegion, PushedGroup& aGroupResult);

Since it might has driver-reset in this call, return a boolean status to obtain the failed.
Comment on attachment 8767106 [details] [diff] [review]
P3: update PushGroupForLayer() usage. v1

Review of attachment 8767106 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/basic/BasicLayerManager.cpp
@@ +901,5 @@
>  
>    if (is2D) {
>      if (needsGroup) {
> +      PushedGroup pushedGroup;
> +      if (PushGroupForLayer(aTarget, aLayer, aLayer->GetLocalVisibleRegion().ToUnknownRegion(), pushedGroup)) {

If there is an error in PushGropForLayer(), skip this painting.

::: gfx/layers/basic/BasicPaintedLayer.cpp
@@ +84,3 @@
>        if (needsGroup) {
> +        availableGroup =
> +            BasicManager()->PushGroupForLayer(aContext, this, toDraw, group);

If there is an error in PushGropForLayer(), skip this painting.
Mh, I've noticed that after updating my driver to the latest hotfix (Crimson Edition 16.6.2 Hotfix) I no longer seem to have this issue.
Attachment #8767104 - Flags: review?(bas) → review+
Comment on attachment 8767106 [details] [diff] [review]
P3: update PushGroupForLayer() usage. v1

Review of attachment 8767106 [details] [diff] [review]:
-----------------------------------------------------------------

This and pt2 should be a single changeset as they won't compile individually.
Attachment #8767106 - Flags: review?(bas) → review+
Comment on attachment 8767105 [details] [diff] [review]
P2: obtain the group creation result from PushGroupForLayer(). v1

Review of attachment 8767105 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/basic/BasicLayerManager.cpp
@@ +109,5 @@
>  
>        RefPtr<gfxContext> ctx =
>          gfxContext::CreateOrNull(dt, ToRect(rect).TopLeft());
>        if (!ctx) {
> +        gfxCriticalNote << "BasicLayerManager context problem " << gfx::hexa(dt);

Be a little more explicit here, i.e.: gfxContext::CreateOrNull failed or something along those lines.
Attachment #8767105 - Flags: review?(bas) → review+
squash attachment 8767105 [details] [diff] [review] and 8767106.
update for comment 30.
Attachment #8767827 - Flags: review+
Attachment #8767105 - Attachment is obsolete: true
Attachment #8767106 - Attachment is obsolete: true
try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6f1cb0418fea
Please land the
attachment 8767104 [details] [diff] [review]
and
attachment 8767827 [details] [diff] [review]
to m-c
Keywords: checkin-needed
There are a lot of BasicPaintedLayer::PaintThebes crash for 48.
And I think the reason is the same as this bug. The context is nullptr because of driver-reset.

https://crash-stats.mozilla.org/signature/?signature=mozilla%3A%3Alayers%3A%3ABasicPaintedLayer%3A%3APaintThebes&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/deb3ad46e030
show a log if there is no device for DrawTargetD2D1::Init(). r=bas
https://hg.mozilla.org/integration/mozilla-inbound/rev/018db940b995
obtain the group creation result from PushGroupForLayer(). r=bas
Keywords: checkin-needed
Comment on attachment 8767104 [details] [diff] [review]
P1: show a log if there is no device for DrawTargetD2D1::Init(). v1

Review of attachment 8767104 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/DrawTargetD2D1.cpp
@@ +1008,5 @@
>    HRESULT hr;
>  
>    ID2D1Device* device = Factory::GetD2D1Device();
>    if (!device) {
> +    gfxCriticalNote << "[D2D1.1] Failed to obtain a device for DrawTargetD2D1::Init().";

May not be important in this case, but it is convenient when the messages are different, so that we can tell where they come from.  This one matches the message in the other Init function, so we if we get it in the crash report, we won't know which version was hit.  However, in this case, we're reporting the device being null, so it probably doesn't matter which of the two we've hit.
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #35)
> There are a lot of BasicPaintedLayer::PaintThebes crash for 48.
> And I think the reason is the same as this bug. The context is nullptr
> because of driver-reset.

Yes, let's consider this for uplift.
https://hg.mozilla.org/mozilla-central/rev/deb3ad46e030
https://hg.mozilla.org/mozilla-central/rev/018db940b995
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Comment on attachment 8767827 [details] [diff] [review]
P2: obtain the group creation result from PushGroupForLayer(). v2. r=bas

Approval Request Comment
[Feature/regressing bug #]:
bug 1283975

[User impact if declined]:
crash as:
https://crash-stats.mozilla.org/signature/?signature=mozilla%3A%3Alayers%3A%3ABasicPaintedLayer%3A%3APaintThebes&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&_sort=&page=1

[Describe test coverage new/current, TreeHerder]:
try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6f1cb0418fea

already merged to m-c

[Risks and why]: 
If we have driver reset during the DrawTarget creation, we will de-reference a nullptr context in BasicPaintedLayer::PaintThebes().
https://hg.mozilla.org/releases/mozilla-beta/annotate/dd7af1fa4ece/gfx/layers/basic/BasicPaintedLayer.cpp#l90

I think the risk is low. This patch skip the rendering if we hit the driver reset. User might see the old content in that moment, but gecko will recover later.

[String/UUID change made/needed]:
none
Attachment #8767827 - Flags: approval-mozilla-beta?
And I don't know whether ff49 needs this or not.
Flags: needinfo?(milan)
update for comment 37.
just a simple update for error message.
Attachment #8768280 - Flags: review+
Attachment #8768280 - Attachment description: P3: update DrawTargetD2D1::Init() log. → P3: update DrawTargetD2D1::Init() log. r=me
Please land the attachment 8768280 [details] [diff] [review] to m-c.
Keywords: checkin-needed
Hi Jerry,
Yes, please also file another uplift request for 49.
Flags: needinfo?(milan)
As described in comment #43.
Flags: needinfo?(hshih)
Comment on attachment 8767827 [details] [diff] [review]
P2: obtain the group creation result from PushGroupForLayer(). v2. r=bas

Comment on attachment 8767827 [details] [diff] [review] [diff] [review]
P2: obtain the group creation result from PushGroupForLayer(). v2. r=bas

Approval Request Comment
[Feature/regressing bug #]:
bug 1276062

[User impact if declined]:
Gecko will crash at https://dxr.mozilla.org/mozilla-aurora/source/gfx/layers/basic/BasicLayerManager.cpp?q=BasicLayerManager%3A%3APushGroupForLayer&redirect_type=single#115
as bp-95a626c7-a8c1-49e1-becb-fa7c82160526

It is not the same as the crash in ff48:
https://crash-stats.mozilla.org/signature/?signature=mozilla%3A%3Alayers%3A%3ABasicPaintedLayer%3A%3APaintThebes&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&_sort=&page=1

We will have an assert with gfxDevCrash() for nightly and aurora, but not at beta and release.

[Describe test coverage new/current, TreeHerder]:
try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6f1cb0418fea

already merged to m-c

[Risks and why]: 
If we have driver reset during the DrawTarget creation, we will have an assert in BasicLayerManager::PushGroupForLayer()
https://dxr.mozilla.org/mozilla-aurora/source/gfx/layers/basic/BasicLayerManager.cpp?q=BasicLayerManager%3A%3APushGroupForLayer&redirect_type=single#115

I think the risk is low. This patch skip the rendering if we hit the driver reset. User might see the old content in that moment, but gecko will recover later.

[String/UUID change made/needed]:
none
Flags: needinfo?(hshih)
Attachment #8767827 - Flags: approval-mozilla-aurora?
Comment on attachment 8767827 [details] [diff] [review]
P2: obtain the group creation result from PushGroupForLayer(). v2. r=bas

Review of attachment 8767827 [details] [diff] [review]:
-----------------------------------------------------------------

After discussing with Jerry, the fix is small and low risk. Take it in 48 beta 6 and aurora.
Attachment #8767827 - Flags: approval-mozilla-beta?
Attachment #8767827 - Flags: approval-mozilla-beta+
Attachment #8767827 - Flags: approval-mozilla-aurora?
Attachment #8767827 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/80731111fb5b
https://hg.mozilla.org/releases/mozilla-beta/rev/1a8e1e5125da
Pushed by hshih@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d4203db17100
update DrawTargetD2D1::Init() log. r=hshih
https://hg.mozilla.org/mozilla-central/rev/d4203db17100
Crash volume for signature 'mozilla::layers::BasicLayerManager::PushGroupForLayer':
 - nightly (version 50): 241 crashes from 2016-06-06.
 - aurora  (version 49): 96 crashes from 2016-06-07.
 - beta    (version 48): 5 crashes from 2016-06-06.
 - release (version 47): 5 crashes from 2016-05-31.
 - esr     (version 45): 1 crash from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          2          0          3         19        129         53         35
 - aurora           2          0         10         26         14         38          6
 - beta             2          0          1          0          0          2          0
 - release          1          4          0          0          0          0          0
 - esr              0          0          1          0          0          0          0

Affected platform: Windows
I believe we can safely mark this verified fixed on Fx50, based on the crash
stats available for the last 3 months.

  SIGNATURE   | mozilla::layers::BasicLayerManager::PushGroupForLayer
  -------------------------------------------------------------------
  CRASH STATS | http://tinyurl.com/zahv29k
  -------------------------------------------------------------------
  OVERVIEW    | 5 crashes on nightly 50
		0 crashes on aurora 50
		0 crashes on beta 50
Status: RESOLVED → VERIFIED
Blocks: 1297204
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: