1306168 - Crash in mozilla::layers::CompositorD3D11::BeginFrame

Reporter

Description

•

8 years ago

This bug was filed from the Socorro interface and is 
report bp-23aeed7e-5196-4cc2-a2e1-2d6f22160927.
=============================================================

#49 of 0925 Nightly on Windows, 7 crashes from 7 installations. From the graph [1], bug 1133623 seemed to have fixed it, but it came back at the beginning of August. Low volume though.

[1] https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=Nightly&_sort=-date&signature=mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3ABeginFrame&date=%3E2016-06-07#graphs

Edwin Flores [inactive from 2016-12-01] [:eflores] [:edwin]

Comment 1

•

8 years ago

Seems to have spiked only recently in Aurora, when it became 51. That gives our regression range a lower bound of 2016-08-01, the last Nightly 50.

Looking at the build graph[1], Nightly 51 started reporting this a couple of days later in the 2016-08-04 build.

[1] https://crash-stats.mozilla.com/signature/?product=Firefox&date=%3E2016-06-07&signature=mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3ABeginFrame#graph

Edwin Flores [inactive from 2016-12-01] [:eflores] [:edwin]

Comment 2

•

8 years ago

If we're slightly optimistic and assume that the first spike is exactly when this started happening, we get the range at [1].

In particular, a couple of the patches from bug 1289640 talking about threadsafe texture upload seem a bit suspicious. Doesn't add all that much information though, sadly.

[1] http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6608e5864780589b25d5421c3d3673ab30c4c318&tochange=1576e7bc1bec7232e9e4ba78cce62526b1a6380b

Edwin Flores [inactive from 2016-12-01] [:eflores] [:edwin]

Comment 3

•

8 years ago

I wonder if it's possible that we see a device loss, causing these mutexes to not acquire, but manage to reset the device before they time out? So:

1. Device is lost.
2. (Thread 1) Start attempting to acquire the mutex. The device is lost so this can't succeed.
3. (Thread 2) We reset the device.
4. (Thread 1) We time out trying to acquire the mutex; crash.
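
A minimal sketch of the sequence above, written as a discrete-time simulation rather than with real threads or a real D3D11 device. Everything here (`Outcome`, `ReplayAcquire`, the tick model) is hypothetical and not taken from Gecko. It assumes that the decision at timeout looks only at whether the device is lost at that moment, and that an acquire started on a lost device never succeeds.

```cpp
#include <cassert>

// Hypothetical outcome names for illustration; none of this is Gecko code.
enum class Outcome { Acquired, HandledAsDeviceLoss, CrashOnTimeout };

// Discrete-time replay of steps 1-4. One loop iteration is one "tick" that
// thread 1 spends waiting on the mutex.
//   lostAtStart  - step 1: was the device already lost when the wait began?
//   resetTick    - step 3: tick at which thread 2 resets the device
//                  (a negative value means no reset happens)
//   timeoutTicks - step 4: how many ticks thread 1 waits before giving up
Outcome ReplayAcquire(bool lostAtStart, int resetTick, int timeoutTicks) {
  assert(timeoutTicks > 0);
  bool lost = lostAtStart;
  // Assumption from the scenario: an acquire that starts on a lost device
  // can never succeed, even if the device is reset later.
  const bool acquireCanSucceed = !lostAtStart;

  for (int tick = 0; tick < timeoutTicks; ++tick) {
    if (lost && tick == resetTick) {
      lost = false;  // Step 3: thread 2 resets the device mid-wait.
    }
    if (acquireCanSucceed) {
      return Outcome::Acquired;  // Step 2 succeeds on a healthy device.
    }
  }

  // Step 4: timed out. Only the current device state is consulted here,
  // so a reset that landed during the wait hides the original loss.
  return lost ? Outcome::HandledAsDeviceLoss : Outcome::CrashOnTimeout;
}
```

Under these assumptions, `ReplayAcquire(true, 3, 10)` ends in `CrashOnTimeout`, while the same wait with no reset (`resetTick = -1`) ends in `HandledAsDeviceLoss`.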

Vincent Liu[:vliu]

Updated

•

8 years ago

Blocks: 1297204

BugBot [:suhaib / :marco/ :calixte]

Comment 4

•

8 years ago

Crash volume for signature 'mozilla::layers::CompositorD3D11::BeginFrame':
 - nightly (version 52): 53 crashes from 2016-09-19.
 - aurora  (version 51): 53 crashes from 2016-09-19.
 - beta    (version 50): 10 crashes from 2016-09-20.
 - release (version 49): 273 crashes from 2016-09-05.
 - esr     (version 45): 47 crashes from 2016-06-01.

Crash volume on the last weeks (Week N is from 10-03 to 10-09):
            W. N-1  W. N-2
 - nightly      34      19
 - aurora       42      11
 - beta          9       1
 - release     218      55
 - esr           4       1

Affected platform: Windows

Crash rank on the last 7 days:
           Browser     Content   Plugin
 - nightly #22
 - aurora  #24
 - beta    #1421
 - release #232
 - esr     #1647

status-firefox49: --- → affected

status-firefox50: --- → affected

status-firefox51: --- → affected

status-firefox-esr45: --- → affected

George Wright (:gw280) (needinfo me!)

Updated

•

8 years ago

Priority: -- → P3

Whiteboard: [gfx-noted]

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

8 years ago

Flags: needinfo?(milan)

Nicholas Nethercote [inactive]

Comment 5

•

8 years ago

This is the #28 topcrash in Nightly over the past 7 days.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 6

•

8 years ago

Note that the majority of these should go away once we're in beta or release: in nightly and aurora we force a crash when particular errors happen, while in beta and release we keep running. In some of the cases where we continue, we end up crashing in a driver, or timing out and moz_crash-ing, but the numbers seem to be low.
Most of the work for this is now in bug 1160157.

BugBot [:suhaib / :marco/ :calixte]

Comment 7

•

7 years ago

Crash volume for signature 'mozilla::layers::CompositorD3D11::BeginFrame':
 - nightly (version 53): 247 crashes from 2016-11-14.
 - aurora  (version 52): 232 crashes from 2016-11-14.
 - beta    (version 51): 1405 crashes from 2016-11-14.
 - release (version 50): 709 crashes from 2016-11-01.
 - esr     (version 45): 92 crashes from 2016-07-06.

Crash volume on the last weeks (Week N is from 01-02 to 01-08):
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly      34      33      60      51      38      20       0
 - aurora       32      39      41      47      51      16       0
 - beta        198     201     199     224     271     162      83
 - release      98     112     124     113     105     104      28
 - esr           3       6       3       3       6       4       9

Affected platform: Windows

Crash rank on the last 7 days:
           Browser   Content   Plugin
 - nightly #166
 - aurora  #24
 - beta    #45
 - release #497
 - esr     #1545

status-firefox53: --- → affected

Wayne Mery (:wsmwk)

Comment 8

•

7 years ago

bp-76f6fe15-c9d8-4b3c-963c-63fc12170118 with nightly 53.0a1 20170111030235.

At the same time I got OOM | small (bp-15b14274-ad24-4bd7-85b9-487cb2170118) and F802033140_______________________________________ (bp-603e29e6-538d-46d5-9e70-4c9db2170118).

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox49: affected → wontfix

status-firefox50: affected → wontfix

status-firefox51: affected → wontfix

status-firefox-esr45: affected → wontfix

Marco Castelluccio [:marco]

Comment 9

•

7 years ago

This is #34 on Beta 51. In most cases we are moz-crashing after a timeout, like you said:
(99.30% in signature vs 00.17% overall) moz_crash_reason = MOZ_CRASH(GFX: D3D11 normal status timeout)

[:philipp]

Comment 10

•

7 years ago

Unfortunately, the volume of this crash has increased since 51 went to the release audience: it's the #6 browser crash, causing 1.22% of all browser crashes in Firefox 51.0.1.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 11

•

7 years ago

The device reset happens, and we don't properly deal with it and eventually crash in the timeout.  Seems to be disproportionately many Nvidia cards in these crashes.
The good news is that all (with a couple of interesting exceptions) of the 53 & 54 crashes are in the GPU process. The bad news is that 51 & 52 don't have the GPU process, so the browser goes down with this crash.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 12

•

7 years ago

The "last" of the device reset crashes seem to come from something similar to what's described in bug 1333329, except that we MOZ_CRASH because of the timeout in CompositorD3D11::BeginFrame (e.g., https://crash-stats.mozilla.com/report/index/82e1407a-9506-41f9-8d61-8dbd62170129)

Should we remove this MOZ_CRASH? We do it because, at the time of the timeout, we don't think it's part of a device reset, but looking at the log, there clearly were resets in the past; we just "forgot" about them by the time we get here. Or maybe it's something else.
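
One possible way to avoid "forgetting" earlier resets, sketched under assumptions: keep a small record of the last device reset and consult it when the timeout fires. `ResetHistory`, `OnAcquireTimeout`, `TimeoutAction`, and the lookback window are all hypothetical names for illustration; this is not how Gecko decides, and the right window length is an open question.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical record of the most recent device reset; not a Gecko class.
class ResetHistory {
 public:
  void NoteReset(Clock::time_point when) {
    mHasReset = true;
    mLastReset = when;
  }
  // True if a reset was recorded at or after `since`.
  bool HadResetSince(Clock::time_point since) const {
    return mHasReset && mLastReset >= since;
  }

 private:
  bool mHasReset = false;
  Clock::time_point mLastReset{};
};

enum class TimeoutAction { RecoverFromReset, Crash };

// Decide what to do when the acquire times out. A timeout counts as
// reset-related if the device is removed right now, or if a reset was
// recorded within `lookback` before `now`.
TimeoutAction OnAcquireTimeout(bool deviceRemovedNow,
                               const ResetHistory& history,
                               Clock::time_point now,
                               Clock::duration lookback) {
  assert(lookback >= Clock::duration::zero());
  if (deviceRemovedNow || history.HadResetSince(now - lookback)) {
    return TimeoutAction::RecoverFromReset;
  }
  return TimeoutAction::Crash;  // No reset in sight: a genuine timeout.
}
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the decision easy to test; a caller would pass `Clock::now()`.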

On a side note, we're going to uplift a patch to beta that could reduce the number of device resets on Nvidia in the first place.  If that shows results, we could do a dot release on 51.

Flags: needinfo?(milan) → needinfo?(dvander)

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 13

•

7 years ago

Crashing here, if in the GPU process, actually seems fine to me. Having the compositor block for 30+ seconds each frame is a little worrying. It might be worth disabling D3D11 at that point.
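
A rough sketch of what "disabling D3D11 at that point" could look like as a policy object. `CompositorBlockPolicy` and its methods are hypothetical, not Gecko API; the only input taken from the comment is the idea of a blocking threshold (30 seconds in the usage below), and the choice to make the decision sticky is an assumption.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical policy: once a single frame blocks for at least `maxBlock`,
// recommend falling back from D3D11 and keep recommending it afterwards.
class CompositorBlockPolicy {
 public:
  explicit CompositorBlockPolicy(std::chrono::milliseconds maxBlock)
      : mMaxBlock(maxBlock) {
    assert(maxBlock.count() > 0);
  }

  // Record how long the compositor was blocked for one frame.
  // Returns true if D3D11 should now be considered disabled.
  bool NoteFrameBlocked(std::chrono::milliseconds blockedFor) {
    if (blockedFor >= mMaxBlock) {
      mDisabled = true;  // Sticky: a later fast frame does not re-enable it.
    }
    return mDisabled;
  }

  bool D3D11Disabled() const { return mDisabled; }

 private:
  std::chrono::milliseconds mMaxBlock;
  bool mDisabled = false;
};
```

For example, a policy constructed with `std::chrono::seconds(30)` stays enabled after a 16 ms frame and flips to disabled after a frame that blocks for 31 seconds.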

Flags: needinfo?(dvander)

BugBot [:suhaib / :marco/ :calixte]

Comment 14

•

7 years ago

Crash volume for signature 'mozilla::layers::CompositorD3D11::BeginFrame':
 - nightly (version 54): 27 crashes from 2017-01-23.
 - aurora  (version 53): 20 crashes from 2017-01-23.
 - beta    (version 52): 168 crashes from 2017-01-23.
 - release (version 51): 1906 crashes from 2017-01-16.
 - esr     (version 45): 104 crashes from 2016-08-03.

Crash volume on the last weeks (Week N is from 01-30 to 02-05):
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly      16
 - aurora        9
 - beta        107
 - release     968       0
 - esr           9       4       5       3       3       6       3

Affected platform: Windows

Crash rank on the last 7 days:
           Browser   Content   Plugin
 - nightly #785
 - aurora  #562
 - beta    #41
 - release #6
 - esr     #1470

status-firefox54: --- → affected

Julian Seward [:jseward]

Comment 15

•

7 years ago

This is the #2 topcrash for Windows nightly of 20170309030216,
reported 729 times.

Toshihiro Yamada

Comment 16

•

7 years ago

(In reply to Julian Seward [:jseward] from comment #15)
> This is the #2 topcrash for Windows nightly of 20170309030216,
> reported 729 times.

Probably bug 1345814.

u279076

Comment 17

•

7 years ago

Adding a note that this is the #1 GPU Process crash @ 55.38% in Nightly 55 (#2 @ 13.34% overall) with 5098 of 5352 reports coming from the GPU Process (95.2%). In Beta this is only #109 @ 0.04% and Aurora this is only #54 @ 0.09%.

Keywords: topcrash-win

Julien Cristau [:jcristau]

Comment 18

•

7 years ago

Mass wontfix for bugs affecting firefox 52.

status-firefox52: affected → wontfix

Wayne Mery (:wsmwk)

Comment 19

•

4 years ago

This is currently far from being a topcrash, and isn't showing up for any current versions:
https://crash-stats.mozilla.org/signature/?signature=mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3ABeginFrame

Status: NEW → RESOLVED

Closed: 4 years ago

Resolution: --- → WORKSFORME