Closed Bug 1306168 Opened 4 years ago Closed 5 months ago
Crash in mozilla::layers::Compositor
This bug was filed from the Socorro interface and is report bp-23aeed7e-5196-4cc2-a2e1-2d6f22160927. ============================================================= #49 of 0925 Nightly on Windows, 7 crashes from 7 installations. From the graph, bug 1133623 seemed to have fixed it, but it came back at the beginning of August. Low volume, though.  https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=Nightly&_sort=-date&signature=mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3ABeginFrame&date=%3E2016-06-07#graphs
Seems to have spiked only recently in Aurora, when it became 51. That gives our regression range a lower bound of 2016-08-01, the last Nightly 50. Looking at the build graph, Nightly 51 started reporting this a couple of days later in the 2016-08-04 build.  https://crash-stats.mozilla.com/signature/?product=Firefox&date=%3E2016-06-07&signature=mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3ABeginFrame#graph
If we're slightly optimistic and assume that the first spike is exactly when this started happening, we get the regression range in the pushlog linked below. In particular, a couple of the patches from bug 1289640 about threadsafe texture upload seem a bit suspicious. That doesn't add much information, though, sadly.  http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6608e5864780589b25d5421c3d3673ab30c4c318&tochange=1576e7bc1bec7232e9e4ba78cce62526b1a6380b
I wonder if it's possible that we see a device loss, causing these mutexes to not acquire, but manage to reset the device before they time out? So:
1. Device is lost.
2. (Thread 1) Start attempting to acquire the mutex. The device is lost so this can't succeed.
3. (Thread 2) We reset the device.
4. (Thread 1) We time out trying to acquire the mutex; crash.
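For illustration only, the four steps above can be simulated in a few lines of Python. This is a toy model, not Gecko code: `FakeDevice`, `begin_frame`, `simulate`, and the timings are all made up, and a plain lock stands in for the D3D11 sync mutex. It assumes that the timeout handler only checks whether the device looks lost at the moment the wait expires.

```python
import threading
import time


class FakeDevice:
    """Hypothetical stand-in for the D3D11 device state; not Gecko code."""

    def __init__(self):
        self.lost = False
        self.sync_mutex = threading.Lock()


def begin_frame(device, timeout_s):
    """Return 'ok', 'device-lost' or 'crash' for one simulated frame."""
    if device.sync_mutex.acquire(timeout=timeout_s):
        device.sync_mutex.release()
        return "ok"
    # Timed out. If the device still looks lost, blame the device loss;
    # otherwise the timeout looks unexplained and we "crash".
    return "device-lost" if device.lost else "crash"


def simulate():
    device = FakeDevice()
    outcome = {}

    # Step 1: the device is lost; its mutex can no longer be acquired.
    device.lost = True
    device.sync_mutex.acquire()

    # Step 2: thread 1 starts waiting on the mutex.
    def compositor():
        outcome["result"] = begin_frame(device, timeout_s=0.5)

    # Step 3: thread 2 resets the device while thread 1 is still waiting.
    def resetter():
        time.sleep(0.1)
        device.lost = False

    t1 = threading.Thread(target=compositor)
    t2 = threading.Thread(target=resetter)
    t1.start()
    t2.start()
    t1.join()
    t2.join()

    # Step 4: thread 1 timed out after the reset, so it sees lost == False.
    return outcome["result"]
```

In this model the reset lands before the wait expires, so the timeout is not attributed to the device loss and the frame ends in the crash branch. Whether the real code path behaves this way is exactly the open question in this comment.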
Crash volume for signature 'mozilla::layers::CompositorD3D11::BeginFrame':
 - nightly (version 52): 53 crashes from 2016-09-19.
 - aurora (version 51): 53 crashes from 2016-09-19.
 - beta (version 50): 10 crashes from 2016-09-20.
 - release (version 49): 273 crashes from 2016-09-05.
 - esr (version 45): 47 crashes from 2016-06-01.

Crash volume on the last weeks (Week N is from 10-03 to 10-09):
            W. N-1  W. N-2
 - nightly      34      19
 - aurora       42      11
 - beta          9       1
 - release     218      55
 - esr           4       1

Affected platform: Windows

Crash rank on the last 7 days:
            Browser  Content  Plugin
 - nightly  #22
 - aurora   #24
 - beta     #1421
 - release  #232
 - esr      #1647
Priority: -- → P3
4 years ago
See Also: → 1160157
4 years ago
This is #28 topcrash in Nightly over the past 7 days.
Note that the majority of these should go away once we're in beta or release: in nightly and aurora, we force a crash when particular errors happen; in beta and release we keep running. Some of the times we continue, we end up crashing in a driver, or timing out and moz_crash-ing, but the numbers seem to be low. Most of the work for this is now in bug 1160157.
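As a rough sketch of the channel-dependent behaviour described above (not the actual Gecko logic): the names `FORCE_CRASH_CHANNELS`, `ForcedCrash`, and `handle_gfx_error` are hypothetical, and which "particular errors" trigger the forced crash is not specified in this bug.

```python
# Channels where, per the comment above, certain graphics errors force a crash.
FORCE_CRASH_CHANNELS = {"nightly", "aurora"}


class ForcedCrash(Exception):
    """Stands in for a deliberate, MOZ_CRASH-style abort in this sketch."""


def handle_gfx_error(channel, error):
    """Crash on pre-release channels; keep running on beta and release."""
    if channel in FORCE_CRASH_CHANNELS:
        raise ForcedCrash(error)
    # Beta/release: keep going. This may still end in a driver crash or a
    # timeout later on, which is the low-volume remainder mentioned above.
    return "continue"
```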
Crash volume for signature 'mozilla::layers::CompositorD3D11::BeginFrame':
 - nightly (version 53): 247 crashes from 2016-11-14.
 - aurora (version 52): 232 crashes from 2016-11-14.
 - beta (version 51): 1405 crashes from 2016-11-14.
 - release (version 50): 709 crashes from 2016-11-01.
 - esr (version 45): 92 crashes from 2016-07-06.

Crash volume on the last weeks (Week N is from 01-02 to 01-08):
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly      34      33      60      51      38      20       0
 - aurora       32      39      41      47      51      16       0
 - beta        198     201     199     224     271     162      83
 - release      98     112     124     113     105     104      28
 - esr           3       6       3       3       6       4       9

Affected platform: Windows

Crash rank on the last 7 days:
            Browser  Content  Plugin
 - nightly  #166
 - aurora   #24
 - beta     #45
 - release  #497
 - esr      #1545
bp-76f6fe15-c9d8-4b3c-963c-63fc12170118 with nightly 53.0a1 20170111030235. At the same time I got OOM | small bp-15b14274-ad24-4bd7-85b9-487cb2170118 and F802033140_______________________________________ bp-603e29e6-538d-46d5-9e70-4c9db2170118
This is #34 on Beta 51. In most cases we are moz-crashing after a timeout, like you said: (99.30% in signature vs 00.17% overall) moz_crash_reason = MOZ_CRASH(GFX: D3D11 normal status timeout)
Unfortunately, the volume of this crash has increased since 51 went to the release audience: it's the #6 browser crash, causing 1.22% of all browser crashes in Firefox 51.0.1.
The device reset happens, and we don't properly deal with it and eventually crash in the timeout. Seems to be disproportionately many Nvidia cards in these crashes. The good news is that all (with a couple of interesting exceptions) of the 53 & 54 crashes are the GPU process. The bad news is that 51 & 52 don't have the GPU process, so the browser goes down with this crash.
The "last" of the device reset crashes seem to come from something similar to what's described in bug 1333329, except that we MOZ_CRASH because of the timeout in CompositorD3D11::BeginFrame (e.g., https://crash-stats.mozilla.com/report/index/82e1407a-9506-41f9-8d61-8dbd62170129).

Should we remove this MOZ_CRASH? We do it because the timeout doesn't think it's part of the device reset, but looking at the log, there clearly were resets in the past; we just "forgot" about them by the time we get here. Or maybe it's something else.

On a side note, we're going to uplift a patch to beta that could reduce the number of device resets on Nvidia in the first place. If that shows results, we could do a dot release on 51.
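To make the "forgot about them" point concrete, here is a small sketch of one way a timeout could be attributed to an earlier reset. This is not a proposed patch: `ResetLog`, `record_reset`, `classify_timeout`, and the time window are all hypothetical, and whether remembering resets is the right answer (as opposed to simply removing the MOZ_CRASH) is the open question in this comment.

```python
class ResetLog:
    """Hypothetical record of device resets seen so far (times in seconds)."""

    def __init__(self):
        self.reset_times = []

    def record_reset(self, now):
        """Remember that a device reset was observed at time `now`."""
        self.reset_times.append(now)

    def classify_timeout(self, now, window_s):
        """Blame a timeout on a device reset if one happened within window_s."""
        recent = any(0 <= now - t <= window_s for t in self.reset_times)
        return "device-reset" if recent else "crash"
```

For example, with a reset recorded at t=50 and an assumed 60-second window, a timeout at t=100 would be classified as part of the device reset, while a timeout at t=200 would still fall through to the crash path.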
Flags: needinfo?(milan) → needinfo?(dvander)
Crashing here, if in the GPU process, actually seems fine to me. Having the compositor block for 30+ seconds each frame is a little worrying. It might be worth disabling D3D11 at that point.
Crash volume for signature 'mozilla::layers::CompositorD3D11::BeginFrame':
 - nightly (version 54): 27 crashes from 2017-01-23.
 - aurora (version 53): 20 crashes from 2017-01-23.
 - beta (version 52): 168 crashes from 2017-01-23.
 - release (version 51): 1906 crashes from 2017-01-16.
 - esr (version 45): 104 crashes from 2016-08-03.

Crash volume on the last weeks (Week N is from 01-30 to 02-05):
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly      16
 - aurora        9
 - beta        107
 - release     968       0
 - esr           9       4       5       3       3       6       3

Affected platform: Windows

Crash rank on the last 7 days:
            Browser  Content  Plugin
 - nightly  #785
 - aurora   #562
 - beta     #41
 - release  #6
 - esr      #1470
This is the #2 topcrash for Windows nightly of 20170309030216, reported 729 times.
(In reply to Julian Seward [:jseward] from comment #15)
> This is the #2 topcrash for Windows nightly of 20170309030216,
> reported 729 times.

Probably, Bug 1345814.
Adding a note that this is the #1 GPU Process crash @ 55.38% in Nightly 55 (#2 @ 13.34% overall) with 5098 of 5352 reports coming from the GPU Process (95.2%). In Beta this is only #109 @ 0.04% and in Aurora this is only #54 @ 0.09%.
Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → WORKSFORME