Closed Bug 1291084 Opened 3 years ago Closed 3 years ago

Wrong device returned in GetDeviceForCurrentThread

Categories

(Core :: Audio/Video: Playback, defect, P3, critical)

Unspecified
Windows 7
defect

RESOLVED FIXED
Tracking Status
firefox48 --- wontfix
firefox49 --- wontfix
firefox-esr45 --- wontfix
firefox50 --- fixed
firefox51 --- unaffected
firefox52 --- unaffected

People

(Reporter: marcia, Assigned: bas.schouten)

Details

(6 keywords, Whiteboard: [gfx-noted])

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-ee383ebb-c7af-49ea-a5ac-8d1ce2160801.
=============================================================

Fairly high volume Windows crash that spiked with the 2016073103 build but returned to normal in the next build. Unsure where this belongs component-wise; if someone can move it I would appreciate it.

Some comments:

crashed on a website with many blocked scripts
meemory leeks
memory maxed out again 
It keeps eating up all 8 GIGs of my PC's Memory after the last Update..!!
Crash volume for signature 'std::list<T>::clear':
 - nightly (version 50): 123 crashes from 2016-06-06.
 - aurora  (version 49): 19 crashes from 2016-06-07.
 - beta    (version 48): 183 crashes from 2016-06-06.
 - release (version 47): 1357 crashes from 2016-05-31.
 - esr     (version 45): 238 crashes from 2016-04-07.

Crash volume over the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly      15       0       0       0       0       0       1
 - aurora        0       3       3       4       2       3       3
 - beta         26      35      26      29      23      18      16
 - release     184     201     167     182     184     174     176
 - esr          32      38      28      18      28      33      25

Affected platform: Windows
Summary: Crash in std::list<T>::clear → Crash in std::list<T>::clear() from D2D1::EndDraw()
Whiteboard: [gfx-noted]
My wife experienced this crash on her work computer when she navigated to https://library.leeds.ac.uk/flyingstart/firstyear.html and pressed “Lectures”.  The crash report she had was https://crash-stats.mozilla.com/report/index/3b0caeb3-92b0-4f2b-af4f-5ffe92160803.  Unfortunately it does not reproduce.
ni on Milan - this is the #2 top crash on Nightly - can we get some help investigating why it spiked?  Thanks.
Flags: needinfo?(milan)
I'm going to guess that this is another variant of bug 1291531 and friends.
See Also: → 1291531
Flags: needinfo?(nical.bugzilla)
(In reply to Andrew McCreight [:mccr8] from comment #5)
> I'm going to guess that this is another variant of bug 1291531 and friends.

We're operating under that assumption for now.  It is however possible there are multiple bugs in here, and there is a suggestion that somebody from DOM should also take a look at some of the crashes that are lumped together.
Flags: needinfo?(milan)
Assignee: nobody → nical.bugzilla
Flags: needinfo?(nical.bugzilla)
(In reply to Andrew McCreight [:mccr8] from comment #5)
> I'm going to guess that this is another variant of bug 1291531 and friends.

This started long before bug 1291531 and associated regressions and often happens on the compositor side while bug 1291531 and friends are all content-side crashes. So we can work under the assumption that fixing the recent canvas regressions won't fix this.

It doesn't appear to correlate with memory usage as far as I can tell.

Since I am focusing on the canvas regressions in the short term, I'll put myself back as needinfo'ed rather than assigned for now.
Bas, any idea?
Assignee: nical.bugzilla → nobody
Blocks: 1285271
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(bas)
The crash report linked here seems to point towards video decoder related resources, which is a little weird. It seems like this is some form of driver bug but it's hard to tell really.
Flags: needinfo?(bas)
This is high enough volume that we need to do something about it, or at least understand it better.

Some crashes are at shutdown; some of those, and the non-shutdown ones, happen when deleting the D2D device.  With that in mind, it feels like bug 1284672 should have reduced the numbers, but that doesn't seem to be the case.

It does look video related, from a few crashes I sampled, but Bas, can you spend a bit more time looking at this, and if you're certain it is video, pass it to that team?
Assignee: nobody → bas
Flags: needinfo?(bas)
(In reply to Milan Sreckovic [:milan] from comment #9)
> This is high enough volume that we need to do something about it, or at
> least understand it better.
> 
> Some crashes are shutdown, some of those, and the non-shutdown ones are when
> deleting D2D device.  With that in mind ,it feels like maybe bug 1284672
> would have reduced the numbers down, but that doesn't seem to be the case.
> 
> It does look video related, from a few crashes I sampled, but Bas, can you
> spend a bit more time looking at this, and if you're certain it is video,
> pass it to that team?

This signature seems to lump a large number of different crashes together. Some of them are on device resets, some of them on simple texture deallocations. Some of the latter seem to be related to video on the stack.

These crashes seem to occur both on the compositor thread and the main thread. A lot of these have video stuff on the stack of some threads, but not all. Some of these occur in the content process, some of these in the parent process. I don't really see anything that particularly ties any of these together or provides any actionable information. These crashes occur on ATI, NVidia and Intel hardware in ratios seemingly similar to the distribution of said hardware.

My guess at this point would be some sort of race condition or corruption. I looked at the stacks for about 10-15 of these in more detail but they don't really have anything interesting going on in there. These crashes are occurring on at least 4 different 'lists' inside D3D. It's a little odd we're not seeing a significant amount of crashes in -other- things than lists. I'm going to examine a minidump to see if I can get any more data from that.

If this somehow regressed (and it appears it might have), the related work I could see regressing this is from either Nical or Mattwoodrow. Do you guys have any ideas?
Flags: needinfo?(bas) → needinfo?(matt.woodrow)
This is the #2 crash on Windows Nightly, with 146 crashes. (The #1 is shutdown hangs and the #3 is Flash, so that's quite a lot of crashes.)

> This signature seems to lump a large amount of different crashes together.

I'll get this added to the skip list so that these are split up a bit.
Depends on: 1295362
(In reply to Marcia Knous [:marcia - use ni] from comment #0)

> Fairly high volume Windows crash that spiked using the 2016073103 build, but
> in the next build returned to normal. Unsure where this belong component
> wise - if someone can move it I would appreciate it.

Are there any interesting changesets in that nightly that then got backed out the next day? Knowing what caused the spike may help us track down what the underlying problem is.

It's interesting that the second stack frame is in D3D11VideoDecoderOutputView code.

The content device should never touch video, so it would appear that we're freeing up some shared state between devices. I was told (by Paul Blinzer) that we've had race conditions before with the shared state and multiple devices, so this could be similar.

I wonder if we need to try sharing the same device for d2d and DXVA.

It's also interesting that we're getting D3D11 video objects, since this user is getting D3D9 DXVA. It's possible that this is an implementation detail within the driver though.
Flags: needinfo?(matt.woodrow)
Crash Signature: [@ std::list<T>::clear] → [@ std::list<T>::clear] [@ std::list<T>::clear | CDeviceChild<T>::~CDeviceChild<T>]
No longer blocks: 1285271
Flags: needinfo?(nical.bugzilla)
This is the #2 topcrash in the 2016081908050 Nightly, with 64 occurrences, which is unusually high for a Nightly build. And in the past 7 days it's happened over 800 times in Nightly and Dev Edition:

>                       crashes         installations
> Firefox 	51.0a1 	430 	51.0% 	356
> Firefox 	50.0a2 	411 	48.8% 	236
So, the top one of these (~65%) seems to be std::list<T>::clear | CDeviceChild<T>::~CDeviceChild<T>, for example: https://crash-stats.mozilla.com/report/index/ebc2953a-c688-4d09-a0de-8e7f22160823 - ID3D11VideoDecoderOutputView is mentioned, as Matt alludes to in comment 12. Matt, Anthony, how do we move on this one, given that we're more than half way through dev edition?
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(ajones)
This crash (~6%, second specific one after the one in comment 14) https://crash-stats.mozilla.com/report/index/4a3e7005-0171-4e9f-81d8-a3e8d2160823 is limited to 49 and earlier.  The top one from comment 14 is limited to 50 and 51 (for all intents and purposes.)  Looks like we traded one for the other, except for the overall spike.
(In reply to Milan Sreckovic [:milan] from comment #14)
> So, the top one of these (~65%) seems to be std::list<T>::clear |
> CDeviceChild<T>::~CDeviceChild<T>, for example:
> https://crash-stats.mozilla.com/report/index/ebc2953a-c688-4d09-a0de-
> 8e7f22160823 - ID3D11VideoDecoderOutputView is mentioned, as Matt alludes to
> in comment 12. Matt, Anthony, how do we move on this one, given that we're
> more than half way through dev edition.

This one is even weirder, since video isn't even in the app notes (so hasn't been used).

Is it possible that the symbols here are just wrong and this has nothing to do with video?

We really need to look at what landed before this spike happened.
Flags: needinfo?(matt.woodrow)
Correlations from https://mozilla.github.io/stab-crashes/correlations.html?channel=nightly&signature=std::list%3CT%3E::clear%20|%20CDeviceChild%3CT%3E::~CDeviceChild%3CT%3E are:

> reason = EXCEPTION_ACCESS_VIOLATION_READ (100.00% vs 16.22%)
> platform_pretty_version = Windows 7 (100.00% vs 36.49%)
> ipc_channel_error = null (100.00% vs 46.44%)
> adapter_vendor_id = 0x1002 (57.06% vs 29.12%)
> adapter_vendor_id = 0x8086 (16.22% vs 37.71%)

So Intel (0x8086) gfx cards are under-represented, and AMD (0x1002) gfx cards are over-represented.
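
One way to read these correlation lines is as a ratio of the two percentages (share within this signature divided by share across all crashes). The Python sketch below is only an illustration using the five figures quoted above; the `lift` helper and the variable names are made up for this example and are not part of the correlations tool.

```python
# Figures copied from the correlation output quoted above:
# (attribute, % within this signature, % across all crashes).
rows = [
    ("reason = EXCEPTION_ACCESS_VIOLATION_READ", 100.00, 16.22),
    ("platform_pretty_version = Windows 7", 100.00, 36.49),
    ("ipc_channel_error = null", 100.00, 46.44),
    ("adapter_vendor_id = 0x1002", 57.06, 29.12),
    ("adapter_vendor_id = 0x8086", 16.22, 37.71),
]


def lift(in_signature_pct, overall_pct):
    """Hypothetical helper: how many times more common an attribute is
    in this signature than in crashes overall."""
    return in_signature_pct / overall_pct


# Ratio per attribute, rounded for display.
ratios = {name: round(lift(sig, overall), 2) for name, sig, overall in rows}

for name, ratio in ratios.items():
    label = "over" if ratio > 1 else "under"
    print(f"{name}: {ratio}x ({label}-represented)")
```

A ratio above 1 means the attribute is over-represented in the signature and a ratio below 1 means it is under-represented. The ratio says nothing about causation.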
This is now #1 crash in nightly.
Flags: needinfo?(milan)
(In reply to Nicholas Nethercote [:njn] from comment #17)
> Correlations from
> https://mozilla.github.io/stab-crashes/correlations.
> html?channel=nightly&signature=std::list%3CT%3E::
> clear%20|%20CDeviceChild%3CT%3E::~CDeviceChild%3CT%3E are:
> 
> > reason = EXCEPTION_ACCESS_VIOLATION_READ (100.00% vs 16.22%)
> > platform_pretty_version = Windows 7 (100.00% vs 36.49%)
> > ipc_channel_error = null (100.00% vs 46.44%)
> > adapter_vendor_id = 0x1002 (57.06% vs 29.12%)
> > adapter_vendor_id = 0x8086 (16.22% vs 37.71%)
> 
> So Intel (0x8086) gfx cards are under-represented, and AMD (0x1002) gfx
> cards are over-represented.

If you search all crashes matching std::list<T>::clear and CDevice, the numbers become:
1. 0x8086 	1334 	42.76 %
2. 0x1002 	1278 	40.96 %
3. 0x10de 	508 	16.28 %

The crashes spiked on 8-1.

https://crash-stats.mozilla.com/signature/?date=%3E%3D2016-07-31&proto_signature=~CDevice&signature=std%3A%3Alist%3CT%3E%3A%3Aclear&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#aggregations

By version:
1. 	50.0a2 	1062 	34.04 %
2. 	51.0a1 	1028 	32.95 %
3. 	47.0 	263 	8.43 %
4. 	50.0a1 	193 	6.19 %

So maybe something landed on Nightly and uplifted to aurora.
Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e5859dfe0bcbd40f4e33f4a633f73ea3473a7849&tochange=6608e5864780589b25d5421c3d3673ab30c4c318

If we restrict the search to 50.0a2 then the crashes started to appear on 8-5
Pushlog: https://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=7ecbc0f93ff4b7954185ced028a34f7aaa44992d&tochange=307fe134b473938131cb0f28db52cf371883f663

https://hg.mozilla.org/releases/mozilla-aurora/rev/291a9823eb6a looks suspicious though jesup said it's low risk.
Flags: needinfo?(rjesup)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/291a9823eb6a looks
> suspicious though jesup said it's low risk.


That's extremely low-risk -- and nowhere near drawing code.
Flags: needinfo?(rjesup)
See Also: → 1300887
User reported a similar crash at https://www.reddit.com/r/firefox/comments/51fvq6/facebook_crash/ from bug 1300887
Looks like the user can reliably reproduce but the crash signature is different and the OS version is Windows 10
(In reply to Matt Woodrow (:mattwoodrow) from comment #21)
>
> That gives us this range:
> 
> https://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=2ea3d51ba1bb9f5c3b6921c43ea63f70b4fdf5d2&tochange=e585
> 9dfe0bcbd40f4e33f4a633f73ea3473a7849
>
> I still don't see anything that stands out though.

What about these?

> e5db12322fd3	Nicolas Silva — Bug 1289816 - Simplify CopyableCanvasLayer::UpdateTarget and remove unnecessary copies. r=jnicol
> e46e53dfb22b	Nicolas Silva — Bug 1290081 - Make canvas layer transactions asynchronous. r=sotaro
> c16134a5a20f	Nicolas Silva — Bug 1285271 - Reenable copy-on-write canvas. r=jnicol
> 59db65b2b2c2	Jeff Muizelaar — Bug 1289236. Remove ResizeTransparentWindow. r=dvander
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(jmuizelaar)
(In reply to Nicholas Nethercote [:njn] from comment #23)
> What about these?
> 
> > e5db12322fd3	Nicolas Silva — Bug 1289816 - Simplify CopyableCanvasLayer::UpdateTarget and remove unnecessary copies. r=jnicol
> > e46e53dfb22b	Nicolas Silva — Bug 1290081 - Make canvas layer transactions asynchronous. r=sotaro
> > c16134a5a20f	Nicolas Silva — Bug 1285271 - Reenable copy-on-write canvas. r=jnicol

There seems to be a mix of similar crashes in this bug. A good chunk of the crash reports are in the ImageBridge thread, which is only used for video stuff. These patches affect our canvas code and the affected code paths were disabled on Windows shortly after they landed. If it spiked around the time these three patches landed and went back to the previous crash rate after a few days, then these patches may be involved in the non-ImageBridge portion of the crashes during this short period (but not the crashes still happening today).
Flags: needinfo?(nical.bugzilla)
Bulk move of gfx-noted bugs without priority to P3 for tracking.
Priority: -- → P3
Big spike as the trains changed, probably as new users showed up.

David, some of these are compositor destructor, some are other D2D things getting destroyed - could it be that order of destruction we had in other places?  This is not a shutdown situation, but there seem to be destructors involved.
Flags: needinfo?(milan) → needinfo?(dvander)
std::list<T>::clear | CDeviceChild<T>::~CDeviceChild<T> in Beta:
(99.91% in signature vs 06.31% overall) cpu_arch = amd64
(12.90% in signature vs 80.67% overall) useragent_locale = en-US
(99.81% in signature vs 39.92% overall) platform_version = 6.1.7601 Service Pack 1
(100.0% in signature vs 40.78% overall) reason = EXCEPTION_ACCESS_VIOLATION_READ
(67.61% in signature vs 14.14% overall) adapter_vendor_id = 0x1002
(16.20% in signature vs 64.16% overall) adapter_vendor_id = 0x8086
(99.91% in signature vs 58.69% overall) platform_pretty_version = Windows 7
(37.66% in signature vs 03.62% overall) useragent_locale = de
(40.96% in signature vs 11.94% overall) Addon "Adblock Plus" = true
Looking at the URLs (YouTube, streaming sites, etc.), this might be related to video playing.
Is this interesting?
(99.91% in signature vs 06.31% overall) cpu_arch = amd64
or this:
(99.81% in signature vs 39.92% overall) platform_version = 6.1.7601 Service Pack 1
(that may just say "Win7 only")
(In reply to Marco Castelluccio [:marco] from comment #28)
> Looking at the URLs (YouTube, streaming sites, etc.), this might be related
> to video playing.

Those are certainly the comments as well.
(In reply to Milan Sreckovic [:milan] from comment #26)
> Big spike as the trains changed, probably as new users showed up.
> 
> David, some of these are compositor destructor, some are other D2D things
> getting destroyed - could it be that order of destruction we had in other
> places?  This is not a shutdown situation, but there seem to be destructors
> involved.

From the reports in comment #0 and comment #2 - there's no device reset happening, so this just looks like normal texture cleanup going wrong. Probably either a driver problem or maybe we're failing to synchronize with the compositor in some way (I don't quite know the rules for that). But it looks like Bas and Matt have already come to a similar conclusion.

I doubt this is video related. I looked at a few random crashes in comment #19 and they all looked like content textures, not video. If we see ImageBridge stacks as well, it's probably because ImageBridge is more of a stress on texture allocation/destruction.

On the other hand, maybe freeing textures on two threads at the same time has some kind of internal race. Did bug 1284672 change the occurrence of this crash at all?
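
To make the "two threads freeing at once" hypothesis concrete, here is a minimal Python sketch of the general idea of serialising access to a shared resource list with a lock. It is not Direct3D code and not a patch from this bug; `TextureList`, `free_all` and the integer "textures" are hypothetical stand-ins chosen for illustration.

```python
import threading


class TextureList:
    """Hypothetical stand-in for a shared list of live resources."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def add(self, item):
        with self._lock:
            self._items.append(item)

    def release(self, item):
        # The membership check and the removal happen under one lock, so two
        # threads can never both "free" the same item.
        with self._lock:
            if item in self._items:
                self._items.remove(item)
                return True
            return False

    def __len__(self):
        with self._lock:
            return len(self._items)


def free_all(textures, ids, freed):
    """Try to release every id; record the ones this thread actually freed."""
    for i in ids:
        if textures.release(i):
            freed.append(i)


textures = TextureList()
for i in range(1000):
    textures.add(i)

# Two threads race to free the same 1000 items from opposite ends.
freed_a, freed_b = [], []
t1 = threading.Thread(target=free_all, args=(textures, range(1000), freed_a))
t2 = threading.Thread(target=free_all, args=(textures, range(999, -1, -1), freed_b))
t1.start()
t2.start()
t1.join()
t2.join()

remaining = len(textures)
print(remaining, len(freed_a) + len(freed_b))
```

With the lock in place every item is released exactly once regardless of how the two threads interleave. Whether the driver-internal lists in these crashes lack equivalent protection is exactly the open question here.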
Flags: needinfo?(dvander)
(In reply to David Anderson [:dvander] from comment #31)
> ...
> 
> On the other hand, maybe freeing textures on two threads at the same time
> has some kind of internal race. Did bug 1284672 change the occurrence of
> this crash at all?

It made this worse, while making other things better.
(In reply to Milan Sreckovic [:milan] from comment #32)
> (In reply to David Anderson [:dvander] from comment #31)
> > ...
> > 
> > On the other hand, maybe freeing textures on two threads at the same time
> > has some kind of internal race. Did bug 1284672 change the occurrence of
> > this crash at all?
> 
> It made this worse, while making other things better.

To be sure - things got worse around the time this landed, but we also changed trains around that time, and I don't have a causal relationship established.
It's interesting that a great number of these occur within D2D flushing, but the top of the crash involves a video decoding signature.

Can we set a bit in crash reports for whether video is playing, we're in a composite, or a texture is being freed, to see if that correlates with main-thread texture destruction crashes?
Hrm, I guess we would be able to tell that by looking at other threads, but nothing really stands out.
(In reply to Randell Jesup [:jesup] from comment #29)
> Is this interesting?
> (99.91% in signature vs 06.31% overall) cpu_arch = amd64
> or this:
> (99.81% in signature vs 39.92% overall) platform_version = 6.1.7601 Service
> Pack 1
> (that may just say "Win7 only")

Overall in beta we have 29419 crashes with platform_version = 6.1.7601 Service Pack 1,
of those only 2643 with cpu_arch = amd64. So I think it is interesting.

In other terms:
(99.81% in this signature vs 03.62% overall) platform_version = 6.1.7601 Service Pack 1 && cpu_arch = amd64
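
As a rough cross-check of these numbers, the short Python sketch below recomputes the 64-bit share of Windows 7 SP1 crashes from the two counts above and combines it with the 39.92% overall figure quoted earlier for platform_version. It assumes all of these figures refer to the same Beta crash population, which is not stated explicitly; the variable names are made up for this example.

```python
# Counts quoted above for Beta.
win7_sp1_crashes = 29419
win7_sp1_amd64_crashes = 2643

# Overall share of crashes on 6.1.7601 Service Pack 1, from the earlier
# correlation output (assumed to cover the same population).
win7_sp1_overall_pct = 39.92

# Fraction of Windows 7 SP1 crashes that come from 64-bit builds.
share_64bit_of_win7 = win7_sp1_amd64_crashes / win7_sp1_crashes

# Estimated share of all crashes that are both Win7 SP1 and 64-bit.
estimated_joint_overall_pct = win7_sp1_overall_pct * share_64bit_of_win7

print(f"64-bit share of Win7 SP1 crashes: {share_64bit_of_win7:.2%}")
print(f"estimated joint share overall: {estimated_joint_overall_pct:.2f}%")
```

The estimate lands close to the 03.62% reported for the combined condition, so the two sets of figures look consistent with each other.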
(In reply to David Anderson [:dvander] from comment #34)
> It's interesting that a great number of these occur within D2D flushing, but
> the top of the crash involves a video decoding signature.
> 
> Can we set a bit in crash reports for whether video is playing, we're in a
> composite, or a texture is being freed, to see if that correlates with
> main-thread texture destruction crashes?

Perhaps this is something similar to bug 1272877 comment 6?  Not the same, but perhaps similar in concept.  And something like this could be more likely to crash on a certain driver/OS... but I'm just guessing here.
(In reply to Marco Castelluccio [:marco] from comment #36)
> (In reply to Randell Jesup [:jesup] from comment #29)
> > Is this interesting?
> > (99.91% in signature vs 06.31% overall) cpu_arch = amd64
> > or this:
> > (99.81% in signature vs 39.92% overall) platform_version = 6.1.7601 Service
> > Pack 1
> > (that may just say "Win7 only")
> 
> Overall in beta we have 29419 crashes with platform_version = 6.1.7601
> Service Pack 1,
> of those only 2643 with cpu_arch = amd64. So I think it is interesting.
> 
> In other terms:
> (99.81% in this signature vs 03.62% overall) platform_version = 6.1.7601
> Service Pack 1 && cpu_arch = amd64

Note that cpu_arch=amd64 means that this is only happening for the 64 bit builds of
Firefox (cpu_arch is not the architecture of the CPU, it's the architecture Firefox
was built for).
This is why there's a small number of reports with cpu_arch=amd64 overall: most of
our users don't have Firefox 64 bit.
Other (maybe interesting) data:
(37.70% in signature vs 26.32% overall) "DXVA2D3D9?" in app_notes
(14.23% in signature vs 05.67% overall) "DXVA2D3D9-" in app_notes
(23.54% in signature vs 21.26% overall) "DXVA2D3D9+" in app_notes
(00.00% in signature vs 06.67% overall) "DXVA2D3D11" in app_notes
(96.43% in signature vs 35.56% overall) "D2D1.1+" in app_notes
(03.57% in signature vs 59.72% overall) "D2D1.1-" in app_notes
(78.69% in signature vs 47.07% overall) "D3D11 Layers+" in app_notes
(78.69% in signature vs 48.20% overall) "D3D11 Layers?" in app_notes
(29.04% in signature vs 09.08% overall) domain is www.youtube.com
I want to add that, in my case, these crashes happen when I watch video or have a number of paused videos in the background.
It also crashes interchangeably with other signatures in bug #1294748 with the same STR.



My crash reports:

https://crash-stats.mozilla.com/report/index/3c18fa31-df95-4e8f-b5c8-9f0cc2160927
https://crash-stats.mozilla.com/report/index/56ddf280-e23b-49a2-8996-390dd2160927
https://crash-stats.mozilla.com/report/index/88358883-72ad-4a2e-bf7f-68c912160927
https://crash-stats.mozilla.com/report/index/09e592c1-ae18-4493-99b0-40a3e2160929



Graphics section from about:support

Features
Compositing	Direct3D 11
Asynchronous Pan/Zoom	none
WebGL Renderer	WebGL is currently disabled.
WebGL2 Renderer	(no info)
Hardware H264 Decoding	Yes; Using D3D9 API
Audio Backend	wasapi
Direct2D	true
DirectWrite	true (6.2.9200.17568)
GPU #1
Active	Yes
Description	NVIDIA GeForce GTX 750 Ti
Vendor ID	0x10de
Device ID	0x1380
Driver Version	21.21.13.7290
Driver Date	9-16-2016
Drivers	nvd3dumx,nvwgf2umx,nvwgf2umx nvd3dum,nvwgf2um,nvwgf2um
Subsys ID	36811458
RAM	2048
Diagnostics
AzureCanvasAccelerated	0
AzureCanvasBackend	direct2d 1.1
AzureContentBackend	direct2d 1.1
AzureFallbackCanvasBackend	cairo
Component: Graphics → Audio/Video: Playback
Keywords: regression
See Also: → 1294748
Your crashes seem to fall in the bucket Direct3D11 + DXVA2D3D9.

This crash seems to happen almost exclusively on Windows 7,
which would explain why we aren't trying with DXVA2D3D11.

Does it happen also when you're playing a single video, or
only when you're playing multiple videos?

Does it happen when you close a tab containing a video, when
the video ends or while you're playing it?

Can you reproduce the crash in a clean profile? In a 32 bit
build?
(In reply to Marco Castelluccio [:marco] from comment #41)
> Does it happen also when you're playing a single video, or
> only when you're playing multiple videos?
It can crash in both cases,
more often when I play only one video in one tab in one window,
but it also crashes when I play one video in the background tab and other videos are paused and buffering/downloading for later watching,
and the foreground tab is used for internet browsing.

(In reply to Marco Castelluccio [:marco] from comment #41)
> Does it happen when you close a tab containing a video, when
> the video ends or while you're playing it?
It also can crash in both cases,
but more often when video is still playing and I'm closing this tab.

(In reply to Marco Castelluccio [:marco] from comment #41)
> Can you reproduce the crash in a clean profile?
Yes.

> In a 32 bit build?
No.

I suspect that this has a very high correlation with bug #1254389 and bug #1294748, as in these crash reports the last crashing thread is nearly always msmpeg2vdec.dll@0xe59cd. This could probably be fixed by bug #1287668.
Blocks: support-win64
No longer blocks: tracking_win64
Assignee: bas → nobody
Flags: needinfo?(jmuizelaar)
Not sure how useful this will be, but it can't hurt.
Comment on attachment 8797720 [details]
Bug 1291084: Unconditionally create all devices as threadsafe.

https://reviewboard.mozilla.org/r/83366/#review81882
Attachment #8797720 - Flags: review?(matt.woodrow) → review+
Hi Milan, Bas, given that this is a top crasher, I'd be happy to take a speculative fix on 50.0b5 for this. We gtb Thursday noon PST, so if the patch is ready to uplift that morning that would be great. Thanks!
Flags: needinfo?(milan)
Flags: needinfo?(bas)
(In reply to Ritu Kothari (:ritu) from comment #46)
> Hi Milan, Bas, given that this is a top crasher, I'd be happy to take a
> speculative fix on 50.0b5 for this. We gtb Thursday noon PST, so if the
> patch is ready to uplift that morning that would be great. Thanks!

It will be, but to be clear, this patch is extremely unlikely to make a difference in anything. It's a complete stab in the dark.
Flags: needinfo?(bas)
Pushed by bschouten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/54f0671f3c8b
Unconditionally create all devices as threadsafe. r=mattwoodrow
[Tracking Requested - why for this release]:
Has been spiking lately!
Sorry, had to back this out for failures like https://treeherder.mozilla.org/logviewer.html#?job_id=4528431&repo=autoland
Flags: needinfo?(bas)
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ce4666acd6f5
Backed out changeset 54f0671f3c8b for breaking talos and webgl tests
Pushed by bschouten@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/69e98ce5b005
Unconditionally create all devices as threadsafe. r=mattwoodrow
Tracking 50+ as the volume is high enough to be concerning.
Backed out again for 

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=69e98ce5b005f12dbf0a7a78b4dea69316f07dea
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=37107077&repo=mozilla-inbound

10:08:59     INFO - REFTEST INFO | Application command: C:\slave\test\build\application\firefox\firefox.exe -marionette -profile c:\users\cltbld\appdata\local\temp\tmprd2y8h.mozrunner
10:08:59     INFO - ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to c:\users\cltbld\appdata\local\temp\tmprd2y8h.mozrunner\runreftest_leaks.log
10:09:00     INFO - [2144] WARNING: Failed to load startupcache file correctly, removing!: file c:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/src/startupcache/StartupCache.cpp, line 219
10:09:00     INFO - [2144] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file c:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/src/xpcom/base/nsSystemInfo.cpp, line 116
10:09:00     INFO - [2144] WARNING: CheckLinkStatus called on main thread! No check performed. Assuming link is up, status is unknown.: file c:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/src/netwerk/system/win32/nsNotifyAddrListener.cpp, line 707
10:09:00     INFO - [2144] WARNING: This method is lossy. Use GetCanonicalPath !: file c:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/src/xpcom/io/nsLocalFileWin.cpp, line 3579
10:09:02     INFO - ++DOCSHELL 0F044000 == 1 [pid = 2144] [id = 1]
10:09:02     INFO - ++DOMWINDOW == 1 (0F044400) [pid = 2144] [serial = 1] [outer = 00000000]
10:09:02     INFO - ++DOMWINDOW == 2 (0F045000) [pid = 2144] [serial = 2] [outer = 0F044400]
10:09:02     INFO - [GFX1]: Failed to set device safe for multithreading.
10:09:02     INFO - Assertion failure: [GFX1]: Failed to set device safe for multithreading., at c:\builds\moz2_slave\m-in-w32-d-0000000000000000000\build\src\obj-firefox\dist\include\mozilla/gfx/Logging.h:513
10:09:13     INFO - #01: mozilla::gfx::Log<1,mozilla::gfx::CriticalLogger>::Flush() [obj-firefox/dist/include/mozilla/gfx/Logging.h:280]
10:09:13     INFO - 
10:09:13     INFO - #02: mozilla::gfx::Log<1,mozilla::gfx::CriticalLogger>::~Log<1,mozilla::gfx::CriticalLogger>() [obj-firefox/dist/include/mozilla/gfx/Logging.h:271]
10:09:13     INFO - 
10:09:13     INFO - #03: mozilla::gfx::DeviceManagerDx::CreateCompositorDeviceHelper(mozilla::gfx::FeatureState &,IDXGIAdapter1 *,bool,RefPtr<ID3D11Device> &) [gfx/thebes/DeviceManagerDx.cpp:274]
10:09:13     INFO - 
10:09:13     INFO - #04: mozilla::gfx::DeviceManagerDx::CreateCompositorDevice(mozilla::gfx::FeatureState &) [gfx/thebes/DeviceManagerDx.cpp:321]
10:09:13     INFO - 
10:09:13     INFO - #05: mozilla::gfx::DeviceManagerDx::CreateCompositorDevices() [gfx/thebes/DeviceManagerDx.cpp:126]
10:09:13     INFO - 
10:09:13     INFO - #06: gfxWindowsPlatform::InitAcceleration() [gfx/thebes/gfxWindowsPlatform.cpp:370]
10:09:13     INFO - 
10:09:13     INFO - #07: gfxPlatform::Init() [gfx/thebes/gfxPlatform.cpp:675]
10:09:13     INFO - 
10:09:13     INFO - #08: gfxPlatform::GetPlatform() [gfx/thebes/gfxPlatform.cpp:518]
10:09:13     INFO - 
10:09:13     INFO - #09: NS_InvokeByIndex
10:09:13     INFO - 
10:09:13     INFO - #10: CallMethodHelper::Invoke() [js/xpconnect/src/XPCWrappedNative.cpp:2064]
10:09:13     INFO - 
10:09:13     INFO - #11: XPCWrappedNative::CallMethod(XPCCallContext &,XPCWrappedNative::CallMode) [js/xpconnect/src/XPCWrappedNative.cpp:1350]
10:09:13     INFO - 
10:09:13     INFO - #12: XPC_WN_GetterSetter(JSContext *,unsigned int,JS::Value *) [js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1179]
10:09:13     INFO - 
10:09:13     INFO - #13: js::CallJSNative(JSContext *,bool (*)(JSContext *,unsigned int,JS::Value *),JS::CallArgs const &) [js/src/jscntxtinlines.h:239]
10:09:13     INFO - 
10:09:13     INFO - #14: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:458]
10:09:13     INFO - 
10:09:13     INFO - #15: InternalCall [js/src/vm/Interpreter.cpp:503]
10:09:13     INFO - 
10:09:13     INFO - #16: js::Call(JSContext *,JS::Handle<JS::Value>,JS::Handle<JS::Value>,js::AnyInvokeArgs const &,JS::MutableHandle<JS::Value>) [js/src/vm/Interpreter.cpp:522]
10:09:13     INFO - 
10:09:13     INFO - #17: js::CallGetter(JSContext *,JS::Handle<JS::Value>,JS::Handle<JS::Value>,JS::MutableHandle<JS::Value>) [js/src/vm/Interpreter.cpp:636]
10:09:13     INFO - 
10:09:13     INFO - #18: CallGetter [js/src/vm/NativeObject.cpp:1757]
10:09:13     INFO - 
10:09:13     INFO - #19: GetExistingProperty<1> [js/src/vm/NativeObject.cpp:1805]
10:09:13     INFO - 
10:09:13     INFO - #20: NativeGetPropertyInline<1> [js/src/vm/NativeObject.cpp:2032]
10:09:13     INFO - 
10:09:13     INFO - #21: js::NativeGetProperty(JSContext *,JS::Handle<js::NativeObject *>,JS::Handle<JS::Value>,JS::Handle<jsid>,JS::MutableHandle<JS::Value>) [js/src/vm/NativeObject.cpp:2066]
10:09:13     INFO - 
10:09:13     INFO - #22: js::GetProperty(JSContext *,JS::Handle<JSObject *>,JS::Handle<JSObject *>,JS::Handle<jsid>,JS::MutableHandle<JS::Value>) [js/src/jsobj.h:854]
10:09:13     INFO - 
10:09:13     INFO - #23: js::GetObjectElementOperation [js/src/vm/Interpreter-inl.h:458]
10:09:13     INFO - 
10:09:13     INFO - #24: js::GetElementOperation [js/src/vm/Interpreter-inl.h:563]
10:09:13     INFO - 
10:09:13     INFO - #25: Interpret [js/src/vm/Interpreter.cpp:2760]
10:09:13     INFO - 
10:09:13     INFO - #26: js::RunScript(JSContext *,js::RunState &) [js/src/vm/Interpreter.cpp:404]
10:09:13     INFO - 
10:09:13     INFO - #27: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:476]
10:09:13     INFO - 
10:09:13     INFO - #28: InternalCall [js/src/vm/Interpreter.cpp:503]
10:09:13     INFO - 
10:09:13     INFO - #29: js::Call(JSContext *,JS::Handle<JS::Value>,JS::Handle<JS::Value>,js::AnyInvokeArgs const &,JS::MutableHandle<JS::Value>) [js/src/vm/Interpreter.cpp:522]
10:09:13     INFO - 
10:09:13     INFO - #30: js::CallGetter(JSContext *,JS::Handle<JS::Value>,JS::Handle<JS::Value>,JS::MutableHandle<JS::Value>) [js/src/vm/Interpreter.cpp:636]
10:09:13     INFO - 
10:09:13     INFO - #31: CallGetter [js/src/vm/NativeObject.cpp:1757]
10:09:13     INFO - 
10:09:13     INFO - #32: GetExistingProperty<1> [js/src/vm/NativeObject.cpp:1805]
10:09:13     INFO - 
10:09:13     INFO - #33: NativeGetPropertyInline<1> [js/src/vm/NativeObject.cpp:2032]
10:09:13     INFO - 
10:09:13     INFO - #34: js::NativeGetProperty(JSContext *,JS::Handle<js::NativeObject *>,JS::Handle<JS::Value>,JS::Handle<jsid>,JS::MutableHandle<JS::Value>) [js/src/vm/NativeObject.cpp:2066]
10:09:13     INFO - 
10:09:13     INFO - #35: js::Wrapper::get(JSContext *,JS::Handle<JSObject *>,JS::Handle<JS::Value>,JS::Handle<jsid>,JS::MutableHandle<JS::Value>) [js/src/proxy/Wrapper.cpp:143]
10:09:13     INFO - 
10:09:13     INFO - #36: js::CrossCompartmentWrapper::get(JSContext *,JS::Handle<JSObject *>,JS::Handle<JS::Value>,JS::Handle<jsid>,JS::MutableHandle<JS::Value>) [js/src/proxy/CrossCompartmentWrapper.cpp:209]
10:09:13     INFO - 
10:09:13     INFO - #37: js::Proxy::get(JSContext *,JS::Handle<JSObject *>,JS::Handle<JS::Value>,JS::Handle<jsid>,JS::MutableHandle<JS::Value>) [js/src/proxy/Proxy.cpp:309]
10:09:13     INFO - 
10:09:13     INFO - #38: js::proxy_GetProperty(JSContext *,JS::Handle<JSObject *>,JS::Handle<JS::Value>,JS::Handle<jsid>,JS::MutableHandle<JS::Value>) [js/src/proxy/Proxy.cpp:582]
10:09:13     INFO - 
10:09:13     INFO - #39: js::GetProperty(JSContext *,JS::Handle<JSObject *>,JS::Handle<JS::Value>,js::PropertyName *,JS::MutableHandle<JS::Value>) [js/src/jsobj.h:846]
10:09:13     INFO - 
10:09:13     INFO - #40: js::GetProperty(JSContext *,JS::Handle<JS::Value>,JS::Handle<js::PropertyName *>,JS::MutableHandle<JS::Value>) [js/src/vm/Interpreter.cpp:4250]
10:09:13     INFO - 
10:09:13     INFO - #41: GetPropertyOperation [js/src/vm/Interpreter.cpp:191]
10:09:13     INFO - 
10:09:13     INFO - #42: Interpret [js/src/vm/Interpreter.cpp:2639]
10:09:13     INFO - 
10:09:13     INFO - #43: js::RunScript(JSContext *,js::RunState &) [js/src/vm/Interpreter.cpp:404]
10:09:13     INFO - 
10:09:13     INFO - #44: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:476]
10:09:13     INFO - 
10:09:13     INFO - #45: InternalCall [js/src/vm/Interpreter.cpp:503]
10:09:13     INFO - 
10:09:13     INFO - #46: js::Call(JSContext *,JS::Handle<JS::Value>,JS::Handle<JS::Value>,js::AnyInvokeArgs const &,JS::MutableHandle<JS::Value>) [js/src/vm/Interpreter.cpp:522]
10:09:13     INFO - 
10:09:13     INFO - #47: JS_CallFunctionValue(JSContext *,JS::Handle<JSObject *>,JS::Handle<JS::Value>,JS::HandleValueArray const &,JS::MutableHandle<JS::Value>) [js/src/jsapi.cpp:2766]
10:09:13     INFO - 
10:09:13     INFO - #48: nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS *,unsigned short,XPTMethodDescriptor const *,nsXPTCMiniVariant *) [js/xpconnect/src/XPCWrappedJSClass.cpp:1211]
10:09:13     INFO - 
10:09:13     INFO - #49: nsXPCWrappedJS::CallMethod(unsigned short,XPTMethodDescriptor const *,nsXPTCMiniVariant *) [js/xpconnect/src/XPCWrappedJS.cpp:614]
10:09:13     INFO - 
10:09:13     INFO - #50: PrepareAndDispatch [xpcom/reflect/xptcall/md/win32/xptcstubs.cpp:85]
10:09:13     INFO - 
10:09:13     INFO - #51: SharedStub [xpcom/reflect/xptcall/md/win32/xptcstubs.cpp:113]
10:09:13     INFO - 
10:09:13     INFO - #52: NS_CreateServicesFromCategory(char const *,nsISupports *,char const *,char16_t const *) [xpcom/components/nsCategoryManager.cpp:826]
10:09:13     INFO - 
10:09:13     INFO - #53: nsXREDirProvider::DoStartup() [toolkit/xre/nsXREDirProvider.cpp:1174]
10:09:13     INFO - 
10:09:13     INFO - #54: XREMain::XRE_mainRun() [toolkit/xre/nsAppRunner.cpp:4249]
10:09:13     INFO - 
10:09:13     INFO - #55: XREMain::XRE_main(int,char * * const,nsXREAppData const *) [toolkit/xre/nsAppRunner.cpp:4542]
10:09:13     INFO - 
10:09:13     INFO - #56: XRE_main [toolkit/xre/nsAppRunner.cpp:4633]
10:09:13     INFO - 
10:09:14     INFO - #57: do_main [browser/app/nsBrowserApp.cpp:282]
10:09:14     INFO - 
10:09:14     INFO - #58: NS_internal_main(int,char * *,char * *) [browser/app/nsBrowserApp.cpp:415]
10:09:14     INFO - 
10:09:14     INFO - #59: wmain [toolkit/xre/nsWindowsWMain.cpp:118]
10:09:14     INFO - 
10:09:14     INFO - #60: __scrt_common_main_seh [f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253]
10:09:14     INFO - 
10:09:14     INFO - #61: kernel32.dll + 0x53c45
10:09:14     INFO - 
10:09:14     INFO - #62: ntdll.dll + 0x637f5
10:09:14     INFO - 
10:09:14     INFO - #63: ntdll.dll + 0x637c8
10:09:14     INFO -
Backout by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3f81af1ef8c9
Backed out changeset 69e98ce5b005 for asserting with "[GFX1]: Failed to set device safe for multithreading" in R(R) on Windows 7 VM debug. r=backout on a CLOSED TREE
So I bet this regression was caused by https://hg.mozilla.org/releases/mozilla-beta/rev/8508083db4ac. Before this patch GetD3D11DeviceForCurrentThread would return the ImageBridge device, after this patch GetDeviceForCurrentThread returns the Compositor Device. Matt, David, what do you think?
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(dvander)
Flags: needinfo?(bas)
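A minimal sketch of the behaviour change suspected above, for readers who do not have the patch open. It is an illustration only: the `Device` enum, the three function names and the `aIsMainThread` parameter are hypothetical, and returning the Content device on the main thread is an assumption. The comment itself says only that the ImageBridge device used to be returned and that the Compositor device is returned now.

```cpp
#include <cassert>

// Hypothetical stand-ins for the devices named in the comment; these are not
// the real Gecko types.
enum class Device { Content, ImageBridge, Compositor };

// Behaviour before the suspected patch, as described in the comment: an
// off-main-thread caller gets the ImageBridge device. Returning the Content
// device on the main thread is an assumption made for this sketch.
Device DeviceBeforePatch(bool aIsMainThread) {
  return aIsMainThread ? Device::Content : Device::ImageBridge;
}

// Behaviour after the suspected patch: the same off-main-thread caller now
// gets the Compositor device instead.
Device DeviceAfterPatch(bool aIsMainThread) {
  return aIsMainThread ? Device::Content : Device::Compositor;
}

// True when a caller would see a different device after the patch.
bool PatchChangesDevice(bool aIsMainThread) {
  return DeviceBeforePatch(aIsMainThread) != DeviceAfterPatch(aIsMainThread);
}
```

Under these assumptions only off-main-thread callers are affected: `PatchChangesDevice(false)` is true and `PatchChangesDevice(true)` is false.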
(In reply to Bas Schouten (:bas.schouten) from comment #56)
> So I bet this regression was caused by
> https://hg.mozilla.org/releases/mozilla-beta/rev/8508083db4ac. Before this
> patch GetD3D11DeviceForCurrentThread would return the ImageBridge device,
> after this patch GetDeviceForCurrentThread returns the Compositor Device.
> Matt, David, what do you think?

The first appearance of "std::list<T>::clear | CDeviceChild<T>::~CDeviceChild<T>" is 2016-08-17.
There was a spike on Sep 23, when we released 50.0b.
https://crash-stats.mozilla.com/signature/?date=%3E%3D2016-07-01&signature=std%3A%3Alist%3CT%3E%3A%3Aclear%20%7C%20CDeviceChild%3CT%3E%3A%3A~CDeviceChild%3CT%3E#graphs

The date of that patch is compatible with the other signature, "std::list<T>::clear". For this signature we had a spike between Jul 31 and Aug 1.
https://crash-stats.mozilla.com/signature/?date=%3E%3D2016-07-01&signature=std%3A%3Alist%3CT%3E%3A%3Aclear#graphs

This second signature abruptly disappears on Aug 17, which is when the first signature started, which makes me think the signatures somehow changed. Maybe we added "std::list<T>::clear" to the skiplist?

In 50.0b the volume is much higher than before though ("std::list<T>::clear" was ~200 in August, it was still ~200 when it became "std::list<T>::clear | CDeviceChild<T>::~CDeviceChild<T>", it's been ~600 since we released 50).
(In reply to Marco Castelluccio [:marco] from comment #57)
> This second signature abruptly disappears on Aug 17, which is when the first
> signature started, which makes me think the signatures somehow changed.
> Maybe we added "std::list<T>::clear" to the skiplist?

Yes, I hadn't noticed bug 1295362.

Btw, I thought https://hg.mozilla.org/releases/mozilla-beta/rev/8508083db4ac was an uplift. It isn't, so the spike on Sep 23 can be explained by the move of 50 from Aurora to Beta.
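A rough sketch of how a change to the signature lists could turn "std::list<T>::clear" into "std::list<T>::clear | CDeviceChild<T>::~CDeviceChild<T>" without any change in the crash itself. This is an assumption about the mechanism, not the actual crash-stats code: `BuildSignature` and `aPrefixFrames` are hypothetical names, and the rule used here (keep appending frames while the current frame is on a prefix list) is a simplification.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplification of crash signature generation. Frames are
// ordered from the top of the stack. A frame listed in aPrefixFrames is
// treated as too generic to stand alone, so the next frame is appended too.
std::string BuildSignature(const std::vector<std::string>& aFrames,
                           const std::set<std::string>& aPrefixFrames) {
  std::string signature;
  for (const std::string& frame : aFrames) {
    if (!signature.empty()) {
      signature += " | ";
    }
    signature += frame;
    // Stop at the first frame that is not on the prefix list.
    if (aPrefixFrames.count(frame) == 0) {
      break;
    }
  }
  return signature;
}
```

With the frames "std::list<T>::clear" followed by "CDeviceChild<T>::~CDeviceChild<T>", an empty prefix list yields the first signature, and a prefix list containing "std::list<T>::clear" yields the second. That would match one signature stopping on Aug 17 just as the other starts.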
(In reply to Marco Castelluccio [:marco] from comment #58)
> (In reply to Marco Castelluccio [:marco] from comment #57)
> > This second signature abruptly disappears on Aug 17, which is when the first
> > signature started, which makes me think the signatures somehow changed.
> > Maybe we added "std::list<T>::clear" to the skiplist?
> 
> Yes, I hadn't noticed bug 1295362.
> 
> Btw, I thought https://hg.mozilla.org/releases/mozilla-beta/rev/8508083db4ac
> was an uplift. It isn't, so the spike on Sep 23 can be explained by the move
> of 50 from Aurora to Beta.

It should be noted that the signature in this bug is also 'relatively' unrelated to the actual bug that is causing the increase in driver crashes on NVidia and Intel, as far as I can tell. The signature seems to be a side effect of that bug.
Approval Request Comment
[Feature/regressing bug #]: 1282364
[User impact if declined]: Increase in driver crashes.
[Describe test coverage new/current, TreeHerder]: None, beta-only code.
[Risks and why]: Some, only testable on Beta. Reverts an unintended behavior change which has been riding the trains.
[String/UUID change made/needed]: None
Assignee: nobody → bas
Attachment #8797720 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8798223 - Flags: review?(matt.woodrow)
Attachment #8798223 - Flags: approval-mozilla-beta?
I spoke with Bas on IRC. This signature is not Beta-only, so there might be some other issue at play for Aurora and Nightly.
The signatures from bug 1307543 are Beta-only though, so this patch might help fix those.
Attachment #8798223 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8798223 [details] [diff] [review]
Return the correct device from GetDeviceForCurrentThread()

A top crash fix, Beta50+
Attachment #8798223 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Tracking in 51 & 52. The volume of crashes is concerning and we should probably address those before the release.

Note to another release manager or my future self: we should check whether 50.0b5 still has the crash. If it does, we should probably track it and update the status-firefox50 flag to affected.
The signature is still occurring on 50.0b5: https://crash-stats.mozilla.com/report/index/24d0690b-410a-4f5e-824c-c93402161008
Flags: needinfo?(bas)
(In reply to Sylvestre Ledru [:sylvestre] from comment #64)
> Tracking in 51 & 52. The volume of crashes is concerning and we should
> probably address those before the release.
> 
> Note to another release manager or my future self, we should have a look if
> 50.0b5 still has the crash or not. If we still have the crash, we should
> probably track it and updated the status-firefox50 flag to affected.

As discussed in previous comments, there is a lot of confusion about which crashes are being discussed in different places. This patch should considerably bring down the crash rate on -intel- hardware, which was a subtly different, likely somewhat related, crash of much higher volume.
Flags: needinfo?(bas)
To avoid confusion and further discussion, I've morphed this bug into what
Bas fixed and filed bug 1308863 to track the regression with the
std::list<T>::clear signature.
Blocks: 1307543
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(milan)
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(dvander)
Flags: needinfo?(ajones)
Resolution: --- → FIXED
Summary: Crash in std::list<T>::clear() from D2D1::EndDraw() → Wrong device returned in GetDeviceForCurrentThread
Crash Signature: [@ std::list<T>::clear] [@ std::list<T>::clear | CDeviceChild<T>::~CDeviceChild<T>]