[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is report bp-e6cd12c2-6ead-4a2e-8cb3-94da52150525.
=============================================================

This signature is the #1 in Top Crash Scores in 39.0b1 right now because about half of those crashes happen within the first 60 seconds after startup and overall, this is over 5% of all crashes in this beta.

Graphic Adapter split is as follows:

Intel Corporation Core Processor Integrated Graphics Controller              2794   63.084 %
Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller     1486   33.552 %
Intel Corporation Core Processor Integrated Graphics Controller               118    2.664 %
Intel Corporation 4 Series Chipset Integrated Graphics Controller              30    0.677 %
Intel Corporation 4 Series Chipset Integrated Graphics Controller               1    0.023 %

The crashes seem to be related to video, as some comments point to things like:
"i cannot open youtube"
"crashes before videos begin."
"i can,t wach any vido"
"when i want to see videos in youtube i'm getting this thing"
"i was on rapgodfather playing a video then the pluggin container crash everytime i play a video"
etc.

Bas, I thought we already blocked stuff related to this in bug 600152 and http://hg.mozilla.org/mozilla-central/rev/b07c0925efe5 but we are still seeing a spike in the same signature. Can you take a look?
Sounds like a video issue from the comments. Matt, any idea? I wouldn't read too much into the other similar bug; it's probably just a similar driver issue.
Anthony, given this is a mostly-startup crash that comments say involves video, can you take a look if we need to do something on the media side here?
Better stack:

igd10umd32+0x280a1
d3d11!CResource<ID3D11Texture3D>::CLS::FinalConstruct
d3d11!TCLSWrappers<CTexture2D>::CLSFinalConstructFn
d3d11!CLayeredObjectWithCLS<CTexture2D>::FinalConstruct
d3d11!CLayeredObjectWithCLS<CTexture2D>::CreateInstance
d3d11!CDevice::CreateLayeredChild
d3d11!CBridgeImpl<ID3D11LayeredDevice,ID3D11LayeredDevice,CLayeredObject<CDevice> >::CreateLayeredChild
d3d11!CD3D11LayeredChild<ID3D11DeviceChild,NDXGI::CDevice,64>::FinalConstruct
d3d11!NDXGI::CResource::FinalConstruct
d3d11!NDXGI::CDevice::CreateLayeredChild
d3d11!CBridgeImpl<ID3D11LayeredDevice,ID3D11LayeredDevice,CLayeredObject<NDXGI::CDevice> >::CreateLayeredChild
d3d11!NOutermost::CDeviceChild::FinalConstruct
d3d11!CUseCountedObject<NOutermost::CDeviceChild>::CUseCountedObject<NOutermost::CDeviceChild>
d3d11!CUseCountedObject<NOutermost::CDeviceChild>::CreateInstance
d3d11!NOutermost::CDevice::CreateLayeredChild
d3d11!CDevice::CreateAndRecreateLayeredChild<SD3D11LayeredTexture2DCreationArgs>
d3d11!CDevice::CreateTexture2D_Worker
d3d11!CDevice::OpenSharedResourceInternal_Worker
d3d11!CDevice::OpenSharedResource
xul!mozilla::layers::DXGIYCbCrTextureHostD3D11::OpenSharedHandle
xul!mozilla::layers::DXGIYCbCrTextureHostD3D11::Lock
xul!mozilla::layers::ImageHost::Lock
xul!mozilla::layers::AutoLockCompositableHost::AutoLockCompositableHost
xul!mozilla::layers::ImageHost::Composite
xul!mozilla::layers::ImageLayerComposite::RenderLayer
xul!mozilla::layers::RenderLayers<mozilla::layers::ContainerLayerComposite>
xul!mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>
xul!mozilla::layers::ContainerLayerComposite::RenderLayer
xul!mozilla::layers::RenderLayers<mozilla::layers::ContainerLayerComposite>
xul!mozilla::layers::ContainerRender<mozilla::layers::ContainerLayerComposite>
xul!mozilla::layers::ContainerLayerComposite::RenderLayer
xul!mozilla::layers::LayerManagerComposite::Render
xul!mozilla::layers::LayerManagerComposite::EndTransaction
xul!mozilla::layers::LayerManagerComposite::EndEmptyTransaction
xul!mozilla::layers::CompositorParent::CompositeToTarget
xul!mozilla::layers::CompositorParent::CompositeCallback
xul!RunnableMethod<mozilla::layers::CompositorParent,void (__thiscall mozilla::layers::CompositorParent::*)(mozilla::TimeStamp),Tuple1<mozilla::TimeStamp> >::Run
xul!MessageLoop::DoWork
xul!base::MessagePumpForUI::DoRunLoop
xul!base::MessagePumpWin::RunWithDispatcher
xul!base::MessagePumpWin::Run
xul!MessageLoop::RunHandler
xul!MessageLoop::Run
xul!base::Thread::ThreadMain
ntdll!__RtlUserThreadStart
ntdll!_RtlUserThreadStart
Rank  Adapter driver version  Count  %
1     22.214.171.1246         6755   100.00 %

Rank  Adapter device id  Count  %
1     0x0046             4028   59.63 %
2     0x2a42             2473   36.61 %
3     0x0042             197    2.92 %
4     0x2e12             49     0.73 %
5     0x2e22             8      0.12 %
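For readers who want to check the arithmetic, the percentage column in the device-id breakdown above can be recomputed from the counts. This is only an illustrative sketch: the counts are copied from the comment, while the function name `share_table` is made up here and is not part of Socorro.

```python
# Counts per adapter device id, copied from the breakdown in the comment above.
DEVICE_COUNTS = {
    "0x0046": 4028,
    "0x2a42": 2473,
    "0x0042": 197,
    "0x2e12": 49,
    "0x2e22": 8,
}


def share_table(counts):
    """Return (key, count, percent) rows sorted by descending count.

    Hypothetical helper for illustration; percent is rounded to two decimals.
    """
    total = sum(counts.values())
    rows = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(key, n, round(100.0 * n / total, 2)) for key, n in rows]


if __name__ == "__main__":
    for rank, (key, n, pct) in enumerate(share_table(DEVICE_COUNTS), start=1):
        print(f"{rank} {key} {n} {pct:.2f} %")
```

The five counts sum to 6755, which matches the single driver-version row, so every report in this sample carries the same driver version.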
It may be triggered by video but I don't see any media code on the stack. Does comment 4 help?
This has been identified by "stability" as the top issue in 39, it really should get tracking +.
(In reply to David Major [:dmajor] from comment #7)
> It may be triggered by video but I don't see any media code on the stack.
> Does comment 4 help?

DXGIYCbCrTextureHostD3D11 is essentially media code; it just lives inside layers.
Note that this code was introduced in https://hg.mozilla.org/mozilla-central/rev/d6242b24bc47 as part of a big media landing.
This is turning into a needinfo hog :) Some of the video guys are offline - while we're waiting, Sotaro, can you take a look at the stack in comment 4, and let us know if you have any ideas as to where we should start? I know it isn't a mobile platform, but it is video so you may come up with something we didn't think of.
Given that this is isolated to a single driver version (126.96.36.1996), is there something we could turn off to avoid this codepath?
Liz is running 39. Transferring the n-i to make sure she is aware of this.
The FEATURE_HARDWARE_VIDEO_DECODING feature has been available for separate blocklisting since 36, so if this path is due to that particular feature, yes, we can do it. We can get a quick exploratory patch with that fix and see how it affects the crash results. When is the next beta build?
According to the calendar thing, today is go to build for b3, and Monday for b4.
(In reply to Sylvestre Ledru [:sylvestre] from comment #13) > Liz is running 39. Transferring the n-i to make sure she is aware of this.
Topcrash, startupcrash, tracking for 39+. Milan, I am going to build a little later this morning. If you can get me a patch as soon as possible, I can get it into Beta 3.
If the crash can also happen in driver-independent ways, deleting DXGIYCbCrTextureClientD3D11 before DXGIYCbCrTextureHostD3D11::OpenSharedHandle() is called might cause this problem. DXGIYCbCrTextureClientD3D11 allocates the buffer and sends it to the host side. nical, is that possible?
By the way, only on b2g is the lifetime of ImageLayer's TextureClient ensured by ImageClient::RemoveTextureWithTracker(). https://dxr.mozilla.org/mozilla-central/source/gfx/layers/client/ImageClient.cpp#81
To continue on the blocklisting - looking at the code, I would expect that bug 1151721 has already blocklisted DXVA for these cards. So, Liz, probably not a quick patch at this point, need to sort this out.
Jet, there is some confusion over whether these crashes are on DXVA systems or not. Can you find somebody with video experience to tell us from the crash reports/stacks whether this is going through the DXVA path (in which case we have a blocklist problem, see comment 20), or whether we have a crash without video hardware acceleration, in which case we don't have a quick blocklisting solution?
This is definitely the *non* DXVA case. We can either add a new blacklist type for this (uploading YUV in the content process), or we could just hardcode a check for this specific driver when setting up mD3D11ImageBridgeDevice.
(In reply to Sotaro Ikeda [:sotaro] from comment #18)
> If the crash could happen driver independent ways, delete
> DXGIYCbCrTextureClientD3D11 before
> DXGIYCbCrTextureHostD3D11::OpenSharedHandle() might cause this problem.
> DXGIYCbCrTextureClientD3D11 allocates the buffer and send it to Host side.
>
> nical, is there such possibility?

The KeepUntilFullDeallocation of YCbCrKeepAliveD3D11 in the destructor is supposed to prevent this problem from happening.
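The lifetime concern discussed above (client-side texture released before the host opens the shared handle, versus a keep-alive that defers the release) can be modelled with a small toy. This is a sketch in Python, not the Gecko code: `SharedTexture`, `Registry`, and all method names are hypothetical stand-ins chosen for illustration.

```python
class SharedTexture:
    """Stand-in for a GPU texture that the host side opens by handle."""

    def __init__(self, handle):
        self.handle = handle
        self.released = False


class Registry:
    """Hypothetical handle table visible to both the client and the host."""

    def __init__(self):
        self.textures = {}
        self.keep_alive = []  # textures parked until the host is finished

    def allocate(self, handle):
        texture = SharedTexture(handle)
        self.textures[handle] = texture
        return texture

    def destroy_client(self, texture, keep_until_host_done):
        if keep_until_host_done:
            # Defer the release, as a keep-alive object would.
            self.keep_alive.append(texture)
        else:
            # Immediate release: the ordering the comment worries about.
            texture.released = True

    def host_open(self, handle):
        texture = self.textures[handle]
        if texture.released:
            raise RuntimeError("handle opened after the texture was released")
        return texture

    def host_done(self):
        # The host has finished with the textures; now they can really go away.
        for texture in self.keep_alive:
            texture.released = True
        self.keep_alive.clear()
```

With `keep_until_host_done=True` the host can still open the handle after the client object is gone; with `False` the toy raises, which is the failure mode the keep-alive is meant to rule out.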
Created attachment 8615962 [details] [diff] [review]
some extra error checks

This will probably not fix the issue, but still, let's add a few paranoid error checks around the code that initializes the textures on the client side, if only to rule out that something bad happened there.
Following up on sotaro's idea that the OpenSharedHandle could be called after the destruction of the object itself, perhaps it could be that the device was lost before OpenSharedHandle (rather than the texture being destroyed)? Most D3D calls would just return DXGI_ERROR_DEVICE_REMOVED rather than crash but who knows...
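To make the device-lost case concrete, here is a minimal sketch of checking the result of a call that opens a shared resource. It only covers the situation where the call returns an error code such as DXGI_ERROR_DEVICE_REMOVED; it would not help if the driver crashes inside the call, which is what the stack suggests. The callable `open_shared_resource` and the function `lock_texture` are hypothetical; the `failed` helper mirrors the Win32 FAILED() macro for an HRESULT stored as an unsigned 32-bit integer.

```python
S_OK = 0x00000000
DXGI_ERROR_DEVICE_REMOVED = 0x887A0005  # value from the Windows SDK headers


def failed(hr):
    """Mirror of FAILED(): an HRESULT is a failure when its top bit is set."""
    return (hr & 0x80000000) != 0


def lock_texture(open_shared_resource, handle):
    """Open a shared texture and return it, or None on any failure.

    `open_shared_resource` is a hypothetical callable returning
    (hresult, texture); it stands in for the real D3D11 call.
    """
    hr, texture = open_shared_resource(handle)
    if failed(hr) or texture is None:
        # Caller can skip compositing this layer instead of using a bad texture.
        return None
    return texture
```

For example, `lock_texture(lambda h: (DXGI_ERROR_DEVICE_REMOVED, None), 42)` returns `None` rather than handing a missing texture to the compositor.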
Created attachment 8615977 [details] [diff] [review]
some extra error checks

Sorry, missed a return true in the previous patch.
So this driver version has always been blacklisted for D2D. That means we've never used texture sharing here; likely texture sharing is just bad on this driver. Let's not do it. We might even simply want to key this off of D2D being enabled (i.e. for an initial patch, simply don't use texture sharing if D2D is not used for any reason).
(In reply to Bas Schouten (:bas.schouten) from comment #29)
> So this driver version has always been blacklisted for D2D. That means we've
> never used texture sharing here, likely texture sharing is just bad on this
> driver. Let's not do it. We might even simply want to leach this off of D2D
> enabled (i.e. for an initial patch simply do if D2D is for any reason not
> used don't use texture sharing)

I like this. Let's get a patch for that? Blacklisted D2D means no texture sharing anywhere, for anything. Or more to the point, if we decide that we don't want texture sharing for the compositor, we should decide the same everywhere. Even if this disqualifies some systems that would otherwise work, we really don't need the complexity of one more code path.

We don't have one of the devices above, but we do have another GMA X4500HD; we'll see if we can reproduce the problem with this 188.8.131.526 driver.
I did an initial patch to just block this feature for this specific driver - https://hg.mozilla.org/try/rev/0715be474eeb I prefer Bas' idea though, that sounds simpler and might catch other issues in the future. I'm about to get on a plane, but will write a patch for this on Monday if nobody gets to it before then.
OK, up to you all. Sounds good! I'm aiming to get everything on mozilla-beta by 9am PDT on Monday. Otherwise this may have to wait for Beta 5.
This has a super high crash score on https://crash-analysis.mozilla.com/rkaiser/crash-report-tools/score/?version=39.0b3&limit=30. I don't think we should go another beta release with this unfixed. Any chance to get comment 31 on beta Monday morning? Otherwise we should see if Liz is willing to grant an extension...
Created attachment 8616512 [details] [diff] [review] sharing-crash
I'm aiming to go to build by around 2pm PDT which would mean we need to merge this into beta (and allow 3 or so hours for the tests to run) before 11am. If that helps! If we need to wait longer, we can do that, but it means some delay before QE can test things.
Comment on attachment 8616512 [details] [diff] [review]
sharing-crash

Approval Request Comment
[Feature/regressing bug #]: Bug 1138967
[User impact if declined]: Crashes for users with specific driver
[Describe test coverage new/current, TreeHerder]: None, but very simple change to fall back to existing code path.
[Risks and why]: Low risk.
[String/UUID change made/needed]: None
Comment on attachment 8616512 [details] [diff] [review] sharing-crash Approved for uplift to aurora and beta. After discussion with Ryan we want to put this into beta 4 now but with the previous build prepared as a fallback in case this doesn't pass tests on the merge to beta.
I can reproduce this crash.
I've confirmed that sharing alpha textures is broken in this driver.
There's a test program here: https://github.com/jrmuizel/d3d-tests/blob/master/alpha-texture-sharing.cc The test program crashes on 184.108.40.2066 but not on 220.127.116.112 or 18.104.22.16869
(In reply to Jeff Muizelaar [:jrmuizel] from comment #44)
> There's a test program here:
> https://github.com/jrmuizel/d3d-tests/blob/master/alpha-texture-sharing.cc
>
> The test program crashes on 22.214.171.1246 but not on 126.96.36.1992 or
> 188.8.131.5269

We may be able to use this information to relax the current blocklisting eventually.
Using an R8 texture instead of an A8 texture does not crash.
The crash signature is significantly lower on 39.0b5 with this patch but it is still significant enough to be a concern, esp. as it looks like the remaining crashes are almost all on startup now.
Can you link to some of them? I don't know how to find them in Socorro.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #50)
> Can you link to some of them? I don't know how to find them in Socorro.

https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox:39.0b4&signature=igd10umd32.dll%400x280a1#tab-reports
(In reply to Robert Kaiser (:firstname.lastname@example.org) - on vacation or slow to reply until the end of June from comment #51)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #50)
> > Can you link to some of them? I don't know how to find them in Socorro.
>
> https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox:39.0b4&signature=igd10umd32.dll%400x280a1#tab-reports

How did you get this list?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #52)
> (In reply to Robert Kaiser (:email@example.com) - on vacation or slow to
> reply until the end of June from comment #51)
> > (In reply to Jeff Muizelaar [:jrmuizel] from comment #50)
> > > Can you link to some of them? I don't know how to find them in Socorro.
> >
> > https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox:39.0b4&signature=igd10umd32.dll%400x280a1#tab-reports
>
> How did you get this list?

I went into a top crash report for 39.0b4 on crash-stats (by selecting that from the dropdowns), and clicked on this signature (and then removed the date fields from the URL so that when you go there you'll always get the last 7 days).
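The link described above can also be assembled programmatically. The sketch below rebuilds the same URL shape from a product, version, and signature, leaving out any date fields; the parameter layout is taken only from the link quoted in this thread, and `report_list_url` is a hypothetical helper, not a Socorro API. Whether the site still accepts this URL layout is not assumed.

```python
from urllib.parse import urlencode

BASE_URL = "https://crash-stats.mozilla.com/report/list"


def report_list_url(product, version, signature):
    """Build a report-list URL with no date fields (hypothetical helper)."""
    query = urlencode(
        {
            "product": product,
            "version": f"{product}:{version}",
            "signature": signature,
        },
        safe=":",  # keep the colon in "Firefox:39.0b4" unescaped
    )
    return f"{BASE_URL}?{query}#tab-reports"


if __name__ == "__main__":
    print(report_list_url("Firefox", "39.0b4", "igd10umd32.dll@0x280a1"))
```

Note that the `@` in the signature is percent-encoded as `%40`, which is why the signature appears as `igd10umd32.dll%400x280a1` in the links in this bug.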
Created attachment 8621646 [details] [diff] [review]
Sharing crash

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=558bbb000bea

It appears that not all affected devices on all versions of Windows were actually blacklisted for Direct2D with this driver version, so the existing patch didn't catch all affected machines.

We found a machine in the Toronto office which can reproduce this and tested with a variety of different driver versions. Neither the driver release before (2082) nor the one after (2092) crashes, so it appears it is literally just this version that has the bug.

The new patch just checks for this specific version and disables using the broken functionality.
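The difference between the two approaches in this bug (tying texture sharing to D2D being enabled, versus blocking one exact driver version) can be sketched as follows. This is an illustration, not the landed patch. The version strings are placeholders: only the trailing components 2082 and 2092 come from the comment above, while the `1.2.3` prefix and the `2086` value are assumptions made for the example, and both function names are hypothetical.

```python
def parse_version(text):
    """Turn 'a.b.c.d' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Placeholder for the one driver version known to crash (assumed value).
BROKEN_VERSION = parse_version("1.2.3.2086")


def sharing_allowed_by_d2d(d2d_enabled):
    """First approach: only share textures when D2D is in use."""
    return d2d_enabled


def sharing_allowed_by_version(driver_version):
    """Second approach: block only the exact driver version that crashes."""
    return parse_version(driver_version) != BROKEN_VERSION


if __name__ == "__main__":
    # A machine with the broken driver on which D2D was not blacklisted:
    print(sharing_allowed_by_d2d(True))              # sharing stays on
    print(sharing_allowed_by_version("1.2.3.2086"))  # sharing is disabled
```

The first check lets such a machine through, which is the gap the comment describes; the exact-version check closes it while leaving the neighbouring releases untouched.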
Comment on attachment 8621646 [details] [diff] [review]
Sharing crash

Approval Request Comment
[Feature/regressing bug #]: Bug 1138967
[User impact if declined]: Crashes for users with specific driver
[Describe test coverage new/current, TreeHerder]: None, but very simple change to fall back to existing code path.
[Risks and why]: Low risk.
[String/UUID change made/needed]: None

This is even more targeted than the previous attempt, and we've confirmed locally that only this exact driver version is affected.
Comment on attachment 8621646 [details] [diff] [review] Sharing crash Approved for uplift to aurora and beta. Crash fix.
Still the #3 topcrash for 39.0b5.
I just mixed this up with another bug. This is actually #69 crasher in 39.0b5. So that is a big improvement.
I guess we missed beta 6 as well. These flags should have been reset to alert the uplift queries. 41 is looking good.
No crashes since this landed that I can see.
Looking in crash-stats (https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=28&signature=igd10umd32.dll%400x280a1#tab-reports), I can still see a few crashes with Firefox 39 RC build 4 (20150624153222) [1] and with Firefox 39 RC build 5 (20150626112833) [2], and only one crash with Aurora 40.0a2 (20150619004003) [3]. All crashes happened after the fix landed.

The igd10umd64.dll@0x3045b signature (https://crash-stats.mozilla.com/report/list?range_unit=days&range_value=28&signature=igd10umd64.dll%400x3045b#tab-reports) recorded two crashes after the fix, one on Aurora 40.0a2 (20150621004005) [4] and one on Nightly 41.0a1 (20150621030204) [5].

[1] https://crash-stats.mozilla.com/report/index/bab1d979-92d0-4bfa-a91d-14d1b2150627
    https://crash-stats.mozilla.com/report/index/e6d7ec1f-d7bf-48e7-9d66-cd9132150629
[2] https://crash-stats.mozilla.com/report/index/94066fb6-253c-4f92-b5da-f51b22150628
    https://crash-stats.mozilla.com/report/index/5fbb2eb5-a019-4747-afdb-03c272150628
[3] https://crash-stats.mozilla.com/report/index/ed0e1d9c-4f0f-426d-8db6-b31392150624
[4] https://crash-stats.mozilla.com/report/index/8c921103-abcc-48fe-9222-452352150622
[5] https://crash-stats.mozilla.com/report/index/49dcad9d-1cbd-423c-8384-0dc812150630
I see 909 crashes with Firefox 41.0.2 over the last week in igd10umd32.dll@0x280a1 but I'm not clear if they're what's tracked in this bug report. Milan, does this need to be tracked?
We should track it because it came back, probably just a few days later with bug 1173983? Matt, Jeff, we still have these crashes (e.g., see https://crash-stats.mozilla.com/report/index/7d40dc02-7498-4477-93ae-921e42151103), and it seems that driver is bad, but it passes whatever test we want things to pass. Or am I reading the code wrong? Or is this a different crash, which just happens to have the same top address (heck, could be "assert" for all I know :) )?
We can reproduce this crash locally.
Sorry wrong bug.
[Tracking Requested - why for this release]: nominating to track based on comment 67.
Does anyone know how dmajor got a decent stack trace from the original report? I assume windbg, which I can try out next week (don't have my windows machine with me this week). We reproduced this in the Toronto office previously (though it doesn't appear we noted which machine), so I assume we can do that again. It would be nice to check if the alpha texture sharing check added for this specific driver is actually working. If it is then we might have a more general crash with this driver, so maybe we should blacklist it entirely.
(In reply to Matt Woodrow (:mattwoodrow) from comment #71)
> Does anyone know how dmajor got a decent stack trace from the original
> report? I assume windbg, which I can try out next week (don't have my
> windows machine with me this week).

AFAIK, yes, he loaded the minidump into WinDbg, with the symbol server set to the Mozilla one so it would find those, and that usually produces better stack traces than what the minidump stackwalker can do on the server.
[Tracking Requested - why for this release]: Too late for 42 but we could add that for 43.
Tracking for 43+ since this crash signature is still showing up, with 138 crashes on 43.0b1 in the last week.
I tried reproducing some kind of crash on a 0x2a42 with 184.108.40.2066 and could not reproduce any problems on youtube with DXVA on and with it off. I'm not sure what to do next. How severe is this crash?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #75)
> How severe is this crash?

Here is some data I pulled out of Socorro for Firefox 42:
* 615 crashes in igd10umd32.dll@0x280a1, 0 in igd10umd64.dll@0x3045b, 100% report driver 220.127.116.116
* This crash is #46 for all Fx42 users but #1 for Fx42 users on 18.104.22.1686
* This crash is 49% of the total crash volume for users with Intel 22.214.171.1246
* Users with Intel 126.96.36.1996 account for 0.66% of the total crash volume for all Intel chipset users

My best assessment would be that this certainly isn't our most severe issue but it's likely severe for those hundreds of people using this Intel driver. Could this be worked around with a blocklist?
I've just reproduced this locally. I did it by running video for a long time. I'll try to do it again.
Any luck here? Or should we be wontfixing this for 43?
Let's wontfix for 43.
I saw only about 10 instances of this crash on FF44.0a2 based on 28 days of crash data. If this doesn't get fixed in the next 2-3 weeks, I will be inclined to wontfix for 44.
We're going to try and find out if the number went down because we lost all the users with this configuration, or because something got better.
igd10umd32.dll@0x7f99 shows up as new in the Firefox Release explosiveness report and ranks #48 in 43.0.1 with 512 crashes (0.25%): https://crash-stats.mozilla.com/report/list?signature=igd10umd32.dll%400x7f99

Note the following driver breakdown:
1  188.8.131.522    540  39.82 %
2  184.108.40.2063  377  27.80 %
3  220.127.116.115  171  12.61 %
4  18.104.22.1682   150  11.06 %
5  22.214.171.1244  87   6.42 %
6  126.96.36.1991   28   2.06 %
7  188.8.131.520    1    0.07 %
8  8.652.0.0        1    0.07 %
9  8.672.4.0        1    0.07 %

Is this another manifestation of the issue in this bug report?
Johnny, Jet: This is a crash that has been getting wontfix'd for too many releases. Copied below is the crash data (for the first signature) for 7 days. This is not a top crasher (ranked #39 atm). Would you be able to help prioritize investigation on this one?

Product  Version  Percentage  Number Of Crashes
Firefox  44.0b6   9.06%       163
Firefox  44.0b4   7.11%       128
Firefox  44.0b7   2.83%       51
Firefox  44.0b1   1.06%       19
Firefox  44.0b2   0.56%       10
The "wontfix'd for too many releases" doesn't quite apply - this bug is a bit of a meta bug for a few different problems, which is why the "leave open" is in place - but they are related and there are still some crashes with this signature.
Can we ask someone at Intel about this crash just going by the crash address and driver version?
The conversation with Intel started, but nothing tangible happened yet.
Not much that can be done for Fx44, now a wontfix.
Milan, any further followups with Intel around this issue?
No information from Intel. I'm not holding my breath; these are old(er) drivers and systems. We have been blocklisting these since about 2/9/16 (bug 1207993), although, based on comment 82 and other data, we are not actually blocking the driver version that's in the summary of this bug.
OK, so we're blocking many of the drivers Anthony mentions in comment 82 but not the one in the bug summary (presumably it isn't in high enough volume to warrant blocking at this point).

Blocked:
184.108.40.2061
220.127.116.115
18.104.22.1682
22.214.171.1243
126.96.36.1992
188.8.131.524

Should we try blocking the driver in the summary? Or is that unlikely to be fruitful? If not, then we should stop tracking this and move on. It isn't a startup crash, and while it's in the top 50 for 44.0.2, it's around #48 (https://crash-stats.mozilla.com/report/list?signature=igd10umd32.dll%400x280a1).

Anthony, what do you think? How do you come up with your list of most commonly seen drivers for a signature? Is there an obvious candidate to add to our blocklist?
According to supersearch this shows up exclusively on 184.108.40.2066: https://crash-stats.mozilla.com/search/?product=Firefox&signature=%3Digd10umd32.dll%400x280a1&signature=%3Digd10umd64.dll%400x3045b&_facets=signature&_facets=adapter_driver_version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-adapter_driver_version Firefox 44.0.2 has seen 1327 crashes over the last week which would place it around #46 on Release.
May as well add that one as well: Jorge, could you add this one to the downloadable list?

<gfxBlacklistEntry>
  <os>All</os>
  <vendor>0x8086</vendor>
  <devices>
    <device>0x2a42</device>
    <device>0x2e22</device>
    <device>0x2e12</device>
    <device>0x2e32</device>
    <device>0x0046</device>
  </devices>
  <featureStatus>BLOCKED_DRIVER_VERSION</featureStatus>
  <driverVersion>220.127.116.116</driverVersion>
  <driverVersionComparator>EQUAL</driverVersionComparator>
</gfxBlacklistEntry>
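To show how an entry like the one above is meant to be read, here is a rough sketch that parses a `gfxBlacklistEntry` and decides whether a given vendor, device, and driver version would match it. It is not how Firefox evaluates the blocklist: the function `entry_blocks` is hypothetical, the `<os>` field is ignored, only the EQUAL comparator is modelled, and the driver version `1.2.3.4` is a placeholder rather than the real value.

```python
import xml.etree.ElementTree as ET

# Same shape as the entry in the comment; the driver version is a placeholder.
ENTRY_XML = """
<gfxBlacklistEntry>
  <os>All</os>
  <vendor>0x8086</vendor>
  <devices>
    <device>0x2a42</device>
    <device>0x2e22</device>
    <device>0x2e12</device>
    <device>0x2e32</device>
    <device>0x0046</device>
  </devices>
  <featureStatus>BLOCKED_DRIVER_VERSION</featureStatus>
  <driverVersion>1.2.3.4</driverVersion>
  <driverVersionComparator>EQUAL</driverVersionComparator>
</gfxBlacklistEntry>
"""


def version_tuple(text):
    """Turn 'a.b.c.d' into a tuple of ints for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def entry_blocks(entry_xml, vendor, device, driver_version):
    """Return True if the entry matches this vendor, device, and version."""
    entry = ET.fromstring(entry_xml.strip())
    if entry.findtext("vendor").lower() != vendor.lower():
        return False
    devices = {node.text.lower() for node in entry.iter("device")}
    if device.lower() not in devices:
        return False
    if entry.findtext("driverVersionComparator") != "EQUAL":
        raise NotImplementedError("only EQUAL is modelled in this sketch")
    return version_tuple(entry.findtext("driverVersion")) == version_tuple(driver_version)
```

Because the comparator is EQUAL, a device on the list with any other driver version is left alone, which matches the intent of targeting a single release.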
Switching assignee to Milan since I don't think there's anything actionable for me to do here.
If we see the reduction in crashes, we should just close this bug.
When will we know whether we've done what we can do here?
I think we can call this done. There is a big reduction in crashes after Feb. 10 and then another drop after Feb. 15. There are still a few crashes on beta 10 with this signature and 1 on 45.0b99. But clearly we got most of them. Milan and Sylvestre, your call if you want to close this and declare it fixed for 45.
Closing this works for me.
Untracking for 45.
17 crashes with 45, not too many indeed.