Closed
Bug 1270686
Opened 8 years ago
Closed 8 years ago
Firefox 47 spike in crashes in TppTimerpExecuteCallback, on ATI graphics cards
Categories
(Core :: Audio/Video: Playback, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla49
People
(Reporter: dbaron, Assigned: cpearce)
References
Details
(Keywords: crash, topcrash, Whiteboard: [gfx-noted])
Crash Data
Attachments
(6 files, 4 obsolete files)
84.02 KB,
image/png
|
Details | |
109.92 KB,
image/png
|
Details | |
163.98 KB,
text/plain
|
Details | |
8.02 KB,
patch
|
glandium
:
review+
|
Details | Diff | Splinter Review |
2.04 KB,
patch
|
cpearce
:
review+
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
58 bytes,
text/x-review-board-request
|
ajones
:
review+
Sylvestre
:
approval-mozilla-aurora+
|
Details |
[Tracking Requested - why for this release]: This bug was filed from the Socorro interface and is report bp-9e1546a6-57a7-4743-9fe0-9e3062160429. ============================================================= Both 47.0b1 and 47.0b2 have had significantly more crashes with signature TppTimerpExecuteCallback (see also bug 964351) than other recent beta/release versions. All of these crashes happen on ATI graphics cards (every single one has vendor ID 0x1002), and they have an interesting distribution of install age (which I believe is time from Firefox install to crash) peaking around 40 seconds, with the oldest being around 100 seconds. A decent number of individual users have submitted 2 crash reports, although relatively small numbers have submitted more than that. It seems like it may be the sort of crash that is preventing these users from ever starting Firefox, although I'm not sure (I'd have expected slightly more repeat-submission in that case, although that's just intuition). The dominant OS is Windows 8.1 (at 90.5%), although there are some crashes from Windows 10 (7%) and Windows 8 (2.5%). The top of the distribution of adapter device IDs is: Rank Adapter device id Count % 1 0x9851 182 26.07 % 2 0x9834 97 13.90 % 3 0x9853 66 9.46 % 4 0x9830 54 7.74 % 5 0x9832 47 6.73 % 6 0x9838 32 4.58 % 7 0x983d 19 2.72 % 8 0x9839 15 2.15 % 9 0x9850 15 2.15 % 10 0x990b 14 2.01 % 11 0x9809 12 1.72 % 12 0x9852 12 1.72 %
Flags: needinfo?(milan)
Flags: needinfo?(bas)
Reporter | ||
Comment 1•8 years ago
|
||
FWIW, I noticed this because it's at the #1 on my query for crashes in the first 2 minutes after install: https://crash-stats.mozilla.com/search/?product=Firefox&version=47.0b1&install_age=%3C120&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature (that's for 47.0b1) (though it's #23 on the overall topcrash list).
Reporter | ||
Comment 2•8 years ago
|
||
And this is present on nightly, but not in large enough numbers to get an accurate regression range. In particular, it started here: Build ID Count 20160219030248 1 20160225030209 1 20160301030237 2 20160310030242 1 20160311030233 2 20160312030405 2 20160313030418 1 20160314030215 1 20160315030230 1 20160318030236 1 20160319030558 4 20160321030217 1 20160322030417 1 20160324030447 2 20160402030226 1 20160403030243 1 20160406030221 2 20160411030231 2 20160416030220 1 20160421030302 1 20160424030601 1
Comment 3•8 years ago
|
||
(In reply to David Baron [:dbaron] ⌚️UTC-7 (review requests must explain patch) from comment #1) > FWIW, I noticed this because it's at the #1 on my query for crashes in the > first 2 minutes after install: > > https://crash-stats.mozilla.com/search/?product=Firefox&version=47. > 0b1&install_age=%3C120&_facets=signature&_columns=date&_columns=signature&_co > lumns=product&_columns=version&_columns=build_id&_columns=platform#facet- > signature > (that's for 47.0b1) > > (though it's #23 on the overall topcrash list). I wonder if we're somehow deciding to blocklist these people only after first launch? Indeed I don't see any other explanation behind the low number of repeat submissions.
Flags: needinfo?(bas)
Reporter | ||
Comment 4•8 years ago
|
||
(In reply to Bas Schouten (:bas.schouten) from comment #3) > I wonder if we're somehow deciding to blocklist these people only after > first launch? Indeed I don't see any other explanation behind the low number > of repeat submissions. I don't think I buy that since there are plenty of 2x submissions, and a few 3x, 4x, etc. I was thinking a more likely explanation is that people are perhaps likely to try starting up and crash more than 2 times, but they might actually bother to submit the crash report only the first or second time. This argues for the theory that this is a startup crash that prevents users from using Firefox.
It doesn't look like the crash guard is stopping the repeated crashes, I don't see any of these with GraphicsStartupTest set in the metadata. There is also plenty of these crashes 5+ minutes into the session, rather than on startup. I also wouldn't be surprised if some of the other crashes on these devices were "the same" (e.g., https://crash-stats.mozilla.com/report/index/8d424a9e-d247-4d32-a7b7-d04ee2160429, https://crash-stats.mozilla.com/report/index/9a63e97f-6cda-4ffd-8bc5-95bcd2160429). Anyone knows how to do the correlation on "strange" DLLs being loaded?
Flags: needinfo?(milan)
Reporter | ||
Comment 6•8 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #5) > It doesn't look like the crash guard is stopping the repeated crashes, I > don't see any of these with GraphicsStartupTest set in the metadata. There > is also plenty of these crashes 5+ minutes into the session, rather than on > startup. Er, right. I think I might still have been looking at my filtered query when I concluded that! It is still the largest crash within the first 120 seconds after install. > Anyone knows how to do the correlation on "strange" DLLs being loaded? By reading the correlation reports are in https://crash-analysis.mozilla.com/crash_analysis/ -- but the conclusion I came to right before filing this bug was that this crash was extremely well correlated with ATI drivers, and with a decent (although not huge) spread of versions of the drivers.
Updated•8 years ago
|
Flags: needinfo?(jmuizelaar)
Comment 7•8 years ago
|
||
Will need to check. Is there some correlation on what the users did at that time in these reports ? Is some new build 47 functionality being used?
Comment 8•8 years ago
|
||
(In reply to Paul Blinzer from comment #7) > Will need to check. Is there some correlation on what the users did at that > time in these reports ? Is some new build 47 functionality being used? There are quite a few youtube reports so it could be video related.
Flags: needinfo?(jmuizelaar)
Comment 9•8 years ago
|
||
This crash is a bit weird. It looks like we're calling a TimerCallback that doesn't exist. The best reasoning that I could come up with that would cause this is someone calling FreeLibrary on dll that still has a pending timer.
Comment 12•8 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #5) > Anyone knows how to do the correlation on "strange" DLLs being loaded? these are module correlations for this signature in yesterday's data for 47.0b2: TppTimerpExecuteCallback|EXCEPTION_ACCESS_VIOLATION_EXEC (124 crashes) 100% (124/124) vs. 8% (1857/21929) atiuxpag.dll 100% (124/124) vs. 8% (1857/21929) aticfx32.dll 100% (124/124) vs. 9% (1922/21929) atidxx32.dll 100% (124/124) vs. 9% (1968/21929) msvproc.dll 99% (123/124) vs. 17% (3638/21929) MSAudDecMFT.dll 97% (120/124) vs. 15% (3301/21929) RTWorkQ.dll 100% (124/124) vs. 21% (4576/21929) WINMMBASE.dll 97% (120/124) vs. 18% (3880/21929) kernel.appcore.dll 100% (124/124) vs. 21% (4610/21929) SHCore.dll 100% (124/124) vs. 21% (4611/21929) bcryptPrimitives.dll 100% (124/124) vs. 21% (4611/21929) combase.dll 87% (108/124) vs. 13% (2903/21929) ondemandconnroutehelper.dll 91% (113/124) vs. 25% (5527/21929) winhttp.dll 98% (122/124) vs. 34% (7468/21929) xmllite.dll 99% (123/124) vs. 44% (9698/21929) d2d1.dll 73% (91/124) vs. 22% (4931/21929) wshbth.dll 100% (124/124) vs. 52% (11380/21929) d3d11.dll 100% (124/124) vs. 56% (12267/21929) dxgi.dll 100% (124/124) vs. 67% (14646/21929) msmpeg2vdec.dll 94% (117/124) vs. 62% (13503/21929) explorerframe.dll 94% (117/124) vs. 62% (13630/21929) FWPUCLNT.DLL 100% (124/124) vs. 70% (15290/21929) DWrite.dll 100% (124/124) vs. 70% (15333/21929) mozavcodec.dll 100% (124/124) vs. 70% (15333/21929) mozavutil.dll 100% (124/124) vs. 70% (15414/21929) AudioSes.dll 100% (124/124) vs. 71% (15517/21929) cryptsp.dll 100% (124/124) vs. 71% (15543/21929) bcrypt.dll 99% (123/124) vs. 71% (15538/21929) sspicli.dll 84% (104/124) vs. 56% (12264/21929) duser.dll 84% (104/124) vs. 56% (12267/21929) dui70.dll 100% (124/124) vs. 73% (15968/21929) evr.dll 100% (124/124) vs. 73% (15972/21929) mf.dll 100% (124/124) vs. 73% (16033/21929) dxva2.dll 80% (99/124) vs. 53% (11630/21929) secur32.dll 40% (49/124) vs. 13% (2809/21929) twinapi.dll 100% (124/124) vs. 74% (16246/21929) Wpc.dll 100% (124/124) vs. 75% (16499/21929) pnrpnsp.dll 100% (124/124) vs. 75% (16520/21929) nlaapi.dll 100% (124/124) vs. 75% (16526/21929) NapiNSP.dll 31% (39/124) vs. 7% (1521/21929) Bcp47Langs.dll ...and further faceted by version numbers in the attached file
Comment 13•8 years ago
|
||
Paul does the AMD driver use CreateTimerQueueTimer()/CreateThreadpoolTimer() or FreeLibrary() in any suspicious ways?
Flags: needinfo?(paul.blinzer)
Comment 14•8 years ago
|
||
it uses FreeLibrary() but not really in any suspicious ways. We will need to look into this...
Flags: needinfo?(paul.blinzer)
Reporter | ||
Comment 15•8 years ago
|
||
This is the sort of crash where I think we need to consider not shipping Firefox 47 if we can't fix it, given that it represents about 20% of crashes within 2 minutes of install (likely users who become unable to use Firefox and have to switch to other browsers). Seems like we need to do more to make progress here, including possibly disabling code changes around the regression range.
Flags: needinfo?(milan)
Comment 16•8 years ago
|
||
(In reply to David Baron [:dbaron] ⌚️UTC-7 (review requests must explain patch) from comment #15) > Seems like we need to do more to > make progress here, including possibly disabling code changes around the > regression range. Do we have a regression range? One possible strategy for debugging this is logging the addresses of libraries that have been FreeLibrary()ed. That should hopefully tell us whether it's a library that's going away or a timer being setup for some random code address that never existed.
Comment 17•8 years ago
|
||
We have identified that a thread is getting stuck waiting for another thread to finish an operation, but we don't have a root cause yet based on the crashdump info. Is there a repro scenario observed that has a higher likelihood to lead to the issue?
Flags: needinfo?(dbaron)
Comment 18•8 years ago
|
||
This might do the job. It's sort of spammy but hopefully that's ok for Nightly
Attachment #8754063 -
Flags: review?(aklotz)
Yes, I think we want to understand what's going on before we ship 47 with this. Without knowing if any of this is useful: This is currently hitting us on AMD, but not exclusively. For example: https://crash-stats.mozilla.com/report/index/6412e15e-a779-4397-a4ac-d1a372160512 is the same crash, on Intel, on 45esr. We also had a spike in these two years ago, tracked in bug 964351 (recent comments with this problem already made in there, so this is not news to some.) I don't think we ever resolved why that was the case. This is a Flash crash in 46.0.1, on Intel, slightly different stack: https://crash-stats.mozilla.com/report/index/16c7b99b-efcc-4457-a0e6-ccb152160517
Reporter | ||
Comment 20•8 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #16) > Do we have a regression range? An approximate one; see comment 2. (In reply to Paul Blinzer from comment #17) > We have identified that a thread is getting stuck waiting for another thread > to finish an operation, but we don't have a root cause yet based on the > crashdump info. > Is there a repro scenario observed that has a higher likelihood to lead to > the issue? I don't know of any steps to reproduce the bug. However, for many users it appears to be basically a crash-on-startup. (In reply to Milan Sreckovic [:milan] from comment #19) > This is currently hitting us on AMD, but not exclusively. For example: > https://crash-stats.mozilla.com/report/index/6412e15e-a779-4397-a4ac-d1a372160512 > is the same crash, on Intel, on 45esr. ... > https://crash-stats.mozilla.com/report/index/16c7b99b-efcc-4457-a0e6-ccb152160517 I'd define those to not be "this" or "the same crash". Crash signature is not a perfect classification method; *this* bug is about the spike in crashes on AMD cards.
Flags: needinfo?(dbaron)
Updated•8 years ago
|
Whiteboard: [gfx-noted]
Comment 21•8 years ago
|
||
Comment on attachment 8754063 [details] [diff] [review] Log all dlls loaded Review of attachment 8754063 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/build/WindowsDllBlocklist.cpp @@ +733,5 @@ > printf_stderr("LdrLoadDll: Blocking load of '%s'. XPCOM components must support ASLR.\n", dllName); > return STATUS_DLL_NOT_FOUND; > } > } > + MODULEINFO info; Nit: Please move the declaration of info into the #ifdef NIGHTLY_BUILD block.
Attachment #8754063 -
Flags: review?(aklotz) → review+
Comment 22•8 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #21) > Comment on attachment 8754063 [details] [diff] [review] > Log all dlls loaded > > Review of attachment 8754063 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: mozglue/build/WindowsDllBlocklist.cpp > @@ +733,5 @@ > > printf_stderr("LdrLoadDll: Blocking load of '%s'. XPCOM components must support ASLR.\n", dllName); > > return STATUS_DLL_NOT_FOUND; > > } > > } > > + MODULEINFO info; > > Nit: Please move the declaration of info into the #ifdef NIGHTLY_BUILD block. This may not actually work as I don't think we have access to the crash reporter in mozglue...
Comment 23•8 years ago
|
||
Ah shite, I think you're right! doh...
Comment 24•8 years ago
|
||
This turned out ugly... Be gentle in the review as I just want it to work and then I'll take it out of the tree ASAP. This basically keeps a list of information and then passes it to the callback when the callback is registered.
Attachment #8754063 -
Attachment is obsolete: true
Attachment #8755053 -
Flags: review?(mh+mozilla)
Attachment #8755053 -
Flags: review?(aklotz)
Comment 25•8 years ago
|
||
Comment on attachment 8755053 [details] [diff] [review] Log all dlls loaded v2 Ted, is it going to be a problem putting a list of all the DLLs ever loaded into the AppNotes?
Attachment #8755053 -
Flags: feedback?(ted)
Hi Jeff, we are a week away from entering RC. If there is a potential fix, I hope we can get it uplifted before Thursday, when I gtb 47.0b9. Thanks!
Flags: needinfo?(jmuizelaar)
(In reply to Ritu Kothari (:ritu) from comment #26) > Hi Jeff, we are a week away from entering RC. If there is a potential fix, I > hope we can get it uplifted before Thursday, when I gtb 47.0b9. Thanks! We're still in the dark - the patch above would be nightly only, and to get us some extra information. I don't expect us to get anywhere without that information, and it will probably be Thursday before we start getting reasonable amount of data with the extra information if we land this patch (or something like it), so I don't see this making 47b9. Ritu, can you start the conversation regarding comment 15 - I don't know what it takes to decide and make something a release blocker.
Flags: needinfo?(milan) → needinfo?(rkothari)
Comment 28•8 years ago
|
||
Comment on attachment 8755053 [details] [diff] [review] Log all dlls loaded v2 Review of attachment 8755053 [details] [diff] [review]: ----------------------------------------------------------------- Why do you need this? We already have a list of loaded dlls in crash reports. What do you expect the difference would be with this additional information?
Reporter | ||
Comment 29•8 years ago
|
||
To know the list of DLLs that were loaded but have since been unloaded. I don't think that's in the crash report.
Comment 30•8 years ago
|
||
Comment on attachment 8755053 [details] [diff] [review] Log all dlls loaded v2 Review of attachment 8755053 [details] [diff] [review]: ----------------------------------------------------------------- Ah okay, so a slightly better patch summary would be "Log all DLLs ever loaded"... ::: mozglue/build/WindowsDllBlocklist.cpp @@ +17,4 @@ > #include <windows.h> > #include <winternl.h> > #include <io.h> > +#define PSAPI_VERSION 2 My reading of https://msdn.microsoft.com/en-us/library/windows/desktop/ms683201(v=vs.85).aspx is that this will break running Firefox on Windows XP. @@ +561,5 @@ > + > +extern "C" { > +NS_EXPORT void > +RegisterDllLoadCallback(LoadCallBackFn aCallback) { > + // some weird mangling thing is making it so I can't pass function pointers More about this below. @@ +767,5 @@ > + if (!ret) { > + MODULEINFO moduleInfo; > + if (GetModuleInformation(GetCurrentProcess(), *handle, &moduleInfo, sizeof(moduleInfo))) { > + DllLoadInfo info; > + strcpy(info.name, dllName); Considering the size of those buffers are some constant + 1, even if it's the same constant, you might as well have a static_assert on their size being equal. @@ +773,5 @@ > + info.SizeOfImage = moduleInfo.SizeOfImage; > + if (gLoadInfoCallback) { > + gLoadInfoCallback(info); > + } else { > + gDllLoadInfos.push_back(info); There's potential for thread-safety issues here (as well as with gLoadInfoCallback). ::: mozglue/build/WindowsDllBlocklist.h @@ +43,5 @@ > + LPVOID lpBaseOfDll; > + DWORD SizeOfImage; > +}; > +typedef void (*LoadCallBackFn)(DllLoadInfo&); > +// extern "C" because the demangler was being mean The reason this happens, I presume, is because WindowsDllBlocklist.h is not included from WindowsDllBlocklist.cpp, and you have those definitions twice, and for some reason, they don't match exactly as far as mangling goes. @@ +44,5 @@ > + DWORD SizeOfImage; > +}; > +typedef void (*LoadCallBackFn)(DllLoadInfo&); > +// extern "C" because the demangler was being mean > +extern "C" { extern "C" NS_IMPORT void ... That said, MFBT_API would be better here (I know NS_IMPORT is what is being used, but that's because the file used to be in toolkit/, with a function definition in xpcom/). Now, overall, this is what I'd suggest: - Create a separate patch that: - Includes WindowsDllBlocklist.h from WindowsDllBlocklist.cpp. - Replace NS_EXPORT in WindowsDllBlocklist.cpp with MFBT_API. - Replace NS_IMPORT in WindowsDllBlocklist.h with MFBT_API. - Replace the nscore.h include in WindowsDllBlocklist.h with mozilla/Attributes.h - Remove the extern "C" in this patch, and remove the duplication of the function type and struct declarations. The reason for a separate patch is that it would stay in, whether or not the crash reporter annotation thing stays.
Attachment #8755053 -
Flags: review?(mh+mozilla) → review-
Comment 31•8 years ago
|
||
This address the mangling/headers mess. Re: thread safety we register the callback early enough that it should be before there are any parallel dll loading, and once the callback is called we should be thread safe (AppNotes are)
Attachment #8755053 -
Attachment is obsolete: true
Attachment #8755053 -
Flags: review?(aklotz)
Attachment #8755053 -
Flags: feedback?(ted)
Attachment #8755886 -
Flags: review?(mh+mozilla)
Attachment #8755886 -
Flags: review?(aklotz)
Attachment #8755886 -
Flags: feedback?(ted)
Comment 32•8 years ago
|
||
Comment on attachment 8755886 [details] [diff] [review] Log all dlls ever loaded v3 Review of attachment 8755886 [details] [diff] [review]: ----------------------------------------------------------------- OK but needs some cleanup before landing. ::: mozglue/build/WindowsDllBlocklist.cpp @@ +267,4 @@ > > namespace { > > +typedef NTSTATUS (NTAPI *LdrLoadDll_func) (PWCHAR filePath, PULONG flags, PUNICODE_STRING moduleFileName, HMODULE *handle); I'd prefer that this signature not change for the sake of consistency with NT Native API types. Just cast it to an HMODULE in the GetModuleInformation call. @@ +550,5 @@ > + > +#ifdef NIGHTLY_BUILD > +static std::vector<DllLoadInfo> gDllLoadInfos; > +static LoadCallBackFn gLoadInfoCallback; > + Nit: please remove blank lines @@ +771,3 @@ > } // namespace > > MFBT_API void This should be wrapped in #ifdef NIGHTLY_BUILD ::: mozglue/build/WindowsDllBlocklist.h @@ +34,5 @@ > MOZ_DECL_USE_GUARD_OBJECT_NOTIFIER > }; > > +#ifdef NIGHTLY_BUILD > +#define DLLNAME_MAX 128 Nit: would you mind declaring an enum class for DLLNAME_MAX instead of using #define?
Attachment #8755886 -
Flags: review?(aklotz) → review+
Comment 33•8 years ago
|
||
Comment on attachment 8755886 [details] [diff] [review] Log all dlls ever loaded v3 Review of attachment 8755886 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/build/WindowsDllBlocklist.cpp @@ +17,4 @@ > #include <windows.h> > #include <winternl.h> > #include <io.h> > +#define PSAPI_VERSION 2 See previous review.
Attachment #8755886 -
Flags: review?(mh+mozilla) → review-
Comment 34•8 years ago
|
||
This dynamically checks for K32GetModuleInformation so that it works on XP.
Attachment #8755886 -
Attachment is obsolete: true
Attachment #8755886 -
Flags: feedback?(ted)
Attachment #8756077 -
Flags: review?(mh+mozilla)
(In reply to Milan Sreckovic [:milan] from comment #27) > (In reply to Ritu Kothari (:ritu) from comment #26) > > Hi Jeff, we are a week away from entering RC. If there is a potential fix, I > > hope we can get it uplifted before Thursday, when I gtb 47.0b9. Thanks! > > We're still in the dark - the patch above would be nightly only, and to get > us some extra information. I don't expect us to get anywhere without that > information, and it will probably be Thursday before we start getting > reasonable amount of data with the extra information if we land this patch > (or something like it), so I don't see this making 47b9. > > Ritu, can you start the conversation regarding comment 15 - I don't know > what it takes to decide and make something a release blocker. Hi Milan, I have already set this bug as tracked blocking for Fx47. For a critical issue like this, I would be willing to take a fix or consider backing out some changes (like DBaron suggested) in 47.0b9 (Thursday) or RC1 (Monday). Based on your comment, we might have a speculative fix ready by Thursday. Is that right?
Flags: needinfo?(rkothari) → needinfo?(milan)
Comment 36•8 years ago
|
||
(In reply to Ritu Kothari (:ritu) from comment #35) > > Hi Milan, I have already set this bug as tracked blocking for Fx47. For a > critical issue like this, I would be willing to take a fix or consider > backing out some changes (like DBaron suggested) in 47.0b9 (Thursday) or RC1 > (Monday). Based on your comment, we might have a speculative fix ready by > Thursday. Is that right? Thursday is very optimistic for a fix.
Comment 37•8 years ago
|
||
Comment on attachment 8756077 [details] [diff] [review] Log all dlls ever loaded v4 Review of attachment 8756077 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/build/WindowsDllBlocklist.cpp @@ +17,4 @@ > #include <windows.h> > #include <winternl.h> > #include <io.h> > +#define PSAPI_VERSION 2 From my reading of that MSDN article, you should "just" set this to 1 and add psapi to OS_LIBS in mozglue/build/moz.build, and shouldn't have to care about the K32 prefix or getting the function from the symbol.
Attachment #8756077 -
Flags: review?(mh+mozilla)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #36) > (In reply to Ritu Kothari (:ritu) from comment #35) > > > > Hi Milan, I have already set this bug as tracked blocking for Fx47. For a > > critical issue like this, I would be willing to take a fix or consider > > backing out some changes (like DBaron suggested) in 47.0b9 (Thursday) or RC1 > > (Monday). Based on your comment, we might have a speculative fix ready by > > Thursday. Is that right? > > Thursday is very optimistic for a fix. Right - Thursday may be when we start getting the data that would hopefully help us with a fix. It is very unlikely that we'd actually have a fix by then.
Flags: needinfo?(milan)
Comment 39•8 years ago
|
||
Use a different approach for XP support.
Attachment #8756077 -
Attachment is obsolete: true
Attachment #8756111 -
Flags: review?(mh+mozilla)
Comment 40•8 years ago
|
||
Comment on attachment 8756111 [details] [diff] [review] Log all dlls ever loaded v5 Review of attachment 8756111 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/build/WindowsDllBlocklist.cpp @@ +545,5 @@ > } > return nullptr; > } > > +#define DLLNAME_MAX 128 You're redefining DLLNAME_MAX here. @@ +770,5 @@ > > +#ifdef NIGHTLY_BUILD > +MFBT_API void > +RegisterDllLoadCallback(LoadCallBackFn aCallback) { > + // some weird mangling thing is making it so I can't pass function pointers You can remove this comment. @@ +800,5 @@ > if (GetModuleHandleA("user32.dll")) { > sUser32BeforeBlocklist = true; > } > > + Don't add an extra line here. ::: mozglue/build/WindowsDllBlocklist.h @@ +34,5 @@ > MOZ_DECL_USE_GUARD_OBJECT_NOTIFIER > }; > > +#ifdef NIGHTLY_BUILD > +#define DLLNAME_MAX 128 Move this out of the ifdef.
Attachment #8756111 -
Flags: review?(mh+mozilla) → review+
Comment 43•8 years ago
|
||
sorry had to back out for bustage like https://treeherder.mozilla.org/logviewer.html#?job_id=28637696&repo=mozilla-inbound
Flags: needinfo?(jmuizelaar)
Comment 45•8 years ago
|
||
Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/56b15c2f02c3
Comment 46•8 years ago
|
||
Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/e15b4b4184eb
Comment 47•8 years ago
|
||
sorry had to back this out again for crashes like https://treeherder.mozilla.org/logviewer.html#?job_id=28654519&repo=mozilla-inbound
Comment 48•8 years ago
|
||
Backed out because a later test run reported many crashes caused by this push: https://hg.mozilla.org/integration/mozilla-inbound/rev/3a3dcaf96852 Test run with failures (on Windows 7 opt): https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=5c66fb81be682bf7602a8a7b8655186cbeaed1d2 Failure log (example): https://treeherder.mozilla.org/logviewer.html#?job_id=28700505&repo=mozilla-inbound 10:54:17 INFO - Thread 23 (crashed) 10:54:17 INFO - 0 ucrtbase.dll!_invalid_parameter + 0xdc 10:54:17 INFO - eip = 0x69ad3405 esp = 0x0ba5f7cc ebp = 0x0ba5f7f4 ebx = 0x00000000 10:54:17 INFO - esi = 0x024570e0 edi = 0x00000000 eax = 0x0ba5f19c ecx = 0x00000008 10:54:17 INFO - edx = 0x00000000 efl = 0x00000212 10:54:17 INFO - Found by: given as instruction pointer in context 10:54:17 INFO - 1 ucrtbase.dll!_invalid_parameter_noinfo_noreturn + 0x8 10:54:17 INFO - eip = 0x69ad3448 esp = 0x0ba5f7fc ebp = 0x0ba5f808 10:54:17 INFO - Found by: call frame info 10:54:17 INFO - 2 mozglue.dll!std::vector<DllLoadInfo,std::allocator<DllLoadInfo> >::_Reallocate(unsigned int) [vector:5c66fb81be68 : 1638 + 0x11] 10:54:17 INFO - eip = 0x69bd2242 esp = 0x0ba5f810 ebp = 0x0ba5f808 10:54:17 INFO - Found by: call frame info
Reporter | ||
Comment 49•8 years ago
|
||
Does gDllLoadInfos need to be protected by a lock?
Hi Milan, Jeff, could you please nominate the patch for uplift to Beta? I will gtb 47.0b9 tomorrow ~10:00am PST and I hope we can get a diagnostic patch landed to m-b before that. Thanks for all the hard work on this one!
Flags: needinfo?(milan)
Comment 52•8 years ago
|
||
Backed out because there still intermittent crashes, e.g. in M-e10s(bc5) on Windows 7 opt: https://hg.mozilla.org/integration/mozilla-inbound/rev/ec3ce347ebb2 Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=28758201&repo=mozilla-inbound
Comment 53•8 years ago
|
||
just by circumstantial evidence this TppTimerpExecuteCallback signature looks related to enabling D3D11 DXVA in bug 1248496. from 47.0b7 to 47.0b8 the volume of this signature shrunk by halve - in between some versions of atidxx32.dll have been blacklisted from using D3D11 DXVA (bug 1273691) and the TppTimerpExecuteCallback signature has ceased in beta 8 for exactly those versions: module correlation in beta 7: 100% (158/158) vs. 9% (1890/20570) atidxx32.dll 1% (2/158) vs. 0% (32/20570) 8.17.10.483 0% (0/158) vs. 0% (60/20570) 8.17.10.489 1% (1/158) vs. 0% (19/20570) 8.17.10.494 0% (0/158) vs. 0% (1/20570) 8.17.10.498 1% (2/158) vs. 0% (14/20570) 8.17.10.511 0% (0/158) vs. 0% (1/20570) 8.17.10.514 16% (26/158) vs. 0% (76/20570) 8.17.10.519 1% (1/158) vs. 0% (19/20570) 8.17.10.520 22% (34/158) vs. 1% (195/20570) 8.17.10.525 32% (50/158) vs. 0% (80/20570) 8.17.10.531 14% (22/158) vs. 0% (83/20570) 8.17.10.539 1% (2/158) vs. 0% (61/20570) 8.17.10.545 0% (0/158) vs. 0% (10/20570) 8.17.10.560 2% (3/158) vs. 1% (112/20570) 8.17.10.569 3% (4/158) vs. 0% (79/20570) 8.17.10.581 1% (1/158) vs. 0% (6/20570) 8.17.10.595 0% (0/158) vs. 0% (2/20570) 8.17.10.605 6% (10/158) vs. 2% (315/20570) 8.17.10.625 ...compared to correlation data in beta 8: 100% (56/56) vs. 9% (916/10640) atidxx32.dll 5% (3/56) vs. 0% (20/10640) 8.17.10.483 0% (0/56) vs. 0% (26/10640) 8.17.10.489 2% (1/56) vs. 0% (9/10640) 8.17.10.494 0% (0/56) vs. 0% (3/10640) 8.17.10.511 (blocked from using d3d11 dxva in 47.0b8) 0% (0/56) vs. 0% (1/10640) 8.17.10.514 0% (0/56) vs. 0% (13/10640) 8.17.10.519 (blocked from using d3d11 dxva in 47.0b8) 0% (0/56) vs. 0% (9/10640) 8.17.10.520 0% (0/56) vs. 0% (35/10640) 8.17.10.525 (blocked from using d3d11 dxva in 47.0b8) 71% (40/56) vs. 1% (82/10640) 8.17.10.531 0% (0/56) vs. 0% (14/10640) 8.17.10.539 (blocked from using d3d11 dxva in 47.0b8) 4% (2/56) vs. 0% (15/10640) 8.17.10.545 0% (0/56) vs. 0% (8/10640) 8.17.10.560 4% (2/56) vs. 0% (44/10640) 8.17.10.569 14% (8/56) vs. 0% (43/10640) 8.17.10.581 0% (0/56) vs. 0% (1/10640) 8.17.10.595 0% (0/56) vs. 2% (164/10640) 8.17.10.625 (blocked from using d3d11 dxva in 47.0b8)
Blocks: 1248496
Comment 54•8 years ago
|
||
Yeah, I think that theory is pretty compelling. This suggests that we should disable D3D11 DXVA on AMD in 47.
Flags: needinfo?(jmuizelaar) → needinfo?(matt.woodrow)
Comment 55•8 years ago
|
||
Paul, under what circumstances is atidxx32.dll loaded/unloaded?
Flags: needinfo?(paul.blinzer)
Updated•8 years ago
|
Component: Graphics → Audio/Video: Playback
Comment 56•8 years ago
|
||
It looks atidxx32.dll is still loaded in the crashes. Paul, is there another dll that is loaded/unloaded when using D3D11 for DXVA?
Comment 57•8 years ago
|
||
For your information, there also these "browser_perf-recordings-clear-02.js | Test timed out" permanent failures between the last landing and its backout: https://treeherder.mozilla.org/logviewer.html#?job_id=28748516&repo=mozilla-inbound A retrigger with 20 jobs on the backout didn't let a single job fail, so either this a new intermittent which came, permafailed, and went away, or it's related to this patch.
So - is there work beyond what's in beta 8 with bug 1273691? Or we want to go hard at it and disable all AMD D3D11 DXVA, 47 and higher, using the downloadable list?
Flags: needinfo?(milan)
Comment 59•8 years ago
|
||
by chance atidxx32.dll 8.17.10.531, which seems to be at the root of 70% of the remaining crashes in beta 8, will go into the blacklist in 47.0b9 as well (bug 1275739). not sure how to progress beyond that...
Comment 60•8 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #58) > So - is there work beyond what's in beta 8 with bug 1273691? Or we want to > go hard at it and disable all AMD D3D11 DXVA, 47 and higher, using the > downloadable list? Unless we have information to suggest otherwise, I think we should delay enabling D3D11 DXVA on all AMD hardware until at least FF48. We can probably just do this with a code change.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #60) > (In reply to Milan Sreckovic [:milan] from comment #58) > > So - is there work beyond what's in beta 8 with bug 1273691? Or we want to > > go hard at it and disable all AMD D3D11 DXVA, 47 and higher, using the > > downloadable list? > > Unless we have information to suggest otherwise, I think we should delay > enabling D3D11 DXVA on all AMD hardware until at least FF48. We can probably > just do this with a code change. Hi Milan, Jeff, I just took a patch from Philipp in bug 1275739. Is there a different patch that is needed or is that sufficient? I will gtb 47.0b9 in an hour so appreciate a prompt reply. Thanks!
Flags: needinfo?(milan)
Flags: needinfo?(jmuizelaar)
Reporter | ||
Comment 62•8 years ago
|
||
(In reply to [:philipp] from comment #53) > just by circumstantial evidence this TppTimerpExecuteCallback signature > looks related to enabling D3D11 DXVA in bug 1248496. Note also that the first nightly that https://hg.mozilla.org/mozilla-central/rev/320e4b7ceee4 appeared in was 2016-02-18, which is consistent with the regression range in comment 2 (the first crash was on 2016-02-19).
(In reply to Ritu Kothari (:ritu) from comment #61) > > > > Unless we have information to suggest otherwise, I think we should delay > > enabling D3D11 DXVA on all AMD hardware until at least FF48. We can probably > > just do this with a code change. > > Hi Milan, Jeff, I just took a patch from Philipp in bug 1275739. Is there a > different patch that is needed or is that sufficient? I will gtb 47.0b9 in > an hour so appreciate a prompt reply. Thanks! With an hour deadline, we should probably stick with Philipp's patch. I do like Jeff's suggestion, but we'd want media to comment on it, and that may get us into tomorrow.
Flags: needinfo?(milan) → needinfo?(ajones)
(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #62) > (In reply to [:philipp] from comment #53) > > just by circumstantial evidence this TppTimerpExecuteCallback signature > > looks related to enabling D3D11 DXVA in bug 1248496. > > Note also that the first nightly that > https://hg.mozilla.org/mozilla-central/rev/320e4b7ceee4 appeared in was > 2016-02-18, which is consistent with the regression range in comment 2 (the > first crash was on 2016-02-19). If this is true, we should clearly be backing out "Bug 1248496 - Enable D3D11 DXVA" and disabling D3D11 DXVA for Fx47. We can keep it enabled in Fx48 to investigate, rootcause and fix similar issues. Hi Anthony, Chris, Matt: What would it take for us to backout the patch from bug 1248496? Also, what new risks would this backout be introducing?
Flags: needinfo?(cpearce)
Comment 65•8 years ago
|
||
Flags: needinfo?(matt.woodrow)
Attachment #8757006 -
Flags: review?(ajones)
Attachment #8757006 -
Flags: approval-mozilla-beta?
Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(cpearce)
Assignee | ||
Updated•8 years ago
|
Attachment #8757006 -
Flags: review?(ajones) → review+
Comment on attachment 8757006 [details] [diff] [review] Disable D3D11 DXVA (patch for beta) We've noticed a huge # of crashes and it makes sense to turn this pref off. Beta47+
Attachment #8757006 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Updated•8 years ago
|
Flags: needinfo?(ajones)
Comment 68•8 years ago
|
||
(In reply to Ritu Kothari (:ritu) from comment #67) > Comment on attachment 8757006 [details] [diff] [review] > Disable D3D11 DXVA (patch for beta) > > We've noticed a huge # of crashes and it makes sense to turn this pref off. > Beta47+ seems this landed in https://hg.mozilla.org/releases/mozilla-beta/rev/2ee4473c729a
Updated•8 years ago
|
Priority: -- → P1
Assignee | ||
Comment 69•8 years ago
|
||
We've already blacklisted most of the driver versions that appear in the crash reports so that D3D11 DXVA won't be used. Let's blacklist the remainder, so that we don't see the same thing in 48.
Assignee | ||
Comment 70•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/56476/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/56476/
Attachment #8758089 -
Flags: review?(ajones)
Comment on attachment 8758089 [details] MozReview Request: Bug 1270686 - Blacklist more ATI drivers for D3D11 DXVA2. r?kentuckyfriedtakahe https://reviewboard.mozilla.org/r/56476/#review53040
Attachment #8758089 -
Flags: review?(ajones) → review+
Assignee | ||
Comment 73•8 years ago
|
||
Comment on attachment 8758089 [details] MozReview Request: Bug 1270686 - Blacklist more ATI drivers for D3D11 DXVA2. r?kentuckyfriedtakahe Approval Request Comment [Feature/regressing bug #]: D3D11 DXVA [User impact if declined]: Crashes for some users with ATI graphics cards with obsolete graphics drivers when they try to play video. [Describe test coverage new/current, TreeHerder]: We don't have coverage of busted drivers AFAIK. [Risks and why]: Low; we're reverting these users to the previously working path (D3D9 DXVA) [String/UUID change made/needed]: None.
Attachment #8758089 -
Flags: approval-mozilla-aurora?
Comment 74•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/524bf899365d
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox49:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Updated•8 years ago
|
Assignee: nobody → cpearce
Updated•8 years ago
|
Attachment #8758089 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 75•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/4854209add13
Comment 76•8 years ago
|
||
likely root cause is found and working on a fix
Flags: needinfo?(paul.blinzer)
Comment 77•8 years ago
|
||
The Crimson hotfix driver package 16.6.1 (file version 16.15.2211, available e.g. via http://support.amd.com/en-us/download/desktop?os=Windows+10+-+64, or equivalent for other OS releases) has a fix for the issue included. Please confirm on that driver version and remove the blacklist entry from that version onwards
Flags: needinfo?(dbaron)
Reporter | ||
Updated•8 years ago
|
Flags: needinfo?(dbaron) → needinfo?(cpearce)
Comment 78•8 years ago
|
||
(In reply to Paul Blinzer from comment #77) > The Crimson hotfix driver package 16.6.1 (file version 16.15.2211, available > e.g. via http://support.amd.com/en-us/download/desktop?os=Windows+10+-+64, > or equivalent for other OS releases) has a fix for the issue included. > Please confirm on that driver version and remove the blacklist entry from > that version onwards Is this bug present in all AMD drivers before 16.15.2211? i.e. is there anything we can do to avoid triggering the bug other than blocking D3D11-DXVA on all AMD drivers before 16.15.2211?
Flags: needinfo?(jmuizelaar) → needinfo?(paul.blinzer)
Comment 79•8 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #78) > (In reply to Paul Blinzer from comment #77) > > The Crimson hotfix driver package 16.6.1 (file version 16.15.2211, available > > e.g. via http://support.amd.com/en-us/download/desktop?os=Windows+10+-+64, > > or equivalent for other OS releases) has a fix for the issue included. > > Please confirm on that driver version and remove the blacklist entry from > > that version onwards > > Is this bug present in all AMD drivers before 16.15.2211? i.e. is there > anything we can do to avoid triggering the bug other than blocking > D3D11-DXVA on all AMD drivers before 16.15.2211? Not all AMD drivers, though due to different changes in that code over time it can't be exactly determined when it was introduced. This issue is apparently triggered by allocating/reusing memory resources that are pending release but due to the GPU being very busy, still being held for a relatively long time in the driver, until the OS may trigger a preemption. Reducing the load on the GPU during allocation/reallocation of memory resources should avoid the trigger.
Flags: needinfo?(paul.blinzer)
Flags: needinfo?(cpearce)
Comment hidden (offtopic) |
You need to log in
before you can comment on or make changes to this bug.
Description
•