Closed
Bug 1180561
Opened 9 years ago
Closed 8 years ago
startup crash in [@ msmpeg2vdec.dll@0x25a3fb ]
Categories
(Core :: Graphics, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox39 | --- | unaffected |
firefox40 | --- | unaffected |
firefox41 | --- | wontfix |
firefox42 | --- | wontfix |
firefox43 | --- | wontfix |
firefox44 | --- | wontfix |
firefox45 | --- | wontfix |
firefox46 | --- | wontfix |
firefox47 | --- | wontfix |
firefox48 | --- | wontfix |
firefox49 | --- | wontfix |
firefox50 | --- | wontfix |
firefox51 | --- | wontfix |
firefox52 | --- | wontfix |
firefox-esr52 | --- | wontfix |
firefox53 | --- | wontfix |
firefox54 | --- | wontfix |
firefox55 | --- | fixed |
People
(Reporter: u480271, Assigned: away)
References
(Blocks 2 open bugs)
Details
(4 keywords, Whiteboard: [gfx-noted], tpi:+)
Crash Data
Attachments
(2 files, 3 obsolete files)
1.88 KB,
patch
|
bugzilla
:
review+
|
Details | Diff | Splinter Review |
8.23 KB,
patch
|
bugzilla
:
review+
|
Details | Diff | Splinter Review |
In the process of trying to repro Bug 1180217, I ran into this crash when attempting to access about:support
54 reports so far last month, all on Windows 7 and all with Nightly (starting with 41.0a1). Not nominating to track since the volume is extremely low.
Dan, if you have the time it would be useful to see if this reproduces for you further back than Firefox 41 and if not, try to find a regression window using https://mozilla.github.io/mozregression/install.html
status-firefox39:
--- → unaffected
status-firefox40:
--- → unaffected
status-firefox41:
--- → affected
status-firefox42:
--- → affected
Keywords: regression,
regressionwindow-wanted
Comment 2•9 years ago
|
||
I can pretty much infinite crash loop this. See crash report: https://crash-stats.mozilla.com/report/index/30e25ff7-0aca-4801-bed1-1f26d2150708
STR:
1) Install Gecko Profiler addon
2) Disable e10s
3) Load Nightly and try to go to about:support
4) Hit infinite crash loop.
The problem goes away with the gecko profiler addon. Any idea on what the profiler addon is doing benwa during about:support?
Flags: needinfo?(bgirard)
Comment 3•9 years ago
|
||
I don't know. There's rare 'boggyman' problems about user code that doesn't handle interrupted system calls properly but that's usually a Linux problem.
I'm guessing it's an interaction with msmpeg2vdecand the 'signaling' of the profiler. A regression window is worth doing but it might just lead us to a change that made about:support start loading msmpeg2vdec.
Flags: needinfo?(bgirard)
Thanks Mason, these steps reproduce easily for me. I'll work on a regression window for Benoit.
Keywords: reproducible
mozilla-central:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=caf25344f73e&tochange=1ad65cbeb2f4
mozilla-inbound:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=e3cb99d0589f&tochange=7362533abe5a
Blame:
Bug 1151611 - Expose DXVA status in about:support (Matt Woodrow)
Comment 6•9 years ago
|
||
As suggested in comment 3, this is just the changeset where about:support started loading the msmpeg2vdec dll.
Flags: needinfo?(matt.woodrow)
Comment 7•9 years ago
|
||
But I can also crash forever by loading a youtube video as well instead of about:support. If you replace the STR with a youtube video, you get this crash.
Comment 8•9 years ago
|
||
Can you try to reproduce with the 'stackwalk' option ticked off?
(In reply to Benoit Girard (:BenWa) from comment #8)
> Can you try to reproduce with the 'stackwalk' option ticked off?
The crashes stop for me if Stack Walk is turned off.
Comment 10•9 years ago
|
||
Thanks Anthony.
It must be a conflicting interaction between Stackwalk64 & msmpeg2vdec.dll.
Reporter | ||
Comment 11•9 years ago
|
||
I'll try repro'ing success with StackWalk turned off.
Comment 12•9 years ago
|
||
I did an additional check to see if the crashes occur with Gecko Profiler removed or disabled, the crash does not happen in either case. I also checked to see how many of the reports for add-on correlations:
~40% have no add-ons at all
~60% have some mixture of add-ons, ~50% of which include Gecko Profiler
Based on this I'm not sure Gecko Profiler is a cause, it's just an easy way to trigger this crash.
Comment 13•9 years ago
|
||
I wonder if these other cases are calling StackWalk64 via another mean.
It would be worth checking if the devtools profiler ever triggers this. In general it shouldn't use the 'stackwalk' option but perhaps it's triggered somehow.
Comment 14•9 years ago
|
||
I've been playing around with Devtools for an hour but have not reproduced a crash with Gecko Profiler disabled.
Reporter | ||
Comment 15•9 years ago
|
||
I didn't turn stack walk off, but I just stopped the profiler (via the Gecko Profiler Add-on) and the crash disappeared.
Comment 16•9 years ago
|
||
It seems like a bad idea to let this one slide! The crash is pretty high up there.
Flags: needinfo?(dglastonbury)
Flags: needinfo?(bgirard)
Whiteboard: [gfx-noted]
Comment 17•9 years ago
|
||
This is a nightly only crash and it doesn't seem to be riding the train. This would only affect builds that have profiling enabled.
Likely the best way to fix it would be to use frame pointers on 32-bit. I don't think they are there for 64-bit but I'm not sure.
Flags: needinfo?(bgirard)
Reporter | ||
Comment 18•9 years ago
|
||
I don't know what the solution is. I defer to Benoit.
Flags: needinfo?(dglastonbury)
Comment 19•9 years ago
|
||
Also seeing this on 64bit, e.g. bp-cff87de7-6b62-4ab3-a790-c8be42150818
Comment 21•9 years ago
|
||
https://crash-stats.mozilla.com/report/index/330de4cb-d084-4804-97aa-361262151012
FF44.0a1
Crashing Thread
Frame Module Signature Source
0 xul.dll mozilla::`anonymous namespace'::RunWatchdog(void*) toolkit/components/terminator/nsTerminator.cpp
1 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c
2 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c
3 msvcr120.dll _callthreadstartex f:\dd\vctools\crt\crtw32\startup\threadex.c:376
4 msvcr120.dll _threadstartex f:\dd\vctools\crt\crtw32\startup\threadex.c:354
Ø 5 kernel32.dll kernel32.dll@0x159dc
Ø 6 ntdll.dll ntdll.dll@0x2a630
Ø 7 kernel32.dll kernel32.dll@0x9ba5f
Ø 8 kernel32.dll kernel32.dll@0x9ba5f
status-firefox43:
--- → ?
status-firefox44:
--- → affected
Hardware: x86 → All
Summary: crash in msmpeg2vdec.dll@0x25a3fb → crash in [@ msmpeg2vdec.dll@0x25a3fb ]
Comment 22•9 years ago
|
||
This is now showing up, perhaps unsurprisingly, as a startup crash in Firefox 45.0a1.
status-firefox45:
--- → affected
Summary: crash in [@ msmpeg2vdec.dll@0x25a3fb ] → startup crash in [@ msmpeg2vdec.dll@0x25a3fb ]
Updated•9 years ago
|
Keywords: topcrash-win
Edwin, can you see if there is something you can do?
Flags: needinfo?(edwin)
Assignee: nobody → edwin
Status: NEW → ASSIGNED
Flags: needinfo?(edwin)
The crash here appears to be intentional.
When msmpeg2vdec.dll is first loaded, it installs a symbol resolution callback using RtlInstallFunctionTableCallback. However, all this callback does is unset our exception handler and then crash.
When we walk the stack (when running the profiler, for instance) we can end up trying to walk through this dll. When RtlLookupFunctionEntry is called it invokes the callback in an attempt to map the address to a function entry, which silently crashes.
This patch simply hooks the call to RtlInstallFunctionTableCallback and drops it. There are no genuine uses of this function for any of our library dependencies, AFAICT.
Attachment #8721664 -
Flags: review?(benjamin)
Comment 25•9 years ago
|
||
msmpeg2vdec.dll unsets our exception handler?! Does that mean our crash reporter just doesn't work with any process that has this DLL loaded? Please file that if so.
I'm going to bounce this to aklotz since he's the resident expert about this kind of patching madness.
Updated•9 years ago
|
Attachment #8721664 -
Flags: review?(benjamin) → review?(aklotz)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #25)
> msmpeg2vdec.dll unsets our exception handler?! Does that mean our crash
> reporter just doesn't work with any process that has this DLL loaded? Please
> file that if so.
No, it only unsets the exception handler (...and then throws an exception...) when this callback is invoked, which is when the DLL is on the stack and we try to walk it.
Happily, the crash reporter caught this just fine since it hooks SetUnhandledExceptionFilter at [1]. Made it a pain in the neck to debug, though. :)
[1] https://dxr.mozilla.org/mozilla-central/rev/e1cf617a1f2813b6cd66f460313a61c223406c9b/toolkit/crashreporter/nsExceptionHandler.cpp#1382
Comment 27•9 years ago
|
||
I'd like to know more about what is going on here before evaluating this. Edwin, can you please post more information about what this callback is actually doing?
I'd at least like to see a call stack of the callback clearing our exception handler, as well as maybe a few lines of the assembly listing showing it raising the exception.
Thanks!
Flags: needinfo?(edwin)
Installing the callback on init:
> 0:000> k
> Child-SP RetAddr Call Site
> 00000000`0022cea8 00000000`76e2ba9e ntdll!RtlInstallFunctionTableCallback
> 00000000`0022ceb0 000007fe`eaea8fb6 kernel32!RtlInstallFunctionTableCallbackStub+0x1e
> 00000000`0022cef0 000007fe`eace0d01 msmpeg2vdec!DllCanUnloadNow+0x6ab6e
> 00000000`0022cf80 00000000`77037c78 msmpeg2vdec+0x90d01
> 00000000`0022cfb0 00000000`77037a49 ntdll!LdrpRunInitializeRoutines+0x1fe
> 00000000`0022d180 00000000`77026cf3 ntdll!LdrpLoadDll+0x594
> 00000000`0022d380 000007fe`f647a9b6 ntdll!LdrLoadDll+0xed
> 00000000`0022d3f0 000007fe`fcf692a7 mozglue!double_conversion::BignumDtoa+0x18a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mfbt\double-conversion\bignum-dtoa.cc @ 141]
> 00000000`0022d560 000007fe`feb34e5e KERNELBASE!LoadLibraryExW+0x2b3
> 00000000`0022d5e0 000007fe`feb34dd7 ole32!LoadLibraryWithLogging+0x2e [d:\w7rtm\com\ole32\common\loadfree.cxx @ 157]
> 00000000`0022d630 000007fe`feb34bdc ole32!CClassCache::CDllPathEntry::LoadDll+0x5b [d:\w7rtm\com\ole32\com\objact\dllcache.cxx @ 1927]
> 00000000`0022d670 000007fe`feb31cc1 ole32!CClassCache::CDllPathEntry::Create_rl+0x4c [d:\w7rtm\com\ole32\com\objact\dllcache.cxx @ 1783]
> 00000000`0022d6d0 000007fe`feb31b93 ole32!CClassCache::CClassEntry::CreateDllClassEntry_rl+0xe1 [d:\w7rtm\com\ole32\com\objact\dllcache.cxx @ 886]
> 00000000`0022d990 000007fe`feb32445 ole32!CClassCache::GetClassObjectActivator+0x5ab [d:\w7rtm\com\ole32\com\objact\dllcache.cxx @ 4795]
> 00000000`0022dae0 000007fe`feb32308 ole32!CClassCache::GetClassObject+0x45 [d:\w7rtm\com\ole32\com\objact\dllcache.cxx @ 4580]
> 00000000`0022db50 000007fe`feb316ba ole32!CServerContextActivator::CreateInstance+0x178 [d:\w7rtm\com\ole32\com\objact\actvator.cxx @ 974]
> 00000000`0022dc70 000007fe`feb3194c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x5e [d:\w7rtm\com\ole32\actprops\actprops.cxx @ 1917]
> 00000000`0022dce0 000007fe`feb319be ole32!CApartmentActivator::CreateInstance+0x88 [d:\w7rtm\com\ole32\com\objact\actvator.cxx @ 2251]
> 00000000`0022dd80 000007fe`feb56e63 ole32!CProcessActivator::CCICallback+0x4e [d:\w7rtm\com\ole32\com\objact\actvator.cxx @ 1716]
> 00000000`0022ddc0 000007fe`feb56df7 ole32!CProcessActivator::AttemptActivation+0x33 [d:\w7rtm\com\ole32\com\objact\actvator.cxx @ 1630]
> 00000000`0022de00 000007fe`feb32587 ole32!CProcessActivator::ActivateByContext+0x7f [d:\w7rtm\com\ole32\com\objact\actvator.cxx @ 1488]
> 00000000`0022de90 000007fe`feb316ba ole32!CProcessActivator::CreateInstance+0x77 [d:\w7rtm\com\ole32\com\objact\actvator.cxx @ 1389]
> 00000000`0022ded0 000007fe`feb317c7 ole32!ActivationPropertiesIn::DelegateCreateInstance+0x5e [d:\w7rtm\com\ole32\actprops\actprops.cxx @ 1917]
> 00000000`0022df40 000007fe`feb316ba ole32!CClientContextActivator::CreateInstance+0xeb [d:\w7rtm\com\ole32\com\objact\actvator.cxx @ 685]
> 00000000`0022e1f0 000007fe`feb320b0 ole32!ActivationPropertiesIn::DelegateCreateInstance+0x5e [d:\w7rtm\com\ole32\actprops\actprops.cxx @ 1917]
> 00000000`0022e260 000007fe`feb475eb ole32!ICoCreateInstanceEx+0x5f5 [d:\w7rtm\com\ole32\com\objact\objact.cxx @ 1334]
> 00000000`0022edb0 000007fe`e424c4e6 ole32!CoCreateInstance+0x17b [d:\w7rtm\com\ole32\com\objact\actapi.cxx @ 108]
> 00000000`0022eea0 000007fe`e424c633 xul!mozilla::MFTDecoder::Create+0x3e [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\media\platforms\wmf\mftdecoder.cpp @ 42]
> 00000000`0022eee0 000007fe`e424cab7 xul!mozilla::CanCreateMFTDecoder+0x6b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\media\platforms\wmf\wmfdecodermodule.cpp @ 148]
> 00000000`0022ef10 000007fe`e424d271 xul!mozilla::CanCreateWMFDecoder<CLSID_CMSH264DecoderMFT>+0x93 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\media\platforms\wmf\wmfdecodermodule.cpp @ 161]
> 00000000`0022ef40 000007fe`e424d5c6 xul!mozilla::WMFDecoderModule::SupportsMimeType+0x59 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\media\platforms\wmf\wmfdecodermodule.cpp @ 188]
> 00000000`0022ef70 000007fe`e424d40e xul!mozilla::HasGMPFor+0x14e [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\media\platforms\agnostic\gmp\gmpdecodermodule.cpp @ 112]
> 00000000`0022f010 000007fe`e433eb5d xul!mozilla::GMPDecoderModule::UpdateUsableCodecs+0x112 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\media\platforms\agnostic\gmp\gmpdecodermodule.cpp @ 157]
> 00000000`0022f0b0 000007fe`e40e00e4 xul!nsRunnableFunction<<lambda_2fa968b940fba08a9b1a2f7f0244fc0e> >::Run+0x9 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\obj-firefox\dist\include\nsthreadutils.h @ 262]
> 00000000`0022f0e0 000007fe`e40deaa6 xul!nsThread::ProcessNextEvent+0x380 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\xpcom\threads\nsthread.cpp @ 1024]
> 00000000`0022f260 000007fe`e3eba257 xul!mozilla::ipc::MessagePump::Run+0x7a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\ipc\glue\messagepump.cpp @ 95]
> 00000000`0022f2c0 000007fe`e3eb9952 xul!MessageLoop::RunHandler+0x17 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\ipc\chromium\src\base\message_loop.cc @ 228]
> 00000000`0022f2f0 000007fe`e40df808 xul!MessageLoop::Run+0x3e [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\ipc\chromium\src\base\message_loop.cc @ 202]
> 00000000`0022f340 000007fe`e40df3df xul!nsBaseAppShell::Run+0x3c [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\widget\nsbaseappshell.cpp @ 158]
> 00000000`0022f370 000007fe`e3e2ffeb xul!nsAppShell::Run+0x27 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\widget\windows\nsappshell.cpp @ 259]
> 00000000`0022f3a0 000007fe`e425b304 xul!nsAppStartup::Run+0x27 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\toolkit\components\startup\nsappstartup.cpp @ 282]
> 00000000`0022f3d0 000007fe`e425b4f6 xul!XREMain::XRE_mainRun+0x5d0 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\toolkit\xre\nsapprunner.cpp @ 4284]
> 00000000`0022f690 000007fe`e421d3bb xul!XREMain::XRE_main+0x14e [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\toolkit\xre\nsapprunner.cpp @ 4380]
> 00000000`0022f6f0 00000001`3fc917f0 xul!XRE_main+0x6b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\toolkit\xre\nsapprunner.cpp @ 4483]
> 00000000`0022f8d0 00000001`3fc913e1 firefox!do_main+0x160 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\browser\app\nsbrowserapp.cpp @ 220]
> 00000000`0022fb00 00000001`3fc91240 firefox!NS_internal_main+0x111 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\browser\app\nsbrowserapp.cpp @ 362]
> 00000000`0022fbd0 00000001`3fc92758 firefox!wmain+0x190 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\toolkit\xre\nswindowswmain.cpp @ 138]
> 00000000`0022fc30 00000000`76df59ed firefox!__tmainCRTStartup+0x144 [f:\dd\vctools\crt\crtw32\startup\crt0.c @ 255]
> 00000000`0022fc70 00000000`7702b371 kernel32!BaseThreadInitThunk+0xd
> 00000000`0022fca0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
With callback = arg 4 = r9 = 0x000007feeaeaa3d4:
> 0:000> .frame /r
> 00 00000000`0022cea8 00000000`76e2ba9e ntdll!RtlInstallFunctionTableCallback
> rax=0000000000000000 rbx=000007feeac50000 rcx=000007ff3d000003
> rdx=000007ff3d000000 rsi=000007ff3d000000 rdi=0000000000000000
> rip=00000000770023f0 rsp=000000000022cea8 rbp=0000000000000000
> r8=000000000000a000 r9=000007feeaeaa3d4 r10=0000000000000000
> r11=0000000000000246 r12=0000000000000001 r13=000000000000a000
> r14=0000000000000001 r15=0000000000000000
> iopl=0 nv up ei pl nz na pe nc
> cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
> ntdll!RtlInstallFunctionTableCallback:
> 00000000`770023f0 4c8bdc mov r11,rsp
Which is in msmpeg2vdec.dll:
> 0:000> lm
> start end module name
> [...]
> 00000000`76de0000 00000000`76eff000 kernel32 (pdb symbols) c:\symbols\kernel32.pdb\25851C45C74242DB981B353136F10DC82\kernel32.pdb
> [...]
> 000007fe`eac50000 000007fe`eaef9000 msmpeg2vdec (export symbols) C:\Windows\System32\msmpeg2vdec.dll
> 000007fe`eb310000 000007fe`eb453000 msmpeg2adec (export symbols) C:\Windows\System32\msmpeg2adec.dll
> [...]
Disassembled (note: DllCanUnloadNow is a red herring -- this DLL has very few exported symbols):
> 000007fe`eaeaa3d4 4881ecb8050000 sub rsp,5B8h
> 000007fe`eaeaa3db 488b0566bf0200 mov rax,qword ptr [msmpeg2vdec!DllCanUnloadNow+0x97f00 (000007fe`eaed6348)]
> 000007fe`eaeaa3e2 4833c4 xor rax,rsp
> 000007fe`eaeaa3e5 48898424a0050000 mov qword ptr [rsp+5A0h],rax
> 000007fe`eaeaa3ed 488d8c24d0000000 lea rcx,[rsp+0D0h]
> 000007fe`eaeaa3f5 ff15ed6e0400 call qword ptr [msmpeg2vdec!DllCanUnloadNow+0xb2ea0 (000007fe`eaef12e8)]
> = 0x0000000076e2bb50
> > 00000000`76e2bb50 48ff25e9040500 jmp qword ptr [kernel32!_imp_RtlCaptureContext (00000000`76e7c040)]
> 000007fe`eaeaa3fb 41b894000000 mov r8d,94h
> 000007fe`eaeaa401 488d4c2434 lea rcx,[rsp+34h]
> 000007fe`eaeaa406 33d2 xor edx,edx
> 000007fe`eaeaa408 e81bb8ddff call msmpeg2vdec+0x35c28 (000007fe`eac85c28)
> = 000007fe`eac85c28 ff25e2b42600 jmp qword ptr [msmpeg2vdec!DllCanUnloadNow+0xb2cc8 (000007fe`eaef1110)]
> = 0x000007fefd521000
> > msvcrt!memset
> 000007fe`eaeaa40d 33c9 xor ecx,ecx
> 000007fe`eaeaa40f 488d442430 lea rax,[rsp+30h]
> 000007fe`eaeaa414 4c8b9c24b8050000 mov r11,qword ptr [rsp+5B8h]
> 000007fe`eaeaa41c 4889442420 mov qword ptr [rsp+20h],rax
> 000007fe`eaeaa421 488d8424d0000000 lea rax,[rsp+0D0h]
> 000007fe`eaeaa429 4c895c2440 mov qword ptr [rsp+40h],r11
> 000007fe`eaeaa42e c744243015000040 mov dword ptr [rsp+30h],40000015h
> 000007fe`eaeaa436 4889442428 mov qword ptr [rsp+28h],rax
> 000007fe`eaeaa43b ff15bf6e0400 call qword ptr [msmpeg2vdec!DllCanUnloadNow+0xb2eb8 (000007fe`eaef1300)]
> = 0x0000000076df9040
> > kernel32!SetUnhandledExceptionFilter
> 000007fe`eaeaa441 488d4c2420 lea rcx,[rsp+20h]
> 000007fe`eaeaa446 ff15dc6f0400 call qword ptr [msmpeg2vdec!DllCanUnloadNow+0xb2fe0 (000007fe`eaef1428)]
> = 0x0000000076e7b890
> kernel32!UnhandledExceptionFilter
> 000007fe`eaeaa44c ff15b66e0400 call qword ptr [msmpeg2vdec!DllCanUnloadNow+0xb2ec0 (000007fe`eaef1308)]
> = 0x0000000076df51b0
> > kernel32!GetCurrentProcessStub:
> > 00000000`76df51b0 eb06 jmp kernel32!GetCurrentProcess (00000000`76df51b8)
> 000007fe`eaeaa452 488bc8 mov rcx,rax
> 000007fe`eaeaa455 ba03000000 mov edx,3
> 000007fe`eaeaa45a ff15b06e0400 call qword ptr [msmpeg2vdec!DllCanUnloadNow+0xb2ec8 (000007fe`eaef1310)]
> = 0x0000000076e2c100
> > kernel32!TerminateProcessStub:
> > 00000000`76e2c100 e9734cfeff jmp kernel32!TerminateProcess (00000000`76e10d78)
> 000007fe`eaeaa460 cc int 3
Aaaaand boom (except less of a boom and more of a whimper because the debugger fails to catch it):
> 0:000> bu 000007feeaeaa3d4
>
> Breakpoint 3 hit
> msmpeg2vdec!DllCanUnloadNow+0x6bf8c:
> 000007fe`eaeaa3d4 4881ecb8050000 sub rsp,5B8h
> 0:062> k
> Child-SP RetAddr Call Site
> 00000000`263e7128 00000000`7707a2e9 msmpeg2vdec!DllCanUnloadNow+0x6bf8c
> 00000000`263e7130 00000000`770607dc ntdll!RtlpLookupDynamicFunctionEntry+0xb9
> 00000000`263e7160 000007fe`f64b2e33 ntdll!RtlLookupFunctionEntry+0xa3
> 00000000`263e7190 000007fe`f64b2918 mozglue!WalkStackMain64+0x1ab [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 413]
> 00000000`263e76e0 000007fe`e26548ab mozglue!MozStackWalk+0x184 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 591]
> 00000000`263eb800 000007fe`e265165e xul!GeckoSampler::doNativeBacktrace+0x9b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\geckosampler.cpp @ 980]
> 00000000`263ef6f0 000007fe`e264d6c6 xul!GeckoSampler::InplaceTick+0x8a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\geckosampler.cpp @ 1198]
> 00000000`263ef790 000007fe`e264d504 xul!SamplerThread::SampleContext+0x14a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform-win32.cc @ 211]
> 00000000`263efd50 000007fe`e264d73e xul!SamplerThread::Run+0xd4 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform-win32.cc @ 147]
> 00000000`263efda0 000007fe`f0834f7f xul!ThreadEntry+0xa [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform-win32.cc @ 256]
> 00000000`263efdd0 000007fe`f0835126 MSVCR120!_callthreadstartex+0x17 [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376]
> 00000000`263efe00 00000000`76df59ed MSVCR120!_threadstartex+0x102 [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354]
> 00000000`263efe30 00000000`7702b371 kernel32!BaseThreadInitThunk+0xd
> 00000000`263efe60 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
Flags: needinfo?(edwin)
Comment 29•9 years ago
|
||
Comment on attachment 8721664 [details] [diff] [review]
A patch that shouldn't really have to exist
Review of attachment 8721664 [details] [diff] [review]:
-----------------------------------------------------------------
We definitely need to do something here, but I'm not super keen on taking this as-is. In particular, I am hesitant to drop all calls to this function -- I've been burned too many times by injected 3rd-party code that becomes messed up stuff like this.
I'd rather that we examine the address of callback, and if it falls within msmpeg2vdec's .text section, we block it. Otherwise we let it through.
Do you want to implement the "check if msmpeg2vdec is present and get its .text section bounds" code? If not, I would be happy to do so -- I'm very familiar with the PE format.
Attachment #8721664 -
Flags: review?(aklotz) → review-
Comment 30•9 years ago
|
||
This is still common on Nightly. E.g. for Nightly 20160405030214 there are currently 4 crashes, making it the #5 cause of crashes.
Comment 32•9 years ago
|
||
Any progress on this, Edwin?
Comment 33•9 years ago
|
||
FYI, this is currently the #11 topcrash in Nightly with 34 crashes reported last week (1.07% of overall crash volume). It still seems to be mostly isolated to Nightly versions.
status-firefox46:
--- → affected
status-firefox47:
--- → affected
status-firefox48:
--- → affected
status-firefox49:
--- → affected
(In reply to Aaron Klotz [:aklotz] from comment #29)
> Do you want to implement the "check if msmpeg2vdec is present and get its
> .text section bounds" code? If not, I would be happy to do so -- I'm very
> familiar with the PE format.
Hey Aaron, if you have time I might take you up on this. I'm pretty out of my depth when it comes to Windows internals.
Flags: needinfo?(edwin) → needinfo?(aklotz)
Comment 35•9 years ago
|
||
Super busy but I'll take it.
Assignee: edwin → aklotz
Flags: needinfo?(aklotz)
Updated•9 years ago
|
Updated•8 years ago
|
Whiteboard: [gfx-noted] → [gfx-noted], tpi:?
Comment 36•8 years ago
|
||
Aaron, any news about this bug? This is a startup crash. Thanks
status-firefox50:
--- → affected
Flags: needinfo?(aklotz)
Comment 37•8 years ago
|
||
59 crashes over seven days, this doesn't look very serious. As such it doesn't take precedence over the current e10s work aklotz is on.
Updated•8 years ago
|
Comment 38•8 years ago
|
||
At present volume, #29 @ 0.67% w/29 reports in Firefox 50.0a1, this no longer qualifies as a topcrash.
Keywords: topcrash-win
Comment 39•8 years ago
|
||
Still happening in 51, but it's not super high volume. Removing stale needinfo; I'm sure Aaron will get to this in due course as per the priority of things on his plate.
status-firefox51:
--- → affected
Flags: needinfo?(aklotz)
Updated•8 years ago
|
Assignee: aklotz → nobody
Status: ASSIGNED → NEW
Priority: -- → P2
Whiteboard: [gfx-noted], tpi:? → [gfx-noted], tpi:+
Updated•8 years ago
|
Updated•8 years ago
|
Assignee | ||
Comment 40•8 years ago
|
||
Comment on attachment 8721664 [details] [diff] [review]
A patch that shouldn't really have to exist
I don't think it's safe to drop msmpeg2vdec's callback. That callback is their way of saying "if there's an exception in these functions, don't bother unwinding, let's just crash." We very nearly took the same approach for our JIT in bug 844196 comment 86. It's possible that some subset of these reports are actually msmpeg2vdec attempting to crash safely, and preventing that could be a security problem.
The problem is that msmpeg2vdec assumes that the only time they'd be unwound is by the SEH handler in an exception, but our profiler also tries to unwind them in benign situations. I don't know of a good way to deal with that while preserving the safe behavior in an actual exception. Maybe abort stack walking if we reach this version of this dll? It's yucky, but I don't have a better idea.
Comment 41•8 years ago
|
||
Could we look into re-prioritize? The volume we see is basically just people who want to use the platform profiler; we will not see high volume but this blocks productivity of anybody who wants to profile Nightly.
Nightly is now rolling out as 64-bit by default. This might lead to more people crashing with profiler installed.
As a personal anecdote, I just did extensive profiling on Thinkpad T450 with Windows 7 Pro and had to switch to 32bit Nightly because the browser kept crashing on 64bit.
Excerpt from my day:
https://crash-stats.mozilla.com/report/index/f5cdb259-fb57-4ae8-9e35-9fe4d2161214
https://crash-stats.mozilla.com/report/index/fd02965a-27af-4ee4-baf4-abebb2161212
https://crash-stats.mozilla.com/report/index/ca2bf79d-901a-4676-86d7-903052161212
…
Flags: needinfo?(milan)
(In reply to Harald Kirschner :digitarald from comment #41)
> Could we look into re-prioritize?
Not sure it's a question of priority at this point, more of a figuring out what to do. Graphics probably not the best component for it either.
There are plans to change the about:support and DXVA behaviour, so perhaps that will take this bug out of the startup crash into another area.
Comment 43•8 years ago
|
||
(In reply to Harald Kirschner :digitarald from comment #41)
> Could we look into re-prioritize? The volume we see is basically just people
> who want to use the platform profiler; we will not see high volume but this
> blocks productivity of anybody who wants to profile Nightly.
Harald, does Hasal enable the profiler during its test runs? Does this crash block Hasal testing of 64-bit Firefox?
Flags: needinfo?(hkirschner)
Comment 44•8 years ago
|
||
Best to ask the Hasal team if this issue popped up.
Flags: needinfo?(hkirschner) → needinfo?(bchien)
Comment 45•8 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #43)
> (In reply to Harald Kirschner :digitarald from comment #41)
> > Could we look into re-prioritize? The volume we see is basically just people
> > who want to use the platform profiler; we will not see high volume but this
> > blocks productivity of anybody who wants to profile Nightly.
>
> Harald, does Hasal enable the profiler during its test runs? Does this crash
> block Hasal testing of 64-bit Firefox?
Hasal didn't run regular daily 64-bit Firefox yet. As Hasal's plan, Hasal will switch to 64-bit as soon as possible in 2017 Q1.
As daily testing, I don't get message on 32-bit with platform profiler enabled.
Flags: needinfo?(bchien)
Comment 46•8 years ago
|
||
Got this today as double crashlog report:
https://crash-stats.mozilla.com/report/index/7e6288e8-44b3-4119-9f50-d42562170109
https://crash-stats.mozilla.com/report/index/ff9b57b4-9c8c-4e38-96e4-ec1a92170109
status-firefox52:
--- → affected
status-firefox53:
--- → affected
Comment 47•8 years ago
|
||
Next crashlog reports:
https://crash-stats.mozilla.com/report/index/6dd770b9-788f-4110-9a67-ebecc2170109
https://crash-stats.mozilla.com/report/index/87445f30-91b4-49fd-b869-ea5af2170109
https://crash-stats.mozilla.com/report/index/7a16d205-cb17-4947-ab6f-dc8a22170109
https://crash-stats.mozilla.com/report/index/fbe152b9-b348-4702-8ccf-d62062170109
https://crash-stats.mozilla.com/report/index/f8cb3f5a-21e2-4830-bf86-30ff72170109
They always happen with enabled Gecko Profiler, making it unusable to get Firefox profiles with this extension.
Updated•8 years ago
|
Updated•8 years ago
|
Comment 48•8 years ago
|
||
Adding crash signature from bug 1191077 since it has been duped.
Crash Signature: [@ msmpeg2vdec.dll@0x25a3fb] → [@ msmpeg2vdec.dll@0x25a3fb]
[@ RtlVirtualUnwind | RtlVirtualUnwindStub | WalkStackMain64 ]
Assignee | ||
Comment 49•8 years ago
|
||
This patch implements my suggestion from comment 40.
cpearce, does it fix the crash you were seeing?
Flags: needinfo?(cpearce)
Comment 50•8 years ago
|
||
Hard to say. A local opt x64 build doesn't crash for me. I pushed your patch to Try as a PGO build, and in the resulting build the content process hangs on Facebook. :(
Flags: needinfo?(cpearce)
Assignee | ||
Comment 51•8 years ago
|
||
Can you grab a "~*k" or a minidump of the hang? (I guess you'll need a push with symbols)
Flags: needinfo?(cpearce)
Comment 52•8 years ago
|
||
I can repro this using Windows 7 64 bit and today's Nightly (Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0)
I followed the STR in comment 2 and it didn't crash.
Keywords: nightly-community
Comment 53•8 years ago
|
||
So I have this "mini" (actually it's around 2GB) dump here which I got by loading a huge tabs count session in FF x64 before lazy tab fix could land (bug 1358283)
Analysis https://pastebin.com/Mk89LaUh
Anything that I may do additionally to help you gentlemen?
Assignee | ||
Comment 54•8 years ago
|
||
Thanks. Your stack agrees with the last bit of comment 28. I think the problem is pretty well understood at this point; the only question is how we're going to fix it.
I'm still proposing that we go with something like comment 49. I am skeptical that Chris's hang was directly caused by that patch -- there are so many other issues in the Win64 profiler right now, many of which I'm trying to fix this week. I'll take a needinfo on myself to rebase this patch and get Chris some new bits when the profiler is in better shape.
Flags: needinfo?(dmajor)
Assignee | ||
Comment 55•8 years ago
|
||
Rebased on top of the recent changes to StackWalk.cpp.
I'm feeling comfortable enough with this patch to move forward with it. I am reasonably sure that the hang that cpearce hit locally was one of the other known Win64 profiler hangs and unrelated to this change.
Nick, I'm not sure how closely you've been following this bug; comment 24 and comment 40 provide a good summary.
Assignee: nobody → dmajor
Attachment #8721664 -
Attachment is obsolete: true
Attachment #8854890 -
Attachment is obsolete: true
Flags: needinfo?(milan)
Flags: needinfo?(dmajor)
Flags: needinfo?(cpearce)
Attachment #8863507 -
Flags: review?(n.nethercote)
Comment 56•8 years ago
|
||
Comment on attachment 8863507 [details] [diff] [review]
Abort if we reach a frame inside msmpeg2vdec
Review of attachment 8863507 [details] [diff] [review]:
-----------------------------------------------------------------
This is not familiar stuff for me, but it seems plausible.
::: mozglue/misc/StackWalk.cpp
@@ +370,3 @@
>
> + // In the loop below we'll need to special-case stack frames in this DLL.
> + HMODULE msmpeg2vdec = GetModuleHandleW(L"msmpeg2vdec.dll");
How expensive is this? I think this is the stack-walking function that's used by the profiler on Win64, so it's unfortunate that we have to make it slower.
Attachment #8863507 -
Flags: review?(n.nethercote) → review+
Assignee | ||
Comment 57•8 years ago
|
||
> > + // In the loop below we'll need to special-case stack frames in this DLL.
> > + HMODULE msmpeg2vdec = GetModuleHandleW(L"msmpeg2vdec.dll");
>
> How expensive is this? I think this is the stack-walking function that's
> used by the profiler on Win64, so it's unfortunate that we have to make it
> slower.
With msmpeg2vdec loaded (so that the code in the inner loop is active), it makes WalkStackMain64 about 20% slower. That is indeed unfortunate, but at least it's not orders of magnitude worse like that time I tried to VirtualQuery inside the inner loop.
I talked to cpearce last week about possibly caching some of the msmpeg addresses but he didn't seem too thrilled. I'm willing to try landing this now and deal with optimization only if it becomes a problem.
Comment 58•8 years ago
|
||
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0e572861c2dc
Abort stack walking if we reach a frame inside msmpeg2vdec. r=njn
Comment 59•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment 60•8 years ago
|
||
Doesn't seem worth backporting to ESR52. I'm not as sure about Beta. Can this ride the 55 train, David?
status-firefox-esr52:
--- → wontfix
Flags: needinfo?(dmajor)
Assignee | ||
Comment 61•8 years ago
|
||
In the past six months (and probably forever) there have been no reports on anything but the nightly channel. Makes sense since the only people likely to hit this are developers doing profiling.
https://crash-stats.mozilla.com/search/?signature=~msmpeg2vdec&cpu_arch=amd64&product=Firefox&version=55.0a1&platform=Windows&date=%3E%3D2016-11-03T19%3A15%3A02.000Z&date=%3C2017-05-03T19%3A15%3A02.000Z&_sort=-date&_facets=signature&_facets=cpu_arch&_facets=release_channel&_facets=build_id&_facets=address&_facets=reason&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-release_channel
Flags: needinfo?(dmajor)
Assignee | ||
Comment 62•8 years ago
|
||
Oh, I guess my query was buggy because somehow it filled in version 55.0a1 by default.
Taking out the version filter, but adding in this particular reason-code, it's still very much nightly: https://crash-stats.mozilla.com/search/?signature=~msmpeg2vdec&cpu_arch=amd64&reason=%3D0x40000015%20%2F%200x00000000&product=Firefox&platform=Windows&date=%3E%3D2016-11-03T19%3A15%3A02.000Z&date=%3C2017-05-03T19%3A15%3A02.000Z&_sort=-date&_facets=signature&_facets=cpu_arch&_facets=release_channel&_facets=build_id&_facets=address&_facets=reason&_facets=version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
Assignee | ||
Comment 63•8 years ago
|
||
Backed out for contributing to bug 1361901 comment 5:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bbdbbcf5cedad5769fd2b284403e754b7d43e642
Assignee | ||
Comment 64•8 years ago
|
||
Backout now merged to central: https://hg.mozilla.org/mozilla-central/rev/bbdbbcf5cedad5769fd2b284403e754b7d43e642
Assignee | ||
Comment 65•8 years ago
|
||
I thought I had posted about this already but apparently it was lost:
After looking at this more closely under a live debugger, the situation is actually messier than I thought. The no-go zone for stack walking is actually not the msmpeg2vdec library itself, but rather a region of RWX (presumably JIT) space that they allocate on load. This makes a lot of sense: I bet they don't bother generating unwind information for their JIT code, so the only thing they can do is abort.
So I think what we want is a hybrid of Edwin's patch and mine: Intercept RtlInstallFunctionTableCallback, but instead of dropping the callback, check whether it's within msmpeg, and if so then we'll record the address of the JIT region and avoid stack-walking it.
If it's any consolation, this crash is only seen on Win7SP1 machines. In newer Windows, msmpeg2vdec.dll no longer uses this callback technique.
Assignee | ||
Comment 66•8 years ago
|
||
Attachment #8868296 -
Flags: review?(aklotz)
Assignee | ||
Comment 67•8 years ago
|
||
Attachment #8863507 -
Attachment is obsolete: true
Attachment #8868297 -
Flags: review?(aklotz)
Updated•8 years ago
|
Attachment #8868296 -
Flags: review?(aklotz) → review+
Updated•8 years ago
|
Attachment #8868297 -
Flags: review?(aklotz) → review+
Comment 68•8 years ago
|
||
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/19e4de97f4c6
Cleanup - Remove the DLL blocklist's anonymous namespace. r=aklotz
https://hg.mozilla.org/integration/mozilla-inbound/rev/83ae7a021aaf
Intercept msmpeg2vdec's JIT unwind callback on Win7 x64. r=aklotz
Comment 69•8 years ago
|
||
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3436882396c9
Bustage followup - 32-bit builds don't have RtlIFTC. CLOSED TREE
Comment 70•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/19e4de97f4c6
https://hg.mozilla.org/mozilla-central/rev/83ae7a021aaf
https://hg.mozilla.org/mozilla-central/rev/3436882396c9
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Comment 71•8 years ago
|
||
David, is your fix safe enough to uplift to Beta 54? This crash affects 64-bit Firefox and more users will be getting 64-bit in Firefox 54 and 55. In 54, we plan to run a Funnelcake experiment that will serve the 64-bit installer to more users and compare their user engagement metrics against 32-bit users. In 55, we plan to make 64-bit the installer default for all new installs.
Flags: needinfo?(dmajor)
Assignee | ||
Comment 72•8 years ago
|
||
This crash (or at least this variant of it) is highly localized to developers doing profiling on the nightly channel. Given the trouble that always seems to come out of hook patches, I don't think the risk/reward is right for an uplift.
Flags: needinfo?(dmajor)
Comment 73•8 years ago
|
||
(In reply to David Major [:dmajor] from comment #72)
> This crash (or at least this variant of it) is highly localized to
> developers doing profiling on the nightly channel. Given the trouble that
> always seems to come out of hook patches, I don't think the risk/reward is
> right for an uplift.
Sounds good. We can just let your fix ride the 55 train as you recommend.
You need to log in
before you can comment on or make changes to this bug.
Description
•