Closed Bug 1180561 Opened 4 years ago Closed 2 years ago

startup crash in [@ msmpeg2vdec.dll@0x25a3fb ]

Categories

(Core :: Graphics, defect, P2, critical)

41 Branch
All
Windows 7
defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox39 --- unaffected
firefox40 --- unaffected
firefox41 --- wontfix
firefox42 --- wontfix
firefox43 --- wontfix
firefox44 --- wontfix
firefox45 --- wontfix
firefox46 --- wontfix
firefox47 --- wontfix
firefox48 --- wontfix
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr52 --- wontfix
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- fixed

People

(Reporter: kamidphish, Assigned: dmajor)

References

(Blocks 2 open bugs)

Details

(4 keywords, Whiteboard: [gfx-noted], tpi:+)

Crash Data

Attachments

(2 files, 3 obsolete files)

In the process of trying to repro Bug 1180217, I ran into this crash when attempting to access about:support
54 reports so far last month, all on Windows 7 and all with Nightly (starting with 41.0a1). Not nominating to track since the volume is extremely low. 

Dan, if you have the time it would be useful to see if this reproduces for you further back than Firefox 41 and if not, try to find a regression window using https://mozilla.github.io/mozregression/install.html
I can pretty much infinite crash loop this. See crash report: https://crash-stats.mozilla.com/report/index/30e25ff7-0aca-4801-bed1-1f26d2150708

STR:
1) Install Gecko Profiler addon
2) Disable e10s
3) Load Nightly and try to go to about:support
4) Hit infinite crash loop.

The problem goes away with the gecko profiler addon. Any idea on what the profiler addon is doing benwa during about:support?
Flags: needinfo?(bgirard)
I don't know. There's rare 'boggyman' problems about user code that doesn't handle interrupted system calls properly but that's usually a Linux problem.

I'm guessing it's an interaction with msmpeg2vdecand the 'signaling' of the profiler. A regression window is worth doing but it might just lead us to a change that made about:support start loading msmpeg2vdec.
Flags: needinfo?(bgirard)
Thanks Mason, these steps reproduce easily for me. I'll work on a regression window for Benoit.
Keywords: reproducible
As suggested in comment 3, this is just the changeset where about:support started loading the msmpeg2vdec dll.
Flags: needinfo?(matt.woodrow)
But I can also crash forever by loading a youtube video as well instead of about:support. If you replace the STR with a youtube video, you get this crash.
Can you try to reproduce with the 'stackwalk' option ticked off?
(In reply to Benoit Girard (:BenWa) from comment #8)
> Can you try to reproduce with the 'stackwalk' option ticked off?

The crashes stop for me if Stack Walk is turned off.
Thanks Anthony.

It must be a conflicting interaction between Stackwalk64 & msmpeg2vdec.dll.
I'll try repro'ing success with StackWalk turned off.
I did an additional check to see if the crashes occur with Gecko Profiler removed or disabled, the crash does not happen in either case. I also checked to see how many of the reports for add-on correlations:

~40% have no add-ons at all
~60% have some mixture of add-ons, ~50% of which include Gecko Profiler

Based on this I'm not sure Gecko Profiler is a cause, it's just an easy way to trigger this crash.
I wonder if these other cases are calling StackWalk64 via another mean.

It would be worth checking if the devtools profiler ever triggers this. In general it shouldn't use the 'stackwalk' option but perhaps it's triggered somehow.
I've been playing around with Devtools for an hour but have not reproduced a crash with Gecko Profiler disabled.
I didn't turn stack walk off, but I just stopped the profiler (via the Gecko Profiler Add-on) and the crash disappeared.
It seems like a bad idea to let this one slide! The crash is pretty high up there.
Flags: needinfo?(dglastonbury)
Flags: needinfo?(bgirard)
Whiteboard: [gfx-noted]
This is a nightly only crash and it doesn't seem to be riding the train. This would only affect builds that have profiling enabled.

Likely the best way to fix it would be to use frame pointers on 32-bit. I don't think they are there for 64-bit but I'm not sure.
Flags: needinfo?(bgirard)
I don't know what the solution is. I defer to Benoit.
Flags: needinfo?(dglastonbury)
Also seeing this on 64bit, e.g. bp-cff87de7-6b62-4ab3-a790-c8be42150818
Duplicate of this bug: 1191077
https://crash-stats.mozilla.com/report/index/330de4cb-d084-4804-97aa-361262151012

FF44.0a1

Crashing Thread
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::`anonymous namespace'::RunWatchdog(void*) 	toolkit/components/terminator/nsTerminator.cpp
1 	nss3.dll 	PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c
2 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c
3 	msvcr120.dll 	_callthreadstartex 	f:\dd\vctools\crt\crtw32\startup\threadex.c:376
4 	msvcr120.dll 	_threadstartex 	f:\dd\vctools\crt\crtw32\startup\threadex.c:354
Ø 5 	kernel32.dll 	kernel32.dll@0x159dc 	
Ø 6 	ntdll.dll 	ntdll.dll@0x2a630 	
Ø 7 	kernel32.dll 	kernel32.dll@0x9ba5f 	
Ø 8 	kernel32.dll 	kernel32.dll@0x9ba5f
Hardware: x86 → All
Summary: crash in msmpeg2vdec.dll@0x25a3fb → crash in [@ msmpeg2vdec.dll@0x25a3fb ]
This is now showing up, perhaps unsurprisingly, as a startup crash in Firefox 45.0a1.
Summary: crash in [@ msmpeg2vdec.dll@0x25a3fb ] → startup crash in [@ msmpeg2vdec.dll@0x25a3fb ]
See Also: → 1224595
Keywords: topcrash-win
Edwin, can you see if there is something you can do?
Flags: needinfo?(edwin)
Assignee: nobody → edwin
Status: NEW → ASSIGNED
Flags: needinfo?(edwin)
The crash here appears to be intentional.

When msmpeg2vdec.dll is first loaded, it installs a symbol resolution callback using RtlInstallFunctionTableCallback. However, all this callback does is unset our exception handler and then crash.

When we walk the stack (when running the profiler, for instance) we can end up trying to walk through this dll. When RtlLookupFunctionEntry is called it invokes the callback in an attempt to map the address to a function entry, which silently crashes.

This patch simply hooks the call to RtlInstallFunctionTableCallback and drops it. There are no genuine uses of this function for any of our library dependencies, AFAICT.
Attachment #8721664 - Flags: review?(benjamin)
msmpeg2vdec.dll unsets our exception handler?! Does that mean our crash reporter just doesn't work with any process that has this DLL loaded? Please file that if so.

I'm going to bounce this to aklotz since he's the resident expert about this kind of patching madness.
Attachment #8721664 - Flags: review?(benjamin) → review?(aklotz)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #25)
> msmpeg2vdec.dll unsets our exception handler?! Does that mean our crash
> reporter just doesn't work with any process that has this DLL loaded? Please
> file that if so.

No, it only unsets the exception handler (...and then throws an exception...) when this callback is invoked, which is when the DLL is on the stack and we try to walk it.

Happily, the crash reporter caught this just fine since it hooks SetUnhandledExceptionFilter at [1]. Made it a pain in the neck to debug, though. :)

[1] https://dxr.mozilla.org/mozilla-central/rev/e1cf617a1f2813b6cd66f460313a61c223406c9b/toolkit/crashreporter/nsExceptionHandler.cpp#1382
I'd like to know more about what is going on here before evaluating this. Edwin, can you please post more information about what this callback is actually doing?

I'd at least like to see a call stack of the callback clearing our exception handler, as well as maybe a few lines of the assembly listing showing it raising the exception.

Thanks!
Flags: needinfo?(edwin)
Installing the callback on init:

> 0:000> k
> Child-SP          RetAddr           Call Site
> 00000000`0022cea8 00000000`76e2ba9e ntdll!RtlInstallFunctionTableCallback
> 00000000`0022ceb0 000007fe`eaea8fb6 kernel32!RtlInstallFunctionTableCallbackStub+0x1e
> 00000000`0022cef0 000007fe`eace0d01 msmpeg2vdec!DllCanUnloadNow+0x6ab6e
> 00000000`0022cf80 00000000`77037c78 msmpeg2vdec+0x90d01
> 00000000`0022cfb0 00000000`77037a49 ntdll!LdrpRunInitializeRoutines+0x1fe
> 00000000`0022d180 00000000`77026cf3 ntdll!LdrpLoadDll+0x594
> 00000000`0022d380 000007fe`f647a9b6 ntdll!LdrLoadDll+0xed
> 00000000`0022d3f0 000007fe`fcf692a7 mozglue!double_conversion::BignumDtoa+0x18a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mfbt\double-conversion\bignum-dtoa.cc @ 141]
> 00000000`0022d560 000007fe`feb34e5e KERNELBASE!LoadLibraryExW+0x2b3
> 00000000`0022d5e0 000007fe`feb34dd7 ole32!LoadLibraryWithLogging+0x2e [d:\w7rtm\com\ole32\common\loadfree.cxx @ 157]
> 00000000`0022d630 000007fe`feb34bdc ole32!CClassCache::CDllPathEntry::LoadDll+0x5b [d:\w7rtm\com\ole32\com\objact\dllcache.cxx @ 1927]
> 00000000`0022d670 000007fe`feb31cc1 ole32!CClassCache::CDllPathEntry::Create_rl+0x4c [d:\w7rtm\com\ole32\com\objact\dllcache.cxx @ 1783]
> 00000000`0022d6d0 000007fe`feb31b93 ole32!CClassCache::CClassEntry::CreateDllClassEntry_rl+0xe1 [d:\w7rtm\com\ole32\com\objact\dllcache.cxx @ 886]
> 00000000`0022d990 000007fe`feb32445 ole32!CClassCache::GetClassObjectActivator+0x5ab [d:\w7rtm\com\ole32\com\objact\dllcache.cxx @ 4795]
> 00000000`0022dae0 000007fe`feb32308 ole32!CClassCache::GetClassObject+0x45 [d:\w7rtm\com\ole32\com\objact\dllcache.cxx @ 4580]
> 00000000`0022db50 000007fe`feb316ba ole32!CServerContextActivator::CreateInstance+0x178 [d:\w7rtm\com\ole32\com\objact\actvator.cxx @ 974]
> 00000000`0022dc70 000007fe`feb3194c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x5e [d:\w7rtm\com\ole32\actprops\actprops.cxx @ 1917]
> 00000000`0022dce0 000007fe`feb319be ole32!CApartmentActivator::CreateInstance+0x88 [d:\w7rtm\com\ole32\com\objact\actvator.cxx @ 2251]
> 00000000`0022dd80 000007fe`feb56e63 ole32!CProcessActivator::CCICallback+0x4e [d:\w7rtm\com\ole32\com\objact\actvator.cxx @ 1716]
> 00000000`0022ddc0 000007fe`feb56df7 ole32!CProcessActivator::AttemptActivation+0x33 [d:\w7rtm\com\ole32\com\objact\actvator.cxx @ 1630]
> 00000000`0022de00 000007fe`feb32587 ole32!CProcessActivator::ActivateByContext+0x7f [d:\w7rtm\com\ole32\com\objact\actvator.cxx @ 1488]
> 00000000`0022de90 000007fe`feb316ba ole32!CProcessActivator::CreateInstance+0x77 [d:\w7rtm\com\ole32\com\objact\actvator.cxx @ 1389]
> 00000000`0022ded0 000007fe`feb317c7 ole32!ActivationPropertiesIn::DelegateCreateInstance+0x5e [d:\w7rtm\com\ole32\actprops\actprops.cxx @ 1917]
> 00000000`0022df40 000007fe`feb316ba ole32!CClientContextActivator::CreateInstance+0xeb [d:\w7rtm\com\ole32\com\objact\actvator.cxx @ 685]
> 00000000`0022e1f0 000007fe`feb320b0 ole32!ActivationPropertiesIn::DelegateCreateInstance+0x5e [d:\w7rtm\com\ole32\actprops\actprops.cxx @ 1917]
> 00000000`0022e260 000007fe`feb475eb ole32!ICoCreateInstanceEx+0x5f5 [d:\w7rtm\com\ole32\com\objact\objact.cxx @ 1334]
> 00000000`0022edb0 000007fe`e424c4e6 ole32!CoCreateInstance+0x17b [d:\w7rtm\com\ole32\com\objact\actapi.cxx @ 108]
> 00000000`0022eea0 000007fe`e424c633 xul!mozilla::MFTDecoder::Create+0x3e [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\media\platforms\wmf\mftdecoder.cpp @ 42]
> 00000000`0022eee0 000007fe`e424cab7 xul!mozilla::CanCreateMFTDecoder+0x6b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\media\platforms\wmf\wmfdecodermodule.cpp @ 148]
> 00000000`0022ef10 000007fe`e424d271 xul!mozilla::CanCreateWMFDecoder<CLSID_CMSH264DecoderMFT>+0x93 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\media\platforms\wmf\wmfdecodermodule.cpp @ 161]
> 00000000`0022ef40 000007fe`e424d5c6 xul!mozilla::WMFDecoderModule::SupportsMimeType+0x59 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\media\platforms\wmf\wmfdecodermodule.cpp @ 188]
> 00000000`0022ef70 000007fe`e424d40e xul!mozilla::HasGMPFor+0x14e [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\media\platforms\agnostic\gmp\gmpdecodermodule.cpp @ 112]
> 00000000`0022f010 000007fe`e433eb5d xul!mozilla::GMPDecoderModule::UpdateUsableCodecs+0x112 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\dom\media\platforms\agnostic\gmp\gmpdecodermodule.cpp @ 157]
> 00000000`0022f0b0 000007fe`e40e00e4 xul!nsRunnableFunction<<lambda_2fa968b940fba08a9b1a2f7f0244fc0e> >::Run+0x9 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\obj-firefox\dist\include\nsthreadutils.h @ 262]
> 00000000`0022f0e0 000007fe`e40deaa6 xul!nsThread::ProcessNextEvent+0x380 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\xpcom\threads\nsthread.cpp @ 1024]
> 00000000`0022f260 000007fe`e3eba257 xul!mozilla::ipc::MessagePump::Run+0x7a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\ipc\glue\messagepump.cpp @ 95]
> 00000000`0022f2c0 000007fe`e3eb9952 xul!MessageLoop::RunHandler+0x17 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\ipc\chromium\src\base\message_loop.cc @ 228]
> 00000000`0022f2f0 000007fe`e40df808 xul!MessageLoop::Run+0x3e [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\ipc\chromium\src\base\message_loop.cc @ 202]
> 00000000`0022f340 000007fe`e40df3df xul!nsBaseAppShell::Run+0x3c [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\widget\nsbaseappshell.cpp @ 158]
> 00000000`0022f370 000007fe`e3e2ffeb xul!nsAppShell::Run+0x27 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\widget\windows\nsappshell.cpp @ 259]
> 00000000`0022f3a0 000007fe`e425b304 xul!nsAppStartup::Run+0x27 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\toolkit\components\startup\nsappstartup.cpp @ 282]
> 00000000`0022f3d0 000007fe`e425b4f6 xul!XREMain::XRE_mainRun+0x5d0 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\toolkit\xre\nsapprunner.cpp @ 4284]
> 00000000`0022f690 000007fe`e421d3bb xul!XREMain::XRE_main+0x14e [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\toolkit\xre\nsapprunner.cpp @ 4380]
> 00000000`0022f6f0 00000001`3fc917f0 xul!XRE_main+0x6b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\toolkit\xre\nsapprunner.cpp @ 4483]
> 00000000`0022f8d0 00000001`3fc913e1 firefox!do_main+0x160 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\browser\app\nsbrowserapp.cpp @ 220]
> 00000000`0022fb00 00000001`3fc91240 firefox!NS_internal_main+0x111 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\browser\app\nsbrowserapp.cpp @ 362]
> 00000000`0022fbd0 00000001`3fc92758 firefox!wmain+0x190 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\toolkit\xre\nswindowswmain.cpp @ 138]
> 00000000`0022fc30 00000000`76df59ed firefox!__tmainCRTStartup+0x144 [f:\dd\vctools\crt\crtw32\startup\crt0.c @ 255]
> 00000000`0022fc70 00000000`7702b371 kernel32!BaseThreadInitThunk+0xd
> 00000000`0022fca0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d


With callback = arg 4 = r9 = 0x000007feeaeaa3d4:

> 0:000> .frame /r
> 00 00000000`0022cea8 00000000`76e2ba9e ntdll!RtlInstallFunctionTableCallback
> rax=0000000000000000 rbx=000007feeac50000 rcx=000007ff3d000003
> rdx=000007ff3d000000 rsi=000007ff3d000000 rdi=0000000000000000
> rip=00000000770023f0 rsp=000000000022cea8 rbp=0000000000000000
>  r8=000000000000a000  r9=000007feeaeaa3d4 r10=0000000000000000
> r11=0000000000000246 r12=0000000000000001 r13=000000000000a000
> r14=0000000000000001 r15=0000000000000000
> iopl=0         nv up ei pl nz na pe nc
> cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
> ntdll!RtlInstallFunctionTableCallback:
> 00000000`770023f0 4c8bdc          mov     r11,rsp


Which is in msmpeg2vdec.dll:

> 0:000> lm
> start             end                 module name
> [...]
> 00000000`76de0000 00000000`76eff000   kernel32   (pdb symbols)          c:\symbols\kernel32.pdb\25851C45C74242DB981B353136F10DC82\kernel32.pdb
> [...]
> 000007fe`eac50000 000007fe`eaef9000   msmpeg2vdec   (export symbols)       C:\Windows\System32\msmpeg2vdec.dll
> 000007fe`eb310000 000007fe`eb453000   msmpeg2adec   (export symbols)       C:\Windows\System32\msmpeg2adec.dll
> [...]


Disassembled (note: DllCanUnloadNow is a red herring -- this DLL has very few exported symbols):

> 000007fe`eaeaa3d4 4881ecb8050000  sub     rsp,5B8h
> 000007fe`eaeaa3db 488b0566bf0200  mov     rax,qword ptr [msmpeg2vdec!DllCanUnloadNow+0x97f00 (000007fe`eaed6348)]
> 000007fe`eaeaa3e2 4833c4          xor     rax,rsp
> 000007fe`eaeaa3e5 48898424a0050000 mov     qword ptr [rsp+5A0h],rax
> 000007fe`eaeaa3ed 488d8c24d0000000 lea     rcx,[rsp+0D0h]
> 000007fe`eaeaa3f5 ff15ed6e0400    call    qword ptr [msmpeg2vdec!DllCanUnloadNow+0xb2ea0 (000007fe`eaef12e8)]
>  = 0x0000000076e2bb50
>  > 00000000`76e2bb50 48ff25e9040500  jmp     qword ptr [kernel32!_imp_RtlCaptureContext (00000000`76e7c040)]
> 000007fe`eaeaa3fb 41b894000000    mov     r8d,94h
> 000007fe`eaeaa401 488d4c2434      lea     rcx,[rsp+34h]
> 000007fe`eaeaa406 33d2            xor     edx,edx
> 000007fe`eaeaa408 e81bb8ddff      call    msmpeg2vdec+0x35c28 (000007fe`eac85c28)
>  = 000007fe`eac85c28 ff25e2b42600    jmp     qword ptr [msmpeg2vdec!DllCanUnloadNow+0xb2cc8 (000007fe`eaef1110)]
>  = 0x000007fefd521000
>  > msvcrt!memset
> 000007fe`eaeaa40d 33c9            xor     ecx,ecx
> 000007fe`eaeaa40f 488d442430      lea     rax,[rsp+30h]
> 000007fe`eaeaa414 4c8b9c24b8050000 mov     r11,qword ptr [rsp+5B8h]
> 000007fe`eaeaa41c 4889442420      mov     qword ptr [rsp+20h],rax
> 000007fe`eaeaa421 488d8424d0000000 lea     rax,[rsp+0D0h]
> 000007fe`eaeaa429 4c895c2440      mov     qword ptr [rsp+40h],r11
> 000007fe`eaeaa42e c744243015000040 mov     dword ptr [rsp+30h],40000015h
> 000007fe`eaeaa436 4889442428      mov     qword ptr [rsp+28h],rax
> 000007fe`eaeaa43b ff15bf6e0400    call    qword ptr [msmpeg2vdec!DllCanUnloadNow+0xb2eb8 (000007fe`eaef1300)]
>  = 0x0000000076df9040
>  > kernel32!SetUnhandledExceptionFilter
> 000007fe`eaeaa441 488d4c2420      lea     rcx,[rsp+20h]
> 000007fe`eaeaa446 ff15dc6f0400    call    qword ptr [msmpeg2vdec!DllCanUnloadNow+0xb2fe0 (000007fe`eaef1428)]
>  = 0x0000000076e7b890
>  kernel32!UnhandledExceptionFilter
> 000007fe`eaeaa44c ff15b66e0400    call    qword ptr [msmpeg2vdec!DllCanUnloadNow+0xb2ec0 (000007fe`eaef1308)]
>  = 0x0000000076df51b0
>  > kernel32!GetCurrentProcessStub:
>  > 00000000`76df51b0 eb06            jmp     kernel32!GetCurrentProcess (00000000`76df51b8)
> 000007fe`eaeaa452 488bc8          mov     rcx,rax
> 000007fe`eaeaa455 ba03000000      mov     edx,3
> 000007fe`eaeaa45a ff15b06e0400    call    qword ptr [msmpeg2vdec!DllCanUnloadNow+0xb2ec8 (000007fe`eaef1310)]
>  = 0x0000000076e2c100
>  > kernel32!TerminateProcessStub:
>  > 00000000`76e2c100 e9734cfeff      jmp     kernel32!TerminateProcess (00000000`76e10d78)
> 000007fe`eaeaa460 cc              int     3


Aaaaand boom (except less of a boom and more of a whimper because the debugger fails to catch it):

> 0:000> bu 000007feeaeaa3d4
> 
> Breakpoint 3 hit
> msmpeg2vdec!DllCanUnloadNow+0x6bf8c:
> 000007fe`eaeaa3d4 4881ecb8050000  sub     rsp,5B8h
> 0:062> k
> Child-SP          RetAddr           Call Site
> 00000000`263e7128 00000000`7707a2e9 msmpeg2vdec!DllCanUnloadNow+0x6bf8c
> 00000000`263e7130 00000000`770607dc ntdll!RtlpLookupDynamicFunctionEntry+0xb9
> 00000000`263e7160 000007fe`f64b2e33 ntdll!RtlLookupFunctionEntry+0xa3
> 00000000`263e7190 000007fe`f64b2918 mozglue!WalkStackMain64+0x1ab [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 413]
> 00000000`263e76e0 000007fe`e26548ab mozglue!MozStackWalk+0x184 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\mozglue\misc\stackwalk.cpp @ 591]
> 00000000`263eb800 000007fe`e265165e xul!GeckoSampler::doNativeBacktrace+0x9b [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\geckosampler.cpp @ 980]
> 00000000`263ef6f0 000007fe`e264d6c6 xul!GeckoSampler::InplaceTick+0x8a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\geckosampler.cpp @ 1198]
> 00000000`263ef790 000007fe`e264d504 xul!SamplerThread::SampleContext+0x14a [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform-win32.cc @ 211]
> 00000000`263efd50 000007fe`e264d73e xul!SamplerThread::Run+0xd4 [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform-win32.cc @ 147]
> 00000000`263efda0 000007fe`f0834f7f xul!ThreadEntry+0xa [c:\builds\moz2_slave\m-cen-w64-ntly-000000000000000\build\src\tools\profiler\core\platform-win32.cc @ 256]
> 00000000`263efdd0 000007fe`f0835126 MSVCR120!_callthreadstartex+0x17 [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376]
> 00000000`263efe00 00000000`76df59ed MSVCR120!_threadstartex+0x102 [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354]
> 00000000`263efe30 00000000`7702b371 kernel32!BaseThreadInitThunk+0xd
> 00000000`263efe60 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
Flags: needinfo?(edwin)
Comment on attachment 8721664 [details] [diff] [review]
A patch that shouldn't really have to exist

Review of attachment 8721664 [details] [diff] [review]:
-----------------------------------------------------------------

We definitely need to do something here, but I'm not super keen on taking this as-is. In particular, I am hesitant to drop all calls to this function -- I've been burned too many times by injected 3rd-party code that becomes messed up stuff like this.

I'd rather that we examine the address of callback, and if it falls within msmpeg2vdec's .text section, we block it. Otherwise we let it through.

Do you want to implement the "check if msmpeg2vdec is present and get its .text section bounds" code? If not, I would be happy to do so -- I'm very familiar with the PE format.
Attachment #8721664 - Flags: review?(aklotz) → review-
This is still common on Nightly. E.g. for Nightly 20160405030214 there are currently 4 crashes, making it the #5 cause of crashes.
ni'ing edwin for aklotz's question in comment 29.
Flags: needinfo?(edwin)
Any progress on this, Edwin?
FYI, this is currently the #11 topcrash in Nightly with 34 crashes reported last week (1.07% of overall crash volume). It still seems to be mostly isolated to Nightly versions.
(In reply to Aaron Klotz [:aklotz] from comment #29)
> Do you want to implement the "check if msmpeg2vdec is present and get its
> .text section bounds" code? If not, I would be happy to do so -- I'm very
> familiar with the PE format.

Hey Aaron, if you have time I might take you up on this. I'm pretty out of my depth when it comes to Windows internals.
Flags: needinfo?(edwin) → needinfo?(aklotz)
Super busy but I'll take it.
Assignee: edwin → aklotz
Flags: needinfo?(aklotz)
Whiteboard: [gfx-noted] → [gfx-noted], tpi:?
Aaron, any news about this bug? This is a startup crash. Thanks
Flags: needinfo?(aklotz)
59 crashes over seven days, this doesn't look very serious. As such it doesn't take precedence over the current e10s work aklotz is on.
At present volume, #29 @ 0.67% w/29 reports in Firefox 50.0a1, this no longer qualifies as a topcrash.
Keywords: topcrash-win
Still happening in 51, but it's not super high volume. Removing stale needinfo; I'm sure Aaron will get to this in due course as per the priority of things on his plate.
Flags: needinfo?(aklotz)
Assignee: aklotz → nobody
Status: ASSIGNED → NEW
Priority: -- → P2
Whiteboard: [gfx-noted], tpi:? → [gfx-noted], tpi:+
Comment on attachment 8721664 [details] [diff] [review]
A patch that shouldn't really have to exist

I don't think it's safe to drop msmpeg2vdec's callback. That callback is their way of saying "if there's an exception in these functions, don't bother unwinding, let's just crash." We very nearly took the same approach for our JIT in bug 844196 comment 86. It's possible that some subset of these reports are actually msmpeg2vdec attempting to crash safely, and preventing that could be a security problem.

The problem is that msmpeg2vdec assumes that the only time they'd be unwound is by the SEH handler in an exception, but our profiler also tries to unwind them in benign situations. I don't know of a good way to deal with that while preserving the safe behavior in an actual exception. Maybe abort stack walking if we reach this version of this dll? It's yucky, but I don't have a better idea.
Could we look into re-prioritize? The volume we see is basically just people who want to use the platform profiler; we will not see high volume but this blocks productivity of anybody who wants to profile Nightly.

Nightly is now rolling out as 64-bit by default. This might lead to more people crashing with profiler installed.

As a personal anecdote, I just did extensive profiling on Thinkpad T450 with Windows 7 Pro and had to switch to 32bit Nightly because the browser kept crashing on 64bit.

Excerpt from my day:

https://crash-stats.mozilla.com/report/index/f5cdb259-fb57-4ae8-9e35-9fe4d2161214
https://crash-stats.mozilla.com/report/index/fd02965a-27af-4ee4-baf4-abebb2161212
https://crash-stats.mozilla.com/report/index/ca2bf79d-901a-4676-86d7-903052161212
Flags: needinfo?(milan)
(In reply to Harald Kirschner :digitarald from comment #41)
> Could we look into re-prioritize? 

Not sure it's a question of priority at this point, more of a figuring out what to do. Graphics probably not the best component for it either.
There are plans to change the about:support and DXVA behaviour, so perhaps that will take this bug out of the startup crash into another area.
(In reply to Harald Kirschner :digitarald from comment #41)
> Could we look into re-prioritize? The volume we see is basically just people
> who want to use the platform profiler; we will not see high volume but this
> blocks productivity of anybody who wants to profile Nightly.

Harald, does Hasal enable the profiler during its test runs? Does this crash block Hasal testing of 64-bit Firefox?
Flags: needinfo?(hkirschner)
Best to ask the Hasal team if this issue popped up.
Flags: needinfo?(hkirschner) → needinfo?(bchien)
(In reply to Chris Peterson [:cpeterson] from comment #43)
> (In reply to Harald Kirschner :digitarald from comment #41)
> > Could we look into re-prioritize? The volume we see is basically just people
> > who want to use the platform profiler; we will not see high volume but this
> > blocks productivity of anybody who wants to profile Nightly.
> 
> Harald, does Hasal enable the profiler during its test runs? Does this crash
> block Hasal testing of 64-bit Firefox?

Hasal didn't run regular daily 64-bit Firefox yet. As Hasal's plan, Hasal will switch to 64-bit as soon as possible in 2017 Q1. 

As daily testing, I don't get message on 32-bit with platform profiler enabled.
Flags: needinfo?(bchien)
Adding crash signature from bug 1191077 since it has been duped.
Crash Signature: [@ msmpeg2vdec.dll@0x25a3fb] → [@ msmpeg2vdec.dll@0x25a3fb] [@ RtlVirtualUnwind | RtlVirtualUnwindStub | WalkStackMain64 ]
This patch implements my suggestion from comment 40.

cpearce, does it fix the crash you were seeing?
Flags: needinfo?(cpearce)
Hard to say. A local opt x64 build doesn't crash for me. I pushed your patch to Try as a PGO build, and in the resulting build the content process hangs on Facebook.  :(
Flags: needinfo?(cpearce)
Can you grab a "~*k" or a minidump of the hang? (I guess you'll need a push with symbols)
Flags: needinfo?(cpearce)
I can repro this using Windows 7 64 bit and today's Nightly (Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0)

I followed the STR in comment 2 and it didn't crash.
So I have this "mini" (actually it's around 2GB) dump here which I got by loading a huge tabs count session in FF x64 before lazy tab fix could land (bug 1358283) 
Analysis https://pastebin.com/Mk89LaUh

Anything that I may do additionally to help you gentlemen?
Thanks. Your stack agrees with the last bit of comment 28. I think the problem is pretty well understood at this point; the only question is how we're going to fix it.

I'm still proposing that we go with something like comment 49. I am skeptical that Chris's hang was directly caused by that patch -- there are so many other issues in the Win64 profiler right now, many of which I'm trying to fix this week. I'll take a needinfo on myself to rebase this patch and get Chris some new bits when the profiler is in better shape.
Flags: needinfo?(dmajor)
Rebased on top of the recent changes to StackWalk.cpp.

I'm feeling comfortable enough with this patch to move forward with it. I am reasonably sure that the hang that cpearce hit locally was one of the other known Win64 profiler hangs and unrelated to this change.

Nick, I'm not sure how closely you've been following this bug; comment 24 and comment 40 provide a good summary.
Assignee: nobody → dmajor
Attachment #8721664 - Attachment is obsolete: true
Attachment #8854890 - Attachment is obsolete: true
Flags: needinfo?(milan)
Flags: needinfo?(dmajor)
Flags: needinfo?(cpearce)
Attachment #8863507 - Flags: review?(n.nethercote)
Comment on attachment 8863507 [details] [diff] [review]
Abort if we reach a frame inside msmpeg2vdec

Review of attachment 8863507 [details] [diff] [review]:
-----------------------------------------------------------------

This is not familiar stuff for me, but it seems plausible.

::: mozglue/misc/StackWalk.cpp
@@ +370,3 @@
>  
> +  // In the loop below we'll need to special-case stack frames in this DLL.
> +  HMODULE msmpeg2vdec = GetModuleHandleW(L"msmpeg2vdec.dll");

How expensive is this? I think this is the stack-walking function that's used by the profiler on Win64, so it's unfortunate that we have to make it slower.
Attachment #8863507 - Flags: review?(n.nethercote) → review+
> > +  // In the loop below we'll need to special-case stack frames in this DLL.
> > +  HMODULE msmpeg2vdec = GetModuleHandleW(L"msmpeg2vdec.dll");
> 
> How expensive is this? I think this is the stack-walking function that's
> used by the profiler on Win64, so it's unfortunate that we have to make it
> slower.

With msmpeg2vdec loaded (so that the code in the inner loop is active), it makes WalkStackMain64 about 20% slower. That is indeed unfortunate, but at least it's not orders of magnitude worse like that time I tried to VirtualQuery inside the inner loop.

I talked to cpearce last week about possibly caching some of the msmpeg addresses but he didn't seem too thrilled. I'm willing to try landing this now and deal with optimization only if it becomes a problem.
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0e572861c2dc
Abort stack walking if we reach a frame inside msmpeg2vdec. r=njn
https://hg.mozilla.org/mozilla-central/rev/0e572861c2dc
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Doesn't seem worth backporting to ESR52. I'm not as sure about Beta. Can this ride the 55 train, David?
Backed out for contributing to bug 1361901 comment 5:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bbdbbcf5cedad5769fd2b284403e754b7d43e642
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I thought I had posted about this already but apparently it was lost:

After looking at this more closely under a live debugger, the situation is actually messier than I thought. The no-go zone for stack walking is actually not the msmpeg2vdec library itself, but rather a region of RWX (presumably JIT) space that they allocate on load. This makes a lot of sense: I bet they don't bother generating unwind information for their JIT code, so the only thing they can do is abort.

So I think what we want is a hybrid of Edwin's patch and mine: Intercept RtlInstallFunctionTableCallback, but instead of dropping the callback, check whether it's within msmpeg, and if so then we'll record the address of the JIT region and avoid stack-walking it.

If it's any consolation, this crash is only seen on Win7SP1 machines. In newer Windows, msmpeg2vdec.dll no longer uses this callback technique.
Attachment #8863507 - Attachment is obsolete: true
Attachment #8868297 - Flags: review?(aklotz)
Blocks: 1366030
Attachment #8868296 - Flags: review?(aklotz) → review+
Attachment #8868297 - Flags: review?(aklotz) → review+
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/19e4de97f4c6
Cleanup - Remove the DLL blocklist's anonymous namespace. r=aklotz
https://hg.mozilla.org/integration/mozilla-inbound/rev/83ae7a021aaf
Intercept msmpeg2vdec's JIT unwind callback on Win7 x64. r=aklotz
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3436882396c9
Bustage followup - 32-bit builds don't have RtlIFTC. CLOSED TREE
David, is your fix safe enough to uplift to Beta 54? This crash affects 64-bit Firefox and more users will be getting 64-bit in Firefox 54 and 55. In 54, we plan to run a Funnelcake experiment that will serve the 64-bit installer to more users and compare their user engagement metrics against 32-bit users. In 55, we plan to make 64-bit the installer default for all new installs.
Flags: needinfo?(dmajor)
This crash (or at least this variant of it) is highly localized to developers doing profiling on the nightly channel. Given the trouble that always seems to come out of hook patches, I don't think the risk/reward is right for an uplift.
Flags: needinfo?(dmajor)
(In reply to David Major [:dmajor] from comment #72)
> This crash (or at least this variant of it) is highly localized to
> developers doing profiling on the nightly channel. Given the trouble that
> always seems to come out of hook patches, I don't think the risk/reward is
> right for an uplift.

Sounds good. We can just let your fix ride the 55 train as you recommend.
See Also: → 1490637
You need to log in before you can comment on or make changes to this bug.