Closed Bug 1247741 Opened 10 years ago Closed 10 years ago

Crash in every tab on loading - Razer Synapse

Categories

(Core :: General, defect)

x86_64
Windows 10
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox46 blocking fixed
firefox47 + fixed

People

(Reporter: gcp, Assigned: gcp)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file, 2 obsolete files)

Regression in today's Nightly. Every tab crashes all the time. All stacks are like: https://crash-stats.mozilla.com/report/index/19891fe9-b7e4-4bfe-8fbc-70a232160211 https://crash-stats.mozilla.com/report/index/bfe8dae5-b0ff-4d9b-8000-de6522160211 https://crash-stats.mozilla.com/report/index/932f1738-6bc7-41a1-9798-431232160211 This sounds related to bug 1240848 and bug 1241921, although this machine doesn't have an Nvidia card. The stack points to Razer Synapse (headset/mouse drivers) being the culprit, and indeed stopping its services makes Firefox work again.
Welcome to the API hooking hell...
So I guess what's happening here is that Razer is trying to patch LdrLoadDll, but does so in a broken manner that ends up corrupting our own hook "patched_LdrLoadDll". When our hook then runs it crashes due to the corruption.
This reproduces on Windows 8.1 and Windows 10 (at least). It's a regression with Razers' latest driver update. Every Firefox with e10s is broken up to the moment we introduced DLL blacklisting in content.
Crash Signature: [@ RtlInitUnicodeStringEx | SearchPathW ]
I got this in the debugger now. patched_LdrLoadDll is being called with a corrupted filePath pointer: + filePath 0x00002009 <Error reading characters of string.> wchar_t * The flags (0), handle (0) and moduleFileName look ok: moduleFileName = 0x0018c200 {Length=64 MaximumLength=66 Buffer=0x0018c2b8 L"C:\\WINDOWS\\System32\\MMDevApi.dll" } The corrupted filePath eventually causes a crash when we try to call SearchPathW with the invalid pointer. Going up the stack, I see a detour through 0KrakenDevProps,dll that ends up going through a load of combase.dll calls as well (it's also visible in the crash stat above). We eventually get back to our own stacks, which seem to indicate we're actually trying to load nsNativeModuleLoader::LoadModule(mozilla::FileLocation & aFile) Line 125 nsComponentManagerImpl::ManifestBinaryComponent(nsComponentManagerImpl::ManifestProcessingContext & aCx, int aLineNo, char * const * aArgv) Line 698 nsAString_internal = {mData=0x030911e8 L"c:\\mozilla\\mozilla-central\\objdir-desktop\\dist\\bin\\browser\\components\\browsercomps.dll" ...} Okay, now there's an interesting remark at the top of getFullPath: https://dxr.mozilla.org/mozillacentral/rev/ea39d4a6232c278dd8d805608a07cf9f4cc4c76b/mozglue/build/WindowsDllBlocklist.cpp#485 I see us being called with 0x2009 (=8201 dec). So maybe that 4096 magic value can be replaced by 16384 or something? But it might be better to find out what those values actually mean.
Digging through Wine sources, they seem to check whether the moduleFileName contains a path, and whether it is a relative path. If it is, then filePath is used, but otherwise it isn't. I think we might be able to use similar logic to protect this path? Maybe simply try without using the filePath string, and if that fails, add it?
I've traced a Chrome session with API Monitor. As first arguments to LdrLoadDLL I see: 1, 9, 2049 and 8201. DLLs loaded with 8201: # Type Name Pre-Call Value Post-Call Value 4 PVOID* BaseAddress 0x07e2df04 = NULL 0x07e2df04 = 0x5b650000 "C:\Windows\SYSTEM32\devenum.dll" # Type Name Pre-Call Value Post-Call Value 4 PVOID* BaseAddress 0x07e2dc64 = NULL 0x07e2dc64 = 0x5b1d0000 "C:\Windows\SYSTEM32\ksproxy.ax" # Type Name Pre-Call Value Post-Call Value 4 PVOID* BaseAddress 0x07e2db14 = NULL 0x07e2db14 = 0x5b630000 "C:\Windows\SYSTEM32\vidcap.ax" # Type Name Pre-Call Value Post-Call Value 4 PVOID* BaseAddress 0x07e2db14 = NULL 0x07e2db14 = 0x5b610000 "C:\Windows\SYSTEM32\kswdmcap.ax" This looks like it's all audio/video stuff.
Also found bug 1198186. Looking at ntdll in the disassembler now...
(In reply to Gian-Carlo Pascutto [:gcp] from comment #4) > I see us being called with 0x2009 (=8201 dec). So maybe that 4096 magic > value can be replaced by 16384 or something? But it might be better to find > out what those values actually mean. On Win32, any valid pointers cannot point a lower address than 65536 because otherwise it is indistinguishable from atoms that can be any 16-bit values. For example, the GetProp function will take a string or an atom in the second parameter. If the integral value of the parameter is lower than 65536, it will be treated as an atom [1]. Therefore, we can safely bump the boundary-value to 65536. [1] https://msdn.microsoft.com/en-us/library/windows/desktop/ms633564%28v=vs.85%29.aspx
See comment #8 for the rationale.
Assignee: nobody → VYV03354
Status: NEW → ASSIGNED
Attachment #8719728 - Flags: review?(aklotz)
Alright, what I found: ntldll.dll LdrLoadDLL has 2 specific checks for the first variable: - x & 0x401 == 0x401 if so bail with INVALID PARAMETER - In call to LdrpInitializeDllPath, if x & 1 then the value & 0xFFFFFFFE looks like it's used as a flag or it might be a(n index) to a list of search paths and moduleFileName replaces the path. http://source.winehq.org/git/wine.git/blob/refs/heads/master:/dlls/ntdll/loader.c#l2098 https://www.virtualbox.org/svn/vbox/trunk/src/VBox/HostDrivers/Support/win/SUPR3HardenedMain-win.cpp Note the latter also uses: !((uintptr_t)pwszSearchPath & 1) && (uintptr_t)pwszSearchPath >= 0x2000U ? pwszSearchPath : L"<flags>") Both the Wine and ReactOS re-implementations of LdrLoadDLL actually avoid using the first parameter unless they have determined that the second one is a relative path. So what we're doing is not very solid in that regard, but adding all those paths checks to our code is going to be messy and hard to get right. Given what the disassembly of NTDLL shows, and what API tracing shows in comment 6, I think we could simply replace the check by: PWCHAR sanitizedFilePath = (intptr_t(filePath) & 1) ? nullptr : filePath;
(In reply to Masatoshi Kimura [:emk] from comment #8) > On Win32, any valid pointers cannot point a lower address than 65536 because > otherwise it is indistinguishable from atoms that can be any 16-bit values. This seems to match what lpMinimumApplicationAddress returns here. Nevertheless I think you want to add the & 1 check to be safe. (I don't think a wide string is expected to be allocated on an odd address, so that should be quite safe?)
Comment on attachment 8719728 [details] [diff] [review] Bump the valid pointer boundary value to 65536 Review of attachment 8719728 [details] [diff] [review]: ----------------------------------------------------------------- I agree with GCP on this one; it sounds like we should emulate the "real" LdrLoadDll as much as we can here. Furthermore, atoms are Win32 constructs that have no meaning at the NT level, so I don't think that should be our criteria here.
Attachment #8719728 - Flags: review?(aklotz)
Another thing we should probably do is wrap that pointer dereference inside a SEH try block. I think we should still have the explicit checks, but then SEH can catch the rest (if anything).
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #12) > Furthermore, atoms are Win32 constructs > that have no meaning at the NT level, so I don't think that should be our > criteria here. Windows NT maps PAGE_NOACCESS page to the first page to catch null pointer access. And the address space allocation granularity is 64K. So the lowest valid address is 0x10000 anyway. It was the reason the default base address of PE EXE was 0x10000 before Win95. But OK, I'll change the test logic to check the odd address instead of checking <0x10000. (In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #13) > Another thing we should probably do is wrap that pointer dereference inside > a SEH try block. I think we should still have the explicit checks, but then > SEH can catch the rest (if anything). I'm not inclined to swallow exceptions especially inside sensitive locations such as ntdll function hooks. It might lead to more cryptic crash stack: https://blogs.msdn.microsoft.com/oldnewthing/20060927-07/?p=29563
(In reply to Masatoshi Kimura [:emk] from comment #14) > Windows NT maps PAGE_NOACCESS page to the first page to catch null pointer > access. And the address space allocation granularity is 64K. So the lowest > valid address is 0x10000 anyway. It was the reason the default base address > of PE EXE was 0x10000 before Win95. As said, I agree with this reasoning. lpMinimumApplicationAddress is 64k on any Windows I know of (though one can argue whether first arg to LdrLoadDll is guaranteed to be userspace address - I think it is in reality but it's undocumented territory because this function is called from LoadLibraryExW - in any case you provided additional reasons why <64k cannot be a valid pointer). But we need the extra & 1 check because otherwise we risk crashing if a future windows version passes 65537. I am fairly confident I read the code correctly as uneven having a special meaning and it not being a valid pointer - both due to the alignment issue and due to finding the same check in the VirtualBox source. I think in this code it might be better to have both checks? If we would trigger falsely the worst that happens is that some (unknown) DLL is blocked from being loaded inside the Firefox process. If we *miss* a case then it's 100% guaranteed a crash.
1) Check for <64k because those can't be valid pointers. 2) Uneven values are not treated as pointers by LdrLoadDLL so don't try to read them. 3) Not using try/except because of side effects.
Attachment #8720187 - Flags: review?(aklotz)
Assignee: VYV03354 → gpascutto
Comment on attachment 8720187 [details] [diff] [review] Additional checks for pointer validity in LdrLoadDLL detour Review of attachment 8720187 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/build/WindowsDllBlocklist.cpp @@ +486,5 @@ > // path name. For example, its numerical value can be 1. Passing a non-valid > // pointer to SearchPathW will cause a crash, so we need to check to see if we > // are handed a valid pointer, and otherwise just pass nullptr to SearchPathW. > + PWCHAR sanitizedFilePath = nullptr; > + if ((intptr_t(filePath) >= 65536) && ((intptr_t(filePath) & 1) == 0)) { Shouldn't it be uintptr_t? Otherwise this check will fail if a 32-bit OS is booted with /3GB switch or Firefox is running on WOW64 and filePath points a >2GB address. Yes, this is an existing bug, but let's fix it while we are here. I'm surprised this (wrong) check didn't cause serious problems.
(In reply to Masatoshi Kimura [:emk] from comment #17) > Shouldn't it be uintptr_t? Otherwise this check will fail if a 32-bit OS is > booted with /3GB switch or Firefox is running on WOW64 and filePath points a > >2GB address. > > Yes, this is an existing bug, but let's fix it while we are here. I'm > surprised this (wrong) check didn't cause serious problems. Great catch. I think it didn't happen because most of these checks are done near startup when we're unlikely to be using stuff near the end of the virtual address space.
Updated to uintptr_t.
Attachment #8719728 - Attachment is obsolete: true
Attachment #8720187 - Attachment is obsolete: true
Attachment #8720187 - Flags: review?(aklotz)
Attachment #8720705 - Flags: review?(aklotz)
Keywords: crash
Noting this is a topcrash in aurora. Tracking for 46 and 47.
Comment on attachment 8720705 [details] [diff] [review] extra-ptr-valid-check Review of attachment 8720705 [details] [diff] [review]: ----------------------------------------------------------------- OK. We'll see how this goes for now. Yes, in general I agree that IsBadXxxPtr or similar things are bad; in this case I was making an exception (pun intended) because we are implementing an API hook and are not 100% certain of the parameters meaning *for our own purposes*. If we caught an exception caused by our own pointer dereference it would not matter, since we're still passing the same parameter on to the real LdrLoadDll. Should LdrLoadDll throw on the parameter, the exception would still behave as expected. In other words, preserve the exception behavior of the real API, and not change it due to our own interposition.
Attachment #8720705 - Flags: review?(aklotz) → review+
Yeah, but we can do nothing about that if the invalid pointer blows a guard page. The behavior is changed, only less obvious.
https://hg.mozilla.org/integration/mozilla-inbound/rev/759262867c63204d7481f3a763f9efbb5cc18138 Bug 1247741 - Additional checks for pointer validity in LdrLoadDLL detour. r=aklotz
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Comment on attachment 8720705 [details] [diff] [review] extra-ptr-valid-check Approval Request Comment [Feature/regressing bug #]: Windows 8.1 or Windows 10 + e10s + random hardware/drivers [User impact if declined]: Some users will have an unusable Firefox if e10s gets enabled [Describe test coverage new/current, TreeHerder]: None, needs specific drivers to trigger [Risks and why]: Pretty low, tightening up some checks. [String/UUID change made/needed]: NA
Attachment #8720705 - Flags: approval-mozilla-aurora?
Comment on attachment 8720705 [details] [diff] [review] extra-ptr-valid-check Impressive fix for e10s top crash! please uplift to aurora
Attachment #8720705 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Going ahead to call this a release blocker to make sure we get the fix to stick. It would be great to also verify this if anyone has the hardware/drivers.
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: