<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Comment 2

•

8 years ago

The signature has been there for a long time, but it jumps up for 50.0.2 and 50.1.b5. The crashes have suffix 0xfff6 in the address with reason EXCEPTION_ACCESS_VIOLATION_EXEC. The stack from bp-e388f1ec-db23-44e1-a360-a4f8a2161213: ntdll!KiFastSystemCallRet ntdll!ZwWaitForSingleObject+0xc KERNELBASE!WaitForSingleObjectEx+0x98 kernel32!WaitForSingleObjectExImplementation+0x75 kernel32!WaitForSingleObject+0x12 xul!google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread(struct _EXCEPTION_POINTERS * exinfo = 0x0652f6a4, struct MDRawAssertionInfo * assertion = 0x00000000)+0x59 xul!google_breakpad::ExceptionHandler::HandleException(struct _EXCEPTION_POINTERS * exinfo = 0x0652f6a4)+0x73 kernel32!UnhandledExceptionFilter+0x127 ntdll!__RtlUserThreadStart+0x62 ntdll!_EH4_CallFilterFunc+0x12 ntdll!_except_handler4+0x8e ntdll!ExecuteHandler2+0x26 ntdll!ExecuteHandler+0x24 ntdll!RtlDispatchException+0x127 ntdll!KiUserExceptionDispatcher+0xf 0x75c4fff6 0x5b001c kernel32!BaseThreadInitThunk+0xe ntdll!__RtlUserThreadStart+0x70 ntdll!_RtlUserThreadStart+0x1b The changes exist on both 50.0.2 [1] and 50.1.b5 [2] are: Andrew McCreight — Bug 1321066 - Explicitly guard against reentrance in nsSMILTimeContainer. r=dholbert a=dveditz Lee Salzman — Bug 1315848 - Skia clamped gradient fix r=mchang a=ritu Frederik Braun — Bug 1312272 - Test that marquee event handlers are subject to CSP. r=smaug, a=gchang Any ideas? [1] https://hg.mozilla.org/releases/mozilla-release/pushloghtml?fromchange=d936940d5476&tochange=cc272f7d48d3 [2] https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=1b954c82dd04&tochange=9afe68360fa8

Flags: needinfo?(lsalzman)

Flags: needinfo?(janus926)

Flags: needinfo?(continuation)

Comment 3

•

8 years ago

The crashes from bug 1321066 are release asserts, and look very distinct (inside SMIL code), so I don't think that could be the issue.

Flags: needinfo?(continuation)

Comment 4

•

8 years ago

Though we're crashing inside breakpad, so I guess it could be anything. Ted, do you have any idea what might be going wrong here?

Flags: needinfo?(ted)

Updated

•

8 years ago

Component: General → Breakpad Integration

Product: Core → Toolkit

Lee Salzman [:lsalzman]

Comment 5

•

8 years ago

I do not believe fix for bug 131584 is related to this.

Flags: needinfo?(lsalzman)

Comment 6

•

8 years ago

It doesn't seem a crash inside breakpad, but executing code in improper address from the thread function. The stack may be corrupted.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 7

•

8 years ago

Ah, sorry.

Component: Breakpad Integration → General

Flags: needinfo?(ted)

Product: Toolkit → Core

Updated

•

8 years ago

Comment 9

•

8 years ago

There is bug 1308418, bug 1308863, bug 1310600, bug 1292273 that all seem of interest here.

(not currently active) Ted Mielczarek

Comment 10

•

8 years ago

(In reply to Ting-Yu Chou [:ting] from comment #2) > The signature has been there for a long time, but it jumps up for 50.0.2 and > 50.1.b5. The crashes have suffix 0xfff6 in the address with reason > EXCEPTION_ACCESS_VIOLATION_EXEC. The stack from > bp-e388f1ec-db23-44e1-a360-a4f8a2161213: FYI--when you get a stack from a minidump you need to make sure you are looking at the stack starting at the *exception record*. In WinDbg this requires running the command `.ecxr`. I don't remember if there's any special action required in Visual C++. Otherwise you see the obvious answer that yes, we write minidumps from Breakpad code. :)

(not currently active) Ted Mielczarek

Comment 11

•

8 years ago

The crashes from comment 0 and comment 2 both have the same stack on the main thread: 0 ntdll.dll KiFastSystemCallRet 1 user32.dll NtUserMessageCall 2 user32.dll ImeWndCreateHandler 3 user32.dll SendMessageW 4 xul.dll MessageWindow::SendRequest() toolkit/xre/nsNativeAppSupportWin.cpp:591 5 xul.dll nsNativeAppSupportWin::Start(bool*) toolkit/xre/nsNativeAppSupportWin.cpp:682 6 xul.dll XREMain::XRE_mainStartup(bool*) toolkit/xre/nsAppRunner.cpp:3745 7 xul.dll XREMain::XRE_main(int, char** const, nsXREAppData const*) toolkit/xre/nsAppRunner.cpp:4409 8 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp:4515 Frame 4 is https://hg.mozilla.org/releases/mozilla-release/annotate/cc272f7d48d3/toolkit/xre/nsNativeAppSupportWin.cpp#l591 The Firefox process is trying to remote to an existing Firefox process.

Comment 12

•

8 years ago

For some reasons this happens only on Windows 7.

OS: Windows → Windows 7

Kan-Ru Chen [:kanru] (UTC+9)

Comment 13

•

8 years ago

Based on the search results [1], the number of crashes in the past 6 months is 9685, only 59 (<1%) of them happened before 2016-12-02. And there were 1112 crashes at 2016-12-03 [2], the versions include 1x to 5x, maybe this is something related to the platform? I'll get a Windows 7 to reproduce. [1] https://crash-stats.mozilla.com/signature/?product=Firefox&address=%24fff6&signature=BaseThreadInitThunk&date=%3E%3D2016-06-20T08%3A14%3A00.000Z&date=%3C2016-12-20T08%3A14%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#graphs [2] https://crash-stats.mozilla.com/signature/?product=Firefox&address=%24fff6&signature=BaseThreadInitThunk&date=%3E%3D2016-12-03T00%3A00%3A00.000Z&date=%3C2016-12-04T00%3A00%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#aggregations

Updated

•

8 years ago

Crash Signature: [@ BaseThreadInitThunk] → [@ BaseThreadInitThunk] [@ igd11dxva32.dll | BaseThreadInitThunk]

Patrick McManus [:mcmanus]

Comment 14

•

8 years ago

Note: every crash I've looked at (51.0b9) has thread 0 sitting in mozilla::net::nsSocketTransportService::GetThreadSafely() or some related STS code. Checking what network code changed around then (or maybe xpcom thread code?) would probably help point out the regression source. -> Networking, NI mcmanus

Component: General → Networking

Flags: needinfo?(mcmanus)

Comment 15

•

8 years ago

nothing comes to mind.. but honestly that's far enough back that if it wasn't a major change git is a better resource than my memory.

Flags: needinfo?(mcmanus)

Dragana Damjanovic [:dragana]

Comment 16

•

8 years ago

I think it is not related, but in bug 1323084 we have some shutdown hangs in Thread shutdown code. The starting date is different, I would say not related.

Comment 17

•

8 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #11) > The Firefox process is trying to remote to an existing Firefox process. It doesn't seem related, bp-4a29ceec-2bbe-41b3-9423-9e31b2161210 is a similar crash but the main thread has different stack, and it crashed right after installation. From the stack content of the crashing thread, it seems the thread function pointer for _RtlUserThreadStart is invalid.

Michal Novotny [:michal]

Updated

•

8 years ago

Component: Networking → General

Reporter

Comment 18

•

8 years ago

this startup crash is still somewhat on the rise in the last couple of days: http://bit.ly/2hHKbhu maybe the following crash comment provides some insights of what's going on: "how do i fix this because ever since i downloaded this thing it installed virus (nova rambler) and then it messed up all my browsers so i removed parts of it but it still messes some of my browsers except opera" https://crash-stats.mozilla.com/report/index/898bff61-cb26-4aab-a195-a5dab2170101

Comment 19

•

8 years ago

I noticed two comments: bp-688468cf-9d28-47fb-956e-dfe442170108: I installer some download manager. Mozilla start crashing after installing the software. I uninstalled the software but it keeps crashing my browser bp-1adfb644-b551-4921-b55f-5443e2170103: I DON'T EVEN KNOW WHAT HAPPENED. I JUST DOWNLOADED IDM AND THIS HAPPENED So I installed IDM (Internet Download Manager v627build2) on Windows7, and then it crashes every time I open a page. The crash goes away after uninstalling it and it doesn't crash in safe mode, so should be related to the addon. However the crash stack/reason don't match to what were reported, see bp-aaacb7a5-230e-45dc-977f-44edd2170109.

Comment 20

•

8 years ago

If you restrict the search to crashes that have the addons field defined (https://crash-stats.mozilla.com/search/?signature=%3DBaseThreadInitThunk&addons=%21__null__&product=Firefox&version=51.0b&_sort=-date&_facets=signature&_facets=addons&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-addons), there are a few malware addons, e.g.: 12 {de71f09a-3342-48c5-95c1-4b0f17567554} 74 13.86 % 13 ar1er-ewrgfdgomusix@jetpack 56 10.49 % https://dxr.mozilla.org/addons/source/addons/671633 https://dxr.mozilla.org/addons/source/addons/707711 Andy, could you look at the other addons in that list and see if they are malware? (In reply to Ting-Yu Chou [:ting] from comment #19) > So I installed IDM (Internet Download Manager v627build2) on Windows7, and > then it crashes every time I open a page. The crash goes away after > uninstalling it and it doesn't crash in safe mode, so should be related to > the addon. > > However the crash stack/reason don't match to what were reported, see > bp-aaacb7a5-230e-45dc-977f-44edd2170109. Perhaps open a new bug for this crash and CC charles@tonec.com (working at Tonec, which builds Internet Download Manager).

Flags: needinfo?(amckay)

Jorge Villalobos [:jorgev] (he/him)

Updated

•

8 years ago

Depends on: 1329654

Andy McKay

Comment 21

•

8 years ago

Re-directing to the review team who are the people who implement policy around malware.

Flags: needinfo?(amckay) → needinfo?(jorge)

Comment 22

•

8 years ago

Covered in bug 1329654.

Flags: needinfo?(jorge)

Comment 23

•

8 years ago

Is it possible that the thread was created from a module which have been unloaded before the thread runs? Ted, will there be any concerns if we include (MiniDumpWithUnloadedModules | MiniDumpWithProcessThreadData) in the minidump?

Flags: needinfo?(ted)

Comment 24

•

8 years ago

The crash has been spiking even more since 13 Jan. Some users are mentioning IDM.

Comment 25

•

8 years ago

https://crash-stats.mozilla.com/search/?signature=%3DBaseThreadInitThunk&user_comments=~download&user_comments=~idm&user_comments=~virus&user_comments=~malware&product=Firefox&date=%3E%3D2016-10-16T14%3A50%3A27.000Z&date=%3C2017-01-16T14%3A50%3A27.000Z&_sort=-date&_facets=signature&_facets=user_comments&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-user_comments A lot of users are mentioning IDM, or viruses. Some users are saying that the crash started after a virus scan found and removed something. Is it possible some antiviruses are removing files from our installation directory?

Deepak

Comment 26

•

8 years ago

Marco/Jorge, I left my comment on bug - https://bugzilla.mozilla.org/show_bug.cgi?id=1329654 to get more information for addon ID {de71f09a-3342-48c5-95c1-4b0f17567554} I am developer of the addon and trying to get information on what issues are you guys noticing so that we can get it fixed. The extensions uses approved apis and we present complete disclosure to users while extension is installed. In our testing we find no issues with the extension on all versions on Windows 7 and above.

Flags: needinfo?(mcastelluccio)

Flags: needinfo?(jorge)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 27

•

8 years ago

Replied in bug 1329654.

Flags: needinfo?(mcastelluccio)

Flags: needinfo?(jorge)

Comment 28

•

8 years ago

There are also shutdown crashes that present pretty much exactly like the startup ones (e.g., https://crash-stats.mozilla.com/report/index/d29c784e-3824-45d0-b1c1-1ed6a2170117) perhaps those, if related, could be used for metadata/correlations?

Milan Sreckovic [:milan] (needinfo for best results)

Comment 29

•

8 years ago

When this blew up in 2011 (bug 601587) it came and went without us figuring out what's going on, except perhaps for bug 601587 comment 21, a correlation with "windows search" - http://systemexplorer.net/file-database/file/msshsq-dll/5844765

Comment 30

•

8 years ago

(In reply to Milan Sreckovic [:milan] from comment #28) > There are also shutdown crashes that present pretty much exactly like the > startup ones (e.g., > https://crash-stats.mozilla.com/report/index/d29c784e-3824-45d0-b1c1- > 1ed6a2170117) perhaps those, if related, could be used for > metadata/correlations? Thread 2 may be interesting; it's about the only thread not sitting in an NtWaitFor.... or equivalent: nsLayoutStylesheetCache::HTMLSheet() / FinalExceptionHandlerPad2 / wcstombs / Win32FileOutputProvider::Seek / WriteFile / NtWriteFile / NtgetcontextThread (There's also one thread with no stack at all, but this one is interesting) I don't really believe the HTMLSheet() frame, but the rest seems consistent

Milan Sreckovic [:milan] (needinfo for best results)

Comment 31

•

8 years ago

I'd love it if we figured this out but I don't think it should block 51 release at this point - since it's already a bad issue in 50. It seems a little odd that the signature doesn't appear in 52/53.

status-firefox50: affected → wontfix

tracking-firefox50: ? → -

tracking-firefox52: --- → ?

Comment 32

•

8 years ago

A number of these are safe mode crashes (https://crash-stats.mozilla.com/report/index/34892fad-9bbb-4859-9a9e-cb5f82170119)

(not currently active) Ted Mielczarek

Comment 33

•

8 years ago

(In reply to Ting-Yu Chou [:ting] from comment #23) > Ted, will there be any concerns if we include (MiniDumpWithUnloadedModules | > MiniDumpWithProcessThreadData) in the minidump? I don't think that should cause any problems. You'll have to get that information out using a debugger, but that should be OK.

Flags: needinfo?(ted)

Comment 34

•

8 years ago

(In reply to Ting-Yu Chou [:ting] from comment #23) > Is it possible that the thread was created from a module which have been > unloaded before the thread runs? Or it could be something like CreateRemoteThread.

Updated

•

8 years ago

Comment 35

•

8 years ago

This seems to be the #1 browser crash in 52.0b

status-firefox52: ? → affected

tracking-firefox52: ? → +

Comment 36

•

8 years ago

It's the top crash for 51.0.1 still. I'm blocking updates (excluding in Play Store) for a few more of the devices most severely affected by the startup crash: Dell Venue 8 yellowtail ZTE V975– redhookbay acer B1-730HD vespa AcerA1-830– ducati AsusTransformer Pad (TF103CG)– K018 With these blocked, I am going to go ahead and enable updates for 51.0 Fennec.

Comment 37

•

8 years ago

Oops, I just commented in the wrong bug ! Should be in bug 1318667.

Comment 38

•

8 years ago

I've contacted one of the users who experienced the crash. He said he downloaded "Robosizer" from http://www.robosizer.com/download.html, which redirected to http://www.snapfiles.com/get/robosizer.html, which redirected to http://www.snapfiles.com/php/sfdwnld.php?id=112684&a=7144899&loc=2. However, he was able to simply remove it and, after that, Firefox started working again.

Comment 39

•

8 years ago

He also said: "I wonder if Robosizer works in Windows 10 machines. I have another desktop pc running Win7, and Robosizer works as it should, without causing problems.".

Reporter

Comment 40

•

8 years ago

Attached file RoboSizer_Setup.zip — Details

i can corroborate the observations of the user - installing the robosizer tool (see .zip) on win10 caused crashing or silent failures when trying to launch firefox or thunderbird, whereas on windows 7 it caused no apparent issues.

Reporter

Updated

•

8 years ago

Comment 41

•

8 years ago

Another user replied, he downloaded a file from http://sharkdownloads.com/compressdecompress/winrar-5-31-final-crack-full/. He had to remove the Firefox installation directory and install Firefox again to get it working again.

Comment 42

•

8 years ago

Attached file RunBoosterSetup64_3231.7z — Details

This is the only file I could download from that page (from one of the comments). I haven't tested it yet.

Reporter

Comment 43

•

8 years ago

i wasn't able to reproduce crashes by anything that i could download from that page in comment #41 :-(

Updated

•

8 years ago

status-firefox51: affected → wontfix

status-firefox54: --- → ?

Comment 44

•

8 years ago

Robosizer has NtCreateThreadEx() called with invalid function pointer with stack: : kd> !wow64exts.k Walking Native Stack... # Child-SP RetAddr Call Site 00 ffff8701`eceb32e8 fffff801`f90c2fab nt!PspCreateThread 01 ffff8701`eceb32f0 fffff801`f8d5e393 nt!NtCreateThreadEx+0x1ef 02 ffff8701`eceb3a90 00007fff`76e17784 nt!KiSystemServiceCopyEnd+0x13 03 00000000`03a9e848 00000000`50061790 ntdll!NtCreateThreadEx+0x14 04 00000000`03a9e850 00000000`500610de wow64!Wow64NtCreateThreadEx+0xc8 05 00000000`03a9e8d0 00000000`50056ea5 wow64!whNtCreateThreadEx+0x41e 06 00000000`03a9eab0 00000000`50131cf7 wow64!Wow64SystemServiceEx+0x155 07 00000000`03a9f370 00000000`5006bfa1 wow64cpu!ServiceNoTurbo+0xb 08 00000000`03a9f420 00000000`5005cbb0 wow64!RunCpuSimulation+0xf311 09 00000000`03a9f4a0 00007fff`76dea10d wow64!Wow64LdrpInitialize+0x120 0a 00000000`03a9f750 00007fff`76de9fae ntdll!_LdrpInitialize+0x109 0b 00000000`03a9f7d0 00000000`00000000 ntdll!LdrInitializeThunk+0xe Walking Guest (WoW) Stack... # ChildEBP RetAddr 00 03b9f134 10008c3d ntdll_776c0000!NtCreateThreadEx+0xc WARNING: Stack unwind information not available. Following frames may be wrong. 01 03b9f82c 76f49fb3 EasyHook32!RhIsX64Process+0x23d 02 (Inline) -------- combase!CMessageCall::CleanupMonitorLock+0x5 [d:\rs1\onecore\com\combase\dcomrem\call.cxx @ 578] 03 03b9fc60 00000000 combase!CMessageCall::~CMessageCall+0x43 [d:\rs1\onecore\com\combase\dcomrem\call.cxx @ 522] I am not sure how does Robosizer use EasyHook, but after I replace "C:\Program Files (x86)\RoboSizer\EasyHook32.dll" with the latest version (2.7.6035.0), Firefox *does not* crash.

Comment 45

•

8 years ago

(In reply to Ting-Yu Chou [:ting] from comment #44) > Robosizer has NtCreateThreadEx() called with invalid function pointer with Actually it's not an invalid function pointer for NtCreateThreadEx(), I just sorted things out for what happen to Robosizer, and will update the details to bug 1335080 because I don't know how many of this crash come from Robosizer.

Comment 46

•

8 years ago

I just realized the crashes are happened with cpu arch x86, sadly we don't have one in Taipei office.

Reporter

Comment 47

•

8 years ago

cpu_arch x86 in socorro only means that it is a 32bit build of firefox that is crashing afaik...

Comment 48

•

8 years ago

Thanks for the information, then I'd like to confirm the meaning of the "os_arch" in correlation tab: (96.27% in signature vs 35.68% overall) os_arch = x86 [99.42% vs 54.79% if startup_crash = 1] Is the x86 here means the OS is a 32bit one? I know it's a silly question, but I want to make sure I understand it correctly.

Comment 49

•

8 years ago

(In reply to Ting-Yu Chou [:ting] from comment #48) > Thanks for the information, then I'd like to confirm the meaning of the > "os_arch" in correlation tab: > > (96.27% in signature vs 35.68% overall) os_arch = x86 [99.42% vs 54.79% if > startup_crash = 1] > > Is the x86 here means the OS is a 32bit one? I know it's a silly question, > but I want to make sure I understand it correctly. Not a silly question, the wording is confusing and could be improved! os_arch is the architecture of the OS (detected from the amount of virtual memory available), cpu_arch is the arch for which Firefox was built. So yes, here the OS is almost always 32 bit.

Comment 50

•

8 years ago

Great, I'll reinstall the Win7 to 32bit and give it another try to reproduce the crash.

Comment 51

•

8 years ago

This is now top #3 at release channel (51.0.1).

Keywords: topcrash

Updated

•

8 years ago

Comment 52

•

8 years ago

(In reply to Ting-Yu Chou [:ting] from comment #50) > Great, I'll reinstall the Win7 to 32bit and give it another try to reproduce > the crash. Still unable to reproduce, tried also IDB but didn't crash. :(

Comment 53

•

8 years ago

(In reply to Ting-Yu Chou [:ting] from comment #52) > Still unable to reproduce, tried also IDB but didn't crash. :( Correction: IDM

Comment 54

•

8 years ago

An interesting comment from bp-002a95d8-952a-47ba-868d-ec1912170217: It seems that installing RemoveWAT causes Firefox to become unusable. From console output, I'm suspicious that it modified some files. I'll see if I can reproduce.

Comment 55

•

8 years ago

I installed RemoveWAT 2.2.7 from https://www.sendspace.com/file/42onal, which installed a bunch of malwares. Firefox now keeps opening ad pages every few seconds but it doesn't crash every time I launch it, though when it crashes the signature is matched and the address is ended with "fff6" which takes 50% of the crashes in the past week. I am writing to the user to get the RemoveWAT he/she installed.

Comment 56

•

8 years ago

The user doesn't keep the RemoveWAT he installed. However, from the binary I have, the injected code for the thread is doing: 00020000 55 push ebp 00020001 8bec mov ebp,esp 00020003 51 push ecx 00020004 53 push ebx 00020005 56 push esi 00020006 57 push edi 00020007 8b7d08 mov edi,dword ptr [ebp+8] 0002000a 8d4720 lea eax,[edi+20h] 0002000d 50 push eax 0002000e 57 push edi 0002000f ff97c8010000 call dword ptr [edi+1C8h] // [1] 00020015 50 push eax 00020016 ff97cc010000 call dword ptr [edi+1CCh] // [2] CRASH! The first call [1] goes to LoadLibraryA() for kernel32.dll, and the second call [2] goes to apphelp!StubGetProcAddress() which is invalid for Firefox process as apphelp.dll is not loaded. Robosizer (bug 1335080) is similiar, just it calls AcLayers!NS_Armadillo::APIHook_GetProcAddress(). This is the reason for the crashes with address suffix "fff6". But I don't know why the number went high after the specific date, maybe a new malware(?) was born that day...

Comment 57

•

8 years ago

What if we load apphelp.dll earlier by ourselves?

Comment 58

•

8 years ago

What do you think if we add something like: https://hg.mozilla.org/mozilla-central/annotate/e1135c6fdc9bcd80d38f7285b269e030716dcb72/toolkit/mozapps/update/updater/loaddlls.cpp#l11 to preload apphelp.dll in firefox.exe? Or do you have some other ideas to address comment 56?

Flags: needinfo?(benjamin)

Comment 59

•

8 years ago

I don't think I'm the right person to be asking. I haven't done any debugging of this crash. But updater.exe isn't what's crashing here, so I confident that you don't want to be modifying the updater DLL loading mechanism. What is apphelp? Why would we want it loaded at all?

Flags: needinfo?(benjamin)

Comment 60

•

8 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #59) > debugging of this crash. But updater.exe isn't what's crashing here, so I > confident that you don't want to be modifying the updater DLL loading > mechanism. What I am thinking is a similar DLL loading mechanism for firefox.exe. > What is apphelp? Why would we want it loaded at all? Honestly I don't know what is apphelp.dll, there's not much information about it, the answer from google is: it's a part of Application Compatibility Client Library from Microsoft. This issue is about dll injection, which a 3rd party software injects a piece of code into firefox process, and the code calls the function pointers of LoadLibrary() and GetProcAddress() that it retrieved earlier in its process. However, somehow apphelp.dll or AcLayers.dll was loaded in its process, so the function pointer of GetProcAddress() is actually apphelp!StubGetProcAddress() or AcLayers!NS_Armadillo::APIHook_GetProcAddress() which is invalid for firefox process, because they haven't been loaded when the code runs. Do you know anyone who is familiar with this that I can talk to?

Comment 61

•

8 years ago

If people are dynamically adding broken code to our process, I don't think we should try to fix that by preloading DLLs that we don't need ourselves. Each DLL load costs startup I/O. If we can't resolve this with DLL blocklisting, I recommend WONTFIX.

Comment 62

•

8 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #61) > If people are dynamically adding broken code to our process, I don't think > we should try to fix that by preloading DLLs that we don't need ourselves. > Each DLL load costs startup I/O. > > If we can't resolve this with DLL blocklisting, I recommend WONTFIX. As of today this is still top #3 crash (note it is a startup one) in release channel, shouldn't we try to prevent us from being crashed by this kind of code? User probably wouldn't know which software breaks firefox, and after trying reinstall, they may turn to the other browsers and blame us. We can measure the cost of preloading DLLs and then decide whether it's worth or not. Or we can prohibit the other processes to set any piece of memory in firefox process executable (I don't know if it's doable). Either way, I think we should do something. :njn, what do you think? :dmajor, :aklotz, do you have any other ideas how we could deal with this issue (see comment 60 for the brief description)?

Flags: needinfo?(n.nethercote)

Flags: needinfo?(dmajor)

Flags: needinfo?(aklotz)

Comment 63

•

8 years ago

(In reply to Ting-Yu Chou [:ting] from comment #56) > 0002000f ff97c8010000 call dword ptr [edi+1C8h] // [1] > 00020015 50 push eax > 00020016 ff97cc010000 call dword ptr [edi+1CCh] // [2] CRASH! > > The first call [1] goes to LoadLibraryA() for kernel32.dll, and the second > call [2] goes to apphelp!StubGetProcAddress() which is invalid for Firefox > process as apphelp.dll is not loaded. If apphelp is not loaded, how do you know that this call is going to apphelp!<something> ?

Comment 64

•

8 years ago

I think the fix alluded to by bug 1335080 comment 2 is: https://easyhook.codeplex.com/SourceControl/changeset/74620 So the process that's doing the injection is taking its own function pointer for GetProcAddress and expecting the target's (Firefox's) address space to have the same function at the same address? And things are going wrong because one half of this pair is using apphelp and the other isn't? I really don't think we should be loading up apphelp just to let some code use a dodgy injection technique on us. I'll let aklotz chime in if he has any techniques to block this app, otherwise we should look for a non-code solution (contact AVs, support pages, etc).

Flags: needinfo?(dmajor)

Comment 65

•

8 years ago

(In reply to David Major [:dmajor] from comment #63) > If apphelp is not loaded, how do you know that this call is going to > apphelp!<something> ? I switched (.process) to the suspicious process, and used "ln <function pointer>".

Comment 66

•

8 years ago

> Honestly I don't know what is apphelp.dll, there's not much information > about it, the answer from google is: it's a part of Application > Compatibility Client Library from Microsoft. Apphelp (and AcLayers) are Windows code that provides backwards-compatibility fixes by shimming APIs, e.g. making GetWindowsVersion lie to keep some old program happy. I don't remember the details but I've seen this stuff used for questionable purposes in the past, since it's a way to introduce code into a process well before main().

Nicholas Nethercote [inactive]

Comment 67

•

8 years ago

While I don't want to enable evil code injection, startup crashes also mean dead-to-users installs. Perhaps if we detect a startup crash in a previous run we can start in maximum-chance-of-starting mode (safe mode), and if *that* crashes, try enabling apphelp (and once it comes up, giving the user a warning that there may be malware affecting their Firefox installation and provide a link to more info). We can discuss if we want to allow browsing or just tell the user and exit.

Comment 68

•

8 years ago

I'm in favour of any and all reasonable techniques that block third party code injection. I don't have much to add beyond that, sorry.

Flags: needinfo?(n.nethercote)

Wayne Mery (:wsmwk)

Updated

•

8 years ago

Whiteboard: [tbird crash]

Comment 69

•

8 years ago

One of the action items for the InjectEject project is to intercept BaseThreadInitThunk and examine the thread start address for known malicious entry points (such as LoadLibrary). A simple mitigation for this could be to add checks inside that hook to ensure that BaseThreadInitThunk's thread start address points to valid executable memory. If we wanted to be even more aggressive we could ensure that said executable memory was backed by an image. But this hook doesn't exist yet! :-)

Flags: needinfo?(aklotz)

Comment 70

•

8 years ago

(In reply to Aaron Klotz [:aklotz] from comment #69) > One of the action items for the InjectEject project is to intercept > BaseThreadInitThunk and examine the thread start address for known malicious > entry points (such as LoadLibrary). > > A simple mitigation for this could be to add checks inside that hook to > ensure that BaseThreadInitThunk's thread start address points to valid > executable memory. If we wanted to be even more aggressive we could ensure > that said executable memory was backed by an image. > > But this hook doesn't exist yet! :-) Is there a bug tracking the addition of this hook?

Comment 71

•

8 years ago

Attached patch Hook BaseThreadInitThunk and verify thread start address (obsolete) — Details — Splinter Review

Ah heck, I can do this right now. This patch adds some code to the DLL blocklist (for the lack of a better location) that hooks kernel32!BaseThreadInitThunk and verifies that the thread start address belongs to VM that is committed, execute-read, and backed by an image. If that test fails, the thread exits.

Attachment #8850169 - Flags: review?(dmajor)

Comment 72

•

8 years ago

Comment on attachment 8850169 [details] [diff] [review] Hook BaseThreadInitThunk and verify thread start address Ah crap, this needs 64-bit interceptor changes.

Attachment #8850169 - Flags: review?(dmajor)

Comment 73

•

8 years ago

[Tracking Requested - why for this release]: Top startup crash in 52, even worse than in 51.

status-firefox53: ? → affected

status-firefox54: ? → ---

tracking-firefox53: --- → ?

Comment 74

•

8 years ago

Attached patch Add new x64 opcodes to interceptor (obsolete) — Details — Splinter Review

This adds support for: test r/m32, r32 jne rel8 and expands x64 support by also accepting REX.WRB and REX.WB

Assignee: nobody → aklotz

Attachment #8850209 - Flags: review?(m_kato)

Comment 75

•

8 years ago

Comment on attachment 8850169 [details] [diff] [review] Hook BaseThreadInitThunk and verify thread start address OK, now this patch works.

Attachment #8850169 - Flags: review?(dmajor)

Comment 76

•

8 years ago

Comment on attachment 8850169 [details] [diff] [review] Hook BaseThreadInitThunk and verify thread start address Review of attachment 8850169 [details] [diff] [review]: ----------------------------------------------------------------- I am concerned about this code not providing any feedback if things go wrong, for example: - Someone already hooked BaseThreadInitThunk in a way that our disassembler no longer understands - Silently blocking some "good" threads that fail our requirements - Some future version of Windows adds a new protection flag that makes us block the world We may want to add some logging about what we blocked or why we failed. +bsmedberg in case he has other concerns in this regard. ::: mozglue/build/WindowsDllBlocklist.cpp @@ +277,5 @@ > } > > namespace { > > +typedef void (__fastcall* BaseThreadInitThunk_func)(BOOL aIsInitialThread, void* aStartAddress, void* aThreadParam); Please double-check that this signature looks the same on all platforms (including XP if this is going to be uplifted). @@ +700,5 @@ > > return stub_LdrLoadDll(filePath, flags, moduleFileName, handle); > } > > +static bool This could use a comment explaining why the lack of these conditions means we should block the thread. @@ +701,5 @@ > return stub_LdrLoadDll(filePath, flags, moduleFileName, handle); > } > > +static bool > +ShouldBlockThread(void* aStartAddress, void* aThreadParam) Nit: aThreadParam unused @@ +707,5 @@ > + bool shouldBlock = false; > + MEMORY_BASIC_INFORMATION startAddressInfo; > + if (VirtualQuery(aStartAddress, &startAddressInfo, sizeof(startAddressInfo))) { > + shouldBlock |= startAddressInfo.State != MEM_COMMIT; > + shouldBlock |= startAddressInfo.Protect != PAGE_EXECUTE_READ; Just want to confirm: are these really the only acceptable values? Is there any valid scenario in which code might be demand-committed or RWX, for example?

Attachment #8850169 - Flags: feedback?(benjamin)

Comment 77

•

8 years ago

It would be so amazing to be able to prevent this! Tracking, in a hopeful way, for 53 onwards.

status-firefox54: --- → affected

status-firefox55: --- → affected

tracking-firefox53: ? → +

tracking-firefox54: --- → +

tracking-firefox55: --- → +

Comment 78

•

8 years ago

(In reply to David Major [:dmajor] from comment #76) > Comment on attachment 8850169 [details] [diff] [review] > Hook BaseThreadInitThunk and verify thread start address > > Review of attachment 8850169 [details] [diff] [review]: > ----------------------------------------------------------------- > > I am concerned about this code not providing any feedback if things go > wrong, for example: > - Someone already hooked BaseThreadInitThunk in a way that our disassembler > no longer understands In that case the original function remains untouched, which makes us no worse off than we are now. > - Some future version of Windows adds a new protection flag that makes us > block the world I think that's a risk we take any time we do something like this. It's not much different from LdrLoadDll in that regard. > > We may want to add some logging about what we blocked or why we failed. This is fair. I suppose we could somehow record the actual protection attributes of any blocked start addresses. > +bsmedberg in case he has other concerns in this regard. > > ::: mozglue/build/WindowsDllBlocklist.cpp > @@ +277,5 @@ > > } > > > > namespace { > > > > +typedef void (__fastcall* BaseThreadInitThunk_func)(BOOL aIsInitialThread, void* aStartAddress, void* aThreadParam); > > Please double-check that this signature looks the same on all platforms > (including XP if this is going to be uplifted). It is identical from Windows 7 through to Windows 10 Anniversary Update on both 32-bit and 64-bit implementations. > > @@ +700,5 @@ > > > > return stub_LdrLoadDll(filePath, flags, moduleFileName, handle); > > } > > > > +static bool > > This could use a comment explaining why the lack of these conditions means > we should block the thread. Sure. > > @@ +701,5 @@ > > return stub_LdrLoadDll(filePath, flags, moduleFileName, handle); > > } > > > > +static bool > > +ShouldBlockThread(void* aStartAddress, void* aThreadParam) > > Nit: aThreadParam unused Ah, yeah, I plan to use that in the future but it isn't needed yet. I'll remove for now. > > @@ +707,5 @@ > > + bool shouldBlock = false; > > + MEMORY_BASIC_INFORMATION startAddressInfo; > > + if (VirtualQuery(aStartAddress, &startAddressInfo, sizeof(startAddressInfo))) { > > + shouldBlock |= startAddressInfo.State != MEM_COMMIT; > > + shouldBlock |= startAddressInfo.Protect != PAGE_EXECUTE_READ; > > Just want to confirm: are these really the only acceptable values? > Is there any valid scenario in which code might be demand-committed or RWX, > for example? At least part of this is a policy decision. I don't really think that we should be allowing RWX, though perhaps we should punt on that for the purposes of this bug, just check for X, and worry about the rest as part of InjectEject efforts. If we added recording of thread blocks, we could follow up on this. I would tend to think that any legitimate binaries would not be using RWX, but I suppose that isn't an assumption we can make. As for the demand-commit thing, here is my thought process: We know we're dealing with the start address of the thread. Let us hypothetically consider the possibility that its page is intentionally inaccessible. BaseThreadInitThunk invokes the thread function, an AV exception is raised (as is the case with whatever is causing this crash right now). Where can user code catch that AV exception in order to perform the delayed commit? The only thing I can think of that could make this happen is if somebody were to override our unhandled exception filter, which our breakpad integration code actively tries to prevent.

Comment 79

•

8 years ago

> As for the demand-commit thing, here is my thought process: We know we're > dealing with the start address of the thread. Let us hypothetically consider > the possibility that its page is intentionally inaccessible. > BaseThreadInitThunk invokes the thread function, an AV exception is raised > (as is the case with whatever is causing this crash right now). Where can > user code catch that AV exception in order to perform the delayed commit? In a VectoredExceptionHandler. The Win64 ASan runtime does this for shadow-memory pages. I admit though that taking this approach for actual executable code does seem a bit farfetched.

Comment 80

•

8 years ago

> > - Someone already hooked BaseThreadInitThunk in a way that our disassembler > > no longer understands > > In that case the original function remains untouched, which makes us no > worse off than we are now. Let's say we ship this and it doesn't fix the current crash. True, we're no worse off, but we still have a high-impact crash on our hands. It would be useful to get some diagnostics to inform a subsequent attempt at a fix.

Comment 81

•

8 years ago

Not all code injection cause crash like this, shouldn't we leave user the choice (they may be aware of it) whether to run it or not? If our goal is to get more users on firefox...

Makoto Kato [:m_kato]

Updated

•

8 years ago

Attachment #8850209 - Flags: review?(m_kato) → review+

Comment 82

•

8 years ago

Should this land on m-c? Do you want to land it behind a pref and run some kind of experiment? If it needs a policy decision, who should make that?

Flags: needinfo?(aklotz)

Comment 83

•

8 years ago

Comment on attachment 8850169 [details] [diff] [review] Hook BaseThreadInitThunk and verify thread start address I feel like the MEM_IMAGE check could be pretty risky, and I think it's worth putting some numbers on this before rolling it out. In particular, here are the things we don't know: * how many people could be affected by this today * whether we'd accidentally be breaking something real * whether this would turn crashes into hangs Elaborating on this last bit: a common thread launch mechanism is to launch a thread and then wait on an event for the thread (immediately or eventually). If the thread never runs, that wait is going to end up hanging. And in general I'd say a hang is worse than a crash. Also I'm a little worried that this code will break crashfirefox.exe. In terms of risk, can we measure this before we start blocking things?

Attachment #8850169 - Flags: feedback?(benjamin) → feedback-

Comment 84

•

8 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #83) > Comment on attachment 8850169 [details] [diff] [review] > Hook BaseThreadInitThunk and verify thread start address > Elaborating on this last bit: a common thread launch mechanism is to launch > a thread and then wait on an event for the thread (immediately or > eventually). If the thread never runs, that wait is going to end up hanging. > And in general I'd say a hang is worse than a crash. > From the perspective of the kernel, the thread *does* run, it's just that the user's thread proc does not. When we "block" the thread from running, we are already running as that new thread, so we just call ExitThread on ourselves, thus "completing" the thread's execution as far as any kernel handles are concerned. > Also I'm a little worried that this code will break crashfirefox.exe. We can exempt null pointers. > > In terms of risk, can we measure this before we start blocking things? Sure. But I'm going to turn this over to ccorcoran go from here.

Assignee: aklotz → ccorcoran

Flags: needinfo?(aklotz)

Comment 85

•

8 years ago

FWIW, if we can land this without MEM_IMAGE now (very low risk, should fix the crash at hand?) and then deal with the MEM_IMAGE measurement and landing later, that would probably help keep this moving.

Comment hidden (mozreview-request)

Assignee

Comment 89

•

8 years ago

Ok my first checkin; seems the review requests went overkill. Let me know if I need to clean anything up. Basically the last commit contains my changes: - NULL thread procs are allowed (and will thus crash). This allows crashfirefox.exe to work. Send ((LPTHREAD_START_ROUTINE)1) to silently block the thread. - Removed the check for MEM_IMAGE. agree this would be safer - I'm afraid of calling ExitThread() from BaseThreadInitThunk; if we want NOP behavior, I think it's slightly safer to use an empty thread proc. Just in case the original BaseThreadInitThunk does magic that's needed. > In terms of risk, can we measure this before we start blocking things? Shall I create another bug to add measurement? Seems like a good idea to annotate crash reports with this. Ting: > Not all code injection cause crash like this, shouldn't we leave user the choice (they may be aware of it) whether to run it or not? If our goal is to get more users on firefox... IMO this is a strange option to give to the user. By the time we give the user the choice, we know for certain that clicking "Continue" would crash.

Flags: needinfo?(benjamin)

Comment 90

•

8 years ago

mozreview-review

Comment on attachment 8852415 [details] Bug 1322554: don't block threads based on MEM_IMAGE; safer exiting thread; don't block null threadproc to allow crashfirefox.exe; https://reviewboard.mozilla.org/r/124660/#review127190 Thanks! This is looking good, but I don't think it's necessary to land separately. Splitting up patches is normally encouraged, but in this specific case, where the first patch is not quite right and the second one goes back and corrects it, I think it's better to just land the right thing the first time. (If MozReview is being uncooperative because you took this over from Aaron, it might be easier to cancel the first review and start a fresh one.) Good idea about NopThreadProc; I too was a bit worried about the ExitThread. A few minor nit-picks below: ::: mozglue/build/WindowsDllBlocklist.cpp:711 (Diff revision 1) > } > > static bool > -ShouldBlockThread(void* aStartAddress, void* aThreadParam) > +ShouldBlockThread(void* aStartAddress) > { > + // allows crashfirefox.exe to continue to work. also if your threadproc is null, this crash is intentional. When your comments are in complete-sentence style (as opposed to terse same-line comments), please capitalize. Same for NopThreadProc below. ::: mozglue/build/WindowsDllBlocklist.cpp:712 (Diff revision 1) > > static bool > -ShouldBlockThread(void* aStartAddress, void* aThreadParam) > +ShouldBlockThread(void* aStartAddress) > { > + // allows crashfirefox.exe to continue to work. also if your threadproc is null, this crash is intentional. > + if(aStartAddress == 0) Should have a space after 'if'. ::: mozglue/build/WindowsDllBlocklist.cpp:720 (Diff revision 1) > bool shouldBlock = false; > MEMORY_BASIC_INFORMATION startAddressInfo; > if (VirtualQuery(aStartAddress, &startAddressInfo, sizeof(startAddressInfo))) { > shouldBlock |= startAddressInfo.State != MEM_COMMIT; > shouldBlock |= startAddressInfo.Protect != PAGE_EXECUTE_READ; > - shouldBlock |= !(startAddressInfo.Type & MEM_IMAGE); > + //shouldBlock |= !(startAddressInfo.Type & MEM_IMAGE); in the future we may want to be stricter by adding this check. for now playing it safe. In general we try not to have commented-out code in the tree since it tends to just go stale. (Same for #if 0 and similar.) We'll be adding MEM_IMAGE in a followup bug (hmm, someone should file one!) so there's no concern about forgetting about this. ::: mozglue/build/WindowsDllBlocklist.cpp:728 (Diff revision 1) > return shouldBlock; > } > > +// allows blocked threads to still run normally through BaseThreadInitThunk, in case there's any magic there that we shouldn't skip. > +DWORD WINAPI > +NopThreadProc(void* aThreadParam) On some of our platforms the compiler complains about unused variables. This code is Windows-only so it's probably fine, but as a general habit I would write "(void* /* aThreadParam */)" or just "(void*)". Also I'd mark the function static since it's only for a one-off use in this file.

Attachment #8852415 - Flags: review?(dmajor)

Assignee

Updated

•

8 years ago

Attachment #8852413 - Attachment is obsolete: true

Attachment #8852413 - Flags: review?(m_kato)

Assignee

Updated

•

8 years ago

Attachment #8852414 - Attachment is obsolete: true

Attachment #8852414 - Flags: review?(dmajor)

Assignee

Updated

•

8 years ago

Attachment #8852415 - Attachment is obsolete: true

Comment hidden (mozreview-request)

Comment 92

•

8 years ago

mozreview-review

Comment on attachment 8852516 [details] Bug 1322554: Interpose kernel32!BaseThreadInitThunk to add verification of thread start addresses; https://reviewboard.mozilla.org/r/124708/#review127234 Oh, I forgot about the DllInterceptor patch. That one could have stayed separate -- my fault, sorry. I don't want to cause another round of churn so don't worry about splitting them again. ::: mozglue/build/WindowsDllBlocklist.cpp:716 (Diff revision 1) > + // Allows crashfirefox.exe to continue to work. Also if your threadproc is null, this crash is intentional. > + if (aStartAddress == 0) > + return false; > + > + bool shouldBlock = false; > + MEMORY_BASIC_INFORMATION startAddressInfo; One more nit -- just to be super extra safe let's initialize this to {0}. Normally it wouldn't matter, but given the environment we're in, with other code hooking stuff and whatnot, let's be cautious.

Attachment #8852516 - Flags: review?(dmajor) → review+

Comment 93

•

8 years ago

We should not present anything to users. From discussion with dmajor on IRC, we're going to proceed here with removing the MEM_IMAGE check, and if we want to add that later we'll need some form of field telemetry to see what might break.

Flags: needinfo?(benjamin)

Updated

•

8 years ago

Attachment #8850169 - Flags: review?(dmajor)

Comment hidden (mozreview-request)

Pushed: https://hg.mozilla.org/integration/autoland/rev/23de710cae6fdbc85bca9c4e5f649738e7dd965f

Assignee

Updated

•

8 years ago

Keywords: checkin-needed

Iris Hsiao [:ihsiao]

Comment 95

•

8 years ago

Keywords: checkin-needed

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 96

•

8 years ago

Backed out for breaking at least Windows 8 x64 builds (ignore the "likely all Windows builds"): https://hg.mozilla.org/integration/autoland/rev/943594e7f5d8460ea0270ee03c297dfa49e7fe91 Push showing the failure (the push for your patch is still running, maybe slower machine): https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=495a39e11056e7137b3b82e83e677d4c60994624&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=87560660&repo=autoland 08:52:03 INFO - Executing c:/builds/moz2_slave/autoland-w64-d-000000000000000/build/src/obj-firefox/dist\bin\xpcshell.exe -g c:\builds\moz2_slave\autoland-w64-d-000000000000000\build\src\obj-firefox\dist\bin/ -a c:\builds\moz2_slave\autoland-w64-d-000000000000000\build\src\obj-firefox\dist\bin/ -f c:/builds/moz2_slave/autoland-w64-d-000000000000000/build/src/toolkit/mozapps/installer\precompile_cache.js -e precompile_startupcache("resource://gre/"); 08:52:03 INFO - Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Unrecognized opcode sequence), at c:\builds\moz2_slave\autoland-w64-d-000000000000000\build\src\obj-firefox\dist\include\nsWindowsDllInterceptor.h:1118 08:52:03 INFO - Traceback (most recent call last): 08:52:03 INFO - File "c:/builds/moz2_slave/autoland-w64-d-000000000000000/build/src/toolkit/mozapps/installer/packager.py", line 397, in <module> 08:52:03 INFO - main() 08:52:03 INFO - File "c:/builds/moz2_slave/autoland-w64-d-000000000000000/build/src/toolkit/mozapps/installer/packager.py", line 391, in main 08:52:03 INFO - args.source, gre_path, base) 08:52:03 INFO - File "c:/builds/moz2_slave/autoland-w64-d-000000000000000/build/src/toolkit/mozapps/installer/packager.py", line 165, in precompile_cache 08:52:03 INFO - errors.fatal('Error while running startup cache precompilation') 08:52:03 INFO - File "c:\builds\moz2_slave\autoland-w64-d-000000000000000\build\src\python\mozbuild\mozpack\errors.py", line 103, in fatal 08:52:03 INFO - self._handle(self.FATAL, msg) 08:52:03 INFO - File "c:\builds\moz2_slave\autoland-w64-d-000000000000000\build\src\python\mozbuild\mozpack\errors.py", line 98, in _handle 08:52:03 INFO - raise ErrorMessage(msg) 08:52:03 INFO - mozpack.errors.ErrorMessage: Error: Error while running startup cache precompilation 08:52:03 INFO - c:/builds/moz2_slave/autoland-w64-d-000000000000000/build/src/toolkit/mozapps/installer/packager.mk:41: recipe for target 'stage-package' failed 08:52:03 INFO - mozmake.EXE[7]: *** [stage-package] Error 1 08:52:03 INFO - mozmake.EXE[7]: Leaving directory 'c:/builds/moz2_slave/autoland-w64-d-000000000000000/build/src/obj-firefox/browser/installer' 08:52:03 INFO - c:/builds/moz2_slave/autoland-w64-d-000000000000000/build/src/toolkit/mozapps/installer/packager.mk:96: recipe for target 'make-package' failed 08:52:03 INFO - mozmake.EXE[6]: *** [make-package] Error 2 08:52:03 INFO - c:/builds/moz2_slave/autoland-w64-d-000000000000000/build/src/config/rules.mk:519: recipe for target 'default' failed 08:52:03 INFO - mozmake.EXE[5]: *** [default] Error 2 08:52:03 INFO - c:/builds/moz2_slave/autoland-w64-d-000000000000000/build/src/browser/build.mk:9: recipe for target 'package' failed 08:52:03 INFO - mozmake.EXE[4]: *** [package] Error 2 08:52:03 INFO - c:/builds/moz2_slave/autoland-w64-d-000000000000000/build/src/build/moz-automation.mk:100: recipe for target 'automation/package' failed 08:52:03 INFO - mozmake.EXE[3]: *** [automation/package] Error 2

Flags: needinfo?(ccorcoran)

Updated

•

8 years ago

status-firefox52: affected → wontfix

status-firefox-esr52: --- → affected

Comment hidden (mozreview-request)

Assignee

Comment 99

•

8 years ago

(In reply to Carl Corcoran [:ccorcoran] from comment #97) > Created attachment 8854781 [details] > Bug 1322554: added handling for CALL reg opcode; > > Review commit: https://reviewboard.mozilla.org/r/126762/diff/#index_header > See other reviews: https://reviewboard.mozilla.org/r/126762/ This fixes the cppunittest errors on win2008. Kernel32.dll!BaseThreadInitThunk contained: > FFD2 call rdx which we didn't handle. Verified fixed on win2008 & treeherder. Still getting the hang of reviewboard; it re-requested review for the previous patch (attachment 8852516 [details]). Nothing changed there.

Flags: needinfo?(ccorcoran)

Comment 100

•

8 years ago

mozreview-review

Comment on attachment 8854781 [details] Bug 1322554: Interpose kernel32!BaseThreadInitThunk to add verification of thread start addresses; https://reviewboard.mozilla.org/r/126762/#review129568 ::: xpcom/build/nsWindowsDllInterceptor.h:1117 (Diff revision 1) > int64_t* ptrToJmpDest = reinterpret_cast<int64_t*>(origBytes + nOrigBytes + 5 + offset); > intptr_t jmpDest = static_cast<intptr_t>(*ptrToJmpDest); > JumpPatch jump(nTrampBytes, jmpDest, JumpType::Jmp); > nTrampBytes = jump.GenerateJump(tramp); > nOrigBytes += 5; > + } else if ((origBytes[nOrigBytes] & (kMaskMod|kMaskReg)) == 0xd0) {// (rm = xx010xxx) | (mod = 11xxxxxx) Please remove this comment and replace 0xd0 with BuildModRmByte(kModReg, 2, 0)

Attachment #8854781 - Flags: review?(aklotz) → review+

Pulsebot

Comment 101

•

8 years ago

Pushed by rjesup@wgate.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/a50a4f9e972b Interpose kernel32!BaseThreadInitThunk to add verification of thread start addresses; r=dmajor

Comment 102

•

8 years ago

Since time for 53 is short, as I indicated to ddurst earlier I landed the patch, with aklotz's nit from his review, merged into one patch.

Comment 103

•

8 years ago

Please nominate for aurora/beta (and probably ESR52) ASAP, thanks.

Flags: needinfo?(ccorcoran)

https://hg.mozilla.org/mozilla-central/rev/a50a4f9e972b

Assignee

Comment 104

•

8 years ago

Attached patch a50a4f9e972b.txt — Details — Splinter Review

Approval Request Comment [Feature/Bug causing the regression]: 3rd-party apps [User impact if declined]: browser crash. this is top crash & startup crash [Is this code covered by automated tests?]: partially. if the browser doesn't crash, we know it's passing. [Has the fix been verified in Nightly?]: just landed [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: low risk [Why is the change risky/not risky?]: this fix involves hooking a new system function (kernel32!BaseThreadInitThunk); some Windows binaries could potentially break our patching code. This is unlikely imo as the same hooking code is already being used for many other calls without issue. [String changes made/needed]: none

Attachment #8850169 - Attachment is obsolete: true

Attachment #8850209 - Attachment is obsolete: true

Attachment #8852516 - Attachment is obsolete: true

Attachment #8854781 - Attachment is obsolete: true

Flags: needinfo?(ccorcoran)

Attachment #8855203 - Flags: approval-mozilla-esr52?

Attachment #8855203 - Flags: approval-mozilla-beta?

Attachment #8855203 - Flags: approval-mozilla-aurora?

Carsten Book [:Tomcat]

Comment 105

•

8 years ago

bugherder

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox55: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Comment 106

•

8 years ago

Comment on attachment 8855203 [details] [diff] [review] a50a4f9e972b.txt Fix for malware/antivirus dll related startup crash. Let's try to land this for beta 10.

Attachment #8855203 - Flags: approval-mozilla-beta?

Attachment #8855203 - Flags: approval-mozilla-beta+

Attachment #8855203 - Flags: approval-mozilla-aurora?

Attachment #8855203 - Flags: approval-mozilla-aurora+

Comment hidden (obsolete)

Comment 108

•

8 years ago

This'll need a rebased patch (or some other bugs to be uplifted) for Beta.

Flags: needinfo?(ccorcoran)

Comment 109

•

8 years ago

Aaron, might you be able to help with the rebasing so we can try to land this today for beta 10?

Flags: needinfo?(aklotz)

Phil Ringnalda (:philor)

Comment 110

•

8 years ago

I'm sure glad I don't have to explain them, only report on them. Backed out in https://hg.mozilla.org/mozilla-central/rev/1518ee478393 - since this landed, we've only had a couple of Win8 debug xpcshell runs which did not time out in some random updater test, like https://treeherder.mozilla.org/logviewer.html#?job_id=89061558&repo=mozilla-inbound Why does it fail? Why does it fail on trunk, but not on aurora? Does that mean the patch is fine on aurora, or that it's broken but the updater tests don't happen to show that it is broken there? Those are some of the things I'm glad I don't have to explain.

Status: RESOLVED → REOPENED

status-firefox55: fixed → ---

Resolution: FIXED → ---

Target Milestone: mozilla55 → ---

Assignee

Comment 111

•

8 years ago

I am looking into this now. My best guess is that we introduced a race condition somehow. If a caller is waiting for a signal from a thread that we block, then it will freeze. This trades a crash for a freeze, which in this thread we've determined is acceptable. That's not a complete theory though; it doesn't explain random regression timeouts. And/or maybe there are cases where we're blocking threads too liberally. Still not a complete theory. Going to test again with logging to narrow it down.

Flags: needinfo?(ccorcoran)

Comment 112

•

8 years ago

Phil, it doesn't appear that this log has dumps/stacks of the process that timed out, even though it says mozcrash Using tests\bin\minidumpwriter.exe to write a dump to c:\users\cltbld~1.t-w\appdata\local\temp\xpc-other-69txcc\1ad92b13-faee-47e8-9475-636bc4390bbe.dmp for [5092] And there aren't any minidumps in the uploaded artifacts (or at least I don't see any). Stacks should point to the hang pretty reliably. Is there a way to get the minidump output for these minidumps?

Flags: needinfo?(philringnalda)

Phil Ringnalda (:philor)

Comment 113

•

8 years ago

I doubt it, other than getting a loaner - after the test timeout there's a mozprocess 1000 second timeout, then an 1800 second timeout that's buildbot giving up, then the buildbot master loses contact with the slave. The two ways off the top of my head that we've done that before are by starving the python process running buildbot of either memory or CPU so badly that it blows up, or by actually crashing Windows.

Flags: needinfo?(philringnalda)

Comment 114

•

8 years ago

Clearing ni? because backout

Flags: needinfo?(aklotz)

Comment 115

•

8 years ago

Marking this affected for 55 since the patch was backed out from m-c (but not from aurora)

status-firefox55: --- → affected

Comment 116

•

8 years ago

Let's push a Try for beta -- I really, really want to see this in 53 (and in 52ESR after that)

Flags: needinfo?(ccorcoran)

Reporter

Comment 117

•

8 years ago

though the one reproducible case to trigger this signature (bug 1335080) went away for me with this patch, it curiously seems to have had the opposite effect on the crash rates with aurora, as the volume of the signature increased in builds there after the patch has landed: https://crash-stats.mozilla.com/signature/?release_channel=aurora&signature=BaseThreadInitThunk&date=%3E%3D2016-06-01T00%3A00%3A00.000Z#graphs under those circumstances and because we shipped with this problem for a while now, i don't think we should attempt to uplift it to 53 that close to the release date now, but first wait and see how things turn out on 54.0b.

Comment 118

•

8 years ago

This seems really unlikely for 53 at this point. We will do an RC2 later this week but I don't have a good feeling here based on the aurora crashes and the test failures when trying to uplift.

Julien Cristau [:jcristau]

Updated

•

8 years ago

Attachment #8855203 - Flags: approval-mozilla-beta+ → approval-mozilla-beta-

Marcia Knous [:marcia]

Comment 119

•

8 years ago

Should this be backed out of Aurora based on Comment 117? The crash rate on Aurora is almost double what it was last week.

Comment 120

•

8 years ago

Ryan can you back this out of aurora as well? Based on comment 117 it seems like we shouldn't let this go into beta yet.

Flags: needinfo?(ryanvm)

Comment 121

•

8 years ago

(In reply to David Major [:dmajor] from comment #80) > Let's say we ship this and it doesn't fix the current crash. True, we're no > worse off, but we still have a high-impact crash on our hands. It would be > useful to get some diagnostics to inform a subsequent attempt at a fix. Next time I am going to push harder on this. :-(

Reporter

Comment 122

•

8 years ago

on aurora the spike had a strong correlation to the module WRusr.dll (Webroot SecureAnywhere): (99.32% in signature vs 02.26% overall) Module "WRusr.dll" = true one crash comment that might also shed a bit more light on this was: "Everybody in the team of developers I'm in now cannot start Firefox Developer edition because it immediately crashes. Some of my colleagues didn't have a page set as their home and theirs only crashed when going to a URL but I cannot open it at all as my home page was set as a web URL." https://crash-stats.mozilla.com/report/index/3d23d0d1-45f2-4a2b-adc4-251b22170411#tab-details

https://hg.mozilla.org/releases/mozilla-aurora/rev/e82a6f21f043

Comment 123

•

8 years ago

backout uplift

status-firefox54: fixed → affected

tracking-firefox-esr52: --- → ?

Flags: needinfo?(ryanvm)

Assignee

Comment 124

•

8 years ago

FYI I'm looking into the two results that have been noted here as the result of the patch: 1. Test timeouts on win8 x64, which I have finally been able to reproduce in a debugger last night. Today I'm digging into this further. At first glance seems like a race condition buried in the rust runtime that's triggered during Firefox startup. Example stack trace here: https://gist.github.com/thenfour/13617392b36ee2b3e9ae22914c611698 2. Investigate the WRusr.dll crashes and again see how/if this patch is increasing crashes. We may well find that the impact of hooking the thread start routine is too much impact, revealing other issues that are otherwise dormant. To me, revealing dormant bugs as a good reason to continue efforts here, if only to help resolve other intermittent issues. More to come...

Flags: needinfo?(ccorcoran)

Assignee

Comment 125

•

8 years ago

I am still waiting on getting access to the raw minidumps to investigate the WRusr.dll crash. Something occurred to me as a possible explanation though: What are the implications of patching BaseThreadInitThunk *while it may be currently executing*? I didn't look yet into our detours implementation to see what kind of guards we have against that scenario, but it's one way this patch may cause new crashes in BaseThreadInitThunk. Regarding test timeouts, it seems to be a race condition within the rust runtime that is normally hidden from view. Hooking BaseThreadInitThunk nudges timing windows just slightly, very occasionally causing a deadlock. Here's a description of what happens in the code to cause the contention: > Main thread (XRE_Init) > { > creates thread1 (InitDwrite) > rust runtime begins lazy-initializing tls, > ACQUIRES rust tls mutex > trying to release that mutex, calls into GetProcAddress > WAITS for loader lock > } > > Thread1 (InitDwrite) > { > completes its work, exits > ACQUIRES loader lock > on_tls_callback https://github.com/rust-lang/rust/blob/master/src/libstd/sys/windows/thread_local.rs#L242 > WAITS for rust tls mutex > } The bug here is that rust runtime is acquiring a mutex while in a loader lock, creating a race condition. I didn't think of a handy way to forcibly induce the deadlock (e.g. using Sleep()), as both locks are fully contained within the rust runtime. I'm honestly surprised we don't see it deadlock more often; the timing window is not that large here and under normal conditions it seems to barely miss. A practical solution is to find a workaround in XRE_Main; I would suggest we force rust to complete initializing tls before we create the InitDwrite thread. I'm now looking into the best way to accomplish that; it's certainly a one-liner that just touches a tls value or so. FYI I'm on Easter holiday, returning on Tuesday.

Comment 126

•

8 years ago

I'm pretty sure it's a bug to acquire any other lock while holding the loader lock. https://blogs.msdn.microsoft.com/oldnewthing/20040128-00/?p=40853 (also https://blogs.msdn.microsoft.com/oldnewthing/20140808-00/?p=293) Although I also bet there's a lot of software out there which might affect this in one way or another. Are we installing this patch early enough that there aren't other threads running?

Comment 127

•

8 years ago

(In reply to Carl Corcoran [:ccorcoran] from comment #125) > What are the implications of patching BaseThreadInitThunk *while it may be > currently executing*? I didn't look yet into our detours implementation to > see what kind of guards we have against that scenario, but it's one way this > patch may cause new crashes in BaseThreadInitThunk. If we are able to take the NopSpacePatcher route, then this can be done safely. (In fact, this is the reason why functions begin with this awkward-looking "mov edi, edi" rather than "nop; nop" -- so that patching can't cause someone to be mid-instruction.) If we have to take the AddDetour route, we're super unsafe and all bets are off. But given the consistency of the repro reported by that engineering team, I'm leaning against a timing problem. In the past we've had problems in wanting to hook the same API as an AV etc. It could be that wrusr tries to hook BaseThreadInitThunk but our patched version confuses its disassembler and it falls over.

Comment 128

•

8 years ago

Seems highly unlikely that we'd consider this as a dot-release candidate, so marking 53 as wontfix.

status-firefox53: affected → wontfix

Assignee

Comment 129

•

8 years ago

Regarding the temporary spike in crashes from hooking BaseThreadInitThunk, I'm guessing WRusr.dll is at some point attempting to remove our hook, and in doing so causing memory corruption around BaseThreadInitThunk. To confirm this 100%, I would need to install Webroot SecureAnywhere and try to repro. Then find a workaround or block it. What's happening to cause the crash? Looking at a few minidumps, it's restricted to 32-bit build, across Windows platforms (i've seen win10 and 8.1 out of about 6 dumps). This disassembly reveals pretty much everything: > 0:036> u kernel32!BaseThreadInitThunk-8 l10 > kernel32!_BaseDllInitialize+0x29b: > 766e6298 cc int 3 > 766e6299 cc int 3 > 766e629a cc int 3 > 766e629b e90e49ab98 jmp mozglue!`anonymous namespace'::patched_BaseThreadInitThunk (0f19abae) > kernel32!BaseThreadInitThunk: > 766e62a0 8bff mov edi,edi > 766e62a2 55 push ebp > 766e62a3 8bec mov ebp,esp > 766e62a5 51 push ecx > 766e62a6 a1c4008b6b mov eax,dword ptr ds:[6B8B00C4h] > 766e62ab 33c5 xor eax,ebp Points of interest: 1. 766e629b our long jmp is intact, and calling into our hook. We have not removed the hook. 2. 766e62a0 mov edi,edi - this is a problem. This should be our trampoline to the long jmp. Suggests that something has clobbered our hook. 3. 766e62a6 this is actually where the crash happens. #3 is the most interesting, and I don't really have an explanation yet. Two bytes have been overwritten: > 0:036> !chkimg -d kernel32 > 766e629b-766e629f 5 bytes - kernel32!_BaseDllInitialize+29e > [ cc cc cc cc cc:e9 0e 49 ab 98 ] > 766e62a9-766e62aa 2 bytes - kernel32!BaseThreadInitThunk+9 (+0x0e) > [ 78 76:8b 6b ] > 7 errors : kernel32 (766e629b-766e62aa) The original bits *should* disassemble like: > 766e62a6 a1c4007876 mov eax,dword ptr [KERNEL32!__security_cookie (767800C4h)] Those 2 overwritten bytes are almost always 8b 6b, though in one dump I saw 90 6b. Could be broken fragments of assembly from something? Maybe but nothing obvious. Finally, notable thread activity: 1. Main thread is in GeckoMediaPluginService::Init, waiting while launching a thread; presumably the crashing thread. 2. Crashing thread has finished and encountered the corrupt address. *boom* 3. there are about 3 worker threads in WRusr.dll, waiting (presumably for ipc). 4. all other threads seem fairly benign. Note that almost all threads return into our hook. So here's what seems to be happening, given the pieces to that puzzle: 1. we hook BaseThreadInitThunk 2. thread X starts up, runs under our hook 3. WRusr clobbers our hook, and in the process corrupts 2 other bytes in BaseThreadInitThunk 4. thread X ends, but BaseThreadInitThunk explodes because of the corruption. Next steps? How much diligence do we want to throw at this? If we want to fix this WRusr.dll issue, we should repro in a VM to see what's causing the corruption. Or is blocking it an option?

Flags: needinfo?(benjamin)

Comment 130

•

8 years ago

Blocking is absolutely an option if it works. Per new blocklisting policy, we don't allow external DLLs in our process.

Flags: needinfo?(benjamin)

Assignee

Updated

•

8 years ago

Depends on: 1358151

Reporter

Comment 131

•

8 years ago

the crash volume of this signature is declining to the levels we've had before december - maybe the malware received an update "fixing" the startup crashes. https://crash-stats.mozilla.com/signature/?product=Firefox&signature=BaseThreadInitThunk&date=%3E%3D2016-10-27T11%3A28%3A07.000Z#graphs

Assignee

Updated

•

8 years ago

Depends on: 1361410

Comment 132

•

8 years ago

I'm one of the developers for Webroot (WRusr.dll), I'd like permissions to download the raw dump files from Mozilla. I'm also having some trouble reproducing this myself. According to what I've gathered in the thread above. This is limited to 32 bit Firefox and has happened on I tries the GA version of Firefox and the Nightly version. I tried both 64 and 32 bit Firefox on both 64 and 32 bit Windows 7, 8.1, and 10. I see some consensus in the thread that this tends to happen on 32 bit Windows 7. I don't have a 32 bit Windows 7 immediately ready; I'm installing one now to test this.

Comment 133

•

8 years ago

Johnny, you can find a build to reproduce the problem in bug 1361410 comment 0. Carl, can you reproduce the crash yourself? If you can, could you send Johnny a raw dump? We can't share users' raw dumps, because we would need their consent.

Flags: needinfo?(ccorcoran)

Comment 134

•

8 years ago

Marco, thanks for the build and steps to repro. I'll get on this now.

Comment 135

•

8 years ago

I just reproduce it myself. Thanks for pointing me in the right direction Marco. I'll report back once I complete my analysis.

Assignee

Comment 136

•

8 years ago

Johnny, thanks for your help. I see you managed to repro it. If you still need minidumps, I attached them on bug 1361410.

Flags: needinfo?(ccorcoran)

Comment 137

•

8 years ago

I've found the root of the problem on our end. I'm 90% sure I have a solution in place to resolve this. I'm doing some further validation on my end. Once that is done I'll update again. I might be able to send someone a one-off build to do some validation against on your end. If that is something someone would be interested in let me know.

Assignee

Comment 138

•

8 years ago

Johnny I'm happy to give it a shot. But we're just as interested in ensuring users with the current WRusr.dll don't crash. Is there a workaround we can use in the meantime so even with the current WRusr.dll, it won't crash? For example a way to tell WRusr not to touch this particular detour?

Flags: needinfo?(johnnyshaw02)

Comment 139

•

8 years ago

I'm almost positive that isn't an option, but I'll verify. Exposing that ease of control to security software would be a major hole. What is the reason these users can't update their antivirus when this fix becomes available?

Flags: needinfo?(johnnyshaw02)

Comment 140

•

8 years ago

The user could set Firefox to "Allow" int he Application Protection. This will disable ALL protection for the browser though. If you click the gear next to "Identity Protection", then go to "Application Protection", and then move the Firefox binary to "Allow". This will tell WRusr not to inject into the process.

Assignee

Comment 141

•

8 years ago

Yea, and as you said, everyone should be caught up to the latest WRusr.dll. For the rare stragglers, and for the Nightly users in the meantime, we may decide to simply skip this detour depending on the version of WRusr.dll. Would you be able to let us know which version of WRusr.dll we could use as a condition?

Flags: needinfo?(johnnyshaw02)

Comment 142

•

8 years ago

Yeah, I can let you know what version to expect it in when I know.

Flags: needinfo?(johnnyshaw02)

Comment hidden (mozreview-request)

Comment 146

•

8 years ago

mozreview-review

Comment on attachment 8867827 [details] Bug 1322554: don't hook BaseThreadInitThunk if WRusr.dll is present on Windows x86; https://reviewboard.mozilla.org/r/139352/#review142728 ::: mozglue/build/WindowsDllBlocklist.cpp:765 (Diff revision 1) > > +static bool > +IsSafeToHookBaseThreadInitThunk() > +{ > + // Bug 1361410: WRusr.dll will overwrite our hook and cause a crash. Workaround: If we detect WRusr.dll, don't hook. > +#ifndef HAVE_64BIT_BUILD Looking at crash-stats, pretty much all the BaseThreadInitThunk crashes are 32-bit builds. The very very small number of 64-bit ones are probably unrelated noise. What do you think about putting this entire patch series behind #ifdef _M_IX86? Hooks always carry risk of bad interactions with other software, so I like to avoid hooking on machines where we don't need to. I haven't gone back and looked at the previous patch but this might also eliminate the need for some of the disassembler changes. ::: mozglue/build/WindowsDllBlocklist.cpp:767 (Diff revision 1) > +IsSafeToHookBaseThreadInitThunk() > +{ > + // Bug 1361410: WRusr.dll will overwrite our hook and cause a crash. Workaround: If we detect WRusr.dll, don't hook. > +#ifndef HAVE_64BIT_BUILD > + HMODULE hWRusr = (HMODULE)GetModuleHandleW(L"WRusr.dll"); > + if (hWRusr != NULL) { Hm, why the cast? If you like, you could just avoid it with `if (GetModuleHandle...)`. Also, we probably don't need to worry about following the pattern of printf_stderr's -- I don't think they add too much value since we very rarely step through this file after a patch is landed.

Assignee

Comment 147

•

8 years ago

mozreview-review-reply

Comment on attachment 8867827 [details] Bug 1322554: don't hook BaseThreadInitThunk if WRusr.dll is present on Windows x86; https://reviewboard.mozilla.org/r/139352/#review142728 > Hm, why the cast? If you like, you could just avoid it with `if (GetModuleHandle...)`. Also, we probably don't need to worry about following the pattern of printf_stderr's -- I don't think they add too much value since we very rarely step through this file after a patch is landed. Originally I was writing this with a version check, where I use the HMODULE to get at the resources. Without the version checking it's true this is unnecessary, but also benign to leave it.

Comment hidden (mozreview-request)

Assignee

Comment 149

•

8 years ago

mozreview-review-reply

Comment on attachment 8867827 [details] Bug 1322554: don't hook BaseThreadInitThunk if WRusr.dll is present on Windows x86; https://reviewboard.mozilla.org/r/139352/#review142728 > Looking at crash-stats, pretty much all the BaseThreadInitThunk crashes are 32-bit builds. The very very small number of 64-bit ones are probably unrelated noise. > > What do you think about putting this entire patch series behind #ifdef _M_IX86? Hooks always carry risk of bad interactions with other software, so I like to avoid hooking on machines where we don't need to. I haven't gone back and looked at the previous patch but this might also eliminate the need for some of the disassembler changes. Seems reasonable; no reason to spread this patch where it doesn't benefit. I would still keep the 64-bit opcode support though; this is just adding support for opcodes we should already be supporting but just haven't seen yet. More the better. Or should we file separate bugs for them?

Julien Cristau [:jcristau]

Comment 150

•

8 years ago

For what its worth. Rather the just blindly re-writing code WRusr tries to be smart about replacing code when clearing hooks. Its possible the method of hooking mozglue is using isn't conventional. Obviously WRusr could be smarter about removing the hooks. But, the better approach from our side would be to allow mosglue to do it's job. My point is, if you have a secondary method for hooking. You might be positive results using it. To close the loop on my side I did send over a build to Carl that has a patch on our end. Let me know if you don't get it Carl.

Comment 151

•

8 years ago

Comment on attachment 8855203 [details] [diff] [review] a50a4f9e972b.txt esr52-, this seems too high risk for a limited impact there (and we should get a fix to stick in trunk first anyway :) ).

Attachment #8855203 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52-

Comment 152

•

8 years ago

mozreview-review

Comment on attachment 8867827 [details] Bug 1322554: don't hook BaseThreadInitThunk if WRusr.dll is present on Windows x86; https://reviewboard.mozilla.org/r/139352/#review144122 Sorry, I should clarify -- could you please put the entire patch stack behind a 32-bit ifdef? (Including AddHook, NopThreadProc, ShouldHook, TestDllInterceptor, etc.) And I think I'd prefer _M_IX86 because the HAVE_ stuff is more of configure's turf, and the _M_ approach is the most common pattern in our Windows-only files. Regarding the 64-bit assembler changes, my concern is mostly because this bug is of high uplift-interest on relman's radar, and I wouldn't want to rush unnecessary code to beta. Perhaps we could split that code off into its own bug that rides the trains like normal?

Attachment #8867827 - Flags: review?(dmajor) → review+

Comment hidden (mozreview-request)

Assignee

Updated

•

8 years ago

Attachment #8867827 - Attachment is obsolete: true

Assignee

Updated

•

8 years ago

Attachment #8852516 - Attachment is obsolete: true

Comment 154

•

8 years ago

mozreview-review

Comment on attachment 8854781 [details] Bug 1322554: Interpose kernel32!BaseThreadInitThunk to add verification of thread start addresses; https://reviewboard.mozilla.org/r/126762/#review145738 ::: mozglue/build/WindowsDllBlocklist.cpp:870 (Diff revision 3) > reinterpret_cast<intptr_t>(patched_RtlInstallFunctionTableCallback), > (void**)&stub_RtlInstallFunctionTableCallback); > } > #endif > + > +#ifdef _M_IX86// Minimize impact; crashes in BaseThreadInitThunk are vastly more frequent on x86 Nit: Add a space before // ::: toolkit/xre/test/win/TestDllInterceptor.cpp:523 (Diff revision 3) > TestHook(TestSendMessageTimeoutW, "user32.dll", "SendMessageTimeoutW") && > #endif > TestHook(TestTlsAlloc, "kernel32.dll", "TlsAlloc") && > TestHook(TestTlsFree, "kernel32.dll", "TlsFree") && > +#ifdef _M_IX86 > + TestDetour("kernel32.dll", "BaseThreadInitThunk") && This call to TestDetour is going to be a departure from our usual pattern of calling TestHook wherever the product code calls AddHook. I understand why it's this way -- we can't really call stub_BaseThreadInitThunk from the test code -- but it's worth explaining in a comment, if for no other reason than to deter future coders from breaking the pattern without good reason. :)

Attachment #8854781 - Flags: review?(dmajor) → review+

Comment 155

•

8 years ago

Actually, speaking of the TestHook/TestDetour distinction, have you tried using AddDetour rather than AddHook to see if it interacts any better with wrusr.dll? Johnny Shaw did mention in comment 150 that we might have better luck with alternative hook techniques.

Flags: needinfo?(ccorcoran)

Assignee

Comment 156

•

8 years ago

David, I just tried AddDetour and indeed Webroot no longer crashes. Our hook is still overwritten, which is a potential problem, but the crash we've noted is no longer happening. Which means we should be able to proceed now. Johnny, thank you for all your help and diligence. New patch incoming...

Flags: needinfo?(ccorcoran)

Comment hidden (mozreview-request)

Comment 158

•

8 years ago

Happy to help. Thank you guys for working with us. I'm certainly willing to keep our communication open to collaborate in the future.

Pulsebot

Comment 159

•

8 years ago

Pushed by dmajor@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7d2e74c69253 Interpose kernel32!BaseThreadInitThunk to add verification of thread start addresses; r=aklotz,dmajor

https://hg.mozilla.org/mozilla-central/rev/7d2e74c69253

Comment 160

•

8 years ago

bugherder

Status: REOPENED → RESOLVED

Closed: 8 years ago → 8 years ago

status-firefox55: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Nathan Froyd [:froydnj]

Comment 161

•

8 years ago

Please request uplift on this when you get a chance; not sure if it's worth seeing if crash reports go down or not, seeing as you've verified it fixes things locally.

Flags: needinfo?(ccorcoran)

Comment 162

•

8 years ago

OTOH it would be good to observe this on nightly for a few days to watch for regressions. In code like this that's always a risk.

Reporter

Comment 163

•

8 years ago

unfortunately this patch doesn't prevent the startup issue with robosizer on win10 raised in bug 1335080 when i tested it now again (whereas the prior patch in attachment 8855203 [details] [diff] [review] fixed it for me).

Assignee

Comment 164

•

8 years ago

:philipp we seem to get very few crashes with this signature on x64, so to reduce impact we decided to start by restricting this patch to x86. https://crash-stats.mozilla.com/signature/?product=Firefox&signature=BaseThreadInitThunk&date=%3E%3D2017-01-01T08%3A14%3A00.000Z&date=%3C2017-05-18T08%3A14%3A00.000Z#graphs View crashes per day by cpu arch, and we get 1-4 crashes per day on x64 in BaseThreadInitThunk, compared to x86's in the hundreds. If this patch is successful in reducing x86's crashes, I think we should indeed try it on x64. Interesting about the stats for this signature, in the past month it's already gone significantly down. I'm curious what caused this change around mid-March. All this also reignites the desire to get telemetry about this patch.

Flags: needinfo?(ccorcoran)

Calixte Denizet (:calixte)

Reporter

Comment 165

•

7 years ago

ah thanks for the pointer. in fact, bug 1335080 is fixed when testing with a 32bit build of nightly. when robosizer is running, a 64bit nightly fails to launch silently though without generating any report, so it may be that the real extend of the issue with 64bit installations is masked in crash report data...

Updated

•

7 years ago

Depends on: 1368319

Assignee

Comment 166

•

7 years ago

:philipp I took another look at RoboSizer on win64. It consistently crashes for me too, though it appears unrelated to BaseThreadInitThunk. It's injecting EasyHook64.dll into Firefox, which is causing corruption at a semi-random place during startup. Sometimes it happens before crash reporting is hooked up. The solution here is likely to do some Dll blocking for EasyHook with Robosizer. I have created bug 1369188 to track it specifically.

Comment 167

•

7 years ago

(In reply to Carl Corcoran [:ccorcoran] from comment #166) > :philipp I took another look at RoboSizer on win64. It consistently crashes > for me too, though it appears unrelated to BaseThreadInitThunk. It's > injecting EasyHook64.dll into Firefox, which is causing corruption at a > semi-random place during startup. Sometimes it happens before crash > reporting is hooked up. > > The solution here is likely to do some Dll blocking for EasyHook with > Robosizer. I have created bug 1369188 to track it specifically. The original patch did fix the crash for Philipp though, so that's strange.

Assignee

Comment 168

•

7 years ago

Robosizer injects its DLLs via CreateRemoteThread, which this patch blocks because the threadstart memory has PAGE_EXECUTE_READWRITE, which we block with this patch. So it's true that this patch fixes this crash, though the crash is unrelated to BaseThreadInitThunk. I lost sight of this effect of the patch. Our BaseThreadInitThunk detour has the side-effect of auto-blocking lots of DLLs like EasyHook, which means we should consider that we're not just fixing BaseThreadInitThunk crash signatures, but any crash related to remotely-injected code. The motivation for restricting to x86 was the very few BaseThreadInitThunk crash signatures we receive for x64. So I think we should re-consider x64 support.

Updated

•

7 years ago

status-firefox54: affected → wontfix

Updated

•

7 years ago

Blocks: 1374611

Comment 169

•

7 years ago

Is this something we're still considering for eventual backport to ESR52 or should we wontfix? Seems like a tough call given the risk vs. reward :\

Comment 170

•

7 years ago

The BaseThreadInitThunk crashes are not as common as they used to be (#32 on ESR52, they used to be in the top three).

Assignee

Comment 171

•

7 years ago

The BaseThreadInitThunk crash signature is just 1 signature this patch should fix. In theory, this should prevent any crashes stemming from WriteProcessMemory/CreateRemoteThread DLL injection.

Comment 172

•

7 years ago

(In reply to Carl Corcoran [:ccorcoran] from comment #171) > The BaseThreadInitThunk crash signature is just 1 signature this patch > should fix. In theory, this should prevent any crashes stemming from > WriteProcessMemory/CreateRemoteThread DLL injection. Yes, it is definitely a great prevention mechanism. I'm just saying we currently don't have a crash as concerning as "BaseThreadInitThunk" was a while ago, so we should take this into account when deciding whether to uplift to ESR52.

Comment 173

•

7 years ago

Doesn't seem like there's a clear consensus to move forward with nominating this for ESR52, so setting the status to wontfix to stop tracking it.

status-firefox-esr52: affected → wontfix

tracking-firefox-esr52: ? → ---

Updated

•

7 years ago

Blocks: 1372827

Updated

•

7 years ago

No longer blocks: 1374611

Updated

•

7 years ago

Blocks: 1372823

Comment 174

•

6 years ago

(In reply to David Major [:dmajor] from comment #66) > > Honestly I don't know what is apphelp.dll, there's not much information > > about it, the answer from google is: it's a part of Application > > Compatibility Client Library from Microsoft. > > Apphelp (and AcLayers) are Windows code that provides > backwards-compatibility fixes by shimming APIs, e.g. making > GetWindowsVersion lie to keep some old program happy. > > I don't remember the details but I've seen this stuff used for questionable > purposes in the past, since it's a way to introduce code into a process well > before main(). I don't know why I came back to this bug, and I was thinking: "Why would they use apphelp or aclayers to call GetProcAddress, when GetProcAddress is in kernel32.dll and they could have safely called it?". Maybe the issue here wasn't that they were using apphelp and aclayers to do dodgy things, but Windows was automatically "applying compatibility fixes" redirecting GetProcAddress (and maybe other functions?) to those DLLs. If that's the case, could similar crashes happen again when some third-party software which injects into Firefox (with any mechanism) somehow gets "compatibility fixes" by Windows and out of the blue starts using functions from DLLs that we don't load (as we don't get "compatibility fixes")? Is there any way we could prevent this problem?