Closed Bug 932100 Opened 11 years ago Closed 11 years ago

Set up DLL blocklist before LoadAppInitDlls

Categories

(Toolkit :: General, defect)

All
Windows 7
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla28
Tracking Status
firefox25 --- wontfix
firefox26 + wontfix
firefox27 + verified
firefox28 + verified

People

(Reporter: away, Assigned: away)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 4 obsolete files)

The DLL blocklist is ineffective against AppInit DLLs because they get loaded before the DLL blocklist is initialized. Bug 925459 is an example.

kernel32!LoadAppInitDlls is called during user32's DLL_PROCESS_ATTACH. In order to intercept AppInit DLLs, we would need to hook LdrLoadDll before user32 gets loaded.

Currently, firefox.exe has a load-time link to user32.dll, which means we do LoadAppInitDlls before any Mozilla code is executed. It turns out that this link to user32 is only used by one call to MessageBoxW from an error handler in nsBrowserApp.cpp. Given the rarity of that codepath, we could easily turn it into a runtime dynamic link.

That would at least let us run firefox!wmain before user32, but the blocklist is implemented in xul.dll, and by the time we load that, InitXPCOMGlue has already loaded a ton of stuff, including user32.

Some potential options:
1) Have a separate blocklist in firefox.exe for AppInit DLLs
2) Move the blocklist from xul.dll to firefox.exe
3) Disentangle things and try to get xul!XRE_SetupDllBlocklist available before InitXPCOMGlue preloads the world
Assuming we work around the MessageBoxW, then: user32 comes in via winmm, which comes in via:

nss3 (timeGetTime)
mozjs (timeBeginPeriod and timeEndPeriod)
gkmedias (a bunch of waveOut stuff)
Another option:
4) Move the blocklist into its own DLL and put that on the top of the preload list
5) Delayload everything with ties to user32 (this may not be practical)
We shouldn't be static linking nss3 or mozjs or gkmedias: those are all dependencies of xul.dll but aren't needed in firefox.exe itself.
Blocks: 925459
(In reply to Benjamin Smedberg  [:bsmedberg] (Away 24-25 October) from comment #4)
> We shouldn't be static linking nss3 or mozjs or gkmedias: those are all
> dependencies of xul.dll but aren't needed in firefox.exe itself.

They're not static links, but XPCOMGlueLoad reads these filenames from dependentlibs.list and preloads+LoadLibrary's them.
Another option is to move the blocklist to its own dll (as a regular dependency of firefox.exe; not using xul's preload list)
Glandium suggested yet another option: move the blocklist to mozglue. This is the approach I am currently working on.
Attached patch Part 1: MessageBoxW (obsolete) — Splinter Review
This is the easy part. Still working on the rest.
Assignee: nobody → dmajor
Status: NEW → ASSIGNED
Attachment #824977 - Flags: review?(benjamin)
Attached patch Part 2: WIP (obsolete) — Splinter Review
WIP Checkpoint

This is just copying nsWindowsDllBlocklist.cpp to mozglue and getting it to compile and link. Nobody calls it yet. The xul copy is still active.

Hacks:
- Disabled crash reporting code, will have to add it back in a different form, maybe some callbacks
- Disabled the XPCOM main thread checks, will have to add back
- Duplicated printf_stderr
Attachment #824977 - Flags: review?(benjamin) → review+
Functional but needs polish

- It feels weird to have all platforms include WindowsDllBlocklist.h and do the platform check inside. Is there a better way to gate this stuff?

- Is there some static lib that I can share printf_stderror from so that I don't have to duplicate?

- The code that logs whether the hook succeeded doesn't actually work today (it runs before the crash reporter is initialized) so I didn't bother preserving it

- My assert about user32 in Dllblocklist_Initialize doesn't hold true for xpcshell. It links xul in the regular way, so user32 gets pulled in at startup. Should I stick some extra condition on the assertion?
Attachment #824981 - Attachment is obsolete: true
Attachment #825753 - Flags: review?(mh+mozilla)
Attachment #825753 - Flags: review?(benjamin)
The assert bit me on Try so I coded around it. This is hax that I would never do in non-debug code. I'm not sure whether mozglue has access to the more "official" ways of getting the app name.

-  MOZ_ASSERT(!GetModuleHandleA("user32.dll"));
+  // This is only enforced in Firefox since other hosts may have different needs.
+  MOZ_ASSERT_IF(GetModuleHandleA("firefox.exe"), !GetModuleHandleA("user32.dll"));

First attempt: https://tbpl.mozilla.org/?tree=Try&rev=12ed199d8449
Fix Windows dbg: https://tbpl.mozilla.org/?tree=Try&rev=bae9aabbe217
Comment on attachment 825753 [details] [diff] [review]
Part 2: Move blocklist to mozglue

The rest of this looks fine to me.
Attachment #825753 - Flags: review?(benjamin) → review+
Hrm, about xpcshell: we probably could dynamically load XUL now, since we moved the guts of xpcshell into xul in bug 920695.

I don't like the hardcoded firefox.exe: maybe we should make this a flag param to DllBlocklist_Initialize for the time being? Or make the assert a separate function which is called from firefox.exe but not xpcshell.exe.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #13)
> Hrm, about xpcshell: we probably could dynamically load XUL now, since we
> moved the guts of xpcshell into xul in bug 920695.
> 
> I don't like the hardcoded firefox.exe: maybe we should make this a flag
> param to DllBlocklist_Initialize for the time being? Or make the assert a
> separate function which is called from firefox.exe but not xpcshell.exe.

Yeah, I'll move the assert out to the caller. That seems like the cleanest option. 

While I'm at it, if I move the assertion to be *after* the initialization, then we can make an even stronger claim, that the blocklist is completely set up before user32 enters the picture.
Comment on attachment 825753 [details] [diff] [review]
Part 2: Move blocklist to mozglue

Review of attachment 825753 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/build/Makefile.in
@@ +54,5 @@
> +	$(MOZ_ZLIB_LIBS) \
> +	version.lib \
> +	$(NULL)
> +
> +STL_FLAGS=

I /think/ you don't need STL_FLAGS= anymore, with your other changes.

::: mozglue/build/WindowsDllBlocklist.cpp
@@ +147,5 @@
>  #undef DEBUG_very_verbose
>  
> +static DWORD sThreadLoadingXPCOMModule;
> +
> +// REVIEW: I duplicated this from XPCOM glue. Is there a way to share it?

You may want to move printf_stderr entirely in mozglue, but that probably should be its own bug, and might have its own linking glitches.

@@ +592,5 @@
> +  // initialized before user32.dll is loaded into the process. If this assert
> +  // ever fires, then the fix for bug 932100 has been defeated and the
> +  // blocklist will miss AppInit DLLs. You should use a delayload or reorder
> +  // the code to prevent user32.dll from loading during early startup.
> +  MOZ_ASSERT(!GetModuleHandleA("user32.dll"));

Considering the subtle differences there can be between opt and debug builds, we may want to make this fail on opt builds too. Although it might create other subtle problems on systems with nasty third party injecting Dlls, if they win the race and import user32.dll before we reach here. So maybe not fail, but record it. In telemetry or crashreporter annotation.

@@ -581,5 @@
>  #endif
> -
> -#ifdef MOZ_CRASHREPORTER
> -  if (!ok) {
> -    CrashReporter::AppendAppNotesToCrashReport(NS_LITERAL_CSTRING("DllBlockList Failed\n"));

Note you're removing this, which could be important to know. This might something to track in a vector or something and dump in DllBlocklist_WriteBlockedDlls, although that'd need to be checked for formatting and whatnot.

::: mozglue/build/moz.build
@@ +32,5 @@
>  
> +if CONFIG['OS_TARGET'] == 'WINNT':
> +    CPP_SOURCES += [
> +        'WindowsDllBlocklist.cpp',
> +    ]

Your tree is old. CPP_SOURCES is gone, it's SOURCES, now.

::: xpcom/components/nsNativeComponentLoader.cpp
@@ +140,5 @@
>  
>      // We haven't loaded this module before
>  
> +#ifdef HAS_DLL_BLOCKLIST
> +    DllBlocklist_SetInXPCOMLoadOnMainThread(true);

I think this would be nicer if you used RAII.
Attachment #825753 - Flags: review?(mh+mozilla)
Blocks: 935335
- Repurposed WriteBlockedDlls to be a more general WriteNotes, this lets us log init status and user32 presence
- Moved user32 debug assert up to the caller (browser), but the opt logging remains in blocklist
- It seems we do still need STL_FLAGS=
- Opened bug 935335 for printf_stderr
- Added RAII helper for the xpcom load
Attachment #825753 - Attachment is obsolete: true
Attachment #828384 - Flags: review?(mh+mozilla)
Attachment #828384 - Flags: review?(benjamin)
Here's how I've tried the code locally. Most of these are pretty invasive and require recompiling and/or a debugger.

In debug:
- Undid part 1 patch and verified that it triggers the new assert
- Reapplied part 1 patch and verified no assert
- Stepped through RAII debug macros

In opt:
- Undid part 1 patch and verified the logging picks up user32
- Reapplied part 1 patch and verified no user32 logging
- Set AppInit registry keys to use a (fake) bitguard.dll, verified blocked
- Induced a crash and stepped through WriteNotes (forced all variables set so I could see all the writes)
- Stepped through use of sThreadLoadingXPCOMModule
- Stepped through re-entrancy check: forced GetFileVersionInfoSizeExW to use the XP-style flags that cause re-entrancy, verified we bail out
(In reply to David Major [:dmajor] from comment #16)
> - It seems we do still need STL_FLAGS=

Do you have a log of a build without to take a look?
(In reply to Mike Hommey [:glandium] from comment #18)
> (In reply to David Major [:dmajor] from comment #16)
> > - It seems we do still need STL_FLAGS=
> 
> Do you have a log of a build without to take a look?

0:00.79 WindowsDllBlocklist.obj : error LNK2019: unresolved external symbol __imp__moz_xmalloc referenced in function "void * __cdecl operator new(unsigned int)" (??2@YAPAXI@Z)
0:00.79 WindowsDllBlocklist.obj : error LNK2019: unresolved external symbol __imp__moz_free referenced in function "void __cdecl operator delete(void *)" (??3@YAXPAX@Z)
0:00.79 WindowsDllBlocklist.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) void __cdecl std::moz_Xlength_error(char const *)" (__imp_?moz_Xlength_error@std@@YAXPBD@Z) referenced in function "protected: class std::_Tree_iterator [...]
0:00.79 WindowsDllBlocklist.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) void __cdecl std::moz_Xout_of_range(char const *)" (__imp_?moz_Xout_of_range@std@@YAXPBD@Z) referenced 
in function "public: class std::_Tree_iterator [...]

I tried moving #define MOZ_NO_MOZALLOC to before #include <map>, but that produces:
0:00.96 d:\src\vc10\obj06chk\dist\stl_wrappers\xutility(36) : fatal error C1189: #error :  "STL code can only be used with infallible ::operator new()"
Attachment #828384 - Flags: review?(benjamin) → review+
Comment on attachment 828384 [details] [diff] [review]
Part 2: Move blocklist to mozglue

Review of attachment 828384 [details] [diff] [review]:
-----------------------------------------------------------------

Note the changes in the crashreporter annotation may require changes on the socorro side.

::: mozglue/build/WindowsDllBlocklist.cpp
@@ +162,5 @@
> +static bool sBlocklistInitFailed;
> +static bool sUser32BeforeBlocklist;
> +
> +// Duplicated from xpcom glue. Ideally this should be shared.
> +void

make it static
Attachment #828384 - Flags: review?(mh+mozilla) → review+
Patch for checkin
Attachment #824977 - Attachment is obsolete: true
Attachment #830560 - Flags: review+
Patch for checkin
Attachment #828384 - Attachment is obsolete: true
Attachment #830562 - Flags: review+
Keywords: checkin-needed
This still requires a hardcoded blocklist and doesn't work with a list dynamically pulled from the web, right?
That is correct. We currently have no plans for a dynamic DLL blocklist.
Whiteboard: [leave open]
Oops.
Whiteboard: [leave open]
https://hg.mozilla.org/mozilla-central/rev/49671f1df4da
https://hg.mozilla.org/mozilla-central/rev/33c8facd0468
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Depends on: 939043
Based on discussions in Crashkill meetings, this is not going to be safe enough to take on the Beta branch so we'll keep on tracking it for potential 27 Aurora, but marking wontfix for 26.
Comment on attachment 830560 [details] [diff] [review]
Part 1: MessageBoxW

[Approval Request Comment]
Bug caused by (feature/regressing bug #): External app
User impact if declined: Top crasher on Windows
Testing completed (on m-c, etc.): We only started blocking all versions in last night's nightly, so we don't yet have data on whether this successfully blocks bitguard.dll in the wild. Comment 17 describes local testing performed on a developer machine. 
Risk to taking this patch (and alternatives if risky): If there were mistakes in the DLL transplant, then the blocklist might stop working, but so far this doesn't seem to be the case. No alternatives, due to the design of Windows AppInit.
String or IDL/UUID changes made by this patch: None

** Also needs bug 925459 **
Attachment #830560 - Flags: approval-mozilla-aurora?
Comment on attachment 830562 [details] [diff] [review]
Part 2: Move blocklist to mozglue

[Approval Request Comment]
Bug caused by (feature/regressing bug #): External app
User impact if declined: Top crasher on Windows
Testing completed (on m-c, etc.): We only started blocking all versions in last night's nightly, so we don't yet have data on whether this successfully blocks bitguard.dll in the wild. Comment 17 describes local testing performed on a developer machine. 
Risk to taking this patch (and alternatives if risky): If there were mistakes in the DLL transplant, then the blocklist might stop working, but so far this doesn't seem to be the case. No alternatives, due to the design of Windows AppInit.
String or IDL/UUID changes made by this patch: None

** Also needs bug 925459 **
Attachment #830562 - Flags: approval-mozilla-aurora?
I am no longer seeing any of the signatures in bug 925459 on Nightly
Attachment #830560 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #830562 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to David Major [:dmajor] from comment #30)
> Comment on attachment 830562 [details] [diff] [review]
> Part 2: Move blocklist to mozglue
> 
> [Approval Request Comment]
> Bug caused by (feature/regressing bug #): External app
> User impact if declined: Top crasher on Windows
> Testing completed (on m-c, etc.): We only started blocking all versions in
> last night's nightly, so we don't yet have data on whether this successfully
> blocks bitguard.dll in the wild. Comment 17 describes local testing
> performed on a developer machine. 
> Risk to taking this patch (and alternatives if risky): If there were
> mistakes in the DLL transplant, then the blocklist might stop working, but
> so far this doesn't seem to be the case. No alternatives, due to the design
> of Windows AppInit.
> String or IDL/UUID changes made by this patch: None
> 
> ** Also needs bug 925459 **

:dmajor, DO we want to have some more blocklisting testcases covered by QA here ?
Keywords: verifyme
(In reply to bhavana bajaj [:bajaj] from comment #32)
> :dmajor, DO we want to have some more blocklisting testcases covered by QA
> here ?
The recent crash-stats results are sufficient, I think. We've stopped seeing bitguard crashes, so I'm satisfied that the change is effective. Also, I've seen other DLLs continue to show up in the BlockedDllList annotations, so that tells me that we haven't completely messed up the blocklist.
This is Part 2 for Aurora since it didn't merge cleanly. Trivial merge though so carrying forward the r+. Part 1 can be applied to Aurora without changes.
Attachment #8336422 - Flags: review+
Keywords: checkin-needed
Whiteboard: Need Aurora checkin for Part 1 and Part 2 (Aurora)
The assert is firing for me.  Anything I can do to figure out why/what's loading user32.dll?
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #36)
> The assert is firing for me.  Anything I can do to figure out why/what's
> loading user32.dll?

It is likely the appcompat flag from bug 939043 comment 7 and 8.
Depends on: 945216
Only 3 crashes are seen in 27.0beta builds in the last 3 weeks with crash signature that contain bitguard.dll
https://crash-stats.mozilla.com/query/?product=Firefox&version=Firefox%3A27.0b&platform=win&range_value=3&range_unit=weeks&date=12%2F17%2F2013+09%3A00%3A00&query_search=signature&query_type=contains&query=bitguard&reason=&release_channels=&build_id=&process_type=any&hang_type=any

Verified that the crash signatures from bug 925459 did not appear in 27 Beta. Based on comment 33 I am marking this as verified fixed.
Status: RESOLVED → VERIFIED
Keywords: verifyme
Depends on: 951827
We disabled this mechanism for 27 in bug 951827 - should we mark status for this as disabled for 27 here as well?
Depends on: 958344
You need to log in before you can comment on or make changes to this bug.