Closed Bug 1023941 Opened 10 years ago Closed 10 years ago

Support building firefox.exe and mozglue.dll with a static CRT and everything else with a dynamic CRT

Categories

(Firefox Build System :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla34

People

(Reporter: ehsan.akhgari, Assigned: away)

References

Details

Attachments

(5 files, 2 obsolete files)

In order to be able to support Windows XP SP2 with Visual C++ 2013, we need to link statically to the CRT in firefox.exe and mozglue.dll but dynamically to everything else.

Is this something that the build system already supports?  If not, can you please give us a rough estimate on how much work that's going to be and how it would work?

Thanks!
Flags: needinfo?(mh+mozilla)
Blocks: VC12
Add USE_STATIC_LIBS to any moz.build which needs to statically link the CRT.
(In reply to comment #1)
> Add USE_STATIC_LIBS to any moz.build which needs to statically link the CRT.

That should affect the directories where the code that goes into the module live though, right?  The big question in my mind is that if we have that exact directory/module boundary for us to be able to build things separately with different compiler flags...
I'm pretty sure USE_STATIC_LIBS on firefox.exe would break jemalloc integration.

(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #0)
> In order to be able to support Windows XP SP2 with Visual C++ 2013, we need
> to link statically to the CRT in firefox.exe and mozglue.dll but dynamically
> to everything else.

I don't see why that would make a difference. Is the static crt compatible with XP when the dynamic one isn't?
Flags: needinfo?(mh+mozilla)
https://software.intel.com/en-us/articles/linking-applications-using-visual-studio-2012-to-run-on-windows-xp

If it is a Console application, type in the Additional Options field:

/SUBSYSTEM:CONSOLE,"5.01"

if it is a Windows app, type in:

/SUBSYSTEM:WINDOWS,"5.01"

IOW, we should just need to add a version to WIN32_EXE_LDFLAGS in config/config.mk. I'm not sure how well that will play with what we do for jemalloc, though.
Isn't it bug 1001332? How bug 1001332 was useless to run the build on WinXP?
(In reply to Mike Hommey [:glandium] from comment #3)
> I'm pretty sure USE_STATIC_LIBS on firefox.exe would break jemalloc
> integration.

Hmm, do you mind expanding on that please?  And how can we fix it?

> (In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment
> #0)
> > In order to be able to support Windows XP SP2 with Visual C++ 2013, we need
> > to link statically to the CRT in firefox.exe and mozglue.dll but dynamically
> > to everything else.
> 
> I don't see why that would make a difference. Is the static crt compatible
> with XP when the dynamic one isn't?

The issue is that the MSVC runtime DLLs link against GetLogicalProcessorInformationEx which was first introduced in Windows XP SP3.  This means that any application that links against those DLLs will crash at startup due to the unresolved symbol on XP SP2.

Now, it turns out that this code gets called when loading the concurrency runtime component, which is not used in our code.  Therefore, the linker doesn't pull in that symbol when you link against the static version of the library.  So the idea is to link statically to the CRT in firefox.exe and mozglue.dll in order to give us a chance to install a runtime hook which will enable us to load the MSVC runtime DLLs from xul.dll, mozjs.dll, etc.
Flags: needinfo?(mh+mozilla)
(In reply to Masatoshi Kimura [:emk] from comment #6)
> Isn't it bug 1001332? How bug 1001332 was useless to run the build on WinXP?

Bug 1001332 is another thing that we need in order to run the binaries MSVC 2013 produces on older Windows releases.  These two bugs don't have anything with each other.
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #7)
> (In reply to Mike Hommey [:glandium] from comment #3)
> > I'm pretty sure USE_STATIC_LIBS on firefox.exe would break jemalloc
> > integration.
> 
> Hmm, do you mind expanding on that please?  And how can we fix it?

It relies on hacking the small static part of the dynamic crt. Building against the static crt would require an entirely different hack.

> > (In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment
> > #0)
> > > In order to be able to support Windows XP SP2 with Visual C++ 2013, we need
> > > to link statically to the CRT in firefox.exe and mozglue.dll but dynamically
> > > to everything else.
> > 
> > I don't see why that would make a difference. Is the static crt compatible
> > with XP when the dynamic one isn't?
> 
> The issue is that the MSVC runtime DLLs link against
> GetLogicalProcessorInformationEx which was first introduced in Windows XP
> SP3.  This means that any application that links against those DLLs will
> crash at startup due to the unresolved symbol on XP SP2.

According to http://msdn.microsoft.com/en-us/library/windows/desktop/dd405488%28v=vs.85%29.aspx the function you mention was added to windows 7. Not even vista. Are you sure it's the problematic symbol?

Also, did you actually try running a msvc2013 build with the patch from bug 1001332 on windows xp sp2?
Flags: needinfo?(mh+mozilla)
So, the function that msvcr120.dll is looking for is actually GetLogicalProcessorInformation (not Ex), and it was indeed added to XP SP3.

I don't see how statically linking firefox.exe and mozglue.dll to the CRT would solve this, though (also, linking it in two places seems wrong)
Mike, it solves it because firefox and mozglue don't end up depending on GetLogicaProcessorInformation. And then we dynamically patch the dynamic CRT import table to load correctly for the rest of our binaries.
Err, yes, it's GetLogicalProcessorInformation.  GetLogicalProcessorInformationEx is the function that the CRT calls dynamically just a few lines below, I just got the names wrong.

And yes, we have verified that we crash on XP SP2 without this workaround and do not crash with it.  This bug is not just wild guesses, it's the result of a ton of investigation (thanks to dmajor.)  He has a PoC for this.  And like I said in comment 8, bug 1001332 is also needed but is not enough and is not relevant to this bug, please let's keep these two issues separate.

You can see bug 998943 for some of our previous efforts and their implications.
(In reply to Mike Hommey [:glandium] from comment #9)
> (In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment
> #7)
> > (In reply to Mike Hommey [:glandium] from comment #3)
> > > I'm pretty sure USE_STATIC_LIBS on firefox.exe would break jemalloc
> > > integration.
> > 
> > Hmm, do you mind expanding on that please?  And how can we fix it?
> 
> It relies on hacking the small static part of the dynamic crt. Building
> against the static crt would require an entirely different hack.

Can you please give us more information on what this hack is, and how we can extend it to the static CRT?  We really need to make this bug actionable ASAP.  Thanks!
Flags: needinfo?(mh+mozilla)
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #12)
> You can see bug 998943 for some of our previous efforts and their
> implications.

"You are not authorized to access bug #998943." :(
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #13)
> Can you please give us more information on what this hack is, and how we can
> extend it to the static CRT?  We really need to make this bug actionable
> ASAP.  Thanks!

See big comment in mozglue/build/Makefile.in
Flags: needinfo?(mh+mozilla)
Discussed on IRC with glandium. We can save ourselves a lot of trouble by marking mozglue as delay-load. Then we would only need to static-link firefox.exe. Oddly enough, the exe doesn't use jemalloc, so there shouldn't be any problems there.

I've tried this with my franken-build PoC, and it seems to work.
That sounds good!  But we'd need to somehow ensure that we do this work before delay-loading mozglue.dll otherwise we'd get killed before getting a chance to install our hook.  Do you have any idea how we can enforce this going forward?  (Note that AFAIK we don't really test our builds on Windows XP SP2 regularly so if we end up breaking this, we may not know until it's too late.)
We could probably just MOZ_ASSERT(!GetModuleHandle("mozglue.dll")) on all platforms.
(In reply to David Major [:dmajor] from comment #18)
> We could probably just MOZ_ASSERT(!GetModuleHandle("mozglue.dll")) on all
> platforms.

Great idea!
Well, except GetModuleHandle only works on windows, so it can't be on all platforms. You probably meant on all versions of windows.
(In reply to Mike Hommey [:glandium] from comment #20)
> Well, except GetModuleHandle only works on windows, so it can't be on all
> platforms. You probably meant on all versions of windows.

Well, this only matters for Windows, so, yes!
Because searching in irc logs is not as convenient as reading bug comments: (from a week ago)

<glandium> dmajor: you can try your way around that by linking another xpcomglue
<glandium> dmajor: iirc we do have a static static xpcomglue
<glandium> as in static lib built for static linkage with the crt, as opposed to static lib built for dynamic linkage with the crt
<glandium> dmajor: xpcomglue_staticruntime
<dmajor> glandium: how do I tell the build to use that?
glandium> dmajor: the simplest albeit hackish is probably a $(subst) on LDFLAGS
glandium> or one of the variables that end up in LDFLAGS
<dmajor> glandium: wait, what exactly does "STATIC" mean in "USE_STATIC_LIBS"? is it referring to the CRT or to the glue?
<glandium> dmajor: crt
<glandium> dmajor: that said, USE_STATIC_LIBS also empties MOZ_GLUE_LDFLAGS
<dmajor> glandium: lines 30-32 are suspicious. why is it telling me no imports from mozglue.dll and then complaining about not finding the import?
<glandium> dmajor: cf. my last message
<glandium> so USE_STATIC_LIBS implies not linking mozglue
<glandium> fnu
<glandium> quite a puzzle
<glandium> the other problem is that the import lib for mozglue contains parts of the dynamic crt
<glandium> so, ultimately, seems to me we need to build another mozglue import lib that doesn't have that
<dmajor> oof. I don't think I want to own this :) if it was going to be a one line change, I'd have taken the bug...
<glandium> dmajor: yeah, i think that should be owned by a build system peer
<glandium> and most likely, that would be me
<dmajor> glandium: my condolences :\
Personally, I don't feel this is worth the effort. Just warn users that XP SP2 support is going to be dropped.
Turns out this is not as difficult as previously thought. We just need to use these LIBS:

-	$(XPCOM_STANDALONE_GLUE_LDOPTS) \
+	$(DEPTH)/mozglue/build/$(LIB_PREFIX)mozglue.$(IMPORT_LIB_SUFFIX) \
+	$(XPCOM_STANDALONE_STATICRUNTIME_GLUE_LDOPTS) \
I went through my install directory looking for other executables that depend on the CRT. In addition to firefox.exe, we'll also need static links and/or delay loads for: crashreporter.exe, plugin-container.exe, plugin-hang-ui.exe.

Plus we'll need to do import patching in any exe that does a dynamic load of our core DLLs. I assume that a grep for XPCOMGlueStartup catches all of them?

webapprt/win/webapprt.cpp
246 	rv = XPCOMGlueStartup(xpcomDllPath);
xulrunner/app/nsXULRunnerApp.cpp
207 	rv = XPCOMGlueStartup(exePath);
xulrunner/stub/nsXULStub.cpp
386 	rv = XPCOMGlueStartup(greDir);
Assignee: nobody → dmajor
Attachment #8467588 - Flags: review?(mh+mozilla)
Had to update for the recent LIBS changes. More binaries to come if this one is good.
Attachment #8467589 - Flags: review?(mh+mozilla)
Comment on attachment 8467588 [details] [diff] [review]
Part 1: Delayload mozglue.dll from firefox.exe

Review of attachment 8467588 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/app/moz.build
@@ +31,5 @@
>  ]
>  
> +DELAYLOAD_DLLS += [
> +    'mozglue.dll',
> +]

What about the assert that it's loaded? Also, can you put all the things from this bug together in the file?
Attachment #8467588 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8467589 [details] [diff] [review]
Part 2: Static-link the CRT into firefox.exe

Review of attachment 8467589 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/app/moz.build
@@ +54,5 @@
>  # Set it to 256k.  See bug 127069.
>  if CONFIG['OS_ARCH'] == 'WINNT' and not CONFIG['GNU_CC']:
>      LDFLAGS += ['/HEAP:0x40000']
>  
> +# On Windows we USE_STATIC_LIBS.

See comment to other part ; that would make this comment unnecessary.
Attachment #8467589 - Flags: review?(mh+mozilla) → review+
BTW, how about plugin-container and webapprt?
I will do the other binaries in another patch in this bug. Wanted to avoid wasting time if the first approach was wrong. I plan on putting the asserts right next to the import hook (an upcoming piece) so that they can be shared by all affected binaries.
I learned something disturbing while testing xulrunner and B2G Desktop. The CRT makes some malloc calls during its initialization, before main. If an executable contains user code that calls malloc (e.g. xulrunner.exe) then the linker substitutes mozglue's malloc, and the internal CRT calls are also subject to that substitution. That means mozglue gets loaded before main, and we don't have an opportunity to set up hooks. If an executable doesn't explicitly touch malloc (e.g. firefox.exe) then the linker makes no substitutions, and the internal CRT calls go to NT HeapAlloc.

What this means for xulrunner.exe and b2g.exe is either they get rid of those mallocs or they don't get SP2 support. I think we can save this for followup bugs.

What this means for firefox.exe is that the hook trick only works because firefox just-so-happens not to call malloc. That's pretty fragile. I will add assertions that this doesn't change going forward, but that may be pretty restrictive if we ever want to add more code to the exe.
I mentioned this on IRC before, but for comparison, /arch:ia32 is a simple compiler switch and saves the user from an expensive hardware upgrade, while installing a service pack is pretty simple in comparison. XP SP3 requiring WGA is a myth BTW.
And FYI, I was against adding hacks to support XP SP2 even before comment 32 was posted.
I've combined the previous two patches into one, as requested. Carrying forward r=glandium.
Attachment #8467588 - Attachment is obsolete: true
Attachment #8467589 - Attachment is obsolete: true
Attachment #8475707 - Flags: review+
Tim, can you foresee any problems with building a separate, static-CRT version of the sandbox lib?
Attachment #8475709 - Flags: review?(mh+mozilla)
Attachment #8475709 - Flags: feedback?(tabraldes)
Comment on attachment 8475709 [details] [diff] [review]
Part 2: Static link for plugin-container.exe

Review of attachment 8475709 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/app/moz.build
@@ +63,5 @@
>  if CONFIG['OS_ARCH'] == 'WINNT' and not CONFIG['GNU_CC']:
>      LDFLAGS += ['/HEAP:0x40000']
>  
> +#Warning: C4273 in xutility: 'std::moz_Xinvalid_argument' : inconsistent dll linkage
> +#FAIL_ON_WARNINGS = True

This is unfortunate. Any idea how I can get around it, or should I give up and disable for WINNT?
Attachment #8475710 - Flags: review?(mh+mozilla)
Attachment #8475711 - Flags: review?(ted)
Attachment #8475712 - Flags: review?(benjamin)
With the changes above, I can run every .exe in the browser package on my SP2 VM.

Here is the bloat from the static CRTs:
crashreporter.exe:    104 KB -> 273 KB
firefox.exe:          274 KB -> 373 KB
plugin-container.exe: 128 KB -> 258 KB
plugin-hang-ui.exe:    21 KB -> 162 KB
webapprt-stub.exe:    120 KB -> 123 KB

Total growth: 542 KB
Attachment #8475712 - Flags: review?(benjamin) → review+
(In reply to David Major [:dmajor] from comment #41)
> With the changes above, I can run every .exe in the browser package on my
> SP2 VM.
> 
> Here is the bloat from the static CRTs:
> crashreporter.exe:    104 KB -> 273 KB
> firefox.exe:          274 KB -> 373 KB
> plugin-container.exe: 128 KB -> 258 KB
> plugin-hang-ui.exe:    21 KB -> 162 KB
> webapprt-stub.exe:    120 KB -> 123 KB
> 
> Total growth: 542 KB

That's pretty minor, I think our biggest concern here is compressed size since it effects the cost of distribution. These changes probably compress down pretty well, when you get a chance can you zip them up and compare to the same without the change? (Or just build an installer and compare the resulting exes?)
Comment on attachment 8475711 [details] [diff] [review]
Part 4: Static link for crashreporter.exe

Review of attachment 8475711 [details] [diff] [review]:
-----------------------------------------------------------------

It's depressing to me how many copies of Breakpad libs we need to build, but you gotta do what you gotta do...
Attachment #8475711 - Flags: review?(ted) → review+
Attachment #8475709 - Flags: review?(mh+mozilla) → review+
Attachment #8475710 - Flags: review?(mh+mozilla) → review+
(In reply to Jim Mathies [:jimm] from comment #42)
> That's pretty minor, I think our biggest concern here is compressed size
> since it effects the cost of distribution. These changes probably compress
> down pretty well, when you get a chance can you zip them up and compare to
> the same without the change? (Or just build an installer and compare the
> resulting exes?)

A bit under 300k:

08/26/2014  01:47 PM        45,053,020 firefox-34.0a1.en-US.win32.zip
08/26/2014  02:08 PM        45,345,950 firefox-34.0a1.en-US.win32.zip
(In reply to David Major [:dmajor] from comment #36)
> Created attachment 8475709 [details] [diff] [review]
> Part 2: Static link for plugin-container.exe
> 
> Tim, can you foresee any problems with building a separate, static-CRT
> version of the sandbox lib?

Sorry for the delay on this. I don't foresee any problems with what we're doing here, but I'd really like us to test at least the OpenH264 plugin before landing.

Steps to test:
  1) Build Nightly with the patches in this bug
  2) Verify that the OpenH264 plugin has been installed already (check in Addons Manager -> Plugins). If not, wait a few minutes and it should install itself.
  3) Go to http://mozilla.github.io/webrtc-landing/pc_test.html
  4) Check "Require H.264 video" and press "Start"
  5) Allow sharing of devices when prompted
You should see a bunch of debug output that I don't understand, followed by "HIP HIP HOORAY"
Also, plugin-container.exe should be running
  6) Leave open for >30s just in case (there is a 30s timeout that kills plugin-container.exe if firefox.exe has been unable to communicate with it)
Attachment #8475709 - Flags: feedback?(tabraldes) → feedback+
Thanks for the tip! In hindsight it seems obvious now, but it took me a while to work out that you need a machine with both a camera and microphone for the test to run properly. The test case seems to be fine with this stack of patches.
As a compromise, I decided to file bug 1059840 to infobar XP SP2 users.
Comment on attachment 8475712 [details] [diff] [review]
Part 5: Import patching and assertions

>+        RVAPtr<IMAGE_IMPORT_BY_NAME> import(module, thunk->u1.AddressOfData);
>+        if (!strcmp(import->Name, "GetLogicalProcessorInformation")) {
How do you get this to compile? My compiler says you can't pass an unsigned char[1] to strcmp.
CHAR is signed.
(In reply to David Major [:dmajor] from comment #51)
> CHAR is signed.

That's not true on all platforms.
I meant in this context. That file only compiles on Windows. I didn't see any issues on my machine or on the builders. If this has broken anyone's build, please paste the error and compiler version.
(In reply to David Major from comment #51)
> CHAR is signed.

Sure, but BYTE isn't. And my Windows SDK 7.1 WinNT.h has:

typedef struct _IMAGE_IMPORT_BY_NAME {
    WORD    Hint;
    BYTE    Name[1];
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;
Blocks: 1060848
Aha. Sounds like the SDKs disagree on the type. Sorry for the bustage.
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab03252f744e
Blocks: 1060897
(In reply to David Major from comment #55)
> Aha. Sounds like the SDKs disagree on the type. Sorry for the bustage.
> https://hg.mozilla.org/integration/mozilla-inbound/rev/ab03252f744e

Thanks for following up on this!
Depends on: 1080388
Blocks: 1060890
Depends on: 1136775
No longer depends on: 1136775
Depends on: 1137609
Depends on: 1175039
Blocks: 1236931
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: