The default bug view has changed. See this FAQ.

Need to figure out VC8 CRT issues with the XULRunner stub (and with embedding)

RESOLVED FIXED in mozilla1.9alpha1

Status

Toolkit Graveyard
XULRunner
RESOLVED FIXED
11 years ago
a year ago

People

(Reporter: bsmedberg, Assigned: ted)

Tracking

(Blocks: 1 bug)

unspecified
mozilla1.9alpha1
x86
Windows XP
Dependency tree / graph

Details

Attachments

(3 attachments, 5 obsolete attachments)

(Reporter)

Description

11 years ago
Followup to bug 334528: even after that lands, the XULRunner stub can't load gecko correctly. The situation is something like this:

c:\Program Files\MyApplication\xulrunner-stub.exe, when launched, tries to dynamically load
c:\Program Files\Mozilla\XULRunner 1.9a1\xul.dll (and xpcom.dll, and other things). But because the CRT is at c:\Program Files\Mozilla\XULRunner 1.9a1 and not at c:\Program Files\MyApplication, the loader doesn't find it and the app doesn't start.

Ted, I need your help with this one. Is there any way to embed a manifest in xul.dll (and nspr4.dll and the other DLLs in question) which tells the loader to look in the same directory as the DLL for the CRT assembly, instead of looking in the application directory? I'd like to avoid the Activation Context goop that we did for the installer in bug 328579
(Assignee)

Comment 1

11 years ago
AIUI, if you embed a manifest at resource ID 2, the system will look for the CRT assembly in the same directory as the DLL.  As long as you don't intend to use these DLLs in any other fashion, that's ok.

The problem we had with the installer bug was that we used the same DLLs in two different fashions: once for the installer, where they were extracted to a temp dir with the CRT manifest/assemblies in the same directory, and then again in appdir/components/, where the CRT was in appdir/.

Reference: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sbscs/setup/searching_for_assembly_files.asp (specifically the last paragraph)
(Assignee)

Comment 2

11 years ago
I forgot to mention, the patch from bug 328579 gives you the ability to say:
EMBED_MANIFEST_AT = 2

In your Makefile.in before you include rules.mk.
(Reporter)

Comment 3

11 years ago
No component libraries are involved, only library DLLS that would live next to the CRT.
(Assignee)

Comment 4

11 years ago
Then you can do this safely by embedding the manifest at ID 2, and XULRunner app authors including binary components will have to make sure not to embed a manifest at all in their components.
(Reporter)

Comment 5

11 years ago
What about NSPR]NSS? Do they currently embed a manifest?
(Assignee)

Comment 6

11 years ago
Nope.  Currently, EXEs get a manifest at resource ID 1, and DLLs don't get a manifest at all unless they ask for it with that Makefile define.  The only places that happens right now are in XPInstall and MSAA (from bug 338807 ).

http://lxr.mozilla.org/seamonkey/search?string=EMBED_MANIFEST_AT
(Assignee)

Comment 7

11 years ago
Oh, one more note:  I believe that once one DLL gets loaded with an activation context, then that becomes the current activation context, so any other DLLs loaded without one will use the existing one.  If you can guarantee that xul.dll or xpcom.dll or whatever will get loaded first, then I think you can get away with just embedding the manifest in that one DLL.

I could be off base, but it's easy enough to test.  Just use mt.exe from the commandline and embed a manifest in your dll, and see if it works.
(Assignee)

Comment 8

11 years ago
Additionally, if I'm wrong on that last point, and you do need to embed a manifest in every single DLL to make this work, and getting that into NSPR/NSS is a problem, I think copying the activation context magic from xpinstall wouldn't be too bad.  You would basically need to stick this block of code: (modified a bit)
http://lxr.mozilla.org/seamonkey/source/toolkit/mozapps/installer/windows/wizard/setup/xpi.c#132
Right before here:
http://lxr.mozilla.org/seamonkey/source/xulrunner/stub/nsXULStub.cpp#269
(Reporter)

Comment 9

11 years ago
Yeah... I'd prefer a solution that didn't require that, because that require every embedder to do it, as well as every future version of the stub. I'm going to see what we can do to get the manifests embedded in NSPR, because everything else depends on the NSPR libs being loaded first.
(In reply to comment #9)
>  I'm going
> to see what we can do to get the manifests embedded in NSPR, because everything
> else depends on the NSPR libs being loaded first.
> 

Benja;in, did you progress on that ?
(Reporter)

Comment 11

11 years ago
No, it's P3 which means that it's well below my radar right now.
Seems that NSS does now embed a manifest. See bug 353475.
Embedding a manifest in nspr4.dll simply reduced the number of:

  This application has failed to start because MSVCR80D.dll was not found.
  Re-installing the application may fix this problem.

popups from six to five (non-libxul build). All six popups are eliminated when manifests are embedded in each of:

  nsprpub/lib/ds/plds4.dll
  nsprpub/lib/libc/src/plc4.dll
  nsprpub/pr/src/nspr4.dll
  toolkit/library/xul.dll
  xpcom/build/xpcom_core.dll
  xpcom/stub/xpcom.dll

but then DLLs from the xulrunner/components directory fail to load. I get lots of:

  nsNativeModuleLoader::LoadModule("...\xulrunner\components\caps.dll") - load
  FAILED, rv: 80004003, error:
          error 126

for DLLs such as caps.dll, necko.dll, chrome.dll and tkitcmps.dll before the process dies. Embedding a manifest at resource ID 2 in caps.dll seems to get it to load even though the CRT isn't beside that DLL. I'm not sure how to proceed other than to embed a manifest in all DLLs the app can load. Thoughts?
I believe that http://msdn.microsoft.com/library/en-us/sbscs/setup/using_side-by-side_assemblies_as_a_resource.asp confirms my observations that this isn't as simple as embedding a manifest once in nspr4.dll (only embedding at resource ID 1 sets the process default, but embedding at resource ID 1 isn't valid for DLLs).
Doing as Ted suggests in comment 8 doesn't seem to be enough. It appears to get all the DLLs in the xulrunner directory loading correctly, but all the DLLs in xulrunner/components fail.
Programmatically making the xulrunner directory the current directory as well as doing the activation context stuff seems to fix things. DLLs from appdir/components seem to load fine too. I'm going to play with this some more and then I'll attach a patch.
(Reporter)

Comment 17

11 years ago
You must not change the current directory. It will break all kinds of commandline handling behaviors.
Created attachment 244810 [details] [diff] [review]
hack

In that case I won't bother trying to make the setting of the directory cross platform. Here's what I have, nevertheless. I embedded the manifest in xpcom.dll instead of nspr4.dll simply because I couldn't figure out how to use the build system of the latter to do the embedding.
(Assignee)

Comment 19

11 years ago
Comment on attachment 244810 [details] [diff] [review]
hack


>+  // change the current directory to the GRE directory
>+  SetCurrentDirectory(greDir);
>+  // now put XPCOM_DLL back on the end of greDir
>+  if (lastSlash) {
>+    *lastSlash = PATH_SEPARATOR_CHAR;
>+  }
>+  ACTCTXA actctx;
>+  HANDLE hActCtx = INVALID_HANDLE_VALUE;
>+  ULONG_PTR ulpActivationCookie;
>+
>+  /* Windows XP + Visual C++ 8 requires proper
>+   * Side by Side configuration, which means
>+   * we have to have a manifest to create an
>+   * Activation Context so we can locate the
>+   * VC8 runtime.
>+   */
>+  memset(&actctx, 0, sizeof(actctx));
>+  actctx.cbSize = sizeof(actctx);
>+  actctx.lpSource = (LPCSTR)greDir;
>+  actctx.lpResourceName = MAKEINTRESOURCE(17);

I think instead of using SetCurrentDirectory, you may be able to do like:

actctx.lpAssemblyDirectory = greDir;
actctx.dwFlags = ACTCTX_FLAG_RESOURCE_NAME_VALID | ACTCTX_FLAG_ASSEMBLY_DIRECTORY_VALID;

Can you give that a try?
(Assignee)

Comment 20

11 years ago
Of course that code isn't completely correct, you'll probably need to strcpy the greDir where you do the SetCurrentDirectory there.
Still the failures trying to load anything from xulrunner/components I'm afraid.
(Assignee)

Comment 22

11 years ago
Just for completeness, can you post your updated patch you tried?
Created attachment 244824 [details] [diff] [review]
hack with lpAssemblyDirectory instead of SetCurrentDirectory (components fail to load)

Sure, best to check I'm not doing something silly. ;-) Stepping through the code in the debugger everything looks fine (paths look good). However, at this point in the code:

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/nsprpub/pr/src/linking/prlink.c&rev=3.51.2.37&mark=895,908#892

although the absolute path to the DLL in wname looks fine, h gets set to NULL with error 126 (ERROR_MOD_NOT_FOUND). See

http://msdn.microsoft.com/library/en-us/debug/base/system_error_codes__0-499_.asp

Adding the SetCurrentDirectory change "fixes" things so that the DLLs in xulrunner/components load again. Since the ActivateActCtx call fixes the loading of DLLs from the xulrunner directory, I'm not sure what the problem with the DLLs in the components directory is. I don't believe they're failing because the CRT isn't found though. I think it must be something else.
Ah. If I replace SetCurrentDirectory with SetDllDirectory, things work too. This is even though I pass SetDllDirectory the xulrunner directory, and not the xulrunner/components directory.
(Assignee)

Comment 25

11 years ago
You are running on WinXP, right?
Note that to build with SetDllDirectory you need to add the following line before the windows.h include:

#define _WIN32_WINNT 0x0502
WinXP SP2, right.
(Reporter)

Comment 28

11 years ago
Since SetDllDirectory is not available in win2k, we'd need to dynamically load it (but that's ok, since we don't have this problem except in winxp+). We could even do it from the app itself, so that we don't have to change the stub.
Great. It would be really nice to get a handle on _why_ the above attempts work/fail first though. :-)
(Assignee)

Comment 30

11 years ago
What I assume is that the CRT DLLs are not in the search path unless a) the dll you're trying to load is in that same dir, or b) you've called SetDllDirectory.

However, the more I read about the SxS stuff, the less I understand.

It looks like my suggestion from comment #19 has no effect, so you can just remove that and use your SetDllDirectory fix along with the original activation context magic, and things should work.

I'm not convinced that finding the CRT DLLs is the problem. Maybe it is, but then I'd have expected similar errors loading the DLLs under the components directory as we get when loading the DLLs under the xulrunner directory without the activation context stuff. I'll investigate further tomorrow.

I certainly sympathize with your feelings on the SxS stuff. I can't get anything more than a fuzzy grip on how this stuff works from my reading of the docs.

Regarding your suggestion in comment 19, I think whether it works or not, it's the right thing to do so I'd rather leave it in.

Benjamin, assuming doing the activation and dll path stuff is all good, where do you want the various pieces put? I was thinking putting them in XPCOMGlueStartup should get things working and save embedders from having to know about this. That files doesn't reach out to system.dll right now, but hopefully making it do so isn't a problem.
Created attachment 245081 [details] [diff] [review]
current diff (activation context + SetDllDirectory)

A couple of people have mailed me privately to ask for my current diff. It's not hard to derive it from the comments above, but I might as well attach it I guess.
Attachment #244810 - Attachment is obsolete: true
Attachment #244824 - Attachment is obsolete: true
Gah! So because the stub is inexplicably broken even for people with the CRT in WinSxS, when I got it to work on my machine (which has the CRT in WinSxS) I forgot to test on a machine without Visual Studio installed. My current efforts only manage to get the CRT in WinSxS recognized. Even though it's all about setting search paths/activation contexts to the xulrunner directory! The CRT bundled with XULRunner still isn't found, so this doesn't fix things for the general populous. Now I'm even more confused than before.

Anyway, this hack allows our guys to continue with their work on their VS installed machines for now, and I have too many other pressing things to look into this further. If someone has ideas or insight I'll happily run tests for you, or if someone wants to pick this up and run with it, feel free.

Comment 34

10 years ago
Is there anything new to add on this? Will this need to be solved in order to build Firefox running on XULRunner?
(Assignee)

Updated

10 years ago
Assignee: nobody → ted.mielczarek
(Assignee)

Updated

10 years ago
Blocks: 371359, 373047

Comment 35

10 years ago
I'm doing accessibility work for a xulrunner based app and wanted to take 1.9a4 for a spin to see what's been fixed/broken.  Unfortunately, I immediately received the R6034 error.  Is there an update to the target milestone (still listed as 1.9a1)?  And doesn't this qualify for 'Blocker' or at least 'Critical' severity since the application can't run at all?
(Assignee)

Comment 36

10 years ago
Ok, I've done some investigation.  XULRunner on windows doesn't ship any binary components in components/.  Therefore, if we just embed a manifest in all of our DLLs, we should be in good shape.  I tested this with the JavaXPCOM tests and the XR stub.  I will have to patch NSPR to ensure that it gets manifests in everything.  XR app authors and extension authors will have to statically link to the CRT in any components they author, but I think this is a workable situation.

(Assignee)

Comment 37

10 years ago
Created attachment 262548 [details] [diff] [review]
embed manifests in everything
Attachment #245081 - Attachment is obsolete: true
Attachment #262548 - Flags: review?(benjamin)
(Assignee)

Comment 38

10 years ago
Created attachment 262549 [details] [diff] [review]
embed manifests in nspr libs

And for NSPR as well.
Attachment #262549 - Flags: review?
(Assignee)

Updated

10 years ago
Attachment #262549 - Flags: review? → review?(wtc)
(Reporter)

Updated

10 years ago
Attachment #262548 - Flags: review?(benjamin) → review+

Comment 39

10 years ago
Comment on attachment 262549 [details] [diff] [review]
embed manifests in nspr libs

This patch looks okay.  I'd prefer a patch that's similar to the NSS patch, which uses a variable named MT whose value is mt.exe.
(Assignee)

Comment 40

10 years ago
Created attachment 267580 [details] [diff] [review]
embed manifests in nspr libs, use $(MT)

Is this better?
Attachment #262549 - Attachment is obsolete: true
Attachment #267580 - Flags: review?(wtc)
Attachment #262549 - Flags: review?(wtc)

Comment 41

10 years ago
Comment on attachment 267580 [details] [diff] [review]
embed manifests in nspr libs, use $(MT)

r=wtc.
Attachment #267580 - Flags: review?(wtc) → review+

Comment 42

10 years ago
Comment on attachment 267580 [details] [diff] [review]
embed manifests in nspr libs, use $(MT)

Ted, just curious: what do the ;1 and ;2 at the end of the
-OUTPUTRESOURCE option mean?  I noticed that
mozilla/xpinstall/stub/Makefile.in were using ;17 before.
(Assignee)

Comment 43

10 years ago
That's the resource ID where the manifest is embedded.  Windows expects to read manifests at ID 1 for executables, and 2 for DLLs (usually, although you can do other things).  Anything above 16 is left for user-defined activities.  We were using 17 because we had some code in xpistub to manually create an activation context and load the manifest into it.  Since we're no longer using the xpinstaller, we don't need that anymore.  If anyone still needed it, they could make it load from resource ID 2 anyway.

Thanks for the review!
(Assignee)

Comment 44

10 years ago
Comment on attachment 262548 [details] [diff] [review]
embed manifests in everything

Checked in on trunk.

Comment 45

10 years ago
Created attachment 267800 [details] [diff] [review]
embed manifests in nspr libs, use $(MT) (as checked in)

I made some minor changes.  I've checked in this patch on the
NSPR trunk (NSPR 4.7) and will check this in on the
NSPRPUB_PRE_4_2_CLIENT_BRANCH when the Mozilla trunk reopens.
Attachment #267580 - Attachment is obsolete: true

Comment 46

10 years ago
Created attachment 267803 [details] [diff] [review]
embed manifests in nspr tests

We missed the NSPR test programs, which are built with a
rule in mozilla/nsprpub/pr/tests/Makefile.in.  This patch
takes care of that makefile rule.

Comment 47

10 years ago
Comment on attachment 267803 [details] [diff] [review]
embed manifests in nspr tests

Note that I use -OUTPUTRESOURCE:$@\;1 to build the NSPR
test programs.
Attachment #267803 - Flags: review?(ted.mielczarek)
(Assignee)

Updated

10 years ago
Attachment #267803 - Flags: review?(ted.mielczarek) → review+

Comment 48

10 years ago
I heavily suspect this has caused SeaMonkey Ts to increase from ~760ms to ~880ms on Windows (see http://build-graphs.mozilla.org/graph/query.cgi?testname=startup&units=ms&tbox=sea-win32-tbox&autoscale=1&days=7&avg=1 )

We currently have no test data for Firefox, but one reason for that may again be this checkin (that is the one in comment #44) as suggested in bug 383875

Comment 49

10 years ago
Sorry for yet another comment, but that SeaMonkey perf data is backed by the other Windows tinderbox in SeaMonkey-Ports which went from ~2200ms to ~2600 on Ts, while the Mac box there and the Linux both on the SeaMonkey tinderbox page all stayed unchanged. Also other perf data than Ts stayed unchanged.

So overall, we have a 15-20% Windows-only Ts regression for SeaMonkey.
I backed out attachment 262548 [details] [diff] [review] to see if it would fix the perf test machine bustage that's keeping the tree closed (bl-bldxp01). It cycled once, and still crashed. I've triggered a clobber to see if that will fix it, but won't be able to stay around until we get the results (the cycle time is at least a couple of hours).
And while the backout didn't make bl-bldxp01 happy, it did drop Ts on sea-win32-tbox from 891 the run before to 750 the run after, so more than "heavily suspect."

Comment 52

10 years ago
The backout seems to fixed Thunderbird trunk also.
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a6pre) Gecko/20070610 Thunderbird/3.0a1pre ID:2007061003 [cairo]
Starts and runs fine here, winxp sp2 (previously crashed on startup)
(Assignee)

Comment 53

10 years ago
Ok, so there are two separate issues here, both of which suck.  There's some sort of startup crash, and there's a Ts hit.  I'm not entirely surprised by a Ts hit, since the DLL loader probably has to parse the manifest out of every DLL now.  It'd be interesting to compare the effect on Firefox since it's a static build.  The crash is strange, I was able to reproduce it in a VM, and I submitted a crash report here:
https://crash-reports.mozilla.com/reports/report/index/27a59de6-1758-11dc-a54e-001a4bd43ed6

It's crashing in nsFrame, but lower in the stack it looks like we're doing XPCOM shutdown.  I don't understand that.

Comment 54

10 years ago
With regards to NSPR, the patches from this bug are the only difference between HEAD and NSPRPUB_PRE_4_2_CLIENT_BRANCH (does not yet have the patch).

Note that we would like to obsolete the NSPRPUB_PRE_4_2_CLIENT_BRANCH, see bug 387992 and switch to snapshots of HEAD.

Is it ok for Mozilla trunk to pick up the changes from this bug, as checked in to NSPR HEAD?
(Reporter)

Comment 55

10 years ago
That should be fine. If it's not, the tinderbox will let us know ;-)
I think this is wanted-1.9, right? Remove it if not.
Whiteboard: [wanted-1.9]
Actually, let's see if drivers feel that this is blocking-1.9.
Flags: blocking1.9?
Whiteboard: [wanted-1.9]
(Reporter)

Comment 58

10 years ago
Well, I'd like for ted to try and reland it.
(Assignee)

Comment 59

10 years ago
Comment on attachment 262548 [details] [diff] [review]
embed manifests in everything

Ok, I've rebuilt with this patch applied, and tested the resulting build on Vista and XPSP2 with no problems.  I'd like to check this back in.
Attachment #262548 - Flags: approval1.9?
(Reporter)

Updated

10 years ago
Attachment #262548 - Flags: approval1.9? → approval1.9+
(Assignee)

Comment 60

10 years ago
Checked in.  Hopefully it sticks this time!
(Assignee)

Updated

10 years ago
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
Flags: blocking1.9?

Comment 61

10 years ago
SeaMonkey Ts regressed again, like in comment #48 and comment #49 - I filed that as bug 350616 so that it's tracked.
I guess SeaMonkey will need to switch to libxul to fix that again, which is a target anyways.
(Assignee)

Updated

10 years ago
Depends on: 394190

Comment 62

10 years ago
Umm, sorry, wrong bug number. The Ts regression is bug 394470, actually.
Product: Toolkit → Toolkit Graveyard
You need to log in before you can comment on or make changes to this bug.