Closed Bug 350616 Opened 18 years ago Closed 17 years ago

Need to figure out VC8 CRT issues with the XULRunner stub (and with embedding)

Categories

(Toolkit Graveyard :: XULRunner, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.9alpha1

People

(Reporter: benjamin, Assigned: ted)

References

Details

Attachments

(3 files, 5 obsolete files)

Followup to bug 334528: even after that lands, the XULRunner stub can't load gecko correctly. The situation is something like this:

c:\Program Files\MyApplication\xulrunner-stub.exe, when launched, tries to dynamically load
c:\Program Files\Mozilla\XULRunner 1.9a1\xul.dll (and xpcom.dll, and other things). But because the CRT is at c:\Program Files\Mozilla\XULRunner 1.9a1 and not at c:\Program Files\MyApplication, the loader doesn't find it and the app doesn't start.

Ted, I need your help with this one. Is there any way to embed a manifest in xul.dll (and nspr4.dll and the other DLLs in question) which tells the loader to look in the same directory as the DLL for the CRT assembly, instead of looking in the application directory? I'd like to avoid the Activation Context goop that we did for the installer in bug 328579
AIUI, if you embed a manifest at resource ID 2, the system will look for the CRT assembly in the same directory as the DLL.  As long as you don't intend to use these DLLs in any other fashion, that's ok.

The problem we had with the installer bug was that we used the same DLLs in two different fashions: once for the installer, where they were extracted to a temp dir with the CRT manifest/assemblies in the same directory, and then again in appdir/components/, where the CRT was in appdir/.

Reference: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sbscs/setup/searching_for_assembly_files.asp (specifically the last paragraph)
I forgot to mention, the patch from bug 328579 gives you the ability to say:
EMBED_MANIFEST_AT = 2

In your Makefile.in before you include rules.mk.
No component libraries are involved, only library DLLS that would live next to the CRT.
Then you can do this safely by embedding the manifest at ID 2, and XULRunner app authors including binary components will have to make sure not to embed a manifest at all in their components.
What about NSPR]NSS? Do they currently embed a manifest?
Nope.  Currently, EXEs get a manifest at resource ID 1, and DLLs don't get a manifest at all unless they ask for it with that Makefile define.  The only places that happens right now are in XPInstall and MSAA (from bug 338807 ).

http://lxr.mozilla.org/seamonkey/search?string=EMBED_MANIFEST_AT
Oh, one more note:  I believe that once one DLL gets loaded with an activation context, then that becomes the current activation context, so any other DLLs loaded without one will use the existing one.  If you can guarantee that xul.dll or xpcom.dll or whatever will get loaded first, then I think you can get away with just embedding the manifest in that one DLL.

I could be off base, but it's easy enough to test.  Just use mt.exe from the commandline and embed a manifest in your dll, and see if it works.
Additionally, if I'm wrong on that last point, and you do need to embed a manifest in every single DLL to make this work, and getting that into NSPR/NSS is a problem, I think copying the activation context magic from xpinstall wouldn't be too bad.  You would basically need to stick this block of code: (modified a bit)
http://lxr.mozilla.org/seamonkey/source/toolkit/mozapps/installer/windows/wizard/setup/xpi.c#132
Right before here:
http://lxr.mozilla.org/seamonkey/source/xulrunner/stub/nsXULStub.cpp#269
Yeah... I'd prefer a solution that didn't require that, because that require every embedder to do it, as well as every future version of the stub. I'm going to see what we can do to get the manifests embedded in NSPR, because everything else depends on the NSPR libs being loaded first.
(In reply to comment #9)
>  I'm going
> to see what we can do to get the manifests embedded in NSPR, because everything
> else depends on the NSPR libs being loaded first.
> 

Benja;in, did you progress on that ?
No, it's P3 which means that it's well below my radar right now.
Seems that NSS does now embed a manifest. See bug 353475.
Embedding a manifest in nspr4.dll simply reduced the number of:

  This application has failed to start because MSVCR80D.dll was not found.
  Re-installing the application may fix this problem.

popups from six to five (non-libxul build). All six popups are eliminated when manifests are embedded in each of:

  nsprpub/lib/ds/plds4.dll
  nsprpub/lib/libc/src/plc4.dll
  nsprpub/pr/src/nspr4.dll
  toolkit/library/xul.dll
  xpcom/build/xpcom_core.dll
  xpcom/stub/xpcom.dll

but then DLLs from the xulrunner/components directory fail to load. I get lots of:

  nsNativeModuleLoader::LoadModule("...\xulrunner\components\caps.dll") - load
  FAILED, rv: 80004003, error:
          error 126

for DLLs such as caps.dll, necko.dll, chrome.dll and tkitcmps.dll before the process dies. Embedding a manifest at resource ID 2 in caps.dll seems to get it to load even though the CRT isn't beside that DLL. I'm not sure how to proceed other than to embed a manifest in all DLLs the app can load. Thoughts?
I believe that http://msdn.microsoft.com/library/en-us/sbscs/setup/using_side-by-side_assemblies_as_a_resource.asp confirms my observations that this isn't as simple as embedding a manifest once in nspr4.dll (only embedding at resource ID 1 sets the process default, but embedding at resource ID 1 isn't valid for DLLs).
Doing as Ted suggests in comment 8 doesn't seem to be enough. It appears to get all the DLLs in the xulrunner directory loading correctly, but all the DLLs in xulrunner/components fail.
Programmatically making the xulrunner directory the current directory as well as doing the activation context stuff seems to fix things. DLLs from appdir/components seem to load fine too. I'm going to play with this some more and then I'll attach a patch.
You must not change the current directory. It will break all kinds of commandline handling behaviors.
Attached patch hack (obsolete) — Splinter Review
In that case I won't bother trying to make the setting of the directory cross platform. Here's what I have, nevertheless. I embedded the manifest in xpcom.dll instead of nspr4.dll simply because I couldn't figure out how to use the build system of the latter to do the embedding.
Comment on attachment 244810 [details] [diff] [review]
hack


>+  // change the current directory to the GRE directory
>+  SetCurrentDirectory(greDir);
>+  // now put XPCOM_DLL back on the end of greDir
>+  if (lastSlash) {
>+    *lastSlash = PATH_SEPARATOR_CHAR;
>+  }
>+  ACTCTXA actctx;
>+  HANDLE hActCtx = INVALID_HANDLE_VALUE;
>+  ULONG_PTR ulpActivationCookie;
>+
>+  /* Windows XP + Visual C++ 8 requires proper
>+   * Side by Side configuration, which means
>+   * we have to have a manifest to create an
>+   * Activation Context so we can locate the
>+   * VC8 runtime.
>+   */
>+  memset(&actctx, 0, sizeof(actctx));
>+  actctx.cbSize = sizeof(actctx);
>+  actctx.lpSource = (LPCSTR)greDir;
>+  actctx.lpResourceName = MAKEINTRESOURCE(17);

I think instead of using SetCurrentDirectory, you may be able to do like:

actctx.lpAssemblyDirectory = greDir;
actctx.dwFlags = ACTCTX_FLAG_RESOURCE_NAME_VALID | ACTCTX_FLAG_ASSEMBLY_DIRECTORY_VALID;

Can you give that a try?
Of course that code isn't completely correct, you'll probably need to strcpy the greDir where you do the SetCurrentDirectory there.
Still the failures trying to load anything from xulrunner/components I'm afraid.
Just for completeness, can you post your updated patch you tried?
Sure, best to check I'm not doing something silly. ;-) Stepping through the code in the debugger everything looks fine (paths look good). However, at this point in the code:

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/nsprpub/pr/src/linking/prlink.c&rev=3.51.2.37&mark=895,908#892

although the absolute path to the DLL in wname looks fine, h gets set to NULL with error 126 (ERROR_MOD_NOT_FOUND). See

http://msdn.microsoft.com/library/en-us/debug/base/system_error_codes__0-499_.asp

Adding the SetCurrentDirectory change "fixes" things so that the DLLs in xulrunner/components load again. Since the ActivateActCtx call fixes the loading of DLLs from the xulrunner directory, I'm not sure what the problem with the DLLs in the components directory is. I don't believe they're failing because the CRT isn't found though. I think it must be something else.
Ah. If I replace SetCurrentDirectory with SetDllDirectory, things work too. This is even though I pass SetDllDirectory the xulrunner directory, and not the xulrunner/components directory.
You are running on WinXP, right?
Note that to build with SetDllDirectory you need to add the following line before the windows.h include:

#define _WIN32_WINNT 0x0502
WinXP SP2, right.
Since SetDllDirectory is not available in win2k, we'd need to dynamically load it (but that's ok, since we don't have this problem except in winxp+). We could even do it from the app itself, so that we don't have to change the stub.
Great. It would be really nice to get a handle on _why_ the above attempts work/fail first though. :-)
What I assume is that the CRT DLLs are not in the search path unless a) the dll you're trying to load is in that same dir, or b) you've called SetDllDirectory.

However, the more I read about the SxS stuff, the less I understand.

It looks like my suggestion from comment #19 has no effect, so you can just remove that and use your SetDllDirectory fix along with the original activation context magic, and things should work.

I'm not convinced that finding the CRT DLLs is the problem. Maybe it is, but then I'd have expected similar errors loading the DLLs under the components directory as we get when loading the DLLs under the xulrunner directory without the activation context stuff. I'll investigate further tomorrow.

I certainly sympathize with your feelings on the SxS stuff. I can't get anything more than a fuzzy grip on how this stuff works from my reading of the docs.

Regarding your suggestion in comment 19, I think whether it works or not, it's the right thing to do so I'd rather leave it in.

Benjamin, assuming doing the activation and dll path stuff is all good, where do you want the various pieces put? I was thinking putting them in XPCOMGlueStartup should get things working and save embedders from having to know about this. That files doesn't reach out to system.dll right now, but hopefully making it do so isn't a problem.
A couple of people have mailed me privately to ask for my current diff. It's not hard to derive it from the comments above, but I might as well attach it I guess.
Attachment #244810 - Attachment is obsolete: true
Attachment #244824 - Attachment is obsolete: true
Gah! So because the stub is inexplicably broken even for people with the CRT in WinSxS, when I got it to work on my machine (which has the CRT in WinSxS) I forgot to test on a machine without Visual Studio installed. My current efforts only manage to get the CRT in WinSxS recognized. Even though it's all about setting search paths/activation contexts to the xulrunner directory! The CRT bundled with XULRunner still isn't found, so this doesn't fix things for the general populous. Now I'm even more confused than before.

Anyway, this hack allows our guys to continue with their work on their VS installed machines for now, and I have too many other pressing things to look into this further. If someone has ideas or insight I'll happily run tests for you, or if someone wants to pick this up and run with it, feel free.
Is there anything new to add on this? Will this need to be solved in order to build Firefox running on XULRunner?
Assignee: nobody → ted.mielczarek
Blocks: 371359, 373047
I'm doing accessibility work for a xulrunner based app and wanted to take 1.9a4 for a spin to see what's been fixed/broken.  Unfortunately, I immediately received the R6034 error.  Is there an update to the target milestone (still listed as 1.9a1)?  And doesn't this qualify for 'Blocker' or at least 'Critical' severity since the application can't run at all?
Ok, I've done some investigation.  XULRunner on windows doesn't ship any binary components in components/.  Therefore, if we just embed a manifest in all of our DLLs, we should be in good shape.  I tested this with the JavaXPCOM tests and the XR stub.  I will have to patch NSPR to ensure that it gets manifests in everything.  XR app authors and extension authors will have to statically link to the CRT in any components they author, but I think this is a workable situation.

Attachment #245081 - Attachment is obsolete: true
Attachment #262548 - Flags: review?(benjamin)
Attached patch embed manifests in nspr libs (obsolete) — Splinter Review
And for NSPR as well.
Attachment #262549 - Flags: review?
Attachment #262549 - Flags: review? → review?(wtc)
Attachment #262548 - Flags: review?(benjamin) → review+
Comment on attachment 262549 [details] [diff] [review]
embed manifests in nspr libs

This patch looks okay.  I'd prefer a patch that's similar to the NSS patch, which uses a variable named MT whose value is mt.exe.
Is this better?
Attachment #262549 - Attachment is obsolete: true
Attachment #267580 - Flags: review?(wtc)
Attachment #262549 - Flags: review?(wtc)
Comment on attachment 267580 [details] [diff] [review]
embed manifests in nspr libs, use $(MT)

r=wtc.
Attachment #267580 - Flags: review?(wtc) → review+
Comment on attachment 267580 [details] [diff] [review]
embed manifests in nspr libs, use $(MT)

Ted, just curious: what do the ;1 and ;2 at the end of the
-OUTPUTRESOURCE option mean?  I noticed that
mozilla/xpinstall/stub/Makefile.in were using ;17 before.
That's the resource ID where the manifest is embedded.  Windows expects to read manifests at ID 1 for executables, and 2 for DLLs (usually, although you can do other things).  Anything above 16 is left for user-defined activities.  We were using 17 because we had some code in xpistub to manually create an activation context and load the manifest into it.  Since we're no longer using the xpinstaller, we don't need that anymore.  If anyone still needed it, they could make it load from resource ID 2 anyway.

Thanks for the review!
Comment on attachment 262548 [details] [diff] [review]
embed manifests in everything

Checked in on trunk.
I made some minor changes.  I've checked in this patch on the
NSPR trunk (NSPR 4.7) and will check this in on the
NSPRPUB_PRE_4_2_CLIENT_BRANCH when the Mozilla trunk reopens.
Attachment #267580 - Attachment is obsolete: true
We missed the NSPR test programs, which are built with a
rule in mozilla/nsprpub/pr/tests/Makefile.in.  This patch
takes care of that makefile rule.
Comment on attachment 267803 [details] [diff] [review]
embed manifests in nspr tests

Note that I use -OUTPUTRESOURCE:$@\;1 to build the NSPR
test programs.
Attachment #267803 - Flags: review?(ted.mielczarek)
Attachment #267803 - Flags: review?(ted.mielczarek) → review+
I heavily suspect this has caused SeaMonkey Ts to increase from ~760ms to ~880ms on Windows (see http://build-graphs.mozilla.org/graph/query.cgi?testname=startup&units=ms&tbox=sea-win32-tbox&autoscale=1&days=7&avg=1 )

We currently have no test data for Firefox, but one reason for that may again be this checkin (that is the one in comment #44) as suggested in bug 383875
Sorry for yet another comment, but that SeaMonkey perf data is backed by the other Windows tinderbox in SeaMonkey-Ports which went from ~2200ms to ~2600 on Ts, while the Mac box there and the Linux both on the SeaMonkey tinderbox page all stayed unchanged. Also other perf data than Ts stayed unchanged.

So overall, we have a 15-20% Windows-only Ts regression for SeaMonkey.
I backed out attachment 262548 [details] [diff] [review] to see if it would fix the perf test machine bustage that's keeping the tree closed (bl-bldxp01). It cycled once, and still crashed. I've triggered a clobber to see if that will fix it, but won't be able to stay around until we get the results (the cycle time is at least a couple of hours).
And while the backout didn't make bl-bldxp01 happy, it did drop Ts on sea-win32-tbox from 891 the run before to 750 the run after, so more than "heavily suspect."
The backout seems to fixed Thunderbird trunk also.
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a6pre) Gecko/20070610 Thunderbird/3.0a1pre ID:2007061003 [cairo]
Starts and runs fine here, winxp sp2 (previously crashed on startup)
Ok, so there are two separate issues here, both of which suck.  There's some sort of startup crash, and there's a Ts hit.  I'm not entirely surprised by a Ts hit, since the DLL loader probably has to parse the manifest out of every DLL now.  It'd be interesting to compare the effect on Firefox since it's a static build.  The crash is strange, I was able to reproduce it in a VM, and I submitted a crash report here:
https://crash-reports.mozilla.com/reports/report/index/27a59de6-1758-11dc-a54e-001a4bd43ed6

It's crashing in nsFrame, but lower in the stack it looks like we're doing XPCOM shutdown.  I don't understand that.
With regards to NSPR, the patches from this bug are the only difference between HEAD and NSPRPUB_PRE_4_2_CLIENT_BRANCH (does not yet have the patch).

Note that we would like to obsolete the NSPRPUB_PRE_4_2_CLIENT_BRANCH, see bug 387992 and switch to snapshots of HEAD.

Is it ok for Mozilla trunk to pick up the changes from this bug, as checked in to NSPR HEAD?
That should be fine. If it's not, the tinderbox will let us know ;-)
I think this is wanted-1.9, right? Remove it if not.
Whiteboard: [wanted-1.9]
Actually, let's see if drivers feel that this is blocking-1.9.
Flags: blocking1.9?
Whiteboard: [wanted-1.9]
Well, I'd like for ted to try and reland it.
Comment on attachment 262548 [details] [diff] [review]
embed manifests in everything

Ok, I've rebuilt with this patch applied, and tested the resulting build on Vista and XPSP2 with no problems.  I'd like to check this back in.
Attachment #262548 - Flags: approval1.9?
Attachment #262548 - Flags: approval1.9? → approval1.9+
Checked in.  Hopefully it sticks this time!
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
SeaMonkey Ts regressed again, like in comment #48 and comment #49 - I filed that as bug 350616 so that it's tracked.
I guess SeaMonkey will need to switch to libxul to fix that again, which is a target anyways.
Depends on: 394190
Umm, sorry, wrong bug number. The Ts regression is bug 394470, actually.
Product: Toolkit → Toolkit Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: