NIghtly doesn't start and displays an error message

RESOLVED FIXED in Firefox 64

Status

()

defect
P1
normal
RESOLVED FIXED
10 months ago
7 months ago

People

(Reporter: expyron, Assigned: emk)

Tracking

({regression})

Trunk
mozilla64
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox62 unaffected, firefox63 unaffected, firefox64 fixed)

Details

Attachments

(2 attachments, 3 obsolete attachments)

Posted image Error message
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36

Steps to reproduce:

- Install a recent Nightly (either from auto-update or fresh install)
- Try to start Nightly


Actual results:

I get the error message (see attached screenshot):
"The procedure entry point ?DllBlocklist_initialize@@YAXI@Z could not be located in the dynamic link library C:\Program Files\Firefox Nightly\firefox.exe"

Nightly doesn't start


Expected results:

Nightly starts correctly without error message

Using mozregression, this is the relevant patch after which it stops working (tested from two different computers, same issue):
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=23841605d322bd069d28d5dfdd73e96247d1f130&tochange=6e4da634fab4c1e70046fb6a6111cc2575667d62
Are you using a third-party security software?
Blocks: 1481546
Status: UNCONFIRMED → NEW
Component: Untriaged → General
Ever confirmed: true
I get this on two computers with Windows 10.
One is using Windows Defender, the other is using McAfee.

I saw the other bug reports with nightly not starting, but this has different symptoms, as it shows an error dialog.
I have the exact same error message on both computers, but could not find anything on a google search, nor in existing bug reports.
So I managed to find an partial explanation for this issue.

An old file C:\Windows\System32\mozglue.dll is present on the filesystem of the computer.
The file signature is timestamped ‎"Sunday, ‎22 ‎February, ‎2015 16:26:01".

As soon as I delete or rename this file, the latest nightlies work properly again.

I do not know when or how the file was put there. It is most likely from a previous firefox installation.
The modification date of the file is 2015-03-13, so it's older than the computer, and older than any firefox version ever installed on the computer.

For some reason, this happened on two (very different) machines, so there must be some reason why this file was still present.

The nightlies before 2018-09-26 are not affected by the presence of C:\Windows\System32\mozglue.dll
So it's defnitely a fallout from bug 1481546. After bug 1481546, System32 dlls always take precedence over local one. If System32 happens to have the same name of dll as Firefox one, it will prevent Firefox from launching.
Firefox (at least Nightly) doesn't seem to put its DLLs in System32 anymore. I don't know if that was a recent change.

If a previous Firefox installation was not properly uninstalled, or did not clean up properly its DLLs from System32, then the new versions can not run anymore.

The error message does not make it easy to debug too.
Maybe we can fix this by making mozglue.dll delayload and load it explicitly with the full path before calling the first mozglue function.
mozglue.dll might not be the only DLL affected.
Couldn't the installer clean up old firefox DLLs from System32 ?

I do not know how prevalent this is, as I could not find any other report of this after a week of nightly.
It still seems weird that I got the same issue twice.

Also, there is no telemetry on this. Those errors do not trigger a crash report (firefox isn't even starting).
According to inspection with dependency walker, mozglue.dll is the only non-system DLL that is not delayloaded.

Firefox installer have never copied mozglue.dll to system32. Some third-party apps might have installed and depend on mozglue.dll in system32.
I confirm the file is *not* coming from a previous Firefox installation (which is good!).

In my case, it was included with an SSO tool from Evidian (https://www.evidian.com/products/enterprise-sso/).
They are straight up bundling an old mozglue.dll from firefox into their installer (is this even allowed ?), and putting it into system32.

As this is enterprise-only, users are not likely to use nightly, which could explain the lack of issues reported since last week.
(In reply to expyron from comment #3)
> So I managed to find an partial explanation for this issue.
> 
> An old file C:\Windows\System32\mozglue.dll is present on the filesystem of
> the computer.
> The file signature is timestamped ‎"Sunday, ‎22 ‎February, ‎2015 16:26:01".
> 
> As soon as I delete or rename this file, the latest nightlies work properly
> again.


Sounds like a local machine issue from years ago. I think we should close this out as invalid.
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → INVALID
(In reply to expyron from comment #5)
> Firefox (at least Nightly) doesn't seem to put its DLLs in System32 anymore.
> I don't know if that was a recent change.

Our install has never done this.
(In reply to Jim Mathies [:jimm] from comment #10)
> Sounds like a local machine issue from years ago. I think we should close
> this out as invalid.

So we can't launch Firefox anymore if some other apps (in this case SSO tool from Evidian) happen to be installed? (See comment #9.)

We should ensure that we load mozglue.dll from our app directory rather than System32. Delayloading did not work because mozglue has some data exports, but I have another idea. I'll reopen the bug if it works.
(In reply to Masatoshi Kimura [:emk] from comment #12)
> So we can't launch Firefox anymore if some other apps (in this case SSO tool
> from Evidian) happen to be installed? (See comment #9.)

I can confirm that. This is not an old install. The newest versions of the SSO product available from Evidian also have this incompatibility.
(In reply to Masatoshi Kimura [:emk] from comment #12)
> I'll reopen the bug if it works.

It worked as expected. Try also looks good:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3fa5d50f29fbd415123f2928add2148452c6b292
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
expyron, could you double-check if this build fixed your issue?
https://queue.taskcluster.net/v1/task/YfDode4UTIOfzpOr_oRCjw/runs/0/artifacts/public/build/target.zip
Flags: needinfo?(expyron)
SxS assemblies do not obey the usual DLL search order. It will make it possible
to load mozglue.dll from appdir even if the PreferSystem32Image mitigation is
enabled and System32 has a random mozglue.dll.
Attachment #9015969 - Flags: review?(aklotz)
Assignee: nobody → VYV03354
Status: REOPENED → ASSIGNED
(In reply to Masatoshi Kimura [:emk] from comment #15)
> expyron, could you double-check if this build fixed your issue?
> https://queue.taskcluster.net/v1/task/YfDode4UTIOfzpOr_oRCjw/runs/0/
> artifacts/public/build/target.zip

Yes, this build fixes the issue.
Thank you for the very quick fix.
Flags: needinfo?(expyron)
Priority: -- → P2
Comment on attachment 9015969 [details] [diff] [review]
Make mozglue.dll a private SxS assembly

Review of attachment 9015969 [details] [diff] [review]:
-----------------------------------------------------------------

I love this! This is by far the best way to resolve this issue!

Having said that, I think you need to add dependentAssembly entries for at least one other .exe file, maybe more. I know that that minidump-analyzer.exe depends on mozglue, so you should add one there. Please check for any other executables that depend on mozglue.dll and add appropriate manifests as necessary.
Attachment #9015969 - Flags: review?(aklotz) → review-
I modified linker options instead of adding .rc and .manifest files for all Gecko programs.
Attachment #9016248 - Flags: review?(aklotz)
Attachment #9015969 - Attachment is obsolete: true
Comment on attachment 9016248 [details] [diff] [review]
Make mozglue.dll a private SxS assembly

Review of attachment 9016248 [details] [diff] [review]:
-----------------------------------------------------------------

I'm fine with this, but I think we should also obtain an r+ from a build config peer.
Attachment #9016248 - Flags: review?(core-build-config-reviews)
Attachment #9016248 - Flags: review?(aklotz)
Attachment #9016248 - Flags: review+
Comment on attachment 9016248 [details] [diff] [review]
Make mozglue.dll a private SxS assembly

Review of attachment 9016248 [details] [diff] [review]:
-----------------------------------------------------------------

I have full confidence in Aaron's judgement about the Windows-specific bits.

::: build/gecko_templates.mozbuild
@@ +42,5 @@
>              USE_LIBS += ['mozglue']
>              DEFINES['MOZ_HAS_MOZGLUE'] = True
>              if CONFIG['MOZ_GLUE_IN_PROGRAM'] and CONFIG['CC_TYPE'] in ('clang', 'gcc'):
>                  LDFLAGS += ['-rdynamic']
> +            if CONFIG['CC_TYPE'] in ('clang-cl', 'msvc'):

I think this needs to be dependent on `not MOZ_GLUE_IN_PROGRAM`?  Or at least assert what the expected value is?

::: mozglue/build/Makefile.in
@@ +15,5 @@
> +# Rebuild mozglue.dll if the manifest changes - it's included by mozglue.rc.
> +# (this dependency should really be just for mozglue.dll, not other targets)
> +# Note the manifest file exists in the tree, so we use the explicit filename
> +# here.
> +EXTRA_DEPS += mozglue.dll.manifest

Can you file a followup bug on scanning .rc (resource?) files for included files, and making them EXTRA_DEPS automatically?  I see that we have other .rc files that apparently include things, and we probably don't get the deps right for them either.
Attachment #9016248 - Flags: review?(core-build-config-reviews) → review+
(In reply to Nathan Froyd [:froydnj] from comment #21)
> I think this needs to be dependent on `not MOZ_GLUE_IN_PROGRAM`?  Or at
> least assert what the expected value is?

Sure. Added `not CONFIG['MOZ_GLUE_IN_PROGRAM']`.

> Can you file a followup bug on scanning .rc (resource?) files for included
> files, and making them EXTRA_DEPS automatically?  I see that we have other
> .rc files that apparently include things, and we probably don't get the deps
> right for them either.

Filed bug 1498414.
Attachment #9016248 - Attachment is obsolete: true
Mmm, this change caused failures on xpcshell tests and cppunit tests. Moreover, msvc builds are not comfortable with the duplicated manifest entry (lld-link seems to merge the duplicated resources):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ccd58231c0abf6c39ba2c93272e83ab22843b081
test_punycodeURIs.js fails because it could not launch WriteArgument.exe.
CppUnitTests fail for a similar reason.
If I include <dependency> about mozglue in .manifest files, MS link complains about the duplicate resource. If I do not include <dependency>, lld-link drops the <dependency> that is generated from the /manifestdependency switch, so the resulting binary will not contain <dependency> :(
Some tests assume that they can find mozglue outside the app directory. Making mozglue an SxS assembly will break the assumption. Since it is possible to load mozglue as a normal (non-SxS assemnly) dll, I added the <dependency> only to the shipped binaries.
Also I stopped using the linker flag because it depends strongly on the toolchains.
Attachment #9017014 - Flags: review?(nfroyd)
Attachment #9017014 - Flags: review?(aklotz)
Attachment #9016482 - Attachment is obsolete: true
Comment on attachment 9017014 [details] [diff] [review]
Make mozglue.dll a private SxS assembly

Review of attachment 9017014 [details] [diff] [review]:
-----------------------------------------------------------------

r=me on the build parts.
Attachment #9017014 - Flags: review?(nfroyd) → review+
Comment on attachment 9017014 [details] [diff] [review]
Make mozglue.dll a private SxS assembly

Review of attachment 9017014 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! r=me

::: mozglue/build/mozglue.dll.manifest
@@ +4,5 @@
> +    version="1.0.0.0"
> +    name="mozglue"
> +    type="win32"
> +/>
> +<file name="mozglue.dll"/>

I wonder whether, in the future, we should use hashes for SxS binding, but that's definitely a topic that introduces a lot of complexity that is best left for another day!
Attachment #9017014 - Flags: review?(aklotz) → review+
I'm moving this to mozglue since that seems like a slightly better component for this bug to reside in.
Component: General → mozglue
Priority: P2 → P1
Product: Firefox → Core
Keywords: checkin-needed
Pushed by ebalazs@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3eb04f5363eb
Make mozglue.dll a private SxS assembly. r=aklotz,froydnj
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/3eb04f5363eb
Status: ASSIGNED → RESOLVED
Closed: 9 months ago9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Depends on: 1515982
You need to log in before you can comment on or make changes to this bug.