Closed Bug 425167 Opened 16 years ago Closed 6 years ago

Installer corrupted: invalid opcode

Categories

(Firefox :: Installer, defect, P1)

2.0 Branch
All
Windows
defect

Tracking

RESOLVED INCOMPLETE

People

(Reporter: GregKagiraWatson, Assigned: molly)

Attachments

(2 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13

The error message said "Installer corrupted: invalid opcode".
Firefox has been running slow and locking up recently, so I was/am optimistic that this update (install of new version) will correct my problems, but I am uncertain that it even installed correctly, given the error message. What should I do to make sure I have an uncorrupted version? Download and reinstall it again, and see if I get the same error message? Please advise. Thank you guys for all your great work!!! I love Firefox and have been using it for years. This error message also happened the last time I tried to update. Greg 404-831-0951 (cell phone)

Reproducible: Always

Steps to Reproduce:
1. Download new version.
2. Run install.
3. Error message appears.
4. Install completes as if everything is OK, but the next time I try to update I get the same error message.
Actual Results:  
During normal use I got this: "Fail to get XMLDocument from http:..dl.ask.com/toolbar/moz/tbintalled.jsp?v=2.0 yatta...yatta...yatta...  [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIXMLHttpRequest.send]"  nsresult: 0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: chrome://snipit/content/snipit.js :: ReportToWhatzup :: line 1571" data: no]

Expected Results:  
Expect to hear from you.

That's it folks.  Thank you.
Greg
Can you reproduce with Firefox v2.0.0.14?
Version: unspecified → 2.0 Branch
xref: Bug 442702 ("install of TB20014 fails with "invalid opcode" error message"). In that bug I note that I get the same "invalid opcode" error when attempting to install FF 2.0.0.14 on the same machine.

Greg appears to be gone.
Component: General → Installer
QA Contact: general → installer
Summary: Installer corrupted: invalide opcode → Installer corrupted: invalid opcode
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → INCOMPLETE
I am not gone, but I don't know what you expect me to do.
I think I eventually reinstalled Firefox (after several tries; I had to go back to a couple of MS Windows "system restore" points), and it seems to be working OK now. What did you guys do? Do you know what caused this and what I can do to avoid it in the future?
Greg
Status: RESOLVED → UNCONFIRMED
Resolution: INCOMPLETE → ---
We didn't change anything. Chances are this was either a corruption in the download or an extremely rare setting on your system. Are you still able to reproduce?
No response to comment #4. Resolving -> WFM
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago → 16 years ago
Resolution: --- → WORKSFORME
Resolution: WORKSFORME → INCOMPLETE
Attached file last-update.log
Here we are 9 years later, and I can repro :) 

I'm on Nightly and the symptoms are that after an update installs, just as Firefox itself starts this dialog will appear. It appears to be from "helper.exe", which stays alive while it is being shown. It's happened for the last 4 or 5 updates and note that the updates *do* get installed.

It's also worth mentioning that a Google search for "installer corrupted invalid opcode firefox" gets lots of hits, including this bug and a SUMO (support.mozilla.org) article which talks about this specific issue, so it is probably happening at a rate we should take seriously, especially given that the resolution is likely beyond many of our users. OTOH, as the updates appear to work, I guess we could argue it's simply an annoyance.

From reading those articles I suspect that if I cleaned my %USERPROFILE%\AppData\Local\Mozilla\updates folder it would go away, but I'm not doing that yet in the hope that data is useful. It has the following structure:

.
├───2D9E75C658F35BC6
├───EECF4B263795E958
└───F0D1AB6A2E32A696
    │   updates.xml
    │
    └───updates
        │   backup-update.log
        │   last-update.log
        │
        └───0

(ie, a number of empty dirs and only 3 files in total) - so maybe nuking it will not work? I'm attaching last-update.log.

Rob, are you still interested in digging in here, or do you know who might be?
Flags: needinfo?(robert.strong.bugs)
+mhowell
Flags: needinfo?(robert.strong.bugs) → needinfo?(mhowell)
Thanks for the info. There should not be anything in "%USERPROFILE%\AppData\Local\Mozilla\updates" that's relevant to this error in any case, and I don't see anything in your directory tree that looks out of the ordinary. The contents of your last-update.log don't provide any hints either.

Would you mind attaching the file "C:\Firefox\Nightly\uninstall\helper.exe" after you've seen this error? I'm curious whether the file is corrupt or broken, or if the file data itself is fine but something is causing it to not load properly.

To provide some background here, helper.exe is responsible for running the code at [0] during each update after the files (including helper.exe) have been patched by updater.exe. Most of the time this code doesn't end up doing a whole lot; it updates the version information in the Windows registry uninstall key to indicate the new version number, and on most updates that's all it does. Most of the rest of the code is cleanup or migrations for updating from much older versions. That's why the updated version of Nightly runs fine despite this error.

[0] https://searchfox.org/mozilla-central/source/browser/installer/windows/nsis/shared.nsh#5
Flags: needinfo?(mhowell) → needinfo?(markh)
Attached file helper.exe.zip
Here's helper.exe, captured while it is showing that dialog.

FWIW, Process Explorer tells me it was executed with a command line of "argv0ignored /PostUpdate", its parent is the (now closed) updater.exe, and it has one thread with a stack of:

wow64cpu.dll!TurboDispatchJumpAddressEnd+0x544
wow64cpu.dll!TurboDispatchJumpAddressEnd+0x503
wow64cpu.dll!BTCpuSimulate+0x9
wow64.dll!Wow64LdrpInitialize+0x236
wow64.dll!Wow64LdrpInitialize+0x120
ntdll.dll!LdrInitShimEngineDynamic+0x308f
ntdll.dll!memset+0x1ecbf
ntdll.dll!LdrInitializeThunk+0x5b
ntdll.dll!LdrInitializeThunk+0xe

(which seems a little odd, but that's what it says :)
Flags: needinfo?(markh) → needinfo?(mhowell)
Okay, thanks for the file and the info. That's a perfectly good copy of helper.exe, so at least we know that the updater is writing it correctly.

That leaves two possibilities: either something is messing with how the file is loaded/read (but presumably not messing with other files the same way or your computer would probably not be usable), or the error is being triggered by a bug in our code that only manifests under pretty rare conditions (otherwise this issue would be more frequently reported). I suspect the latter of course, but it's going to be very difficult to debug. We don't have logging in this code, and there's no way to get a less generic error message, so the only diagnostic I can really think of is to build a copy of helper.exe that's littered with progress messages and see how far it gets. That's kind of more work than I can ask you to do though, I think.

I think the stack you're seeing is a Process Explorer bug; I seem to get the same cut-off stack for any 32-bit process (which helper.exe always is). Process Hacker [https://processhacker.sourceforge.io/] would probably show you the rest of the stack, but I don't expect it to be very informative; it's an NSIS binary so it's just interpreting bytecode (the opcode that the error message refers to is a bytecode instruction), so there probably won't be a useful native code stack. I also don't think I've ever noticed the "argv0ignored" thing, but that's... probably okay.
Flags: needinfo?(mhowell)
(In reply to Matt Howell [:mhowell] from comment #10)
> have logging in this code, and there's no way to get a less generic error
> message, so the only diagnostic I can really think of is to build a copy of
> helper.exe that's littered with progress messages and see how far it gets.
> That's kind of more work than I can ask you to do though, I think.

I can certainly build firefox from source if that helps. To experiment, I added:

>   MessageBox MB_OK "POSTUPDATE"

to https://searchfox.org/mozilla-central/rev/71ef4447db179639be9eff4471f32a95423962d7/browser/installer/windows/nsis/shared.nsh#6

then built a debug version of Firefox and copied update.exe from the obj-dir to the installed directory, but saw no messagebox. VS2017 also doesn't think that executable has debug info, so I'm sure there's something about how nsis is used that I don't understand.

If you can give me some clues here I'd be happy to experiment.

> I think the stack you're seeing is a Process Explorer bug; I seem to get the
> same cut-off stack for any 32-bit process (which helper.exe always is).
> Process Hacker [https://processhacker.sourceforge.io/] would probably show
> you the rest of the stack, but I don't expect it to be very informative;
> it's an NSIS binary so it's just interpreting bytecode (the opcode that the
> error message refers to is a bytecode instruction), so there probably won't
> be a useful native code stack. I also don't think I've ever noticed the
> "argv0ignored" thing, but that's... probably okay.

Yeah - Visual Studio gives a fairly useless stack (both using the original helper.exe and one I built using a debug mozconfig), so I suspect you are correct.

What are the next steps here?
Flags: needinfo?(mhowell)
I should also note that executing "helper.exe /PostUpdate" from the cmdline gives the same message.
Yeah, you're doing pretty much what I had in mind, adding in MessageBox calls and invoking "helper.exe /PostUpdate" from the command line. I can give you a few more pointers. The actual entry point is:
https://searchfox.org/mozilla-central/rev/71ef4447db179639be9eff4471f32a95423962d7/browser/installer/windows/nsis/uninstaller.nsi#616

which is calling:
https://searchfox.org/mozilla-central/rev/71ef4447db179639be9eff4471f32a95423962d7/toolkit/mozapps/installer/windows/nsis/common.nsh#5257

which invokes the PostUpdate code you already found, after a bit of initialization and checking the command line. I'd work my way down that whole path sprinkling MessageBox calls around. If it's not even getting into the PostUpdate macro, that's certainly surprising; as you can see there's not much substantial code before that.

There seems to be a bug in the build system where helper.exe doesn't get recompiled on incremental builds, so I think you have to have a fresh build every time you change it. Fortunately it does get compiled during artifact builds, so I'd certainly recommend using those.

About the debug info thing, the NSIS compiler works by taking a prebuilt executable stub (the bytecode interpreter) and just appending the bytecode that it's compiled from our scripts. So debug vs. release builds don't have any effect on it. :(

By the way, thanks for doing this! It's always really awesome to have someone who's reproducing a rare bug be willing to put in the effort to help diagnose it.
Flags: needinfo?(mhowell)
Okay, thanks to some pretty heroic debugging from :markh (thanks again, Mark) I think we've actually finally tracked this bug down.

The root cause is the use of the ApplicationID::Set plugin function, specifically inside our function UpdateShortcutAppModelIDs at [1]. At [2], ApplicationID::Set pops its two required parameters off the NSIS stack, but then it also proceeds to pop a third optional parameter. We never pass the third parameter, because the documentation doesn't mention it and we don't need it, so every call we make to ApplicationID::Set pops one too many items off the stack.

Luckily, we can handle that happening a few times just fine, because a) there are several items already on the stack at that point and we don't end up doing much with those, so we can tolerate losing them, and b) the NSIS Pop instruction does not fatally fail if the stack is empty; it just sets the error flag (which is not generally checked after a Pop instruction) and leaves the destination alone. But the Exch instruction, which is used in UpdateShortcutAppModelIDs a couple of times, cannot continue if it does not find the number of items it needs on the stack, so it throws the invalid opcode error and aborts the script. This means we only crash if we lose more items (that is, make more ApplicationID::Set calls) than were already on the stack when UpdateShortcutAppModelIDs was invoked, minus 2, because our use of Exch there needs two items.

All of which means this only breaks things if you have an unusually high number of shortcuts to one installation of Firefox. And in fact that's what led us here; Mark noticed nine accidentally created copies of his taskbar shortcut.
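To make the stack arithmetic concrete, here is a toy Python model of the behaviour described above. It is a simplification, not NSIS or plugin code: the class and function names are hypothetical, the starting stack depth is made up, parameter order is not modelled, and the real UpdateShortcutAppModelIDs does more (including more than one Exch) than the single Exch modelled here.

```python
class InvalidOpcode(Exception):
    """Stands in for NSIS aborting with "Installer corrupted: invalid opcode"."""


class NsisStack:
    """Toy model of the NSIS runtime stack, not the real interpreter."""

    def __init__(self, depth):
        # Items the caller left on the stack before the shortcut update runs.
        self.items = [f"caller-item-{i}" for i in range(depth)]
        self.error_flag = False

    def push(self, value):
        self.items.append(value)

    def pop(self):
        # Pop on an empty stack does not abort: it sets the error flag
        # and leaves the destination untouched (modelled as returning None).
        if not self.items:
            self.error_flag = True
            return None
        return self.items.pop()

    def exch(self):
        # Exch with no argument swaps the top two items; with fewer than
        # two items available the script is aborted.
        if len(self.items) < 2:
            raise InvalidOpcode("Installer corrupted: invalid opcode")
        self.items[-1], self.items[-2] = self.items[-2], self.items[-1]


def application_id_set(stack):
    # Models the plugin behaviour described above: the two required
    # parameters plus the optional third one are always popped.
    return stack.pop(), stack.pop(), stack.pop()


def update_shortcut_app_model_ids(initial_depth, shortcut_count):
    """Hypothetical stand-in for UpdateShortcutAppModelIDs: one Set call
    per shortcut (passing only two parameters), then one Exch."""
    stack = NsisStack(initial_depth)
    for n in range(shortcut_count):
        stack.push("AppUserModelID")
        stack.push(f"shortcut-{n}.lnk")
        application_id_set(stack)  # pops one caller item too many
    stack.exch()
    return stack
```

With a hypothetical starting depth of 8, six shortcuts leave two caller items and the Exch succeeds, while a seventh shortcut leaves only one and the model raises InvalidOpcode, matching the "more calls than items already on the stack, minus 2" condition.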

I'm going to go ahead and put a patch for this together; this bug has certainly sat here long enough. I think the way to go is just to add in the third parameter everywhere we call ApplicationID::Set, so I'm planning on doing that.

[1] https://searchfox.org/mozilla-central/rev/3fa761ade83ed0b8ab463acb057c2cf0b104689e/toolkit/mozapps/installer/windows/nsis/common.nsh#6990
[2] https://searchfox.org/mozilla-central/rev/3fa761ade83ed0b8ab463acb057c2cf0b104689e/other-licenses/nsis/Contrib/ApplicationID/Set.cpp#64
Assignee: nobody → mhowell
Severity: critical → minor
Status: RESOLVED → REOPENED
Ever confirmed: true
OS: Windows XP → Windows
Priority: -- → P1
Hardware: x86 → All
Resolution: INCOMPLETE → ---
In theory, the third parameter (dualMode) to the NSIS ApplicationID plugin's
Set function is optional, but the plugin assumes that either it's there or
there's nothing else on the NSIS stack, so it pops three items from the stack
unconditionally. This means that supplying only two parameters results in
one item silently being dropped from the NSIS stack. Fortunately this one
function is the only place where we were doing that, so it only became a
problem if there were an awful lot of shortcuts (around 7) to the same
installation.
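The commit message implies a simple per-call stack balance: pushing two parameters while the plugin pops three loses one item per call, whereas also pushing dualMode keeps the stack balanced. A minimal Python sketch of that arithmetic, assuming (as a simplification) that nothing else touches the stack between calls; the function names and the starting depth of 8 are hypothetical.

```python
PLUGIN_POPS = 3  # ApplicationID::Set pops three items unconditionally


def depth_after_set_calls(initial_depth, calls, params_pushed):
    """Stack depth after `calls` ApplicationID::Set calls, each pushing
    `params_pushed` items. Pops on an empty stack only set the error
    flag, so the depth never drops below zero."""
    depth = initial_depth
    for _ in range(calls):
        depth = max(depth + params_pushed - PLUGIN_POPS, 0)
    return depth


def exch_would_abort(initial_depth, calls, params_pushed):
    # The later Exch needs at least two items on the stack.
    return depth_after_set_calls(initial_depth, calls, params_pushed) < 2


# Two parameters: each call silently drops one caller item.
print(depth_after_set_calls(8, 7, params_pushed=2))  # 1
print(exch_would_abort(8, 7, params_pushed=2))       # True
# Three parameters (dualMode supplied): the stack stays balanced.
print(depth_after_set_calls(8, 7, params_pushed=3))  # 8
print(exch_would_abort(8, 7, params_pushed=3))       # False
```

Supplying the third parameter makes the net effect of each call zero, so the number of shortcuts no longer matters to the later Exch.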
Cool! I have no issues with the patch, but it seems like this should probably be a different bug: The initial report here was before the introduction of that optional parameter in bug 740694.
Attachment #9003179 - Attachment is obsolete: true
Okay, I'll move this to a new bug.
Status: REOPENED → RESOLVED
Closed: 16 years ago → 6 years ago
Resolution: --- → INCOMPLETE
See Also: → 1485484