Crash with JS callback and File obcall from second initInstall/performInstallject

VERIFIED FIXED in mozilla0.9.1

Status

Core Graveyard
Installer: XPInstall Engine
P1
critical
VERIFIED FIXED
17 years ago
2 years ago

People

(Reporter: Jimmy Lee, Assigned: dbragg)

Tracking

({crash})

Trunk
mozilla0.9.1
x86
Windows NT
crash

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [xpibug])

Attachments

(7 attachments)

(Reporter)

Description

17 years ago
Build: 2001-02-02-10-Mtrunk(WIN)

1. Open http://jimbob/jars/a_fileop_exe_2_win_block_t_cb.html or
   http://jimbob/jars/a_fileop_exe_3_win_block_t_cb.html (Both tests use
   blocking as true)
2. Click OK button from confirmation dialog

RESULT:
Crash

EXPECTED RESULT:
Notepad.exe is launched.  Exiting notepad results in a confirm dialog appearing 
and text displays from the browser window.

Incident ID: 25702613

Call Stack:    (Signature = 0x10101010 df117123) 
0x10101010 
js3250.dll + 0x1b6ec (0x60b1b6ec) 
js3250.dll + 0x17a26 (0x60b17a26) 
js3250.dll + 0x44ad (0x60b044ad) 
js3250.dll + 0x446d (0x60b0446d) 
js3250.dll + 0x43ef (0x60b043ef) 
xpinstal.dll + 0xbd1f (0x60a3bd1f)                                       

Install script for xpi called by a_fileop_exe_2_win_block_t_cb.html:
var xpiSrc = "notepad.exe";

initInstall("Acceptance: addFile for a_fileop_exe_2_win_block_t", "acceptance", 
"1.1.1.1", 0);
g = getFolder("Program");
err1 = addFile("fileop_exe_2_win_block_t", "3.3.3.3", xpiSrc, g, xpiSrc, true);
logComment("addFile " + xpiSrc + " returns = " + err1);
performOrCancel();

logComment("");  //line space
initInstall("Acceptance: a_fileop_exe_2_win_block_t", "acceptance", "1.1.1.1", 
0);
fileSource = getFolder(g, xpiSrc);	// append file to path
err2 = File.execute(fileSource, true);
logComment("Path to notepad.exe = " + fileSource);
logComment("File.execute returns = " + err2);
performOrCancel();

err3 = confirm("This confirm dialog should appear after executable is 
dismissed.");
logComment("confirm returns = " + err3);

function performOrCancel() 
{ 
   if (0 == getLastError())
      performInstall();
   else
      cancelInstall();
}
(Reporter)

Comment 1

17 years ago
Further investigation reveals that the problem is not related to blocking.  It 
appears that the problem is related to File.execute().  The tests 
http://jimbob/jars/a_fileop_exe_2_win_block_f_cb.html 
and http://jimbob/jars/a_fileop_exe_3_win_block_f_cb.html have blocking set to 
false.

Removing "blocking" from Summary.  Either way, the crash still occurs. :-(
Summary: Crash occurs with callback and File.execute blocking → Crash occurs with callback and File.execute
Jimmy, does Install.execute() also crash? Could be something in the new 
nsIProcess stuff. Is this bug XP?

nominating nsbeta1 for investigation
Keywords: nsbeta1
(Reporter)

Comment 3

17 years ago
This is not a problem with Install.execute().  The crash only appears for 
File.execute(), and I have no idea why.  There appears to be no problem on 
Linux, too.
(Assignee)

Comment 4

17 years ago
Also, doesn't this crash only exist with the callback?  Running File.Execute by
itself doesn't crash anything as I understand it.
(Reporter)

Comment 5

17 years ago
Correct.  The crash only occurs with the callback.  Further investigation 
reveals that the crash occurs with other methods of the File object (ie 
File.copy, File.isDirectory, etc.)
(Assignee)

Comment 6

17 years ago
Well. It looks like this problem is unrelated to the Execute (blocking) changes.
 I suspected that there might be a problem with the (small) modification I had
to make to one of the fileOp constructors but File.Copy and File.IsDirectory
don't use the same constructor that File.Execute uses so this I don't think this
is the problem.

Running the test case in the first comment of this bug ended up crashing in the
inline funtion in the struct Link in xpcom/ds/nsVoidBTree.h.  The line:

PRBool operator!=(const Link& aOther) const {
return !aOther.operator==(*this); }

The problem seems to be that *this empty.  Crasho grosso.
(Reporter)

Comment 7

17 years ago
But wait, there's more!

Looking back, I am able to reproduce the problem with 6.0 RTM and 6.01.  Well, I 
actually reproduced the crash with File.copy, so that sort of shows us that it's 
definitely not specific to File.execute.  Now what?
(Reporter)

Comment 8

17 years ago
Changing Summary to be more accurate.

A more easier test to use may be http://jimbob/jars/f_fileop_copy_cb.html since 
we don't have to deal with some of the differences with File.execute().
Summary: Crash occurs with callback and File.execute → Crash occurs with callback and File object
We probably need to fix this one, seems like the File object is doing something
nasty that could come up in other cases.

I Don't understand the use of the word "callback". To my knowledge the only
thing we use that term for in XPInstall is the InstallTrigger.install() 2nd
argument, and I don't see that involved here (though there are other crashes
with that callback function).
Whiteboard: [xpibug][smartupdate]
Keywords: nsbeta1 → nsbeta1+
Whiteboard: [xpibug][smartupdate] → [xpibug]
(Assignee)

Comment 10

17 years ago
After some investigation I don't think this is related to File Ops but is
instead caused by a double call to performInstall.  In the test case presented
an init/perform pair is called to add a file to the system for later copying
using File.copy.  Then another init/perform pair is used to test File.copy.
When I modified the test to have only one init/perform pair the "callback"
occured correctly.

QA: Could you modify your test so that there is only one init/perform pair.
You'll have to copy a file that already exists of course but it's really just
for this verification.

Just FYI, when the crash does occur it's in the javascript interpreter code, not
in any xpinstall code.
(Reporter)

Comment 11

17 years ago
One would think...

I cannot reproduce the crash with the test described below.  After talking with 
Don, I prepared a test where no File objects were called.

A new test can be found at http://jimbob/jars/f_double_initinstall_cb.html
This page calls f_double_initinstall.xpi whose install script is as follows:

var xpiSrc = "xpinstall.txt";

initInstall("Functional: f_double_initinstall Part 1", "f_double_initinstall_1", 
"1.1.1.1", 0);
f = getFolder("Program", "f_double_initinstall_1");
err1 = addFile("", xpiSrc, f, "f_double_initinstall_1.txt");
logComment("addFile returns = " + err1);
performOrCancel();

initInstall("Functional: f_double_initinstall Part 2", "f_double_initinstall_2", 
"1.1.1.2", 0);
g = getFolder("Program", "f_double_initinstall_2");
err2 = addFile("", xpiSrc, g, "f_double_initinstall_2.txt");
logComment("addFile returns = " + err2);
performOrCancel();

function performOrCancel()
{ 
   if (0 == getLastError())
      performInstall();			// do the install here.
   else
      cancelInstall();
}
(Reporter)

Comment 12

17 years ago
Calling one pair of initInstall() and performInstall() does not crash if a 
fileop is used.

Another new test at http://jimbob/jars/f_fileop_dircreate_cb.html supports this 
claim.

Calling two pairs of initInstall() and performInstall() does not crash if a 
fileop is used in the first part and addFile (no fileop) is used in the second 
pairing.

This test is http://jimbob/jars/f_fileop_install_cb.html.
I'm confused by "below". I assume you meant "below" you when entering the 
comment, i.e. now "above" your comment, and not the test case you then entered 
that is, in fact, below your comment.

Or did you gratuitously type in a non-crashing example? :-)

So is it safe to change the summary to "crash with more than one init-perform 
block in same install"? We fixed one like that once, drat.
Very strange, requires two init/perform blocks AND a File object method?
(Assignee)

Comment 15

17 years ago
No wait!  It's even weirder than that.  The File op has to be in the second
init/perform block to cause the crash.
(Reporter)

Comment 16

17 years ago
Yes, Don clarifies the situation.  It appears that the file operation has to
called from the second pair of initInstall/performInstall to reproduce the crash.

Somewhat reluctantly :-), I did provide a couple of non-crashing examples.  This
seemed necessary in order to prove that this crash is specific to what Don and I
concluded.

Adding keyword crash.

Updated Summary to reflect the new details of the crash.  Hopefully, it is more
clear.
Keywords: crash
Summary: Crash occurs with callback and File object → Crash with JS callback and File obcall from second initInstall/performInstallject

Updated

17 years ago
Target Milestone: --- → mozilla0.9.1

Updated

17 years ago
Priority: -- → P1
(Assignee)

Comment 17

17 years ago
Adding Brendan and Phil to the CC list because this is a REALLY WEIRD crash down
deep in jsinterp.c

I've traced the crash to a line in jsinterp.c in the js_Interpret function.
It's the

case JSOP_GETPROP: case

It seems to be that in the call to OBJ_GET_PROPERTY, the
obj->map->ops->getProperty value is bogus.  I'll create an attachment that has
the stack trace (although that's not much help) and my debugger information that
shows the value of obj->map->ops->getProperty

(Assignee)

Comment 18

17 years ago
Created attachment 29058 [details]
stack trace and debugger information on js_interp.c crash
(Assignee)

Comment 19

17 years ago
Created attachment 29060 [details]
Oops. This is the stack trace.
There's nothing weird about this -- you're probably passing a dead (garbage
collected, maybe even recycled) object reference at 0x00ef2408 to the JS API. 
Don't do that.  Where did that second argument from JS_EvaluateScript come from?
Perhaps you need to use a GC root (JS_AddNamedRoot, JS_RemoveRoot)?

Cc'ing jband, because I dimly recall he (and I) have seen this sort of thing
from xpinstall before.

/be
(Assignee)

Comment 21

17 years ago
Created attachment 29183 [details] [diff] [review]
Fix (thanks to Rob Ginda!)
I'd make that property JSPROP_PERMANENT too.  It's still scary to have a global
C variable weakly referring to a JS GC thing that's not rooted, even if that
thing is held by a property.  What if the object in which the property is
defined (however readonly and permanent that property may be) is itself cut off
from the live object graph, and collected?

/be
(Assignee)

Comment 23

17 years ago
When you say "that property" you're referring to the line :

JS_DefineProperty (jscontext, installObject, "File",
OBJECT_TO_JSVAL(gFileOpObject), JS_PropertyStub, JS_PropertyStub, JS_PROP_READONLY);

right?

If so, I just need to have JSPROP_READONLY | JSPROP_PERMANENT for the last
argument.  Is that correct?

Thanks Brendan.
Yes, that's the property I meant -- but what keeps the object in which that
property is permanently read-only around and alive?

/be
The install object is our global object -- it shouldn't be able to go away, 
should it?

Are there any bad effects of leaving INSTALL_FILEOP in the Install_slots enum? 
The explicit slot numbers are part of the JSPropertySpec so it's probably OK, 
but I don't know if there's anything magic about sequential numbers.

Also doesn't seem like there's much point in just removing the one line from 
the GetInstallProperty() case statement, why not remove the entire case?

I don't know that you want my r= on this, I'm part of how we got in this mess 
in the first place; I demonstrably don't know what I'm doing here.
Nit: the JS_SetGlobalObject call added in the patch is not needed, because
JS_InitStandardClasses(cx, obj) sets cx->globalObject to obj if cx->globalObject
is null.

It looks like you're ok once the cx->globalObject root refers to installObject,
in which the permanent, readonly File property is bound to gFileOpObject.  But,
what if the GC could race with all the code in InitXPInstallObjects?  I don't
see any calls to JS_BeginRequest and JS_EndRequest in xpinstall/src/*.cpp.  Does
this code run only on the main thread, or does it run on its own thread?

/be
dveditz: nothing magic about tinyids, except that they're negative and scarce,
so I agree: the patch should be revised to remove INSTALL_FILEOP and its uses.

I hope the JS API docs on JS_BeginRequest and JS_EndRequest are better now (Rob
and I haven't hacked lately, but we made some passes to improve these lie-filled
leftovers perpetrated by a techwriter back in the netscape.com JSRef days, and
we may have fixed up the Request API docs).

/be
(Assignee)

Comment 28

17 years ago
Created attachment 29495 [details] [diff] [review]
New patch with some of Brendan's suggestions.
-  INSTALL_FILEOP          = -10,
   INSTALL_INSTALLED_FILES = -11

I would make the tinyid domain here dense, just to help compilers optimize
switch statements and avoid creeping sparseness.

Any hope for JS_BeginRequest and JS_EndRequest around maximal non-blocking API
and native code sequences?

/be
(Assignee)

Comment 30

17 years ago
I experimented with adding JS_BeginRequest(cx) in nsSoftwareUpdateRun.cpp right
before the call to JS_EvaluateScript (~line 430) and JS_EndRequest(cx) right
before the call to JS_DestroyContextMaybeGC(cx) (~line 473 also in
nsSoftwareUpdateRun.cpp)

Strange results.  Mozilla would hang with no stack trace available somewhere
after the call to JS_BeginRequest.  Should the calls go somewhere else?
No stack backtrace?  What debugger, and what are other threads doing?

/be
(Assignee)

Comment 32

17 years ago
Sorry. Not stack trace available at the time of the "crash" but I generated one
using the Windows debugger.

Attaching.
(Assignee)

Comment 33

17 years ago
Created attachment 29505 [details]
stack trace when using JS_BeginRequest()
So the key phrase in my last comment about JS_BeginRequest was "non-blocking". 
If you find the other threads' stacks, I bet there's a deadlock.  You need to
call JS_SuspendRequest before calling out to a blocking or lock-acquiring method
from your native JS methods, and call JS_ResumeRequest after -- if you have any
blocking/lock-taking callouts.  Do you?

/be
(Assignee)

Comment 35

17 years ago
Blocking, schmockling.  Ok,I think I found where we're blocking.  I put the
Suspend/Resume in InstallStartInstall in nsJSInstall.cpp.  That seemed to do it.
 Posting new diff.
(Assignee)

Comment 36

17 years ago
Created attachment 29528 [details] [diff] [review]
Patch with JS_SuspendRequest/JS_ResumeRequest in InstallStartInstall
(Assignee)

Comment 37

17 years ago
Ok, one more patch.  I also added the Suspend/Resume code around Finalize
because a lot is done in that phase.  Adding patch.  Brendan, could you please
review and if satisfactory give it your sr=?

(Assignee)

Comment 38

17 years ago
Created attachment 29730 [details] [diff] [review]
New patch with Suspend/Resume around FinalizeInstall
sr=brendan@mozilla.org.

/be

Comment 40

17 years ago
r=sgehani
(Assignee)

Comment 41

17 years ago
Fix checked in.
Status: NEW → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED
(Reporter)

Updated

17 years ago
Depends on: 75458
(Assignee)

Comment 42

17 years ago
I'm pretty sure this was broken when danm checked in his change to nsIPrompt
from nsICommonDialogs.  We were using the nsICommonDialog service for both our
alert and confirm dialogs.  This was replaced with the nsIPrompt service on
April 8, 2001.  I'll ping danm about it.

FYI, it's cvs rev 1.161 of nsInstall.cpp.
(Assignee)

Comment 43

17 years ago
Never mind my last comment.  I got to this bug via another bug and it's the
wrong one.  That comment should have gone in 62167.
(Reporter)

Comment 44

17 years ago
Build: 2001-05-01-06-trunk(WIN)

Looks good now.  Marking Verified.
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.