JavaScript to Java broken after lazy start of JVM (REQUESTOR: Xiaobin)

VERIFIED FIXED

Status

P1
blocker
VERIFIED FIXED
18 years ago
8 years ago

People

(Reporter: joe.chou, Assigned: jband_mozilla)

Tracking

({crash})

Trunk
crash
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [oji_escalation] oji_working, URL)

Attachments

(7 attachments)

(Reporter)

Description

18 years ago
After lazy start of JVM, JavaScript to Java communication somehow was broken. A
simple test case in the URL can help to reproduce the problem: go to the URL,
bring up Java console, and press the print button.
Actual happened: no thing showed on the concole.
Expected to happen: a message, "HelloWorld.print() called" should be shown.
(Reporter)

Comment 1

18 years ago
Assign and to myself and accepted.
Assignee: edburns → joe.chou
(Reporter)

Updated

18 years ago
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

Comment 2

18 years ago
Add Stanley and Patrick to the cc list.

Comment 3

18 years ago
Joe, would you please zip the testcase and post it to here since javaweb is an 
internal website. Thanks!
(Reporter)

Comment 4

18 years ago
Created attachment 35527 [details]
a simple test case
(Reporter)

Comment 5

18 years ago
Created attachment 35528 [details]
A simple test case in plain text.

Comment 6

18 years ago
Marking as OJI hot bugs also.
(Reporter)

Comment 7

18 years ago
Xiaobin and I did some more debugging, and found out that currently the browser
crashed when pressing the print button in the test case attached. The crash
happened on Windows and Solaris (On Linux, somehow, instead of crashing, the
browser was hung.) Tracing on Windows and Solaris, the crash point is the same: 
on line 2179 of jsobj.c:
	
	ok = resolve(cx, obj, js_IdToValue(id));

and the reason of crashing was that resolve somehow was null (0x0). Further
tracing found out that the reason of resove being null was that, on line 2083 of
the same file:

        /* Try obj's class resolve hook if id was not found in obj's scope. */
        if (!sym) {
            clasp = LOCKED_OBJ_GET_CLASS(obj);
            resolve = clasp->resolve;
resolve was assigned with a null value. Obj in LOCKED_OBJ_GET_CLASS(obj) was
javaObject. It seemed the javaObject class somehow was not initialized.

Any ideas?

Comment 8

18 years ago
  I think this bug should also be marked as one of a OJI hot bugs.

Comment 9

18 years ago
Changing severity of this bug to "blocker" so that it's known to be stopping
people from getting their needed functionality out of the browser.  Xiaobin,
this is the way I'll mark it as a hot bug for now, unless we get a specific,
customer-driver case for putting the "oji_escalation" status on it.

If you query the OJI "blocker" bugs, you should see a fairly well-contained
list.  We'll start managing that more closely to help us keep our focus.
Severity: critical → blocker

Updated

18 years ago
Summary: JavaScript to Java broken after lazy start of JVM → JavaScript to Java broken after lazy start of JVM (REQUESTOR: Xiaobin)
Whiteboard: [oji_escalation]

Comment 10

18 years ago
George, I think this bug warrants escalation status.  It blocks Gary Kind.

Comment 11

18 years ago
Gary, will you please add a comment to this bug stating why this is a blocker
for you?  I'd like to start collecting customer "testimonials" about the
important bugs.  It's good data, and you've been giving us great data in the past.

Updated

18 years ago
Blocks: 77194

Comment 12

18 years ago
Jst:
   Would you please look at this bug? I suspect that it has something to do 
with your stuff.

 
(Reporter)

Comment 13

18 years ago
Per my further testing, the May 7 trunk build seemed working OK, but May 8 trunk
build had problems for the same test case attached. When the print button was
pressed, the browser was hung, instead of crashed as in the later builds (as
described earlier). Therefore, it looked like the test case started having
problem after the landing of XPCDOM_20010329_BRANCH.

Jst, would you take a look and see if you can find some clue? Thanks.
This patch fixes *a* crash that I'm seeing, but not really knowing alot about
java I'm not sure I'm even doing the right thing when testing this out. Let me
know if this makes things better.

Index: dom/src/base/nsDOMClassInfo.cpp
===================================================================
RCS file: /cvsroot/mozilla/dom/src/base/nsDOMClassInfo.cpp,v
retrieving revision 1.23
diff -u -r1.23 nsDOMClassInfo.cpp
--- nsDOMClassInfo.cpp  2001/05/22 10:48:22     1.23
+++ nsDOMClassInfo.cpp  2001/05/25 10:10:04
@@ -2987,7 +2987,10 @@

   jobject appletObject = nsnull;
   nsresult rv = javaPluginInstance->GetJavaObject(&appletObject);
-  NS_ENSURE_SUCCESS(rv, rv);
+
+  if (NS_FAILED(rv) || !appletObject) {
+    return rv;
+  }

   nsCOMPtr<nsILiveConnectManager> manager =
     do_GetService(nsIJVMManager::GetCID());

Comment 15

18 years ago
Johnny:
    Thanks for you quick response! 
    However, I don't think it is a problem of Java Plugin. Actually I applied 
your patch, it still can not avoid crash. The appletObject returns from plugin 
is a valid object, anyway.

Updated

18 years ago
Blocks: 74367
(Reporter)

Comment 16

18 years ago
Here is some more testing results of the attached test case:

- Running May 7 build (with lazy start, and XPCDOM landing not checked in yet),
the browser hung at xpcom/base/nsCOMPtr.h line 649, "You can't dereference a
NULL nsCOMPtr with operatior->().".

- Running May 7 build (backed out lazy start, and without XPCDOM), it worked.

- Runing May 8 build (with lazy start and XPCDOM landing), the browser hung.

- Running May 8 build (backed out lazy start, but with XPCDOM), the browser
still hung.

- Running May 21 build (with lazy start, XPCDOM, and whatever already checked
in), crashed (see detail at the beginning of the comments).

Keywords: crash
Priority: -- → P1

Comment 17

18 years ago
Created attachment 36402 [details] [diff] [review]
A patch avoiding crash

Comment 18

18 years ago
    This bug is due to XPCDOM which changes the way of native JS applet object 
being wrapped, I think.
    The reason of crash is "resolve call back function" is NULL for JavaObject 
class which is not valid. I posted a patch which adds this function. 
    After this patch, we can resolve the function of the applet object. However, 
there is another issue needs to be resolved. The issue is the js_Invoke function 
can not find the java object reference from the Javascript call stack. 

Johnny:
   Any ideas on solving this?
jband, help! :-)

Comment 20

18 years ago
Adding status to indicate that it's actively being worked on by Joe.
Whiteboard: [oji_escalation] → [oji_escalation] oji_working
(Assignee)

Comment 21

18 years ago
Yuck!

Note that xiaobin's patch should not return JS_FALSE if the property is not 
found. This will terminate the script if the property is not found. If the 
function is not going to define a JS property anyway then it could just as well 
be implemented as:

JSBool JS_DLL_CALLBACK
JavaObject_resolve(JSContext *cx, JSObject *obj, jsval id)
{
    return JS_TRUE;
}

The new DOM code sets the liveconnect wrapped JavaObject as the JS __proto__ of 
the xpconnect wrapped node using the code in nsHTMLExternalObjSH::PostCreate...
http://lxr.mozilla.org/seamonkey/search?string=nsHTMLExternalObjSH%3A%3APost

Note that the decriptive comments there are for the plugin case where the plugin 
instance is an xpconnect wrapped object.

So the problem we get into is that JS starts out by working on a JSClass-ish 
xpconnect wrapped native JSObject and that object delegates to a JSOps-ish 
liveconnect wrapped java object. Once we get into js_GetProperty we are on our 
way down a slippery slope.

The JavaObject_resolve implementation *could* do a lookup and do a call to 
define a JS property on the object. And you could implement a 
JSClass::getProperty callback that uses the code in JavaObject_getPropertyById.

I think Brendan has a bug (or at least a plan) to get rid of the related 
JavaObject_lookupProperty "*propp = (JSProperty*)1;" hack.

I wonder if the whole JSOps-based scheme should be dumped rather than trying to 
patch this a piece at a time to interoperate.

Comment 22

18 years ago
jband, thanks for your comment here.

Two questions for your comment:
  * Is it worthing to add resolve function? 
  * Why we need to define the JS property to the plugin object? By the way, 
what's the kind of property we need to define.
(Assignee)

Comment 23

18 years ago
Ok. So I'm debugging this and realizing that this all *should* just work right 
and nver even try to cal the JavaObject resolver because the code in 
js_LookupProperty knows how to detect a non-NATIVE JSObject as do the right 
thing. But it was not doing that. Then I discovered that shaver had made the 
JavaObject *be* a native object! (here we are saying 'native' to mean a JSObject 
that uses the JSObjectOps code of jsobj.c - it has slots and all that jazz. 
Specifically this means the the OBJ_IS_NATIVE(obj) macro resolves to true).

shaver checked in fixes to bug 74848 that make these objects be native even 
though they have a JSClass riddled with NULL callbacks! Note my astonishment.

This was *such* a bomb waiting to go off.

I'm not seeing yet exactly why shaver felt the need to go this far.
(Reporter)

Comment 24

18 years ago
More testing results (using same test case attached earlier):

1) With May 7 trunk build (including 26516(lazy start JVM) and others so far,
but no XPCDOM landing yet), if applied patch of 77600, then the test case
worked. Without the patch of 77600, the browser hung (as described in earlier
comments). The reason of the browser being bung was that the lazy start of JVM
started too late, and the patch of 77600 fixed that problem.

2) With May 8 trunk build (including XPCDOM landing, plus other checkins during
the day), even applied patch of 77600, the browser crashed (see stack below).
The cause of the crash was that somehow JSJ hash table was null. Was there any
change (s) in XPCDOM might be related to that? This crash is different from the
crash of May 21 trunk build (due to null resolve method of javaObject, as
described in earlier comments), and it may reveal some clues of the root cause
of the problem.

Stack of crash:
...
Program received signal SIGSEGV, Segmentation fault.
0xff2dac20 in JSJ_HashTableAdd (ht=0x0, key=0x1b01c4, value=0x76e808, 
    arg=0x78f4a8) at jsj_hash.c:285
285         keyHash = (*ht->keyHash)(key, arg);
Current language:  auto; currently c
(gdb) where
#0  0xff2dac20 in JSJ_HashTableAdd (ht=0x0, key=0x1b01c4, value=0x76e808, 
    arg=0x78f4a8) at jsj_hash.c:285
#1  0xff2d58a4 in new_class_descriptor (cx=0x5d9f28, jEnv=0x78f4a8, 
    java_class=0x1b01c4) at jsj_class.c:384
#2  0xff2d5b90 in jsj_GetJavaClassDescriptor (cx=0x5d9f28, jEnv=0x78f4a8, 
    java_class=0x1b02a8) at jsj_class.c:475
#3  0xff2d14a0 in jsj_WrapJavaObject (cx=0x5d9f28, jEnv=0x78f4a8, 
    java_obj=0xf855c, java_class=0x1b02a8) at jsj_JavaObject.c:155
#4  0xff2d86b4 in jsj_ConvertJavaObjectToJSValue (cx=0x5d9f28, jEnv=0x78f4a8, 
    java_obj=0xf855c, vp=0xffbec19c) at jsj_convert.c:861
#5  0xff2cbfb8 in JSJ_ConvertJavaObjectToJSValue (cx=0x5d9f28, 
    java_obj=0xf855c, vp=0xffbec19c) at jsj.c:861
#6  0xfd046dcc in nsJVMManager::WrapJavaObject (this=0x1ebfd0, 
    context=0x5d9f28, javaObject=0xf855c, outJSObject=0xffbec2b8)
    at nsJVMManager.cpp:377
#7  0xfcf9de9c in nsHTMLAppletElementSH::GetPluginJSObject (this=0x7c2920, 
    cx=0x5d9f28, obj=0x600e38, plugin_inst=0x7d7284, plugin_obj=0xffbec2b8, 
    plugin_proto=0xffbec2b4) at nsDOMClassInfo.cpp:2815
#8  0xfcf9c898 in nsHTMLExternalObjSH::PostCreate (this=0x7c2920, 
    wrapper=0x731ca0, cx=0x5d9f28, obj=0x600e38) at nsDOMClassInfo.cpp:2530
#9  0xfdf6c824 in XPCWrappedNative::GetNewOrUsed (ccx=@0xffbec4b0, 
    Object=0x76f198, Scope=0x5c28f0, Interface=0x2c2bd0, 
    resultWrapper=0xffbec41c) at xpcwrappednative.cpp:388 
...

(Assignee)

Comment 25

18 years ago
I think the fix to bug 74848 was simply wrong. AFAICT shaver and brendan decided 
to make these objects be native simply to uphold the assertions in jslock.c that 
would fire when the finalizer took us through a path that called 
js_GetRequiredSlot and thus did the locking.

I think they failed to take into account that there are categories of JSObjects 
that are not native, but have legitimate reasons for using JSScopes and having 
slots and all that. We don't have a reasonable way to detect objects 
like that currently. Perhaps the right thing is to really make these 
liveconnect objects be native. But there is much more to that than the changes 
shaver made.

I propose to undo shaver's change that made them native and (more 
controversially) remove the JS_ASSERT(OBJ_IS_NATIVE(obj)); checks from 
js_LockObj and js_UnlockObj. I'll attach a patch for testing and if brendan or 
shaver surface we can discuss whether or not this is the right thing to do.
(Assignee)

Comment 26

18 years ago
Created attachment 36456 [details] [diff] [review]
patch to make the wrappers be native again.
(Reporter)

Comment 27

18 years ago
I applied the John's patch in the May 8th trunk build, it crashed with the same
stack as the one I just posted.
(Assignee)

Comment 28

18 years ago
So that's a different bug, right? Why not file a new report? Does the current 
trunk crash for you in this way?

Comment 29

18 years ago
  Thanks, John. Would you make your patch be appoved ASAP?

Comment 30

18 years ago
Applied John's patch, it works for me now!

Comment 31

18 years ago
  I didn't test in Unix. In windows, it works now. Joe, did you still get crash 
in Unix?
(Reporter)

Comment 32

18 years ago
I tried John's patch on May 21 trunk build on Solaris, and it hung, with and
without the patch of 77600. The May 21 build was the one that being used when
the bug was filed. I'll try a more recent build later today, and see if the
result may be different.
(Reporter)

Comment 33

18 years ago
I tried the patch on May 29 build on Solaris, still hung. Looking for more
details.
(Assignee)

Comment 34

18 years ago
Joe, do you have any way to break into the hung process and see what it is 
doing? Are you sure it is really stuck? How long do you wait? On a 400mhz NT 
machine I see a circa 15 second pause the first time I hit the print button. No 
pause on subsequent button presses or after page reload. FWIW, I see the same 
sort of pause the first time I run javac on the command line.
(Reporter)

Comment 35

18 years ago
I tried the patch again, this time I did not apply the patch of 77600, only
applied John's patch, and it worked! 

John, in your patch, I noticed that you made the change in the init method for:
	jsj_JavaArray
	jsj_JavaClass
	jsj_JavaObject
what about the other ones like:
	jsj_JavaMember
	jsj_JavaPackage
	jsj_JSObject
Do they need the same change as well?
(Assignee)

Comment 36

18 years ago
Joe: AFAICT I changed all the places that use JSObjectOps.
(Reporter)

Comment 37

18 years ago
I found out that if I did not open the Java console before going to the test
page, the browser would then be hung. Patch of 77600 seemed not to be the quilty
party. Do you see that on Windows, John and Xiaobin? It seems somehow I need to
open the Java console first, then bring up the test page, and the test case will
work. Otherwise, if I bring up the test page then open the Java console, the
browser will get stuck. At least that's what happened on Solaris.

Comment 38

18 years ago
 It is fine in Windows with or without openning Java console first. Actually it 
does not matter whether you apply the "patch" for 77600. 
(Reporter)

Comment 39

18 years ago
Xiaobin, did you try with the test case in this bug (press print button)?
(Reporter)

Comment 40

18 years ago
I found out the reason of "requiring" to start Java console first, and came up
with a patch which seemed to resolve the problem. If the problem turns out to be
Unix only, then I may open a seperate bug to deal with the issue.
(Assignee)

Comment 41

18 years ago
shaver is on vacation and brendan is working on a secret project and wishes to 
not be disturbed. However, they both responded to my email ragging on them for 
their whacky change. As it happens, neither of them liked my proposed solution. 
Patrick and I also discussed this in depth. The consensus is that these objects 
should not sit on the fence trying to both be non-native (as far as JS is 
concerned) yet using JS native handlers to manage their slot accesses. The best 
long term solution may be to get rid of the JSObjectOps stuff and make these 
liveconnect wrappers just *be* JS natives.

But for now the clean and simple thing seems to be to make them be non-native 
and have then *not* use the JS engine slot accessors. This is easy enough to do.  
Patrick and I agreed that it is OK to make these slot accessors be non-locking 
since there is no slot array reallocing to race with. And (we assume) setting 
and getting jsvals from memory locations is atomic on the platforms that matter. 
Otherwise I think my patch is non-controversial. It works for me. Attachment 
coming...
(Assignee)

Comment 42

18 years ago
Created attachment 36620 [details] [diff] [review]
better patch to be non-native with custom slot accessors

Comment 43

18 years ago
Joe:
  I tested both your testcase and mine, both work OK.

(Assignee)

Comment 44

18 years ago
Created attachment 36682 [details] [diff] [review]
and a better patch
(Assignee)

Comment 45

18 years ago
Oops, I realized that in the absense of a JSObjectOps::mark function the jsgc 
will use obj->map->freeslot to determine how many slots to mark. So, we must set 
that. Also, I changed from having a magic number for the slot count to using 
(JSSLOT_PRIVATE+1).
De-lurk: looks ok to me, but the jsj_wrapper_[gs]etRequiredSlot names do not
need to end in "Op", which would have saved some whitespace-only changes in the
diff (eyestrain whine only).  r/sr=brendan@mozilla.org

<relurk/>

/be
Created attachment 36780 [details] [diff] [review]
Fix for crash in DOM helper code when loading bad java applet
The patch I just attached should fix the crash you'll see if an applet fails to
load, that patch should go in with the other patch in this bug.
(Assignee)

Comment 49

18 years ago
r/sr=jband on jst's patch.
(Reporter)

Updated

18 years ago
Blocks: 83698
(Reporter)

Comment 50

18 years ago
With the patches here, on Windows platform the problem seems resolved, but on
Unix, unless the Java console is open first, the browser will be hung. A new bug
has been filed to address the problem, and the bug number is: 83698, which will
have a patch on top of the ones here to make Unix work.

Comment 51

18 years ago
r=beard for DOM helper code patch.

Comment 52

18 years ago
A nit, the patch to jsj_JavaClass.c introduces an extra copy of the comment:

/* Mandatory non-null function pointer members. */

in the declaration of JavaClass_ops. At least that's how patch seems to be 
interpreting it, and I think that's what the diff indicates. r=beard
(Assignee)

Comment 53

18 years ago
Thanks Patrick. Nit fixed.

I'll take this bug and try to get both my last patch and jst's patch approved 
for 0.9.1. Both patches are r='d and sr='d.
Assignee: joe.chou → jband
Status: ASSIGNED → NEW
a= asa@mozilla.org for checkin to the 0.9.1 branch and the trunk.
(on behalf of drivers)
(Assignee)

Comment 55

18 years ago
My patch and jst's patch checked in to 0.9.1 branch and trunk.
Status: NEW → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED

Updated

18 years ago
Blocks: 84082

Comment 56

18 years ago
qa->pmac
QA Contact: shrir → pmac

Comment 57

18 years ago
Verified on the commercial trunk (2001-09-06-05-trunk) and branch build
(2001-09-07-05-0.9.4).
Status: RESOLVED → VERIFIED

Updated

8 years ago
Component: Java: OJI → Java: OJI
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.