Closed Bug 82034 Opened 24 years ago Closed 24 years ago

JavaScript to Java broken after lazy start of JVM (REQUESTOR: Xiaobin)

Categories

(Core Graveyard :: Java: OJI, defect, P1)

defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: joe.chou, Assigned: jband_mozilla)

References

()

Details

(Keywords: crash, Whiteboard: [oji_escalation] oji_working)

Attachments

(7 files)

After lazy start of JVM, JavaScript to Java communication somehow was broken. A simple test case in the URL can help to reproduce the problem: go to the URL, bring up Java console, and press the print button. Actual happened: no thing showed on the concole. Expected to happen: a message, "HelloWorld.print() called" should be shown.
Assign and to myself and accepted.
Assignee: edburns → joe.chou
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Add Stanley and Patrick to the cc list.
Joe, would you please zip the testcase and post it to here since javaweb is an internal website. Thanks!
Attached file a simple test case
Marking as OJI hot bugs also.
Xiaobin and I did some more debugging, and found out that currently the browser crashed when pressing the print button in the test case attached. The crash happened on Windows and Solaris (On Linux, somehow, instead of crashing, the browser was hung.) Tracing on Windows and Solaris, the crash point is the same: on line 2179 of jsobj.c: ok = resolve(cx, obj, js_IdToValue(id)); and the reason of crashing was that resolve somehow was null (0x0). Further tracing found out that the reason of resove being null was that, on line 2083 of the same file: /* Try obj's class resolve hook if id was not found in obj's scope. */ if (!sym) { clasp = LOCKED_OBJ_GET_CLASS(obj); resolve = clasp->resolve; resolve was assigned with a null value. Obj in LOCKED_OBJ_GET_CLASS(obj) was javaObject. It seemed the javaObject class somehow was not initialized. Any ideas?
I think this bug should also be marked as one of a OJI hot bugs.
Changing severity of this bug to "blocker" so that it's known to be stopping people from getting their needed functionality out of the browser. Xiaobin, this is the way I'll mark it as a hot bug for now, unless we get a specific, customer-driver case for putting the "oji_escalation" status on it. If you query the OJI "blocker" bugs, you should see a fairly well-contained list. We'll start managing that more closely to help us keep our focus.
Severity: critical → blocker
Summary: JavaScript to Java broken after lazy start of JVM → JavaScript to Java broken after lazy start of JVM (REQUESTOR: Xiaobin)
Whiteboard: [oji_escalation]
George, I think this bug warrants escalation status. It blocks Gary Kind.
Gary, will you please add a comment to this bug stating why this is a blocker for you? I'd like to start collecting customer "testimonials" about the important bugs. It's good data, and you've been giving us great data in the past.
Blocks: 77194
Jst: Would you please look at this bug? I suspect that it has something to do with your stuff.
Per my further testing, the May 7 trunk build seemed working OK, but May 8 trunk build had problems for the same test case attached. When the print button was pressed, the browser was hung, instead of crashed as in the later builds (as described earlier). Therefore, it looked like the test case started having problem after the landing of XPCDOM_20010329_BRANCH. Jst, would you take a look and see if you can find some clue? Thanks.
This patch fixes *a* crash that I'm seeing, but not really knowing alot about java I'm not sure I'm even doing the right thing when testing this out. Let me know if this makes things better. Index: dom/src/base/nsDOMClassInfo.cpp =================================================================== RCS file: /cvsroot/mozilla/dom/src/base/nsDOMClassInfo.cpp,v retrieving revision 1.23 diff -u -r1.23 nsDOMClassInfo.cpp --- nsDOMClassInfo.cpp 2001/05/22 10:48:22 1.23 +++ nsDOMClassInfo.cpp 2001/05/25 10:10:04 @@ -2987,7 +2987,10 @@ jobject appletObject = nsnull; nsresult rv = javaPluginInstance->GetJavaObject(&appletObject); - NS_ENSURE_SUCCESS(rv, rv); + + if (NS_FAILED(rv) || !appletObject) { + return rv; + } nsCOMPtr<nsILiveConnectManager> manager = do_GetService(nsIJVMManager::GetCID());
Johnny: Thanks for you quick response! However, I don't think it is a problem of Java Plugin. Actually I applied your patch, it still can not avoid crash. The appletObject returns from plugin is a valid object, anyway.
Blocks: 74367
Here is some more testing results of the attached test case: - Running May 7 build (with lazy start, and XPCDOM landing not checked in yet), the browser hung at xpcom/base/nsCOMPtr.h line 649, "You can't dereference a NULL nsCOMPtr with operatior->().". - Running May 7 build (backed out lazy start, and without XPCDOM), it worked. - Runing May 8 build (with lazy start and XPCDOM landing), the browser hung. - Running May 8 build (backed out lazy start, but with XPCDOM), the browser still hung. - Running May 21 build (with lazy start, XPCDOM, and whatever already checked in), crashed (see detail at the beginning of the comments).
Keywords: crash
Priority: -- → P1
This bug is due to XPCDOM which changes the way of native JS applet object being wrapped, I think. The reason of crash is "resolve call back function" is NULL for JavaObject class which is not valid. I posted a patch which adds this function. After this patch, we can resolve the function of the applet object. However, there is another issue needs to be resolved. The issue is the js_Invoke function can not find the java object reference from the Javascript call stack. Johnny: Any ideas on solving this?
jband, help! :-)
Adding status to indicate that it's actively being worked on by Joe.
Whiteboard: [oji_escalation] → [oji_escalation] oji_working
Yuck! Note that xiaobin's patch should not return JS_FALSE if the property is not found. This will terminate the script if the property is not found. If the function is not going to define a JS property anyway then it could just as well be implemented as: JSBool JS_DLL_CALLBACK JavaObject_resolve(JSContext *cx, JSObject *obj, jsval id) { return JS_TRUE; } The new DOM code sets the liveconnect wrapped JavaObject as the JS __proto__ of the xpconnect wrapped node using the code in nsHTMLExternalObjSH::PostCreate... http://lxr.mozilla.org/seamonkey/search?string=nsHTMLExternalObjSH%3A%3APost Note that the decriptive comments there are for the plugin case where the plugin instance is an xpconnect wrapped object. So the problem we get into is that JS starts out by working on a JSClass-ish xpconnect wrapped native JSObject and that object delegates to a JSOps-ish liveconnect wrapped java object. Once we get into js_GetProperty we are on our way down a slippery slope. The JavaObject_resolve implementation *could* do a lookup and do a call to define a JS property on the object. And you could implement a JSClass::getProperty callback that uses the code in JavaObject_getPropertyById. I think Brendan has a bug (or at least a plan) to get rid of the related JavaObject_lookupProperty "*propp = (JSProperty*)1;" hack. I wonder if the whole JSOps-based scheme should be dumped rather than trying to patch this a piece at a time to interoperate.
jband, thanks for your comment here. Two questions for your comment: * Is it worthing to add resolve function? * Why we need to define the JS property to the plugin object? By the way, what's the kind of property we need to define.
Ok. So I'm debugging this and realizing that this all *should* just work right and nver even try to cal the JavaObject resolver because the code in js_LookupProperty knows how to detect a non-NATIVE JSObject as do the right thing. But it was not doing that. Then I discovered that shaver had made the JavaObject *be* a native object! (here we are saying 'native' to mean a JSObject that uses the JSObjectOps code of jsobj.c - it has slots and all that jazz. Specifically this means the the OBJ_IS_NATIVE(obj) macro resolves to true). shaver checked in fixes to bug 74848 that make these objects be native even though they have a JSClass riddled with NULL callbacks! Note my astonishment. This was *such* a bomb waiting to go off. I'm not seeing yet exactly why shaver felt the need to go this far.
More testing results (using same test case attached earlier): 1) With May 7 trunk build (including 26516(lazy start JVM) and others so far, but no XPCDOM landing yet), if applied patch of 77600, then the test case worked. Without the patch of 77600, the browser hung (as described in earlier comments). The reason of the browser being bung was that the lazy start of JVM started too late, and the patch of 77600 fixed that problem. 2) With May 8 trunk build (including XPCDOM landing, plus other checkins during the day), even applied patch of 77600, the browser crashed (see stack below). The cause of the crash was that somehow JSJ hash table was null. Was there any change (s) in XPCDOM might be related to that? This crash is different from the crash of May 21 trunk build (due to null resolve method of javaObject, as described in earlier comments), and it may reveal some clues of the root cause of the problem. Stack of crash: ... Program received signal SIGSEGV, Segmentation fault. 0xff2dac20 in JSJ_HashTableAdd (ht=0x0, key=0x1b01c4, value=0x76e808, arg=0x78f4a8) at jsj_hash.c:285 285 keyHash = (*ht->keyHash)(key, arg); Current language: auto; currently c (gdb) where #0 0xff2dac20 in JSJ_HashTableAdd (ht=0x0, key=0x1b01c4, value=0x76e808, arg=0x78f4a8) at jsj_hash.c:285 #1 0xff2d58a4 in new_class_descriptor (cx=0x5d9f28, jEnv=0x78f4a8, java_class=0x1b01c4) at jsj_class.c:384 #2 0xff2d5b90 in jsj_GetJavaClassDescriptor (cx=0x5d9f28, jEnv=0x78f4a8, java_class=0x1b02a8) at jsj_class.c:475 #3 0xff2d14a0 in jsj_WrapJavaObject (cx=0x5d9f28, jEnv=0x78f4a8, java_obj=0xf855c, java_class=0x1b02a8) at jsj_JavaObject.c:155 #4 0xff2d86b4 in jsj_ConvertJavaObjectToJSValue (cx=0x5d9f28, jEnv=0x78f4a8, java_obj=0xf855c, vp=0xffbec19c) at jsj_convert.c:861 #5 0xff2cbfb8 in JSJ_ConvertJavaObjectToJSValue (cx=0x5d9f28, java_obj=0xf855c, vp=0xffbec19c) at jsj.c:861 #6 0xfd046dcc in nsJVMManager::WrapJavaObject (this=0x1ebfd0, context=0x5d9f28, javaObject=0xf855c, outJSObject=0xffbec2b8) at nsJVMManager.cpp:377 #7 0xfcf9de9c in nsHTMLAppletElementSH::GetPluginJSObject (this=0x7c2920, cx=0x5d9f28, obj=0x600e38, plugin_inst=0x7d7284, plugin_obj=0xffbec2b8, plugin_proto=0xffbec2b4) at nsDOMClassInfo.cpp:2815 #8 0xfcf9c898 in nsHTMLExternalObjSH::PostCreate (this=0x7c2920, wrapper=0x731ca0, cx=0x5d9f28, obj=0x600e38) at nsDOMClassInfo.cpp:2530 #9 0xfdf6c824 in XPCWrappedNative::GetNewOrUsed (ccx=@0xffbec4b0, Object=0x76f198, Scope=0x5c28f0, Interface=0x2c2bd0, resultWrapper=0xffbec41c) at xpcwrappednative.cpp:388 ...
I think the fix to bug 74848 was simply wrong. AFAICT shaver and brendan decided to make these objects be native simply to uphold the assertions in jslock.c that would fire when the finalizer took us through a path that called js_GetRequiredSlot and thus did the locking. I think they failed to take into account that there are categories of JSObjects that are not native, but have legitimate reasons for using JSScopes and having slots and all that. We don't have a reasonable way to detect objects like that currently. Perhaps the right thing is to really make these liveconnect objects be native. But there is much more to that than the changes shaver made. I propose to undo shaver's change that made them native and (more controversially) remove the JS_ASSERT(OBJ_IS_NATIVE(obj)); checks from js_LockObj and js_UnlockObj. I'll attach a patch for testing and if brendan or shaver surface we can discuss whether or not this is the right thing to do.
I applied the John's patch in the May 8th trunk build, it crashed with the same stack as the one I just posted.
So that's a different bug, right? Why not file a new report? Does the current trunk crash for you in this way?
Thanks, John. Would you make your patch be appoved ASAP?
Applied John's patch, it works for me now!
I didn't test in Unix. In windows, it works now. Joe, did you still get crash in Unix?
I tried John's patch on May 21 trunk build on Solaris, and it hung, with and without the patch of 77600. The May 21 build was the one that being used when the bug was filed. I'll try a more recent build later today, and see if the result may be different.
I tried the patch on May 29 build on Solaris, still hung. Looking for more details.
Joe, do you have any way to break into the hung process and see what it is doing? Are you sure it is really stuck? How long do you wait? On a 400mhz NT machine I see a circa 15 second pause the first time I hit the print button. No pause on subsequent button presses or after page reload. FWIW, I see the same sort of pause the first time I run javac on the command line.
I tried the patch again, this time I did not apply the patch of 77600, only applied John's patch, and it worked! John, in your patch, I noticed that you made the change in the init method for: jsj_JavaArray jsj_JavaClass jsj_JavaObject what about the other ones like: jsj_JavaMember jsj_JavaPackage jsj_JSObject Do they need the same change as well?
Joe: AFAICT I changed all the places that use JSObjectOps.
I found out that if I did not open the Java console before going to the test page, the browser would then be hung. Patch of 77600 seemed not to be the quilty party. Do you see that on Windows, John and Xiaobin? It seems somehow I need to open the Java console first, then bring up the test page, and the test case will work. Otherwise, if I bring up the test page then open the Java console, the browser will get stuck. At least that's what happened on Solaris.
It is fine in Windows with or without openning Java console first. Actually it does not matter whether you apply the "patch" for 77600.
Xiaobin, did you try with the test case in this bug (press print button)?
I found out the reason of "requiring" to start Java console first, and came up with a patch which seemed to resolve the problem. If the problem turns out to be Unix only, then I may open a seperate bug to deal with the issue.
shaver is on vacation and brendan is working on a secret project and wishes to not be disturbed. However, they both responded to my email ragging on them for their whacky change. As it happens, neither of them liked my proposed solution. Patrick and I also discussed this in depth. The consensus is that these objects should not sit on the fence trying to both be non-native (as far as JS is concerned) yet using JS native handlers to manage their slot accesses. The best long term solution may be to get rid of the JSObjectOps stuff and make these liveconnect wrappers just *be* JS natives. But for now the clean and simple thing seems to be to make them be non-native and have then *not* use the JS engine slot accessors. This is easy enough to do. Patrick and I agreed that it is OK to make these slot accessors be non-locking since there is no slot array reallocing to race with. And (we assume) setting and getting jsvals from memory locations is atomic on the platforms that matter. Otherwise I think my patch is non-controversial. It works for me. Attachment coming...
Joe: I tested both your testcase and mine, both work OK.
Oops, I realized that in the absense of a JSObjectOps::mark function the jsgc will use obj->map->freeslot to determine how many slots to mark. So, we must set that. Also, I changed from having a magic number for the slot count to using (JSSLOT_PRIVATE+1).
De-lurk: looks ok to me, but the jsj_wrapper_[gs]etRequiredSlot names do not need to end in "Op", which would have saved some whitespace-only changes in the diff (eyestrain whine only). r/sr=brendan@mozilla.org <relurk/> /be
The patch I just attached should fix the crash you'll see if an applet fails to load, that patch should go in with the other patch in this bug.
r/sr=jband on jst's patch.
Blocks: 83698
With the patches here, on Windows platform the problem seems resolved, but on Unix, unless the Java console is open first, the browser will be hung. A new bug has been filed to address the problem, and the bug number is: 83698, which will have a patch on top of the ones here to make Unix work.
r=beard for DOM helper code patch.
A nit, the patch to jsj_JavaClass.c introduces an extra copy of the comment: /* Mandatory non-null function pointer members. */ in the declaration of JavaClass_ops. At least that's how patch seems to be interpreting it, and I think that's what the diff indicates. r=beard
Thanks Patrick. Nit fixed. I'll take this bug and try to get both my last patch and jst's patch approved for 0.9.1. Both patches are r='d and sr='d.
Assignee: joe.chou → jband
Status: ASSIGNED → NEW
a= asa@mozilla.org for checkin to the 0.9.1 branch and the trunk. (on behalf of drivers)
My patch and jst's patch checked in to 0.9.1 branch and trunk.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Blocks: 84082
qa->pmac
QA Contact: shrir → pmac
Verified on the commercial trunk (2001-09-06-05-trunk) and branch build (2001-09-07-05-0.9.4).
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: