Closed
Bug 123920
Opened 23 years ago
Closed 22 years ago
Running Siebel application crashed Netscape 6.2.1 browser [@js_MarkGCThing]
Categories
(Core Graveyard :: Java: Live Connect, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: joe.chou, Assigned: beard)
References
Details
(Keywords: topembed, Whiteboard: [eapp],AOLTW+, sun_621)
Attachments
(1 file)
7.41 KB,
patch
|
jband_mozilla
:
review+
brendan
:
superreview+
asa
:
approval+
|
Details | Diff | Splinter Review |
When running the Siebel application, in which an applet repeatedly calls showDocument() to update contents on the page, the browser crashes consistently in js_MarkGCThing in jsgc.c (see call stack below). This bug does not happen on Windows with Netscape 6.2. The problem does not seem to exist in Mozilla 0.9.7 on Solaris, so it appears to have been fixed there. Does anybody know what the fix might be? This problem prevents Siebel from releasing their application with Netscape 6.2.1 on Solaris. We really need some timely help here.

After the crash, the console shows:
"t@1 (l@7) signal SEGV (no mapping at the fault address) in js_MarkGCThing at line 794 in file "jsgc.c""

call stack:
...
nsDocShell::SetupNewViewer(this = 0xa0adb8, aNewViewer = 0xa52c20)
DocumentViewerImpl::Init(this = 0xa52c20, aParentWidget = 0xa1cf50, aDeviceContext = 0x54f318, aBounds = STRUCT)
GlobalWindowImpl::SetNewDocument(this = 0x7335e0, aDocument = 0xe2fee4, removeEventListeners = 1)
nsJSContext::GC(this = 0x52a038)
JS_GC(cx = 0x878e18)
js_ForceGC(cx = 0x878e18)
js_GC(cx = 0x878e18, gcflags = 0)
JS_DHashTableEnumerate(table = 0x142aa0, etor = 0xff200590 = &`libmozjs.so`jsgc.c`gc_root_marker(struct JSDHashTable *table, struct JSDHashEntryHdr *hdr, uint32 num, void *arg), arg = 0x878e18)
gc_root_marker(table = 0x142aa0, hdr = 0x71beb0, num = 463U, arg = 0x878e18)
js_MarkGCThing(cx = 0x878e18, thing = 0xa93bd8, arg = (nil))
Comment 1•23 years ago
|
||
cc'ing Brendan on this one - Should Joe look at the gc_root_marker frame to find the name of the object that was rooted? I remember this used to be found via (char*)he->value in gc_root_marker, but I don't know what the format is currently.
Assignee: rogerl → khanson
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 2•23 years ago
|
||
xiaobin pinged me on IRC, and part of our exchange went like this:
<brendan> what is the name of the root being marked?
<xlu> The root's name is "XPCWrappedNative::mFlatJSObject".
<xlu> Assertion failure: root_points_to_gcArenaPool, at jsgc.c:918
I asked that dbradley be cc'd, don't see him yet, so I'm adding him. I doubt this is a khanson core JS engine bug. I would have guessed liveconnect, but the crash seems new with 6.2.1 and xiaobin's debug build fingered XPConnect, sort of. dbradley, jband, any ideas?
/be
Comment 3•23 years ago
|
||
This could be similar to 120629. If I read this correctly, it appears to be a case where something was rooted and then deleted, but never unrooted. I'll see what I can find in the logic dealing with mFlatJSObject.
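The "rooted, deleted, but not unrooted" hypothesis can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not SpiderMonkey code: `Thing`, `Root`, `arena`, `points_into_arena` and `mark_root` are hypothetical names invented for illustration, and the range check only loosely mirrors the idea behind the `root_points_to_gcArenaPool` assertion quoted in comment 2. A root here is the address of a slot that is supposed to hold a pointer into the collector's arena; if the slot's owner goes away without unregistering the root, the marker ends up reading whatever the slot holds now.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical toy heap: every live GC thing must lie inside this arena. */
#define ARENA_SIZE 4
typedef struct { int marked; } Thing;
static Thing arena[ARENA_SIZE];

/* A named root: the address of a slot that holds a pointer to a Thing. */
typedef struct {
    const char *name;   /* diagnostic name, useful when a root goes bad */
    Thing **location;   /* where the collector reads the pointer from */
} Root;

/* Range check: does the pointer fall inside the arena at all? */
static int points_into_arena(const Thing *t) {
    uintptr_t p  = (uintptr_t)t;
    uintptr_t lo = (uintptr_t)&arena[0];
    uintptr_t hi = (uintptr_t)&arena[ARENA_SIZE];
    return p >= lo && p < hi;
}

/* Mark the thing a root refers to.
 * Returns 0 on success, -1 if the root does not point into the arena
 * (the toy equivalent of a dangling root). */
static int mark_root(const Root *r) {
    assert(r != NULL && r->location != NULL);
    Thing *t = *r->location;
    if (t == NULL)
        return 0;                       /* empty root, nothing to mark */
    if (!points_into_arena(t)) {
        fprintf(stderr, "dangling root \"%s\"\n", r->name);
        return -1;                      /* refuse to touch foreign memory */
    }
    t->marked = 1;
    return 0;
}
```

In this model a root whose slot still points at `arena[1]` is marked normally, while a root whose slot now holds an address outside the arena is reported by name instead of being dereferenced. A release build without such a check would simply follow the stale pointer, which is consistent with a SEGV inside the marking code.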
Did some testing in Siebel again yesterday and found that adding a sleep(1) after showDocument in the applet that repeatedly updates a frame in a display page postponed the crash a good while longer. On Unix, the JVM is a separate process, and there seems to be some kind of ordering problem between the JVM and the browser GC; in this case, the JVM seemed to be running ahead of the browser. On the other hand, running Mozilla 0.9.7 with the same JVM (JRE) appeared to be much more stable. Does anybody know which bug fixes might have contributed to the improvement?
Comment 5•23 years ago
|
||
Curious, would this applet be spawning a thread and making the calls from a thread other than the main thread?
Comment 6•23 years ago
|
||
Taking this one for now. This looks xpconnect related and similar to bug 120629.
Assignee: khanson → dbradley
Assignee | ||
Comment 8•23 years ago
|
||
I spent some time looking at Sun's UNIX plugin, and it appears to be calling nsIPluginManager::GetURL() from a thread they created to communicate with the out of process JVM. Given the lack of thread safety in these interfaces, I'd wager that this bug is invalid. I coded up a partial solution for them, still waiting to see if this clears up the problem.
Assignee | ||
Comment 9•23 years ago
|
||
-> OJI component.
Assignee: dbradley → joe.chou
Component: XPConnect → OJI
QA Contact: pschwartau → pmac
Comment 10•23 years ago
|
||
Actually after further investigation, JPI code has no problem here. I will mail Patrick the code which did the right thing.
Assignee | ||
Comment 11•23 years ago
|
||
Having had a chance to peruse the code in more detail, it does appear that nsIPluginManager::GetURL() is being called from the correct thread. Oh what a twisted web we weave... We're not out of the woods yet.
Comment 12•23 years ago
|
||
Joe: Please paste the crash stack here and I believe that will ease the debugging. Change the component to "XPConnect".
Component: OJI → XPConnect
Reporter | ||
Comment 13•23 years ago
|
||
Per Siebel's latest update on the problem, the crash seems to be caused by showing a message in a frame that is being refreshed. In Siebel's application, a frame is refreshed at an interval, while a message can also be displayed in the same frame periodically. When the message is being added to the frame, the frame can be in the middle of refreshing, which may be causing GC problems. The same application works OK on Windows (with NS6.2). Any ideas?
Major corporations depend on eapp bugs, and need them to be fixed before they can recommend Mozilla-based products to their customers. Adding nsbeta1+ keyword and making sure the bugs get re-evaluated if they are targeted beyond 1.0.
Keywords: nsbeta1+
Comment 16•23 years ago
|
||
Since the suspected module is xpconnect, experts of xpconnect/js garbage collection should take a look at this bug and provide their opinion to Sun. This is a high pri high impact bug for Sun and warrants attention of experts to close this on Sun branch ASAP. Sun will not be able to fix it in our branch if this is not fixed/tested by 25th.
Assignee: joe.chou → dbradley
QA Contact: pmac → pschwartau
Comment 17•23 years ago
|
||
eapp was incorrectly used to change this to nsbeta1+. Resetting to nsbeta1 to nominate. This is an important issue and deserves to be nsbeta1+ by the ADT. This issue blocks support of Siebel's application by Sun's branded NS6.2.
Comment 18•23 years ago
|
||
This seems like it may be related to bug 126279. In looking at a lot of js_Mark* crashes, there seem to be two themes: one of marking deleted or invalid JS objects, and one where the stack gets trashed and the program tries to return to an invalid code address. Unfortunately I've been unable to reproduce this in a controlled environment. If this applet is causing a crash consistently, it would be a great help to get a small example that I could run to try to reproduce it.
Reporter | ||
Comment 19•23 years ago
|
||
David, unfortunately the setting of the problem is quite complicated (with a phone switch configured for the Siebel call center app, etc.), and so far we have not been able to create a test case that allows people outside of Siebel to reproduce the problem. On the other hand, the problem is consistently reproducible at Siebel, and the crash always happens in js_Mark*. We have a debug environment set up at Siebel (debug build of Netscape, source and build environment). Is it possible for you and me to go to Siebel together to take a look? If not, is it possible for you to join me for the debug session remotely?
Reporter | ||
Comment 21•23 years ago
|
||
A complete call stack:
main(argc = 1, argv = 0xffbef1d4)
main1(argc = 1, argv = 0xffbef1d4, nativeApp = (nil))
nsAppShellService::Run(this = 0x16aea8)
nsAppShell::Run(this = 0x10f060)
gtk_main(0xf6480, 0x200ae0, 0x0, 0x0, 0x0, 0x0)
g_main_run(0x246db8, 0x246db8, 0x1, 0x0, 0x0, 0x0)
g_main_iterate(0x1, 0x1, 0xff1507c8, 0x0, 0xff3e2668, 0xfcf95140)
g_main_dispatch(0xffbeeb00, 0x16f4b0, 0x1, 0x246da8, 0x0, 0x0)
g_io_unix_dispatch(0x243068, 0xffbeeb00, 0x246da8, 0x0, 0x0, 0xffbeea68)
our_gdk_io_invoke(source = 0x249e30, condition = G_IO_IN, data = 0x246da8)
event_processor_callback(data = 0xf6480, source = 5, condition = GDK_INPUT_READ)
nsEventQueueImpl::ProcessPendingEvents(this = 0xf6480)
PL_ProcessPendingEvents(self = 0xf64b0)
PL_HandleEvent(self = 0x115ec9c)
nsARequestObserverEvent::HandlePLEvent(plev = 0x115ec9c)
nsOnStartRequestEvent::HandleEvent(this = 0x115ec98)
nsHttpChannel::OnStartRequest(this = 0xba7810, request = 0x131ef38, ctxt = (nil))
nsHttpChannel::ProcessResponse(this = 0xba7810)
nsHttpChannel::ProcessNormal(this = 0xba7810)
nsDocumentOpenInfo::OnStartRequest(this = 0x135dc18, request = 0xba7810, aCtxt = (nil))
nsDocumentOpenInfo::DispatchContent(this = 0x135dc18, request = 0xba7810, aCtxt = (nil))
nsDSURIContentListener::DoContent(this = 0x733c08, aContentType = 0xffbee348 "text/html", aCommand = 7, request = 0xba7810, aContentHandler = 0xffbee418, aAbortProcess = 0xffbee388)
nsDocShell::CreateContentViewer(this = 0x7d6888, aContentType = 0xffbee348 "text/html", request = 0xba7810, aContentHandler = 0xffbee418)
nsWebShell::Embed(this = 0x7d6888, aContentViewer = 0x9aa490, aCommand = 0xfbfb5801 "", aExtraInfo = (nil))
nsDocShell::Embed(this = 0x7d6888, aContentViewer = 0x9aa490, aCommand = 0xfbfb5801 "", aExtraInfo = (nil))
nsWebShell::SetupNewViewer(this = 0x7d6888, aViewer = 0x9aa490)
nsDocShell::SetupNewViewer(this = 0x7d6888, aNewViewer = 0x9aa490)
DocumentViewerImpl::Init(this = 0x9aa490, aParentWidget = 0x7c0030, aDeviceContext = 0x78a530, aBounds = STRUCT)
GlobalWindowImpl::SetNewDocument(this = 0x810a38, aDocument = 0xc3580c, removeEventListeners = 1)
nsJSContext::GC(this = 0x810c30)
JS_GC(cx = 0x7860f8)
js_ForceGC(cx = 0x7860f8)
js_GC(cx = 0x7860f8, gcflags = 0)
JS_DHashTableEnumerate(table = 0x1429b8, etor = 0xff200578 = &`libmozjs.so`jsgc.c`gc_root_marker(struct JSDHashTable *table, struct JSDHashEntryHdr *hdr, uint32 num, void *arg), arg = 0x7860f8)
gc_root_marker(table = 0x1429b8, hdr = 0x6d4c20, num = 133U, arg = 0x7860f8)
js_MarkGCThing(cx = 0x7860f8, thing = 0xb96300, arg = (nil))
Summary: Running Siebel application crashed Netscape 6.2.1 browser → Running Siebel application crashed Netscape 6.2.1 browser [@js_MarkGCThing]
Comment 22•22 years ago
|
||
one very interesting point about this crash is that it does not occur under linux when the 1.3.1 java plugin is installed. i observed a crash when the 1.4 java plugin was installed, but it was not this crash. i've heard that there is a known problem (crash) with the 1.4 java plugin that explains what i was seeing. however, it seems that this could very likely be a problem in the plugin code. joe: is there any way you could test the 1.3.1 java plugin under solaris instead? i know you said that it is not built using the same compiler, but is there any way to get around that?
Comment 23•22 years ago
|
||
Regarding last comment.
>however, it seems that this could very likely be a problem in the plugin code.
Can you provide some details regarding this?
Assignee | ||
Comment 24•22 years ago
|
||
Changing component to LiveConnect, assigning to self.
Assignee: dbradley → beard
Component: XPConnect → Live Connect
Assignee | ||
Comment 25•22 years ago
|
||
Good news, I was able to diagnose the problem by observing the tell-tale assertion: JS_ASSERT(!rt->gcRunning); in |js_AllocGCThing()|. The culprit turned out to be in LiveConnect, in which |JavaObject_finalize()| was calling |JNIEnv::DeleteGlobalRef()|. In Sun's current plugin, this would have the side effect of servicing a running Java applet's requests to change the status message at the bottom of the browser window. This would in turn wind its way into the DOM, and finally back into |JS_NewObject()|. Since |JavaObject_finalize()| is only called during a garbage collection cycle, this caused the assertion to fire and |JS_NewObject()| to return NULL. This led to a dangling root installed by |XPCWrappedNative::Init()|.

Bottom line, don't call into Java during finalization. I decided to add an extra word to the |JavaObjectWrapper| struct to cache the wrapped Java object's hash code, to avoid one call into Java from |JavaObject_finalize()|. After this cached hash code is used, the word then serves as a link in a singly linked list of Java wrappers waiting to have their JNI global references deleted. With jband's assistance, I was able to write a GC callback function that processes the deferred list of Java wrappers when the GC reaches the JSGC_END phase.
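The deferral scheme described above can be sketched in plain C. This is a simplified model under loud assumptions, not the checked-in patch: it uses neither the SpiderMonkey nor the JNI API, and `Wrapper`, `FakeGlobalRef`, `release_global_ref`, `remove_from_hashtable`, `wrapper_finalize`, `gc_end_callback` and `run_gc` are hypothetical names chosen for illustration. The `gc_running` flag stands in for `rt->gcRunning`, and the union reuses one word first as a cached hash code and later as a list link, as the comment describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a JNI global reference. */
typedef int FakeGlobalRef;

typedef struct Wrapper {
    FakeGlobalRef java_obj;
    union {
        int hash_code;          /* valid while the wrapper is live */
        struct Wrapper *next;   /* valid once queued for deferred release */
    } u;
} Wrapper;

static int gc_running = 0;                 /* models rt->gcRunning */
static Wrapper *deferred_wrappers = NULL;  /* wrappers awaiting release */
static int released_count = 0;
static int last_removed_hash = 0;

/* Models a call into Java. It may re-enter the engine and allocate,
 * so it must never run while a collection is in progress. */
static void release_global_ref(FakeGlobalRef ref) {
    (void)ref;
    assert(!gc_running);
    released_count++;
}

/* Models removing the reflection from a hash table using the cached
 * hash code, so no call into Java is needed to recompute it. */
static void remove_from_hashtable(int hash_code) {
    last_removed_hash = hash_code;
}

/* Finalizer: runs during GC, so it only queues the wrapper. */
static void wrapper_finalize(Wrapper *w) {
    remove_from_hashtable(w->u.hash_code);  /* last use of the cached hash */
    w->u.next = deferred_wrappers;          /* reuse the word as a list link */
    deferred_wrappers = w;
}

/* End-of-GC callback: now it is safe to call into "Java". */
static void gc_end_callback(void) {
    Wrapper *w = deferred_wrappers;
    while (w) {
        deferred_wrappers = w->u.next;      /* unlink before releasing */
        release_global_ref(w->java_obj);
        w = deferred_wrappers;
    }
}

/* Toy collection cycle: finalize the dead wrappers, then drain the list. */
static void run_gc(Wrapper **dead, size_t n) {
    gc_running = 1;
    for (size_t i = 0; i < n; i++)
        wrapper_finalize(dead[i]);
    gc_running = 0;
    gc_end_callback();
}
```

Calling `run_gc` with an array of dead wrappers finalizes each one without tripping the `gc_running` assertion, and every release happens only after the flag is cleared. Sharing one word between the hash code and the link keeps the struct small; the cost is that the hash code is no longer readable once a wrapper has been queued.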
Assignee | ||
Updated•22 years ago
|
Comment 27•22 years ago
|
||
Comment on attachment 73159 [details] [diff] [review] patch v1
Awesome work.
Nit: no purty JSBool and JS_TRUE/JS_FALSE for installed_GC_callback?
Nit: indentation glitch:
+ if (jsj_env) {
+ JS_ASSERT(jsj_env->recursion_depth > 0);
+ if (--jsj_env->recursion_depth == 0)
+ jsj_env->cx = NULL;
+ }
sr=brendan@mozilla.org
/be
Attachment #73159 -
Flags: superreview+
Comment 28•22 years ago
|
||
Comment on attachment 73159 [details] [diff] [review] patch v1
Nit: maybe put an empty line before some of the crucial comments, too. E.g.
>+ while (java_wrapper) {
>+ deferred_wrappers = java_wrapper->u.next;
>+ /* 1. need a JNIEnv to execute DeleteGlobalRef of each wrapper. */
>+ (*jEnv)->DeleteGlobalRef(jEnv, java_wrapper->java_obj);
> if (java_obj) {
>- remove_java_obj_reflection_from_hashtable(java_obj, jEnv);
>- (*jEnv)->DeleteGlobalRef(jEnv, java_obj);
>+ remove_java_obj_reflection_from_hashtable(java_obj, java_wrapper->u.hash_code);
>+ /* defer releasing global refs until it is safe to do so. */
>+ java_wrapper->u.next = deferred_wrappers;
>+ deferred_wrappers = java_wrapper;
/be
Comment 29•22 years ago
|
||
Comment on attachment 73159 [details] [diff] [review] patch v1 r=jband
Attachment #73159 -
Flags: review+
Assignee | ||
Updated•22 years ago
|
Comment 30•22 years ago
|
||
Comment on attachment 73159 [details] [diff] [review] patch v1 a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #73159 -
Flags: approval+
Comment 31•22 years ago
|
||
Patrick, can you also land this on the 0.9.9 branch? Thanks.
Assignee | ||
Comment 32•22 years ago
|
||
Nits noted, corrected, and patch checked in on trunk.
Assignee | ||
Comment 33•22 years ago
|
||
Patches now checked in on 0.9.9 branch. They also need to be checked in on the appropriate branch Sun is using.
Comment 34•22 years ago
|
||
Patrick: are we going to wait until the checkin is made on the appropriate Sun branch before marking this one "Fixed"?
Comment 36•22 years ago
|
||
i still see a crash with a current cvs build, linux, using the cult3d plugin
-go to http://cult3d.com/
-click the pic of an orange can
If you don't crash right away, choose another "label" for the can.
Backtrace from a non-debug:
#0 0x4008b278 in js_MarkGCThing () from libmozjs.so
#1 0x4008b332 in gc_root_marker () from libmozjs.so
#2 0x4007b005 in JS_DHashTableEnumerate () from libmozjs.so
#3 0x4008b60f in js_GC () from libmozjs.so
#4 0x4008b3a5 in js_ForceGC () from libmozjs.so
#5 0x4006bb27 in JS_GC () from libmozjs.so
#6 0x412f9f6b in nsJSContext::Notify () from libjsdom.so
#7 0x401723ef in nsTimerImpl::Process () from libxpcom.so
#8 0x4017246b in handleMyEvent () from libxpcom.so
#9 0x4016e3bb in PL_HandleEvent () from libxpcom.so
#10 0x4016e2b5 in PL_ProcessPendingEvents () from /libxpcom.so
#11 0x4016f2ff in nsEventQueueImpl::ProcessPendingEvents () from libxpcom.so
#12 0x4117a606 in event_processor_callback () from libwidget_gtk.so
#13 0x4117a368 in our_gdk_io_invoke () from libwidget_gtk.so
#14 0x40395a7a in g_io_unix_dispatch (source_data=0x82db358, current_time=0xbfffe800, user_data=0x83170d0) at giounix.c:137
#15 0x40397055 in g_main_dispatch (dispatch_time=0xbfffe800) at gmain.c:656
#16 0x40397659 in g_main_iterate (block=1, dispatch=1) at gmain.c:877
#17 0x403977e8 in g_main_run (loop=0x83170e0) at gmain.c:935
#18 0x402ab65b in gtk_main () at gtkmain.c:524
#19 0x4117aae5 in nsAppShell::Run () from libwidget_gtk.so
#20 0x4115a9ca in nsAppShellService::Run () from libnsappshell.so
#21 0x0805201b in main1 ()
#22 0x08052925 in main ()
#23 0x404f5627 in __libc_start_main (main=0x80527ec <main>, argc=1, ubp_av=0xbfffec04, init=0x804c9fc <_init>, fini=0x8053f34 <_fini>, rtld_fini=0x4000dcc4 <_dl_fini>, stack_end=0xbfffebfc) at ../sysdeps/generic/libc-start.c:129
Reporter | ||
Comment 37•22 years ago
|
||
This fix has been checked into the Sun 6.2.1 branch.
Whiteboard: [eapp],AOLTW → [eapp],AOLTW, sun_621
Comment 38•22 years ago
|
||
R.K.Aa (dark@c2i.net): unless any of the developers here disagree, you should file a new bug on the cult3d site, because I believe yours is a separate problem. By the way, I tried unsuccessfully to download the cult3d plugin on both Linux and WinNT. On each platform, the site said that Netscape 6 is not supported. For example: Sorry! There is currently no support for Netscape 6! However, you know more about this plug-in than I; if you want, please file a separate bug against the Plug-ins component; thanks -
Comment 39•22 years ago
|
||
filed bug 130183
Comment 40•22 years ago
|
||
looks like it's time to close this one. beard, want to do the deed?
Reporter | ||
Comment 41•22 years ago
|
||
I verified this fix in Siebel today with the new release build. We can close this bug now.
Assignee | ||
Comment 42•22 years ago
|
||
Closing it down.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Updated•22 years ago
|
Whiteboard: [eapp],AOLTW, sun_621 → [eapp],AOLTW+, sun_621