Closed Bug 123920 Opened 23 years ago Closed 23 years ago

Running Siebel application crashed Netscape 6.2.1 browser [@js_MarkGCThing]

Categories

(Core Graveyard :: Java: Live Connect, defect)

Platform: Sun Solaris
Type: defect
Priority: Not set
Severity: blocker

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: joe.chou, Assigned: beard)

References

Details

(Keywords: topembed, Whiteboard: [eapp],AOLTW+, sun_621)

Attachments

(1 file)

When running the Siebel application, in which an applet repeatedly calls
showDocument() to update contents on the page, the browser crashed consistently
in js_MarkGCThing of jsgc.c (see call stack below). This bug does not happen with
Netscape 6.2 on Windows. The problem does not seem to exist in Mozilla 0.9.7 on
Solaris, so it appears to have been fixed there. Does anybody know what the fix
might have been? This problem prevents Siebel from releasing their application
with Netscape 6.2.1 on Solaris. We really need some timely help here.


After the crash, on the console:

"t@1 (l@7) signal SEGV (no mapping at the fault address) in js_MarkGCThing at line 794 in file "jsgc.c""


call stack:

...
   nsDocShell::SetupNewViewer(this = 0xa0adb8, aNewViewer = 0xa52c20)
   DocumentViewerImpl::Init(this = 0xa52c20, aParentWidget = 0xa1cf50, aDeviceContext = 0x54f318, aBounds = STRUCT)
   GlobalWindowImpl::SetNewDocument(this = 0x7335e0, aDocument = 0xe2fee4, removeEventListeners = 1)
   nsJSContext::GC(this = 0x52a038)
   JS_GC(cx = 0x878e18)
   js_ForceGC(cx = 0x878e18)
   js_GC(cx = 0x878e18, gcflags = 0)
   JS_DHashTableEnumerate(table = 0x142aa0, etor = 0xff200590 = &`libmozjs.so`jsgc.c`gc_root_marker(struct JSDHashTable *table, struct JSDHashEntryHdr *hdr, uint32 num, void *arg), arg = 0x878e18)
   gc_root_marker(table = 0x142aa0, hdr = 0x71beb0, num = 463U, arg = 0x878e18)
   js_MarkGCThing(cx = 0x878e18, thing = 0xa93bd8, arg = (nil))
cc'ing Brendan on this one -

Should Joe look at the gc_root_marker frame to find the name of the object
that was rooted? I remember this used to be found via (char*)he->value
in gc_root_marker, but I don't know what the format is currently. 
Assignee: rogerl → khanson
Status: UNCONFIRMED → NEW
Ever confirmed: true
xiaobin pinged me on IRC, and part of our exchange went like this:

<brendan> what is the name of the root being marked?
<xlu> The root's name is "XPCWrappedNative::mFlatJSObject".
<xlu> Assertion failure: root_points_to_gcArenaPool, at jsgc.c:918

I asked that dbradley be cc'd; I don't see him yet, so I'm adding him.  I doubt
this is a khanson core JS engine bug.  I would have guessed LiveConnect, but the
crash seems new with 6.2.1 and xiaobin's debug build fingered XPConnect, sort
of.  dbradley, jband, any ideas?

/be
This could be similar to bug 120629.

If I read this correctly, it appears to be a case where something was rooted,
deleted, but not unrooted? I'll see what I can find in the logic dealing with
mFlatJSObject.
I did some more testing at Siebel yesterday and found that adding a
sleep(1) after showDocument() in the applet, which repeatedly updates a frame in a
display page, postponed the crash for a good while longer. On Unix, the JVM is a
separate process, and there seemed to be some kind of ordering problem between the
JVM and the browser GC. In this case, the JVM seemed to be running ahead of the browser.

On the other hand, running Mozilla 0.9.7 with the same JVM (JRE) appeared to be
much more stable. Does anybody know of any bug fixes that might have contributed
to the improvement?
Out of curiosity, is this applet spawning a thread and making the calls from a
thread other than the main thread?
Taking this one for now. This looks xpconnect related and similar to bug 120629.
Assignee: khanson → dbradley
Setting component to XPConnect -
Component: JavaScript Engine → XPConnect
I spent some time looking at Sun's UNIX plugin, and it appears to be 
calling nsIPluginManager::GetURL() from a thread they created to 
communicate with the out-of-process JVM. Given the lack of thread safety 
in these interfaces, I'd wager that this bug is invalid. I coded up a partial 
solution for them and am still waiting to see if it clears up the problem.
-> OJI component.
Assignee: dbradley → joe.chou
Component: XPConnect → OJI
QA Contact: pschwartau → pmac
Actually, after further investigation, the JPI code has no problem here. I will
mail Patrick the code that does the right thing.
Having had a chance to peruse the code in more detail, it does appear 
that nsIPluginManager::GetURL() is being called from the correct thread. 
Oh what a twisted web we weave... We're not out of the woods yet.
Joe:
   Please paste the crash stack here and I believe that will ease the debugging.
Changing the component to "XPConnect".
Component: OJI → XPConnect
Per Siebel's latest update on the problem, the crash seems to be caused by
showing a message in a frame that is being refreshed. In Siebel's application, a
frame is refreshed at an interval, and a message can also be displayed
in the same frame periodically. When the message is added to the frame,
the frame can be in the middle of refreshing, which may be causing GC problems.
The same application works OK on Windows (with NS6.2). Any ideas?
Siebel considers this a blocker for NS6 support.
Whiteboard: [eapp]
Major corporations depend on eapp bugs, and need them to be fixed before they
can recommend Mozilla-based products to their customers. Adding nsbeta1+ keyword
and making sure the bugs get re-evaluated if they are targeted beyond 1.0.
Keywords: nsbeta1+
Since the suspected module is XPConnect, experts in XPConnect/JS garbage
collection should take a look at this bug and provide their opinion to Sun. This
is a high-priority, high-impact bug for Sun and warrants the attention of experts
to close it on the Sun branch ASAP. Sun will not be able to fix it in our branch
if this is not fixed and tested by the 25th.
Assignee: joe.chou → dbradley
QA Contact: pmac → pschwartau
eapp was incorrectly used to change this to nsbeta1+. Resetting to nsbeta1 to
nominate. This is an important issue and deserves to be nsbeta1+ by the ADT. 

This issue blocks support of Siebel's application by Sun's branded NS6.2.

Keywords: nsbeta1+ → nsbeta1
This seems like it may be related to bug 126279. Looking at a lot of js_Mark*
crashes, there seem to be two themes: one where deleted or invalid JS objects
are marked, and one where the stack gets trashed and the program tries to return
to an invalid code address. Unfortunately, I've been unable to reproduce this in
a controlled environment.

If this applet causes a crash consistently, it would be a great help to get a
small example that I could run to try to reproduce it.
David, 
Unfortunately, the setup for the problem is quite complicated (with a phone
switch configured for the Siebel call center app, etc.), and so far we have not
been able to create a test case that allows people outside of Siebel to
reproduce the problem.

On the other hand, the problem is consistently reproducible at Siebel, and the
crash always happens in js_Mark*. We have a debug environment set up at Siebel
(a debug build of Netscape, plus the source and build environment).
Would it be possible for you and me to go to Siebel together to take a look? If
not, could you join me for the debug session remotely?

Keywords: topembed
Adding AOLTW
Whiteboard: [eapp] → [eapp],AOLTW
A complete call stack: 

   main(argc = 1, argv = 0xffbef1d4)
   main1(argc = 1, argv = 0xffbef1d4, nativeApp = (nil))
   nsAppShellService::Run(this = 0x16aea8)
   nsAppShell::Run(this = 0x10f060)
   gtk_main(0xf6480, 0x200ae0, 0x0, 0x0, 0x0, 0x0)
   g_main_run(0x246db8, 0x246db8, 0x1, 0x0, 0x0, 0x0)
   g_main_iterate(0x1, 0x1, 0xff1507c8, 0x0, 0xff3e2668, 0xfcf95140)
   g_main_dispatch(0xffbeeb00, 0x16f4b0, 0x1, 0x246da8, 0x0, 0x0)
   g_io_unix_dispatch(0x243068, 0xffbeeb00, 0x246da8, 0x0, 0x0, 0xffbeea68)
   our_gdk_io_invoke(source = 0x249e30, condition = G_IO_IN, data = 0x246da8)
   event_processor_callback(data = 0xf6480, source = 5, condition = GDK_INPUT_READ)
   nsEventQueueImpl::ProcessPendingEvents(this = 0xf6480)
   PL_ProcessPendingEvents(self = 0xf64b0)
   PL_HandleEvent(self = 0x115ec9c)
   nsARequestObserverEvent::HandlePLEvent(plev = 0x115ec9c)
   nsOnStartRequestEvent::HandleEvent(this = 0x115ec98)
   nsHttpChannel::OnStartRequest(this = 0xba7810, request = 0x131ef38, ctxt = (nil))
   nsHttpChannel::ProcessResponse(this = 0xba7810)
   nsHttpChannel::ProcessNormal(this = 0xba7810)
   nsDocumentOpenInfo::OnStartRequest(this = 0x135dc18, request = 0xba7810, aCtxt = (nil))
   nsDocumentOpenInfo::DispatchContent(this = 0x135dc18, request = 0xba7810, aCtxt = (nil))
   nsDSURIContentListener::DoContent(this = 0x733c08, aContentType = 0xffbee348 "text/html", aCommand = 7, request = 0xba7810, aContentHandler = 0xffbee418, aAbortProcess = 0xffbee388)
   nsDocShell::CreateContentViewer(this = 0x7d6888, aContentType = 0xffbee348 "text/html", request = 0xba7810, aContentHandler = 0xffbee418)
   nsWebShell::Embed(this = 0x7d6888, aContentViewer = 0x9aa490, aCommand = 0xfbfb5801 "", aExtraInfo = (nil))
   nsDocShell::Embed(this = 0x7d6888, aContentViewer = 0x9aa490, aCommand = 0xfbfb5801 "", aExtraInfo = (nil))
   nsWebShell::SetupNewViewer(this = 0x7d6888, aViewer = 0x9aa490)
   nsDocShell::SetupNewViewer(this = 0x7d6888, aNewViewer = 0x9aa490)
   DocumentViewerImpl::Init(this = 0x9aa490, aParentWidget = 0x7c0030, aDeviceContext = 0x78a530, aBounds = STRUCT)
   GlobalWindowImpl::SetNewDocument(this = 0x810a38, aDocument = 0xc3580c, removeEventListeners = 1)
   nsJSContext::GC(this = 0x810c30)
   JS_GC(cx = 0x7860f8)
   js_ForceGC(cx = 0x7860f8)
   js_GC(cx = 0x7860f8, gcflags = 0)
   JS_DHashTableEnumerate(table = 0x1429b8, etor = 0xff200578 = &`libmozjs.so`jsgc.c`gc_root_marker(struct JSDHashTable *table, struct JSDHashEntryHdr *hdr, uint32 num, void *arg), arg = 0x7860f8)
   gc_root_marker(table = 0x1429b8, hdr = 0x6d4c20, num = 133U, arg = 0x7860f8)
   js_MarkGCThing(cx = 0x7860f8, thing = 0xb96300, arg = (nil))
Summary: Running Siebel application crashed Netscape 6.2.1 browser → Running Siebel application crashed Netscape 6.2.1 browser [@js_MarkGCThing]
One very interesting point about this crash is that it does not occur under
Linux when the 1.3.1 Java plugin is installed. I observed a crash when the 1.4
Java plugin was installed, but it was not this crash. I've heard that there is
a known problem (crash) with the 1.4 Java plugin that explains what I was seeing.

However, it seems that this could very likely be a problem in the plugin code.
Joe: is there any way you could test the 1.3.1 Java plugin under Solaris
instead? I know you said that it is not built using the same compiler, but is
there any way to get around that?
Regarding last comment.
>however, it seems that this could very likely be a problem in the plugin code.
Can you provide some details regarding this?
Changing component to LiveConnect, assigning to self.
Assignee: dbradley → beard
Component: XPConnect → Live Connect
Attached patch: patch v1
Good news, I was able to diagnose the problem by observing the tell-tale
assertion:

JS_ASSERT(!rt->gcRunning);

in |js_AllocGCThing()|. The culprit turned out to be in LiveConnect, in which
|JavaObject_finalize()| was calling |JNIEnv::DeleteGlobalRef()|. In Sun's
current plugin, this would have the side-effect of servicing a running Java
applet's requests to change the status message at the bottom of the browser
window. This would in turn wind its way into the DOM, and finally back into
|JS_NewObject()|. Since |JavaObject_finalize()| is only called during a
garbage collection cycle, this caused the assertion to fire and return
NULL from |JS_NewObject()|. This led to a dangling root installed by
|XPCWrappedNative::Init()|.
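To make the failure mode concrete, here is a minimal toy model in C, assuming a runtime that simply refuses to allocate while a collection is in progress (in the spirit of the `JS_ASSERT(!rt->gcRunning)` check above, where a release build ends up returning NULL). This is not SpiderMonkey code: `ToyRuntime`, `toy_alloc()`, `toy_finalize_with_callout()`, `toy_collect()` and `toy_demo()` are hypothetical names. The finalizer stands in for `JavaObject_finalize()` calling out to Java, with the callout eventually trying to create a new object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy runtime; the names are illustrative only. */
typedef struct ToyRuntime {
    bool gc_running;      /* true while a collection is in progress */
    int refused_allocs;   /* allocations attempted during a collection */
} ToyRuntime;

/* Allocation is not allowed in the middle of a collection; this toy
   allocator refuses and returns NULL instead of asserting. */
static void *toy_alloc(ToyRuntime *rt, size_t size) {
    assert(rt != NULL);
    if (rt->gc_running) {
        rt->refused_allocs++;
        return NULL;
    }
    return malloc(size);
}

/* A finalizer that calls out to foreign code (standing in for a call
   into Java), which in turn tries to create a new object. */
static void *toy_finalize_with_callout(ToyRuntime *rt) {
    return toy_alloc(rt, 16);
}

/* One "collection" that finalizes one object and returns whatever the
   re-entrant allocation produced. */
static void *toy_collect(ToyRuntime *rt) {
    rt->gc_running = true;
    void *obj = toy_finalize_with_callout(rt);
    rt->gc_running = false;
    return obj;
}

/* Returns the number of refused allocations, or -1 if the toy
   behaves unexpectedly. */
static int toy_demo(void) {
    ToyRuntime rt = { false, 0 };
    void *during_gc = toy_collect(&rt);   /* refused: NULL */
    void *after_gc = toy_alloc(&rt, 16);  /* allowed again */
    int ok = (during_gc == NULL) && (after_gc != NULL);
    free(after_gc);
    return ok ? rt.refused_allocs : -1;
}
```

A caller that went on to register a GC root for the NULL result, or left a root in place after the failure, would be left with roughly the kind of dangling root described above.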

Bottom line: don't call into Java during finalization. I decided to add
an extra word to the |JavaObjectWrapper| struct to cache the wrapped Java
object's hash code, avoiding one call into Java from |JavaObject_finalize()|.
Once the cached hash code has been used, the same word then serves as a
link in a singly linked list of Java wrappers waiting to have their JNI
global references deleted.
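A minimal sketch of the dual-use word, assuming a union-based layout (the actual |JavaObjectWrapper| definition in the patch may differ). `JavaWrapperSketch`, `finalize_sketch()` and `demo_deferred_count()` are hypothetical names, the `java_obj` field is a plain pointer standing in for the JNI global reference, and the hashtable removal is left as a comment. The finalizer reads the cached hash code first and only then overwrites the same word with the list link, so it never needs to call into Java.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative layout, not the real LiveConnect struct. */
typedef struct JavaWrapperSketch {
    void *java_obj;                          /* stand-in for the JNI global ref */
    union {
        int hash_code;                       /* valid while the wrapper is live */
        struct JavaWrapperSketch *next;      /* valid once queued for release */
    } u;
} JavaWrapperSketch;

static JavaWrapperSketch *deferred_wrappers = NULL;

/* Finalizer side: no calls into Java. */
static int finalize_sketch(JavaWrapperSketch *w) {
    assert(w != NULL);
    int hash = w->u.hash_code;       /* read before the word is reused */
    /* ... remove the reflection from the hashtable using `hash` ... */
    w->u.next = deferred_wrappers;   /* push onto the deferred list */
    deferred_wrappers = w;
    return hash;
}

/* Finalizes two wrappers and returns how many ended up queued,
   or -1 if the cached hash codes were not returned intact. */
static int demo_deferred_count(void) {
    JavaWrapperSketch a = { NULL, { 42 } };
    JavaWrapperSketch b = { NULL, { 7 } };
    int hashes = finalize_sketch(&a) + finalize_sketch(&b);
    int queued = 0;
    for (JavaWrapperSketch *w = deferred_wrappers; w != NULL; w = w->u.next)
        queued++;
    deferred_wrappers = NULL;        /* a and b are locals; drop the pointers */
    return hashes == 49 ? queued : -1;
}
```

Reusing the word keeps the wrapper the same size as a version with only the cached hash code, at the cost that the hash code is gone once the wrapper is queued.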

With jband's assistance, I was able to write a GC callback function
that processes the deferred list of Java wrappers when the GC reaches the
JSGC_END phase.
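A self-contained sketch of such a callback, mirroring the loop quoted in the review below but with stand-ins so it compiles on its own: `SketchGCStatus` imitates the engine's GC status values (only the end-of-GC case matters here), `delete_global_ref_stub()` replaces `(*jEnv)->DeleteGlobalRef(jEnv, ...)`, and obtaining a `JNIEnv` is omitted. In the real code the callback would be registered through the engine's GC-callback hook (for example `JS_SetGCCallback()`); all names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the engine's GC status values. */
typedef enum {
    SKETCH_GC_BEGIN,
    SKETCH_GC_MARK_END,
    SKETCH_GC_FINALIZE_END,
    SKETCH_GC_END
} SketchGCStatus;

typedef struct Wrapper {
    void *java_obj;          /* stand-in for the JNI global reference */
    struct Wrapper *next;    /* link in the deferred list */
} Wrapper;

static Wrapper *deferred_wrappers = NULL;
static int released_refs = 0;

/* Hypothetical stand-in for (*jEnv)->DeleteGlobalRef(jEnv, obj). */
static void delete_global_ref_stub(void *java_obj) {
    (void)java_obj;
    released_refs++;
}

/* Only at the end of the collection, when calling out is safe again,
   release the queued references. Returns nonzero to let GC proceed. */
static int gc_callback_sketch(SketchGCStatus status) {
    if (status == SKETCH_GC_END) {
        Wrapper *w = deferred_wrappers;
        while (w != NULL) {
            deferred_wrappers = w->next;   /* unlink before calling out */
            delete_global_ref_stub(w->java_obj);
            /* real code would also free the wrapper here */
            w = deferred_wrappers;
        }
    }
    return 1;
}

/* Queues two wrappers, fires the callback at two phases, and returns
   how many references were released (-1 if released too early). */
static int demo_release(void) {
    Wrapper a = { NULL, NULL };
    Wrapper b = { NULL, &a };
    deferred_wrappers = &b;
    released_refs = 0;
    gc_callback_sketch(SKETCH_GC_BEGIN);   /* too early: nothing happens */
    int released_early = released_refs;
    gc_callback_sketch(SKETCH_GC_END);
    assert(deferred_wrappers == NULL);
    return released_early == 0 ? released_refs : -1;
}
```

Because each wrapper is unlinked from the list head before the callout, any wrapper queued while a reference is being released is still picked up by the same loop rather than lost.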
Keywords: patch, review
Putting this on the 0.9.9 radar.

/be
Blocks: 122050
Keywords: mozilla0.9.9
Comment on attachment 73159 [details] [diff] [review]
patch v1

Awesome work.  

Nit: no purty JSBool and JS_TRUE/JS_FALSE for installed_GC_callback?

Nit: indentation glitch:

+    if (jsj_env) {
+	 JS_ASSERT(jsj_env->recursion_depth > 0);
+	 if (--jsj_env->recursion_depth == 0)
+	jsj_env->cx = NULL;
+    }

sr=brendan@mozilla.org

/be
Attachment #73159 - Flags: superreview+
Comment on attachment 73159 [details] [diff] [review]
patch v1

Nit: maybe put an empty line before some of the crucial comments, too. E.g.

>+            while (java_wrapper) {
>+                deferred_wrappers = java_wrapper->u.next;
>+                /* 1. need a JNIEnv to execute DeleteGlobalRef of each wrapper. */
>+                (*jEnv)->DeleteGlobalRef(jEnv, java_wrapper->java_obj);
>     if (java_obj) {
>-        remove_java_obj_reflection_from_hashtable(java_obj, jEnv);
>-        (*jEnv)->DeleteGlobalRef(jEnv, java_obj);
>+        remove_java_obj_reflection_from_hashtable(java_obj, java_wrapper->u.hash_code);
>+        /* defer releasing global refs until it is safe to do so. */
>+        java_wrapper->u.next = deferred_wrappers;
>+        deferred_wrappers = java_wrapper;

/be
Comment on attachment 73159 [details] [diff] [review]
patch v1

r=jband
Attachment #73159 - Flags: review+
Keywords: review → approval
Comment on attachment 73159 [details] [diff] [review]
patch v1

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #73159 - Flags: approval+
Patrick, can you also land this on the 0.9.9 branch? Thanks.
Nits noted, corrected, and patch checked in on trunk.
Patches now checked in on 0.9.9 branch. They also need to be checked 
in on the appropriate branch Sun is using.
No longer blocks: 122050
Patrick: are we going to wait until the checkin is made on the 
appropriate Sun branch before marking this one "Fixed"?
Yes, this makes sense to me.
Status: NEW → ASSIGNED
I still see a crash with a current CVS build on Linux, using the Cult3D plugin:
-go to http://cult3d.com/
-click the pic of an orange can
If you don't crash right away, choose another "label" for the can.
Backtrace from a non-debug:

#0  0x4008b278 in js_MarkGCThing () from libmozjs.so
#1  0x4008b332 in gc_root_marker () from libmozjs.so
#2  0x4007b005 in JS_DHashTableEnumerate () from libmozjs.so
#3  0x4008b60f in js_GC () from libmozjs.so
#4  0x4008b3a5 in js_ForceGC ()   from libmozjs.so
#5  0x4006bb27 in JS_GC ()   from libmozjs.so
#6  0x412f9f6b in nsJSContext::Notify ()   from libjsdom.so
#7  0x401723ef in nsTimerImpl::Process ()   from libxpcom.so
#8  0x4017246b in handleMyEvent ()   from libxpcom.so
#9  0x4016e3bb in PL_HandleEvent ()   from libxpcom.so
#10 0x4016e2b5 in PL_ProcessPendingEvents ()   from /libxpcom.so
#11 0x4016f2ff in nsEventQueueImpl::ProcessPendingEvents ()   from libxpcom.so
#12 0x4117a606 in event_processor_callback ()   from libwidget_gtk.so
#13 0x4117a368 in our_gdk_io_invoke ()   from libwidget_gtk.so
#14 0x40395a7a in g_io_unix_dispatch (source_data=0x82db358, 
    current_time=0xbfffe800, user_data=0x83170d0) at giounix.c:137
#15 0x40397055 in g_main_dispatch (dispatch_time=0xbfffe800) at gmain.c:656
#16 0x40397659 in g_main_iterate (block=1, dispatch=1) at gmain.c:877
#17 0x403977e8 in g_main_run (loop=0x83170e0) at gmain.c:935
#18 0x402ab65b in gtk_main () at gtkmain.c:524
#19 0x4117aae5 in nsAppShell::Run ()   from libwidget_gtk.so
#20 0x4115a9ca in nsAppShellService::Run ()   from libnsappshell.so
#21 0x0805201b in main1 ()
#22 0x08052925 in main ()
#23 0x404f5627 in __libc_start_main (main=0x80527ec <main>, argc=1, 
    ubp_av=0xbfffec04, init=0x804c9fc <_init>, fini=0x8053f34 <_fini>, 
    rtld_fini=0x4000dcc4 <_dl_fini>, stack_end=0xbfffebfc)
    at ../sysdeps/generic/libc-start.c:129
This fix has been checked into the Sun 6.2.1 branch.
Whiteboard: [eapp],AOLTW → [eapp],AOLTW, sun_621
R.K.Aa (dark@c2i.net): unless any of the developers here disagree, 
you should file a new bug on the cult3d site, because I believe yours
is a separate problem. By the way, I tried unsuccessfully to download the
cult3d plugin on both Linux and WinNT. On each platform, the site
said that Netscape 6 is not supported. For example: 

           Sorry!

           There is currently no support for Netscape 6!


However, you know more about this plug-in than I do; if you want,
please file a separate bug against the Plug-ins component. Thanks.
filed bug 130183
Looks like it's time to close this one.
beard, want to do the deed?
I verified this fix in Siebel today with the new release build. 
We can close this bug now.
Closing it down.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Per Joe's Comment #41, marking Verified -
Status: RESOLVED → VERIFIED
Whiteboard: [eapp],AOLTW, sun_621 → [eapp],AOLTW+, sun_621
Product: Core → Core Graveyard
