123920 - Running Siebel application crashed Netscape 6.2.1 browser [@js_MarkGCThing]

Reporter

Description

•

23 years ago

When running the Siebel application, in which an applet repeatedly calls
showDocument() to update contents on the page, the browser crashes consistently
in js_MarkGCThing of jsgc.c (see call stack below). This bug does not happen on
Windows with Netscape 6.2. The problem does not seem to exist in Mozilla 0.9.7 on
Solaris, so it may already be fixed. Does anybody know what the fix might be? This
problem prevents Siebel from releasing their application with Netscape 6.2.1 on Solaris.
We really need some timely help here.


After the crash,

on console:

"t@1 (l@7) signal SEGV (no mapping at the fault address) in js_MarkGCThing at line 794 in file "jsgc.c""


call stack:

...
   nsDocShell::SetupNewViewer(this = 0xa0adb8, aNewViewer = 0xa52c20)
   DocumentViewerImpl::Init(this = 0xa52c20, aParentWidget = 0xa1cf50, aDeviceContext = 0x54f318, aBounds = STRUCT)
   GlobalWindowImpl::SetNewDocument(this = 0x7335e0, aDocument = 0xe2fee4, removeEventListeners = 1)
   nsJSContext::GC(this = 0x52a038)
   JS_GC(cx = 0x878e18)
   js_ForceGC(cx = 0x878e18)
   js_GC(cx = 0x878e18, gcflags = 0)
   JS_DHashTableEnumerate(table = 0x142aa0, etor = 0xff200590 = &`libmozjs.so`jsgc.c`gc_root_marker(struct JSDHashTable *table, struct JSDHashEntryHdr *hdr, uint32 num, void *arg), arg = 0x878e18)
   gc_root_marker(table = 0x142aa0, hdr = 0x71beb0, num = 463U, arg = 0x878e18)
   js_MarkGCThing(cx = 0x878e18, thing = 0xa93bd8, arg = (nil))

Phil Schwartau

Comment 1

•

23 years ago

cc'ing Brendan on this one -

Should Joe look at the gc_root_marker frame to find the name of the object
that was rooted? I remember this used to be found via (char*)he->value
in gc_root_marker, but I don't know what the format is currently.

Assignee: rogerl → khanson

Status: UNCONFIRMED → NEW

Ever confirmed: true

Brendan Eich [:brendan]

Comment 2

•

23 years ago

xiaobin pinged me on IRC, and part of our exchange went like this:

<brendan> what is the name of the root being marked?
<xlu> The root's name is "XPCWrappedNative::mFlatJSObject".
<xlu> Assertion failure: root_points_to_gcArenaPool, at jsgc.c:918

I asked that dbradley be cc'd, don't see him yet, so I'm adding him.  I doubt
this is a khanson core JS engine bug.  I would have guessed liveconnect, but the
crash seems new with 6.2.1 and xiaobin's debug build fingered XPConnect, sort
of.  dbradley, jband, any ideas?

/be

David Bradley

Comment 3

•

23 years ago

This could be similar to 120629.

If I read this correctly, it appears to be a case where something was rooted,
deleted, but not unrooted? I'll see what I can find with the logic dealing with
mFlatJSObject.
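
For readers unfamiliar with the suspected failure mode (a root that stays registered after the memory holding it is freed), here is a minimal sketch in C. The root table, `Thing`, `Wrapper`, and every function name below are hypothetical stand-ins chosen for illustration; this is not the XPConnect or jsgc.c code, only a toy model of "root, delete, forget to unroot".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature root table. Each entry stores the ADDRESS of a
 * pointer field, loosely modeled on how a named GC root is registered. */
#define MAX_ROOTS 8

typedef struct Thing { int marked; } Thing;

static Thing **roots[MAX_ROOTS];
static size_t root_count = 0;

/* Register the address of a pointer field as a root. Returns 1 on success. */
int add_root(Thing **slot) {
    if (root_count == MAX_ROOTS)
        return 0;
    roots[root_count++] = slot;
    return 1;
}

/* Unregister a root. Returns 1 if the slot was found and removed. */
int remove_root(Thing **slot) {
    for (size_t i = 0; i < root_count; i++) {
        if (roots[i] == slot) {
            roots[i] = roots[--root_count];
            return 1;
        }
    }
    return 0;
}

/* Toy mark phase: dereference every registered slot and mark its target.
 * If a slot lives in freed memory, this dereference is where a real
 * collector would crash. Returns the number of things marked. */
size_t mark_roots(void) {
    size_t n = 0;
    for (size_t i = 0; i < root_count; i++) {
        assert(roots[i] != NULL);
        Thing *t = *roots[i];
        if (t) {
            t->marked = 1;
            n++;
        }
    }
    return n;
}

/* Hypothetical wrapper owning a rooted field. */
typedef struct Wrapper { Thing *flat_obj; } Wrapper;

/* Safe teardown: unroot BEFORE freeing the memory that holds the slot.
 * Skipping remove_root() here would leave a dangling root behind. */
void wrapper_destroy(Wrapper *w) {
    remove_root(&w->flat_obj);
    free(w);
}
```

If `wrapper_destroy()` freed the wrapper without calling `remove_root()`, the next `mark_roots()` pass would read through a pointer into freed memory, which is the kind of fault the stack above would be consistent with.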

Joe Chou

Reporter

Comment 4

•

23 years ago

I did some more testing at Siebel yesterday and found that adding a
sleep(1) after showDocument in the applet that repeatedly updates a frame in a
display page postpones the crash a good while longer. On Unix, the JVM is a
separate process, and there seems to be some kind of ordering problem between the
JVM and the browser GC. In this case, the JVM seems to be running ahead of the browser.

On the other hand, running Mozilla 0.9.7 with the same JVM (JRE) appears to be
much more stable. Does anybody know of any bug fixes that might have contributed
to the improvement?

David Bradley

Comment 5

•

23 years ago

Curious, would this applet be spawning a thread and making the calls from a
thread other than the main thread?

David Bradley

Comment 6

•

23 years ago

Taking this one for now. This looks xpconnect related and similar to bug 120629.

Assignee: khanson → dbradley

Phil Schwartau

Comment 7

•

23 years ago

Setting component to XPConnect -

Component: JavaScript Engine → XPConnect

Patrick C. Beard

Assignee

Comment 8

•

23 years ago

I spent some time looking at Sun's UNIX plugin, and it appears to be 
calling nsIPluginManager::GetURL() from a thread they created to 
communicate with the out of process JVM. Given the lack of thread safety 
in these interfaces, I'd wager that this bug is invalid. I coded up a partial 
solution for them, still waiting to see if this clears up the problem.

Patrick C. Beard

Assignee

Comment 9

•

23 years ago

-> OJI component.

Assignee: dbradley → joe.chou

Component: XPConnect → OJI

QA Contact: pschwartau → pmac

Xiaobin Lu

Comment 10

•

23 years ago

Actually after further investigation, JPI code has no problem here. I will mail
Patrick the code which did the right thing.

Patrick C. Beard

Assignee

Comment 11

•

23 years ago

Having had a chance to peruse the code in more detail, it does appear 
that nsIPluginManager::GetURL() is being called from the correct thread. 
Oh what a twisted web we weave... We're not out of the woods yet.

Xiaobin Lu

Comment 12

•

23 years ago

Joe:
   Please paste the crash stack here and I believe that will ease the debugging.
Change the component to "XPConnect".

Component: OJI → XPConnect

Joe Chou

Reporter

Comment 13

•

23 years ago

Per Siebel's latest update on the problem, the crash seems to be caused by
showing a message in a frame that is being refreshed. In Siebel's application, a
frame is refreshed at an interval, while a message can also be displayed
in the same frame periodically. When the message is added to the frame,
the frame can be in the middle of refreshing, which may be causing GC problems.
The same application works OK on Windows (with NS6.2). Any ideas?

Bob Clary [:bc] (inactive)

Comment 14

•

23 years ago

considered a blocker for ns6 support by Siebel

Whiteboard: [eapp]

Heikki Toivonen (remove -bugzilla when emailing directly)

Comment 15

•

23 years ago

Major corporations depend on eapp bugs, and need them to be fixed before they
can recommend Mozilla-based products to their customers. Adding nsbeta1+ keyword
and making sure the bugs get re-evaluated if they are targeted beyond 1.0.

Keywords: nsbeta1+

Nidheesh Dubey

Comment 16

•

23 years ago

Since the suspected module is xpconnect, experts of xpconnect/js garbage
collection should take a look at this bug and provide their opinion to Sun. This
is a high pri high impact bug for Sun and warrants attention of experts to close
this on Sun branch ASAP. Sun will not be able to fix it in our branch if this is
not fixed/tested by 25th.

Assignee: joe.chou → dbradley

QA Contact: pmac → pschwartau

Heikki Toivonen (remove -bugzilla when emailing directly)

Updated

•

23 years ago

Blocks: 125136

Bob Clary [:bc] (inactive)

Comment 17

•

23 years ago

eapp was incorrectly used to change this to nsbeta1+. Resetting to nsbeta1 to
nominate. This is an important issue and deserves to be nsbeta1+ by the ADT. 

This issue blocks support of Siebel's application by Sun's branded NS6.2.

Keywords: nsbeta1+ → nsbeta1

David Bradley

Comment 18

•

23 years ago

This seems like it may be related to bug 126279. In looking at a lot of js_Mark*
crashes there seem to be two themes: one where deleted or invalid JS objects are
marked, and one where the stack gets trashed and the program tries to return to
an invalid code address. Unfortunately I've been unable to reproduce this in a
controlled environment.

If this applet is causing a crash consistently it would be a great help to get
a small example that I could run and try to reproduce.

Joe Chou

Reporter

Comment 19

•

23 years ago

David, 
Unfortunately, the setup for the problem is quite complicated (with a phone
switch configured for the Siebel call center app, etc.). So far, we have not
been able to create a test case that allows people outside of Siebel to
reproduce the problem.

On the other hand, the problem is consistently reproducible at Siebel, and the
crash always happens in js_Mark*. We have a debug environment set up at Siebel
(debug build of Netscape, source and build environment).
Is it possible that you and I can go to Siebel together to take a look? If not,
is it possible for you to join me for the debug session remotely?

Bob Clary [:bc] (inactive)

Updated

•

23 years ago

Keywords: topembed

Michael Buckland

Comment 20

•

23 years ago

Adding AOLTW

Whiteboard: [eapp] → [eapp],AOLTW

Joe Chou

Reporter

Comment 21

•

23 years ago

A complete call stack: 

   main(argc = 1, argv = 0xffbef1d4)
   main1(argc = 1, argv = 0xffbef1d4, nativeApp = (nil))
   nsAppShellService::Run(this = 0x16aea8)
   nsAppShell::Run(this = 0x10f060)
   gtk_main(0xf6480, 0x200ae0, 0x0, 0x0, 0x0, 0x0)
   g_main_run(0x246db8, 0x246db8, 0x1, 0x0, 0x0, 0x0)
   g_main_iterate(0x1, 0x1, 0xff1507c8, 0x0, 0xff3e2668, 0xfcf95140)
   g_main_dispatch(0xffbeeb00, 0x16f4b0, 0x1, 0x246da8, 0x0, 0x0)
   g_io_unix_dispatch(0x243068, 0xffbeeb00, 0x246da8, 0x0, 0x0, 0xffbeea68)
   our_gdk_io_invoke(source = 0x249e30, condition = G_IO_IN, data = 0x246da8)
   event_processor_callback(data = 0xf6480, source = 5, condition = GDK_INPUT_READ)
   nsEventQueueImpl::ProcessPendingEvents(this = 0xf6480)
   PL_ProcessPendingEvents(self = 0xf64b0)
   PL_HandleEvent(self = 0x115ec9c)
   nsARequestObserverEvent::HandlePLEvent(plev = 0x115ec9c)
   nsOnStartRequestEvent::HandleEvent(this = 0x115ec98)
   nsHttpChannel::OnStartRequest(this = 0xba7810, request = 0x131ef38, ctxt = (nil))
   nsHttpChannel::ProcessResponse(this = 0xba7810)
   nsHttpChannel::ProcessNormal(this = 0xba7810)
   nsDocumentOpenInfo::OnStartRequest(this = 0x135dc18, request = 0xba7810, aCtxt = (nil))
   nsDocumentOpenInfo::DispatchContent(this = 0x135dc18, request = 0xba7810, aCtxt = (nil))
   nsDSURIContentListener::DoContent(this = 0x733c08, aContentType = 0xffbee348 "text/html", aCommand = 7, request = 0xba7810, aContentHandler = 0xffbee418, aAbortProcess = 0xffbee388)
   nsDocShell::CreateContentViewer(this = 0x7d6888, aContentType = 0xffbee348 "text/html", request = 0xba7810, aContentHandler = 0xffbee418)
   nsWebShell::Embed(this = 0x7d6888, aContentViewer = 0x9aa490, aCommand = 0xfbfb5801 "", aExtraInfo = (nil))
   nsDocShell::Embed(this = 0x7d6888, aContentViewer = 0x9aa490, aCommand = 0xfbfb5801 "", aExtraInfo = (nil))
   nsWebShell::SetupNewViewer(this = 0x7d6888, aViewer = 0x9aa490)
   nsDocShell::SetupNewViewer(this = 0x7d6888, aNewViewer = 0x9aa490)
   DocumentViewerImpl::Init(this = 0x9aa490, aParentWidget = 0x7c0030, aDeviceContext = 0x78a530, aBounds = STRUCT)
   GlobalWindowImpl::SetNewDocument(this = 0x810a38, aDocument = 0xc3580c, removeEventListeners = 1)
   nsJSContext::GC(this = 0x810c30)
   JS_GC(cx = 0x7860f8)
   js_ForceGC(cx = 0x7860f8)
   js_GC(cx = 0x7860f8, gcflags = 0)
   JS_DHashTableEnumerate(table = 0x1429b8, etor = 0xff200578 = &`libmozjs.so`jsgc.c`gc_root_marker(struct JSDHashTable *table, struct JSDHashEntryHdr *hdr, uint32 num, void *arg), arg = 0x7860f8)
   gc_root_marker(table = 0x1429b8, hdr = 0x6d4c20, num = 133U, arg = 0x7860f8)
   js_MarkGCThing(cx = 0x7860f8, thing = 0xb96300, arg = (nil))

timeless

Updated

•

23 years ago

Summary: Running Siebel application crashed Netscape 6.2.1 browser → Running Siebel application crashed Netscape 6.2.1 browser [@js_MarkGCThing]

Darin Fisher

Comment 22

•

22 years ago

one very interesting point about this crash is that it does not occur under
linux when the 1.3.1 java plugin is installed.  i observed a crash when the 1.4
java plugin was installed, but it was not this crash.  i've heard that there is
a known problem (crash) with the 1.4 java plugin that explains what i was seeing.  

however, it seems that this could very likely be a problem in the plugin code.
joe: is there any way you could test the 1.3.1 java plugin under solaris
instead?   i know you said that it is not built using the same compiler, but is
there any way to get around that?

Xiaobin Lu

Comment 23

•

22 years ago

Regarding last comment.
>however, it seems that this could very likely be a problem in the plugin code.
Can you provide some details regarding this?

Patrick C. Beard

Assignee

Comment 24

•

22 years ago

Changing component to LiveConnect, assigning to self.

Assignee: dbradley → beard

Component: XPConnect → Live Connect

Patrick C. Beard

Assignee

Comment 25

•

22 years ago

Attached patch: patch v1

Good news, I was able to diagnose the problem by observing the tell-tale
assertion:

JS_ASSERT(!rt->gcRunning);

in |js_AllocGCThing()|. The culprit turned out to be in LiveConnect, in which
|JavaObject_finalize()| was calling |JNIEnv::DeleteGlobalRef()|. In Sun's
current plugin, this would have the side-effect of servicing a running Java
applet's requests to change the status message at the bottom of the browser
window. This would in turn wind its way into the DOM, and finally back into
|JS_NewObject()|. Since |JavaObject_finalize()| is only called during a
garbage collection cycle, this caused the assertion to fire and return
NULL from |JS_NewObject()|. This led to a dangling root installed by
|XPCWrappedNative::Init()|.
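
The reentrancy described above can be modeled in a few lines of C. This is only a toy sketch: `Runtime`, `alloc_gc_thing()`, `run_finalizer_during_gc()`, and `reentrant_finalizer()` are hypothetical names, not the real SpiderMonkey functions, and the real allocator asserts in debug builds rather than only returning NULL.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical runtime with a single "collection in progress" flag. */
typedef struct Runtime { int gc_running; } Runtime;

/* Toy allocator that refuses to hand out memory while a collection is
 * in progress, mirroring the idea behind the !gcRunning check. */
void *alloc_gc_thing(Runtime *rt, size_t nbytes) {
    if (rt->gc_running)
        return NULL;
    return malloc(nbytes);
}

/* Run a finalizer hook with the flag set, as a collector would, and
 * return whatever the hook returned. */
void *run_finalizer_during_gc(Runtime *rt, void *(*finalizer)(Runtime *)) {
    void *result;

    rt->gc_running = 1;
    result = finalizer(rt);
    rt->gc_running = 0;
    return result;
}

/* A badly behaved finalizer that winds its way back into the allocator,
 * standing in for the finalize -> Java -> DOM -> new-object path. */
void *reentrant_finalizer(Runtime *rt) {
    return alloc_gc_thing(rt, 16);
}
```

In this model the allocation made from inside the finalizer always comes back NULL, so any caller further up that installs a root without checking the result ends up rooting garbage.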

Bottom line, don't call into Java during finalization. I decided to add
an extra word to the |JavaObjectWrapper| struct to cache the wrapped Java
object's hash code, to avoid one call into Java from |JavaObject_finalize()|.
After this cached hash code is used, the word then is used to serve as a
link in a singly linked list of Java wrappers waiting to have their JNI
global references deleted.

With jband's assistance, I was able to write a GC callback function
that processes the deferred list of Java wrappers when the GC reaches the
JSGC_END phase.
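
For anyone who wants the shape of the approach without reading the patch, here is a simplified sketch in C. It is not the patch itself: `GCPhase`, `release_global_ref()`, `wrapper_finalize()`, `gc_callback()`, and `released_count` are hypothetical stand-ins, the JNI global reference is modeled as a plain int, and freeing of the wrapper and the hashtable removal are omitted. Only the `u.hash_code` / `u.next` union and the `deferred_wrappers` list follow the names used in the patch.

```c
#include <stddef.h>

/* Hypothetical GC phases; only the final one matters for this sketch. */
typedef enum { GC_BEGIN, GC_MARK_END, GC_FINALIZE_END, GC_END } GCPhase;

typedef struct JavaWrapper {
    int java_obj;                  /* stand-in for a JNI global reference */
    union {
        int hash_code;             /* cached while the wrapper is live */
        struct JavaWrapper *next;  /* same word reused as a list link */
    } u;
} JavaWrapper;

/* Singly linked list of wrappers waiting for their refs to be released. */
static JavaWrapper *deferred_wrappers = NULL;

/* Counts releases so the behavior can be observed (illustration only). */
static int released_count = 0;

/* Stand-in for the call into Java that is unsafe during finalization. */
static void release_global_ref(int java_obj) {
    (void)java_obj;
    released_count++;
}

/* Finalizer: runs during collection, so it must not call into Java.
 * The cached hash code would be used here to drop the hashtable entry
 * (omitted); after that the word is free to serve as the list link. */
void wrapper_finalize(JavaWrapper *w) {
    w->u.next = deferred_wrappers;
    deferred_wrappers = w;
}

/* GC callback: drains the deferred list only once the collection has
 * fully ended. Returns the number of references released. */
int gc_callback(GCPhase phase) {
    int n = 0;

    if (phase != GC_END)
        return 0;
    while (deferred_wrappers) {
        JavaWrapper *w = deferred_wrappers;
        deferred_wrappers = w->u.next;
        release_global_ref(w->java_obj);
        n++;
    }
    return n;
}
```

Reusing the hash code word as the link keeps the per-wrapper cost at one extra word, at the price of requiring that the hash code is consumed before the wrapper is pushed onto the list.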

Patrick C. Beard

Assignee

Updated

•

22 years ago

Keywords: patch, review

Brendan Eich [:brendan]

Comment 26

•

22 years ago

Putting this on the 0.9.9 radar.

/be

Blocks: 122050

Keywords: mozilla0.9.9

Brendan Eich [:brendan]

Comment 27

•

22 years ago

Comment on attachment 73159
patch v1

Awesome work.  

Nit: no purty JSBool and JS_TRUE/JS_FALSE for installed_GC_callback?

Nit: indentation glitch:

+    if (jsj_env) {
+	 JS_ASSERT(jsj_env->recursion_depth > 0);
+	 if (--jsj_env->recursion_depth == 0)
+	jsj_env->cx = NULL;
+    }

sr=brendan@mozilla.org

/be

Attachment #73159 - Flags: superreview+

Brendan Eich [:brendan]

Comment 28

•

22 years ago

Comment on attachment 73159
patch v1

Nit: maybe put an empty line before some of the crucial comments, too.	E.g.

>+            while (java_wrapper) {
>+                deferred_wrappers = java_wrapper->u.next;
>+                /* 1. need a JNIEnv to execute DeleteGlobalRef of each wrapper. */
>+                (*jEnv)->DeleteGlobalRef(jEnv, java_wrapper->java_obj);
>     if (java_obj) {
>-        remove_java_obj_reflection_from_hashtable(java_obj, jEnv);
>-        (*jEnv)->DeleteGlobalRef(jEnv, java_obj);
>+        remove_java_obj_reflection_from_hashtable(java_obj, java_wrapper->u.hash_code);
>+        /* defer releasing global refs until it is safe to do so. */
>+        java_wrapper->u.next = deferred_wrappers;
>+        deferred_wrappers = java_wrapper;

/be

John Bandhauer

Comment 29

•

22 years ago

Comment on attachment 73159
patch v1

r=jband

Attachment #73159 - Flags: review+

Patrick C. Beard

Assignee

Updated

•

22 years ago

Keywords: review → approval

Asa Dotzler [:asa]

Comment 30

•

22 years ago

Comment on attachment 73159
patch v1

a=asa (on behalf of drivers) for checkin to the 1.0 trunk

Attachment #73159 - Flags: approval+

Asa Dotzler [:asa]

Comment 31

•

22 years ago

Patrick, can you also land this on the 0.9.9 branch? Thanks.

Patrick C. Beard

Assignee

Comment 32

•

22 years ago

Nits noted, corrected, and patch checked in on trunk.

Patrick C. Beard

Assignee

Comment 33

•

22 years ago

Patches now checked in on 0.9.9 branch. They also need to be checked 
in on the appropriate branch Sun is using.

Asa Dotzler [:asa]

Updated

•

22 years ago

No longer blocks: 122050

Phil Schwartau

Comment 34

•

22 years ago

Patrick: are we going to wait until the checkin is made on the 
appropriate Sun branch before marking this one "Fixed"?

Patrick C. Beard

Assignee

Comment 35

•

22 years ago

Yes, this makes sense to me.

Status: NEW → ASSIGNED

R.K.Aa.

Comment 36

•

22 years ago

i still see a crash with a current cvs build, linux, using the cult3d plugin
-go to http://cult3d.com/
-click the pic of an orange can
If you don't crash right away, choose another "label" for the can.
Backtrace from a non-debug:

#0  0x4008b278 in js_MarkGCThing () from libmozjs.so
#1  0x4008b332 in gc_root_marker () from libmozjs.so
#2  0x4007b005 in JS_DHashTableEnumerate () from libmozjs.so
#3  0x4008b60f in js_GC () from libmozjs.so
#4  0x4008b3a5 in js_ForceGC ()   from libmozjs.so
#5  0x4006bb27 in JS_GC ()   from libmozjs.so
#6  0x412f9f6b in nsJSContext::Notify ()   from libjsdom.so
#7  0x401723ef in nsTimerImpl::Process ()   from libxpcom.so
#8  0x4017246b in handleMyEvent ()   from libxpcom.so
#9  0x4016e3bb in PL_HandleEvent ()   from libxpcom.so
#10 0x4016e2b5 in PL_ProcessPendingEvents ()   from /libxpcom.so
#11 0x4016f2ff in nsEventQueueImpl::ProcessPendingEvents ()   from libxpcom.so
#12 0x4117a606 in event_processor_callback ()   from libwidget_gtk.so
#13 0x4117a368 in our_gdk_io_invoke ()   from libwidget_gtk.so
#14 0x40395a7a in g_io_unix_dispatch (source_data=0x82db358, 
    current_time=0xbfffe800, user_data=0x83170d0) at giounix.c:137
#15 0x40397055 in g_main_dispatch (dispatch_time=0xbfffe800) at gmain.c:656
#16 0x40397659 in g_main_iterate (block=1, dispatch=1) at gmain.c:877
#17 0x403977e8 in g_main_run (loop=0x83170e0) at gmain.c:935
#18 0x402ab65b in gtk_main () at gtkmain.c:524
#19 0x4117aae5 in nsAppShell::Run ()   from libwidget_gtk.so
#20 0x4115a9ca in nsAppShellService::Run ()   from libnsappshell.so
#21 0x0805201b in main1 ()
#22 0x08052925 in main ()
#23 0x404f5627 in __libc_start_main (main=0x80527ec <main>, argc=1, 
    ubp_av=0xbfffec04, init=0x804c9fc <_init>, fini=0x8053f34 <_fini>, 
    rtld_fini=0x4000dcc4 <_dl_fini>, stack_end=0xbfffebfc)
    at ../sysdeps/generic/libc-start.c:129

Joe Chou

Reporter

Comment 37

•

22 years ago

This fix has been checked into the Sun 6.2.1 branch.

Whiteboard: [eapp],AOLTW → [eapp],AOLTW, sun_621

Phil Schwartau

Comment 38

•

22 years ago

R.K.Aa (dark@c2i.net): unless any of the developers here disagree, 
you should file a new bug on the cult3d site, because I believe yours
is a separate problem. By the way, I tried unsuccessfully to download the
cult3d plugin on both Linux and WinNT. On each platform, the site
said that Netscape 6 is not supported. For example: 

           Sorry!

           There is currently no support for Netscape 6!


However, you know more about this plug-in than I; if you want,
please file a separate bug against the Plug-ins component; thanks -

R.K.Aa.

Comment 39

•

22 years ago

filed bug 130183

chris hofmann

Comment 40

•

22 years ago

looks like it's time to close this one.
beard, want to do the deed?

Joe Chou

Reporter

Comment 41

•

22 years ago

I verified this fix in Siebel today with the new release build. 
We can close this bug now.

Patrick C. Beard

Assignee

Comment 42

•

22 years ago

Closing it down.

Status: ASSIGNED → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

Phil Schwartau

Comment 43

•

22 years ago

Per Joe's Comment #41, marking Verified -

Status: RESOLVED → VERIFIED

Michael Buckland

Updated

•

22 years ago

Whiteboard: [eapp],AOLTW, sun_621 → [eapp],AOLTW+, sun_621

timeless

Updated

•

14 years ago

Product: Core → Core Graveyard