Closed Bug 43466 Opened 24 years ago Closed 24 years ago

root_points_to_gcArenaPool in mail account wizard

Categories

(SeaMonkey :: General, defect, P3)

x86
Linux
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: alecf, Assigned: danm.moz)

References

Details

(Whiteboard: [nsbeta2-][nsbeta3+])

Attachments

(2 files)

So I'm getting a root_points_to_gcArenaPool assertion using the account wizard.
it's easy to reproduce:
- Open messenger
- File->New->Account
- Hit Next, then hit Cancel to cancel the dialog
- File->New->Account again
- Hit Next/Back once or twice - it crashes pretty quickly..

I've discussed this with shaver before, and some wacky things are going on in
the JS engine:
Assertion failure: root_points_to_gcArenaPool, at jsgc.c:785

Program received signal SIGABRT, Aborted.
0x4045d4e1 in __kill ()
(gdb) sha mozjs
Reading symbols from /home1/alecf/seamonkey/mozilla/dist/bin/libmozjs.so...
done.
(gdb) #0  0x4045d4e1 in __kill ()
#1  0x402201eb in raise (sig=6) at signals.c:64
#2  0x4045e868 in abort () at ../sysdeps/generic/abort.c:88
#3  0x4019d246 in JS_Assert () at jsutil.c:174
#4  0x4016cd27 in gc_root_marker (he=0x8485680, i=81, arg=0x80c1cb0)
    at jsgc.c:785
#5  0x4016de3f in JS_HashTableEnumerateEntries (ht=0x80c3ea8, 
    f=0x4016cca4 <gc_root_marker>, arg=0x80c1cb0) at jshash.c:364
#6  0x4016d06c in js_GC (cx=0x8b5a318) at jsgc.c:952
#7  0x4016cdad in js_ForceGC (cx=0x8b5a318) at jsgc.c:810
#8  0x40151cb6 in JS_GC (cx=0x8b5a318) at jsapi.c:1154
#9  0x402be5e3 in nsJSContext::GC (this=0x8a098e0) at nsJSEnvironment.cpp:1066
(More stack frames follow...)

(gdb) frame 4
#4  0x4016cd27 in gc_root_marker (he=0x8485680, i=81, arg=0x80c1cb0)
    at jsgc.c:785
Current language:  auto; currently c
(gdb) print he
$1 = (JSHashEntry *) 0x0

tell me that's not wierd... the hash key is 0x0...
now check this out:
(gdb) frame 5
#5  0x4016de3f in JS_HashTableEnumerateEntries (ht=0x80c3ea8, 
    f=0x4016cca4 <gc_root_marker>, arg=0x80c1cb0) at jshash.c:364
(gdb) print he
$2 = (JSHashEntry *) 0x8485680

I have seen this consistently with gdb 4.17 and gdb 5.0... now the one trick is
that I build with optimized debug (-O2 -g -DDEBUG etc) so that might explain why
he is 0x0.
I thought brendan was the default JS engine owner, adding him to CC..

brendan, got a WIERD gc bug for ya.
he printing as 0 in frame 4 sounds like gdb lossage.

Usually this assertion botching means someone has freed memory containing a JS 
GC root, without calling JS_RemoveRoot{,RT}.  If the caller used JS_AddNamedRoot 
as all callers should, then (char*) he->value is the root name, a static string 
literal telling the source file or class::member name of the rooted pointer.

Another way to botch this assertion is to call JS_Add{Named,}Root(cx, mRooted) 
rather than JS_Add{Named,}Root(cx, &mRooted).  Roots are registered by reference 
so you can make your own member pointers be JS GC roots.

/be
I doubt this is rogerl's or alecf's bug.  But since alecf can reproduce it, 
I'm bouncing to him.  Alecf, what's (char*)he->value?

/be
Assignee: rogerl → alecf
I think there's gotta be something wierd going on with the debug vs. optimized 
JavaScript - because I'm crashing (no assertion) on windows as well, during an 
optimized build with debugging on (i.e. debugging symbols and -DDEBUG)

I'll try to get a stack trace, but if I remember right, it looked something 
like:
NTDLL
JS_GC
JS_ForceGC

ok, this was driving me crazy, so I decided to sick purify on it.
I'm getting a whole bunch of IPR errors in the GC:
[E] IPR: Invalid pointer read in gc_root_marker {4 occurrences}
        Reading 4 bytes from 0x0e95d394 (4 bytes at 0x0e95d394 illegal)
        Address 0x0e95d394 points into a committed VirtualAlloc'd block 
        Thread ID: 0x750
        Error location
            gc_root_marker [jsgc.c:769]
                gc_root_marker(JSHashEntry *he, intN i, void *arg)
                {
                    jsval *rp = (jsval *)he->key;
             =>     jsval v = *rp;
                
                    /* Ignore null object and scalar values. */
                    if (!JSVAL_IS_NULL(v) && JSVAL_IS_GCTHING(v)) {
            js_GC          [jsgc.c:952]
            js_ForceGC     [jsgc.c:810]
            JS_DestroyContext [jsapi.c:788]
            nsJSContext::`vector deleting destructor'(UINT) [jsdom.dll]
            nsJSContext::Release(void) [nsJSEnvironment.cpp:316]
            nsDocShell::Destroy(void) [nsDocShell.cpp:1253]
            nsXULWindow::Destroy(void) [nsXULWindow.cpp:316]
            nsWebShellWindow::Destroy(void) [nsWebShellWindow.cpp:1712]
            nsWebShellWindow::Close(void) [nsWebShellWindow.cpp:357]

I get 10 of these after running mozilla with -mail, doing File->New->Account, 
creating an account, and quitting.
What's "a committed VirtualAlloc'd block", anyway?  Freed memory, or someone 
else's memory?

In any case, it sounds like data structures containing GC-rooted pointers are 
being freed without JS_RemoveRoot being called.  Again, if you can find what 
(char*)he->value is when these IPRs fire, we may have the culprits red-handed.

/be
I'll try for this tomorrow, wish me luck :)
I finally got he->value for you:
(on windows)
	(char *)he->value	0x0037ed50 "window_object"

This is the same thing shaver and I saw a while back.
reassign to danm, keeping self on CC.
I'll attach a small patch that brendan and I came up with yesterday that SEEMED
like it should fix the problem but didn't.

Also, after some experimentation, I discovered that part of the problem here is
that in CleanUp, we're destroying mScriptObject, which is bad because in 
WindowOpen we need the mScriptObject in order to generate a return value from
the window.openDialog call, but it has already been destroyed by the time we
call nsJSUtils::nsConvertObjectToJSVal(nativeRet, cx, obj, rval)
http://lxr.mozilla.org/seamonkey/source/dom/src/base/nsJSWindow.cpp#2240
Assignee: alecf → danm
nominate for nsbeta2 - this can cause lots of random crashes or possibly leak
large root objects like the window object itself.
Keywords: nsbeta2
So closing a window must not destroy its script object, or the properties in 
that object such as closed (with value true) and other user-defined ones.  This 
should be straightforward to fix, but I'm sure it won't.  Let me know if I can 
help (and so help me, I'm gonna reform the bad indentation in nsGlobalWindow.cpp 
if you let me near it!).

/be
Putting on [nsbeta2+] radar for beta2 fix. 
Whiteboard: [nsbeta2+]
Changing component to Browser-General. Not sure of the exact component,
but it's not JavaScript Engine -
Component: Javascript Engine → Browser-General
QA Contact: pschwartau → asa
QA Contact: asa → doronr
  Been playing with this for a while now, and I can't find any problem. I've been 
stepping through the instructions to reproduce, and trying a small variety of 
other windows. I've never noticed the assertion.
  Curious fact #2: I haven't noticed any problems with premature deletion of the 
script object in nsGlobalWindowImpl::CleanUp(). For starters, the window's 
mContext member has always been released and cleared out before CleanUp is called 
(by something in nsWebShellWindow::Destroy or Close, depending on the code path) 
so CleanUp itself does nothing.
  Could be that I haven't understood the intricacies of script object destruction 
-- scratch that; I know I haven't -- but I don't see a problem here. Has this 
been fixed by some other unknown checkin? Holding the bug open for comments 
before WFMing.
I'm still getting this on linux.... are you saying you don't see the crash? 
  No crash, no worries. Brendan told me you had some unusual assert configuration 
(enable-crash-on-assert?) active. Regardless, I see neither the alert nor the 
crash. However, I'm told this bug actually describes the premature javascript 
object deletion that generally expresses itself as an inability to access 
properties of closed windows. Like window.closed, for instance. That's still a 
problem (has it ever worked?). I'll work on this from that angle.
Status: NEW → ASSIGNED
well, what I was seeing was this (a better description of my 06-28 12:56 
comments - this is my understanding of what's going on, but I could be wrong)
- window is opened via WindowOpen in the JS glue. Dialog is modal, so 
nativeThis->Open doesn't return until the window closes
- window is closed, so GlobalWindowImpl::Close() is called, which calls 
GlobalWindowImpl::CleanUp()
- CleanUp destroys mScriptObject
- back in nsJSWindow.cpp, nativeThis->Open returns. At this point nativeThis's 
mScriptObject is destroyed, and nativeRet is the "native" return value (an 
nsIDOMWindow)
- in order to return a JS-friendly return value from WindowOpen, WindowOpen uses 
nsJSUtils::nsConvertObjectToJSVal to convert nativeRet into rval, using obj 
(which is the window that just closed).. this causes obj->GetScriptObject, which 
recreates another script object.
- According to brendan, mScriptObject contains some kind of reference back to 
the original window... I'm not sure, but I think the window is getting released 
and freed, and leaving this stale mScriptObject as a named root in the JS GC.

I hope that helps.
oh, I'm not running with crash-on-assert turned on or anything - the JS engine 
uses PR_Assert which is fatal on all platforms... some people on IRC 
yesterday were seeing this too - make sure you're using a debug build, and make 
sure that you're opening the account wizard TWICE:

- File->New->Account
- Hit Cancel
- File->New->Account
- Hit Next/Back a few times, until you crash
Ah. Thanks -- the above descriptions help; I've a clearer picture of the problem. 
I'm still not quite sure I'm seeing the problem on my build, and I can dance back 
and forth well after boredom sets in on my second run at the New Account dialog. 
But I know the JS object is being killed off because you can't access it after 
the window is closed. Still looking...
Oooo. I figured there was already a bug on the problem I described above. Seems 
like this is probably a dupe of 38951. Just marking this one as dependent for 
now.
Depends on: 38951
Doesn't fit latest 'would ya pull it off the wire' criteria, marking nsbeta2-,
nominating for nsbeta3
Keywords: nsbeta3
Whiteboard: [nsbeta2+] → [nsbeta2-]
adding bienvenu to the CC because he was seeing this too.
see also bug 43470
nsbeta3+
Whiteboard: [nsbeta2-] → [nsbeta2-][nsbeta3+]
Target Milestone: --- → M18
I just created an attachment that has the call stack for the 
root_points_to_gcArenaPool assertion failure I'm getting with some js heavy 
Beatnik content (which you won't be able to repro because the content won't 
work without the xpcom plugin that is under development).  
Looks related.  Any opinions on whether I should open a separate bug?
sean -- that's the same backtrace: closing a window should not nuke that 
window's script object.

/be
thanks Brendan.
adding beatnik keyword as this bug causes beatnik emixes to crash mozilla.
An example emix can be found at http://www.beatnik.com/emix/ - but those
particular emixes have not been updated to run in mozilla and you need a new
plugin anyways.
Keywords: beatnik
really adding it...
Ah! I've never been able to reproduce this bug, but these steps from shannond do 
it for me:

"...crash on HP-UX and Linux after saving 3 or more files/links from a webpage 
(for example those found on: http://slip/projects/seamonkey/im/whitebox_tests/
smoketests.html)."

The crucial thing seems to be to repeat as necessary.
*** Bug 46430 has been marked as a duplicate of this bug. ***
Whiteboard: [nsbeta2-][nsbeta3+] → [nsbeta2-][nsbeta3+] tentative fix in hand.
Care to attach the tentative fix, for the curious and the eager-to-test?
  For you, Mike, anything. Except I don't want to type all the stuff about why I 
think it works, because that takes about five minutes. I'm set up to do the 
talking thing with Brendan some time today. Basically, modal windows throw away 
their script object and then regenerate a fresh, faulty one to hand back to the 
script. The new one is never properly hooked up, and ends up being deleted 
without being unrooted from the JS GC. Things crash when the GC eventually kicks 
in, generally after three or four windows.
  There's a simpler fix for that. But while I'm at it, I'm trying to address bug 
38951 (and apparently several others that have scary patches everywhere the DOM 
window realizes it needs to let go of its JS context), preventing the premature 
release and regeneration of the script object.
  My patch isn't quite ready for prime time, but I believe it fixes this crash. 
It puts the onus of remembering what script objects have been given GC root-hood 
in the JS context itself, allowing the DOM Window to relax. Obviously I'm asking 
for trouble. Take a look!

--- ./dom/src/base/nsJSEnvironment.h.1.41	Thu Aug 24 10:54:22 2000
+++ ./dom/src/base/nsJSEnvironment.h	Thu Aug 24 10:54:08 2000
@@ -36,6 +36,7 @@
 #include "nsIScriptNameSpaceManager.h"
 #endif
 class nsIPrincipal;
+class nsVoidArray;
 
 class nsJSContext : public nsIScriptContext {
 private:
@@ -49,6 +50,7 @@
   nsCOMPtr<nsISupports> mRef;
   PRBool mScriptsEnabled;
   PRUint32 mBranchCallbackCount;
+  nsVoidArray *mReferences;
 
   static JSBool PR_CALLBACK DOMBranchCallback(JSContext *cx, JSScript *script);
   
--- ./dom/src/base/nsJSEnvironment.cpp.1.104	Thu Aug 24 10:54:24 2000
+++ ./dom/src/base/nsJSEnvironment.cpp	Thu Aug 24 10:54:08 2000
@@ -52,6 +52,7 @@
 #include "nsIInterfaceRequestor.h"
 #include "nsIPrompt.h"
 #include "nsIObserverService.h"
+#include "nsVoidArray.h"
 
 // Force PR_LOGGING so we can get JS strict warnings even in release builds
 #define FORCE_PR_LOG 1
@@ -256,6 +257,7 @@
 nsJSContext::nsJSContext(JSRuntime *aRuntime)
 {
   NS_INIT_REFCNT();
+  mReferences = 0;
   mContext = ::JS_NewContext(aRuntime, gStackSize);
   if (mContext) {
     ::JS_SetContextPrivate(mContext, (void *)this);
@@ -294,6 +296,7 @@
   mTerminationFunc = nsnull;
   mScriptsEnabled = PR_TRUE;
   mBranchCallbackCount = 0;
+  mReferences = new nsVoidArray(4);
 }
 
 const char kScriptSecurityManagerProgID[] = NS_SCRIPTSECURITYMANAGER_PROGID;
@@ -310,6 +313,14 @@
   if (!mContext)
     return;
 
+  if (mReferences) {
+    PRInt32 ctr;
+    for (ctr = mReferences->Count() - 1; ctr >= 0; ctr--)
+      ::JS_RemoveRoot(mContext, mReferences->ElementAt(ctr));
+    delete mReferences;
+    mReferences = 0;
+  }
+
   /* Remove global object reference to window object, so it can be collected. */
   ::JS_SetGlobalObject(mContext, nsnull);
   ::JS_DestroyContext(mContext);
@@ -1146,12 +1157,22 @@
 NS_IMETHODIMP
 nsJSContext::AddNamedReference(void *aSlot, void *aScriptObject, const char *
aName)
 {
-  return (::JS_AddNamedRoot(mContext, aSlot, aName)) ? NS_OK : NS_ERROR_FAILURE;
+  if (::JS_AddNamedRoot(mContext, aSlot, aName)) {
+    if (mReferences)
+      mReferences->AppendElement(aSlot);
+    return NS_OK;
+  }
+  return NS_ERROR_FAILURE;
 }
 
 NS_IMETHODIMP
 nsJSContext::RemoveReference(void *aSlot, void *aScriptObject)
 {
+  if (mReferences) {
+    PRInt32 idx = mReferences->IndexOf(aSlot);
+    if (idx >= 0)
+      mReferences->RemoveElementAt(idx);
+  }
   return (::JS_RemoveRoot(mContext, aSlot)) ? NS_OK : NS_ERROR_FAILURE;
 }
 
--- ./dom/src/base/nsGlobalWindow.cpp.1.323	Thu Aug 24 10:54:25 2000
+++ ./dom/src/base/nsGlobalWindow.cpp	Thu Aug 24 10:54:10 2000
@@ -142,8 +142,6 @@
 
 void GlobalWindowImpl::CleanUp()
 {
-  if (mContext)
-    mContext->RemoveReference(&mScriptObject, mScriptObject);
   mContext = nsnull;            // Forces Release
   mDocument = nsnull;           // Forces Release
   NS_IF_RELEASE(mNavigator);
@@ -224,13 +222,6 @@
 
 NS_IMETHODIMP GlobalWindowImpl::SetContext(nsIScriptContext* aContext)
 {
-  // if setting the context to null, then we won't get to clean up the
-  // named reference, so do it now
-  if (!aContext) {
-    NS_WARNING("Possibly early removal of script object, see bug #41608");
-    mContext->RemoveReference(&mScriptObject, mScriptObject);
-  }
-
   mContext = aContext;
   return NS_OK;
 }
@@ -336,8 +327,6 @@
       jsval val = BOOLEAN_TO_JSVAL(JS_TRUE);
       ::JS_SetProperty((JSContext *) mContext->GetNativeContext(),
                        (JSObject *) mScriptObject, "closed", &val);
-      mContext->RemoveReference(&mScriptObject, mScriptObject);
-      mScriptObject = nsnull;
     }
     mContext = nsnull;          // force release now
   }
Blocks: 42321
No longer blocks: 42321
*** Bug 48934 has been marked as a duplicate of this bug. ***
Ended up doing something similar to the above patch, except special-cased the 
window script object, rather than keeping track of every JS object referenced.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Whiteboard: [nsbeta2-][nsbeta3+] tentative fix in hand. → [nsbeta2-][nsbeta3+]
*** Bug 44777 has been marked as a duplicate of this bug. ***
More stack backtraces in 44777.  NOTE: this was not fixed as of a fresh
pull/clobber/depend/build on 08/25/00, however I see the fix notice is dated
8/26/00, so we're probably ok.   I've been getting these crashes randomly every
week or so.
I did a pull this morning (8/28) and I still get the following assertion:
Error: The address passed to JS_AddNamedRoot currently holds an invalid jsval.
   This is usually caused by a missing call to JS_RemoveRoot.
   Root name is "timeout.expr".
Assertion failure: root_points_to_gcArenaPool, at h:\moz\mozilla\js\src\jsgc.c:668

I'm pretty sure this is related to the window script object even though
he->value is timeout.expr rather than window_object.

I'll attach a call stack.

Let me know if I should open a new bug altogether.
Sean: I'd call the problem you're describing (just above) a different one. 
Barring something really nefarious, it looks like a flaw in the window shutdown 
logic of SetNewDocument, but a different flaw from the one involving the 
window_object object. Please file a new bug for this one, and mention how to make 
it happen. (Personally, I've never even seen the crash mentioned at the top of 
this bug report on a Windows build.)
Dan: thanks.  opened bug 50705.
dan, can you suggest/update the QA Contact for this bug.  Thanks.
Updating QA contact. Philip: this bug is garbage collector timing dependent. See 
my comment dated 2000-08-07 19:40 for a reliable way to reproduce it (stolen from 
bug 46430).
QA Contact: doronr → pschwartau
Keywords: beatnik
Marking VERIFIED FIXED. Using the steps to reproduce above, from bug 46430.
No matter how many times I save the links there, I do not crash.

Using Mozilla trunk binaries 20010706xx on WinNT, Linux, and Mac -
Status: RESOLVED → VERIFIED
Product: Browser → Seamonkey
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: