bugzilla.mozilla.org will be intermittently unavailable on Saturday, March 24th, from 16:00 until 20:00 UTC.

Reentrance in js_AllocGCThing which leads to crash in debug version

RESOLVED FIXED in mozilla0.9.1



Document Navigation
17 years ago
17 years ago


(Reporter: Shanjian Li, Assigned: Adam Lock)



Windows NT

Firefox Tracking Flags

(Not tracked)




(2 attachments)



17 years ago
While I was working on bug 70759, the same reproduce process always leads to assertion and 
crash in my debug build. This problem has been bothering me for several days.  To reproduce 
the problem:
1. go to http://www.chinesecomputing.com
2. click any link on the home page (optional: click down several links)
3. change the Character Coding selection for the page (e.g., change to Chinese 
Traditional Big 5)

After step 3, the program shows assertion at file jsgc.c, line 448

I spent some time on this, and posted here for your reference:
After step 2, set breakpoint at jsgc.c, 1103, 
    rt->gcRunning = JS_TRUE;
Then follow step 3. The program will stop at jsgc.c:1103 twice. The first
time it stop there, let it go. The second time it will crash. 

The reentrance happened in the last item of rt->gcArenaPool.first (around line 1205).
It happened inside function "finalizer(cx, thing);".

Comment 1

17 years ago
Marking NEW this has a lot of good info in it.
Severity: blocker → critical
Ever confirmed: true
Keywords: crash

Comment 2

17 years ago
Crash still occurring with debug build from 2001-03-12 on WinNT.
My steps to reproduce are slightly different than above: 

1. Go to http://www.chinesecomputing.com
2. Change the character coding selection for the page to anything new
3. CRASH !!! 

void *
js_AllocGCThing(JSContext *cx, uintN flags)
    JSBool tried_gc;
    JSRuntime *rt;
    JSGCThing *thing;
    uint8 *flagp;

#ifdef TOO_MUCH_GC
    js_GC(cx, GC_KEEP_ATOMS);
    tried_gc = JS_TRUE;
    tried_gc = JS_FALSE;

    rt = cx->runtime;
    JS_ASSERT(!rt->gcRunning); <<<<<<<<<<<<<<<<<<<<<<<<<<< CRASHED HERE
    if (rt->gcRunning) {
        return NULL;

Comment 3

17 years ago
Created attachment 27639 [details]
WinNT stack trace of crash

Comment 4

17 years ago

1. This crash only happened for me with debug builds, not the nightly binaries

2. With the nightly binaries, I would occasionally get this messagbox on WinNT:

   ###!!!  ASSERTION:  no handler for font package: '0', file       
   d:\mozilla\intl\locale\src\nsFontPackageService.cpp, line 70

3. With the debug build on WinNT, I saw messages like this in the 
   debug console just before the crash:

   JavaScript error: 
   chrome://global/content/charsetOverlay.js  line 126:  wnd has no properties

   JavaScript error: 
   chrome://global/content/charsetOverlay.js  line 126:  wnd has no properties

   ###!!!  ASSERTION:  docshell has null child:  'shell',  file       
   d:\mozilla\docshell\base\nsDocShell.cpp,  line 153

   Charset Overlay menu item pressed: EUC-JP

Comment 5

17 years ago
I think this is not a JS Engine problem; reassigning to Embedding:Docshell.
cc'ing Brendan, jband for a more refined analysis on the root cause - 
Assignee: rogerl → adamlock
Component: Javascript Engine → Embedding: Docshell
QA Contact: pschwartau → adamlock

Comment 6

17 years ago
Phil, any place where you see this on Win32 debug builds only and with another 
assertion going the you shoul dsuspect a dup of bug 54792. Look in the stack 
trace for the nest into the event loop from the NS_ASSERTION with a JS gc 
further up the stack.

Comment 7

17 years ago
I wasn't sure it was bug 54792 because I didn't see any nsDebug::Assertion 
anywhere in the stack. Should I mark this as a dupe of 54792, then?  

Comment 8

17 years ago
Phil, It is not clear if this bug on the whole is only a dup of bug 54792 or 
not. I think that the two cases you listed probably are. The stack trace 
attached to this bug is not conclusive for me. There may be an assert hiding in 
the transition from the the View dtor into the NT code (that the debugger is 
not showing). Or something else might be going on that causes it to get into 
dispatching events. We know there have been some cases where dtors called 
from JS finalizers have made calls into other objects that then get back into JS 
and cause this assert. I don't know if that is one of them or just the stupid 
Windows assert dialog bug.

Comment 9

17 years ago
The assertion happens because destruction of the old set of docshells causes 
garbage collection to start. This causes the destruction of the associated views 
and when they destroy or release their parent widget the thread goes into 
message processing loop. Before the loop ends, a DOM keyboard event is processed 
from the queue and the assertion occurs as it is turned into a JS object.

Comment 10

17 years ago
John, it does look similar to the other bug.

I think the problem here is the message processing loop happening during GC. 
This makes it possible for DOM events to be handled before the loop ends. DOM 
events usually end up with a JS object being created which is causing the bug 
we're seeing because GC is still in progress.

I can't think of why there is a message processing loop in the middle of this 
stack trace, but it could be that the widget is actually a proxy so calling 
methods on it causes events to be posted and a processing loop during which 
other things can happen.

If so then perhaps we can fix this problem by putting checks into the proxy code 
so that when the calling thread and the object thread are the same it just makes 
a straight call rather than using a queue.


17 years ago
Target Milestone: --- → mozilla0.9
Danm saw a similar problem, and testified that Windows just *had* to process a
message or two in the middle of a Win32 call that I claimed should not process

I am clueless about Windows, and defer to others, but it seems to me that
between our code's reentrancy restrictions, and JS's run-to-completion execution
model, we need to keep event/message processing confined to well-known loops
down the stack from JS and other code activations.  Can we?


Comment 12

17 years ago
The only time I know that Win32 will enter a message processing loop without 
explicitly being told to is for a marshalled COM call. Since we don't make such 
a call anywhere near here (though it could happen around the clipboard / drag & 
drop code) I'm not not sure what the reason is.

If it does occur some other time I don't think there is much we can do about 

Comment 13

17 years ago
Brendan is talking about bug 69586; a crash with a stack trace like this:

js_AllocGCThing(JSContext * 0x00db3460, unsigned int 0) line 448 + 41 bytes
nsWindow::DispatchFocus(unsigned int 105) line 4112 + 15 bytes
nsWindow::WindowProc(HWND__ * 0x00030288, unsigned int 7, unsigned int 393444, 
long 0) line 923 + 27 bytes
USER32! 77e13eb0()
DocumentViewerImpl::~DocumentViewerImpl() line 414 + 97 bytes
js_GC(JSContext * 0x035a8908, unsigned int 0) line 1218 + 11 bytes
nsDocShell::CreateContentViewer(nsDocShell * const 0x035b47c0, const char * 
0x0012f828, nsIRequest * 0x0482c698, nsIStreamListener * * 0x0012f87c) line 2771 
+ 32 bytes
PL_HandleEvent(PLEvent * 0x036924ec) line 586 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x00b57318) line 516 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x001e00e6, unsigned int 49442, unsigned int 0, 
long 11891480) line 1067 + 9 bytes
nsAppShellService::Run(nsAppShellService * const 0x00b4b770) line 408

Some object is being deleted, Mozilla enters js_GC, the whole window gets torn 
down, Windows sends a WM_SETFOCUS message directly to some (presumably, other) 
window, the handler for that event invokes JS. eek.

It isn't entering a Windows message processing loop at all. Windows is 
dispatching Windows events directly. All very legal. All very problematic.

Comment 14

17 years ago
It looks as if focus events should be discarded when a window is being taken 
down. Does anyone know who did the focus stuff?

Comment 15

17 years ago
CC'ing saari@netscape.com

Chris, do you have any ideas on the best way to prevent the focus issues from 
causing DOM events to be fired to a window during it's destruction?

Comment 16

17 years ago
Event firing during window transitions is a general problem that we haven't
really addressed yet. We could certainly lock things down if we notify the
presshell that it is about to be destroyed and then proceed with said destruction.

Comment 17

17 years ago
In the stack trace I posted above, the message causing all the trouble is a *get* 
focus message on a window other than the one being destroyed. Two different 
PresShells. Locking down event handling on a window while it is being destroyed 
wouldn't help; in fact, that may be the only window you wouldn't have to lock 
down. Yar.

Comment 18

17 years ago
It may be possible to stick some extra refcounts on the document viewer to stop 
it going away right in the middle of GC. Investigating further.

Comment 19

17 years ago
Refcount testing didn't work, moving to 0.9.1. 

Note also that the bug has no apparant ill-effect on the release version save 
perhaps for a lost focus event. The crash, or rather assertion only happens in 
debug mode.


17 years ago
Target Milestone: mozilla0.9 → mozilla0.9.1
Adam: no kungFuDeathGrip luck?

The debug-only assertion-crash (JS assertions are always fatal) does mean that
in release builds, at least one JS object (or perhaps a string or double, those
are GC-allocated too) allocation is failing.  This should throw an "out of
memory" exception.  Any sign of that?  If not, perhaps the JS allocation is also
somehow dependent on DEBUG.

*** Bug 77064 has been marked as a duplicate of this bug. ***

Comment 22

17 years ago
[Win2K] I am seeing this now and then.

Comment 23

17 years ago
Here is a list of things I tried, all of which were in nsDocShell::Destroy:

1. Put a death grip on the docshell's content viewer
2. Move the call to DestroyChildren() to before the destruction of the parent 
docshell's content viewer
3. Splitting the call to destroy each child docshell into two methods called 
before and after content viewer destruction.

No luck so far.

I will work on this some more. I will also investigate whether the loss of the 
focus DOM event has any repercussions, e.g. JS thinking it's out of memory, 
focus problems etc. Suggestions are welcome.


17 years ago
Blocks: 77421

Comment 24

17 years ago
Created attachment 32312 [details] [diff] [review]
Patch fixes the GC problem, review and superreview please

Comment 25

17 years ago
My previous attempts at death grips were in the wrong place. Turns out that a 
single line death grip in nsDocShell::Embed() ensures the content viewer is 
destroyed *after* GC so this chain of events does not occur.

Patch also removes a few extraneous lines related to the nsDocShell::Destroy() 

Can I have review and super-review please?

Comment 26

17 years ago
For ease of reference later, a comment would help here:

+     // Ensure that the content viewer is destroyed *after* the GC - bug 71515
+     nsCOMPtr<nsIContentViewer> kungfuDeathGrip = mContentViewer;

Comment 27

17 years ago
Thanks, I'll add a comment above the line.

Comment 28

17 years ago
this looks good to me - sr=rpotts.

you might want to delete the comment at line 3079 which says "don't hold onto a 
reference to the DocViewer beyond this point" :-)  Since that is exactly what 
this patch is doing :-)

I believe that that comment is old and out of date...  Does anyone think that 
the comment is still valid?

-- rick

Comment 30

17 years ago
Fix checked in.
Last Resolved: 17 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.