Closed
Bug 50705
Opened 24 years ago
Closed 24 years ago
root_points_to_gcArenaPool assertion / crashes due to leaked root name "timeout.expr"
Categories
(SeaMonkey :: General, defect, P1)
SeaMonkey
General
Tracking
(Not tracked)
VERIFIED
FIXED
M18
People
(Reporter: sean, Assigned: danm.moz)
References
Details
(Keywords: crash, testcase, Whiteboard: [nsbeta3-][PDTP2] [rtm++])
Attachments
(6 files)
Error: The address passed to JS_AddNamedRoot currently holds an invalid jsval. This is usually caused by a missing call to JS_RemoveRoot. Root name is "timeout.expr". Assertion failure: root_points_to_gcArenaPool, at h:\moz\mozilla\js\src\jsgc.c:668 A callstack for this assertion is at: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=13650 (which was attached to bug 43466) This assertion is occurring with a modified Beatnik eMix. Currently available eMixes do not run in Mozilla - will attempt to generate a simplified testcase.
Comment 1•24 years ago
|
||
Lucky jst prolly gets this one. I'll take a quick look too, and see if some obvious travis-ty in the code is to blame. /be
I get something quite like this when loading a simple page with two frames. It doesn't say anything about the gc, instead it gets a segmentation fault, but the backtrace sure looks the same. Here is a backtrace: #0 0x403320b9 in chunk_free (ar_ptr=0x41900010, p=0x5a500) at malloc.c:3094 #1 0x40331fba in __libc_free (mem=0x41df94a8) at malloc.c:3023 #2 0x401ffa65 in PR_Free (ptr=0x41df94a8) at prmem.c:66 #3 0x400f4af8 in nsMemoryImpl::Free (this=0x8065ba0, ptr=0x41df94a8) at nsMemoryImpl.cpp:143 #4 0x400f5260 in nsMemory::Free (ptr=0x41df94a8) at nsMemoryImpl.cpp:269 #5 0x400a7710 in nsStr::Free (aDest=@0x41df6010) at nsStr.cpp:685 #6 0x400a69e2 in nsStr::Destroy (aDest=@0x41df6010) at nsStr.cpp:89 #7 0x400a7d1a in nsCString::~nsCString (this=0x41df600c, __in_chrg=2) at nsString.cpp:137 #8 0x40f6589d in nsXBLBinding::~nsXBLBinding (this=0x41df6000, __in_chrg=3) at nsXBLBinding.cpp:367 #9 0x40f64fe7 in nsXBLBinding::Release (this=0x41df6000) at nsXBLBinding.cpp:265 #10 0x406c0cef in nsXULElement::SetDocument (this=0x41d70f70, aDocument=0x0, aDeep=1, aCompileEventHandlers=1) at ../../../dist/include/nsCOMPtr.h:489 #11 0x406c1837 in nsXULElement::SetDocument (this=0x41dcaf28, aDocument=0x0, aDeep=1, aCompileEventHandlers=1) at nsXULElement.cpp:2306 #12 0x40fd42dd in nsGenericElement::SetDocumentInChildrenOf (aContent=0x41dcacf4, aDocument=0x0, aCompileEventHandlers=1) at nsGenericElement.cpp:1237 #13 0x40fd5028 in nsGenericElement::SetDocument (this=0x41dcad04, aDocument=0x0, aDeep=1, aCompileEventHandlers=1) at nsGenericElement.cpp:1326 #14 0x40eee5b1 in nsXMLElement::SetDocument (this=0x41dcacf0, aDocument=0x0, aDeep=1, aCompileEventHandlers=1) at nsXMLElement.h:67 #15 0x40f6ce39 in nsXBLBinding::ChangeDocument (this=0x41d70e98, aOldDocument=0x8592bf8, aNewDocument=0x0) at nsXBLBinding.cpp:1159 #16 0x406c0ad6 in nsXULElement::SetDocument (this=0x41dcdd08, aDocument=0x0, aDeep=1, aCompileEventHandlers=1) at nsXULElement.cpp:2224 #17 0x40d13504 in nsGfxScrollFrame::SetDocumentForAnonymousContent (this=0x41df3a0c, aDocument=0x0, aDeep=1, aCompileEventHandlers=1) at nsGfxScrollFrame.cpp:461 #18 0x40f9d5bd in nsDocument::SetScriptGlobalObject (this=0x8592bf8, aScriptGlobalObject=0x0) at nsDocument.cpp:1716 #19 0x40faaefb in DocumentViewerImpl::~DocumentViewerImpl (this=0x8592f08, __in_chrg=3) at nsDocumentViewer.cpp:410 #20 0x40faab50 in DocumentViewerImpl::Release (this=0x8592f08) at nsDocumentViewer.cpp:348 #21 0x41637fbe in nsCOMPtr<nsIContentViewer>::assign_with_AddRef (this=0x41d86440, rawPtr=0x0) at ../../dist/include/nsCOMPtr.h:471 #22 0x41620990 in nsDocShell::SetupNewViewer (this=0x41d863b8, aNewViewer=0x41dd5dd0) at ../../dist/include/nsCOMPtr.h:583 #23 0x4162b50a in nsWebShell::SetupNewViewer (this=0x41d863b8, aViewer=0x41dd5dd0) at nsWebShell.cpp:386 #24 0x4161e121 in nsDocShell::Embed (this=0x41d863b8, aContentViewer=0x41dd5dd0, aCommand=0x41644b64 "", aExtraInfo=0x0) at nsDocShell.cpp:2393 #25 0x4162b76d in nsWebShell::Embed (this=0x41d863b8, aContentViewer=0x41dd5dd0, aCommand=0x41644b64 "", aExtraInfo=0x0) at nsWebShell.cpp:415 #26 0x4161f01c in nsDocShell::CreateContentViewer (this=0x41d863b8, aContentType=0xbffff674 "text/html", aOpenedChannel=0x41dcc5c0, aContentHandler=0xbffff6e4) at nsDocShell.cpp:2561 #27 0x41633994 in nsDSURIContentListener::DoContent (this=0x41d85b88, aContentType=0xbffff674 "text/html", aCommand=0, aWindowTarget=0x4012bddc "", aOpenedChannel=0x41dcc5c0, aContentHandler=0xbffff6e4, aAbortProcess=0xbffff62c) at nsDSURIContentListener.cpp:99 #28 0x4165ba32 in nsDocumentOpenInfo::DispatchContent (this=0x41928948, aChannel=0x41dcc5c0, aCtxt=0x0) at nsURILoader.cpp:359 #29 0x4165aff7 in nsDocumentOpenInfo::OnStartRequest (this=0x41928948, aChannel=0x41dcc5c0, aCtxt=0x0) at nsURILoader.cpp:233 #30 0x40875c32 in nsFileChannel::OnStartRequest (this=0x41dcc5c0, transportChannel=0x4190ea00, context=0x0) at nsFileChannel.cpp:633 #31 0x407f6791 in nsOnStartRequestEvent::HandleEvent (this=0x4191de98) at nsAsyncStreamListener.cpp:210 #32 0x407f5e76 in nsStreamListenerEvent::HandlePLEvent (aEvent=0x41daf8f8) at nsAsyncStreamListener.cpp:97 #33 0x400e7132 in PL_HandleEvent (self=0x41daf8f8) at plevent.c:587 #34 0x400e7036 in PL_ProcessPendingEvents (self=0x80b1178) at plevent.c:528 #35 0x400e8bb0 in nsEventQueueImpl::ProcessPendingEvents (this=0x80b1140) at nsEventQueue.cpp:356 #36 0x41309abf in event_processor_callback (data=0x80b1140, source=8, condition=GDK_INPUT_READ) at nsAppShell.cpp:158
Comment 3•24 years ago
|
||
someone else want to own this?
I'd be happy to take a look at it, but I've never noticed this happening and there's no testcase in this bug. The source for the simple page with two frames mentioned in the above stacktrace would be handy.
danm: Sorry, my crash was due to some modifications in my tree that accidentaly modified some memory. I've fixed that now.
Adding to Alex's comment above: I've been talking offline with Sean, who tells me the bug can't be seen outside the context of a commercial product he's working on. It's an HTML/JS entity that's also tightly bound to a plugin, though he thinks the plugin isn't the part that's responsible for the problem mentioned in this bug. Still, this product is required to reproduce the problem and there are currently legal issues with seeing the product, so we're presently rather stymied on this bug.
Comment 7•24 years ago
|
||
This should be reassigned to sean, or someone else who is working on his project.
Reporter | ||
Comment 8•24 years ago
|
||
taking it until we get a testcase that doesn't require legal paperwork or someone will sign the paperwork (which Dan has said he will do when he gets back next week).
Assignee: asa → sean
Reporter | ||
Comment 9•24 years ago
|
||
reassigning back to danm. I've put together a testcase (no forms to sign) that is available at: http://www.mindspring.com/~sean_/mozilla/gcassert/ The testcase consists of 3 html pages. Steps to repro are on the index page. I'm able to repro using a build from the tip Sept 6.
Comment 10•24 years ago
|
||
cc jrgm
Comment 11•24 years ago
|
||
This no longer fits our profile for nsbeta3+ bugs, so ->future.
Target Milestone: --- → Future
Reporter | ||
Comment 12•24 years ago
|
||
The JS_ASSERT that occurs stops debug builds of mozilla dead in its tracks (no ignore option in the dialog, just ok to debug and cancel to stop the process). In a release build, will an invalid jsval result in a crash or a leak?
Comment 13•24 years ago
|
||
Too bad about JS_ASSERT, but I think the developer who hits it should debug it (we're all debugger-enabled, right?). We do not want any of this O(n^2) growth rate code enabled for non-DEBUG builds, it will slow the GC down (and I just fixed a separate O(bad) bug in the GC!). So they will probably crash on an invalid root, especially one in freed or recycled memory. That's the breaks. /be
Reporter | ||
Comment 14•24 years ago
|
||
I didn't mean to imply that the debug code should be enabled in a release build - just trying to figure out the implications for Beatnik content if the bug is not fixed. In light of that assessment, adding crash keyword.
Keywords: crash
Comment 15•24 years ago
|
||
Adding vidur and heikki to cc: list for insight on DOM object garbage collection, because that's what's causing the crash. Sean might be able to fix this if he can get a bit of Q&A.
Reporter | ||
Comment 16•24 years ago
|
||
Confirming that this does crash the browser in release builds using this morning's talkback build - there are about 5 talkbacks from me this evening and they should all reference bug 50705. Will continue to investigate.
Comment 18•24 years ago
|
||
Increasing priority to P1 and nominating as an nsbeta3/rtm crasher as we've now verified that: 1) this problem indeed causes crashes in release builds 2) this impacts the stability/usability of Beatnik, which is a high profile plug-in partner 3) high profile web sites such as http://www.beatnik.com/emix, http://digital.yahoo.com, http://www.mtv.com and artist websites all have emixes which may crash due to this bug Clearing Future milestone to -- to trigger re-evaluation based on this new data.
Comment 19•24 years ago
|
||
nsbeta3+, based on EKrock's assessment.
Whiteboard: [nsbeta3+]
Target Milestone: --- → M18
Comment 20•24 years ago
|
||
I bet this bug, bug 53094, and maybe even bug 53123 are all symptoms of the same bad change that's causing JSContexts (probably DOM ones) to be destroyed but not popped off the per-thread context stack maintained by XPConnect. And probably to be destroyed without a timeout cleanup. But I could be wrong, and these are separate bugs as well as symptoms. My two cents (for now); hope it stirs the pot. /be
Reporter | ||
Comment 21•24 years ago
|
||
I can confirm that JS_AddNamedRoot(cx, &timeout->expr, "timeout.expr") is being called one more time than ::JS_RemoveRoot(cx, &aTimeout->expr).
Reporter | ||
Comment 22•24 years ago
|
||
I can also confirm that there is a call to GlobalWindowImpl::DropTimeout in which both aContext and mContext are null - so JS_RemoveRoot is not called. And I've also received one other root_points_to_gcArenaPool assertion twice with the root name "nsGenericElement::mScriptObject".
Comment 23•24 years ago
|
||
Sean: what's the stack when DropTimeout is called with mContext null? Can you somehow trace the stack that destroyed mContext? This could be a dup of bug 53094. /be
Reporter | ||
Comment 24•24 years ago
|
||
Reporter | ||
Comment 25•24 years ago
|
||
Attached a trace of what I think are the calls leading to this problem. This is from a build based on a pull from the tip around noon. Line numbers will differ due to additional logging I've added. There is a timer set via js that eventually leads to the closing of a popup window. The window is closed during the servicing of the timer being fired. mContext is set to nsnull 3 times during the servicing of the timer. DropTimeout does not fail during the servicing of this particular timer event as the ref_count of the timer does not hit 0 - so no attempt is made to 'really' drop the timeout. The last trace in the attachment I think shows another timer being serviced on the closed window - this is the DropTimeout that fails.
Comment 26•24 years ago
|
||
PDT: Very bad, but not a ship stopper. Assuming emix isn't high frequency. Evidence that the bad root could cause even more damage could raise priority.
Whiteboard: [nsbeta3+] → [nsbeta3+][PDTP2]
Comment 27•24 years ago
|
||
*** Bug 53509 has been marked as a duplicate of this bug. ***
Comment 28•24 years ago
|
||
Note that I marked Bug 53509 as a dup. The stack over there is different (and the bug owner might want to give it a look), but it is the same named root.
Reporter | ||
Comment 29•24 years ago
|
||
updating summary
Summary: root_points_to_gcArenaPool assertion in SetNewDocument → root_points_to_gcArenaPool assertion due to leaked root name "timeout.expr"
Comment 30•24 years ago
|
||
mass-adding rtm keyword to all open nsbeta3+ xptoolkit bugs
Comment 31•24 years ago
|
||
marking nsbeta3- per clayton's comments. already nominated for rtm.
Whiteboard: [nsbeta3+][PDTP2] → [nsbeta3-][PDTP2]
Comment 32•24 years ago
|
||
53628 is a dup of this, but it's a crasher. updating summary. danm / brendan: I see in mozilla/dom/src/base/nsGlobalWindow.cpp where we call JS_AddNamedRoot() and were we call JS_RemoveRoot(cx, &aTimeout->expr); any pointers or tips on how to debug / fix this? what can I do to help fix this? this crasher blocks #47521
Summary: root_points_to_gcArenaPool assertion due to leaked root name "timeout.expr" → root_points_to_gcArenaPool assertion / crashes due to leaked root name "timeout.expr"
Comment 33•24 years ago
|
||
*** Bug 53628 has been marked as a duplicate of this bug. ***
Reporter | ||
Comment 34•24 years ago
|
||
Opened bug 54121 for the root_points_to_gcArenaPool assertion with root name "nsGenericElement::mScriptObject" I mentioned 9/21 - have seen that a lot more in the last week.
Whiteboard: [nsbeta3-][PDTP2] → [nsbeta3-][PDTP2] [rtm need info]
Reporter | ||
Comment 35•24 years ago
|
||
Testcase in 9/6 post is blocked by bug 55275
Comment 36•24 years ago
|
||
Another trivial way of reproducing this crash is to go to www.fortunecity.com, which uses some plugin called comet, I think, and then navigate around a bit. Whoever tries to fix this, if they could check that site at well, I'd appreciate it.
Comment 37•24 years ago
|
||
Is this bug still a worthwhile candidate for fixing in N6? That'll probably require a simple/safe fix in the next day or so, or a clearer picture of how this affects MTBF.
Comment 38•24 years ago
|
||
I think the best way to get a clearer picture of how this affects mtbf is to fix it. As near as I can tell, it's a topcrasher, and as checkins for a few of the other topcrashers get checked in, it might even become *the* topcrasher. A lot of the js type crashes bounce around a lot and get marked dups, etc, because the stack traces are so similar. At least for this one we have an easily reproducible case, though I haven't heard anything about how hard it is to fix.
Comment 39•24 years ago
|
||
severity->critical
Severity: normal → critical
Whiteboard: [nsbeta3-][PDTP2] [rtm need info] → [nsbeta3-][PDTP2] [rtm+ need info]
Comment 40•24 years ago
|
||
Also crashes under Mac OS 9.0.4, Moz build 2000100720-M18. Stdlog from crash to follow.
Comment 41•24 years ago
|
||
Reporter | ||
Comment 42•24 years ago
|
||
updating platform/OS per last post Also note that bug 55881 and bug 55445 may be related.
OS: Windows 2000 → All
Hardware: PC → All
Assignee | ||
Comment 43•24 years ago
|
||
The crash is caused by a JS timeout (actually an Interval) that outlives the JS context of its owning window. It is finally deleted, but there being no context from which to unroot it, it was never unrooted. At some later time a JS GC would kick in and die. This is fairly well explained in the comments for a minimal, rtm-ready patch attached below.
Status: NEW → ASSIGNED
Whiteboard: [nsbeta3-][PDTP2] [rtm+ need info] → [nsbeta3-][PDTP2] [rtm+ need info] patch awaiting review
Assignee | ||
Comment 44•24 years ago
|
||
Assignee | ||
Comment 45•24 years ago
|
||
I'm also attaching local copies of the source for sean's testcase. We'll need to revisit this problem because the solution posted just above is suboptimal, so I'd like to make sure we have the test case always available.
Assignee | ||
Comment 46•24 years ago
|
||
Assignee | ||
Comment 47•24 years ago
|
||
Assignee | ||
Comment 48•24 years ago
|
||
Comment 49•24 years ago
|
||
I'll try this patch out on the pr3 branch.
Comment 50•24 years ago
|
||
I ran with the attached patch - I still crash when I go to www.fortunecity.com and click on the music scene link. I was told that crash was a dup of this bug, so if you could try that url with your fix, I'd appreciate it.
URL: www.fortunecity.com
Comment 51•24 years ago
|
||
Patch looks good to me, I'm unable to reproduce the fortunecity.com crash so I dunno if that's related or not. r=jst
Assignee | ||
Comment 52•24 years ago
|
||
The fortunecity crash happens pretty readily on my machine, with or without my fix for this bug. But I don't understand why it's rumoured to be a duplicate bug. The roots' names are different: this one is timeout.expr; fortunecity's is nsGenericElement::mScriptObject. Is there a separate bug for the fortunecity crash? It wants one. I don't see one. I'm removing that URL from this bug's summary.
URL: www.fortunecity.com
Reporter | ||
Comment 53•24 years ago
|
||
bug 54121 is for nsGenericElement::mScriptObject
Comment 54•24 years ago
|
||
I think if you're gonna initialize rt = nsnull, you might as well test after you try the do_GetService and the GetRuntime calls that rt is not null, and if it is still null, bail (the roots must not matter, if the runtime is gone -- probably some odd shutdown case, if it can happen at all). Fix that, and a=brendan@mozilla.org /be
Assignee | ||
Comment 55•24 years ago
|
||
Alright, will do. Got a point there, after all.
Whiteboard: [nsbeta3-][PDTP2] [rtm+ need info] patch awaiting review → [nsbeta3-][PDTP2] [rtm+]
Assignee | ||
Comment 57•24 years ago
|
||
.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 58•24 years ago
|
||
Oh right. I did have something to say. Bugzilla protests so much I tend to ignore him. Fixed on trunk and Netscape rtm branch.
Comment 59•24 years ago
|
||
FYI: Branch build 2000-10-16-09MN6, NT4. The crash in Mail that was mentioned in bug# 47521 and 53628 no longer occurs.
Comment 60•24 years ago
|
||
can someone please verify this? I'm not qualifed to do this...
Comment 61•24 years ago
|
||
you will have problems reproduce this since the start up home page has changed to a giant image map.
Comment 63•24 years ago
|
||
Sean: how did you verify this? did you test on mac, window and linux? I think that the correct way to verify is to give an lxr url for the checkin.
Reporter | ||
Comment 64•24 years ago
|
||
vrfd on Win NT and Win2K, using an undistributable Beatnik eMix and using the attached testcase. checkin lxr url: http://bonsai.mozilla.org/cvsview2.cgi? diff_mode=context&whitespace_mode=show&file=nsGlobalWindow.cpp&root=/cvsroot&sub dir=mozilla/dom/src/base&command=DIFF_FRAMESET&rev1=1.354&rev2=1.355
Updated•23 years ago
|
Updated•20 years ago
|
Product: Browser → Seamonkey
You need to log in
before you can comment on or make changes to this bug.
Description
•