nsNavHistory trying to access already-deleted statement [tinderbox leak box random crash]

RESOLVED FIXED in mozilla1.9.1b2

Status

()

Toolkit
Places
P1
blocker
RESOLVED FIXED
9 years ago
9 years ago

People

(Reporter: vlad, Unassigned)

Tracking

Trunk
mozilla1.9.1b2
x86
All
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

The 10.5.2 leak build box has started randomly crashing toward the end of alive test as part of leaks.  The last bits in the log are:

WARNING: recurring into frame construction: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_FrameC] == 0', file ../../dist/include/layout/nsPresContext.h, line 959
WARNING: recurring into frame construction: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_FrameC] == 0', file ../../dist/include/layout/nsPresContext.h, line 959
WARNING: NS_ENSURE_TRUE(aPresContext->Document()->GetWindow()) failed: file /builds/buildbot/moz2-slave/mozilla-central-macosx-debug/build/content/events/src/nsIMEStateManager.cpp, line 174
--WEBSHELL 0x1032a0a0 == 6
--WEBSHELL 0x1032c190 == 5
ERROR FAIL Exited with code 256 during test run
program finished with exit code 0

In a successful run, the webshell keeps getting released, e.g.:

WARNING: recurring into frame construction: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_FrameC] == 0', file ../../dist/include/layout/nsPresContext.h, line 959
WARNING: NS_ENSURE_TRUE(aPresContext->Document()->GetWindow()) failed: file /builds/buildbot/moz2-slave/mozilla-central-macosx-debug/build/content/events/src/nsIMEStateManager.cpp, line 174
--WEBSHELL 0x11292e80 == 6
--WEBSHELL 0x112a3a40 == 5
WARNING: NS_ENSURE_TRUE(sgo || !hasHadScriptObject) failed: file /builds/buildbot/moz2-slave/mozilla-central-macosx-debug/build/content/base/src/nsContentUtils.cpp, line 4372
WARNING: NS_ENSURE_SUCCESS(rv, 0) failed with result 0x8000FFFF: file /builds/buildbot/moz2-slave/mozilla-central-macosx-debug/build/content/base/src/nsContentUtils.cpp, line 2722
--WEBSHELL 0xc230a80 == 4
--DOMWINDOW == 15 (0x115ba8a0) [serial = 7] [outer = 0x0] [url = http://localhost:8888/bloatcycle.html]
--WEBSHELL 0x115b88c0 == 3

So something between that release to 5 and the warning in nsContentUtils is crashing.

The leak boxes build with symbols, right?  Can we get gdb on this? It seems pretty reproducible.
Severity: normal → blocker
The Linux machine may have crashed with the same issue:

###!!! ASSERTION: GetPrimaryFrameFor() called while nsFrameManager is being destroyed!: 'Error', file /builds/moz2_slave/mozilla-central-linux-debug/build/layout/base/nsFrameManager.cpp, line 340
###!!! ASSERTION: GetPrimaryFrameFor() called while nsFrameManager is being destroyed!: 'Error', file /builds/moz2_slave/mozilla-central-linux-debug/build/layout/base/nsFrameManager.cpp, line 340
--WEBSHELL 0x9084cb8 == 5
--DOMWINDOW == 12 (0x955bce8) [serial = 16] [outer = 0x8cc25a8] [url = http://www.mozilla.org/newlayout/samples/test6.html]
--DOMWINDOW == 11 (0x8f2b388) [serial = 15] [outer = 0x8cc25a8] [url = http://www.mozilla.org/newlayout/samples/test8.html]
--DOMWINDOW == 10 (0x8be2998) [serial = 14] [outer = 0x8cc25a8] [url = http://www.mozilla.org/newlayout/samples/test2.html]
--WEBSHELL 0x8b46470 == 4
###!!! ASSERTION: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../../dist/include/xpcom/nsCOMPtr.h, line 796
** (bug-buddy:7090): WARNING **: Couldn't load icon for Open Folder
/usr/libexec/Gecko: No such file or directory.
warning: the debug information found in "/usr/lib/debug//lib/libpthread-2.5.so.debug" does not match "/lib/libpthread.so.0" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//lib/libdl-2.5.so.debug" does not match "/lib/libdl.so.2" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//usr/lib/libpangocairo-1.0.so.0.1400.9.debug" does not match "/usr/lib/libpangocairo-1.0.so.0" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//usr/lib/libpango-1.0.so.0.1400.9.debug" does not match "/usr/lib/libpango-1.0.so.0" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//lib/libm-2.5.so.debug" does not match "/lib/libm.so.6" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//lib/libc-2.5.so.debug" does not match "/lib/libc.so.6" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//lib/ld-2.5.so.debug" does not match "/lib/ld-linux.so.2" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//usr/lib/libpangoft2-1.0.so.0.1400.9.debug" does not match "/usr/lib/libpangoft2-1.0.so.0" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//lib/librt-2.5.so.debug" does not match "/lib/librt.so.1" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//lib/libnsl-2.5.so.debug" does not match "/lib/libnsl.so.1" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//lib/libnss_files-2.5.so.debug" does not match "/lib/libnss_files.so.2" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//usr/lib/gconv/ISO8859-1.so.debug" does not match "/usr/lib/gconv/ISO8859-1.so" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//lib/libresolv-2.5.so.debug" does not match "/lib/libresolv.so.2" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//lib/libutil-2.5.so.debug" does not match "/lib/libutil.so.1" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//usr/lib/gconv/UTF-16.so.debug" does not match "/usr/lib/gconv/UTF-16.so" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//lib/libnss_dns-2.5.so.debug" does not match "/lib/libnss_dns.so.2" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//usr/lib/pango/1.5.0/modules/pango-basic-fc.so.debug" does not match "/usr/lib/pango/1.5.0/modules/pango-basic-fc.so" (CRC mismatch).
dwarf2_read_address: Corrupted DWARF expression.
dwarf2_read_address: Corrupted DWARF expression.
dwarf2_read_address: Corrupted DWARF expression.
dwarf2_read_address: Corrupted DWARF expression.

command timed out: 1200 seconds without output, killing pid 7046
process killed by signal 9
program finished with exit code -1
elapsedTime=1435.576457
=== Output ended ===
======== BuildStep ended ========

But unfortunately, we didn't get a stack trace.  Slightly different spot, but the nsCOMPtr Assertion is definitely a crash.
Changing to reflect that this is cross-platform.

All we need is a stack trace here; I have no idea how to get one.
OS: Mac OS X → All
Summary: 10.5.2 leak build box randomly crashing towards end of alive test → leak build boxes randomly crashing towards end of alive test
I backed out bug 452979, on a hunch that it might have been exposing a crash in regexp tracing; it didn't change anything, so I put it back in.
Does this reproduced on try server?
Assignee: nobody → joduinn
Priority: -- → P1
Sounds like we're triggering bug-buddy there somehow. I think there's an environment variable to turn that off? We do have XPCOM_DEBUG_BREAK=stack-and-abort in the environment,
http://hg.mozilla.org/build/buildbot-configs/file/35f76a9c32c5/mozilla2/config.py#l180

but it's disabled per bug 463681 comment 4. However, I don't know if that matters for crashes at all, to be honest.
Priority: P1 → --
Yeah, the ah_crap_handler is supposed to stackdump for you, but bug buddy must be eating it:
http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/nsSigHandlers.cpp#142
I pushed a change that will (hopefully) get us a stack on Linux:
http://hg.mozilla.org/mozilla-central/rev/5ff478afc3f4
I've given vlad access to bm-xserve19 so he can try to get a stack from gdb.
(In reply to comment #7)
> I pushed a change that will (hopefully) get us a stack on Linux:
> http://hg.mozilla.org/mozilla-central/rev/5ff478afc3f4

Did this problem reproduce on try server?
Priority: -- → P1
Ok, got a stack trace from the OSX box.

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x08b577ca in BindStatementURI (statement=0x0, index=0, aURI=0x12988820) at /builds/moz2_slave/mozilla-central-macosx-debug/build/toolkit/components/places/src/nsNavHistory.cpp:6907
6907	      StringHead(utf8URISpec, HISTORY_URI_LENGTH_MAX));
(gdb) where
#0  0x08b577ca in BindStatementURI (statement=0x0, index=0, aURI=0x12988820) at /builds/moz2_slave/mozilla-central-macosx-debug/build/toolkit/components/places/src/nsNavHistory.cpp:6907
#1  0x08b62ddd in nsNavHistory::FindLastVisit (this=0x1089400, aURI=0x12988820, aVisitID=0xbfffba40, aSessionID=0xbfffbad0) at /builds/moz2_slave/mozilla-central-macosx-debug/build/toolkit/components/places/src/nsNavHistory.cpp:1958
#2  0x08b66de1 in nsNavHistory::AddVisitChain (this=0x1089400, aURI=0x1117fad0, aTime=1226540958531425, aToplevel=1, aIsRedirect=0, aReferrerURI=0x12988820, aVisitID=0xbfffbad8, aSessionID=0xbfffbad0, aRedirectBookmark=0xbfffbae0) at /builds/moz2_slave/mozilla-central-macosx-debug/build/toolkit/components/places/src/nsNavHistory.cpp:5010
#3  0x08b69027 in nsNavHistory::AddURIInternal (this=0x1089400, aURI=0x1117fad0, aTime=1226540958531425, aRedirect=0, aToplevel=1, aReferrer=0x12988820) at /builds/moz2_slave/mozilla-central-macosx-debug/build/toolkit/components/places/src/nsNavHistory.cpp:4881
#4  0x08b7946c in nsNavHistory::CommitLazyMessages (this=0x1089400) at /builds/moz2_slave/mozilla-central-macosx-debug/build/toolkit/components/places/src/nsNavHistory.cpp:5491
#5  0x08b79543 in nsNavHistory::LazyTimerCallback (aTimer=0x105f1cf0, aClosure=0x1089400) at /builds/moz2_slave/mozilla-central-macosx-debug/build/toolkit/components/places/src/nsNavHistory.cpp:5477
#6  0x006222e8 in nsTimerImpl::Fire (this=0x105f1cf0) at /builds/moz2_slave/mozilla-central-macosx-debug/build/xpcom/threads/nsTimerImpl.cpp:420
#7  0x00622520 in nsTimerEvent::Run (this=0x10533b20) at /builds/moz2_slave/mozilla-central-macosx-debug/build/xpcom/threads/nsTimerImpl.cpp:512
#8  0x0061b97a in nsThread::ProcessNextEvent (this=0xc17e20, mayWait=1, result=0xbfffbd5c) at /builds/moz2_slave/mozilla-central-macosx-debug/build/xpcom/threads/nsThread.cpp:510
#9  0x005a4d8a in NS_ProcessNextEvent_P (thread=0xc17e20, mayWait=1) at nsThreadUtils.cpp:227
#10 0x0061c011 in nsThread::Shutdown (this=0xbc675a0) at /builds/moz2_slave/mozilla-central-macosx-debug/build/xpcom/threads/nsThread.cpp:465
#11 0x08e30052 in mozStorageConnection::Close (this=0x99bfb60) at /builds/moz2_slave/mozilla-central-macosx-debug/build/storage/src/mozStorageConnection.cpp:236
#12 0x00636a45 in NS_InvokeByIndex_P (that=0x99bfb60, methodIndex=3, paramCount=0, params=0xbfffc064) at /builds/moz2_slave/mozilla-central-macosx-debug/build/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:179
#13 0x09330fab in XPCWrappedNative::CallMethod (ccx=@0xbfffc2b0, mode=XPCWrappedNative::CALL_METHOD) at /builds/moz2_slave/mozilla-central-macosx-debug/build/js/src/xpconnect/src/xpcwrappednative.cpp:2405
#14 0x0933b6f9 in XPC_WN_CallMethod (cx=0x102ba00, obj=0x9a01860, argc=0, argv=0x12892040, vp=0xbfffc3cc) at /builds/moz2_slave/mozilla-central-macosx-debug/build/js/src/xpconnect/src/xpcwrappednativejsops.cpp:1477
#15 0x0024c255 in js_Invoke (cx=0x102ba00, argc=0, vp=0x12892038, flags=2) at jsinterp.cpp:1306
#16 0x0022845d in js_Interpret (cx=0x102ba00) at /builds/moz2_slave/mozilla-central-macosx-debug/build/js/src/jsinterp.cpp:5147
#17 0x0024c2e6 in js_Invoke (cx=0x102ba00, argc=0, vp=0x12892030, flags=0) at jsinterp.cpp:1324
#18 0x0932aa14 in nsXPCWrappedJSClass::CallMethod (this=0x111f4d80, wrapper=0x111f5fe0, methodIndex=3, info=0x110d438, nativeParams=0xbfffda04) at /builds/moz2_slave/mozilla-central-macosx-debug/build/js/src/xpconnect/src/xpcwrappedjsclass.cpp:1549
#19 0x09322293 in nsXPCWrappedJS::CallMethod (this=0x111f5fe0, methodIndex=3, info=0x110d438, params=0xbfffda04) at /builds/moz2_slave/mozilla-central-macosx-debug/build/js/src/xpconnect/src/xpcwrappedjs.cpp:563
#20 0x00636d28 in PrepareAndDispatch (self=0x111dac50, methodIndex=3, args=0xbfffdb24) at /builds/moz2_slave/mozilla-central-macosx-debug/build/xpcom/reflect/xptcall/src/md/unix/xptcstubs_unixish_x86.cpp:93
#21 0x00636d87 in nsXPTCStubBase::Stub3 (this=0x111dac50) at xptcstubsdef.inc:1
#22 0x0061b97a in nsThread::ProcessNextEvent (this=0xc17e20, mayWait=0, result=0xbfffdbd4) at /builds/moz2_slave/mozilla-central-macosx-debug/build/xpcom/threads/nsThread.cpp:510
#23 0x005a4eec in NS_ProcessPendingEvents_P (thread=0xc17e20, timeout=20) at nsThreadUtils.cpp:180
#24 0x091e2ff7 in nsBaseAppShell::NativeEventCallback (this=0x9949f30) at /builds/moz2_slave/mozilla-central-macosx-debug/build/widget/src/xpwidgets/nsBaseAppShell.cpp:121
#25 0x0919be9c in nsAppShell::ProcessGeckoEvents (aInfo=0x9949f30) at /builds/moz2_slave/mozilla-central-macosx-debug/build/widget/src/cocoa/nsAppShell.mm:373
#26 0x94a16678 in CFRunLoopRunSpecific ()
#27 0x94a16d18 in CFRunLoopRunInMode ()
#28 0x91dac6a0 in RunCurrentEventLoopInMode ()
#29 0x91dac4b9 in ReceiveNextEventCommon ()
#30 0x91dac32d in BlockUntilNextEventMatchingListInMode ()
#31 0x92d4d7d9 in _DPSNextEvent ()
#32 0x92d4d08e in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#33 0x92d460c5 in -[NSApplication run] ()
#34 0x0919a75e in nsAppShell::Run (this=0x9949f30) at /builds/moz2_slave/mozilla-central-macosx-debug/build/widget/src/cocoa/nsAppShell.mm:692
#35 0x08eb3502 in nsAppStartup::Run (this=0xc54b20) at /builds/moz2_slave/mozilla-central-macosx-debug/build/toolkit/components/startup/src/nsAppStartup.cpp:192
#36 0x000e4d3f in XRE_main (argc=6, argv=0xbffff21c, aAppData=0xc0e800) at /builds/moz2_slave/mozilla-central-macosx-debug/build/toolkit/xre/nsAppRunner.cpp:3263
#37 0x000027cb in main (argc=7, argv=0xbffff21c) at /builds/moz2_slave/mozilla-central-macosx-debug/build/browser/app/nsBrowserApp.cpp:156

Note that I forgot to get a JS stack, but statement is NULL -- this is mDBRecentVisitOfURL, at http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistory.cpp#1958 .  If you look earlier up the stack though, we are closing a mozStorageConnection, and then looks like maybe shutting down a thread (maybe joining?)... which is processing events, one of which wants to touch this code.

That statement is removed in FinalizeStatements, at http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/nsNavHistory.cpp#7485 which is called from FinalizeInternalStatements, which only seems to be called from JS.
Assignee: joduinn → nobody
Component: Release Engineering → Places
Product: mozilla.org → Toolkit
QA Contact: release → places
Summary: leak build boxes randomly crashing towards end of alive test → nsNavHistory trying to access already-deleted statement [tinderbox leak box random crash]
Target Milestone: --- → mozilla1.9.1b2
Version: other → Trunk
I suggest we back out changeset 3ed4b74c493f when the tree reopens (bug 463471)
That was checked in well after the crashes first started happening; it's possible that it's a different crash though, because we weren't getting the nsCOMPtr warning originally (though I didn't go through and check every log before).  That checkin is consistent with the crash happening on linux (I think -- tinderbox is down atm, so can't check).
(In reply to comment #9) (and comment #4)
> Did this problem reproduce on try server?

It was only happening on the leak test boxes, which are both the only things building debug and the only things enabling trace-refcnt, neither of which the try server does, so I'd bet that's a "no."
Started happening on win32 now; I don't think 3ed4 is the cause, though it might be making things worse.
backed out bug 463471. mac went green, windows is not done yet.
and windows is green as well. leaving open for now in case the crashes mentioned by vlad that occurred prior to the check-in of bug 463471 continue.
#4  0x08b7946c in nsNavHistory::CommitLazyMessages (this=0x1089400) at
/builds/moz2_slave/mozilla-central-macosx-debug/build/toolkit/components/places/src/nsNavHistory.cpp:5491
#5  0x08b79543 in nsNavHistory::LazyTimerCallback (aTimer=0x105f1cf0,

this is most likely the cause of calls to nulled out statements, the timer fires after we are closed. Previous crashes are not due to this but could be due to the big outage from yesterday (nas failures and so on)?
bug 463471 has been checked in again with fixes and the problem appear solved, can we close this?
Depends on: 463471
resolving through bug 463471, if you find something that still needs to be addressed, feel free to reopen and clarify please.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.