Closed Bug 626768 Opened 13 years ago Closed 13 years ago

Startup crash [@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run() ][@ mozalloc_abort(char const* const) | NS_DebugBreak_P | nsCycleCollectorGCHookRunnable::Run() ]

Categories

Product/Component: Core :: XPCOM
Type: defect
Hardware: x86
OS: All
Priority: Not set
Severity: critical

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla2.0b11
Tracking Status: blocking2.0: final+

People

(Reporter: benjamin, Assigned: bent.mozilla)

Details

(Keywords: crash, Whiteboard: [hardblocker][has patch], fixed-in-tracemonkey)

Attachments

(1 file, 5 obsolete files)

Gleaned from crash reports:
http://crash-stats.mozilla.com/report/index/5e3ef010-11e0-455c-8f2b-90aaa2110118

Early in shutdown, we're hitting an NS_RUNTIMEABORT in the cycle collector code:

 	mozalloc.dll!mozalloc_abort(msg=0x0045f5d8)  Line 77	C++
 	xul.dll!NS_DebugBreak_P(aSeverity=0x00000003, aStr=0x6f3db824, aExpr=0x00000000, aFile=0x6f3db5c8, aLine=0x00000d5f)  Line 367	C++
 	xul.dll!nsCycleCollectorGCHookRunnable::Run() 	C++
>	xul.dll!nsThread::ProcessNextEvent(mayWait=0x7fe21321, result=0x6e936248)  Line 639	C++
 	xul.dll!nsRefPtr<nsIDOMEventListener>::~nsRefPtr<nsIDOMEventListener>()  Line 970	C++
 	xul.dll!nsSocketTransportService::Shutdown()  Line 472	C++
 	xul.dll!nsIOService::SetOffline(offline=0x00000001)  Line 750	C++
 	xul.dll!nsIOService::Observe(subject=0x0092b2b8, topic=0x6f2c7c50, data=0x6f2c7c2c)  Line 934	C++
 	xul.dll!nsObserverList::NotifyObservers(aSubject=0x0092b2b8, aTopic=0x00000000, someData=0x6f2c7c2c)  Line 130	C++
 	xul.dll!nsObserverService::NotifyObservers(aSubject=0x0092b2b8, aTopic=0x6f2c7c50, someData=0x6f2c7c2c)  Line 182	C++
 	xul.dll!nsXREDirProvider::DoShutdown()  Line 795	C++
 	xul.dll!ScopedXPCOMStartup::~ScopedXPCOMStartup()  Line 1115	C++
 	xul.dll!XRE_main(argc=0x00000001, argv=0x0092b0a8, aAppData=0x00915300) 	C++

The abort is this one: http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsCycleCollector.cpp#3420

The notification is profile-change-teardown, which happens before the xpcom-shutdown process, so the service manager is still alive and should be responding to requests normally.

This doesn't appear to be high-volume enough to block on, but it's worth having on record.
Keywords: crash
Hardware: x86_64 → x86
OS: Linux → All
This is now the number 1 topcrasher on b11pre.  I'd say we should block on it.
blocking2.0: --- → ?
Keywords: topcrash
I don't think it's actually #1.  This signature is for a bunch of NS_RUNTIMEABORTs, including the nasty font one that is the topcrasher on b10.  The number that are CC related looks pretty low.
Indeed, this bug is not a topcrash, and the font topcrash with the same signature is already blocking.
blocking2.0: ? → -
Keywords: topcrash
Summary: NS_RUNTIMEABORT crashes in cycle collector [@ mozcrt19.dll@0x1327f ] → NS_RUNTIMEABORT crashes in cycle collector [@ mozcrt19.dll@0x1327f ][@ mozalloc_abort(char const* const) ]
Summary: NS_RUNTIMEABORT crashes in cycle collector [@ mozcrt19.dll@0x1327f ][@ mozalloc_abort(char const* const) ] → NS_RUNTIMEABORT crashes in cycle collector [@ mozalloc_abort(char const* const) ][@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f ] (was [@ mozcrt19.dll@0x1327f ])
After the first day of beta 11, the signature still showed up as the #2 top crash, and about 50% of the crashes appeared related to nsCycleCollector, plus maybe networking, on the stack. Splitting the signature up into multiple problems, that looks like the first thing to investigate. We might get a better idea with a different sample on the second day of beta 11, but here is the current distribution.

Distribution of 20 different stacks, looking at the top 10 frames:
      7  stacks like
0|0|mozalloc.dll|mozalloc_abort(char const * const)
0|1|mozcrt19.dll|
0|2|xul.dll|nsCycleCollectorGCHookRunnable::Run()
0|3|xul.dll|nsThread::ProcessNextEvent(int,int *)
0|4|xul.dll|nsCOMPtr_base::~nsCOMPtr_base()
0|5|xul.dll|nsThread::Shutdown()
0|6|nspr4.dll|
0|7|nspr4.dll|PR_AssertCurrentThreadOwnsLock
0|8|xul.dll|nsSocketTransportService::Shutdown()
0|9|xul.dll|nsDNSService::Shutdown()

      5  stacks like
0|0|mozalloc.dll|mozalloc_abort(char const * const)
0|1|mozcrt19.dll|
0|2|xul.dll|nsCycleCollectorGCHookRunnable::Run()
0|3|xul.dll|nsThread::ProcessNextEvent(int,int *)
0|4|xul.dll|nsCOMPtr_base::~nsCOMPtr_base()
0|5|xul.dll|nsThread::Shutdown()
0|6|nspr4.dll|
0|7|xul.dll|nsSocketTransportService::Shutdown()
0|8|xul.dll|nsDNSService::Shutdown()
0|9|xul.dll|nsIOService::SetOffline(int)

      1  stacks like
0|0|mozalloc.dll|mozalloc_abort(char const * const)
0|1|mozcrt19.dll|
0|2|xul.dll|mozilla::plugins::PPluginScriptableObjectChild::FatalError(char const * const)
0|3|xul.dll|mozilla::plugins::PPluginScriptableObjectChild::CallInvoke(mozilla::plugins::PPluginIdentifierChild *,InfallibleTArray<mozilla::plugins::Variant> const &,mozilla::plugins::Variant *,bool *)
0|4|xul.dll|mozilla::plugins::PluginScriptableObjectChild::ScriptableInvoke(NPObject *,void *,_NPVariant const *,unsigned int,_NPVariant *)
0|5|xul.dll|mozilla::plugins::child::_invoke
0|6|npCoralIETab.dll|
0|7|npCoralIETab.dll|
0|8|npCoralIETab.dll|
0|9|oleaut32.dll|

      1  stacks like
0|0|mozalloc.dll|mozalloc_abort(char const * const)
0|1|mozalloc.dll|mozalloc_handle_oom()
0|2|xul.dll|nsXPCWrappedJSClass::CallQueryInterfaceOnJSObject(XPCCallContext &,JSObject *,nsID const &)
0|3|xul.dll|nsXPCWrappedJS::GetNewOrUsed(XPCCallContext &,JSObject *,nsID const &,nsISupports *,nsXPCWrappedJS * *)
0|4|xul.dll|XPCConvert::JSObject2NativeInterface(XPCCallContext &,void * *,JSObject *,nsID const *,nsISupports *,unsigned int *)
0|5|xul.dll|XPCConvert::JSData2Native(XPCCallContext &,void *,unsigned __int64,nsXPTType const &,int,nsID const *,unsigned int *)
0|6|xul.dll|nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS *,unsigned short,XPTMethodDescriptor const *,nsXPTCMiniVariant *)
0|7|xul.dll|nsXPCWrappedJS::CallMethod(unsigned short,XPTMethodDescriptor const *,nsXPTCMiniVariant *)
0|8|xul.dll|PrepareAndDispatch
0|9|xul.dll|SharedStub

      1  stacks like
0|0|mozalloc.dll|mozalloc_abort(char const * const)
0|1|mozalloc.dll|mozalloc_handle_oom()
0|2|xul.dll|nsTimerImplConstructor
0|3|xul.dll|mozilla::GenericFactory::CreateInstance(nsISupports *,nsID const &,void * *)
0|4|xul.dll|nsComponentManagerImpl::CreateInstanceByContractID(char const *,nsISupports *,nsID const &,void * *)
0|5|xul.dll|nsCreateInstanceByContractID::operator()(nsID const &,void * *)
0|6|xul.dll|
0|7|xul.dll|
0|8|xul.dll|nsExpirationTracker<gfxFont,3>::AddObject(gfxFont *)
0|9|xul.dll|gfxFont::NotifyReleased()

      1  stacks like
0|0|mozalloc.dll|mozalloc_abort(char const * const)
0|1|mozalloc.dll|mozalloc_handle_oom()
0|2|xul.dll|mozilla::plugins::PluginInstanceChild::CreateOptSurface()
0|3|xul.dll|mozilla::plugins::PluginInstanceChild::EnsureCurrentBuffer()
0|4|xul.dll|mozilla::plugins::PluginInstanceChild::ShowPluginFrame()
0|5|xul.dll|mozilla::plugins::PluginInstanceChild::InvalidateRectDelayed()
0|6|xul.dll|MessageLoop::RunTask(Task *)
0|7|xul.dll|MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const &)
0|8|xul.dll|MessageLoop::DoWork()
0|9|xul.dll|base::MessagePumpForUI::DoRunLoop()

      1  stacks like
0|0|mozalloc.dll|mozalloc_abort(char const * const)
0|1|mozalloc.dll|mozalloc_handle_oom()
0|2|xul.dll|mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems(nsDisplayList const &,mozilla::FrameLayerBuilder::Clip &)
0|3|xul.dll|mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems(nsDisplayList const &,mozilla::FrameLayerBuilder::Clip &)
0|4|xul.dll|mozilla::FrameLayerBuilder::BuildContainerLayerFor(nsDisplayListBuilder *,mozilla::layers::LayerManager *,nsIFrame *,nsDisplayItem *,nsDisplayList const &)
0|5|xul.dll|nsDisplayOwnLayer::BuildLayer(nsDisplayListBuilder *,mozilla::layers::LayerManager *)
0|6|xul.dll|mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems(nsDisplayList const &,mozilla::FrameLayerBuilder::Clip &)
0|7|xul.dll|mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems(nsDisplayList const &,mozilla::FrameLayerBuilder::Clip &)
0|8|xul.dll|mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems(nsDisplayList const &,mozilla::FrameLayerBuilder::Clip &)
0|9|xul.dll|mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems(nsDisplayList const &,mozilla::FrameLayerBuilder::Clip &)

      1  stacks like
0|0|mozalloc.dll|mozalloc_abort(char const * const)
0|1|mozalloc.dll|mozalloc_handle_oom()
0|2|xul.dll|mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems(nsDisplayList const &,mozilla::FrameLayerBuilder::Clip &)
0|3|xul.dll|mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems(nsDisplayList const &,mozilla::FrameLayerBuilder::Clip &)
0|4|xul.dll|mozilla::FrameLayerBuilder::BuildContainerLayerFor(nsDisplayListBuilder *,mozilla::layers::LayerManager *,nsIFrame *,nsDisplayItem *,nsDisplayList const &)
0|5|xul.dll|nsDisplayList::PaintForFrame(nsDisplayListBuilder *,nsIRenderingContext *,nsIFrame *,unsigned int)
0|6|xul.dll|nsLayoutUtils::PaintFrame(nsIRenderingContext *,nsIFrame *,nsRegion const &,unsigned int,unsigned int)
0|7|xul.dll|PresShell::Paint(nsIView *,nsIView *,nsIWidget *,nsRegion const &,nsIntRegion const &,int,int)
0|8|xul.dll|nsViewManager::RenderViews(nsView *,nsIWidget *,nsRegion const &,nsIntRegion const &,int,int)
0|9|xul.dll|nsViewManager::Refresh(nsView *,nsIWidget *,nsIntRegion const &,unsigned int)

      1  stacks like
0|0|mozalloc.dll|mozalloc_abort(char const * const)
0|1|mozalloc.dll|mozalloc_handle_oom()
0|2|xul.dll|mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems(nsDisplayList const &,mozilla::FrameLayerBuilder::Clip &)
0|3|xul.dll|mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems(nsDisplayList const &,mozilla::FrameLayerBuilder::Clip &)
0|4|xul.dll|mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems(nsDisplayList const &,mozilla::FrameLayerBuilder::Clip &)
0|5|xul.dll|mozilla::FrameLayerBuilder::BuildContainerLayerFor(nsDisplayListBuilder *,mozilla::layers::LayerManager *,nsIFrame *,nsDisplayItem *,nsDisplayList const &)
0|6|xul.dll|nsDisplayOwnLayer::BuildLayer(nsDisplayListBuilder *,mozilla::layers::LayerManager *)
0|7|xul.dll|mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems(nsDisplayList const &,mozilla::FrameLayerBuilder::Clip &)
0|8|xul.dll|mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems(nsDisplayList const &,mozilla::FrameLayerBuilder::Clip &)
0|9|xul.dll|mozilla::`anonymous namespace'::ContainerState::ProcessDisplayItems(nsDisplayList const &,mozilla::FrameLayerBuilder::Clip &)

      1  stacks like
0|0|mozalloc.dll|mozalloc_abort(char const * const)
0|1|mozalloc.dll|mozalloc_handle_oom()
0|2|xul.dll|`anonymous namespace'::CSSParserImpl::ParseSelectorGroup(nsCSSSelectorList * &)
0|3|xul.dll|`anonymous namespace'::CSSParserImpl::ParseSelectorList(nsCSSSelectorList * &,unsigned short)
0|4|xul.dll|`anonymous namespace'::CSSParserImpl::ParseRuleSet(void (*)(nsICSSRule *,void *),void *,int)
0|5|xul.dll|`anonymous namespace'::CSSParserImpl::ParseGroupRule(nsICSSGroupRule *,void (*)(nsICSSRule *,void *),void *)
0|6|xul.dll|`anonymous namespace'::CSSParserImpl::ParseMozDocumentRule(void (*)(nsICSSRule *,void *),void *)
0|7|xul.dll|`anonymous namespace'::CSSParserImpl::ParseAtRule(void (*)(nsICSSRule *,void *),void *)
0|8|xul.dll|`anonymous namespace'::CSSParserImpl::Parse(nsIUnicharInputStream *,nsIURI *,nsIURI *,nsIPrincipal *,unsigned int,int)
0|9|xul.dll|mozilla::css::Loader::ParseSheet(nsIUnicharInputStream *,mozilla::css::SheetLoadData *,int &)
All of the ones that crash in the cycle collector runnable seem to have extremely low uptimes.
I filed bug 633119 to differentiate the different kinds of crashes.
I put an updated list, with a larger sample of 100 reports from 2011-02-08, at

http://people.mozilla.org/crash_stacks/reports/stack-summary-mozalloc_abort.char.const..const..txt

I guess the next step is to start spinning off bugs for at least a few of these top stacks.

The top cycle collector problems in the reports closely match the stack in comment 0 of this bug, so maybe this becomes the tracking bug.
This *is* one of the spinoff bugs. Please put your generic analysis in bug 627727 and not this bug.
Blocks: 627727
It seems pretty clear that we do have a bug here. It looks as if we are shutting down XPCOM without ever spinning the event loop (which can happen in various startup situations). When this happens, the cycle-collector startup event runs during shutdown. This means that when we ask for the js runtime service it fails, and we're aborting.

The observer topic that's being fired, oddly enough, is the profile-change-net-teardown topic, which happens well before XPCOM refuses to create services.

See https://crash-stats.mozilla.com/report/index/a53976b3-2d88-412b-b2be-fa2f62110210

It's possible that recursive layout-module init problems would prevent the js runtime service from being created, which would prevent XRE_main from actually getting going, which would then cause this bug. In which case the browser wouldn't work anyway...

We could of course just remove the abort.
I am nominating this one for 2.0, since there's definitely a spike in these crashes. bsmedberg, do we need to file a separate bug based on your comment #9?
blocking2.0: - → ?
So the runnable we're running is one that's only created once:  during XPCOM startup.

When we're crashing, we're already in shutdown but we haven't processed the event yet.

I think that means one of two things (although maybe there are more):

 (1) we shut down *immediately*, perhaps due to failure during startup

 (2) we were in some sort of nested event loop situation that prevented the event from ever being run (and thus cycle collector from ever running)

Given the uptimes, (1) seems more likely.
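The ordering problem described in comments 9 and 11 can be illustrated with a deliberately simplified C++ model. Everything in it is hypothetical: `ServiceRegistry`, `EventQueue`, `HookResult` and `SimulateRun` are not Gecko APIs, and the model simply assumes that service lookups fail once shutdown has begun (comment 9 points out that the topic actually being fired precedes the point where XPCOM refuses to create services, so the real failure point is less clear). It only shows how a one-shot event dispatched at startup ends up running during shutdown in scenario (1).

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical, heavily simplified model; none of these types exist in Gecko.

// Service lookup that starts failing once shutdown has begun (an assumption).
struct ServiceRegistry {
    bool shuttingDown = false;
    std::map<std::string, int> services;  // service name -> dummy handle

    const int* GetService(const std::string& name) const {
        if (shuttingDown) return nullptr;
        auto it = services.find(name);
        return it == services.end() ? nullptr : &it->second;
    }
};

// FIFO of pending runnables for a single thread.
struct EventQueue {
    std::deque<std::function<void()>> events;

    void Dispatch(std::function<void()> ev) { events.push_back(std::move(ev)); }

    // Runs every pending event, in order.
    void ProcessPendingEvents() {
        while (!events.empty()) {
            std::function<void()> ev = std::move(events.front());
            events.pop_front();
            ev();
        }
    }
};

enum class HookResult { NotRun, Ok, WouldAbort };

// Dispatches a one-shot "GC hook" at startup, then shuts down. If the event
// loop never spins before shutdown, the hook first runs while pending events
// are drained during shutdown, and its service lookup fails.
HookResult SimulateRun(bool spinEventLoopBeforeShutdown) {
    ServiceRegistry registry;
    registry.services["js-runtime"] = 1;
    EventQueue queue;
    HookResult result = HookResult::NotRun;

    queue.Dispatch([&registry, &result] {
        result = registry.GetService("js-runtime") ? HookResult::Ok
                                                   : HookResult::WouldAbort;
    });

    if (spinEventLoopBeforeShutdown) {
        queue.ProcessPendingEvents();  // normal startup: the hook runs here
    }
    registry.shuttingDown = true;      // shutdown begins
    queue.ProcessPendingEvents();      // leftover events drain during shutdown
    return result;
}
```

`SimulateRun(true)` corresponds to a normal startup in which the event loop spins at least once and the hook finds the service; `SimulateRun(false)` corresponds to shutting down immediately, where the hook would hit the abort.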



You can find a bunch of the crashes that are *this* bug at:
https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A4.0b11&query_search=signature&query_type=exact&query=&date=03%2F05%2F2011%2016%3A56%3A11&range_value=28&range_unit=days&hang_type=crash&process_type=browser&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=mozalloc_abort%28char%20const*%20const%29%20|%20mozcrt19.dll%400x1327f%20|%20nsCycleCollectorGCHookRunnable%3A%3ARun%28%29
although I suspect the skiplist state is going to change again (I commented in bug 633119).
It seems entirely possible that these are cases where NS_InitXPCOM2 failed.  We'd know for sure if we actually had debugging info for XRE_main (any idea why we don't?).
Is it ok if we change the failure cases in NS_InitXPCOM2 to be NS_RUNTIMEABORT calls instead of failure returns?
Summary: NS_RUNTIMEABORT crashes in cycle collector [@ mozalloc_abort(char const* const) ][@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f ] (was [@ mozcrt19.dll@0x1327f ]) → NS_RUNTIMEABORT crashes in cycle collector [@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run() ] [@ mozalloc_abort(char const* const) | nsCycleCollectorGCHookRunnable::Run() ]
(In reply to comment #11)
> So the runnable we're running is one that's only created once:  during XPCOM
> startup.
> 
> When we're crashing, we're already in shutdown but we haven't processed the
> event yet.

Yeah, I made this a NS_RUNTIMEABORT because I couldn't think of any way that this scenario (event created at startup can't access the JS runtime) would be possible without us wanting to abort.
Summary: NS_RUNTIMEABORT crashes in cycle collector [@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run() ] [@ mozalloc_abort(char const* const) | nsCycleCollectorGCHookRunnable::Run() ] → NS_RUNTIMEABORT crashes in cycle collector [@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run() ]
The XRE_main issues appear to be PGO optimizing away the bits we care about so that we don't have line numbers. Or perhaps because we're in the function epilog running destructors, it can't find a real line number.

I'd like to just remove the NS_RUNTIMEABORT for now...

smooney, are you sure there's a spike? We just changed the signature generation so that mozalloc_abort was broken down.
If you're in a cold function block, VC2005 doesn't emit line number info in the PDB files for the instructions there. It's fixed in VC2010, but there's nothing we can do about it currently.
bsmedberg - there has definitely been an uptick in this type of crash overall, although all kinds of stacks are causing it. We need to try to get a handle on it. I did a quick search and there were 34 of these in Beta9. This is now consistently in the top 3 for Betas. I will try to get chofmann to run some kind of report for this signature. It seems to me like we are seeing more of these signatures but let's get some better data, perhaps comparing Beta10 and Beta11.

chofmann, any chance you can put together a trend report for nsCycleCollectorGCHookRunnable::Run()? Specifically, is there a jump between Beta9, Beta10 and Beta11? The BuiltFontList bug may have polluted the stats.
> It seems to me like we are seeing more of these signatures but let's get some 
> better data, perhaps comparing Beta10 and Beta11.
It is expected, because of the fix for bug 633119. We currently have only one day of feedback.
Removing the NS_RUNTIMEABORT seems bad; if users are in a situation where Firefox won't start, we're better off getting a crash report about it than not getting one and just having them silently fail to start.
Scoobidiver - sorry, what's normal?
And if we ever get into a situation in the future where this happens outside of shutdown, silently not starting up the cycle collector seems pretty bad too.
(In reply to comment #19)
> Removing the NS_RUNTIMEABORT seems bad; if users are in a situation where
> Firefox won't start, we're better off getting a crash report about it than not
> getting one and just having them silently fail to start.

I'm in full agreement. The bug is elsewhere.
In reply to comment 20
> Scoobidiver - sorry, what's normal.
It appears as a new crash signature from 2/10/2011 11:00PST (see https://bugzilla.mozilla.org/show_bug.cgi?id=633119#c10) whatever the build (b9, b10, b11, b12pre). The previous crash signature was mozalloc_abort(char const* const).
There have been around 450 crashes in one day, which means about 1350 crashes per 3 days, so within 2 days it will be the #1 or #2 top crasher over the last 3 days.

It is a startup crash.
Summary: NS_RUNTIMEABORT crashes in cycle collector [@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run() ] → NS_RUNTIMEABORT crashes in cycle collector [@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run() ] close to startup
bent/dbaron, what do you suggest we do to solve the underlying problem, then? Without STR, we could try sprinkling runtime aborts into a bunch of places, but I really don't think that NS_InitXPCOM is failing. I suspect that the layout module ctor is failing, perhaps because of recursive reentry: I know we've seen that in the past.
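The suspected recursive reentry into the layout module constructor is the kind of thing a simple reentrancy guard catches, which is roughly what the title of the diagnostic patch, attachment 511816 ("Abort on recursive layout-module init"), suggests. That patch is not reproduced here. `ModuleInitGuard` and its injectable fatal handler are hypothetical; the handler stands in for a runtime abort so the behavior can be observed without crashing.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical reentrancy guard for a module initializer. The fatal handler
// is injectable; in a real build it would be a runtime abort.
class ModuleInitGuard {
public:
    using FatalHandler = std::function<void(const std::string&)>;

    explicit ModuleInitGuard(FatalHandler onFatal) : onFatal_(std::move(onFatal)) {}

    // Runs `body` as the initializer. If `body` (directly or indirectly)
    // calls Init again, the nested call reports a fatal error and returns
    // false instead of re-running initialization.
    bool Init(const std::function<void()>& body) {
        if (initializing_) {
            onFatal_("recursive module initialization");
            return false;
        }
        initializing_ = true;
        body();
        initializing_ = false;
        initialized_ = true;
        return true;
    }

    bool Initialized() const { return initialized_; }

private:
    FatalHandler onFatal_;
    bool initializing_ = false;
    bool initialized_ = false;
};
```

The point of aborting (rather than returning an error) is the one made later in this bug: a crash report tells us where startup went wrong, whereas a quiet failure return is invisible.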
Assignee: nobody → benjamin
Status: NEW → ASSIGNED
Attachment #511816 - Flags: review?(bent.mozilla)
I'd probably ship with this, but perhaps we can figure it out. The patch I just attached needs to land for a beta so we can figure out whether it's being hit in the wild.
blocking2.0: ? → final+
Whiteboard: [softblocker]
Comment on attachment 511816 [details] [diff] [review]
Abort on recursive layout-module init, rev. 1

Fingers crossed!
Attachment #511816 - Flags: review?(bent.mozilla) → review+
> so it will be #1-2 top crasher over the last 3 days in 2 days.
It was beaten by other top changers, so it is #5 top crasher in 4.0b11 over the last week.

> The patch I just attached needs to land for a beta so we can figure out whether
> it's being hit in the wild.
check-in needed?
The patch here landed Friday: http://hg.mozilla.org/mozilla-central/rev/b59521d4350d
The latest crashes took place in 4.0b12pre/20110212.
It seems to be fixed.
We should be seeing a new crash signature, something like mozalloc_abort | Initialize from the new runtime abort which has replaced the other one, but I can't find it...
> We should be seeing a new crash signature
Something like that:
https://crash-stats.mozilla.com/report/list?product=Firefox&range_value=4&range_unit=weeks&signature=mozalloc_abort%28char%20const*%20const%29%20|%20MOZCRT19.dll%400x1327f
The skip list is case-sensitive: MOZCRT19.dll is not skipped.
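Why the case of the module name changes the signature can be shown with a simplified model of signature generation. The rule encoded here is an assumption, not crash-stats' actual code (which uses regular expressions and more lists): frames matching an entry in the prefix/skip list are kept and the walk continues, and the first frame that matches nothing ends the signature. `MakeSignature` and `StartsWith` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// True if `s` begins with `prefix`, optionally ignoring ASCII case.
static bool StartsWith(const std::string& s, const std::string& prefix,
                       bool ignoreCase) {
    if (prefix.size() > s.size()) return false;
    return std::equal(prefix.begin(), prefix.end(), s.begin(),
                      [ignoreCase](char a, char b) {
                          if (!ignoreCase) return a == b;
                          return std::tolower(static_cast<unsigned char>(a)) ==
                                 std::tolower(static_cast<unsigned char>(b));
                      });
}

// Joins frames with " | ". Frames matching the prefix list are appended and
// the walk continues; the first non-matching frame is appended and ends it.
std::string MakeSignature(const std::vector<std::string>& frames,
                          const std::vector<std::string>& prefixList,
                          bool ignoreCase) {
    std::string signature;
    for (const std::string& frame : frames) {
        if (!signature.empty()) signature += " | ";
        signature += frame;
        bool isPrefix = std::any_of(
            prefixList.begin(), prefixList.end(),
            [&](const std::string& p) { return StartsWith(frame, p, ignoreCase); });
        if (!isPrefix) break;
    }
    return signature;
}
```

With a prefix list of `mozalloc_abort` and `mozcrt19.dll`, case-sensitive matching stops at `MOZCRT19.dll@0x1327f` and yields `mozalloc_abort(char const* const) | MOZCRT19.dll@0x1327f`; case-insensitive matching continues to `nsCycleCollectorGCHookRunnable::Run()`, so both spellings would fall under one signature.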
Summary: NS_RUNTIMEABORT crashes in cycle collector [@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run() ] close to startup → NS_RUNTIMEABORT crashes in cycle collector [@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run() ][@ mozalloc_abort(char const* const) | MOZCRT19.dll@0x1327f ] close to startup
Crash reports with "App Notes" that contain

xpcom_runtime_abort(###!!! ABORT: This must never fail!: file e:/builds/moz2_slave/cen-win64-ntly/build/xpcom/base/nsCycleCollector.cpp, line 3429)

are probably this bug.
The distribution of signatures for reports whose App Notes contain that message looks like this for the past 5 days:

 866 20110213-crashdata.csv:mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run()
 814 20110214-crashdata.csv:mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run()
 579 20110211-crashdata.csv:mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run()
 451 20110212-crashdata.csv:mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run()
 217 20110210-crashdata.csv:mozalloc_abort(char const* const)
 215 20110210-crashdata.csv:mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run()
  56 20110214-crashdata.csv:\N
  49 20110212-crashdata.csv:\N
  32 20110211-crashdata.csv:\N
  13 20110213-crashdata.csv:\N
  13 20110210-crashdata.csv:\N
  11 20110213-crashdata.csv:mozalloc_abort(char const* const) | MOZCRT19.dll@0x1327f
   9 20110214-crashdata.csv:mozalloc_abort(char const* const) | MOZCRT19.DLL@0x1327f
   9 20110212-crashdata.csv:mozalloc_abort(char const* const) | MOZCRT19.dll@0x1327f
   8 20110213-crashdata.csv:mozalloc_abort(char const* const) | NS_DebugBreak_P
   6 20110214-crashdata.csv:linux-gate.so@0x416
   6 20110210-crashdata.csv:mozalloc_abort(char const* const) | MOZCRT19.dll@0x1327f
   4 20110212-crashdata.csv:mozalloc.dll@0x1a39
   4 20110210-crashdata.csv:linux-gate.so@0x416
   3 20110211-crashdata.csv:mozalloc_abort(char const* const) | MOZCRT19.dll@0x1327f
   2 20110214-crashdata.csv:mozalloc_abort(char const* const) | NS_DebugBreak_P
   2 20110214-crashdata.csv:mozalloc_abort(char const* const) | MOZCRT19.dll@0x1327f
   2 20110213-crashdata.csv:linux-gate.so@0x422
   2 20110213-crashdata.csv:NS_StackWalk
   2 20110210-crashdata.csv:mozalloc.dll@0x1a39
   1 20110214-crashdata.csv:mozalloc.dll@0x1a39
   1 20110213-crashdata.csv:linux-gate.so@0x416
   1 20110210-crashdata.csv:mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | xul.dll@0x3ae087
   1 20110210-crashdata.csv:mozalloc_abort(char const* const) | NS_DebugBreak_P
   1 20110210-crashdata.csv:mozalloc_abort
   1 20110210-crashdata.csv:linux-gate.so@0x422
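The per-day counts above look like the output of a text search over daily CSV dumps. A minimal C++ sketch of the same counting step might look like the following, assuming each report is available as a hypothetical `CrashReport` record holding its signature and App Notes as plain strings.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical report record; field names are assumptions.
struct CrashReport {
    std::string signature;
    std::string appNotes;
};

// Counts reports per signature, keeping only those whose App Notes contain
// `marker` (e.g. the "This must never fail!" abort message).
std::map<std::string, int> CountBySignature(const std::vector<CrashReport>& reports,
                                            const std::string& marker) {
    std::map<std::string, int> counts;
    for (const CrashReport& report : reports) {
        if (report.appNotes.find(marker) != std::string::npos) {
            ++counts[report.signature];
        }
    }
    return counts;
}
```

Matching on a substring of the abort message rather than the whole App Notes line keeps reports from different build machines (whose source paths differ) in the same bucket.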
Attachment #512818 - Flags: review?
Attachment #512818 - Flags: review?(mrbkap)
Attachment #512818 - Flags: review?(bent.mozilla)
Attachment #512818 - Flags: review?
Comment on attachment 512818 [details] [diff] [review]
Abort on failure to create XPCJSRuntime, rev. 1

I like it!
Attachment #512818 - Flags: review?(bent.mozilla) → review+
Comment on attachment 512818 [details] [diff] [review]
Abort on failure to create XPCJSRuntime, rev. 1

I'm fine with the patch as a whole, but a couple of nits:

Instead of anything resembling K&R, XPConnect style is
if(blah)
{
}
else
{
}

note no space between the if and its parenthesis.
Attachment #512818 - Flags: review?(mrbkap) → review+
Argh, the diagnostic already landed as http://hg.mozilla.org/mozilla-central/rev/404386c7d40f

I'll fix the nits tomorrow if it turns out to be a useful permanent diagnostic.
Since the Feb 16 build, I don't see any crashes, whatever the crash signature.
We should get NS_DebugBreak_P on the append list, right?
> We should get NS_DebugBreak_P on the append list, right?
See bug 635483.
This has now moved up to #2 crash for Beta11.
Based on the comments it looks like the people who are hitting this are hitting it constantly.
Depends on: 635483
Renominating based on that data. Boy, I have no clue what's going on here, since the latest set of debug aborts should have caught everything I could think of.
blocking2.0: final+ → ?
Whiteboard: [softblocker]
blocking2.0: ? → final+
Whiteboard: [hardblocker]
bent and I could look at it if you want.
Yeah, I'll take this.
Assignee: benjamin → bent.mozilla
Target Milestone: --- → mozilla2.0b11
Summary: NS_RUNTIMEABORT crashes in cycle collector [@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run() ][@ mozalloc_abort(char const* const) | MOZCRT19.dll@0x1327f ] close to startup → Crashes in cycle collector [@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run() ][@ mozalloc_abort(char const* const) | MOZCRT19.dll@0x1327f ][@ mozalloc_abort(char const* const) | NS_DebugBreak_P ]
Severity: normal → critical
Attached patch Patch, v1 (obsolete)
Ok, this moves all the logic that ensures GC has run before CC from the CC to XPConnect.
Attachment #511816 - Attachment is obsolete: true
Attachment #512818 - Attachment is obsolete: true
Attachment #514332 - Flags: review?(jst)
Attachment #514332 - Flags: review?(jst) → review+
Attached patch Patch, v1.1 (obsolete)
Oops, that patch had a startup problem. This fixes us by only calling CC after GC has run.
Attachment #514332 - Attachment is obsolete: true
Attachment #514385 - Flags: review?(gal)
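The idea behind the v1.x patches, only letting the cycle collector run once a GC has happened instead of aborting when it hasn't, can be sketched as a small gate. This is not the code from any attachment here: `GCBeforeCCGate`, its callbacks, and the choice to defer (rather than drop) an early CC request are assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical gate (imagined as living on the XPConnect side) that records
// whether a GC has completed and defers any CC request arriving before that.
class GCBeforeCCGate {
public:
    explicit GCBeforeCCGate(std::function<void()> runCC) : runCC_(std::move(runCC)) {}

    // Called when something asks for a cycle collection.
    void RequestCC() {
        if (gcHasRun_) {
            runCC_();
        } else {
            ccPending_ = true;  // defer instead of aborting
        }
    }

    // Called from a GC-finished callback; runs a deferred CC, if any.
    void OnGCFinished() {
        gcHasRun_ = true;
        if (ccPending_) {
            ccPending_ = false;
            runCC_();
        }
    }

private:
    std::function<void()> runCC_;
    bool gcHasRun_ = false;
    bool ccPending_ = false;
};
```

Keeping the "has GC run yet?" state next to the GC callback, rather than having the CC poke at the JS runtime service itself, means the CC never needs to look up a service that may already be unavailable.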
Attached patch Patch, v1.2 (obsolete)
Slightly less complex
Attachment #514385 - Attachment is obsolete: true
Attachment #514387 - Flags: review?(gal)
Attachment #514385 - Flags: review?(gal)
Attached patch Patch, v1.3
Ugh, now the GC-has-run check is in the right place.
Attachment #514387 - Attachment is obsolete: true
Attachment #514389 - Flags: review?(gal)
Attachment #514387 - Flags: review?(gal)
Comment on attachment 514389 [details] [diff] [review]
Patch, v1.3

Cool. Thanks!
Attachment #514389 - Flags: review?(gal) → review+
bent, want to land this on tracemonkey?
Whiteboard: [hardblocker] → [hardblocker][has patch]
http://hg.mozilla.org/tracemonkey/rev/725b8cce5c72
Whiteboard: [hardblocker][has patch] → [hardblocker][has patch], fixed-in-tracemonkey
Landed in mozilla-central:

http://hg.mozilla.org/mozilla-central/rev/285a09f4cc49
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Summary: Crashes in cycle collector [@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run() ][@ mozalloc_abort(char const* const) | MOZCRT19.dll@0x1327f ][@ mozalloc_abort(char const* const) | NS_DebugBreak_P ] → Startup crash [@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run() ][@ mozalloc_abort(char const* const) | NS_DebugBreak_P | nsCycleCollectorGCHookRunnable::Run() ]
Crash Signature: [@ mozalloc_abort(char const* const) | mozcrt19.dll@0x1327f | nsCycleCollectorGCHookRunnable::Run() ] [@ mozalloc_abort(char const* const) | NS_DebugBreak_P | nsCycleCollectorGCHookRunnable::Run() ]