Closed Bug 744175 Opened 12 years ago Closed 7 years ago

Firefox crash @ js::gc::MarkInternal

Categories

(Core :: JavaScript Engine, defect)

13 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox13 - ---

People

(Reporter: marcia, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

Seen while looking at crash stats. https://crash-stats.mozilla.com/report/list?signature=js::gc::MarkInternal%3CJSString%3E%28JSTracer*,%20JSString*%29.  Will hunt down some manual correlations as a next step.

https://crash-stats.mozilla.com/report/index/2214c667-d25d-44b6-a8a0-87b8d2120410

Frame 	Module 	Signature 	Source
0 	mozjs.dll 	js::gc::MarkInternal<JSString> 	js/src/jsgcmark.cpp:107
1 	mozjs.dll 	js::gc::MarkStringUnbarriered 	js/src/jsgcmark.cpp:212
2 	mozjs.dll 	JSFunction::trace 	js/src/jsfun.cpp:1100
3 	mozjs.dll 	fun_trace 	js/src/jsfun.cpp:1113
4 	mozjs.dll 	js::GCMarker::processMarkStackTop 	js/src/jsgcmark.cpp:1068
5 	mozjs.dll 	js::GCMarker::drainMarkStack 	js/src/jsgcmark.cpp:1111
6 	mozjs.dll 	MarkAndSweep 	js/src/jsgc.cpp:3283
7 	mozjs.dll 	GCCycle 	js/src/jsgc.cpp:3634
8 	mozjs.dll 	Collect 	js/src/jsgc.cpp:3716
9 	mozjs.dll 	js::GCSlice 	js/src/jsgc.cpp:3742
10 	mozjs.dll 	js::IncrementalGC 	js/src/jsfriendapi.cpp:158
11 	xul.dll 	nsXPConnect::Collect 	js/xpconnect/src/nsXPConnect.cpp:422
12 	xul.dll 	nsXPConnect::GarbageCollect 	js/xpconnect/src/nsXPConnect.cpp:432
13 	xul.dll 	nsJSContext::GarbageCollectNow 	dom/base/nsJSEnvironment.cpp:3059
14 	xul.dll 	GCTimerFired 	dom/base/nsJSEnvironment.cpp:3181
15 	xul.dll 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:508
16 	xul.dll 	nsTimerEvent::Run 	xpcom/threads/nsTimerImpl.cpp:591
17 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:657
18 	nspr4.dll 	_MD_CURRENT_THREAD 	nsprpub/pr/src/md/windows/w95thred.c:308
19 	xul.dll 	nsSystemPrincipal::QueryInterface 	caps/src/nsSystemPrincipal.cpp:57
20 	xul.dll 	xul.dll@0xb8e547 	
21 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:175
22 	xul.dll 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:189
23 	xul.dll 	nsAppShell::Run 	widget/windows/nsAppShell.cpp:252
24 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:295
25 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3703
26 	msvcr100.dll 	msvcr100.dll@0x8b581 	
27 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:107
28 	msvcr100.dll 	_initterm 	f:\dd\vctools\crt_bld\self_x86\crt\src\crt0dat.c:872
29 	firefox.exe 	__tmainCRTStartup 	crtexe.c:552
30 	xul.dll 	nsCacheEntryDescriptor::QueryInterface 	netwerk/cache/nsCacheEntryDescriptor.cpp:55
31 	firefox.exe 	_SEH_epilog4 	
32 	kernel32.dll 	GetCodePageFileInfo 	
33 	kernel32.dll 	BaseProcessStart 	
34 	firefox.exe 	pre_c_init 	crtexe.c:261
I took a quick look and I don't see any manual correlations. Will do a little more digging.

This crash signature seems new to Aurora.
I see IncrementalGC in the stack trace. Didn't we pref it off by default?
Having IncrementalGC on the stack unfortunately doesn't mean it's an incremental GC. In this case, the fact that MarkAndSweep is on the stack means that it's not an incremental GC. If it were, it would have IncrementalGCSlice on the stack in place of MarkAndSweep.
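
The rule of thumb above can be written down as a small triage helper. This is only a sketch: the function name `classify_gc` is hypothetical, and the frame names `MarkAndSweep` and `IncrementalGCSlice` are taken from this comment rather than checked against the source tree.

```python
def classify_gc(frames):
    """Guess the GC mode from the function names on a crash stack.

    Assumption (from the comment above): an incremental GC shows
    IncrementalGCSlice on the stack, a non-incremental one shows
    MarkAndSweep. js::IncrementalGC appears in both cases, so it is ignored.
    """
    names = set(frames)
    if "IncrementalGCSlice" in names:
        return "incremental"
    if "MarkAndSweep" in names:
        return "non-incremental"
    return "unknown"


# Frames 0-10 of the stack reported in this bug.
stack = [
    "js::gc::MarkInternal<JSString>",
    "js::gc::MarkStringUnbarriered",
    "JSFunction::trace",
    "fun_trace",
    "js::GCMarker::processMarkStackTop",
    "js::GCMarker::drainMarkStack",
    "MarkAndSweep",
    "GCCycle",
    "Collect",
    "js::GCSlice",
    "js::IncrementalGC",
]

print(classify_gc(stack))
```

For the stack in this report the helper reports a non-incremental GC, even though js::IncrementalGC is frame 10.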

Also, I think this is just a signature change from previous GC crashes. We introduced MarkInternal in FF13. In FF12 it would have crashed in MarkStringUnbarriered, or possibly PushMarkStack, depending on inlining.
Crash Signature: [@ js::gc::MarkInternal<JSString>(JSTracer*, JSString*) ] → [@ js::gc::MarkInternal<JSString>(JSTracer*, JSString*) ] [@ js::gc::MarkInternal<JSFlatString>(JSTracer*, JSFlatString*)] [@ js::gc::MarkInternal<JSLinearString>(JSTracer*, JSLinearString**)] [@ js::gc::MarkInternal<js::ArgumentsObject>(JSTracer* js::…
Summary: Firefox crash [@ js::gc::MarkInternal<JSString>(JSTracer*, JSString*) ] → Firefox crash @ js::gc::MarkInternal
Bug 750315 was duped here - that was a 13 top crash so tracking this bug for FF13.

(In reply to Bill McCloskey (:billm) from comment #3)
> Also, I think this is just a signature change from previous GC crashes. We
> introduced MarkInternal in FF13. In FF12 it would have crashed in
> MarkStringUnbarriered, or possibly PushMarkStack, depending on inlining.

What bug do you think was tracking the old stack signature?
The original bug tracking this was bug 628105. It has gone through a number of signature changes since then. The meta-bug for these GC mark crashes is bug 613650.
RE: qawanted
I'm not sure where to begin. I assume we're looking for a reproducible case but comment 1 seems to indicate no correlations as of yet. Any assistance/advice is appreciated.
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #7)
> RE: qawanted
> I'm not sure where to begin. I assume we're looking for a reproducible case
> but comment 1 seems to indicate no correlations as of yet. Any
> assistance/advice is appreciated.

See https://bugzilla.mozilla.org/show_bug.cgi?id=750315#c3. Bug 750315 was just duped here recently.
So far I've been unable to reproduce this crash with the following add-ons installed in Firefox 13.0b1 in Windows XP:

Ask Toolbar 3.14.1.100009
avast! WebRep 7.0.1426 
Feedback 1.1.2
Greasemonkey 0.9.18
Java Console 6.0.31
Java Quick Starter 1.0
Nightly Tester Tools 3.2.2
Yandex.Bar 6.8
Microsoft .NET Framework Assistant 0.0.0

Note that bug 750315 mentions "neobux toolbar"; however, I was unable to install it in Firefox because of an incompatibility.

Any suggestions about how I can test this further?
With combined signatures, it's #19 top browser crasher in 13.0b1.

Correlations per extension vary each day and are low:
  js::gc::MarkInternal<JSFlatString>(JSTracer*, JSFlatString*)|EXCEPTION_ACCESS_VIOLATION_READ (65 crashes)
     12% (8/65) vs.   3% (756/22691) mozilla_cc@internetdownloadmanager.com (IDM CC, https://addons.mozilla.org/addon/6973)
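
To make the arithmetic behind that correlation line explicit, here is a minimal sketch. It assumes the first fraction is the share of crashes with this signature that had the add-on installed and the second is the share across all crashes in the same report; the helper name `share` is hypothetical.

```python
def share(hits, total):
    """Fraction of crash reports in which the add-on was present."""
    return hits / total


# Numbers copied from the correlation line above.
signature_share = share(8, 65)       # crashes with this signature
baseline_share = share(756, 22691)   # assumed baseline: all crashes
lift = signature_share / baseline_share

print(f"{signature_share:.1%} vs {baseline_share:.1%}, lift {lift:.1f}x")
```

The add-on shows up roughly 3.7 times more often than in the baseline, but that rests on only 8 crashes, which fits the observation that the correlations are low and vary from day to day.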
Crash Signature: [@ js::gc::MarkInternal<JSString>(JSTracer*, JSString*) ] [@ js::gc::MarkInternal<JSFlatString>(JSTracer*, JSFlatString*)] [@ js::gc::MarkInternal<JSLinearString>(JSTracer*, JSLinearString**)] [@ js::gc::MarkInternal<js::ArgumentsObject>(JSTracer* js::… → [@ js::gc::MarkInternal<JSString>(JSTracer*, JSString*)] [@ js::gc::MarkInternal<JSFlatString>(JSTracer*, JSFlatString*)] [@ js::gc::MarkInternal<JSLinearString>(JSTracer*, JSLinearString**)] [@ js::gc::MarkInternal<js::ArgumentsObject>(JSTracer* js::A…
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #9)
> Any suggestions about how I can test this further?

Thanks Anthony - let's not do any more exploratory testing and focus on the engineering side of things.

(In reply to Bill McCloskey (:billm) from comment #3)
> Having IncrementalGC on the stack unfortunately doesn't mean it's an
> incremental GC. In this case, the fact that MarkAndSweep is on the stack
> means that it's not an incremental GC. If it were, it would have
> IncrementalGCSlice on the stack in place of MarkAndSweep.
> 
> Also, I think this is just a signature change from previous GC crashes. We
> introduced MarkInternal in FF13. In FF12 it would have crashed in
> MarkStringUnbarriered, or possibly PushMarkStack, depending on inlining.

Bill - how can we verify that there isn't a new regression in FF13 causing a new spike? It just seems to me that chalking this up to bug 613650 will cause iteratively worse quality over time.
Keywords: qawanted
(In reply to Alex Keybl [:akeybl] from comment #11)
> Bill - how can we verify that there isn't a new regression in FF13 causing a
> new spike? It just seems to me that chalking this up to bug 613650 will
> cause iteratively worse quality over time.

The only way to do this would be to lump all the marking crashes into a single mega-signature and track its volume over time. I think a while ago Sheila talked about being able to generate custom reports for this sort of thing. I can compile a list of signatures that account for the main marking crashes. However, we need some sort of process for actually tracking this data over time. Otherwise we'll just forget about it like we did last time.
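
A rough sketch of what lumping the signatures together could look like, assuming crash reports are available as (date, signature) pairs. The marker list, the function name, and the sample rows are all made up for illustration; they are not the compiled signature list and not real crash-stats data.

```python
from collections import Counter

# Illustrative substrings only, not an actual compiled list of marking signatures.
MARKING_MARKERS = ("MarkInternal", "MarkStringUnbarriered", "PushMarkStack")


def daily_marking_volume(reports):
    """Count crashes per day whose signature matches any marking marker.

    `reports` is an iterable of (date_string, signature) pairs.
    """
    volume = Counter()
    for day, signature in reports:
        if any(marker in signature for marker in MARKING_MARKERS):
            volume[day] += 1
    return dict(sorted(volume.items()))


# Hypothetical sample rows, used only to exercise the function.
sample = [
    ("2012-05-01", "js::gc::MarkInternal<JSString>(JSTracer*, JSString*)"),
    ("2012-05-01", "js::gc::MarkInternal<JSFlatString>(JSTracer*, JSFlatString*)"),
    ("2012-05-01", "nsThread::ProcessNextEvent"),
    ("2012-05-02", "js::gc::MarkStringUnbarriered"),
]

print(daily_marking_volume(sample))
```

Grouping by substring keeps the combined count stable when a refactoring or a change in inlining renames the top frame, which is what happened between FF12 and FF13.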

However, the larger issue here is that we have no way to get any traction on these crashes. This one with fun_trace crashing in MarkId has been happening literally for years. I spent a month or two working exclusively on this about a year ago and I got nowhere. We have never had a reproducible bug report. Crash dumps contain too little information to debug the problem.
As Bill says in comment 12, all of our experience to date is that GC topcrashes are not a productive use of time. Bill has tried many tricks, and I've tried a few myself, but we were never able to gain any indication at all as to what causes these crashes. I also don't think it's going to be possible to verify conclusively that these aren't regressing.

Not quite all hope is lost: We are doing lots of work on the GC this year, so there is some chance we will accidentally shake out a few bugs.
A basic problem is that these aren't even necessarily GC bugs.  I've come across a number of cycle collector crashes that are bugs elsewhere.  The GC and CC are often the first thing to scan mangled memory, so all sorts of bugs elsewhere in the browser, or in addons or extensions, can show up as these crashes.  The cycle collector is nice because at least some of the crashes point back at the buggy class, but the GC doesn't even have that.
Thanks for the in depth explanation Bill, Dave, and Andrew.

(In reply to David Mandelin from comment #13)
> Not quite all hope is lost: We are doing lots of work on the GC this year,
> so there is some chance we will accidentally shake out a few bugs.

Good to hear :)

(In reply to Bill McCloskey (:billm) from comment #12)
> Crash dumps contain too little information to debug the problem.

(In reply to Andrew McCreight [:mccr8] from comment #14)
> The cycle collector is nice because at least some of the
> crashes point back at the buggy class, but the GC doesn't even have that.

Do we have a bug on file that can get us closer to actionable crash data for GC crashes?
(In reply to Alex Keybl [:akeybl] from comment #15)
> Do we have a bug on file that can get us closer to actionable crash data for
> GC crashes?
Bill spent a few months on getting better crash info last year, but I think it didn't really turn up anything.  Fundamentally, there's just a lot more information available about stuff the CC looks at.  It is just the nature of C++ vs JS.
(In reply to Andrew McCreight [:mccr8] from comment #16)
> (In reply to Alex Keybl [:akeybl] from comment #15)
> > Do we have a bug on file that can get us closer to actionable crash data for
> > GC crashes?
> Bill spent a few months on getting better crash info last year, but I think
> it didn't really turn up anything.  Fundamentally, there's just a lot more
> information available about stuff the CC looks at.  It is just the nature of
> C++ vs JS.

OK thanks - sounds like our strategy is to hope that hardening can occur organically while we work more on GC. I agree that given the time spent on the problem already, that's our best option.
Blocks: 754279
No longer blocks: 754279
Crash Signature: [@ js::gc::MarkInternal<JSString>(JSTracer*, JSString*)] [@ js::gc::MarkInternal<JSFlatString>(JSTracer*, JSFlatString*)] [@ js::gc::MarkInternal<JSLinearString>(JSTracer*, JSLinearString**)] [@ js::gc::MarkInternal<js::ArgumentsObject>(JSTracer* js::A… → [@ js::gc::MarkInternal<JSString>(JSTracer*, JSString*)] [@ js::gc::MarkInternal<JSFlatString>(JSTracer*, JSFlatString*)] [@ js::gc::MarkInternal<JSLinearString>(JSTracer*, JSLinearString**)] [@ js::gc::MarkInternal<JSLinearString>(JSTracer* JSLinearSt…
Blocks: 754279
Crash Signature: [@ js::gc::MarkInternal<JSString>(JSTracer*, JSString*)] [@ js::gc::MarkInternal<JSFlatString>(JSTracer*, JSFlatString*)] [@ js::gc::MarkInternal<JSLinearString>(JSTracer*, JSLinearString**)] [@ js::gc::MarkInternal<JSLinearString>(JSTracer* JSLinearSt… → [@ js::gc::MarkInternal<JSString>(JSTracer*, JSString*)] [@ js::gc::MarkInternal<JSString>(JSTracer*, JSString**)] [@ js::gc::MarkInternal<JSFlatString>(JSTracer*, JSFlatString*)] [@ js::gc::MarkInternal<JSLinearString>(JSTracer* JSLinearString**)] [@…
Blocks: 768425
Crash Signature: [@ js::gc::MarkInternal<JSString>(JSTracer*, JSString*)] [@ js::gc::MarkInternal<JSString>(JSTracer*, JSString**)] [@ js::gc::MarkInternal<JSFlatString>(JSTracer*, JSFlatString*)] [@ js::gc::MarkInternal<JSLinearString>(JSTracer* JSLinearString**)] [@… → [@ js::gc::MarkInternal<JSString>(JSTracer*, JSString*)] [@ js::gc::MarkInternal<JSString>(JSTracer*, JSString**)] [@ js::gc::MarkInternal<JSFlatString>(JSTracer*, JSFlatString*)] [@ js::gc::MarkInternal<JSFlatString>(JSTracer* JSFlatString**)] [@ js:…
So it seems that one of our internal web applications has some JS that keeps triggering this crash for us. The issue has persisted across multiple OS reformats and reinstalls.

The bug seems to be somewhat reproducible for us: after 10-15 minutes in our internal application, Firefox usually crashes.

So if you need any input, logs, files, or absolutely anything else - let us know!
It's a low volume crash in 15.0.1 and above.
Keywords: topcrash
Don't know if this helps, but I crashed Firefox by uploading a huge file (4.5 GB) to MEGA.

Got the following crash report: https://crash-stats.mozilla.com/report/index/bp-1d4e9b78-c047-44ed-b7a7-c60672130127
Blocks: 846282
(In reply to Andrew McCreight [:mccr8] from comment #16)
> (In reply to Alex Keybl [:akeybl] from comment #15)
> > Do we have a bug on file that can get us closer to actionable crash data for
> > GC crashes?
> Bill spent a few months on getting better crash info last year, but I think
> it didn't really turn up anything.  Fundamentally, there's just a lot more
> information available about stuff the CC looks at.  It is just the nature of
> C++ vs JS.

Is this still the case? Or are there new strategies to help narrow down the source of the crash?

I have filed bug 846282 for Thunderbird, because the Thunderbird and Firefox crash rates diverge significantly.
Flags: needinfo?(continuation)
Bill turned on release-mode compartment assertions, but nothing seemed particularly frequent aside from a printing-related problem with add-ons.
Flags: needinfo?(continuation)
Assignee: general → nobody
Crash Signature: , js::types::TypeObject*)] [@ js::gc::MarkInternal<JSObject>(JSTracer*, JSObject**)] → , js::types::TypeObject*)] [@ js::gc::MarkInternal<JSObject>(JSTracer*, JSObject**)] [@ js::gc::MarkInternal<T>]
I'm marking this bug as WORKSFORME because this crash signature has not appeared in Firefox crash reports for a long time (over half a year), apart from some obsolete versions older than Fx18. There have been no crashes since Fx18.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME