Closed Bug 710922 Opened 11 years ago Closed 11 years ago

Firefox Crash [@ JSC::ExecutableAllocator::sizeOfCode ]

Categories

(Core :: JavaScript Engine, defect)

11 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla15
Tracking Status
firefox14 --- verified
firefox15 --- verified
blocking-fennec1.0 --- +

People

(Reporter: marcia, Assigned: n.nethercote)

References

()

Details

(4 keywords, Whiteboard: [native-crash])

Crash Data

Attachments

(1 file)

Seen while running the latest trunk build Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0a1) Gecko/20111214 Firefox/11.0a1.

STR:

1. Load the site in the url field.
2. Wait a few seconds.
3. Crash.

https://crash-stats.mozilla.com/report/index/bp-7a764f39-8153-4ff8-af69-479702111214

Frame 	Module 	Signature [Expand] 	Source
0 	XUL 	JSC::ExecutableAllocator::sizeOfCode 	
1 	XUL 	JSThread::sizeOfIncludingThis 	js/src/jscntxt.cpp:161
2 	XUL 	mozilla::xpconnect::memory::CollectCompartmentStatsForRuntime 	js/xpconnect/src/XPCJSRuntime.cpp:1592
3 	XUL 	CollectRuntimeStatsRunnable::WorkerRun 	dom/workers/WorkerPrivate.cpp:1430
4 	XUL 	mozilla::dom::workers::WorkerRunnable::Run 	dom/workers/WorkerPrivate.cpp:1612
5 	XUL 	mozilla::dom::workers::WorkerPrivate::DoRunLoop 	dom/workers/WorkerPrivate.cpp:2442
6 	XUL 	WorkerThreadRunnable::Run 	dom/workers/RuntimeService.cpp:310
7 	XUL 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:625
8 	XUL 	NS_ProcessNextEvent_P 	obj-firefox/x86_64/xpcom/build/nsThreadUtils.cpp:245
9 	XUL 	nsThread::ThreadFunc 	xpcom/threads/nsThread.cpp:273
10 	libnspr4.dylib 	_pt_root 	nsprpub/pr/src/pthreads/ptthread.c:187
11 	libsystem_c.dylib 	libsystem_c.dylib@0x4e8be 	
12 	libsystem_c.dylib 	libsystem_c.dylib@0x51b74 	
13 	libnspr4.dylib 	PR_JoinThread 	nsprpub/pr/src/pthreads/ptthread.c:577
https://crash-stats.mozilla.com/report/list?signature=JSC::ExecutableAllocator::sizeOfCode shows Mac and Linux crashes, but there appears to be a similar Windows signature as well: 
[@ JSC::ExecutableAllocator::sizeOfCode(unsigned int*, unsigned int*, unsigned int*) ] 

The first Mac crash shows up using 2011120800, while the first Windows crash shows up using 2011120700, so I guess that will help us narrow down the regression range.
Crash Signature: [@ JSC::ExecutableAllocator::sizeOfCode ] → [@ JSC::ExecutableAllocator::sizeOfCode ] [@ JSC::ExecutableAllocator::sizeOfCode(unsigned int*, unsigned int*, unsigned int*) ]
OS: Mac OS X → All
Hardware: x86 → All
I can't reproduce with the STR in comment 0 in 12.0a1/20120121 and 11.0a2/20120121.

Anyway, there are still crashes from time to time.
It also happens in FennecAndroid/12.0a1.
Keywords: regression
Whiteboard: [native-crash]
Version: 9 Branch → 11 Branch
I remove the native-crash keyword as it seems related to crashes exclusively on mobile. See http://bit.ly/vO7SxB that excludes top crashers with this keyword.
Whiteboard: [native-crash]
Can someone clarify the native-crash keyword meaning when applied to a platform that includes ARM?
For top crashers, it means "only in native". See https://wiki.mozilla.org/Platform/2012-01-31#Desktop and https://wiki.mozilla.org/Platform/2012-01-31#Mobile_2
For non top crashers, it means "also in native".
I see this on Windows 7, with Aurora 12.0a2 from 2012-03-11, from the same URL.
I was able to reproduce this using Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:13.0) Gecko/20120312 Firefox/13.0a1. I opened the site in the URL, interacted with it, and then when I closed the tab by clicking on the "x" I crashed: https://crash-stats.mozilla.com/report/index/bp-0e2762c4-1a80-4b27-811b-c773f2120312
Attached patch patchSplinter Review
I've worked this out.  Steps to reproduce reliably:

- Open arborjs.org in one tab.

- Open about:memory in another tab.

- Crash.

Sometimes I get the sizeOfCode() crash, sometimes I get a crash in pthread_cond_wait.  I'm pretty sure it's two different manifestations of the same defect.

It happens reliably in Aurora (FF14) and Beta (FF13).  It doesn't reproduce in FF15, but I think that's just luck, the code in question hasn't changed.

The problem is simple:  in ExecutableAllocator::sizeOfCode() we call m_pools.all() without first checking if m_pools.initialized().  The patch is trivial, and I added one JS_ASSERT(m_pools.initialized()) assertion elsewhere to protect against similar problems.  (In that place where I added the assertion, m_pools should always be initialized.)

With the patch applied I can't reproduce the crash on Aurora or Beta.

I've filed bug 760337 to add more assertions to HashTable.h so that problems of this nature will assert in debug builds in the future.

BTW, if you look at "hg blame" my fingerprints are all over sizeOfCode().  If you see a crash in code like that, please CC me on the bug!
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Attachment #629019 - Flags: review?(luke)
Attachment #629019 - Flags: review?(luke) → review+
Comment on attachment 629019 [details] [diff] [review]
patch


[Approval Request Comment]
This bug is a Fennec top crasher and a Fennec 1.0 soft blocker (see bug 739640).  The same code is used on desktop and it goes at least as far back as 10, so I think it's worth taking for Aurora, Beta and ESR.

User impact if declined: Null-deref crashes, quite frequent on some platforms (i.e. Fennec beta).

Fix Landed on Version: Only mozilla-inbound so far.

Risk to taking this patch (and alternatives if risky):  Negligible;  patch is trivial.

String or UUID changes made by this patch:  None.
Attachment #629019 - Flags: approval-mozilla-esr10?
Attachment #629019 - Flags: approval-mozilla-beta?
Attachment #629019 - Flags: approval-mozilla-aurora?
> User impact if declined: Null-deref crashes, quite frequent on some
> platforms (i.e. Fennec beta).

Just to clarify:  if telemetry is enabled, these crashes can occur every time the telemetry ping occurs, which is once per minute.  If telemetry is disabled, these crashes can only occur when about:memory is viewed (or possibly if an extension that reads the memory reporters, such as about:nosy, is installed).
Nominating since a soft was duped to this.
Group: mozilla-confidential
blocking-fennec1.0: --- → ?
The topcrash keyword is only for FennecAndroid.
Component: DOM → JavaScript Engine
Keywords: topcrash
QA Contact: general → general
Whiteboard: [leave open] → [leave open][native-crash]
Group: mozilla-confidential
https://hg.mozilla.org/mozilla-central/rev/31bc7fbde2c0
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla15
Sorry, with asking about the group change, forgot the whiteboard annotation.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → ASSIGNED
Comment on attachment 629019 [details] [diff] [review]
patch

[Triage Comment]
Not a top crasher on FF13 or ESR10. Leaving nom'd for Aurora, since it does affect Fennec Native.
Attachment #629019 - Flags: approval-mozilla-esr10?
Attachment #629019 - Flags: approval-mozilla-esr10-
Attachment #629019 - Flags: approval-mozilla-beta?
Attachment #629019 - Flags: approval-mozilla-beta-
blocking-fennec1.0: ? → soft
~200 crashes on beta 3 = release blocker.
blocking-fennec1.0: soft → +
Comment on attachment 629019 [details] [diff] [review]
patch

[Triage Comment]
Approved for Beta 14, however. We are now post merge.
Attachment #629019 - Flags: approval-mozilla-beta-
Attachment #629019 - Flags: approval-mozilla-beta+
Attachment #629019 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/releases/mozilla-beta/rev/48d6aeca20a3

This landed on mozilla-central (which was FF15 at the time) a few days ago, and has now landed on mozilla-beta (which is now FF14).  I think that's all I need to do, please let me know if that's wrong.

(Hmm, should the target milestone be changed from mozilla15 to mozilla14?)
Whiteboard: [leave open][native-crash] → [native-crash]
I think the target milestone is fine (I've seen different people say different things about whether it should track initial landing versus followup), but you should set the "affected" flags as appropriate.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0

When trying to reproduce the original crash I got a different signature then the ones related to this bug:
https://crash-stats.mozilla.com/report/index/bp-e1dee497-88ad-4b8a-80d8-139822120628

I also found a crash in Socorro that happened after the patch for mozilla-beta landed:
https://crash-stats.mozilla.com/report/index/3b01e2d7-6bf7-49ca-913a-a055a2120618

But I could not reproduce the crash on the latest Nightly (Firefox 16.0a1) nor the latest beta (Firefox 14 beta 9).

Is this enough to set this as Verified on Firefox 14?
Keywords: verifyme
(In reply to Simona B [QA] from comment #24)
> Please advise, are this crashes related with this bug?

They certainly seem to be. The most recent version in those reports is Firefox 15.0b4 which would certainly indicate this might not be fixed, or that there's a different code-path triggering this signature. Can someone from Engineering or Crashkill look into this?
The original problem was an obvious defect, easily triggered, and had a simple fix.  Can we please open a new bug for the remaining crashes?
Filed Bug 785009. 

Based on the above comments, setting tracking flags for Firefox 14 and 15 to verified.

Thank you Nicholas for looking over these crashes.
Keywords: verifyme
QA Contact: simona.marcu
You need to log in before you can comment on or make changes to this bug.