Crash in NS_StackWalk

RESOLVED FIXED in mozilla16

Status

()

Core
Gecko Profiler
--
critical
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: vladan, Assigned: Ehsan)

Tracking

({crash, regression})

16 Branch
mozilla16
x86
Windows 7
crash, regression
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(crash signature)

Attachments

(1 attachment)

(Reporter)

Description

5 years ago
Breakpad report:

https://crash-stats.mozilla.com/report/index/bp-d76c3a07-8f98-40a4-9bfd-7f4192120622

Comment 1

5 years ago
It first appeared in 16.0a1/20120619. The regression range might be:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b1a0fb2bdbf7&tochange=373e6f9264e6
Crash Signature: [@ NS_StackWalk]

Updated

5 years ago
Severity: normal → critical
Keywords: crash, regression
Hardware: x86_64 → x86
(Reporter)

Comment 2

5 years ago
(In reply to Scoobidiver from comment #1)
> It first appeared in 16.0a1/20120619. The regression range might be:
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=b1a0fb2bdbf7&tochange=373e6f9264e6

You're right, this bug depends on frame pointers being turned so it only would have caused an uptick in crash reports when we enabled frame pointers on mozilla-central Nightly. The bug itself has been around for a while though. Ehsan has an idea about the root causes and is working on a patch.

Comment 3

5 years ago
I hit this reliably on ionmonkey nightly builds of 21 and 22 June (post- bug #764216 merge-to-ion), brand new profile with only Gecko Profiler extension installed and the stack walking option turned on, running kraken benchmark.

That setup crashes every time by/at the end of the benchmark. Stack walking turned off doesn't crash. Mozilla central nightly builds don't crash, regardless of stackwalk option. 

32 bit Windows Vista.
What happens here is that if we have more than 1024 frames on the stack to walk, WalkStackMain64 starts to increment pc_count past that, which then causes the code here: <http://hg.mozilla.org/mozilla-central/diff/b9b9d9f379db/xpcom/base/nsStackWalk.cpp> to read more elements than there are inside the buffer.

I think the right fix is to apply the same checks that we perform when walking the stack from another thread in the case of walking a stack with a thread handle specified.

Patch coming up.
Created attachment 635887 [details] [diff] [review]
Patch (v1)

I also fixed the fallible malloc call which was not being followed by a null check.
Attachment #635887 - Flags: review?(dbaron)
http://tbpl.mozilla.org/?tree=Try&rev=e9efe37ec040
Comment on attachment 635887 [details] [diff] [review]
Patch (v1)

It seems odd to use alloca for one codepath and moz_xmalloc for the other.  Why not use the same for both?

Otherwise, r=dbaron.

Merging with bug 766579 requires care.
Attachment #635887 - Flags: review?(dbaron) → review+
Blocks: 713278
Hmm, you're right.  I see no reason to not use _alloca in both codepaths.
https://hg.mozilla.org/integration/mozilla-inbound/rev/5b03b3e56196
Target Milestone: --- → mozilla16
(In reply to Ehsan Akhgari [:ehsan] from comment #9)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/5b03b3e56196

I forgot to fix the commit message to reflect comment 8.  :(
https://hg.mozilla.org/mozilla-central/rev/5b03b3e56196
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.