Closed Bug 767479 Opened 9 years ago Closed 9 years ago

Crash in NS_StackWalk

Categories

(Core :: Gecko Profiler, defect)

16 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla16

People

(Reporter: vladan, Assigned: ehsan.akhgari)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

It first appeared in 16.0a1/20120619. The regression range might be:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b1a0fb2bdbf7&tochange=373e6f9264e6
Crash Signature: [@ NS_StackWalk]
Severity: normal → critical
Keywords: crash, regression
Hardware: x86_64 → x86
(In reply to Scoobidiver from comment #1)
> It first appeared in 16.0a1/20120619. The regression range might be:
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=b1a0fb2bdbf7&tochange=373e6f9264e6

You're right, this bug depends on frame pointers being turned so it only would have caused an uptick in crash reports when we enabled frame pointers on mozilla-central Nightly. The bug itself has been around for a while though. Ehsan has an idea about the root causes and is working on a patch.
I hit this reliably on ionmonkey nightly builds of 21 and 22 June (post- bug #764216 merge-to-ion), brand new profile with only Gecko Profiler extension installed and the stack walking option turned on, running kraken benchmark.

That setup crashes every time by/at the end of the benchmark. Stack walking turned off doesn't crash. Mozilla central nightly builds don't crash, regardless of stackwalk option. 

32 bit Windows Vista.
What happens here is that if we have more than 1024 frames on the stack to walk, WalkStackMain64 starts to increment pc_count past that, which then causes the code here: <http://hg.mozilla.org/mozilla-central/diff/b9b9d9f379db/xpcom/base/nsStackWalk.cpp> to read more elements than there are inside the buffer.

I think the right fix is to apply the same checks that we perform when walking the stack from another thread in the case of walking a stack with a thread handle specified.

Patch coming up.
Attached patch Patch (v1)Splinter Review
I also fixed the fallible malloc call which was not being followed by a null check.
Attachment #635887 - Flags: review?(dbaron)
Comment on attachment 635887 [details] [diff] [review]
Patch (v1)

It seems odd to use alloca for one codepath and moz_xmalloc for the other.  Why not use the same for both?

Otherwise, r=dbaron.

Merging with bug 766579 requires care.
Attachment #635887 - Flags: review?(dbaron) → review+
Blocks: 713278
Hmm, you're right.  I see no reason to not use _alloca in both codepaths.
(In reply to Ehsan Akhgari [:ehsan] from comment #9)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/5b03b3e56196

I forgot to fix the commit message to reflect comment 8.  :(
https://hg.mozilla.org/mozilla-central/rev/5b03b3e56196
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.