Last Comment Bug 767479 - Crash in NS_StackWalk
: Crash in NS_StackWalk
Status: RESOLVED FIXED
: crash, regression
Product: Core
Classification: Components
Component: Gecko Profiler (show other bugs)
: 16 Branch
: x86 Windows 7
: -- critical (vote)
: mozilla16
Assigned To: :Ehsan Akhgari
:
Mentors:
Depends on:
Blocks: 713278
  Show dependency treegraph
 
Reported: 2012-06-22 11:23 PDT by Vladan Djeric (:vladan)
Modified: 2012-07-10 15:49 PDT (History)
6 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch (v1) (2.15 KB, patch)
2012-06-22 13:33 PDT, :Ehsan Akhgari
dbaron: review+
Details | Diff | Splinter Review

Description Vladan Djeric (:vladan) 2012-06-22 11:23:06 PDT
Breakpad report:

https://crash-stats.mozilla.com/report/index/bp-d76c3a07-8f98-40a4-9bfd-7f4192120622
Comment 1 Scoobidiver (away) 2012-06-22 11:43:06 PDT
It first appeared in 16.0a1/20120619. The regression range might be:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b1a0fb2bdbf7&tochange=373e6f9264e6
Comment 2 Vladan Djeric (:vladan) 2012-06-22 11:52:50 PDT
(In reply to Scoobidiver from comment #1)
> It first appeared in 16.0a1/20120619. The regression range might be:
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=b1a0fb2bdbf7&tochange=373e6f9264e6

You're right, this bug depends on frame pointers being turned so it only would have caused an uptick in crash reports when we enabled frame pointers on mozilla-central Nightly. The bug itself has been around for a while though. Ehsan has an idea about the root causes and is working on a patch.
Comment 3 bomfog 2012-06-22 12:46:05 PDT
I hit this reliably on ionmonkey nightly builds of 21 and 22 June (post- bug #764216 merge-to-ion), brand new profile with only Gecko Profiler extension installed and the stack walking option turned on, running kraken benchmark.

That setup crashes every time by/at the end of the benchmark. Stack walking turned off doesn't crash. Mozilla central nightly builds don't crash, regardless of stackwalk option. 

32 bit Windows Vista.
Comment 4 :Ehsan Akhgari 2012-06-22 13:21:05 PDT
What happens here is that if we have more than 1024 frames on the stack to walk, WalkStackMain64 starts to increment pc_count past that, which then causes the code here: <http://hg.mozilla.org/mozilla-central/diff/b9b9d9f379db/xpcom/base/nsStackWalk.cpp> to read more elements than there are inside the buffer.

I think the right fix is to apply the same checks that we perform when walking the stack from another thread in the case of walking a stack with a thread handle specified.

Patch coming up.
Comment 5 :Ehsan Akhgari 2012-06-22 13:33:14 PDT
Created attachment 635887 [details] [diff] [review]
Patch (v1)

I also fixed the fallible malloc call which was not being followed by a null check.
Comment 6 :Ehsan Akhgari 2012-06-22 13:36:25 PDT
http://tbpl.mozilla.org/?tree=Try&rev=e9efe37ec040
Comment 7 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2012-07-09 14:48:34 PDT
Comment on attachment 635887 [details] [diff] [review]
Patch (v1)

It seems odd to use alloca for one codepath and moz_xmalloc for the other.  Why not use the same for both?

Otherwise, r=dbaron.

Merging with bug 766579 requires care.
Comment 8 :Ehsan Akhgari 2012-07-09 15:19:43 PDT
Hmm, you're right.  I see no reason to not use _alloca in both codepaths.
Comment 10 :Ehsan Akhgari 2012-07-09 15:25:52 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #9)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/5b03b3e56196

I forgot to fix the commit message to reflect comment 8.  :(
Comment 11 Ryan VanderMeulen [:RyanVM] 2012-07-10 15:49:04 PDT
https://hg.mozilla.org/mozilla-central/rev/5b03b3e56196

Note You need to log in before you can comment on or make changes to this bug.