Last Comment Bug 767479 - Crash in NS_StackWalk
: Crash in NS_StackWalk
: crash, regression
Product: Core
Classification: Components
Component: Gecko Profiler (show other bugs)
: 16 Branch
: x86 Windows 7
-- critical (vote)
: mozilla16
Assigned To: :Ehsan Akhgari
: Markus Stange [:mstange]
Depends on:
Blocks: 713278
  Show dependency treegraph
Reported: 2012-06-22 11:23 PDT by Vladan Djeric (:vladan)
Modified: 2012-07-10 15:49 PDT (History)
6 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

Patch (v1) (2.15 KB, patch)
2012-06-22 13:33 PDT, :Ehsan Akhgari
dbaron: review+
Details | Diff | Splinter Review

Description User image Vladan Djeric (:vladan) 2012-06-22 11:23:06 PDT
Breakpad report:
Comment 1 User image Scoobidiver (away) 2012-06-22 11:43:06 PDT
It first appeared in 16.0a1/20120619. The regression range might be:
Comment 2 User image Vladan Djeric (:vladan) 2012-06-22 11:52:50 PDT
(In reply to Scoobidiver from comment #1)
> It first appeared in 16.0a1/20120619. The regression range might be:
> pushloghtml?fromchange=b1a0fb2bdbf7&tochange=373e6f9264e6

You're right, this bug depends on frame pointers being turned so it only would have caused an uptick in crash reports when we enabled frame pointers on mozilla-central Nightly. The bug itself has been around for a while though. Ehsan has an idea about the root causes and is working on a patch.
Comment 3 User image bomfog 2012-06-22 12:46:05 PDT
I hit this reliably on ionmonkey nightly builds of 21 and 22 June (post- bug #764216 merge-to-ion), brand new profile with only Gecko Profiler extension installed and the stack walking option turned on, running kraken benchmark.

That setup crashes every time by/at the end of the benchmark. Stack walking turned off doesn't crash. Mozilla central nightly builds don't crash, regardless of stackwalk option. 

32 bit Windows Vista.
Comment 4 User image :Ehsan Akhgari 2012-06-22 13:21:05 PDT
What happens here is that if we have more than 1024 frames on the stack to walk, WalkStackMain64 starts to increment pc_count past that, which then causes the code here: <> to read more elements than there are inside the buffer.

I think the right fix is to apply the same checks that we perform when walking the stack from another thread in the case of walking a stack with a thread handle specified.

Patch coming up.
Comment 5 User image :Ehsan Akhgari 2012-06-22 13:33:14 PDT
Created attachment 635887 [details] [diff] [review]
Patch (v1)

I also fixed the fallible malloc call which was not being followed by a null check.
Comment 6 User image :Ehsan Akhgari 2012-06-22 13:36:25 PDT
Comment 7 User image David Baron :dbaron: ⌚️UTC-8 2012-07-09 14:48:34 PDT
Comment on attachment 635887 [details] [diff] [review]
Patch (v1)

It seems odd to use alloca for one codepath and moz_xmalloc for the other.  Why not use the same for both?

Otherwise, r=dbaron.

Merging with bug 766579 requires care.
Comment 8 User image :Ehsan Akhgari 2012-07-09 15:19:43 PDT
Hmm, you're right.  I see no reason to not use _alloca in both codepaths.
Comment 10 User image :Ehsan Akhgari 2012-07-09 15:25:52 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #9)

I forgot to fix the commit message to reflect comment 8.  :(
Comment 11 User image Ryan VanderMeulen [:RyanVM] 2012-07-10 15:49:04 PDT

Note You need to log in before you can comment on or make changes to this bug.