Closed Bug 818793 Opened 7 years ago Closed 7 years ago

Allow depth-limiting in NS_StackWalk

Categories

(Core :: XPCOM, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla20

People

(Reporter: njn, Assigned: njn)

Details

Attachments

(1 file)

NS_StackWalk is really expensive.  DMD calls it a lot.  NS_StackWalk unavoidably gets the entire stack, even though DMD only wants it to a certain depth.

It's pretty easy to change NS_WalkStackCallback to return a bool which, if false, tells NS_StackWalk to stop.  This makes a big difference to DMD (see bug 717853 comment 98).  I don't know if all the NS_StackWalk implementations can facilitate this early bailing, but the _Unwind_Backtrace ones certainly can.
There are four implementations of NS_StackWalk.

- Windows:  early bail-out isn't possible just by changing the retval of NS_WalkStackCallback.  However, if NS_StackWalk was changed to take a maxLength parameter, it could be supported.

- Solaris:  I think early bail-out is possible, but I don't really care.

- i386/PPC-Linux/Mac: early bail-out is easy.  See FramePointerStackWalk().

- HAVE__UNWIND_BACKTRACE: early bail-out is tricky.  _Unwind_Backtrace takes a callback, and if that callback returns anything other than _URC_NO_REASON, _Unwind_Backtrace will stop and return a value that indicates failure.  (The actual code varies across platforms.  And I worked this out by reading the GCC source code, because _Unwind_Backtrace has no known documentation on the entire web, AFAICT.)

  However, if we are happy to ignore the return code failure from _Unwind_Backtrace (as we do on ARM/Android, for example) then it's doable.

As for the return value of NS_StackWalk... it's usually ignored, and trace-malloc is the only place that worries about it (due to critical addresses on Mac, see
https://blog.mozilla.org/respindola/2011/12/04/tracking-malloc-on-os-x/ for details).  I think the HAVE__UNWIND_BACKTRACE could easily ignore the errors it gets.
This patch:

- Adds a |aMaxFrames| parameter to NS_StackWalk, which limits the size of the
  obtained trace (and it uses 0 to mean "no limit").

  I've modified all four implementations of NS_StackWalk: Windows, Solaris,
  i386-or-PPC/Linux-or-Mac, and HAVE__UNWIND_BACKTRACE.  I've tested on Linux
  and Mac, but they both use HAVE__UNWIND_BACKTRACE.  I.e. three of the four
  are as-yet-untested.

- Clarifies the meaning of the return value from NS_StackWalk, which was
  unclear in some circumstances.  Specifically, the call succeeds if we
  successfully obtain at least one stack frame.  (Note also that only three(?)
  callers of NS_StackWalk actually look at the return value.)

- Consistently uses |aSkipFrames|/|skipFrames| as the name of the 2nd arg to
  NS_StackWalk.

Review/feedback requests: 

- glandium for the overall patch.

- benoit for the TableTicker stuff.
    
- jlebar for the DMD and Android/ARM stack trace stuff.

- respindola for anything and everything.
Attachment #695413 - Flags: review?(mh+mozilla)
Summary: Allow early bailouts from NS_StackWalk → Allow depth-limiting in NS_StackWalk
Attachment #695413 - Flags: feedback?(respindola)
Attachment #695413 - Flags: feedback?(justin.lebar+bug)
Attachment #695413 - Flags: feedback?(bgirard)
feedback:

* This looks like a good idea to me.
* At least the windows implementation should also be tested :-)
* You have

817	    // However, the most complex thing DMD does after Get() returns is to put
818	    // something in a hash table, which might call
819	    // InfallibleAllocPolicy::malloc_.  I'm not yet sure if this needs special
820	    // handling

If I remember correctly, it does on 10.6. Allocating memory can try to get a lock which can try to allocate memory which causes a deadlock.  You might want to check if you get to this MOZ_CRASH on 10.6.

* It looks like the return value has redundant information. That is not the fault of this patch and can be fixed in a followup if you want. All that we need is boolean saying "we are in a critical situation, don't do anything". All callers can then just look at that and at the numbers of stack frames collected. They don't differentiate getting 12 frames instead of 13 because there were only 12 frames or because we failed to follow the 12->13 link, right?
Comment on attachment 695413 [details] [diff] [review]
Add a |aMaxFrames| parameter to NS_StackWalk.

Review of attachment 695413 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, but I'm not a peer.
Attachment #695413 - Flags: review?(mh+mozilla) → review+
Comment on attachment 695413 [details] [diff] [review]
Add a |aMaxFrames| parameter to NS_StackWalk.

glandium's r+ is good enough for me, but I need to test this anyway.
Attachment #695413 - Flags: feedback?(justin.lebar+bug) → review?(justin.lebar+bug)
Comment on attachment 695413 [details] [diff] [review]
Add a |aMaxFrames| parameter to NS_StackWalk.

Review of attachment 695413 [details] [diff] [review]:
-----------------------------------------------------------------

These changes look good to me. Typically dbaron reviews the changes to nsStackWalk.

::: xpcom/base/nsStackWalk.cpp
@@ +527,4 @@
>      void *local_sps[1024];
>      data.sps = local_sps;
>      data.sp_count = 0;
> +    data.sp_size = ArrayLength(local_sps);

nice find
Attachment #695413 - Flags: feedback?(bgirard) → feedback+
Comment on attachment 695413 [details] [diff] [review]
Add a |aMaxFrames| parameter to NS_StackWalk.

dbaron, do you want to review this?
Attachment #695413 - Flags: review?(dbaron)
Comment on attachment 695413 [details] [diff] [review]
Add a |aMaxFrames| parameter to NS_StackWalk.

Let's call what I did a superreview, since I really looked only at the API; I think you've got enough other reviewers here.
Attachment #695413 - Flags: review?(dbaron) → superreview+
Comment on attachment 695413 [details] [diff] [review]
Add a |aMaxFrames| parameter to NS_StackWalk.

I just did a DMD build with this on B2G, and it seems to work fine.  I'll review the patch now.  Sorry for the lag, Nick.
Attachment #695413 - Flags: feedback+
Attachment #695413 - Flags: review?(justin.lebar+bug) → review+
With this patch I'm getting

../../../../src/xpcom/base/nsStackWalk.cpp:163:13: warning: unused function 'IsCriticalAddress' [-Wunused-function]
static bool IsCriticalAddress(void* aPC)
> ../../../../src/xpcom/base/nsStackWalk.cpp:163:13: warning: unused function
> 'IsCriticalAddress' [-Wunused-function]
> static bool IsCriticalAddress(void* aPC)

Odd... on which platform?
> Odd... on which platform?

B2G ARM opt.
> B2G ARM opt.

Is defined(HAVE__UNWIND_BACKTRACE) not true on that platform?
> Is defined(HAVE__UNWIND_BACKTRACE) not true on that platform?

Oh, I'm really sorry -- I'm seeing this on a Linux 64 desktop DMD build.

On this platform, HAVE__UNWIND_BACKTRACE is indeed not defined.
Instead, we hit the "#else // unsupported platform." branch.

Maybe because I'm building with clang?
I build with clang on Linux 64 desktop (Ubuntu 12.10, and before that, 12.04) and defined(HAVE__UNWIND_BACKTRACE) is true.  So that's weird.
Okay, I figured it out.  HAVE__UNWIND_BACKTACE is a check from configure.  It tries to compile a file which includes unwind.h.  My clang errors out with this, due to

http://lists.cs.uiuc.edu/pipermail/llvmbugs/2010-November/015813.html

I don't care about fixing this, unless we build xpcom/ with -Werror.

It's weird that your clang does not have this problem.
I filed bug 826962 on the clang / NS_StackWalk issue.
> * It looks like the return value has redundant information. That is not the
> fault of this patch and can be fixed in a followup if you want. All that we
> need is boolean saying "we are in a critical situation, don't do anything".

The NS_OK and NS_FAILURE cases could be merged, but we still need to distinguish three cases:  "unimplemented", "need to abort", "other".
Comment on attachment 695413 [details] [diff] [review]
Add a |aMaxFrames| parameter to NS_StackWalk.

Looks like I forgot to flip the switch when I commented on the bug :-(
Attachment #695413 - Flags: feedback?(respindola) → feedback+
https://hg.mozilla.org/mozilla-central/rev/13cf1c8cceab
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
You need to log in before you can comment on or make changes to this bug.