Closed Bug 1016629 Opened 10 years ago Closed 10 years ago

Add native stack support in ThreadStackHelper

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla34

People

(Reporter: jchen, Assigned: jchen)

References

Details

Attachments

(8 files, 1 obsolete file)

1.65 KB, patch
vladan
: review+
Details | Diff | Splinter Review
1.56 KB, patch
snorp
: review+
Details | Diff | Splinter Review
4.73 KB, patch
snorp
: review+
Details | Diff | Splinter Review
15.36 KB, patch
snorp
: review+
jseward
: review+
Details | Diff | Splinter Review
9.04 KB, patch
snorp
: review+
jseward
: review+
Details | Diff | Splinter Review
874 bytes, patch
BenWa
: review+
Details | Diff | Splinter Review
1.75 KB, patch
snorp
: review+
Details | Diff | Splinter Review
1.17 KB, patch
jchen
: review+
Details | Diff | Splinter Review
ThreadStackHelper should be able to use the new unwinder library to perform unwinding and get a native stack in addition to the pseudo-stack.
Summary: Add LUL support in ThreadStackHelper → Add native stack support in ThreadStackHelper
Depends on: 1016441
Depends on: 1033006
Keep a separate native stack for each pseudostack.
Attachment #8448982 - Flags: review?(vdjeric)
Simple class to set/clear a pointer within a scope.
Attachment #8448991 - Flags: review?(snorp)
Simplify what we had before.
Attachment #8448995 - Flags: review?(snorp)
Attachment #8448991 - Flags: review?(snorp) → review+
Attachment #8448995 - Flags: review?(snorp) → review+
Add a GetNativeStack method to ThreadStackHelper. It first uses GetStack to get the pseudostack, as well as the thread context within the signal handler. It then unwinds the stack outside of the signal handler.
Attachment #8450300 - Flags: review?(snorp)
Add implementation for platform-specific bits. Tested on Windows, OS X, Linux, and Android.
Attachment #8450301 - Flags: review?(snorp)
Get the native stack during a permanent hang if we don't have a native stack yet.
Attachment #8450303 - Flags: review?(snorp)
Comment on attachment 8450303 [details] [diff] [review]
f. Get native stack for permahangs in BHM (v1)

Actually I'm going to move this patch to bug 1034138.
Attachment #8450303 - Attachment is obsolete: true
Attachment #8450303 - Flags: review?(snorp)
On ARM Android, dl_iterate_phdr is provided by the custom linker. So if libxul was loaded by the system linker (e.g. as part of xpcshell when running tests), it won't be available and we should not call it.
Attachment #8450324 - Flags: review?(bgirard)
The address sanitizer catches us copying the stack and thinks it's a bug. We should blacklist FillThreadContext and not use memcpy in that case.
Attachment #8450351 - Flags: review?(snorp)
Comment on attachment 8450324 [details] [diff] [review]
f. Don't call dl_iterate_phdr if it's not available (v1)

Review of attachment 8450324 [details] [diff] [review]:
-----------------------------------------------------------------

Can you put the comment in the patch description in line please?
Attachment #8450324 - Flags: review?(bgirard) → review+
Comment on attachment 8448982 [details] [diff] [review]
Associate a native stack for each hang (v1)

Review of attachment 8448982 [details] [diff] [review]:
-----------------------------------------------------------------

Trivial review :)

What's the bigger picture here though.. are you just reporting C++ stacks of all threads during a permanent hang on Fennec, or do you plan to extend it beyond that?
Attachment #8448982 - Flags: review?(vdjeric) → review+
(In reply to Vladan Djeric (:vladan) from comment #11)
> Comment on attachment 8448982 [details] [diff] [review]
> Associate a native stack for each hang (v1)
> 
> Review of attachment 8448982 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Trivial review :)
> 
> What's the bigger picture here though.. are you just reporting C++ stacks of
> all threads during a permanent hang on Fennec, or do you plan to extend it
> beyond that?

Right now it's just the C++ stack for the hung thread, in addition to the pseudostack that we already report. I don't think we have plans beyond that; we want to at least look at the gathered data first.
Comment on attachment 8450300 [details] [diff] [review]
d. Add and implement GetNativeStack method in ThreadStackHelper (v1)

Review of attachment 8450300 [details] [diff] [review]:
-----------------------------------------------------------------

Seems like I'm not the right person to review this, but looks good to me. Someone who knows breakpad should also have a look.
Attachment #8450300 - Flags: review?(snorp) → review+
Comment on attachment 8450351 [details] [diff] [review]
g. Avoid ASan flag when copying stack (v1)

Review of attachment 8450351 [details] [diff] [review]:
-----------------------------------------------------------------

Is there a way to get ASAN or our test harness to ignore this? I wonder if the loop there is significantly slower than the (presumably optimized?) libc implementation.
Attachment #8450351 - Flags: review?(snorp) → review+
Comment on attachment 8450301 [details] [diff] [review]
e. Implement platform-specific code for filling in context (v1)

Review of attachment 8450301 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good I guess, but I don't know any of that code. Should definitely get another person to review.
Attachment #8450301 - Flags: review?(snorp) → review+
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #14)
> Comment on attachment 8450351 [details] [diff] [review]
> g. Avoid ASan flag when copying stack (v1)
> 
> Review of attachment 8450351 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Is there a way to get ASAN or our test harness to ignore this? I wonder if
> the loop there is significantly slower than the (presumably optimized?) libc
> implementation.

The loop is only enabled for ASAN builds so I don't think we have to worry about performance here.
Comment on attachment 8450300 [details] [diff] [review]
d. Add and implement GetNativeStack method in ThreadStackHelper (v1)

Asking :jseward to review this because he wrote the original Breakpad unwinding code in the profiler.

I chose to use Breakpad for unwinding because it's relatively straightforward to use and it covers all platforms/architectures (AFAIK the LUL only covers Linux at the moment). Also, unwinding performance is not critical in this case because we only unwind when there's a hang of several seconds.
Attachment #8450300 - Flags: review?(jseward)
Comment on attachment 8450301 [details] [diff] [review]
e. Implement platform-specific code for filling in context (v1)

See comment 17
Attachment #8450301 - Flags: review?(jseward)
(In reply to Jim Chen [:jchen :nchen] from comment #17)
> Comment on attachment 8450300 [details] [diff] [review]
> d. Add and implement GetNativeStack method in ThreadStackHelper (v1)

On looking at this, and in at the implementation of class
ThreadStackHelper::CodeModulesProvider, I was unclear how it
guarantees not to fall off the upper end (base) of the stack and
segfault.  Can you clarify?
Comment on attachment 8450301 [details] [diff] [review]
e. Implement platform-specific code for filling in context (v1)

Review of attachment 8450301 [details] [diff] [review]:
-----------------------------------------------------------------

Linux and MacOS bits are plausible.  I can't say anything useful
for the Windows specific bits.

How well tested is it?  In particular, have you tested
ThreadStackHelper::GetThreadStackBase to ensure it gives plausible
values on all three targets?  Finding safe stack bounds reliably has
is a difficult problem IME.

::: xpcom/threads/ThreadStackHelper.cpp
@@ +641,5 @@
> +  mContextToFill->mContext.rbp = context.uc_mcontext.gregs[REG_RBP];
> +  mContextToFill->mContext.rsi = context.uc_mcontext.gregs[REG_RSI];
> +  mContextToFill->mContext.rdi = context.uc_mcontext.gregs[REG_RDI];
> +  memcpy(&mContextToFill->mContext.r8,
> +         &context.uc_mcontext.gregs[REG_R8], 8 * sizeof(int64_t));

This seems a bit strange to me.  For RAX .. RDI you copy each one
individually.  No problem there.  But for R8 to R15 you're assuming
that they are packed back-to-back, in order, in uc_mcontext.gregs[],
and that that layout is never going to change in future.  It would be
safer and more consistent to assume no particular layout in the
mcontext, and copy all of them individually.

@@ +708,5 @@
> +  mContextToFill->mContext.rbp = GET_REGISTER(state, rbp);
> +  mContextToFill->mContext.rsi = GET_REGISTER(state, rsi);
> +  mContextToFill->mContext.rdi = GET_REGISTER(state, rdi);
> +  memcpy(&mContextToFill->mContext.r8,
> +         &GET_REGISTER(state, r8), 8 * sizeof(int64_t));

Same comment as above.
Attachment #8450301 - Flags: review?(jseward) → review+
(In reply to Julian Seward [:jseward] from comment #20)
> Comment on attachment 8450301 [details] [diff] [review]
> e. Implement platform-specific code for filling in context (v1)
> 
> Review of attachment 8450301 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> How well tested is it?  In particular, have you tested
> ThreadStackHelper::GetThreadStackBase to ensure it gives plausible
> values on all three targets?  Finding safe stack bounds reliably has
> is a difficult problem IME.

I tested that we get reasonable stacks on Windows, OS X, Linux, and Android. I assume B2G is close enough to Android to also work. I didn't look at what GetThreadStackBase returned specifically, but I will take a look.

> ::: xpcom/threads/ThreadStackHelper.cpp
> @@ +641,5 @@
> > +  mContextToFill->mContext.rbp = context.uc_mcontext.gregs[REG_RBP];
> > +  mContextToFill->mContext.rsi = context.uc_mcontext.gregs[REG_RSI];
> > +  mContextToFill->mContext.rdi = context.uc_mcontext.gregs[REG_RDI];
> > +  memcpy(&mContextToFill->mContext.r8,
> > +         &context.uc_mcontext.gregs[REG_R8], 8 * sizeof(int64_t));
> 
> This seems a bit strange to me.  For RAX .. RDI you copy each one
> individually.  No problem there.  But for R8 to R15 you're assuming
> that they are packed back-to-back, in order, in uc_mcontext.gregs[],
> and that that layout is never going to change in future.  It would be
> safer and more consistent to assume no particular layout in the
> mcontext, and copy all of them individually.

Yeah I verified that these registers are consecutive in the structs, and I wanted to make the code less messy. I'm pretty confident that, for binary compatibility, these structures will never change for these architectures.
(In reply to Julian Seward [:jseward] from comment #19)
> (In reply to Jim Chen [:jchen :nchen] from comment #17)
> > Comment on attachment 8450300 [details] [diff] [review]
> > d. Add and implement GetNativeStack method in ThreadStackHelper (v1)
> 
> On looking at this, and in at the implementation of class
> ThreadStackHelper::CodeModulesProvider, I was unclear how it
> guarantees not to fall off the upper end (base) of the stack and
> segfault.  Can you clarify?

We keep the stack buffer in ThreadStackHelper::ThreadContext, and ThreadStackHelper::ThreadContext::GetMemoryAtAddressInternal() makes sure that we don't read from an address that's outside of the buffer's bounds. Is this what you meant?
Comment on attachment 8450300 [details] [diff] [review]
d. Add and implement GetNativeStack method in ThreadStackHelper (v1)

Review of attachment 8450300 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/threads/ThreadStackHelper.cpp
@@ +296,5 @@
> +  size_t mStackSize;
> +  // End of stack area
> +  const void* mStackEnd;
> +
> +  ThreadContext() : mValid(false), mStackSize(0), mStackEnd(nullptr) {}

This doesn't initialise mStackBase.  Should it?
Attachment #8450300 - Flags: review?(jseward) → review+
sorry had to back this changes in https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=6ffcd2030ed8 since one of this changesets caused a nearly permanent error like https://tbpl.mozilla.org/php/getParsedLog.php?id=44598113&tree=Mozilla-Inbound
Trivial patch to fix wrong sysinfo usage.
Attachment #8463479 - Flags: review+
Depends on: 1045176
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: