Closed Bug 1210272 Opened 9 years ago Closed 9 years ago

Some samples come with incomplete stack frames

Categories

(Core :: Gecko Profiler, defect)

Platform: ARM, Unspecified
Type: defect
Priority: Not set
Severity: normal

Tracking

RESOLVED WONTFIX
Tracking Status
firefox44 --- affected

People

(Reporter: ting, Unassigned)

Details

I often see samples that come with incomplete stacks and are therefore listed in different places in the samples view. For example, in the following profile the b2g process sample at 60248 ms has an incomplete stack, even though it should have a stack similar to its neighbors'.

http://people.mozilla.org/~bgirard/cleopatra/#report=5cd8e0d329a515f909131de370ab081f8d70496b&select=60245,60251
The frames are already missing in profile_*.txt, so this is not a front-end processing problem. I'll check the stack walking.
The problem occurs when the pc passed to EHAddrSpace::lookup() cannot be found, but I haven't figured out how that happens yet.

I've checked the following but didn't see anything wrong:

  EHTable::lookup(),
  EHAddrSpace::lookup(),
  EHInterp::unwind(),
  confirmed that the output of |readelf -u| matches the content of EHEntry
Found a case that could cause trouble: SIGPROF being delivered while a function is in its prologue or epilogue. Since EHInterp::unwind() executes all of the unwind instructions, in this case it could replace lr (the pc for the next frame) with an incorrect value, which then makes EHAddrSpace::lookup() fail.

To fix this, we would need to execute the unwind instructions only partially, depending on where the pc is in the prologue/epilogue.
Another case is unwinding from a function wrapped as a VMFunction, e.g., DoSetElemFallback(). After executing the unwind instructions, lr points somewhere in an executable region that is not mapped from any library, like:
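A minimal sketch of the partial-unwind idea, under heavy simplifying assumptions: a made-up two-instruction prologue, a toy register/memory model, and hypothetical helpers (`runPrologue`, `unwindPartial`) that are not part of the Gecko profiler or its EHABI unwinder. It only illustrates why running every unwind instruction while the pc is still at function offset 0 reads a bogus return address, while undoing just the steps that have actually executed does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified model of a two-step ARM prologue:
//   str lr, [sp, #-4]!   (PushLR)
//   sub sp, sp, #N       (SubSP)
enum class Op { PushLR, SubSP };
struct PrologueStep { Op op; uint32_t amount; };  // amount is only used by SubSP

struct Regs { uint32_t sp, lr, pc; };
using Memory = std::map<uint32_t, uint32_t>;

// Reads a stack word; slots never written read as 0 (a stand-in for garbage).
uint32_t load(const Memory& mem, uint32_t addr) {
  auto it = mem.find(addr);
  return it == mem.end() ? 0 : it->second;
}

// Simulates the CPU executing the first `n` prologue steps before SIGPROF arrives.
Regs runPrologue(Regs r, const std::vector<PrologueStep>& prologue, size_t n, Memory& mem) {
  for (size_t i = 0; i < n; ++i) {
    switch (prologue[i].op) {
      case Op::PushLR: r.sp -= 4; mem[r.sp] = r.lr; break;
      case Op::SubSP:  r.sp -= prologue[i].amount; break;
    }
  }
  return r;
}

// Undoes only the `executed` steps that actually ran, newest first, and returns
// the caller's frame (pc = return address). Passing executed == prologue.size()
// reproduces the "execute all unwind instructions" behavior.
Regs unwindPartial(Regs r, const std::vector<PrologueStep>& prologue, size_t executed,
                   const Memory& mem) {
  assert(executed <= prologue.size());
  for (size_t i = executed; i-- > 0;) {
    switch (prologue[i].op) {
      case Op::SubSP:  r.sp += prologue[i].amount; break;
      case Op::PushLR: r.lr = load(mem, r.sp); r.sp += 4; break;
    }
  }
  r.pc = r.lr;
  return r;
}
```

For example, with `prologue = {{Op::PushLR, 0}, {Op::SubSP, 16}}` and a signal at offset 0, `unwindPartial(regs, prologue, 0, mem).pc` is the original lr, whereas passing `prologue.size()` pops a stack slot that was never written. The hard part, which this sketch simply assumes away, is knowing how many steps have executed: EHABI unwind opcodes do not record which prologue instruction each one undoes.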

  b09b0000-b09c0000 rwxp 00000000 00:00 0

so EHAddrSpace::lookup() fails.
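For the VMFunction case, one option an unwinder might take is to recognize that lr points into anonymous executable memory (such as JIT code) instead of treating the failed lookup as an unexplained broken frame. The sketch below parses a `/proc/<pid>/maps` line like the one above and tests for that condition; `MapEntry`, `parseMapsLine`, and `isAnonymousExecutable` are hypothetical names, not taken from the Gecko profiler, which may handle JIT frames differently.

```cpp
#include <cstdint>
#include <exception>
#include <istream>
#include <sstream>
#include <string>

// One parsed line of /proc/<pid>/maps (only the fields this sketch needs).
struct MapEntry {
  uint64_t start = 0, end = 0;
  std::string perms;  // e.g. "rwxp"
  std::string path;   // empty for anonymous mappings
};

// Parses "start-end perms offset dev inode [path]". Returns false on malformed input.
bool parseMapsLine(const std::string& line, MapEntry& out) {
  std::istringstream in(line);
  std::string range, offset, dev, inode;
  if (!(in >> range >> out.perms >> offset >> dev >> inode)) return false;
  const auto dash = range.find('-');
  if (dash == std::string::npos) return false;
  try {
    out.start = std::stoull(range.substr(0, dash), nullptr, 16);
    out.end = std::stoull(range.substr(dash + 1), nullptr, 16);
  } catch (const std::exception&) {
    return false;
  }
  out.path.clear();
  std::getline(in >> std::ws, out.path);  // rest of the line, if any
  return true;
}

// True if addr lies in an executable mapping with no backing file: there is no
// ELF image, and therefore no .ARM.exidx table, in which to look it up.
bool isAnonymousExecutable(const MapEntry& e, uint64_t addr) {
  return addr >= e.start && addr < e.end &&
         e.perms.size() >= 3 && e.perms[2] == 'x' && e.path.empty();
}
```

Detecting such a region does not recover the missing frames by itself; it would only let the unwinder stop cleanly and label the frame, rather than producing a truncated stack that looks like a bug.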
(In reply to Ting-Yu Chou [:ting] from comment #3)
> Found a case could cause trouble: if SIGPROF is signaled when a function is
> within prologue or epilogue.

I only observed the case where the pc is at the function's offset 0, i.e., at the first instruction of the prologue, before anything has executed.

But I found no way to check whether the pc is at the function's offset 0 or within its body.
(In reply to Ting-Yu Chou [:ting] from comment #3)
> Found a case could cause trouble: if SIGPROF is signaled when a function is
> within prologue or epilogue. Since EHInterp::unwind() executes all the
> unwind instructions, so in this case it could replace lr (pc for next frame)
> with incorrect value, and failed EHAddrSpace::lookup() later.
> 
> To fix this, we need to do unwind instructions partially, depend on where is
> pc in prologue/epilogue.

This issue was described in bug 863475 comment 2, point 5.
(In reply to Ting-Yu Chou [:ting] from comment #5)
> But I found no means to check whether pc is in function's offset 0 or within
> its body.

Actually, the first word of an exidx entry holds the function's offset, which can be used to check this. But entries don't have to map 1-to-1 to functions; one entry can cover n functions. I'll see if we can get the function's start pc from somewhere else.
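The check described here can be sketched as follows, assuming the exidx table has already been decoded into a vector sorted by start address. In ARM EHABI, the first word of each `.ARM.exidx` entry is a prel31 value (a 31-bit, sign-extended offset relative to the word's own address) pointing at the start of the code the entry covers. `ExidxEntry`, `findEntry`, and `atEntryStart` are hypothetical helpers, not Gecko's `EHTable` API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Decodes word 0 of an exidx entry located at wordAddr (bit 31 must be clear).
uint32_t decodePrel31(uint32_t word, uint32_t wordAddr) {
  assert((word & 0x80000000u) == 0);
  // Move the 31-bit field to the top, then arithmetic-shift back to sign-extend.
  int32_t offset = static_cast<int32_t>(word << 1) >> 1;
  return wordAddr + static_cast<uint32_t>(offset);
}

// Hypothetical decoded view of one exidx entry.
struct ExidxEntry {
  uint32_t start;  // decodePrel31(word0, address of word0)
  uint32_t data;   // word 1, the unwind data (see the EHABI spec)
};

// Returns the entry covering pc: the last entry whose start <= pc.
const ExidxEntry* findEntry(const std::vector<ExidxEntry>& table, uint32_t pc) {
  auto it = std::upper_bound(table.begin(), table.end(), pc,
      [](uint32_t v, const ExidxEntry& e) { return v < e.start; });
  return it == table.begin() ? nullptr : &*std::prev(it);
}

// True only if pc is at the first instruction the entry covers. Because one
// entry may cover several adjacent functions (1-to-n), a false result does not
// prove that pc is past the prologue of the function it is actually in.
bool atEntryStart(const std::vector<ExidxEntry>& table, uint32_t pc) {
  const ExidxEntry* e = findEntry(table, pc);
  return e != nullptr && e->start == pc;
}
```

This covers only the case where the entry's start coincides with the function's start; for the other functions in a 1-to-n entry, the function's start address would have to come from elsewhere, such as the symbol table.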
Julian, do you have any suggestions regarding comment 3? Thank you.
Flags: needinfo?(jseward)
(In reply to Ting-Yu Chou [:ting] from comment #9)
> Julian, do you have any suggestions regarding comment 3? Thank you.

I'm afraid I do not .. sorry.  The underlying problem is that we are
using EXIDX for something it was not designed for -- for unwinds from
arbitrary code locations, whereas it is designed only to support unwinds
at points where there could be C++ exceptions.

How bad is the problem for you?  I had assumed that this problem would
only happen occasionally and so would not be a big deal in practice.
But maybe I was wrong.

One possibility is to build the library(s) (libxul?) with CFI unwind
data instead.  That should give exact unwinding everywhere, at least
if you use -fasynchronous-unwind-tables.  But doing such a build is a
bit of hassle, and it is not clear to me whether the Gecko profiler
can directly use it on ARM platforms.  (It used to be possible, but now
I am not sure what the status is.)
Flags: needinfo?(jseward)
Julian, thank you for your feedback. It's not a real problem; it just looks broken in the profiler UI.

I've decided to resolve this as WONTFIX, for these reasons:

  1. Comparing the pc with the function's offset from the EH entry would tell us whether we're before the prologue, so we could skip the unwind instructions. But handling the 1-to-n mapping described in comment 7 would make unwinding slower (we would have to search the symbol table for the function's address). Also, I am not sure that comment 5 describes the only case.
  2. It does not defeat the purpose of profiling.
  3. It was a known issue when we chose EHABI for unwinding.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX