Closed
Bug 855977
Opened 11 years ago
Closed 11 years ago
SPS breakpad: disable stack scanning for profiling unwinds
Categories
(Core :: Gecko Profiler, defect)
Tracking
()
RESOLVED
FIXED
mozilla23
People
(Reporter: jseward, Assigned: jseward)
References
Details
Attachments
(1 file, 1 obsolete file)
12.15 KB,
patch
|
BenWa
:
review+
|
Details | Diff | Splinter Review |
Breakpad will -- as a last-ditch measure -- use stack scanning to find probable return addresses. This is useful when trying to extract info from crashed out processes, but is problematic when doing profiling, because the heuristic tends to insert frames which don't actually exist, messing up the profiling results. This is because there is no way to tell which values-that-look-like-return-addresses on the stack are actually live, and which are dead ones created by now-defunct stack frames, which haven't got overwritten. I think it might be possible to do this without modifying breakpad, since breakpad tells us the 'trustyworthyness' of each frame it gives us. So it would be possible to ignore untrustyworthy frames, or (maybe better) truncate the stack at the first one encountered.
Comment 1•11 years ago
|
||
I thought we short-circuted this with the LocalDebugInfoSymbolizer?
Assignee | ||
Comment 2•11 years ago
|
||
Hmm. LocalDebugInfoSymbolizer does have this // Lie to the stackwalker to short-circuit stack-scanning heuristics. virtual bool HasImplementation() { return false; } and the only place it is called is from Stackwalker::InstructionAddressSeemsValid, which causes that routine to return 'true'. I can't remember any details about this now, but I have the feeling that this was done so as to stop stack scanning poking around with symbols to try and figure out if an address is a plausible return address. I don't get the impression from reading the code, that LocalDebugInfoSymbolizer's HasImplementation method would actually stop stack scanning from happening. But maybe I misunderstand.
Comment 3•11 years ago
|
||
Ah, you are correct. So we have two options, I think: 1) Let the stack walker scan, but drop the rest of the stack on the floor when we find a frame that was found by scanning (a little wasteful, but not terrible) 2) Add an interface to google_breakpad::Stackwalker to disable stack scanning.
Assignee | ||
Comment 4•11 years ago
|
||
> So we have two options, I think:
I had planned to do (1) on the basis that it doesn't involve messing
with breakpad, and not messing with breakpad is generally a Good
Thing. However, I'd also agree that (2) is a generally better
solution.
Assignee | ||
Comment 5•11 years ago
|
||
I should also maybe point out that (1) is pretty close to being a 1-liner on the SPS side of things, whereas (2) will inevitably involve a bunch of plumbing.
Comment 6•11 years ago
|
||
Might as well just hook up the easy thing, then. If it's a problem we can look at the more complicated solution. The stack traces you get should be the same in either case anyway.
Comment 7•11 years ago
|
||
I'd vote for (1). Once we get this all hooked up and working I think the stack scanning could be use later but slightly differently if the results aren't accurate enough or we have system library that don't have proper unwind info. (1) Stack scanning results wouldn't show up in the aggregation. (2) You'd be able to click on the timeline and ask for the best guess for the source.
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → jseward
Assignee | ||
Comment 8•11 years ago
|
||
Here's a WIP patch I've been playing with. I discovered a couple of things with this. The story is more complex than I thought. Firstly, due to the way in which unwind results are returned from do_breakpad_unwind_Buffer to its caller, we can't just break out of the loop when we find a frame we don't like. Doing so causes uninitialised garbage to be fed through to the rest of the profiler. This patch handles it correctly, but at the cost of a bit more complexity. Secondly, truncating stacks at the first untrustworthy frame does indeed give more believable results on x86_64 Linux, at least. However it does appear to mean no unwinds out of libc when (I think) firefox is blocked in syscalls. Which means in practice, no useful data for thread(s) blocked in syscalls. Not sure why breakpad can't handle that case. Maybe there is no CFI on the syscall assembly code. Thirdly, as a result of the above, it's clear that truncating the stacks on the SPS side rather than giving the problem to breakpad (iow, using comment 3 option 1) is the better thing to do. Because it gives us flexibility. Eg, we could try and work around the above problem by accepting exactly on stack-scanned innermost frame. Using comment 3 option 2 would make that impossible.
Assignee | ||
Comment 9•11 years ago
|
||
(In reply to Julian Seward from comment #8) > Eg, we could try and work around the above > problem by accepting exactly one stack-scanned innermost frame. Trivially done, with the above patch: change "if (dubious)" to "if (dubious && frame_index > 1)". That helps in the case where we have (eg) libxul calling libc.so, but not in the case where (eg) libxcb.so calls libc.so and there is no unwind info for libxcb.so (for whatever reason). Not sure what to do about this.
Comment 10•11 years ago
|
||
(In reply to Julian Seward from comment #8) > Secondly, truncating stacks at the first untrustworthy frame does > indeed give more believable results on x86_64 Linux, at least. > However it does appear to mean no unwinds out of libc when (I think) > firefox is blocked in syscalls. Which means in practice, no useful > data for thread(s) blocked in syscalls. Not sure why breakpad can't > handle that case. Maybe there is no CFI on the syscall assembly code. There are some symbol files for linux-gate.so in the Breakpad tree: http://code.google.com/p/google-breakpad/source/browse/#svn%2Ftrunk%2Fsrc%2Fclient%2Flinux%2Fdata If you're seeing stacks truncated in syscalls we should probably bake the equivalent of this into our LocalDebugInfoSymbolizer.
Assignee | ||
Comment 11•11 years ago
|
||
Turns off stack scanning by default, but allows it to be selectively re-enabled using env var MOZ_PROFILER_STACK_SCAN=<number> where <number> is the max number of dubious frames that can be accepted in a stack trace, before it gets truncated. This allows easy experimentation whilst we are still in the debugging-the-native-unwinder phase. Also moves the declarations for sUnwind{Mode,Interval,StackScan} from TableTicker.h to platform.h, which seems a more logical place for them.
Attachment #732727 -
Attachment is obsolete: true
Attachment #734037 -
Flags: review?(bgirard)
Comment 12•11 years ago
|
||
(In reply to Julian Seward from comment #11) > Also moves the declarations for sUnwind{Mode,Interval,StackScan} from > TableTicker.h to platform.h, which seems a more logical place for > them. The reason this wasn't done before was because Sampler.h was the interface to the sampler and TableTicker.h was just an implementation. The idea was to allow multiple sampler to exists. But at this point it we wont be support other samplers so I've reverted that decision in my multi-thread patch and it's fine here as well.
Comment 13•11 years ago
|
||
Comment on attachment 734037 [details] [diff] [review] Revised patch Review of attachment 734037 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/platform.h @@ +221,5 @@ > +void read_profiler_env_vars(); > +typedef enum { UnwINVALID, UnwNATIVE, UnwPSEUDO, UnwCOMBINED } UnwMode; > +extern UnwMode sUnwindMode; /* what mode? */ > +extern int sUnwindInterval; /* in milliseconds */ > +extern int sUnwindStackScan; /* max # of dubious frames allowed */ It would be nice to have a way to use INT_MAX/generally allow stack scan but for now this is just debugging so it's fine.
Attachment #734037 -
Flags: review?(bgirard) → review+
Assignee | ||
Comment 14•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/f0063cd8a37b
Comment 15•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/f0063cd8a37b
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
You need to log in
before you can comment on or make changes to this bug.
Description
•