Closed Bug 893542 Opened 11 years ago Closed 10 years ago

Breakpad CFI/EXIDX unwind: add cross-module caching of unwind rules

Categories

(Core :: Gecko Profiler, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: jseward, Assigned: jseward)

References

Details

Attachments

(1 file, 1 obsolete file)

One reason (I suspect) why Breakpad is slow at CFI/EXIDX unwinding
is that, for each code address, it must find the relevant CodeModule,
then find the relevant set of rules, and -- at least in the CFI case --
assemble the rules for a particular address range from the stored
initial rules (CIEs) and later FDEs that modify the CIE.

Short of pre-computing the exact rule-set for every address range
at debuginfo-read time, which would be pretty space-inefficient, we
can maybe cache the most popular few thousand rule-sets as they
get looked up.  This could potentially short-circuit most of the
existing overhead.
First attempt at a patch.  Needs discussion w/ Ted re change of the return
type of ::FindCFIFrameInfo.

With patches for bug 892774 and bug 894264 in place, the CFI unwinder runs at a
cost of about 11200 insns per frame.  Adding in this patch reduces that to about
8100 insns per frame, a significant improvement.  The majority of the remaining
cost is related to actually using the CFIFrameInfo to compute the previous
frame's register values; the cost of finding the right CFIFrameInfo is now
pretty small.

The cache seems to be pretty effective, and becomes more so the longer Gecko
runs (unsurprisingly).  Eg, 300000 queries to the cache gave only 7048 misses,
a miss rate of about 2.5%.  To keep the worst-case space behaviour under
control, it might be wise to nuke it and start over when it accumulates (eg)
10000 entries.  Given the high hit rate, this is not likely to be much of a
performance loss.
The API change isn't fantastic, but I haven't read this in enough detail to give you a better suggestion.

RE: cache, sounds like you just want some sort of LRU cache, right?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #2)
> The API change isn't fantastic, but I haven't read this in enough detail to
> give you a better suggestion.

I agree, it's less than optimal.  I ummed and ahhed about this for a
long time, but couldn't think of a less ugly solution.  Maybe you can.
The problem is as follows:

Prior to this patch, ::FindCFIFrameInfo created a CFIFrameInfo object
(+ lots of stuff hanging off it), returned it to the caller, the
caller took ownership, and freed the object when it was done with it.
That interacts awkwardly with wanting to cache those objects, since we
can't have the callers freeing objects which ::FindCFIFrameInfo has
cached.

The options I considered were:

* Have ::FindCFIFrameInfo cache query results.  When a query results
  in a cache hit, return a copy of the cached CFIFrameInfo object.
  I didn't like that because
  - copying CFIFrameInfo plus the stuff attached to it sounds 
    complex, and I wasn't confident I could get it right
  - it's inefficient; it implies allocation, copy, deallocation
    on the fast path (cache hit) for no purpose

* Have ::CFIFrameInfo return a pointer to a refcounted CFIFrameInfo
  object; then have the caller .Release it when done.
  - Not as bad as the first proposal.
  - But I still don't know how it would play out with the sub-objects
    hanging off CFIFrameInfo
  - I couldn't find any RefCountedThing<..> class in the Breakpad
    sources, which means either importing one or using stuff from
    XPCOM, which would break standalone building

* Changing the ownership rules for ::FindCFIFrameInfo, to say that
  the callee (::FindCFIFrameInfo) retains ownership
  - would be impractical because the other implementations of
    ::FindCFIFrameInfo really depend on the callers to take
    ownership

Hence the fudge in the patch, in which ::FindCFIFrameInfo can tell
the callers who has ownership.

If we stay with this scheme, it'd be worth replacing the use of
std::pair with a custom class, so as to avoid readers of the code
having to infer the meaning of ".first" / ".second".

(kind of relatedly): One thing I did observe, that concerns me, in the
pre-patch implementation of stackwalker_x86.cc, is the possibility
that StackwalkerX86::GetCallerFrame calling GetCallerByCFIFrameInfo
stashes a pointer the the CFIFrameInfo in the frame it returns, and
then frees the CFIFrameInfo, hence opening the door to a bunch of
reads-after-free.  Not sure if this analysis is correct, but needs
examining.


> RE: cache, sounds like you just want some sort of LRU cache, right?

LRU might be optimal, but is a bit complex to do, and I don't think it
is worth the hassle, given that the hit rate exceeds 90% after a
couple of thousand unwinds, even with this brain-dead implementation.

I'm more concerned about keeping the worst-case space use under
control, hence the nuke-periodically suggestion.
Attachment #786304 - Flags: feedback?(ted)
Attachment #786304 - Attachment is obsolete: true
Attachment #786304 - Flags: feedback?(ted)
Attachment #793385 - Flags: feedback?(ted)
Do you want to bother with incremental improvements to the Breakpad-based unwinder, or are you going to focus all your energy on the new unwinder instead?
Attachment #793385 - Flags: feedback?(ted)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #5)
I am inclined to focus effort into the new unwinder, unless you
see some other reason to continue developing the Breakpad-based
unwinder.  I believe the outstanding Breakpad changes are somewhat
intrusive and make upstream integration more difficult, so AFAICS
not hacking further on it would be no bad thing.

But LMK if you think otherwise.
Nope, I'm totally on board with that, thanks for clarifying.
I'm guessing we'll just WONTFIX any outstanding Breakpad unwinder bugs at this point.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: