One reason (I suspect) why Breakpad is slow at CFI/EXIDX unwinding is that, for each code address, it must find the relevant CodeModule, then find the relevant set of rules, and -- at least in the CFI case -- assemble the rules for a particular address range from the stored initial rules (CIEs) and later FDEs that modify the CIE. Short of pre-computing the exact rule-set for every address range at debuginfo-read time, which would be pretty space-inefficient, we can maybe cache the most popular few thousand rule-sets as they get looked up. This could potentially short-circuit most of the existing overhead.
First attempt at a patch. Needs discussion w/ Ted re change of the return type of ::FindCFIFrameInfo. With patches for bug 892774 and bug 894264 in place, the CFI unwinder runs at a cost of about 11200 insns per frame. Adding in this patch reduces that to about 8100 insns per frame, a significant improvement. The majority of the remaining cost is related to actually using the CFIFrameInfo to compute the previous frame's register values; the cost of finding the right CFIFrameInfo is now pretty small. The cache seems to be pretty effective, and becomes more so the longer Gecko runs (unsurprisingly). Eg, 300000 queries to the cache gave only 7048 misses, a miss rate of about 2.5%. To keep the worst-case space behaviour under control, it might be wise to nuke it and start over when it accumulates (eg) 10000 entries. Given the high hit rate, this is not likely to be much of a performance loss.
The API change isn't fantastic, but I haven't read this in enough detail to give you a better suggestion. RE: cache, sounds like you just want some sort of LRU cache, right?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #2) > The API change isn't fantastic, but I haven't read this in enough detail to > give you a better suggestion. I agree, it's less than optimal. I ummed and ahhed about this for a long time, but couldn't think of a less ugly solution. Maybe you can. The problem is as follows: Prior to this patch, ::FindCFIFrameInfo created a CFIFrameInfo object (+ lots of stuff hanging off it), returned it to the caller, the caller took ownership, and freed the object when it was done with it. That interacts awkwardly with wanting to cache those objects, since we can't have the callers freeing objects which ::FindCFIFrameInfo has cached. The options I considered were: * Have ::FindCFIFrameInfo cache query results. When a query results in a cache hit, return a copy of the cached CFIFrameInfo object. I didn't like that because - copying CFIFrameInfo plus the stuff attached to it sounds complex, and I wasn't confident I could get it right - it's inefficient; it implies allocation, copy, deallocation on the fast path (cache hit) for no purpose * Have ::CFIFrameInfo return a pointer to a refcounted CFIFrameInfo object; then have the caller .Release it when done. - Not as bad as the first proposal. - But I still don't know how it would play out with the sub-objects hanging off CFIFrameInfo - I couldn't find any RefCountedThing<..> class in the Breakpad sources, which means either importing one or using stuff from XPCOM, which would break standalone building * Changing the ownership rules for ::FindCFIFrameInfo, to say that the callee (::FindCFIFrameInfo) retains ownership - would be impractical because the other implementations of ::FindCFIFrameInfo really depend on the callers to take ownership Hence the fudge in the patch, in which ::FindCFIFrameInfo can tell the callers who has ownership. If we stay with this scheme, it'd be worth replacing the use of std::pair with a custom class, so as to avoid readers of the code having to infer the meaning of ".first" / ".second". (kind of relatedly): One thing I did observe, that concerns me, in the pre-patch implementation of stackwalker_x86.cc, is the possibility that StackwalkerX86::GetCallerFrame calling GetCallerByCFIFrameInfo stashes a pointer the the CFIFrameInfo in the frame it returns, and then frees the CFIFrameInfo, hence opening the door to a bunch of reads-after-free. Not sure if this analysis is correct, but needs examining. > RE: cache, sounds like you just want some sort of LRU cache, right? LRU might be optimal, but is a bit complex to do, and I don't think it is worth the hassle, given that the hit rate exceeds 90% after a couple of thousand unwinds, even with this brain-dead implementation. I'm more concerned about keeping the worst-case space use under control, hence the nuke-periodically suggestion.
Do you want to bother with incremental improvements to the Breakpad-based unwinder, or are you going to focus all your energy on the new unwinder instead?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #5) I am inclined to focus effort into the new unwinder, unless you see some other reason to continue developing the Breakpad-based unwinder. I believe the outstanding Breakpad changes are somewhat intrusive and make upstream integration more difficult, so AFAICS not hacking further on it would be no bad thing. But LMK if you think otherwise.
Nope, I'm totally on board with that, thanks for clarifying.
I'm guessing we'll just WONTFIX any outstanding Breakpad unwinder bugs at this point.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.