Closed Bug 802106 Opened 12 years ago Closed 12 years ago

Breakpad stackwalking optimization: optimize PostfixEvaluator

Categories

(Toolkit :: Crash Reporting, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla21

People

(Reporter: ted, Assigned: jseward)

References

Details

Attachments

(4 files, 2 obsolete files)

Julian did a callgrind profile of his SPS unwinder that uses Breakpad for unwinding, and it shows that we spend a huge portion of time in the PostfixEvaluator class:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/postfix_evaluator.h
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/postfix_evaluator-inl.h

Specifically, the unwind rules in Breakpad symbol files are specified as postfix expressions translated from DWARF CFI, and every time we walk the stack from a callee frame to its caller we build a PostfixEvaluator, feed it the current register state, and evaluate one of these expressions to recover the caller register values. See:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/stackwalker_amd64.cc#122
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/cfi_frame_info.cc#46

Most of the damage seems to be fallout from heavy use of the STL: lots of memcmp from comparing string keys and an awful lot of mallocing. I suspect we could get some pretty large wins if we could just switch from string keys to int keys by maintaining a string table.
This replaces the use of std::string keys in the dictionaries used for
evaluation, with a new type UniqueString*, basically an interned
string for which equality can be done via pointer equality.

It also changes the type of the evaluator stack element from
std::string to a new type StackElem, which is a tagged union of
ValueType and UniqueString*.  This makes the evaluator a bit clearer
and avoids storing literals (numbers) in the stack as text; the latter
clearly is inefficient in that it causes arithmetic on literals to
require conversion of operands to/from strings.

This helps, although it doesn't give a huge speedup.  In the SPS
integration it reduces the cost of StackwalkerAMD64::GetCallerFrame
from about 150k instructions down to 100k, better but still insanely
expensive.  One background reason is there is still a lot of
std::string activity present because the expression strings are
tokenised and parsed using std::string.
Assignee: nobody → jseward
Give "expressions" (the things evaluated by the postfix evaluator)
their own type (Module::Type) rather than merely being strings.  Allow
the type to have special non-string representation for the special
cases "identifier + immediate" and "*(identifier + immediate)".  These
special cases can represent all the Dwarf Cfi expressions we generate,
and so allow Cfi unwinding to happen without endlessly having to parse
strings.  Also enhance the postfix evaluator to handle these special
cases.  This speeds up StackwalkerAMD64::GetCallerFrame by very
roughly a factor of (150k insns/frame before, 46k/frame after).
This looks great! After reading your patch I can offer some more concrete suggestions. My original idea was that we'd be able to avoid making changes to PostfixEvaluator at all, which should make the patch a bit conceptually simpler. I thought that perhaps in CFIFrameInfo::FindCallerRegs:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/cfi_frame_info.cc#73

we could somehow lazily init the PostfixEvaluator and only use it if the unwind rule required it.
Rehash of the UniqueString patch, to be applied on top of the patch in
comment 2.  This patch replaces the used of std::string keys with a
new type, UniqueString*, which only ever has one copy of any string
and hence allows string equality to be done by pointer equality.  In
combination with Ted's direct-from-the-dwarf-sections patches and the
Cfi rule re-representation patch in comment 2, this allows CFI
unwinding to happen with essentially no std::string allocation or
comparison.  According to my rough measurements, this reduces the cost
of calls to StackwalkerAMD64::GetCallerFrame from about 46kI/frame to
about 26kI/frame.

This is not as aggressive as changing of keys directly to integers,
but it does sidestep the problem of various evaluators (particularly
stackwalker_x86.cc) stuffing entirely arbitrary string keys into the
evaluation dictionaries and expecting everything still to work.

This patch contains some "backwards compatibility" impedance matching,
so that uses of the plain postfix evaluator with plain strings, still
works right (although untested).

It also contains a cleanup change to postfix_evaluator-inl.h which
should possibly be factored out into its own patch: it changes the
type of the stack elements from strings to a union of string
(actually, of course, UniqueString*) and ValueType.  This makes the
code clearer and avoids having to convert literal values back and
forth to strings during evaluation.
Attachment #672755 - Attachment is obsolete: true
One final speedup patch for the unwinder, that applies on top of the
patch in comment 4.  This changes the type of the dictionaries used in
evaluation from instantiations of std::map to a new type,
UniqueStringMap<ValueType>.  This has two advantages:

* it allows precise measurement/profiling of dictionary usage stats
* it allows using a optimised map implementations

UniqueStringMap optimises the small-map case.  From some testing it
appears that almost all maps (dictionaries) have 10 or fewer elements.
UniqueStringMap therefore stores (key,value) pairs in a fixed array of
10 elements and performs linear search for access.  If this array
fills up (very rarely), then it allocates and uses a std::map to
handle the overflow.

This easily doubles the performance of
StackwalkerAMD64::GetCallerFrame, reducing the cost from about
26kI/frame to 11kI/frame.  Profiling shows it removes almost all the
std::map activity during CFI unwinding.

Overall cost when testing SPS on x86_64-linux, for max-32 element
stack unwinds, shows a cost of about 670K insns/unwind.  SPS can
sample at 1KHz on this machine (Core i5) at less than 40% cpu
utilisation now.
This applies on top of the patch in comment 5, and fixes a problem
with it.  UniqueStringMap contains a raw pointer, and the default
assignment operator for it does not handle that correctly, leading to
read/write after free and double frees, on arm-android.  If we had
used the default copy constructor the same problem would exist.  This
patch fixes both cases.
Misc fixes needed to build/use native unwind on arm-android:
* don't call abi::__cxa_demangle since it doesn't exist
* ~MmapWrapper(): don't munmap unless base_/size_ seem sane

Applies on top of the patch in comment 6.
Comment on attachment 674594 [details] [diff] [review]
Change representation of Cfi rules internally

I fixed a few style issues that I knew would get dinged in review, and also fixed some of the Breakpad unit tests, and got this patch up for review:
https://breakpad.appspot.com/516004/
I rolled up all the remaining patches here along with the Breakpad bits of the patch from bug 812133 and pushed them to Try:
https://tbpl.mozilla.org/?tree=Try&rev=717a458f3518

It all built fine on my local Linux machine.
Comment on attachment 675297 [details] [diff] [review]
Change std::string keys to UniqueString*, second try

After working out a few small issues with Julian, and cleaning up some style nits, this patch is up for upstream review here:
https://breakpad.appspot.com/519002/
Comment on attachment 675301 [details] [diff] [review]
Replace eval-dictionary mappings with a specialised type, UniqueStringMap

This patch (with the "fix assignment operator" patch rolled into it) is up for upstream review:
https://breakpad.appspot.com/520002/
Attachment #678232 - Attachment is obsolete: true
Comment on attachment 678240 [details] [diff] [review]
Misc fixes for arm-android

Uploaded this patch for upstream review:
https://breakpad.appspot.com/521002

I didn't really understand the purpose of the dump_symbols MmapWrapper change, so I left it out. If it turns out to be important we can put it back, but it feels hacky.
Depends on: 838882
https://hg.mozilla.org/mozilla-central/rev/be126e0d29a1
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: