Closed Bug 1157194 Opened 10 years ago Closed 10 years ago

Make LUL able to deal with Dwarf expressions in CFI unwind info

Categories

(Core :: Gecko Profiler, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla42
Tracking Status
firefox40 --- affected
firefox42 --- fixed

People

(Reporter: jseward, Assigned: jseward)

References

Details

Attachments

(1 file, 4 obsolete files)

LUL now provides native unwind for SPS on desktop Linux. Unfortunately it doesn't deal with CFI that contains so-called Dwarf expressions, which means that unwinding sometimes fails in various bits of hand-written assembly in libpthread that have hand-written CFI to match. In particular pthread_mutex_lock (et al) and its pseudo-subroutines _L_lock_<number>.
Attached patch bug1157194-1.diff (obsolete) — Splinter Review
Initial implementation; x86_64-linux only; WIP.
Attached patch bug1157194-2.diff (obsolete) — Splinter Review
x86-64 only; WIP. Minor cleanups compared to previous version. This appears to fix the unwind problems reported on recent Ubuntu versions. Specifically, LUL can now successfully unwind through the _L_lock_<number> pseudo-functions that are present in Ubuntu builds of libpthread.so.
Attachment #8597238 - Attachment is obsolete: true
Attached patch bug1157194-4.cset (obsolete) — Splinter Review
Builds on all targets; tested on x86_64-linux; also has improved diagnostics for cases where summarisation is still failing.
Attachment #8597679 - Attachment is obsolete: true
Depends on: 1165833
Attached patch bug1157194-5.cset-4 (obsolete) — Splinter Review
Same as the previous version of the patch, except with slightly improved debug printing and -- more importantly -- adds some test cases to tools/profiler/tests/gtest/LulTestDwarf.cpp.
Assignee: nobody → jseward
Attachment #8597946 - Attachment is obsolete: true
Attachment #8607489 - Flags: review?(jimb)
Just a thought: I was looking at all these comments: // RUNS IN NO-MALLOC CONTEXT Steve Fink said it wouldn't be hard to add a static analysis that checked that functions having a given attribute couldn't ever call malloc. You'd have to annotate away function pointers and similar things, but if it were valuable, the analysis part itself wouldn't be too bad.
JimB, review ping?
Comment on attachment 8607489 [details] [diff] [review] bug1157194-5.cset-4 Review of attachment 8607489 [details] [diff] [review]: ----------------------------------------------------------------- Looks good; please address the comments (or explain why it's better not to) before landing. Debugging output should be handled the same way it is in the rest of the Mozilla tree. ::: tools/profiler/LulDwarf.cpp @@ +1881,5 @@ > + > + while (true) { > + > + if (cursor >= end1) > + break; Why not just: while (cursor < end1) { ? @@ +1910,5 @@ > + printf("LUL.DW DW_OP_breg%d %lld\n", (int)reg, (long long int)n); > + } > + // The PfxInstr only allows a 32 bit signed offset. So we > + // must fail if the immediate is out of range. > + if (n != (int64_t)(int32_t)n) The behavior of the cast to int32_t is unspecified in C if n is out of range. You want to say, instead: if (n < INT32_MIN || INT32_MAX < n) @@ +1921,5 @@ > + > + case DW_OP_const4s: { > + uint64_t u64 = reader->ReadFourBytes(cursor); > + cursor += 4; > + int32_t s32 = (int32)(uint32)u64; This is also unspecified. You need: int32_t s32 = u64 > INT32_MAX ? -u64 : u64; @@ +2079,5 @@ > + int32_t start_ix = parseDwarfExpr(summ_, reader, expression, debug, > + true/*pushCfaAtStart*/, > + true/*derefAtEnd*/); > + if (start_ix >= 0) { > + summ_->Rule(address, reg, PFXEXPR, 0, (int64_t)start_ix); This cast doesn't do anything. ::: tools/profiler/LulDwarfExt.h @@ +987,5 @@ > // At ADDRESS, the DWARF expression EXPRESSION yields the address at > // which REG was saved. > virtual bool ExpressionRule(uint64 address, int reg, > + const std::string &expression, > + ByteReader* reader) = 0; It seems to me that the same code is constructing both the lul::CallFrameInfo and the lul::DwarfCFIToModule that is passed to it as its Handler, so we can drop the 'reader' argument here, and make the DwarfCFIToModule hold its own reference to the ByteReader it should use. That will save all the changes routing it through the CallFrameInfo::Rule::Handler implementations, and various other spots. @@ +994,5 @@ > // value for REG. (This rule doesn't provide an address at which the > // register's value is saved.) > virtual bool ValExpressionRule(uint64 address, int reg, > + const std::string &expression, > + ByteReader* reader) = 0; Similarly here. ::: tools/profiler/LulDwarfSummariser.cpp @@ +15,5 @@ > > +// Do |s64|'s lowest 32 bits sign extend back to |s64| itself? > +static inline bool fitsIn32Bits(int64 s64) { > + int32 s32 = (int32)s64; > + return s64 == (int64)s32; The behavior of the cast to int32 is unspecified. I think you want: return s64 == ((s64 & 0xffffffff) ^ 0x80000000) - 0x80000000; @@ +43,5 @@ > + } > + } > + if (0) { > + fprintf(stderr, "XXXXXXXXXX checkPfxExpr: validated %u .. %u\n", > + (unsigned int)start, (unsigned int)(i-1)); This should come out if you don't want it. ::: tools/profiler/LulMain.cpp @@ +76,4 @@ > } > > static string > ShowRule(const char* aNewReg, LExpr aExpr) It feels like this should be a member function of LExpr, that takes aNewReg as its only argument. This doesn't need a case for PFXEXPR? @@ +601,5 @@ > } > > // See if it falls inside a known r-x mapped area. Poking around > // outside such places risks segfaulting. > + uintptr_t insns_min = 0, insns_max = 0; This seems like an unrelated bug fix. If it's possible, try to leave this out of patches posted for review, since it takes me a little time to realize that it's extraneous. @@ +1001,5 @@ > } > > > // RUNS IN NO-MALLOC CONTEXT > +static TaggedUWord AddTUW (TaggedUWord tuw1, TaggedUWord tuw2) { It seems to me that these functions really belong as overloaded operators on TaggedUWord itself. In other words, C++ has a natural naming convention for functions of this variety. @@ +1099,5 @@ > +// See prototype for comment. > +TaggedUWord EvaluatePfxExpr(int32_t start, > + UnwindRegs* aOldRegs, > + TaggedUWord aCFA, const StackImage* aStackImg, > + const vector<PfxInstr>* aPfxInstrs) If you took const vector<PfxInstr>& aPfxInstrs, that would clean up the uses of aPfxInstrs a bit. @@ +1167,5 @@ > + case PX_End: > + // We just took care of that, so we shouldn't see it again. > + MOZ_CRASH(); > + case PX_SImm32: > + PUSH(TaggedUWord((intptr_t)(int32_t)pfxi.mOperand)); Why the double cast? pfxi.mOperand is already int32_t. @@ +1170,5 @@ > + case PX_SImm32: > + PUSH(TaggedUWord((intptr_t)(int32_t)pfxi.mOperand)); > + break; > + case PX_DwReg: { > + DW_REG_NUMBER reg = (DW_REG_NUMBER)pfxi.mOperand; This cast doesn't do anything that the assignment to reg doesn't do already. @@ +1239,2 @@ > static > TaggedUWord EvaluateExpr(LExpr aExpr, UnwindRegs* aOldRegs, It feels like this should be a member function of LExpr. @@ +1261,2 @@ > default: > MOZ_ASSERT(0); You're using MOZ_CRASH in EvaluatePfxExpr, for what seems like the same kind of error. MOZ_ASSERT seems appropriate in both cases. ::: tools/profiler/LulMain.h @@ +80,5 @@ > } > } > > // RUNS IN NO-MALLOC CONTEXT > + // Is equal? Note: non-validity on either side gives non-equality. Does it matter that this is a partial equivalence relation (i.e. x != x)? ::: tools/profiler/LulMainInt.h @@ +67,5 @@ > + // meaning of mOperand effect on stack > + PX_Start, // bool start-with-CFA? start, with CFA on stack, or not > + PX_End, // none stop; result is at top of stack > + PX_SImm32, // int32 push signed int32 > + PX_DwReg, // DW_REG_NUMBER push value of the specified reg As signed, or unsigned? @@ +73,5 @@ > + PX_Add, // none pop X ; pop Y ; push Y + X > + PX_Sub, // none pop X ; pop Y ; push Y - X > + PX_And, // none pop X ; pop Y ; push Y & X > + PX_Or, // none pop X ; pop Y ; push Y | X > + PX_CmpGES, // none pop X ; pop Y ; push Y >=s X The result being one or zero? @@ +86,5 @@ > + explicit PfxInstr(PfxExprOp opcode) > + : mOpcode(opcode) > + , mOperand(0) > + {} > + bool operator==(const PfxInstr other) { You don't want to pass 'other' by const reference? @@ +101,5 @@ > +// the instruction vector, obviously malformed sequences), > +// return an invalid TaggedUWord. > +// RUNS IN NO-MALLOC CONTEXT > +TaggedUWord EvaluatePfxExpr(int32_t start, > + UnwindRegs* aOldRegs, Could aOldRegs be 'const UnwindRegs *'? @@ +181,5 @@ > > // Representation of expressions. If |mReg| is DW_REG_CFA (-1) then > // it denotes the CFA. All other allowed values for |mReg| are > // nonnegative and are DW_REG_ values. > + LExprHow mHow:8; // UNKNOWN, NODEREF, DEREF or PFXEXPR I think just using the type LExprHow here says everything the comment says. @@ +265,5 @@ > }; > > +// Returns |true| for Dwarf register numbers which are members > +// of the set of registers that LUL unwinds on this target. > +static inline bool registerIsTracked(DW_REG_NUMBER reg) { I'm surprised to find this bit of policy implemented in LulMainInt.h. There's no header for architecture-specific things? @@ +354,5 @@ > > // A vector of RuleSets, sorted, nonoverlapping (post Prepare()). > + vector<RuleSet> mRuleSets; > + > + // A vector of PfxInstrs, which are referred to by the RuleSets. It would be nice to say just a bit more about the structure here: This vector holds a bunch of separate PfxInstr programs, each one starting with a PX_Start and terminated by a PX_End, all concatenated together. When a RuleSet can't recover a value using a self-contained LExpr, it uses a PFXEXPR whose mOffset is the index in this vector of start of the necessary PfxInstr program. These are provided as...
Attachment #8607489 - Flags: review?(jimb) → review+
With fixes for most of the review items in comment 8, and comments below on the rest. Debugging output should be handled the same way it is in the rest of the Mozilla tree. >>> Can you be more specific about this? I'm not sure what you intend. >>> Currently LUL uses the same mechanisms as the rest of SPS for logging. @@ +2079,5 @@ > + int32_t start_ix = parseDwarfExpr(summ_, reader, expression, debug, > + true/*pushCfaAtStart*/, > + true/*derefAtEnd*/); > + if (start_ix >= 0) { > + summ_->Rule(address, reg, PFXEXPR, 0, (int64_t)start_ix); This cast doesn't do anything. >>> True. But it makes it clear to the reader that there is a change >>> of type involved, which she would otherwise have to infer. It's >>> also stylistically consistent with the corresponding casts in >>> the pre-existing DwarfCFIToModule::{OffsetRule,ValOffsetRule} for >>> example. @@ +1099,5 @@ > +// See prototype for comment. > +TaggedUWord EvaluatePfxExpr(int32_t start, > + UnwindRegs* aOldRegs, > + TaggedUWord aCFA, const StackImage* aStackImg, > + const vector<PfxInstr>* aPfxInstrs) If you took const vector<PfxInstr>& aPfxInstrs, that would clean up the uses of aPfxInstrs a bit. >>> You raise an interesting point. I looked at this. It turns out to be >>> difficult to prove that aPfxInstrs can never be null. So I left the >>> type as-is but added a null test on aPfxInstrs at the start of >>> EvaluatePfxExpr. @@ +1173,2 @@ > + case PX_DwReg: { > + DW_REG_NUMBER reg = (DW_REG_NUMBER)pfxi.mOperand; This cast doesn't do anything that the assignment to reg doesn't do already. >>> This is a cast from an int32_t an enum type (lul::DW_REG_NUMBER), and >>> g++ bounces it if the cast is omitted. ::: tools/profiler/LulMain.h @@ +80,5 @@ > } > } > > // RUNS IN NO-MALLOC CONTEXT > + // Is equal? Note: non-validity on either side gives non-equality. Does it matter that this is a partial equivalence relation (i.e. x != x)? >>> I don't think so. It is only used for comparing TaggedUWords tagged as >>> "valid" and only in the gmock tests, not in the main code. ::: tools/profiler/LulMainInt.h @@ +67,5 @@ > + // meaning of mOperand effect on stack > + PX_Start, // bool start-with-CFA? start, with CFA on stack, or not > + PX_End, // none stop; result is at top of stack > + PX_SImm32, // int32 push signed int32 > + PX_DwReg, // DW_REG_NUMBER push value of the specified reg As signed, or unsigned? >>> I am unclear whether the referred-to signedness issue pertains to the >>> register number itself (I suspect not) or whether it pertains to how >>> the register value is widened before pushing on the eval stack (I >>> suspect so). If the latter, this is moot because the eval stack is a >>> stack of TaggedUWords (unsigned machine words + tags) and so are all >>> the unwound registers (integer registers). So there's no widening to >>> be done. @@ +265,5 @@ > }; > > +// Returns |true| for Dwarf register numbers which are members > +// of the set of registers that LUL unwinds on this target. > +static inline bool registerIsTracked(DW_REG_NUMBER reg) { I'm surprised to find this bit of policy implemented in LulMainInt.h. There's no header for architecture-specific things? >>> No, not in LUL and not really in SPS either. Currently there's >>> not enough arch-specific stuff to make arch specific headers feel >>> necessary (to me, at least).
Attachment #8607489 - Attachment is obsolete: true
Flags: needinfo?(jimb)
(In reply to Julian Seward [:jseward] from comment #9) > Debugging output should be handled the same way it is in the rest of > the Mozilla tree. > > >>> Can you be more specific about this? I'm not sure what you intend. > >>> Currently LUL uses the same mechanisms as the rest of SPS for logging. Okay --- I wasn't aware that SPS had its own conventions. I was expecting to see uses of MOZ_LOG and the like. You should definitely match the conventions used by the surrounding code. But I see two kinds of things in the patch: calls to printf, and calls to the mLog hook. The printf calls should either be "promoted" to proper logging, or taken out. We shouldn't leave debugging code in. > @@ +2079,5 @@ > > + int32_t start_ix = parseDwarfExpr(summ_, reader, expression, debug, > > + true/*pushCfaAtStart*/, > > + true/*derefAtEnd*/); > > + if (start_ix >= 0) { > > + summ_->Rule(address, reg, PFXEXPR, 0, (int64_t)start_ix); > > This cast doesn't do anything. > > >>> True. But it makes it clear to the reader that there is a change > >>> of type involved, which she would otherwise have to infer. It's > >>> also stylistically consistent with the corresponding casts in > >>> the pre-existing DwarfCFIToModule::{OffsetRule,ValOffsetRule} for > >>> example. You mean this code? I'm not seeing the casts: https://hg.mozilla.org/mozilla-central/file/920ded6a1f77/tools/profiler/LulDwarf.cpp#l1923 I think unless a type conversion is lossy (which this one isn't, right?) there's no reason to call it out. If we marked every place C does a value-preserving promotion, there'd be casts everywhere. > @@ +1099,5 @@ > > +// See prototype for comment. > > +TaggedUWord EvaluatePfxExpr(int32_t start, > > + UnwindRegs* aOldRegs, > > + TaggedUWord aCFA, const StackImage* aStackImg, > > + const vector<PfxInstr>* aPfxInstrs) > > If you took const vector<PfxInstr>& aPfxInstrs, that would clean up > the uses of aPfxInstrs a bit. > > >>> You raise an interesting point. I looked at this. It turns out to be > >>> difficult to prove that aPfxInstrs can never be null. So I left the > >>> type as-is but added a null test on aPfxInstrs at the start of > >>> EvaluatePfxExpr. It seems to me we shouldn't be calling EvaluatePfxExprs when we have no PfxExpr to Evaluate. We should assert that it's not nullptr in the PFXEXPR case of LExpr::EvaluateExpr, and then pass *aPfxInstrs there. Wouldn't that work? > > > @@ +1173,2 @@ > > + case PX_DwReg: { > > + DW_REG_NUMBER reg = (DW_REG_NUMBER)pfxi.mOperand; > > This cast doesn't do anything that the assignment to reg doesn't do already. > > >>> This is a cast from an int32_t an enum type (lul::DW_REG_NUMBER), and > >>> g++ bounces it if the cast is omitted. Sorry --- I totally missed the int-to-enum conversion. Of course. > ::: tools/profiler/LulMainInt.h > @@ +67,5 @@ > > + // meaning of mOperand effect on stack > > + PX_Start, // bool start-with-CFA? start, with CFA on stack, or not > > + PX_End, // none stop; result is at top of stack > > + PX_SImm32, // int32 push signed int32 > > + PX_DwReg, // DW_REG_NUMBER push value of the specified reg > > As signed, or unsigned? > > >>> I am unclear whether the referred-to signedness issue pertains to the > >>> register number itself (I suspect not) or whether it pertains to how > >>> the register value is widened before pushing on the eval stack (I > >>> suspect so). If the latter, this is moot because the eval stack is a > >>> stack of TaggedUWords (unsigned machine words + tags) and so are all > >>> the unwound registers (integer registers). So there's no widening to > >>> be done. Okay. My question was indeed about the register value. Seems good, then; sorry I didn't catch the identity of the types. > @@ +265,5 @@ > > }; > > > > +// Returns |true| for Dwarf register numbers which are members > > +// of the set of registers that LUL unwinds on this target. > > +static inline bool registerIsTracked(DW_REG_NUMBER reg) { > > I'm surprised to find this bit of policy implemented in > LulMainInt.h. There's no header for architecture-specific things? > > >>> No, not in LUL and not really in SPS either. Currently there's > >>> not enough arch-specific stuff to make arch specific headers feel > >>> necessary (to me, at least). Okay. I'm just scarred, I guess. GDB supported some thirty-odd architectures, which does give one certain reflexes.
Flags: needinfo?(jimb)
Comment on attachment 8622080 [details] [diff] [review] bug1157194-7.cset Review of attachment 8622080 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/LulMain.h @@ +69,5 @@ > , mValid(false) > {} > > // RUNS IN NO-MALLOC CONTEXT > + TaggedUWord operator+(TaggedUWord rhs) const { Usually all these would be |TaggedUWord& rhs|, but I guess it all gets inlined and doesn't matter much.
Attachment #8622080 - Flags: review+
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42

(In reply to Julian Seward [:jseward] from comment #0)

unwinding sometimes fails in various
bits of hand-written assembly in libpthread that have hand-written CFI
to match. In particular pthread_mutex_lock (et al) and its
pseudo-subroutines _L_lock_<number>.

Interesting historical tidbit: These _L_lock subroutines were removed in this commit from March 26, 2014 in order to simplify unwinding - just 25 days before this bug was filed. The change shipped with glibc 2.20 in September 2014.
The hand-written unwind info for them was originally written in 2006.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: