Closed Bug 916106 Opened 12 years ago Closed 12 years ago

ARM unwinder can go into a loop and fill the stack with copies of a leaf function

Categories

(Core :: Gecko Profiler, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla27
Tracking Status
b2g-v1.2 --- affected

People

(Reporter: jld, Assigned: jld)

References

Details

(Keywords: perf, Whiteboard: [c=profiling p= s= u=])

Attachments

(1 file)

:mikeh tripped over this at the work week (and I helpfully made myself late to my own demo investigating, as one does). I think what happened is that we got a sample in a prologue/epilogue, which the ARM Exception Handling model doesn't attempt to handle, and the value we popped off of the stack into the virtual LR was in a leak function. That function's unwind program is just "finish", which (per the spec) copies LR into PC. And repeat. Since the available space is bounded (and has to be, because this is in signal context), and this failure mode seems to be relatively uncommon, the result is just that there's one 1000-entry stack which makes the profiler GUI's timeline hard to read. I think it's safe to set the virtual LR to something invalid after each frame, at least if the unwinder didn't explicitly set PC — the caller in question generally can't assume anything about what the callee will do with LR by the time it returns.
Keywords: perf
Whiteboard: [c=profiling] → [c=profiling p= s= u=]
I ran into one of these while doing something else, so I figured I'd analyze it. https://people.mozilla.com/~bgirard/cleopatra/#report=a3ac6a4c8339065100bd2278ca36f613a07ec976&search=0x41a3f061 Adjusting for the offset of 0x40923000, the leaf is 0x10dcc7c and the bad LR is 111c061. That is: 010dcc7c <_ZN8JSObject19addPropertyInternalEPN2js16ExclusiveContextEN2JS6HandleIPS_EENS4_IiEEPFbP9JSContextS6_S7_NS3_13MutableHandleINS3_5ValueEEEEPFbS9_S6_S7_bSC_EjjjiPPNS0_5ShapeEb>: 10dcc7c: e92d 4ff0 stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr} 10dcc80: b0b3 sub sp, #204 ; 0xcc Indeed, we've been interrupted before the prologue is done. And the address we pick off the stack, at offset 204 + 4 * 8, is: 0111c060 <_Z15JS_PropertyStubP9JSContextN2JS6HandleIP8JSObjectEENS2_IiEENS1_13MutableHandleINS1_5ValueEEE>: 111c060: 2001 movs r0, #1 111c062: 4770 bx lr So that function's address (with the low bit set to indicate Thumb) was present in a stack slot; given where and how often JS_PropertyStub shows up in the source, I'm not too surprised. This appears to support my guess about what's going on interally.
Assignee: nobody → jld
I'm not sure if this should also be reviewed by someone who's specifically worked on ARM low-level stuff.
Attachment #817477 - Flags: review?(bgirard)
Comment on attachment 817477 [details] [diff] [review] bug916106-unwind-leafloop-hg0.diff Review of attachment 817477 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/EHABIStackWalk.cpp @@ +351,5 @@ > + // This makes a difference only if an error in unwinding > + // (e.g., caused by starting from within a prologue/epilogue) > + // causes us to load a pointer to a leaf routine as LR; if we > + // don't do something, we'll go into an infinite loop of > + // "returning" to that same function. Couldn't LR be used to derived the other state? I don't think I can provide a useful review here. I don't see why we wouldn't land this either so feel free to land it. If you're looking for more useful feedback maybe you can request Julian.
Attachment #817477 - Flags: review?(bgirard) → review+
Status: NEW → ASSIGNED
Checkin note: b2g-inbound.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: