ARM unwinder can go into a loop and fill the stack with copies of a leaf function

RESOLVED FIXED in mozilla27

Status

()

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: jld, Assigned: jld)

Tracking

({perf})

Trunk
mozilla27
ARM
Gonk (Firefox OS)
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(b2g-v1.2 affected)

Details

(Whiteboard: [c=profiling p= s= u=])

Attachments

(1 attachment)

:mikeh tripped over this at the work week (and I helpfully made myself late to my own demo investigating, as one does).  I think what happened is that we got a sample in a prologue/epilogue, which the ARM Exception Handling model doesn't attempt to handle, and the value we popped off of the stack into the virtual LR was in a leak function.  That function's unwind program is just "finish", which (per the spec) copies LR into PC.  And repeat.

Since the available space is bounded (and has to be, because this is in signal context), and this failure mode seems to be relatively uncommon, the result is just that there's one 1000-entry stack which makes the profiler GUI's timeline hard to read.

I think it's safe to set the virtual LR to something invalid after each frame, at least if the unwinder didn't explicitly set PC — the caller in question generally can't assume anything about what the callee will do with LR by the time it returns.

Updated

6 years ago
Keywords: perf
Whiteboard: [c=profiling] → [c=profiling p= s= u=]
I ran into one of these while doing something else, so I figured I'd analyze it.

https://people.mozilla.com/~bgirard/cleopatra/#report=a3ac6a4c8339065100bd2278ca36f613a07ec976&search=0x41a3f061

Adjusting for the offset of 0x40923000, the leaf is 0x10dcc7c and the bad LR is 111c061.

That is:

010dcc7c <_ZN8JSObject19addPropertyInternalEPN2js16ExclusiveContextEN2JS6HandleIPS_EENS4_IiEEPFbP9JSContextS6_S7_NS3_13MutableHandleINS3_5ValueEEEEPFbS9_S6_S7_bSC_EjjjiPPNS0_5ShapeEb>:
 10dcc7c:       e92d 4ff0       stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
 10dcc80:       b0b3            sub     sp, #204        ; 0xcc

Indeed, we've been interrupted before the prologue is done.  And the address we pick off the stack, at offset 204 + 4 * 8, is:

0111c060 <_Z15JS_PropertyStubP9JSContextN2JS6HandleIP8JSObjectEENS2_IiEENS1_13MutableHandleINS1_5ValueEEE>:
 111c060:       2001            movs    r0, #1
 111c062:       4770            bx      lr

So that function's address (with the low bit set to indicate Thumb) was present in a stack slot; given where and how often JS_PropertyStub shows up in the source, I'm not too surprised.

This appears to support my guess about what's going on interally.
Assignee: nobody → jld
I'm not sure if this should also be reviewed by someone who's specifically worked on ARM low-level stuff.
Attachment #817477 - Flags: review?(bgirard)
Comment on attachment 817477 [details] [diff] [review]
bug916106-unwind-leafloop-hg0.diff

Review of attachment 817477 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/EHABIStackWalk.cpp
@@ +351,5 @@
> +        // This makes a difference only if an error in unwinding
> +        // (e.g., caused by starting from within a prologue/epilogue)
> +        // causes us to load a pointer to a leaf routine as LR; if we
> +        // don't do something, we'll go into an infinite loop of
> +        // "returning" to that same function.

Couldn't LR be used to derived the other state? I don't think I can provide a useful review here. I don't see why we wouldn't land this either so feel free to land it. If you're looking for more useful feedback maybe you can request Julian.
Attachment #817477 - Flags: review?(bgirard) → review+

Updated

6 years ago
Status: NEW → ASSIGNED
Checkin note: b2g-inbound.
https://hg.mozilla.org/mozilla-central/rev/bdcfcd904df0
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
You need to log in before you can comment on or make changes to this bug.