Closed Bug 605758 Opened 14 years ago Closed 14 years ago

JM causes gdb to be unusable on ARM within JITted code

Categories

(Core :: JavaScript Engine, defect)

ARM
Linux
defect
Not set
normal

RESOLVED INVALID

People

(Reporter: cdleary, Assigned: jbramley)


We're not sure what's going on yet, but both Jacob and I are seeing issues debugging JM on ARM with gdb versions up to and including 7.2, of the general form:

    (gdb) stepi
    Cannot access memory at address 0x5ffffa

The value in $pc is basically unusable and you can't step through instructions. "tbreak *addr; continue" and "x/i addr" work, but they require manually updating the address every time you want to step an instruction, which makes debugging JITted code efficiently quite difficult. Jacob says that this isn't the way things used to be, and is currently bisecting on the culprit JM changeset with his small pegatron army.
Forgot to mention, STR are to perform "break addr" where addr is within a JM JIT'd code segment.
    /work/moz/tm/js/src$ hg bisect --good
    Due to skipped revisions, the first bad revision could be any of:
    changeset:   54744:0230a9e80c1f
    user:        Jason Orendorff <jorendorff@mozilla.com>
    date:        Wed Sep 29 10:00:52 2010 -0700
    summary:     Bug 600193 - trace-test/tests/jaeger/bug588363-1.js asserts with CompartmentChecker enabled. r=gal.

    changeset:   54745:2824ef10a50f
    user:        Brian Hackett <bhackett1024@gmail.com>
    date:        Sun Oct 03 08:21:38 2010 -0700
    summary:     Lazify fp->scopeChain, JM call path cleanup. bug 593882, r=lw,dvander.
54745 messes with some return address paths, so it's likely to be the real culprit here.
More information: I can put a breakpoint on JaegerTrampoline and run to there. I can then nexti/stepi right through to the "bx r4". As soon as I progress past there, however, I get the error.

    (gdb) si
    0x002ece24 in JaegerTrampoline ()
    1: x/i $pc
    0x2ece24 <JaegerTrampoline+48>: bx r4
    (gdb) si
    Cannot access memory at address 0x5ffff8

No register has that value. The value of r4 is 0x40919070, and the code at that address is quite sensible. I cannot stepi from this point, but I can continue, and that causes the program to run to completion. (This is valid only if there are no nested JaegerTrampoline calls here.)
It's worth pointing out that I do see the suspicious 0x5ffff8 value in the stack frame of GDB (when debugging GDB itself), but the call frame seems to change over time when it really shouldn't. Also, when I stepped (mostly) all the way to the error message, it complained about address 0, not 0x5ffff8.

The problem seems to be that GDB cannot resolve the frame address once it hits JIT-compiled code. If frame debug information is not available, GDB tries to guess what's going on, and I suspect that the guessing is inaccurate in this case. Why this is a problem for 54745 but not 54744, I do not know.

Partial back trace from the error message (debugging GDB in GDB):

--------
#0  memory_error (status=5, memaddr=6291448) at /work/gdb/gdb/gdb/corefile.c:217
#1  0x00014648 in read_memory_unsigned_integer (memaddr=5, len=4, byte_order=BFD_ENDIAN_LITTLE) at /work/gdb/gdb/gdb/corefile.c:325
#2  0x00029aac in arm_analyze_prologue (gdbarch=0x2, prologue_start=6291448, prologue_end=0, cache=0x34b458) at /work/gdb/gdb/gdb/arm-tdep.c:1467
#3  0x0002a4d0 in arm_scan_prologue (cache=<value optimised out>, this_frame=<value optimised out>) at /work/gdb/gdb/gdb/arm-tdep.c:1779
[...]
#14 0x0006c208 in cmd_func (cmd=0x5, args=0x5ffff8 "", from_tty=1073802912) at /work/gdb/gdb/gdb/cli/cli-decode.c:1771
[...]
--------

I've yet to find a decent work-around for this, for either GDB or for Jaeger Monkey. GDB is behaving strangely, but I can't entirely blame it as it doesn't have the information that it generally expects. However, Jaeger Monkey is also behaving correctly (according to the processor architecture). This will require further investigation.
Ok, I've found the problem!

When GDB doesn't have stack frame information once we leave JaegerTrampoline, it tries to guess what the frame looks like. This is reasonable behaviour, and allows (limited) debug of libraries compiled without debug information (and so on). It is this guessing mechanism that is causing us pain.

Basically, in the absence of any other information, GDB assumes (amongst other things) that you have a standard frame pointer (r11). It dereferences r11 to find the address of the prologue, so it can find out where registers are saved. This is used so that GDB can query values that are no longer in registers (or maybe not in the current frame).

Now, in Jaeger Monkey, we use r11 as a pointer to a JSStackFrame. (JSFrameReg is set to r11.) When GDB enters JIT-compiled code, it tries to dereference r11, and gets JSStackFrame::flags_. This seems to get set to 0x600001, which is translated to 0x5ffff8 by GDB (to account for 4-byte alignment and the 8-byte PC offset). When GDB tries to examine the prologue (which it thinks is at 0x5ffff8), it fails.

This was, of course, broken before, but prior to changeset 54745:2824ef10a50f, the usual value of flags_ was 0x100001. This became 0x0ffff8. GDB was clearly able to access this memory and determine that it didn't recognize the contents as a prologue. Failing to interpret a prologue is not considered a fatal error, but failing to access it is.

I have yet to investigate a good fix, but here are some suggestions off the top of my head:

* Use a different register for JSFrameReg, and set r11 to something meaningful.
  - This effectively steals a register from Jaeger Monkey.
* Provide frame information to GDB using its JIT API.
* Modify GDB so that it treats a frame-access-error in the same way as an unrecognized frame. Essentially, it should fail silently and just forget the backtrace. (This is the behaviour we had before.)

I thought of a few more options too, but they all sound like hacks.
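The address arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the numbers quoted in the comment, not GDB's actual code: the function name and constants are hypothetical, and the order of operations (clear the low two bits, then subtract 8) is an assumption that happens to reproduce both reported values.

```python
# Illustration only: all names here are hypothetical, not taken from GDB.
ARM_PC_OFFSET = 8              # the 8-byte PC offset mentioned in the comment
WORD_ALIGN_MASK = 0xFFFFFFFC   # clears the low two bits (4-byte alignment)


def guessed_prologue_address(word_at_r11: int) -> int:
    """Return the address a frame-guessing unwinder would probe, assuming it
    aligns the word found at [r11] to 4 bytes and then subtracts the PC offset."""
    aligned = word_at_r11 & WORD_ALIGN_MASK
    return (aligned - ARM_PC_OFFSET) & 0xFFFFFFFF


# The two flags_ values quoted in the comment (before and after 54745).
for flags in (0x100001, 0x600001):
    probe = guessed_prologue_address(flags)
    print(f"flags_ = {flags:#x} -> guessed prologue at {probe:#x}")
```

With flags_ at 0x100001 the probed address is 0x0ffff8, which was readable; with 0x600001 it is 0x5ffff8, which is not mapped, hence the error.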
Assignee: general → Jacob.Bramley
(In reply to comment #6) Awesome find! Removing the register should be as easy as getting rid of it in the MacroAssembler enum, but patching GDB seems like a better solution -- if failing-to-access is an error for no good reason, we can probably get it upstream'd.
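To make the proposed GDB change concrete, here is a toy model of the two behaviours: treating a failed memory read as fatal versus treating it like an unrecognized prologue. It is a sketch under stated assumptions. Every name in it (MemoryAccessError, read_word, scan_prologue_strict, scan_prologue_lenient) is hypothetical, the memory map is made up, and none of it is GDB source.

```python
from typing import Optional


class MemoryAccessError(Exception):
    """Hypothetical stand-in for a debugger's 'Cannot access memory' error."""


# Toy address space: only these addresses are readable (contents are made up).
READABLE_MEMORY = {0x0FFFF8: 0x00000000}


def read_word(address: int) -> int:
    """Read one word from the toy address space, or raise if it is unmapped."""
    if address not in READABLE_MEMORY:
        raise MemoryAccessError(f"Cannot access memory at address {address:#x}")
    return READABLE_MEMORY[address]


def looks_like_prologue(word: int) -> bool:
    # Placeholder: a real unwinder would match instruction encodings.
    # Nothing is ever recognized here, mirroring the JIT-compiled case.
    return False


def scan_prologue_strict(address: int) -> Optional[int]:
    """Current behaviour as described: an unreadable address is a fatal error."""
    word = read_word(address)  # may raise MemoryAccessError
    return address if looks_like_prologue(word) else None


def scan_prologue_lenient(address: int) -> Optional[int]:
    """Proposed behaviour: an unreadable address just means 'no frame info'."""
    try:
        return scan_prologue_strict(address)
    except MemoryAccessError:
        return None
```

In this model, scan_prologue_strict(0x0FFFF8) quietly returns None (readable but unrecognized), scan_prologue_strict(0x5FFFF8) raises, and scan_prologue_lenient(0x5FFFF8) returns None, which corresponds to the "fail silently and forget the backtrace" option.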
We confirmed this as a bug in GDB and raised the issue in their database, so I'm closing this one. It doesn't look like GDB will be fixed any time soon, but Chris implemented a work-around in Jaeger Monkey so this shouldn't cause problems any more.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → INVALID
The workaround hasn't been committed to JM; you just have to apply a patch so that r11 is not used in MethodJIT.cpp.