Closed Bug 1232712 Opened 4 years ago Closed 4 years ago

add gdb unwinder to js/src/gdb/

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox48 --- fixed

People

(Reporter: tromey, Assigned: tromey)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file, 6 obsolete files)

gdb 7.10 adds the ability to write a custom unwinder in Python.
I've been working on a plugin to unwind SpiderMonkey frames here:
https://github.com/tromey/spidermonkey-unwinder

This bug is to pull that unwinder into the tree.
I would be ok to review it.
Here's an initial version.  It works ok for me on x86-64 Linux.  A few
notes:

* It has some code to try to at least disable itself on unsupported
  architectures, or on older versions of gdb.  I haven't tested these
  paths.

* It assumes Python 3, which I'll fix at some point.

* It needs tests.

There are also some wish-list to-dos:

* More architecture ports

* Augment the frame filter to understand interpreter frames and
  display nicer information there

* If remote debugging is common, fix the parse_proc_maps code
Comment on attachment 8701581 [details] [diff] [review]
add a gdb unwinder for SpiderMonkey

Review of attachment 8701581 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great!
I was kind of lost at the beginning, as manipulating symbols within python add a bit of boiler plate. 

I will try it next week.

::: js/src/gdb/mozilla/unwind.py
@@ +240,5 @@
> +        # would be a good spot to try to fetch the function object,
> +        # arguments, etc.
> +        self.add_frame(sp, {
> +            "name": self.typecache.frame_enum_names[self.next_type]
> +        })

IIUC: Move this dictionary creation into a new method (named "extractInfo") of UnwinderState, and comment that this dictionary corresponds to the "info" field given as argument of the JitFrameDecorator.
Flags: needinfo?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #3)
>
> IIUC: Move this dictionary creation into a new method (named "extractInfo")
> of UnwinderState, and comment that this dictionary corresponds to the "info"
> field given as argument of the JitFrameDecorator.

I looked at this a bit and in the end decided to give add_frame an extra argument,
and then have it construct the dictionary.  This seems just as clear to me and
it is easy to add more arguments here later on.
Port to Python 2.
Take a stab at addressing review comments.

Still needs a bit of testing (unrecognized arch) plus a test in tree.
Attachment #8701581 - Attachment is obsolete: true
Comment on attachment 8704325 [details] [diff] [review]
add a gdb unwinder for SpiderMonkey

Review of attachment 8704325 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gdb/mozilla/unwind.py
@@ +194,5 @@
> +    # Add information about a frame to the frame map.  This map is
> +    # queried by |self.get_frame|.  |sp| is the frame's stack pointer,
> +    # and |name| the frame's type as a string, e.g. "JitFrame_Exit".
> +    def add_frame(self, sp, name):
> +        self.frame_map[long(sp)] = { name: name }

After testing this patch, I noticed that the decorated name did not appear, the reason being that the "name" field did not exists as part of self.info.
This issue comes from the fact that the key name is not quoted here ;)
(In reply to Nicolas B. Pierron [:nbp] from comment #6)

> After testing this patch, I noticed that the decorated name did not appear,
> the reason being that the "name" field did not exists as part of self.info.
> This issue comes from the fact that the key name is not quoted here ;)

Yeah, sorry.  I have some updated locally, I'll attach them momentarily.
Attachment #8704325 - Attachment is obsolete: true
attachment 8704325 [details] [diff] [review] tested on:
 - x64 with gdb 7.8.2 and python 2.7.9 => No stack trace, as expected.
 - x64 with gdb 7.10.1 and python 2.7.11 => For a reason that do not know, it seems that the jitTop variable is not properly synchronized when we hit a breakpoint on AssertEq, but doing "info threads", or "si" is enough to unwind the stack again, and get the proper stacks annotations.
 - x86 with gdb 7.10.1 (x64) and python 2.7.11 => unwinder registered, no unwinder support yet. (x64 jsval view {bug}).
 - x86 with gdb 7.10.1 (x86) and python 2.7.11 => unwinder registered, no unwinder support yet. (x86 jsval view).

I think adding x86 / ARM can be done as follow-up patches.
Flags: needinfo?(nicolas.b.pierron)
Comment on attachment 8712176 [details] [diff] [review]
add a gdb unwinder for SpiderMonkey

Review of attachment 8712176 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gdb/mozilla/unwind.py
@@ +285,5 @@
> +        else:
> +            debug("@@ unwind_exit_frame: next")
> +            jittop = self.activation['prevJitTop_']
> +            self.activation = self.activation['prevJitActivation_']
> +        debug("@@ jittop = 0x%x" % jittop)

return None if the jittop is 0, otherwise a python exception would be raised because we cannot read the jittop.
Comment on attachment 8712176 [details] [diff] [review]
add a gdb unwinder for SpiderMonkey

Review of attachment 8712176 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gdb/mozilla/unwind.py
@@ +245,5 @@
> +            # EntryFrameLayout, but rather CommonFrameLayout.  This
> +            # matches what the code in generateEnterJIT does.
> +            frame_size = self.typecache.typeCommonFrameLayoutPointer.target().sizeof
> +        else:
> +            frame_size = self.sizeof_frame_type(frame_type)

For your information, Jan recently changed the way we acquire the frame_size (frame_header_size), as it is now part of the descriptor. (see Bug 1237284)
(In reply to Nicolas B. Pierron [:nbp] from comment #11)

> For your information, Jan recently changed the way we acquire the frame_size
> (frame_header_size), as it is now part of the descriptor. (see Bug 1237284)

Thanks.  I'll update the patch soon.
I'll try to write the test soon too so we can get this in.
Updated for the frame changes.  This works for the simple test I've
been using locally.
Attachment #8712176 - Attachment is obsolete: true
I've had this fail once in a real-world case.  The failure mode was just a failure to properly
unwind, so it's no worse than the status quo ante.

I've appended the output.  I wasn't sure if what it did manage to do is correct; for instance
the exit/enter pairs (like frames #6/#7) seemed weird -- but I don't know, and maybe this is
expected. 

(gdb) bt
#0  0x00007effc1a847d3 in js::jit::DebugModeOSRVolatileStub<js::jit::ICCompare_Fallback*>::invalid() const (this=this@entry=0x7fff233e9370) at /home/tromey/firefox-git/gecko/js/src/jit/BaselineDebugModeOSR.h:71
#1  0x00007effc1a7ccca in js::jit::DoCompareFallback(JSContext*, void*, js::jit::ICCompare_Fallback*, JS::HandleValue, JS::HandleValue, JS::MutableHandleValue) (cx=0xd0a1b0, payload=<optimized out>, stub_=<optimized out>, lhs=..., rhs=..., ret=...) at /home/tromey/firefox-git/gecko/js/src/jit/SharedIC.cpp:1702
#2  0x00007effd122b201 in <<JitFrame_Exit>> ()
#3  0x00007eff3f7a152f in <<JitFrame_BaselineJS>> ()
#4  0x00007eff681c2294 in <<JitFrame_BaselineStub>> ()
#5  0x00007eff3f74fb87 in <<JitFrame_BaselineJS>> ()
#6  0x00007effd1227d94 in <<JitFrame_Entry>> ()
#7  0x0000000000d0a1b0 in <<JitFrame_Exit>> ()
#8  0x00007effac57f4f3 in <<JitFrame_BaselineStub>> ()
#9  0x00007eff68191943 in <<JitFrame_BaselineJS>> ()
#10 0x00007effd1227d94 in <<JitFrame_Entry>> ()
#11 0x0000000000d0a1b0 in <<JitFrame_Exit>> ()
#12 0x00007effac5e27a3 in <<JitFrame_BaselineStub>> ()
#13 0x00007effac585976 in <<JitFrame_BaselineJS>> ()
#14 0x00007effd1227b45 in <<JitFrame_Rectifier>> ()
#15 0x00007effac58c474 in <<JitFrame_BaselineStub>> ()
#16 0x00007effac58d041 in <<JitFrame_BaselineJS>> ()
#17 0x00007eff681aa16e in <<JitFrame_BaselineStub>> ()
#18 0x00007effac58be8e in <<JitFrame_BaselineJS>> ()
#19 0x00007effd1227d94 in <<JitFrame_Entry>> ()
#20 0x00007fff233ecd10 in  ()
#21 0x00007fff233ecd10 in  ()
#22 0x00007fff233ecd30 in  ()
#23 0x0000000000d0a1b0 in  ()
#24 0x00007fff233ed000 in  ()
#25 0x00007effc1871e35 in EnterBaseline(JSContext*, js::jit::EnterJitData&) (cx=0x7fff233ed040, data=...) at /home/tromey/firefox-git/gecko/js/src/jit/BaselineJIT.cpp:147
#26 0x000000000100f598 in  ()
#27 0x00000000000000ab in  ()
#28 0x00007effc182e621 in js::gc::TenuredCell::readBarrier(js::gc::TenuredCell*) (thing=0x7fff233ed098) at /home/tromey/firefox-git/gecko/js/src/gc/Heap.h:1464
Indeed, these Exit/Entry frames seems weird.  The only case where something similar might happen is when we call from AsmJS code into JS code.  So, I would definitely expects to have C++ frames in between, as we only create the JitActivation in C++.

It almost sounds as if you were following the JitActivations next pointer before resuming to the C++ stack.
I ran into another bad unwind:

#20 0x00007f9e02649717 in <<JitFrame_Exit>> ()
#21 0x00007f9e026496e0 in <<JitFrame_IonJS>> ()
#22 0x00007f9eaa9e25e4 in <<JitFrame_BaselineStub>> ()
#23 0x00007f9e20f1248e in <<JitFrame_BaselineJS>> ()
#24 0x00007f9eaa9e25e4 in <<JitFrame_BaselineStub>> ()
#25 0x00007f9e203ce656 in <<JitFrame_BaselineJS>> ()
#26 0x00007f9edfeb5d94 in <<JitFrame_Entry>> ()
#27 0x00007ffddd6c0100 in  ()
#28 0x00007ffddd6c0100 in  ()
#29 0x00007ffddd6c0120 in  ()
#30 0x0000000002393e10 in  ()
#31 0x00007ffddd6c03f0 in  ()
#32 0x00007f9ed04d2795 in EnterBaseline(JSContext*, js::jit::EnterJitData&) (cx=0x7ffddd6c0460, data=...) at /home/tromey/firefox-git/gecko/js/src/jit/BaselineJIT.cpp:150
#33 0x0000000002555dc0 in  ()
#34 0x0000000000000332 in  ()
#35 0x0000000000000000 in  ()


Also, it occurred to me that for jit frames the frame filter can decipher the
callee token to get the function and/or file name.
(In reply to Tom Tromey :tromey from comment #16)
> Also, it occurred to me that for jit frames the frame filter can decipher the
> callee token to get the function and/or file name.

Maybe we can land the current patch and improve the report as a follow-up.
What do you think?

I would also say that even if this is not perfect, this is already better than what we currently have, as we are able to see that a function got called from JS code, either from Ion / Baseline.
I added function names.  Now I can see:

#34 0x00007f9edfeb5d94 in <<JitFrame_Entry "CSSCompleter.prototype.getInfoAt/traverseBackwards">> ()



I also looked into the bug a little.

#28 0x00007f9e02649717 in <<JitFrame_Exit>> ()
#29 0x00007f9e026496e0 in <<JitFrame_IonJS>> ()
#30 0x00007f9eaa9e25e4 in <<JitFrame_BaselineStub>> ()
#31 0x00007f9e20f1248e in <<JitFrame_BaselineJS MEMORY ERROR>> ()
#32 0x00007f9eaa9e25e4 in <<JitFrame_BaselineStub>> ()
#33 0x00007f9e203ce656 in <<JitFrame_BaselineJS>> ()
#34 0x00007f9edfeb5d94 in <<JitFrame_Entry "CSSCompleter.prototype.getInfoAt/traverseBackwards">> ()


(gdb) frame 31
#31 0x00007f9e20f1248e in ?? ()
(gdb) p $sp
$1 = (void *) 0x7ffddd6bfe40
(gdb) p (js::jit::JitFrameLayout*) $
$2 = (js::jit::JitFrameLayout *) 0x7ffddd6bfe40
(gdb) p *$
$3 = {
  <js::jit::CommonFrameLayout> = {
    returnAddress_ = 0x7f9e20f1248e "H\203\304\030H\211M\200H\213M\200H\277\360\006(\t", 
    descriptor_ = 53281, 
    static FrameTypeMask = 15
  }, 
  members of js::jit::JitFrameLayout: 
  calleeToken_ = 0xfffaff9e20211598, 
  numActualArgs_ = 18444773748872577024
}

So it seems we must be unwinding incorrect at some point here, because this frame layout
looks wrong.  What's curious about it is that we manage to re-sync somehow, since I think
frame #34 is correct.
(In reply to Nicolas B. Pierron [:nbp] from comment #17)

> Maybe we can land the current patch and improve the report as a follow-up.
> What do you think?

Yeah, I'll move back to doing that and just file follow-ups.
Blocks: 1254295
Blocks: 1254297
(In reply to Tom Tromey :tromey from comment #18)
> I added function names.  Now I can see:
> 
> #34 0x00007f9edfeb5d94 in <<JitFrame_Entry
> "CSSCompleter.prototype.getInfoAt/traverseBackwards">> ()
> 
> 
> 
> I also looked into the bug a little.
> 
> #28 0x00007f9e02649717 in <<JitFrame_Exit>> ()
> #29 0x00007f9e026496e0 in <<JitFrame_IonJS>> ()
> #30 0x00007f9eaa9e25e4 in <<JitFrame_BaselineStub>> ()
> #31 0x00007f9e20f1248e in <<JitFrame_BaselineJS MEMORY ERROR>> ()
> #32 0x00007f9eaa9e25e4 in <<JitFrame_BaselineStub>> ()
> #33 0x00007f9e203ce656 in <<JitFrame_BaselineJS>> ()
> #34 0x00007f9edfeb5d94 in <<JitFrame_Entry
> "CSSCompleter.prototype.getInfoAt/traverseBackwards">> ()

The entry frame is not a scripted frame.  Thus this should probably move to the previous JitFrame_BaselineJS.

> 
> (gdb) frame 31
> #31 0x00007f9e20f1248e in ?? ()
> (gdb) p $sp
> $1 = (void *) 0x7ffddd6bfe40
> (gdb) p (js::jit::JitFrameLayout*) $
> $2 = (js::jit::JitFrameLayout *) 0x7ffddd6bfe40
> (gdb) p *$
> $3 = {
>   <js::jit::CommonFrameLayout> = {
>     returnAddress_ = 0x7f9e20f1248e
> "H\203\304\030H\211M\200H\213M\200H\277\360\006(\t", 
>     descriptor_ = 53281, 
>     static FrameTypeMask = 15
>   }, 
>   members of js::jit::JitFrameLayout: 
>   calleeToken_ = 0xfffaff9e20211598, 
>   numActualArgs_ = 18444773748872577024
> }

Indeed, this is weird as the calleeToken and the numActualArgs_ are JS::Value, where they are supposed to be a potentially miss-aligned pointer and a small number.

The content of this frame looks more like the content of the BaselineStub frame.  Thus, I think reported stack pointer might be offset by 1 frame.

The problem might be related to the fact the type of the frame is stored in the next frame, or by the fact that the stack pointer is not pushed to the next frame before interpreting the content:

 --+---------------+----------------------------------+---------------+--
   |    Baseline   |                                  | Baseline Stub |
   |    Frame      |           Baseline Frame         | Frame         |
   |    Header     |                                  | Header        |
 --+-----+--+-----------------------------------------+----+----------+--
         |  |      ^                                       |
         |  |      |                                       |
         |  |      +---------------------------------------+
         |  |            prevType == JitFrame_Baseline
         |  |
         |  |
         |  +---> calleeToken_ == "foo" … " at file.js:51"
         |
         |
         +----------> numActualArgs == "(…, …, …)"
MozReview-Commit-ID: 36HpkzbFabq
Attachment #8717993 - Attachment is obsolete: true
I added a test case and made some other improvements.
The frame filter enhancements are broken off into a separate bug.

This version still has the unwinding bugs noted above.
The unwinding bug turned out to be that the entry frame claims to be
a JitFrameLayout, but the stack pointer is actually adjusted as if it
were a CommonFrameLayout:

https://dxr.mozilla.org/mozilla-central/source/js/src/jit/x64/Trampoline-x64.cpp#158

This seems to be the case on arm64 as well, so I am going to say
it isn't a bug, but instead just work around it in the unwinder.
This version is cleaned up a bit.  I removed some current/next frame
confusion from the unwinder, and this now works around the
EntryFrameLayout oddity.  With this I can now unwind through multiple
entry/exits all the way back to main.  So, I think it's good enough.
Attachment #8728075 - Attachment is obsolete: true
Attachment #8728602 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8728602 [details] [diff] [review]
add a gdb unwinder for SpiderMonkey

Review of attachment 8728602 [details] [diff] [review]:
-----------------------------------------------------------------

Let's make the life of Gecko developers less miserable than Today!

::: js/src/gdb/mozilla/unwind.py
@@ +220,5 @@
> +    # |common| is a pointer to a CommonFrameLayout object.  Return a
> +    # tuple (local_size, header_size, frame_type), where |size| is the
> +    # integer size of the previous frame's locals; |header_size| is
> +    # the size of this frame's header; and |frame_type| is an integer
> +    # representing the frame type.

nit: representing the *previous* frame type.

@@ +273,5 @@
> +
> +        unwind_info = pending_frame.create_unwind_info(frame_id)
> +        unwind_info.add_saved_register(self.PC_REGISTER, next_pc)
> +        unwind_info.add_saved_register(self.SP_REGISTER, next_sp)
> +        # FIXME it would be great to unwind any other registers here.

Only AsmJS and Trampolines and non-GC calls from Ion / Baseline have callee saved registers, but in such case, we don't have an Exit frame either.
When we have an Exit frame, Ion and Baseline have caller saved registers which technically means that we have no need to unwind registers, as everything is on the stack.
Attachment #8728602 - Flags: review?(nicolas.b.pierron) → review+
Updated per review.
Attachment #8728602 - Attachment is obsolete: true
Attachment #8729060 - Flags: review+
(In reply to Nicolas B. Pierron [:nbp] from comment #25)

> > +        # FIXME it would be great to unwind any other registers here.
> 
> Only AsmJS and Trampolines and non-GC calls from Ion / Baseline have callee
> saved registers, but in such case, we don't have an Exit frame either.
> When we have an Exit frame, Ion and Baseline have caller saved registers
> which technically means that we have no need to unwind registers, as
> everything is on the stack.

What the comment means is that when unwinding, it's nice to users to
also restore whatever registers can be restored.  That way, when selecting
a frame in gdb, the registers will have their expected values at that point
in the code.  That is, it helps with debugging.  For gdb's own purposes we really
only need to unwind the stack pointer and PC.  I think whether the registers
are callee- or caller-saved doesn't matter; something somewhere has to pluck
out the saved value and tell gdb about it.
Note that the js gdb tests are not run in automation, so no try run.
Keywords: checkin-needed
Note that there is now a try build that will run the js gdb tests, if you do try: -b d -p sm-gdb

(The build landed after this bug did, so it's not like it was a preexisting thing.)
https://hg.mozilla.org/mozilla-central/rev/fd1ff3e2c5a4
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Blocks: 1255884
You need to log in before you can comment on or make changes to this bug.