Closed Bug 766579 Opened 12 years ago Closed 12 years ago

Support stack intertwining (C++/JS/Labels)

Categories

(Core :: Gecko Profiler, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: BenWa, Assigned: BenWa)

References

Details

Attachments

(3 files, 8 obsolete files)

Currently when profiling we either operate in 'stackwalk' mode or 'pseudostack' mode. Full frame information is present in the C++ stack, while with bug 707308 we will get rich contextual information including JS and URI information.

The goal is to intertwine the two stacks using the stack pointer and bring this data together into a single stack.
Assignee: nobody → bgirard
Status: NEW → ASSIGNED
Attachment #634933 - Flags: feedback?
Attachment #634933 - Flags: feedback? → feedback?(jmuizelaar)
Fixed a todo comment
Attachment #634933 - Attachment is obsolete: true
Attachment #634933 - Flags: feedback?(jmuizelaar)
Attachment #634934 - Flags: feedback?(jmuizelaar)
Comment on attachment 634934 [details] [diff] [review]
Part 1: Add SP param to NS_StackWalk

I think you should get dbaron's approval on the NS_StackWalk changes (although they look fine to me).
Depends on: 707308
Comment on attachment 634934 [details] [diff] [review]
Part 1: Add SP param to NS_StackWalk

A bit more context why this is needed:

We keep two separate stack. By knowing what the SP are for each of the entry, the upcoming part 2 patch will be able to order them in a single stack when recording a sample.
Attachment #634934 - Flags: review?(dbaron)
Attachment #635102 - Flags: review?(jmuizelaar)
Could you describe (preferably in a code comment) how the stack pointer and code pointer pairs are supposed to be associated?  I can imagine two, or maybe four possible answers.  In particular:

1. A stack pointer should be an address on the stack that partitions the stack into pieces associated with different functions.  Is the stack pointer passed along with the PC of the caller (more likely, I'd think) or the PC of the callee?

2. I'd presume that you're defining the stack pointer such that any address greater than or equal to the stack pointer is associated with one stack frame, and any address less than the stack pointer is associated with the other.  (With no prejudice as to which is callee vs. caller, since it could be either way.)  I don't think it much matters which frame the saved PC (and maybe saved stack pointer) are associated with, though, but if it happens to be consistent you should perhaps define it in the comment as well.
(In reply to David Baron [:dbaron] from comment #7)
> Could you describe (preferably in a code comment) how the stack pointer and
> code pointer pairs are supposed to be associated?  I can imagine two, or
> maybe four possible answers.  In particular:
> 
> 1. A stack pointer should be an address on the stack that partitions the
> stack into pieces associated with different functions.  Is the stack pointer
> passed along with the PC of the caller (more likely, I'd think) or the PC of
> the callee?
> 
> 2. I'd presume that you're defining the stack pointer such that any address
> greater than or equal to the stack pointer is associated with one stack
> frame, and any address less than the stack pointer is associated with the
> other.  (With no prejudice as to which is callee vs. caller, since it could
> be either way.)  I don't think it much matters which frame the saved PC (and
> maybe saved stack pointer) are associated with, though, but if it happens to
> be consistent you should perhaps define it in the comment as well.

I'm defining it as follows. Note that the definition is extremely vague because libunwind guarantees only to provide the SP/IP of the frame while other implementation provide the SP. Note that the contract to the caller would be very vague and the implementation could later chance to specify something very specific (such as exactly the BP or exactly the SP) at a later time with breaking compatibility. The reason I did not do this is because 1) it is not required for my use case, 2) I'm not sure all the implementation are compatible enough to provide something specific.

The stack pointer is an address between the base address of the frame inclusive and it's calle's base address exclusive. If the function does not have a calle's it is between the base address and the 'sp' register. If the address is not known (or implemented) 0 is returned.

If you're satisfied with this property I'll post a revised patch with this comment.
could be changed without* breaking compatibility.
I guess that seems ok with s/calle/callee/, though I'd hoped you'd at least be able to say it was at one end of the range or the other.
Discussed this on IRC, we need a strong definition otherwise we can't intertwine consistently if this properly isn't consistent.
It's now as:
// aSP will be the best approximation possible of the SP at the time
// we walk the stack. If no approximation can be made this value
// will be NULL.
Attachment #634934 - Attachment is obsolete: true
Attachment #634934 - Flags: review?(dbaron)
Attachment #634934 - Flags: feedback?(jmuizelaar)
Attachment #635879 - Flags: review?(dbaron)
The previous patch didn't build because an if block didn't use {} and I added a second statement to it without noticing.
Attachment #635940 - Flags: review?(dbaron)
Try run for dcdbe62dcf56 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=dcdbe62dcf56
Results (out of 30 total builds):
    exception: 14
    success: 1
    failure: 15
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/b56girard@gmail.com-dcdbe62dcf56
(In reply to Mozilla RelEng Bot from comment #14)
> Try run for dcdbe62dcf56 is complete.
> Detailed breakdown of the results available here:
>     https://tbpl.mozilla.org/?tree=Try&rev=dcdbe62dcf56
> Results (out of 30 total builds):
>     exception: 14
>     success: 1
>     failure: 15
> Builds (or logs if builds failed) available at:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/b56girard@gmail.
> com-dcdbe62dcf56

Someone aborted this job by accident, a new job is pending.
(In reply to Benoit Girard (:BenWa) from comment #12)
> Created attachment 635879 [details] [diff] [review]
> Part 1: Add SP param to NS_StackWalk
> 
> It's now as:
> // aSP will be the best approximation possible of the SP at the time
> // we walk the stack. If no approximation can be made this value
> // will be NULL.

I don't see how this answers the question of which way the sp and pc are paired up.  Or did you decide not to go with actually providing a strong-enough definition that it's useful for intertwining reliably?
Attachment #635879 - Attachment is obsolete: true
Attachment #635879 - Flags: review?(dbaron)
Attached patch Add SP param to NS_StackWalk (obsolete) — Splinter Review
Updated the comment as we discussed on IRC.
Attachment #637227 - Flags: review?(dbaron)
Attachment #635940 - Attachment is obsolete: true
Attachment #635940 - Flags: review?(dbaron)
I changed this patch to support some changes we made to be more flexible for JS frames.

Here's a screenshot of the Gecko Profiler addons responsiveness canvas graph drawing with these patches and the ones from bug 761261 when the panel is active:
https://dl.dropbox.com/u/10523664/Screenshots/58.png
Attachment #635102 - Attachment is obsolete: true
Attachment #635102 - Flags: review?(jmuizelaar)
Attachment #637301 - Flags: review?(jmuizelaar)
Review ping. Bug 761261 will be landing shortly and this change is required to fully leverage JS profiling.
Bug 761261 landed last night. This is now the long pole for JS profiling with unwind information.
Comment on attachment 637227 [details] [diff] [review]
Add SP param to NS_StackWalk

nsStackWalk.h:

>+// aSP will be the best approximation possible of what the SP will be
>+// pointing to when the execution returns to executing that frame.

Maybe clearer to say "returns to executing at that PC"?

And definitely "what the SP will be" -> "what the stack pointer will be".
(and rewrap)

nsStackWalk.cpp:

In the windows code, it probably would have been better to allocate
the SPs and the PCs in a single array than in 2, but I guess this is ok.
(i.e., have a struct with 2 void*, and allocate an array of structs).
Might be nice to do as a followup, though.

In FramePointerStackWalk, no need to over-parenthesize.  That said, if
you wanted to be exact I think you'd want to use bp+3 for the cases
in the #if of the ifdef a few lines above, and bp+2 for the others.
(This could be refactored by making the #ifdef increment bp.)  Given
that it's that easy (if you agree that's correct), I think it's worth
doing.

>+    // TODO Use something like 'unw_get_reg(&cursor, UNW_REG_SP, &sp)' to get sp

Perhaps this should refer to _Unwind_GetGR() instead?  We're using
unwind.h (part of gcc), not libunwind.h (a separate library).


r=dbaron with those things fixed
Attachment #637227 - Flags: review?(dbaron) → review+
Also, merging with bug 767479 requires some care.
Comment on attachment 637227 [details] [diff] [review]
Add SP param to NS_StackWalk

Review of attachment 637227 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsStackWalk.cpp
@@ +515,5 @@
>          if (data.pc_count > data.pc_size) {
>              data.pcs = (void**) malloc(data.pc_count * sizeof(void*));
>              data.pc_size = data.pc_count;
>              data.pc_count = 0;
> +            data.sps = (void**) malloc(data.sp_count * sizeof(void*));

Please use _alloca here as well.

@@ +539,5 @@
>      if (data.pc_size > ArrayLength(local_pcs))
>          free(data.pcs);
>  
> +    if (data.sp_size > ArrayLength(local_sps))
> +        free(data.sps);

And then you can get rid of this whole block!
Comment on attachment 637301 [details] [diff] [review]
Part 2: Intertwine during sampling

Review of attachment 637301 [details] [diff] [review]:
-----------------------------------------------------------------

Do well.

::: tools/profiler/TableTicker.cpp
@@ +501,5 @@
> +    size_t strLen = strlen(sampleLabel) + 1;
> +    for (size_t j = 0; j < strLen;) {
> +      // Store as many characters in the void* as the platform allows
> +      char text[sizeof(void*)/sizeof(char)];
> +      for (size_t pos = 0; pos < sizeof(void*)/sizeof(char) && j+pos < strLen; pos++) {

- sizeof(char)

@@ +532,5 @@
>  
>  #ifdef USE_NS_STACKWALK
>  typedef struct {
>    void** array;
> +  void** sparray;

sp_array

@@ +587,1 @@
>      for (size_t i = array.count; i > 0; --i) {

// interleave the pseudoStack
add explanation of what's happening

@@ +592,5 @@
> +          break;
> +
> +        //if (entry.isCopyLabel()) {
> +          addProfileEntry(stack, aProfile, pseudoStackPos);
> +        //}

??

@@ +605,5 @@
> +            break;
> +
> +          addProfileEntry(stack, aProfile, pseudoStackPos);
> +          pseudoStackPos++;
> +        }

Simplify
Attachment #637301 - Flags: review?(jmuizelaar) → review+
Carry forward r=dbaron
Attachment #637227 - Attachment is obsolete: true
Attachment #640498 - Flags: review+
Carry forward r=jmuizel

Code is much simpler, fixed sizeof(char), and added a big fat over the top comment :)
Attachment #637301 - Attachment is obsolete: true
Attachment #640499 - Flags: review+
Attachment #640499 - Attachment is obsolete: true
Attachment #640500 - Flags: review+
What about these comments:

(In reply to David Baron [:dbaron] from comment #21)
> In FramePointerStackWalk, no need to over-parenthesize.  That said, if
> you wanted to be exact I think you'd want to use bp+3 for the cases
> in the #if of the ifdef a few lines above, and bp+2 for the others.
> (This could be refactored by making the #ifdef increment bp.)  Given
> that it's that easy (if you agree that's correct), I think it's worth
> doing.
> 
> >+    // TODO Use something like 'unw_get_reg(&cursor, UNW_REG_SP, &sp)' to get sp
> 
> Perhaps this should refer to _Unwind_GetGR() instead?  We're using
> unwind.h (part of gcc), not libunwind.h (a separate library).

And also now that you've done this:

(In reply to Ehsan Akhgari [:ehsan] from comment #23)
> ::: xpcom/base/nsStackWalk.cpp
> @@ +515,5 @@
> >          if (data.pc_count > data.pc_size) {
> >              data.pcs = (void**) malloc(data.pc_count * sizeof(void*));
> >              data.pc_size = data.pc_count;
> >              data.pc_count = 0;
> > +            data.sps = (void**) malloc(data.sp_count * sizeof(void*));
> 
> Please use _alloca here as well.

You *have* to do this as well:

> @@ +539,5 @@
> >      if (data.pc_size > ArrayLength(local_pcs))
> >          free(data.pcs);
> >  
> > +    if (data.sp_size > ArrayLength(local_sps))
> > +        free(data.sps);
> 
> And then you can get rid of this whole block!
I fixed the malloc and _Unwind_GetGR() but missed removing the block, bp and didn't merge correctly with ehsan new branch from bug 767479. 

Follow up coming right now. Really sorry about that :(
Again, sorry about that.
Attachment #640722 - Flags: review?(dbaron)
Whiteboard: [leave open]
Comment on attachment 640722 [details] [diff] [review]
Part 3: Follow up

r=dbaron, but you should also fix the unw_get_reg comment (I don't see it fixed in https://hg.mozilla.org/integration/mozilla-inbound/rev/461dd0c4f2b7 , at least.)
Attachment #640722 - Flags: review?(dbaron) → review+
\o/ Merged in time for nightly. Thanks Ryan!
Whiteboard: [leave open]
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: