Closed Bug 785440 Opened 12 years ago Closed 6 years ago

Support profiling DOM calls in the profiler

Categories

(Core :: Gecko Profiler, defect, P1)

x86
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: ehsan.akhgari, Assigned: mstange)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(6 files, 3 obsolete files)

We were looking at the IE's profiler and they have a nice UI for displaying how much time each DOM function called from a script took.  We need to add this support to the profiler.
Here is the idea that I have for this.  As far as I can tell, there are three ways that pages can call DOM functions: xpconnect, quick stubs and Paris bindings.  For every one of these, there is a name lookup phase where we know what the name of the function being called is.  We should be able to decide whether profiling is enabled based on a flag in the js context which would be a well predicted branch, and if it is, we can inject a dynamic sample, similar to the way we inject dynamic labels for js function calls.

Does this make sense to people?
When we call a native, we know the JSFunction (callee) and thus should be able to derive the name.  CallJSNative/CallJSPropertyOp is the pinchpoint for calls from C++, for jit code I think we'd want to push/pop an entry for the stub in jit code (which would make it free when profiling is off and fast even when profiling is on).  This shouldn't be too hard.

On this subject, I've been discussing with other members of the JS team another feature I'd like to add that is actually an extension of this: whenever we push one of these C++-call profile entries, I'd like to attach a "reason".  A reason could have a code (from a static enumeration) and a char* generated for the call site.  The goal of the "reason" is to explain, for all the calls into C++ that have a faster path that stays in jit code (or a faster C++ function), "why didn't I get the faster path"?  The static code could allow the UI to print a nice prose summary (from a static table) of the reason and perhaps link to an MDN page explaining the optimization and how it fails and the generated string would provide specific details: if the problem is bad type info, what types did we see? if too many shapes flowed through, what were they? etc, if we are converting into dictionary mode b/c a property changed descriptor, what was the property and descriptor?  I think there are many many potential uses in JS alone.

Using this, the profiler could report: "for all the calls into C++ function X, here is the breakdown of the reasons: X% called because ..., Y% called because ....  I think this would be useful b/c it is exactly what I do (in an ad hoc manner) when I diagnose perf faults in HTML5 games and I know gamedevs will want to do this for themselves.

Anyhow, I know this is more than you're asking for in this bug (and we should what you are asking), but I thought I'd mention it since we're on the topic :).  If Alex doesn't do this first (he's finishing is internship today :( ), I can do this.
So would we disable ICs while profiling?  Or change our IC code in some way?

Note that there are at least 2 other ways that DOM code can be entered from a web page: nsIXPCScriptable hooks and DOM proxies (both pre-Paris and Paris; the former to go away soon).  Plus the IC for the proxies.
Any call out from jit code already needs instrumentation for profiling (to say "not in jit code").
(In reply to comment #2)
> When we call a native, we know the JSFunction (callee) and thus should be able
> to derive the name.  CallJSNative/CallJSPropertyOp is the pinchpoint for calls
> from C++, for jit code I think we'd want to push/pop an entry for the stub in
> jit code (which would make it free when profiling is off and fast even when
> profiling is on).  This shouldn't be too hard.

Cool.

> On this subject, I've been discussing with other members of the JS team another
> feature I'd like to add that is actually an extension of this: whenever we push
> one of these C++-call profile entries, I'd like to attach a "reason".  A reason
> could have a code (from a static enumeration) and a char* generated for the
> call site.  The goal of the "reason" is to explain, for all the calls into C++
> that have a faster path that stays in jit code (or a faster C++ function), "why
> didn't I get the faster path"?  The static code could allow the UI to print a
> nice prose summary (from a static table) of the reason and perhaps link to an
> MDN page explaining the optimization and how it fails and the generated string
> would provide specific details: if the problem is bad type info, what types did
> we see? if too many shapes flowed through, what were they? etc, if we are
> converting into dictionary mode b/c a property changed descriptor, what was the
> property and descriptor?  I think there are many many potential uses in JS
> alone.
> 
> Using this, the profiler could report: "for all the calls into C++ function X,
> here is the breakdown of the reasons: X% called because ..., Y% called because
> ....  I think this would be useful b/c it is exactly what I do (in an ad hoc
> manner) when I diagnose perf faults in HTML5 games and I know gamedevs will
> want to do this for themselves.

This is a good idea, but before taking the time to implement it, we should figure out a good UI to expose it, I think.
I experimented with adding sample labels to Web IDL getters, and it does indeed degrade performance with the profiler being turned off rather significantly (see http://dromaeo.com/?id=188581,188585 for example.)  At the very least, we need to ensure that mozilla_sampler_call_enter and mozilla_sampler_call_exit don't do any work when profiling is turned off.  It would also help to experiment with implementing SAMPLE_MAIN_THREAD_LABEL without using a TLS lookup.

Benoit, are you interested in looking to see how much you can reduce the overhad of adding sample labels?

(Also I couldn't figure out how to get the built-in cleopatra to show these labels but that's not very important for now.)
As expected there's a perf hit but it's very small.

Results:
              default      /w patch
profiler off: 1571.41 -> 1569.03
profiler on:  1480.88 -> 1429.88

Raw data:
patch, on: http://dromaeo.com/?id=188639 : http://dl.dropbox.com/u/10523664/Screenshots/8l.png
patch, off:  http://dromaeo.com/?id=188640 : http://dl.dropbox.com/u/10523664/Screenshots/8m.png
nop, off:   http://dromaeo.com/?id=188641 : http://dl.dropbox.com/u/10523664/Screenshots/8n.png
nop, on:    http://dromaeo.com/?id=188642 : http://dl.dropbox.com/u/10523664/Screenshots/8o.png
Attachment #704404 - Attachment is obsolete: true
Attachment #704642 - Flags: feedback?(ehsan)
Comment on attachment 704642 [details] [diff] [review]
Add sample labels to Web IDL binding getters + fast state check

Review of attachment 704642 [details] [diff] [review]:
-----------------------------------------------------------------

OK, this is more along the lines of what I was expecting.  I'm curious to see what Boris thinks about this.
Attachment #704642 - Flags: feedback?(ehsan) → feedback?(bzbarsky)
Comment on attachment 704642 [details] [diff] [review]
Add sample labels to Web IDL binding getters + fast state check

What platform/compiler are the numbers from comment 7 on?  This stuff is _extremely_ compiler-dependent.  In particular, nothing guarantees that mozilla_sampler_call_enter/exit are inlined, and some of our compilers are much more aggressive about inlining than others are.  We should consider either MOZ_ALWAYS_INLINE or at least measuring with all our compilers.  Or maybe both.  :(

Apart from that, this looks like a solid 2% hit on the parts of dromaeo that really exercise this code (e.g. "DOM Traversal").  I assume the huge regression on element.property under "DOM Attributes" is a gc-timing-related fluke, just like the corresponding win on getAttrribute....

I can probably live with a 2% hit if we need to, I guess.

Past that, this patch is measuring the cost of the actual call into the DOM, but not necessarily the cost of argument marshalling (and in particular in overloaded cases it's picking up some of the argument conversions but not others).  It's also not picking up the cost of this-unwrapping when not coming in via the Ion fast-path, or the cost of the various JIT appurtenances to a DOM call, but maybe that's ok.  As long as we're ok with missing out on the JIT bits and the this-unwrapping, we could just hoist this to the top of the getters/setters/methods, instead of the per-signature calls, and that should be good-ish.
Attachment #704642 - Flags: feedback?(bzbarsky) → feedback+
This is measured on 2.3 Ghz Core i7 (Sandy), OSX 10.8.2 compiled with Clang.
(In reply to comment #9)
> Past that, this patch is measuring the cost of the actual call into the DOM,
> but not necessarily the cost of argument marshalling (and in particular in
> overloaded cases it's picking up some of the argument conversions but not
> others).  It's also not picking up the cost of this-unwrapping when not coming
> in via the Ion fast-path, or the cost of the various JIT appurtenances to a DOM
> call, but maybe that's ok.  As long as we're ok with missing out on the JIT
> bits and the this-unwrapping, we could just hoist this to the top of the
> getters/setters/methods, instead of the per-signature calls, and that should be
> good-ish.

Would you mind rewording this in a way that I can understand?  I pretty much don't have any idea about how the Web IDL code generator works and what many of the jargon there means :/
> This is measured on 2.3 Ghz Core i7 (Sandy), OSX 10.8.2 compiled with Clang.

OK.  I'd love to see numbers on ARM and with MSVC, at least.

> Would you mind rewording this in a way that I can understand?

Hmm.

So for a DOM call to happen the following steps need to be performed, conceptually:

1)  Locating the right JSFunction to call for the given property name.
2)  Setting up the call into the C++ native backing the JSFunction.
3)  Calling into the generated C++ function.
4)  Extracting the right C++ object from the "this" object of the call (and throwing if
    the "this" object is not of the right type).
5)  Converting all the arguments that are the same for all IDL functions with the given
    name to the right C++ types.
6)  Determining which of the overloads to call for the case of overloaded functions.
7)  Converting the rest of the arguments based on the selected overload.
8)  Calling the relevant Gecko method.
9)  Converting the return value to a JS value.
10) Doing whatever guards need to be done on the return value to ensure our type
    inference information is not incorrect.

In practice, #1 is handled by ICs and whatnot in the JIT in hot code.  And when we've Ion-compiled and managed to have the right type info, #4 is handled at JIT-compile time.  And measuring anything about #1, 2, 3, 10 from inside the binding code is of course impossible.

The attached patch never measures #4, always measures #7-9, never measures #6, and only measures #5 if we're not dealing with an overloaded method (in which case #5 and #7 happen in a single step, handled by the CGArgumentConverter in the Python there).

What I'm saying is that it might be better to move the label to be handled in CGMethodCall/CGGetterCall/CGSetterCall, not CGPerSignatureCall, so it correctly picks up steps #5 and #6 above for overloaded methods.  And that it might be nice to measure #4 somehow, but I'm not quite sure how yet.
(In reply to comment #12)
> So for a DOM call to happen the following steps need to be performed,
> conceptually:
> 
> 1)  Locating the right JSFunction to call for the given property name.
> 2)  Setting up the call into the C++ native backing the JSFunction.
> 3)  Calling into the generated C++ function.
> 4)  Extracting the right C++ object from the "this" object of the call (and
> throwing if
>     the "this" object is not of the right type).
> 5)  Converting all the arguments that are the same for all IDL functions with
> the given
>     name to the right C++ types.
> 6)  Determining which of the overloads to call for the case of overloaded
> functions.
> 7)  Converting the rest of the arguments based on the selected overload.
> 8)  Calling the relevant Gecko method.
> 9)  Converting the return value to a JS value.
> 10) Doing whatever guards need to be done on the return value to ensure our
> type
>     inference information is not incorrect.

Thanks for this.  Really helpful.

> In practice, #1 is handled by ICs and whatnot in the JIT in hot code.  And when
> we've Ion-compiled and managed to have the right type info, #4 is handled at
> JIT-compile time.  And measuring anything about #1, 2, 3, 10 from inside the
> binding code is of course impossible.
> 
> The attached patch never measures #4, always measures #7-9, never measures #6,
> and only measures #5 if we're not dealing with an overloaded method (in which
> case #5 and #7 happen in a single step, handled by the CGArgumentConverter in
> the Python there).
> 
> What I'm saying is that it might be better to move the label to be handled in
> CGMethodCall/CGGetterCall/CGSetterCall, not CGPerSignatureCall, so it correctly
> picks up steps #5 and #6 above for overloaded methods.  And that it might be
> nice to measure #4 somehow, but I'm not quite sure how yet.

OK, this all makes sense now.  I can definitely do that.  But first, let's wait for Benoit to give us the numbers you asked for (yes, I volunteered him ;)
Let's wait until the patch is ready before doing a lot of time consuming benchmarks.
Blocks: 1329161
Depends on: 1348402
Attachment #8849319 - Attachment description: Add profiler laels to WebIDL bindings: getters / setters / method calls → Add profiler labels to WebIDL bindings: getters / setters / method calls
Dromaeo results with that patch are very noisy. I'll probably need to push to try / Talos instead.

without patch, profiler off:
http://dromaeo.com/?id=262440 4291.02runs/s (Total)
http://dromaeo.com/?id=262441 4606.26runs/s (Total)
http://dromaeo.com/?id=262442 4574.36runs/s (Total)

with patch, profiler off:
http://dromaeo.com/?id=262439 4367.84runs/s (Total)
http://dromaeo.com/?id=262443 3971.83runs/s (Total)
http://dromaeo.com/?id=262444 4273.71runs/s (Total)
Attached file microbenchmark
Assignee: nobody → mstange
Status: NEW → ASSIGNED
I have a few patches which reduce the overhead of pushing and popping a pseudo stack frame.

On my machine, on current mozilla-central, the microbenchmark in this bug gives me a score of 7.6 nanoseconds per call.
With the patch in this bug applied, that number increases to 23.3ns.
So pushing and popping a pseudo stack frame adds 15.7ns of overhead at the moment.

I was able to reduce those 15.7ns to 1.7ns with a few patches. If that's still too much overhead, then we can reduce it to basically zero if the profiler is inactive by adding an (inlined) profiler_is_active() check. The only disadvantage of such a check would be that we won't see pseudo stack frames for WebIDL methods that were already on the stack when the profiler was started, which should be quite rare.
There's only one example that I can think of where this would lead to confusing results: Let's say a website builds up a huge DOM tree and then causes a layout flush by calling someElement.offsetTop, and then the content process hangs while doing layout. If you start the profiler in the middle of that hang, and then look at the profile in the "hide platform details" mode, all you'll see is that some JavaScript caused a layout flush, but not that it was offsetTop that triggered the flush.
I think that's ok.

The biggest speedup was from using ReleaseAcquire memory ordering semantics for the PseudoStack's stackPointer (bug 1385998): Doing that reduces the time in the microbenchmark from 23.3ns to 12.5ns on my machine (-10.8ns). The other speedup came from avoiding TLS to get the PseudoStack: In the WebIDL bindings code, we have access to a JSContext, which carries the PseudoStack in a member. Accessing the PseudoStack from there brought the time down to 9.3ns (-3.2ns).

With an inlined profiler_is_active() check, the overhead is 0ns if the profiler is disabled. With a non-inlined profiler_is_active() check, i.e. a function call, the overhead is 1.2ns.

In order to get Dromaeo numbers, I've done a few try pushes:

(1) baseline: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0a2c0316cf3b8a36fe7247d2314f746454c0a617
(2) with pseudo frame: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d66e162e9601d7a86e09e8a30f5760dcbad10084
(3) no atomic increment / decrement: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0787bf89b7096f0e67b708da7014b02eaaa4c44d
(4) ReleaseAcquire: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca2422fdae2846eeebc23c6d02050fd77a22773b
(5) non-inlined GetContextProfilingStack: https://treeherder.mozilla.org/#/jobs?repo=try&revision=be95a4c203e8211f39b19469e3072cf6628906ac
(6) inlined GetContextProfilingStack: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4927e68cc87cf186da26c3b77d2c9a24a04cc25b
(7) with non-inline profiler_is_active() check:  https://treeherder.mozilla.org/#/jobs?repo=try&revision=990e0fedceaa3aa3ff18c6df8a311476a394e87e
(8) with inlined profiler_is_active(): https://treeherder.mozilla.org/#/jobs?repo=try&revision=368468cce8d3aeb340798565b1c6449b703bc531

Here again are the numbers from the microbenchmark on my machine with the patches:

(1) 7.6
(2) 23.3
(3) 25.5
(4) 12.5
(5) 11.5
(6) 9.3
(7) 8.8
(8) 7.6

Here are the Dromaeo numbers on Win10 64Bit (higher is better):

(1) 7,315.96 ± 1.25%
(2) 6,998.74 ± 1.15% 	-4.34% 
(3) 7,158.23 ± 1.11% 	-2.16%
(4) 7,106.56 ± 1.43% 	-2.86% 
(5) 7,113.73 ± 1.33% 	-2.76% 
(6) 7,231.00 ± 1.25% 	-1.16% 
(7) 7,103.49 ± 1.36% 	-2.90% 
(8) 7,259.46 ± 1.07% 	-0.77% 

Win7 32bit:

(1) 7,000.43 ± 5.63%
(2) 6,811.43 ± 0.87% 	-2.70%
(3) 6,824.70 ± 0.50% 	-2.51%
(4) 6,953.09 ± 0.62% 	-0.68%
(5) 6,915.12 ± 0.63% 	-1.22% 
(6) 6,931.97 ± 0.89% 	-0.98%
(7) 6,967.01 ± 0.86% 	-0.48%
(8) 7,035.33 ± 3.06% 	0.50% 

macOS 64bit:

(1) 9,064.99 ± 0.54% 
(2) 8,683.21 ± 0.73% 	-4.21% 
(3) 8,614.07 ± 0.56% 	-4.97% 
(4) 8,823.87 ± 0.78% 	-2.66% 
(5) 8,922.72 ± 0.66% 	-1.57%
(6) 8,903.12 ± 0.66% 	-1.79%
(7) 8,942.54 ± 0.54% 	-1.35% 
(8) 8,813.63 ± 9.54% 	-2.77% (this one has really high variance and is probably not meaningful)

Linux 64bit:

(1) 8,353.46 ± 0.60% 
(2) 8,093.11 ± 0.58% 	-3.12%
(3) 7,558.68 ± 0.53% 	-9.51% 
(4) 8,069.23 ± 0.73% 	-3.40%
(5) 8,146.06 ± 0.92% 	-2.48%
(6) 8,265.63 ± 0.74% 	-1.05% 
(7) 8,326.55 ± 0.72% 	-0.32%
(8) 8,391.20 ± 0.71% 	0.45%
> If you start the profiler in the middle of that hang, and then look at the profile in the
> "hide platform details" mode

I think that's fine.  On nightly, with engineers profiling, looking for the binding method involved in the real stack is reasonable.  On release, does the real stack have enough symbols to get the binding method involved?

For the use case described in comment 0, which I assume is web developers profiling, is this scenario something that would happen?  That is, can you even start the devtools profiler (as opposed to the profiler extension profiler) in the middle of a layout flush?
(In reply to Boris Zbarsky [:bz] from comment #19)
> > If you start the profiler in the middle of that hang, and then look at the profile in the
> > "hide platform details" mode
> 
> I think that's fine.  On nightly, with engineers profiling, looking for the
> binding method involved in the real stack is reasonable.

I agree.

>  On release, does
> the real stack have enough symbols to get the binding method involved?

On Windows and Linux yes, on Mac no, but only because we haven't decided to ship frame pointers on release on Mac.

> For the use case described in comment 0, which I assume is web developers
> profiling, is this scenario something that would happen?  That is, can you
> even start the devtools profiler (as opposed to the profiler extension
> profiler) in the middle of a layout flush?

At the moment you can't, but maybe once we integrate perf.html back into the devtools you may be able to.
Comment on attachment 8892623 [details]
Bug 785440 - Move profiler_is_active() implementation (and RacyFeatures) into GeckoProfiler.h.

https://reviewboard.mozilla.org/r/163600/#review169092

::: tools/profiler/public/GeckoProfiler.h:131
(Diff revision 2)
>  
> +#ifdef MOZ_GECKO_PROFILER
> +namespace mozilla {
> +namespace profiler {
> +namespace detail {
> +// RacyFeatures is only defined in this header file so that its methods can

Nit: blank line before comment.

::: tools/profiler/public/GeckoProfiler.h:138
(Diff revision 2)
> +// detail namespace outside the profiler.
> +
> +// The preferred way to check profiler activeness and features is via
> +// ActivePS(). However, that requires locking gPSMutex. There are some hot
> +// operations where absolute precision isn't required, so we duplicate the
> +// activeness/feature state in a lock-free manner in this class.

This comment only makes sense within the profiler's code -- ActivePS is only visible in platform.cpp. So I suggest changing the start to "Within the profiler's code, the preferred way to check..."
Attachment #8892623 - Flags: review?(n.nethercote) → review+
Comment on attachment 8892625 [details]
Bug 785440 - Add AUTO_PROFILER_LABEL_FAST.

https://reviewboard.mozilla.org/r/163604/#review169094

::: tools/profiler/public/GeckoProfiler.h:492
(Diff revision 2)
> +// no-op if the profiler is disabled.
> +// Used to annotate functions for which overhead in the range of nanoseconds is
> +// noticeable. It avoids overhead from the TLS lookup because it can get the
> +// PseudoStack from the JS context, and avoids almost all overhead in the case
> +// where the profiler is disabled.
> +#define AUTO_PROFILER_LABEL_JSCONTEXT(ctx, label, category) \

Would AUTO_PROFILER_LABEL_FAST be a better name? Not sure.

Also, would it be better to make ctx the third arg, so that label and category remain the first and second args across all AUTO_PROFILER_LABEL\_\* macros?

::: tools/profiler/public/GeckoProfiler.h:671
(Diff revision 2)
> +                    MOZ_GUARD_OBJECT_NOTIFIER_PARAM)
> +  {
> +    MOZ_GUARD_OBJECT_NOTIFIER_INIT;
> +    if (profiler_is_active()) {
> +      Push(js::GetContextProfilingStack(aJSContext),
> +          aLabel, aDynamicString, aLine, aCategory);

Nit: Incorrect indent.
Attachment #8892625 - Flags: review?(n.nethercote) → review+
Comment on attachment 8892625 [details]
Bug 785440 - Add AUTO_PROFILER_LABEL_FAST.

https://reviewboard.mozilla.org/r/163604/#review169094

> Would AUTO_PROFILER_LABEL_FAST be a better name? Not sure.
> 
> Also, would it be better to make ctx the third arg, so that label and category remain the first and second args across all AUTO_PROFILER_LABEL\_\* macros?

I think AUTO_PROFILER_LABEL_FAST would be a better name, yes. AUTO_PROFILER_LABEL_JSCONTEXT makes it sound like the only difference is that it accepts a JSContext, and makes it easy to forget about the other difference.
And AUTO_PROFILER_LABEL_JSCONTEXT_IF_ACTIVE seems entirely too long.

I'll also change the argument order.
Comment on attachment 8892626 [details]
Bug 785440 - Add profiler labels to WebIDL bindings: getters / setters / method calls.

https://reviewboard.mozilla.org/r/163606/#review169368

::: dom/bindings/Codegen.py:7631
(Diff revision 5)
>      # there.
>  
>      def __init__(self, returnType, arguments, nativeMethodName, static,
>                   descriptor, idlNode, argConversionStartsAt=0, getter=False,
>                   setter=False, isConstructor=False, useCounterName=None,
> -                 resultVar=None, objectName="obj"):
> +                 resultVar=None, objectName="obj", profilerLabel=None):

So I'm not a fan of doing this in PerSignatureCall but then bypassing that for methods to spit it out directly.

Conceptually the profiler label is not per-signature, so we really shouldn't be stuffing it in here.

Doing it this way also misses it in various cases: navigator getters and maplike/setlike generated getters come to mind.

So in practice, this patch is adding labels to the following cases:

* CGSpecializedGetter except navigator and maplike/setlike getters.
* CGStaticGetter
* CGSpecializedSetter
* CGStaticSetter
* CGSpecializedMethod
* CGLegacyCallHook
* CGStaticMethod
* CGClassConstructor but only in the non-HTMLConstructor cases.

and it's not entirely clear to me that the labels it produces in all those cases make sense.

So I think I would prefer it if we added a thing on CGAbstractMethod that queried get_profiler_label() to get the first arg to pass to AUTO_PROFILER_LABEL_FAST, and if not None output the corresponding label.    Then we could override get_profiler_label in the places where we want them.
Attachment #8892626 - Flags: review?(bzbarsky) → review-
Attachment #8849319 - Attachment is obsolete: true
Attachment #704642 - Attachment is obsolete: true
Comment on attachment 8892626 [details]
Bug 785440 - Add profiler labels to WebIDL bindings: getters / setters / method calls.

https://reviewboard.mozilla.org/r/163606/#review169368

> So I'm not a fan of doing this in PerSignatureCall but then bypassing that for methods to spit it out directly.
> 
> Conceptually the profiler label is not per-signature, so we really shouldn't be stuffing it in here.
> 
> Doing it this way also misses it in various cases: navigator getters and maplike/setlike generated getters come to mind.
> 
> So in practice, this patch is adding labels to the following cases:
> 
> * CGSpecializedGetter except navigator and maplike/setlike getters.
> * CGStaticGetter
> * CGSpecializedSetter
> * CGStaticSetter
> * CGSpecializedMethod
> * CGLegacyCallHook
> * CGStaticMethod
> * CGClassConstructor but only in the non-HTMLConstructor cases.
> 
> and it's not entirely clear to me that the labels it produces in all those cases make sense.
> 
> So I think I would prefer it if we added a thing on CGAbstractMethod that queried get_profiler_label() to get the first arg to pass to AUTO_PROFILER_LABEL_FAST, and if not None output the corresponding label.    Then we could override get_profiler_label in the places where we want them.

Thank you for that suggestion. I think the patch is much better now.

There was a slight problem with the fact that AUTO_PROFILER_LABEL_FAST wants a JSContext* argument, and CGAbstractMethod is at an abstraction level where it doesn't know about the method's arguments. And I didn't want to add an implicit assumption of the form "if your subclass returns something other than None from get_profiler_label(), it promises that there's an argument called 'cx' around which can be passed to AUTO_PROFILER_LABEL_FAST". So I called the method profiler_label_and_jscontext(), and it returns a pair; the first item being the label and the second being the name of a JSContext* variable (which is effectively always "cx").

Now I have the string "cx" only in the places where I know that "cx" is part of my arguments, and I think this makes the code easier to reason about.
Comment on attachment 8892626 [details]
Bug 785440 - Add profiler labels to WebIDL bindings: getters / setters / method calls.

https://reviewboard.mozilla.org/r/163606/#review170072

::: dom/bindings/Codegen.py:9317
(Diff revision 6)
>                               self.descriptor, self.attr).define())
>  
> +    def profiler_label_and_jscontext(self):
> +        interface_name = self.descriptor.interface.identifier.name
> +        attr_name = self.attr.identifier.name
> +        return ("%s.%s" % (interface_name, attr_name), "cx")

Maybe make the label "get %s.%s"?

::: dom/bindings/Codegen.py:9386
(Diff revision 6)
>                              self.attr)
>  
> +    def profiler_label(self):
> +        interface_name = self.descriptor.interface.identifier.name
> +        attr_name = self.attr.identifier.name
> +        return "%s.%s" % (interface_name, attr_name)

Maybe "get %s.%s" for the label.

::: dom/bindings/Codegen.py:9461
(Diff revision 6)
>                              self.attr).define()
>  
> +    def profiler_label_and_jscontext(self):
> +        interface_name = self.descriptor.interface.identifier.name
> +        attr_name = self.attr.identifier.name
> +        return ("%s.%s" % (interface_name, attr_name), "cx")

And here "set %s.%s"

::: dom/bindings/Codegen.py:9495
(Diff revision 6)
>          return CGList([checkForArg, call])
>  
> +    def profiler_label(self):
> +        interface_name = self.descriptor.interface.identifier.name
> +        attr_name = self.attr.identifier.name
> +        return "%s.%s" % (interface_name, attr_name)

"set %s.%s"
Attachment #8892626 - Flags: review?(bzbarsky) → review+
Comment on attachment 8892624 [details]
Bug 785440 - Add js::GetContextProfilingStack in such a way that it can be inlined into non-JS code.

https://reviewboard.mozilla.org/r/163602/#review170084

::: js/public/ProfilingStack.h:25
(Diff revision 4)
>  struct JSRuntime;
> -class JSTracer;
> +class JS_PUBLIC_API(JSTracer);
>  
>  class PseudoStack;
>  
>  namespace js {

Pre-existing, but why is this in js? I would have expected JS. JS is for public stuff, js is for engine-internal stuff, at least conceptually. Isn't GeckoProfilerThread intended for use by external consumers, or are all users within spidermonkey?

Hm. In the current tree, it seems like all users *are* within spidermonkey. But maybe after this bug, there will be external users? I haven't really looked into exactly what you're doing here.

::: js/public/ProfilingStack.h:300
(Diff revision 4)
>      mozilla::Atomic<uint32_t, mozilla::ReleaseAcquire> stackPointer;
>  };
>  
> +namespace js {
> +
> +class AutoGeckoProfilerEntry;

These at least seem like they ought to be in js.

::: js/public/ProfilingStack.h:304
(Diff revision 4)
> +
> +class AutoGeckoProfilerEntry;
> +class GeckoProfilerEntryMarker;
> +class GeckoProfilerBaselineOSRMarker;
> +
> +class GeckoProfilerThread

I saw you added JS_PUBLIC_API around JSTracer. Does this need JS_PUBLIC_API too?

::: js/src/jspubtd.h:308
(Diff revision 4)
>  
> +    // Gecko profiling metadata.
> +    // This isn't really rooting related. It's only here because we want
> +    // GetContextProfilingStack to be inlineable into non-JS code, and we
> +    // didn't want to add another superclass of JSContext just for this.
> +    js::GeckoProfilerThread geckoProfiler_;

Yeah, we should probably eventually rename this to ShadowContext or something. But I think this is fine for now.
Attachment #8892624 - Flags: review?(sphink) → review+
Comment on attachment 8892624 [details]
Bug 785440 - Add js::GetContextProfilingStack in such a way that it can be inlined into non-JS code.

https://reviewboard.mozilla.org/r/163602/#review170084

> Pre-existing, but why is this in js? I would have expected JS. JS is for public stuff, js is for engine-internal stuff, at least conceptually. Isn't GeckoProfilerThread intended for use by external consumers, or are all users within spidermonkey?
> 
> Hm. In the current tree, it seems like all users *are* within spidermonkey. But maybe after this bug, there will be external users? I haven't really looked into exactly what you're doing here.

GeckoProfilerThread is only used for engine-internal stuff, and by GetContextProfilingStack (added in this patch), and that only needs it because GeckoProfilerThread is on the path from JSContext to PseudoStack. However, given that RootingContext is exposed to external consumers, and given that geckoProfiler() is a public method of RootingContext, I suppose you could say that GeckoProfilerThread is now effectively exposed. I don't know how the exact definitions are here.

As far as things go that external consumers actually need, those are just ProfileEntry and PseudoStack, as far as I know.

> I saw you added JS_PUBLIC_API around JSTracer. Does this need JS_PUBLIC_API too?

Ah, I should have commented on the JSTracer change. That change is just fixing up an incorrect forward declaration. JSTracer already was JS_PUBLIC_API. With the #include changes in this patch, this incorrect forward declaration caused some compilers to complain about the mismatch, so I fixed it.
Priority: -- → P1
This is almost ready to land. There's one remaining build failure on Windows 32bit, and it's caused by a warning in RootingAPI.h that is being treated as an error when compiled as part of security/certverifier/:
> warning C4324: 'js::DispatchWrapper<T>': structure was padded due to alignment specifier

security/certverifier has its own build settings that enable more warnings than anywhere in the rest of the tree, so I've added a patch that disables this specific warning there.

Build success before adding #include "RootingAPI.h" to GeckoProfiler.h:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5e1088bae21697e63627e4ffce42ef0f936cd44f&selectedJob=161557084

Build failure after adding #include "RootingAPI.h" to GeckoProfiler.h:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f9c18956d64591c783ff533af1a796eeac7eeffa&selectedJob=161559792
Log: https://treeherder.mozilla.org/logviewer.html#?job_id=161559792&repo=try&lineNumber=9042

Build success after disabling warning 4324 in security/certverifier:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=35ff65d4a1c3f49499b9ddcaba7efbed66279d1f&selectedJob=161562278
Comment on attachment 8950073 [details]
Bug 785440 - Disable warning C4324 when building security/certverifier.

https://reviewboard.mozilla.org/r/219342/#review225302

Yeah, this is sort-of a bummer, but I don't know that we have a better solution at the moment.
Attachment #8950073 - Flags: review?(dkeeler) → review+
Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/c0a7cb6d1b61
Move profiler_is_active() implementation (and RacyFeatures) into GeckoProfiler.h. r=njn
https://hg.mozilla.org/integration/autoland/rev/ecfe33a4bb62
Add js::GetContextProfilingStack in such a way that it can be inlined into non-JS code. r=sfink
https://hg.mozilla.org/integration/autoland/rev/d2dea7eb260f
Add AUTO_PROFILER_LABEL_FAST. r=njn
https://hg.mozilla.org/integration/autoland/rev/1236752bf224
Add profiler labels to WebIDL bindings: getters / setters / method calls. r=bz
https://hg.mozilla.org/integration/autoland/rev/fbd6ca22f841
Disable warning C4324 when building security/certverifier. r=keeler
Depends on: 1437772
Depends on: 1438120
Depends on: 1438485
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: