Last Comment Bug 638328 - Investigate the XPC overhead on WebGL calls.
: Investigate the XPC overhead on WebGL calls.
Product: Core
Classification: Components
Component: Canvas: WebGL (show other bugs)
: unspecified
: All All
: -- normal with 2 votes (vote)
: mozilla10
Assigned To: Benoit Jacob [:bjacob] (mostly away)
Depends on: 694114 622298
Blocks: mapsgl-performance 638321
  Show dependency treegraph
Reported: 2011-03-02 18:49 PST by Matt Woodrow (:mattwoodrow)
Modified: 2011-10-26 07:59 PDT (History)
14 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

Peter's magic DOMClassInfo tip (1.61 KB, patch)
2011-09-18 22:26 PDT, Benoit Jacob [:bjacob] (mostly away)
peterv: review+
Details | Diff | Splinter Review
remove quickstubbing blacklist (1.34 KB, patch)
2011-09-18 22:28 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details | Diff | Splinter Review
remove (most of) quickstubbing blacklist (1.39 KB, patch)
2011-09-18 22:34 PDT, Benoit Jacob [:bjacob] (mostly away)
peterv: review+
Details | Diff | Splinter Review
use builtinclass and remove do_QueryInterface (5.45 KB, patch)
2011-10-02 10:43 PDT, Benoit Jacob [:bjacob] (mostly away)
peterv: review+
Details | Diff | Splinter Review
fix bug in assertion, bad usage of queryinterface (1.02 KB, patch)
2011-10-14 06:44 PDT, Benoit Jacob [:bjacob] (mostly away)
roc: review+
Details | Diff | Splinter Review

Description Matt Woodrow (:mattwoodrow) 2011-03-02 18:49:49 PST
Inside a call to glUniform3f, I'm see a lot of time spent inside XPC (I think!).

xpc_qsUnwrapArgImpl - 1.1%
castNative - 1.1%
getWrapper - 0.4%
JS_ValueToNumber - 0.2%

The total of these is almost more time than is spent actually inside glUniform3f(). Is there any way we can reduce these?
Comment 1 Boris Zbarsky [:bz] (TPAC) 2011-03-02 20:32:44 PST
There are some ways, yes.  And there are plans to redo how all that works for Fx6 or so.  Which quickstub are we talking about here?  You may be able to do some stuff short-term to make this faster....
Comment 2 Benoit Jacob [:bjacob] (mostly away) 2011-03-03 04:59:40 PST
the quickstub for WebGL.uniform3f is defined in


and has a 'TN version' (not that I actually know what that is. Vlad wrote that.)
Comment 3 Boris Zbarsky [:bz] (TPAC) 2011-03-03 07:37:28 PST
TN == Traceable Native.  That would be called from trace, but on jsgamebench we seem to always be in the methodjit (something else worth figuring out).
Comment 4 Boris Zbarsky [:bz] (TPAC) 2011-03-03 07:38:16 PST
Oh, and I did look at the code; none of the classes involved are doing any of the fast-path stuff we can do in quickstubs, and adding it would be some work.  Probably better to just wait for the new bindings work, hence the dependency.
Comment 5 Benoit Jacob [:bjacob] (mostly away) 2011-03-04 08:40:01 PST
See bug 638321 comment 5: the XPC overhead for certain functions, at least Uniform3f, Uniform4f and BindTexture, is huge (it is as expensive as the function itself).
Comment 6 Benoit Jacob [:bjacob] (mostly away) 2011-03-04 08:44:42 PST
For Uniform4f:

| + 2.6%, nsIDOMWebGLRenderingContext_Uniform4f(JSContext*, unsigned int, unsigned long long*), XUL
| | + 1.0%, mozilla::WebGLContext::Uniform4f(nsIWebGLUniformLocation*, float, float, float, float), XUL
| | | - 0.7%, glUniform4fARB_Exec, GLEngine, XUL
| | |   0.1%, nsCOMPtr_base::assign_from_qi_with_error, XUL
| | |   0.1%, mozilla::WebGLUniformLocation::Release(), XUL
| | |   0.0%, nsCOMPtr_base::~nsCOMPtr_base(), XUL
| | - 0.4%, xpc_qsUnwrapArgImpl, XUL
| | - 0.3%, getWrapper, XUL
| |   0.2%, mozilla::WebGLContext::QueryInterface(nsID const&, void**), XUL
| | - 0.2%, castNative, XUL
| | - 0.2%, mozilla::WebGLContext::Release(), XUL
| |   0.1%, JS_ValueToNumber, XUL
| |   0.1%, glUniform4f, libGL.dylib

So despite being quickstubbed, we pay a huge price for just converting JS arguments to C++ values.
Comment 7 Benoit Jacob [:bjacob] (mostly away) 2011-03-04 08:51:45 PST
Another remark: we also pay a high price (10% in that function) for QueryInterface and Release.
Comment 8 Benoit Jacob [:bjacob] (mostly away) 2011-03-04 08:56:09 PST
Let's break down the time spent in Uniform4f:
* 38% spent converting JS arguments to C++ values
* 31% spent in OpenGL
* 12% spent in Release() (on WebGLContext and on WebGLUniformLocation)
* 12% spent in QueryInterface and assign_from_qi_with_error
Comment 9 Peter Van der Beken [:peterv] 2011-03-04 10:15:26 PST
Adding a bit for nsIDOMWebGLRenderingContext and nsIWebGLUniformLocation to DOMCI_CASTABLE_INTERFACES might help.
Comment 10 Benoit Jacob [:bjacob] (mostly away) 2011-03-05 17:22:57 PST
Now this is weird: a bug report complains that those uniformXx functions perform actually worse in Chrome than in FF:

@ Peter: thanks for the tip, I will try to make a patch.
Comment 11 Benoit Jacob [:bjacob] (mostly away) 2011-09-18 22:26:07 PDT
Created attachment 560851 [details] [diff] [review]
Peter's magic DOMClassInfo tip

Finally got Peter to explain me what he meant!

Peter -> just one thing I don't remember. I can understand that this will allow to optimize things on the xpconnect/quickstubs side. But what about "my" side, in the C++ implementation of WebGL functions: there, I am doing QueryInterface to get e.g. a WebGLUniformLocation* from a nsIWebGLUniformLocation*. Will this QueryInterface magically go faster thanks to the present patch, or should I replace it by something else? (A plain cast?)
Comment 12 Benoit Jacob [:bjacob] (mostly away) 2011-09-18 22:28:15 PDT
Created attachment 560852 [details] [diff] [review]
remove quickstubbing blacklist

Some WebGL functions were opted out from auto-quickstubbing because they used to directly access the JS context to manipulate the JS return value. They don't do that anymore, instead they return a nsIVariant.
Comment 13 Benoit Jacob [:bjacob] (mostly away) 2011-09-18 22:34:02 PDT
Created attachment 560854 [details] [diff] [review]
remove (most of) quickstubbing blacklist

previous version didn't build, as getContextAttributes is still manipulating its return jsval. Other functions are fine.
Comment 14 Boris Zbarsky [:bz] (TPAC) 2011-09-19 05:14:15 PDT
Benoit, if you list nsIWebGLUniformLocation as "builtinclass" in the IDL (see the way nsIDOMEventTarget does it, say), then you can assume that given an nsIWebGLUniformLocation the C++ class is one that you wrote.  If there's only one class that implements that interface, then you can just cast to it instead of using QI.
Comment 15 Benoit Jacob [:bjacob] (mostly away) 2011-10-01 23:18:54 PDT
Should I be parcimonious in sprinkling "builtinclass" over the IDL?

I mean, should I only put "builtinclass" on nsIWebGLUniformLocation and other super-performance-critical interfaces, or is it OK to put it all over the place? What is the cost of putting "builtinclass" on more classes than necessary?

I'm asking because in the WebGL implementation we have some generic functions to handle WebGL function arguments, see GetConcreteObjectAndGLName and friends in WebGLContext.h, so it would be nice to keeping treating all WebGL function parameters in the same way, rather than having to treat the "builtinclass" ones differently.
Comment 16 Benoit Jacob [:bjacob] (mostly away) 2011-10-01 23:20:57 PDT
...or wait, is it possible to determine, using template metaprogramming, whether a type is a "builtinclass"? That would allow me to update GetConcreteObject and friends to handle this in a completely fool-proof way.
Comment 17 Peter Van der Beken [:peterv] 2011-10-02 02:14:44 PDT
(In reply to Benoit Jacob [:bjacob] from comment #15)
> I mean, should I only put "builtinclass" on nsIWebGLUniformLocation and
> other super-performance-critical interfaces, or is it OK to put it all over
> the place? What is the cost of putting "builtinclass" on more classes than
> necessary?

You can put it on any class that's never supposed to be implemented in JS. There shouldn't be any downside afaik. Note that from C++, for example from GetConcreteObjectAndGLName, there is no easy way to detect whether the interface is marked "builtinclass" or not.
Comment 18 Benoit Jacob [:bjacob] (mostly away) 2011-10-02 10:43:42 PDT
Created attachment 564061 [details] [diff] [review]
use builtinclass and remove do_QueryInterface
Comment 19 Peter Van der Beken [:peterv] 2011-10-10 09:19:39 PDT
Comment on attachment 564061 [details] [diff] [review]
use builtinclass and remove do_QueryInterface

Review of attachment 564061 [details] [diff] [review]:

::: content/canvas/src/WebGLContext.h
@@ +2111,5 @@
>      if (isNull)
>          *isNull = PR_FALSE;
> +    *aConcreteObject = static_cast<ConcreteObjectType*>(aInterface);

It might still be a good idea to assert in a debug build that a QI gives the same object (especially since all those classes already have an IID). Could you also file a bug to add a way to statically assert from C++ that a given interface is marked builtinclass, and to do that here? Given that this is a fairly generic conversion function it'd be good that we could make sure that anything that this converts is safe.
Comment 20 Benoit Jacob [:bjacob] (mostly away) 2011-10-12 13:22:15 PDT
Filed follow-up bug 694114.
Comment 23 Benoit Jacob [:bjacob] (mostly away) 2011-10-14 06:44:01 PDT
Created attachment 567068 [details] [diff] [review]
fix bug in assertion, bad usage of queryinterface

This crept in my last minute changes, unreviewed, to address review comments. sorry.
Comment 24 Benoit Jacob [:bjacob] (mostly away) 2011-10-14 07:02:43 PDT
Comment 25 Benoit Jacob [:bjacob] (mostly away) 2011-10-14 07:26:54 PDT
These patches make a huge difference on Google MapsGL!
Comment 26 Benoit Jacob [:bjacob] (mostly away) 2011-10-14 07:28:16 PDT
Should I try to get Aurora approval for these patches? If WebGL performance is considered strategic?
Comment 27 2011-10-14 08:56:40 PDT
Comment on attachment 567068 [details] [diff] [review]
fix bug in assertion, bad usage of queryinterface

>+        nsCOMPtr<ConcreteObjectType> tmp(do_QueryInterface(aInterface, &rv));
Nit: this should actually read
nsRefPtr<ConcreteObjectType> tmp(do_QueryObject(aInterface, &rv));
[The effect is the same in this case.]

Note You need to log in before you can comment on or make changes to this bug.