Closed Bug 638328 Opened 9 years ago Closed 8 years ago

Investigate the XPC overhead on WebGL calls.


(Core :: Canvas: WebGL, defect)

Not set





(Reporter: mattwoodrow, Assigned: bjacob)


(Depends on 1 open bug)



(4 files, 1 obsolete file)

Inside a call to glUniform3f, I'm see a lot of time spent inside XPC (I think!).

xpc_qsUnwrapArgImpl - 1.1%
castNative - 1.1%
getWrapper - 0.4%
JS_ValueToNumber - 0.2%

The total of these is almost more time than is spent actually inside glUniform3f(). Is there any way we can reduce these?
Blocks: 638321
There are some ways, yes.  And there are plans to redo how all that works for Fx6 or so.  Which quickstub are we talking about here?  You may be able to do some stuff short-term to make this faster....
the quickstub for WebGL.uniform3f is defined in


and has a 'TN version' (not that I actually know what that is. Vlad wrote that.)
TN == Traceable Native.  That would be called from trace, but on jsgamebench we seem to always be in the methodjit (something else worth figuring out).
Depends on: 622298
Oh, and I did look at the code; none of the classes involved are doing any of the fast-path stuff we can do in quickstubs, and adding it would be some work.  Probably better to just wait for the new bindings work, hence the dependency.
See bug 638321 comment 5: the XPC overhead for certain functions, at least Uniform3f, Uniform4f and BindTexture, is huge (it is as expensive as the function itself).
For Uniform4f:

| + 2.6%, nsIDOMWebGLRenderingContext_Uniform4f(JSContext*, unsigned int, unsigned long long*), XUL
| | + 1.0%, mozilla::WebGLContext::Uniform4f(nsIWebGLUniformLocation*, float, float, float, float), XUL
| | | - 0.7%, glUniform4fARB_Exec, GLEngine, XUL
| | |   0.1%, nsCOMPtr_base::assign_from_qi_with_error, XUL
| | |   0.1%, mozilla::WebGLUniformLocation::Release(), XUL
| | |   0.0%, nsCOMPtr_base::~nsCOMPtr_base(), XUL
| | - 0.4%, xpc_qsUnwrapArgImpl, XUL
| | - 0.3%, getWrapper, XUL
| |   0.2%, mozilla::WebGLContext::QueryInterface(nsID const&, void**), XUL
| | - 0.2%, castNative, XUL
| | - 0.2%, mozilla::WebGLContext::Release(), XUL
| |   0.1%, JS_ValueToNumber, XUL
| |   0.1%, glUniform4f, libGL.dylib

So despite being quickstubbed, we pay a huge price for just converting JS arguments to C++ values.
Another remark: we also pay a high price (10% in that function) for QueryInterface and Release.
Let's break down the time spent in Uniform4f:
* 38% spent converting JS arguments to C++ values
* 31% spent in OpenGL
* 12% spent in Release() (on WebGLContext and on WebGLUniformLocation)
* 12% spent in QueryInterface and assign_from_qi_with_error
Adding a bit for nsIDOMWebGLRenderingContext and nsIWebGLUniformLocation to DOMCI_CASTABLE_INTERFACES might help.
Now this is weird: a bug report complains that those uniformXx functions perform actually worse in Chrome than in FF:

@ Peter: thanks for the tip, I will try to make a patch.
Finally got Peter to explain me what he meant!

Peter -> just one thing I don't remember. I can understand that this will allow to optimize things on the xpconnect/quickstubs side. But what about "my" side, in the C++ implementation of WebGL functions: there, I am doing QueryInterface to get e.g. a WebGLUniformLocation* from a nsIWebGLUniformLocation*. Will this QueryInterface magically go faster thanks to the present patch, or should I replace it by something else? (A plain cast?)
Attachment #560851 - Flags: review?(peterv)
Attached patch remove quickstubbing blacklist (obsolete) — Splinter Review
Some WebGL functions were opted out from auto-quickstubbing because they used to directly access the JS context to manipulate the JS return value. They don't do that anymore, instead they return a nsIVariant.
Attachment #560852 - Flags: review?(peterv)
previous version didn't build, as getContextAttributes is still manipulating its return jsval. Other functions are fine.
Attachment #560852 - Attachment is obsolete: true
Attachment #560852 - Flags: review?(peterv)
Attachment #560854 - Flags: review?(peterv)
Benoit, if you list nsIWebGLUniformLocation as "builtinclass" in the IDL (see the way nsIDOMEventTarget does it, say), then you can assume that given an nsIWebGLUniformLocation the C++ class is one that you wrote.  If there's only one class that implements that interface, then you can just cast to it instead of using QI.
Should I be parcimonious in sprinkling "builtinclass" over the IDL?

I mean, should I only put "builtinclass" on nsIWebGLUniformLocation and other super-performance-critical interfaces, or is it OK to put it all over the place? What is the cost of putting "builtinclass" on more classes than necessary?

I'm asking because in the WebGL implementation we have some generic functions to handle WebGL function arguments, see GetConcreteObjectAndGLName and friends in WebGLContext.h, so it would be nice to keeping treating all WebGL function parameters in the same way, rather than having to treat the "builtinclass" ones differently.
...or wait, is it possible to determine, using template metaprogramming, whether a type is a "builtinclass"? That would allow me to update GetConcreteObject and friends to handle this in a completely fool-proof way.
(In reply to Benoit Jacob [:bjacob] from comment #15)
> I mean, should I only put "builtinclass" on nsIWebGLUniformLocation and
> other super-performance-critical interfaces, or is it OK to put it all over
> the place? What is the cost of putting "builtinclass" on more classes than
> necessary?

You can put it on any class that's never supposed to be implemented in JS. There shouldn't be any downside afaik. Note that from C++, for example from GetConcreteObjectAndGLName, there is no easy way to detect whether the interface is marked "builtinclass" or not.
Attachment #560854 - Flags: review?(peterv) → review+
Attachment #560851 - Flags: review?(peterv) → review+
Comment on attachment 564061 [details] [diff] [review]
use builtinclass and remove do_QueryInterface

Review of attachment 564061 [details] [diff] [review]:

::: content/canvas/src/WebGLContext.h
@@ +2111,5 @@
>      if (isNull)
>          *isNull = PR_FALSE;
> +    *aConcreteObject = static_cast<ConcreteObjectType*>(aInterface);

It might still be a good idea to assert in a debug build that a QI gives the same object (especially since all those classes already have an IID). Could you also file a bug to add a way to statically assert from C++ that a given interface is marked builtinclass, and to do that here? Given that this is a fairly generic conversion function it'd be good that we could make sure that anything that this converts is safe.
Attachment #564061 - Flags: review?(peterv) → review+
Depends on: 694114
Filed follow-up bug 694114.
Assignee: nobody → bjacob
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
This crept in my last minute changes, unreviewed, to address review comments. sorry.
Attachment #567068 - Flags: review?(roc)
These patches make a huge difference on Google MapsGL!
Should I try to get Aurora approval for these patches? If WebGL performance is considered strategic?
Comment on attachment 567068 [details] [diff] [review]
fix bug in assertion, bad usage of queryinterface

>+        nsCOMPtr<ConcreteObjectType> tmp(do_QueryInterface(aInterface, &rv));
Nit: this should actually read
nsRefPtr<ConcreteObjectType> tmp(do_QueryObject(aInterface, &rv));
[The effect is the same in this case.]
OS: Mac OS X → All
Hardware: x86 → All
You need to log in before you can comment on or make changes to this bug.