Investigate the XPC overhead on WebGL calls.

RESOLVED FIXED in mozilla10



Canvas: WebGL
7 years ago
6 years ago


(Reporter: mattwoodrow, Assigned: bjacob)


(Depends on: 1 bug, Blocks: 1 bug)

Dependency tree / graph

Firefox Tracking Flags

(Not tracked)



(4 attachments, 1 obsolete attachment)



7 years ago
Inside a call to glUniform3f, I'm see a lot of time spent inside XPC (I think!).

xpc_qsUnwrapArgImpl - 1.1%
castNative - 1.1%
getWrapper - 0.4%
JS_ValueToNumber - 0.2%

The total of these is almost more time than is spent actually inside glUniform3f(). Is there any way we can reduce these?


7 years ago
Blocks: 638321

Comment 1

7 years ago
There are some ways, yes.  And there are plans to redo how all that works for Fx6 or so.  Which quickstub are we talking about here?  You may be able to do some stuff short-term to make this faster....

Comment 2

7 years ago
the quickstub for WebGL.uniform3f is defined in


and has a 'TN version' (not that I actually know what that is. Vlad wrote that.)

Comment 3

7 years ago
TN == Traceable Native.  That would be called from trace, but on jsgamebench we seem to always be in the methodjit (something else worth figuring out).
Depends on: 622298

Comment 4

7 years ago
Oh, and I did look at the code; none of the classes involved are doing any of the fast-path stuff we can do in quickstubs, and adding it would be some work.  Probably better to just wait for the new bindings work, hence the dependency.

Comment 5

7 years ago
See bug 638321 comment 5: the XPC overhead for certain functions, at least Uniform3f, Uniform4f and BindTexture, is huge (it is as expensive as the function itself).

Comment 6

7 years ago
For Uniform4f:

| + 2.6%, nsIDOMWebGLRenderingContext_Uniform4f(JSContext*, unsigned int, unsigned long long*), XUL
| | + 1.0%, mozilla::WebGLContext::Uniform4f(nsIWebGLUniformLocation*, float, float, float, float), XUL
| | | - 0.7%, glUniform4fARB_Exec, GLEngine, XUL
| | |   0.1%, nsCOMPtr_base::assign_from_qi_with_error, XUL
| | |   0.1%, mozilla::WebGLUniformLocation::Release(), XUL
| | |   0.0%, nsCOMPtr_base::~nsCOMPtr_base(), XUL
| | - 0.4%, xpc_qsUnwrapArgImpl, XUL
| | - 0.3%, getWrapper, XUL
| |   0.2%, mozilla::WebGLContext::QueryInterface(nsID const&, void**), XUL
| | - 0.2%, castNative, XUL
| | - 0.2%, mozilla::WebGLContext::Release(), XUL
| |   0.1%, JS_ValueToNumber, XUL
| |   0.1%, glUniform4f, libGL.dylib

So despite being quickstubbed, we pay a huge price for just converting JS arguments to C++ values.

Comment 7

7 years ago
Another remark: we also pay a high price (10% in that function) for QueryInterface and Release.

Comment 8

7 years ago
Let's break down the time spent in Uniform4f:
* 38% spent converting JS arguments to C++ values
* 31% spent in OpenGL
* 12% spent in Release() (on WebGLContext and on WebGLUniformLocation)
* 12% spent in QueryInterface and assign_from_qi_with_error
Adding a bit for nsIDOMWebGLRenderingContext and nsIWebGLUniformLocation to DOMCI_CASTABLE_INTERFACES might help.
Now this is weird: a bug report complains that those uniformXx functions perform actually worse in Chrome than in FF:

@ Peter: thanks for the tip, I will try to make a patch.
Created attachment 560851 [details] [diff] [review]
Peter's magic DOMClassInfo tip

Finally got Peter to explain me what he meant!

Peter -> just one thing I don't remember. I can understand that this will allow to optimize things on the xpconnect/quickstubs side. But what about "my" side, in the C++ implementation of WebGL functions: there, I am doing QueryInterface to get e.g. a WebGLUniformLocation* from a nsIWebGLUniformLocation*. Will this QueryInterface magically go faster thanks to the present patch, or should I replace it by something else? (A plain cast?)
Attachment #560851 - Flags: review?(peterv)
Created attachment 560852 [details] [diff] [review]
remove quickstubbing blacklist

Some WebGL functions were opted out from auto-quickstubbing because they used to directly access the JS context to manipulate the JS return value. They don't do that anymore, instead they return a nsIVariant.
Attachment #560852 - Flags: review?(peterv)
Created attachment 560854 [details] [diff] [review]
remove (most of) quickstubbing blacklist

previous version didn't build, as getContextAttributes is still manipulating its return jsval. Other functions are fine.
Attachment #560852 - Attachment is obsolete: true
Attachment #560852 - Flags: review?(peterv)
Attachment #560854 - Flags: review?(peterv)

Comment 14

6 years ago
Benoit, if you list nsIWebGLUniformLocation as "builtinclass" in the IDL (see the way nsIDOMEventTarget does it, say), then you can assume that given an nsIWebGLUniformLocation the C++ class is one that you wrote.  If there's only one class that implements that interface, then you can just cast to it instead of using QI.
Should I be parcimonious in sprinkling "builtinclass" over the IDL?

I mean, should I only put "builtinclass" on nsIWebGLUniformLocation and other super-performance-critical interfaces, or is it OK to put it all over the place? What is the cost of putting "builtinclass" on more classes than necessary?

I'm asking because in the WebGL implementation we have some generic functions to handle WebGL function arguments, see GetConcreteObjectAndGLName and friends in WebGLContext.h, so it would be nice to keeping treating all WebGL function parameters in the same way, rather than having to treat the "builtinclass" ones differently.
...or wait, is it possible to determine, using template metaprogramming, whether a type is a "builtinclass"? That would allow me to update GetConcreteObject and friends to handle this in a completely fool-proof way.
(In reply to Benoit Jacob [:bjacob] from comment #15)
> I mean, should I only put "builtinclass" on nsIWebGLUniformLocation and
> other super-performance-critical interfaces, or is it OK to put it all over
> the place? What is the cost of putting "builtinclass" on more classes than
> necessary?

You can put it on any class that's never supposed to be implemented in JS. There shouldn't be any downside afaik. Note that from C++, for example from GetConcreteObjectAndGLName, there is no easy way to detect whether the interface is marked "builtinclass" or not.
Created attachment 564061 [details] [diff] [review]
use builtinclass and remove do_QueryInterface
Attachment #564061 - Flags: review?(peterv)
Attachment #560854 - Flags: review?(peterv) → review+
Attachment #560851 - Flags: review?(peterv) → review+
Comment on attachment 564061 [details] [diff] [review]
use builtinclass and remove do_QueryInterface

Review of attachment 564061 [details] [diff] [review]:

::: content/canvas/src/WebGLContext.h
@@ +2111,5 @@
>      if (isNull)
>          *isNull = PR_FALSE;
> +    *aConcreteObject = static_cast<ConcreteObjectType*>(aInterface);

It might still be a good idea to assert in a debug build that a QI gives the same object (especially since all those classes already have an IID). Could you also file a bug to add a way to statically assert from C++ that a given interface is marked builtinclass, and to do that here? Given that this is a fairly generic conversion function it'd be good that we could make sure that anything that this converts is safe.
Attachment #564061 - Flags: review?(peterv) → review+


6 years ago
Depends on: 694114
Filed follow-up bug 694114.
Assignee: nobody → bjacob
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
Created attachment 567068 [details] [diff] [review]
fix bug in assertion, bad usage of queryinterface

This crept in my last minute changes, unreviewed, to address review comments. sorry.
Attachment #567068 - Flags: review?(roc)
Attachment #567068 - Flags: review?(roc) → review+
These patches make a huge difference on Google MapsGL!
Should I try to get Aurora approval for these patches? If WebGL performance is considered strategic?
Comment on attachment 567068 [details] [diff] [review]
fix bug in assertion, bad usage of queryinterface

>+        nsCOMPtr<ConcreteObjectType> tmp(do_QueryInterface(aInterface, &rv));
Nit: this should actually read
nsRefPtr<ConcreteObjectType> tmp(do_QueryObject(aInterface, &rv));
[The effect is the same in this case.]


6 years ago
OS: Mac OS X → All
Hardware: x86 → All


6 years ago
Blocks: 697443
You need to log in before you can comment on or make changes to this bug.