Open Bug 250107 Opened 20 years ago Updated 2 years ago

inline nsQueryInterface operator () ... and more

Categories

(Core :: XPCOM, defect)

x86
All
defect

Tracking

()

People

(Reporter: bernard.alleysson, Unassigned)

Details

(Keywords: perf)

Attachments

(1 file, 7 obsolete files)

nsQueryInterface doesn't inherit from nsCOMPtr_helper and thus nsQueryInterface
operator() is not virtual and can be inlined. Because it is not inlined at -O1
and there's only 2 call sites (only one gets compiled actually), operator() is
simply removed and the original call is inlined by hand.
It will completely the operator() call overhead noticed in bug 249652.
Same applies to nsQueryInterfaceWithError operator ().
I only tested this with MSVC 7.1.

See the patch for more information.

Note that assign_assuming_AddRef() calls in nscomptr.cpp are not inlined at -O1.
They could be replaced by some macros to force inlining.
Note that the code can be simplified if it is sure that a NULL pointer is
returned by QI when QI fails (see comment in the patch).
Note that other operator () from classes inheriting from nsCOMPtr_helper could
be simplified also by changing the signature to
void operator() ( ... ) instead of nsresult operator () (...) and ensuring that
the returned pointer is NULL on error (just forget about the result of QI)
assign_assuming_addref() is meant to be inlined because it is defined inline
with the class declaration but it is not inlined with MSVC 7.1 -O1 build, force
inlining with the macro NSCAP_ASSIGN_ASSUMING_ADDREF

With this patch, I have verified that all code of nscomptr.cpp is inlined, no
more call overhead for operator() and assign_assuming_addref(). It saves a few
cycles for each of nscomptr.cpp methods and they are called quite a lot...
Make all nsCOMPtr_helper return "void" instead of "nsresult" and insure that
they set the answer to NULL in case of error (most of them did that already).
Before : 2 checks, one in assign_from_helper(), one in operator(). After: only
one check, in operator(). With the removing of the return value + the check
saved, it saves a few cycles of every operator() call, and there are quite a
lot...
It is the final patch.

The standard nsISupports implementation returns NULL on error, so I guess that
it is the contract. Removed the checks for failure in assign_from_qi() and
assign_from_qi_with_error(). Also changed some QueryElementAt implementation to
match nsISupports and removed checks from the operators () using them.

I'm setting the perf keyword because it touches some top functions.
Keywords: perf
Summary: inline nsQueryInterface operator () → inline nsQueryInterface operator () ... and more
Attached patch final patch (obsolete) — Splinter Review
Sorry, I've changed the signatures (again):

"virtual void operator()( const nsIID& aIID, void** aInstancePtr) const;"
to "virtual void* operator()( const nsIID& aIID ) const;"
because the return value is *aInstancePtr

and

"void assign_from_qi_with_error( nsISupports *rawPtr, nsresult *errorPtr, const
nsIID& iid )"
to "nsresult assign_from_qi_with_error( nsISupports *rawPtr, const nsIID& iid
)"
because the return value is nsresult

The conversion was pretty straightforward. The assembly code is much better
now. Less parameters on the stack, eax is used for the return value, ecx for
"this", and hopefully with the __fastcall stuff edx will hold &aIID. It will be
just perfect.

I think that the next step would be to remove more operator() virtualness but
it would require to add headers (nsIWeakreference.h and stuff) to nsComptr.h
and/or to move code around. nsQueryInterface also looks quite a bit useless now
(remove it ?).
Can you really change the signature of nsCOMPtr_helper? Won't that break plugins?
The reason I didn't inline nsQueryInterface::operator() right away was that I
wasn't sure if we ever had NSCAP_FEATURE_USE_BASE undefined. In that case
nsQueryInterface::operator() is called from the function 
nsCOMPtr<T>::assign_from_qi( const nsQueryInterface qi, const nsIID& aIID )
which is potentially compiled once per .cpp file, so an inlined
nsQueryInterface::operator() could increase the binsize a lot.

However, there are a lot of 'if's in the above (which is why I didn't deal with
this in the first place). We might define NSCAP_FEATURE_USE_BASE in all builds
that we care about. And even when it's undefined the linker might remove all
redundant functions.
(In reply to comment #7)
NSCAP_FEATURE_USE_BASE is undefined when NS_DEBUG is defined and otherwise it is
explitcitly set on top of nsCOMPtr.h, so I guess that NSCAP_FEATURE_USE_BASE is
only undefined with debug builds. With my patch, it adds 190 Ko of code (didn't
try without).

> The reason I didn't inline nsQueryInterface::operator() right away was that I
> wasn't sure if we ever had NSCAP_FEATURE_USE_BASE undefined. In that case
> nsQueryInterface::operator() is called from the function 
> nsCOMPtr<T>::assign_from_qi( const nsQueryInterface qi, const nsIID& aIID )
> which is potentially compiled once per .cpp file, so an inlined
> nsQueryInterface::operator() could increase the binsize a lot.

I can add a macro, same thing I've done with assign_assuming_addref() (it isn't
inlined when FEATURE_USE_BASE is undefined, but is always inlined in nscomptr.cpp).

>Can you really change the signature of nsCOMPtr_helper? Won't that break >plugins?
I don't know. Plugins should link with a static glue library. I thought all
plugins would go through xpt calls and the static glue would contain static
implementation of ncCOMPtr_helper methods, thus no need to link against
xpcom.dll. Maybe I could keep the old signatures but only use the new ones (be
sure they are not removed by the linker). I'll test some plugins. 
 
Attached patch patch v1 (obsolete) — Splinter Review
I added some macros and now with a build without NSCAP_USE_BASE I'm at +180 Ko
instead of +190 Ko. I tried different plugins (flash, java, quicktime, wmp,
adobe pdf). They all work fine. The ActiveX plugin is broken because it links
against xpcom.dll (import a lot of stuff) but I've rebuilt it and used
successfully.

But it is even better !

I've made the constructors "nsCOMPtr( const nsQueryInterface qi )" and
"nsCOMPtr( const nsQueryInterfaceWithError qi )" inline (see patch v1). For
instance the first one is just now:

nsCOMPtr( const nsQueryInterface qi ) : nsCOMPtr_base(qi.get())
// assign from |do_QueryInterface(expr)|
{
  if (mRawPtr)
      mRawPtr->QueryInterface(NS_GET_IID(nsISupports),
NS_REINTERPRET_CAST(void**, &mRawPtr));
}

it is very efficient, it is now much better to write:

nsCOMPtr<nsIXXXXXX> ptr(do_QueryInterface(other)); // optimized, inline
nsCOMPtr<nsIXXXXXX> ptr(do_QueryInterface(other, &rv)); // optimized, inline

than

nsCOMPtr<nsIXXXXXX> ptr = do_QueryInterface(other); // not optimized, not
inline
nsCOMPtr<nsIXXXXXX> ptr = do_QueryInterface(other, &rv); // not optimized, not
inline

because we don't have to check the old pointer and release it if it's not NULL
and because it is totally inline. The generated code is the same you would
write with raw COM pointers. No more assign_from_qi() overhead.

There's no code size increase (I've done my best to avoid it at all cost).
There's even a small footprint win (a few Ko).

I've also rewritten nsGetInterface operator() to use raw pointers, because even
with the optimizations above, the ~nsCOMPtr() was not inlined. It is much
better now.

Now I am looking into removing nsGetInterface operator() virtual call (I don't
think I can inline do_GetInterface() because it is a bit large).
And there's also a lot of cleanup to do to remove bad nsCOMPtr construction
code, patterns like "ptr = do_QueryInterface" to replace with
"ptr(do_QueryInterface)")

I've also run browser buster, I went to all my favorites sites, used print
preview : no problem at all, it works for me
Attachment #152476 - Attachment is obsolete: true
Attachment #152482 - Attachment is obsolete: true
Attachment #152491 - Attachment is obsolete: true
Attachment #152507 - Attachment is obsolete: true
Attachment #152522 - Attachment is obsolete: true
(In reply to comment #9)
> nsCOMPtr<nsIXXXXXX> ptr(do_QueryInterface(other)); // optimized, inline
> nsCOMPtr<nsIXXXXXX> ptr = do_QueryInterface(other); // not optimized, not

hm, do compilers really not optimize the second to be exactly the same as the first?
I'm still nervous about changing the nsCOMPtr_helper signature. I guess it might
work since nsCOMPtr and nsCOMPtr_base are the only callers of nsCOMPtr_helper
and it lives in the gulecode which is linked into all plugins. It would be nice
if someone that knows this code better then me could verify this though
(bsmedberg, dbaron, bryner, scc). Triel and error in nsCOMPtr makes me nervous.


> nsCOMPtr<nsIXXXXXX> ptr(do_QueryInterface(other)); // optimized, inline
> nsCOMPtr<nsIXXXXXX> ptr = do_QueryInterface(other); // not optimized, not

I too would think that these compile to the same thing. In fact, the C++
specification says that these mean the same thing as long as nsCOMPtr<T> has a
ctor that takes a whatever-do_QueryInterface-returns. It shouldn't even need to
rely on the optimizer. This is a behaviour inherited from C language.
|int i = 5| does not mean "create i and then assign it 5". It means "create an i
with the value of 5". Same thing goes for classes.

There are articles online but i can't remember the site off the top of my head.
I've just made some breakpoints randomly (xul), and I can see that you're right.

I spoke too fast.

But it's worse, in that file the contructor is not inlined but it's still 2
lines of code and 30 bytes ! I guess I will have to check all this again later.
I worked on a bit of code (nsGetInterface operator()) and I tried to optimize
it. I swear it was inlined there ! Maybe the compiler (msvc7.1) can choose not
to inline some template methods in one dll (layout), and inline in another dll
(xpcom) ? Maybe it is smarter than I am :-) I'll check that later. 

I don't know if I can change nsCOMPtr_helper methods but I noticed that in bug
249652 NS_FASTCALL is applied to all methods. It is the same as changing
parameters : the plugin will crash if it is linked dynamically with xpcom.dll. I
think that it changes the name decoration so the symbols will not be found.

Some input from people that knows about this code would be great ! I think that
even if it's not as good as I thought it would be in the first place it's still
an improvement.
Attached patch updated with detailed comments (obsolete) — Splinter Review
The fact that the new nsCOMPtr contructors were inlined at some places and not
inlined at other places was expected. Declaring a function inline just makes it
a good candidate for inlining, nothing more. Then, given the constraint on code
size (/O1), the compiler can choose to inline or not. It depends on the
compiler. So I've just removed the inline code from the constructors for more
safety and went back to make a function call instead. 

I've added detailed comments in the patch, I'm just summarizing the benefits of
the patch:

a) Remove some call overhead from all operator() by changing the declaration of
all operator() and adjusting the implementation accordingly. Saves a few cycles
on every operator() call.

b) Inlined nsQueryInterface operator() and assign_assuming_addref(). No more
call overhead from these functions at all. No function calls from nsCOMPtr.cpp
anymore. It is done by using some macros. No code size increase when
nsCOMPtr_base is not used because the common code is factored out of nsCOMPtr<>
into a new nsCOMPtr_factor class (jonas request).

c) New constructors from nsQueryInterface. Saves a few cycles on every
constructor by providing specialized assign_from_qi function for this
particular case.

d) Faster do_GetInterface() by removing all overhead from nsCOMPtr<> in its
implementation. Saves one pair of AddRef()/Release() calls for every use of
do_GetInterface() (there are more than 400 occurences all over).

code size numbers:

without the patch: 12984832
with the patch:    12977152 (-7.5 Kb)

Most of savings (6 Kb) come from d) because removing nsCOMPtr<> remove some
hidden inlines constructor/destructor call from nsGetInterface class. Other
changes account for 1.5 Kb only.

when nsCOMPtr_base is not used:
without the patch: 13178368
with the patch:    13197312 (+18.5 Kb)

Code size increased because of c) because it introduced 2 new nsCOMPtr<>
methods and these could not be factored out because they use mRawPtr member.

Feel free to take a look at the patch. Any comments are welcome. I don't know
if I have to go for reviews now. I think I would need someone to test this on
linux/mac before. I can also provide smaller patches based on a) b) c) or d) if
you're only insterested in part of it.
Attachment #152683 - Attachment is obsolete: true
Comment on attachment 153142 [details] [diff] [review]
updated with detailed comments

I'd like to know what reviewers think of it. I'm running with the patch, tested
the plugins, run browser buster. No problem.
Attachment #153142 - Flags: review?(dougt)
Attachment #153142 - Flags: review?(dougt) → review?(scc)
Attachment #153142 - Attachment is obsolete: true
Attachment #153142 - Flags: review?(scc)
Attachment #153780 - Flags: review?(scc)
Assignee: dougt → nobody
QA Contact: xpcom
This this something we want to resurrect? 

Seems like this might still be useful. Presumably we want to at least make sure nsQueryInterfaceISupports::operator() gets inlined into its two callsites?

Flags: needinfo?(nfroyd)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #17)

Seems like this might still be useful. Presumably we want to at least make sure nsQueryInterfaceISupports::operator() gets inlined into its two callsites?

I think it already does, thanks to LTO? readelf -sW $NIGHTLY/libxul.so |grep nsQueryInterface doesn't seem to turn up anything related to this bug.

Flags: needinfo?(nfroyd)
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: