Closed Bug 75620 Opened 23 years ago Closed 22 years ago

Change clients that link to xpcom from using nsMemory::*

Categories

(Core :: XPCOM, defect, P4)

Tracking

RESOLVED DUPLICATE of bug 153215
mozilla1.1alpha

People

(Reporter: bbaetz, Assigned: dp)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(1 file)

Callers of nsMemory::* #include "nsMemory.h", which contains method declarations
only.

So at runtime, every time a string is allocated or freed, this then calls into
nsMemory (defined in nsMemoryImpl.cpp), which gets a global nsIMemory service,
and then performs a virtual function call on that, which resolves back to class
nsMemoryImpl (defined in nsMemoryImpl.cpp), which then calls PR_Free.

Strings also go through a templated string allocator which wraps nsMemory::Free
via a couple of virtual methods as well. Fun. (I can guess that this is designed
to allow memory pools, or something)

All of these just boil down into PR_Free, which is another (out of line)
function which just calls free (except, of course, #if defined (WIN16)).

This happens for almost every allocation/deallocation of memory in mozilla.

(There is an nsCRT::free(char*) method, but that just calls down to PL_strfree,
which again just calls free)

nsIMemory is used from JS, but that's only for one of the debug menu memory
options.

I believe that this is overkill. :)
So to keep the idl, and to please performance, we could use something like:

%{C++
#if 0
%}

idl fu

%{C++
#endif

inline implementations in c++

%}

Or is this sacrilege?
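
To make that concrete, here is roughly the shape I have in mind for the .idl --
just a sketch, the method names, placement and inline bodies are illustrative,
not a patch:

%{C++
#if 0    /* compile out the xpidl-generated declarations below */
%}

    /* the existing idl method declarations stay here, unchanged */

%{C++
#endif
#include <stdlib.h>

/* inline implementations that xpcom-linked C++ callers see instead */
class nsMemory
{
public:
    static void* Alloc(size_t size) { return malloc(size); }
    static void  Free(void* ptr)    { free(ptr); }
};
%}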
dp, is this something you or your team can get data on?
Assignee: kandrot → dp
If anyone wants to do some perftests (the only real way to see if this is 
faster)...
Keywords: perf
Reassigning to dougt, the new owner of xpcom. This is a good thing to do.
Assignee: dp → dougt
Blocks: 98275
Blocks: 71874
Comment on attachment 47375 [details] [diff] [review]
patch for testing purposes (diff -u)

r=dp
Attachment #47375 - Flags: review+
The memory pressure observers are nice things to have, and that's implemented in
nsMemory, iirc.  

Undoubtedly scc can speak more to the string allocators.
running an internal page loader test (cowtools), here are the raw results:

Current nsMemory implementation:
Test id: 3BB036B00B
Avg. Median : 823 msec		Minimum     : 271 msec
Average     : 1728 msec		Maximum     : 6164 msec

After change:
Test id: 3BB039A325
Avg. Median : 767 msec		Minimum     : 271 msec
Average     : 851 msec		Maximum     : 4137 msec

It does nothing to hurt performance.  

I need to understand what the benefit of using PR_Malloc is over just malloc. 
Should we just call it a day and use PR_foo, or do we really want to call
directly to the OS?

Also, do we really need to test the result of every memory allocation call for
failure so that we can FlushMemory?  Seems to me, if we get to the point of
malloc's failing, we are way too late.  
This *halves* our average?

dougt, can you run those tests again?

Since we don't support win16 (does NSPR still support it?), PR_Malloc == malloc,
except that the compiler can assume things about malloc (the results from two
calls to malloc don't alias).

Also, what about declaring PR_malloc (and nsMemory::Alloc, if possible)
__attribute__ ((__malloc__)) under gcc:

`malloc'
     The `malloc' attribute is used to tell the compiler that a function
     may be treated as if it were the malloc function.  The compiler
     assumes that calls to malloc result in pointers that cannot
     alias anything.  This will often improve optimization.

This is only valid for gcc >= 2.9.6 though.
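
For example (NS_ATTR_MALLOC is a macro name I'm making up for illustration; the
prototypes are approximate, and a real version would also need the gcc version
check mentioned above):

#if defined(__GNUC__)
#define NS_ATTR_MALLOC __attribute__((__malloc__))
#else
#define NS_ATTR_MALLOC
#endif

NSPR_API(void *) PR_Malloc(PRUint32 size) NS_ATTR_MALLOC;

class nsMemory
{
public:
    static void* Alloc(size_t size) NS_ATTR_MALLOC;
    static void  Free(void* ptr);
};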
dougt, any chance that your second (w/ patch) run was hitting the cached pages
loaded by the first run?

Given threadsafe malloc, as dbaron points out, any trivial (few to dozen) cycle
layers on top should be noise.  But I'm all for cutting out layers.  In the JS
engine, malloc is used directly except where JS_malloc adds value by reporting
an out of memory condition before returning NULL, using JS's error/exception
reporting machinery.

I don't think every place that *calls* malloc must check for null and call a
memory pressure API.  Rather, the theory I recall (warren's, based on wade
hennesy's idea) was to call the flushers from a background thread, when we
noticed pressure (but before it got so high that malloc was likely to fail). 
waterson knows all!

There may have been (my memory is failing me) a related idea that subsystems
should flush when successfully allocating, when pressure was rising, so as to
avoid failing subsequently.  I forget the details of how that was supposed to work.

/be
cache must have been primed as I see results very close to running without this
patch.  As long as the cost of two function calls is cheap relative to the
malloc call, this patch will not buy us much.

brendan, check out the current nsMemoryImpl class. EVERY time you alloc
something, it checks the result against null.  If the result of the
allocation is null, we call FlushMemory.  What is kinda funny is FlushMemory
does allocate memory.  So, if alloc failed, and the value requested is equal to
or greater than what FlushMemory requires, there will be a nice crash.  Maybe we
should just rip out these checks altogether?
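
Roughly the pattern I mean, paraphrased from memory rather than quoted from
nsMemoryImpl.cpp (the reason string and FlushMemory signature are approximate):

NS_IMETHODIMP_(void*)
nsMemoryImpl::Alloc(PRUint32 size)
{
    void* result = PR_Malloc(size);
    if (!result) {
        // FlushMemory itself allocates (a reason string, an event to notify
        // observers), so if malloc just failed this can easily fail too.
        FlushMemory(NS_LITERAL_STRING("alloc-failure").get(), PR_FALSE);
    }
    return result;
}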
Target Milestone: --- → mozilla1.0
Blocks: 92580
Blocks: deCOM
doug, can we try to fix this soon, by 0.9.7?
Target Milestone: mozilla1.0 → mozilla0.9.7
Can we get apples-to-apples page-load results (cached for with and without this
patch, and uncached too, or averaged together in the same way for both trials)?

/be
dp, I think that this is a dup of a bug that you have been working on.  If so,
can you dup this one?
Yeah. I think the right thing is to make my bug a dup of this one. I will do that.
*** Bug 105501 has been marked as a duplicate of this bug. ***
nsMemory is a static class that should move into the xpcom glue library - as is.
 Any optimization to this class to call directly into malloc is a bad thing to
do since the purpose of this class is to allow plugin/component authors easy
access to global allocation.  

If clients are already linking to xpcom, then they should just be using plain
malloc/free anyways.  

The optimization needed here is to make callers call the right allocator.  DP,
is it worth scrubbing the callers in the mozilla source base to use the vanilla
allocator?  In the other bug you said that it was 0.4 percent of startup, but
noise possibly accounts for some of that.


Summary: calls to nsMemory::* go through too much indirection. → Change clients that link to xpcom from using nsMemory::*
Rick has some objections to changing any of the callers.  Rick, can you document
your concern?
Why not:
 * change nsMemory to be a static class with inlined calls to malloc and free
 * change nsMemoryImpl to call the nsMemory inlined malloc/free, rather than the
other way around
 * Give the class in xpcom/glue (which would just be a simple way to maintain a
per-library pointer to the global nsMemoryImpl, the way nsMemory does now) a new
name other than nsMemory, since it would be for clients of xpcom that don't link
against xpcom, as opposed to nsMemory, which is for those who do link against xpcom.

(I still don't see why we need to care about pluggable allocators, though. 
We're nowhere near consistent enough.)
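
A minimal sketch of the first two bullets, assuming signatures stay roughly what
they are today (this ignores, for the moment, the question of whether we can call
malloc directly across DLL boundaries):

// nsMemory.h -- for code that links against xpcom
#include <stdlib.h>

class nsMemory
{
public:
    static void* Alloc(size_t size)              { return malloc(size); }
    static void* Realloc(void* ptr, size_t size) { return realloc(ptr, size); }
    static void  Free(void* ptr)                 { free(ptr); }
};

// nsMemoryImpl.cpp -- the nsIMemory service implementation now layers on top
NS_IMETHODIMP_(void*)
nsMemoryImpl::Alloc(PRUint32 size)
{
    return nsMemory::Alloc(size);
}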
I _really_ don't see why we care about pluggable allocators for 1.0, for which
we know we need to get speed and space improvements from all over, and for which
we also know -- I think -- that we're not going to be able to support pluggable
allocators.

XPCOM 2.0, maybe?
Note that we can't call malloc/free directly - we have to use static functions
in another dll. The reason for this (which I didn't know when I filed this bug)
is that on windows, debug dlls use a different mscrt dll to non-debug dlls. And
malloc/free in these apparently use different heaps.

So if an object is created in one dll, with a malloc'd member variable, and
destroyed in one with different debug settings, Bad Things will happen. Or so
I've been told, by several people. I'd appreciate it if a windows person could
confirm this, though.

ISTR seeing a comment in the source somewhere about this, too, but I can't find
it now.
Then Windows builds can pay the cost of those static functions and let the rest
of us not pay that cost and can use inlined nsMemory methods.  That's an
advantage of encapsulating things in nsMemory.
it is not so much that we want to replace an allocator globally. We want to allow
embedders the ability to allocate and free into the mozilla environment.  If they
use their free() on an allocated buffer that mozilla created, this *will* cause
heap corruption if their free differs from the free() that xpcom uses.

Rick also commented in the other bug - the solution to our problem is not
to speed up free(), but to stop calling it so damn much.  

I do like what dbaron suggested.  This solves all the problems raised.  Comments
on that approach?  Also see 112262
OK.  i am really fed up with all of this CRAP about why calling nsMemory::Free() 
is so slow and how we should make it inline to speed mozilla up!!!

will someone please grow a brain and realize that malloc/free are SLOW and 
removing a function call is NOT going to make a huge difference one way or the 
other!

the problem is that we are calling free() too many times!  that's the issue that 
we need to address!  not how to make the function inline!!

-- rick
Well, I can't grow a brain, but maybe you can spare a cutting of your throbbing
cranium for a graft?

Nobody, that I can see, is saying that inlining nsMemory::* will cure all of our
performance woes, make us coffee and wash dmose's feet.  What we _are_ saying is
that memory management consumes a substantial amount of our CPU time, and that
inlining nsMemory::* will be a relatively easy way to recover some of that,
across the width of the application.

How do you propose to call free() less?  Leaking more, if it turns out to be
possible, isn't likely to help.  Do you have a better concrete means of
partially addressing the cost of our memory consumption (which is, and again
nobody disputes this, way too large) than that proposed here?
My jprof data show that of time spent within nsMemory::Free in page loading:

- 8% is in nsMemory::Free
- 3% is in nsMemoryImpl::Free
- 7% is in PR_Free
- the rest actually within free

I'm not inclined to believe that these numbers are accurate, but they're at
least suggestive that there's some real cost here.  virtual function calls are
expensive, especially on heavily pipelined processors, since an instruction
cache miss can cost hundreds of cycles (not just the 20 or so cycles from a
pipeline halt that one will get with every virtual function call).  It's also
not clear how much not using |malloc| and |free| hurts our compilers' ability to
optimize due to the assumptions malloc and free allow the compiler to make about
non-aliasing (could it affect optimization of other codepaths that don't go
through malloc and free?).

Finally, I think it would be easier to just try it and run some page load tests
in a well-controlled environment than to get better numbers using profiling. 
After all, we allocate too much, so it could be an easy performance win.  Fixing
our wider too-much-allocation problems probably requires both copy-on-write
strings and switching code over to interfaces that use AString rather than
"return"ing (allocated and copied) strings.  Judging by quick glances at page
load profiles, a vast majority (around 99% - to be exact, 156 out of the 157
timer hits in nsMemory::Alloc in a page loading profile I have) of allocations
that go through nsMemory::Alloc / nsMemory::Free are string buffers, many of
them string objects' internal buffers.  (We spend more time in new and delete,
but those aren't string buffers.)  Are we willing to abandon our frozen or
quasi-frozen interfaces that require such copying?

Of course, another solution would be to change the string code that's calling
nsMemory::Alloc and nsMemory::Free to call malloc and free directly, and then
we'd just avoid the problem, since by far the largest consumer wouldn't get the
extra ~20% (again, that's just a guess from jprof) overhead around malloc and free.
> Do you have a better concrete means of
> partially addressing the cost of our memory consumption (which is, and again
> nobody disputes this, way too large) than that proposed here?

Our memory consumption and the performance cost of calling malloc and free are
two very different problems.  Much of the latter is incurred for very
short-lived allocations that have very little influence on the former.  (Within
page load, allocation of content-model data structures (e.g., text nodes) stands
out as an exception.)
>> Do you have a better concrete means of
>> partially addressing the cost of our memory consumption (which is, and again
>> nobody disputes this, way too large) than that proposed here?
> 
> Our memory consumption and the performance cost of calling malloc and free are
> two very different problems.  Much of the latter is incurred for very
> short-lived allocations that have very little influence on the former.

Our memory consumption is a two-faceted problem: we use too much memory, which
bloats our heap, and we consume it too often, which hogs the CPU.  Those two
problems (space and time) are very different, and  I never claimed otherwise,
but they're both costs of our bad memory habits.
ok... i've taken my prozac and the world is looking a little rosier now :-)

i *really* do not think that inlining the calls to malloc/free on *any* client 
code is a good idea because it makes it *very* difficult for embedding 
applications to use their own memory managers.

if we decide to call malloc/free in the static XPCOM glue code, then we are 
subtly forcing embeddors to use the same MM as us (ie. MSCRT).  I believe that 
this is unacceptable for many applications.

the data that i collected about startup memory allocations indicated that 
for the first 40K allocations (that's all i bothered to track) most of them were 
held onto for very short periods of time!!  most of these temporary allocations 
were caused by temporary strings as out parameters...

so, if we are *really* worried about the overhead of free() during startup i 
think that it would be better to identify the (hopefully small number) of 
functions that are generating these string allocations and consider changing 
them to use nsAString (or some other strategy)...

-- rick
Embedding applications that don't want to have to use the CRT's malloc are
probably going to have to define their own malloc symbol, because it's used all
over just about every library in the world.  And if they have their own malloc
symbol, won't we pick that up?  I was pretty sure I've read pages talking about
how to get the MSCRT to use a custom allocator, underneath its own malloc, but I
can't find them right now.  I'll look again later.

I agree that nsAString is probably a win too, but it's almost certainly more
work than this change.
If we want embedders to be able to use pluggable allocators without overriding
malloc and free themselves, we'd move to a world where an XPCOM string getter
couldn't just fill in an out param with PR_smprintf or PL_strdup.  Instead it
would have to use nsMemory::Alloc (perhaps via ToNewCString and
nsDependentString or through nsCRT::strdup, if nsCRT::strdup actually called
nsMemory::Alloc, which I don't think it does right now), copy the string into
the out buffer and fill in the out param, and call PL_strfree or
PR_smprintf_free.  Do we really want to pay the cost of that consistency for
pluggable allocators?  (We can't even keep g_free and g_malloc vs. our own
allocators straight, and we only deal with those in a handful of files -- see
bug 95599.)

I'd argue that we wouldn't want to deal with the cost of all that allocation. 
Or are we willing to abandon passing ownership of allocated memory from C to C++
(or come up with an even uglier solution)?
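
For concreteness, the kind of getter I mean, in the two styles (a hypothetical
example, not code from the tree; assumes the usual plstr.h / prprf.h / nsMemory.h
includes):

// today: hand back whatever your own allocator produced
static nsresult GetNameToday(const char* aName, char** aResult)
{
    *aResult = PL_strdup(aName);               // malloc under the hood
    return *aResult ? NS_OK : NS_ERROR_OUT_OF_MEMORY;
}

// with pluggable allocators: the out param must come from nsMemory::Alloc, so
// the PR_smprintf/PL_strdup result has to be copied and then freed
static nsresult GetNamePluggable(const char* aName, char** aResult)
{
    char* tmp = PR_smprintf("%s", aName);
    if (!tmp)
        return NS_ERROR_OUT_OF_MEMORY;
    *aResult = (char*) nsMemory::Clone(tmp, strlen(tmp) + 1);
    PR_smprintf_free(tmp);
    return *aResult ? NS_OK : NS_ERROR_OUT_OF_MEMORY;
}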
i'm not talking about embeddors that want to use custom allocators without 
overriding malloc/free...  i'm talking about embeddors that *have* done this 
work.

my concern is that if nsMemory:: in the glue code calls directly into malloc, 
then if an embeddor has overridden it we could end up corrupting their heap.

when we allocate memory within mozilla we always use MSCRT... but suppose an 
embeddor wanted to use say SmartHeap... they would have to create *special 
purpose* allocation/free routines which called MSCRT *just* to deal with freeing 
the blocks of memory that we handed out...

that's why i strongly feel that the nsMemory glue code *must* call through 
nsIMemory...
> that's why i strongly feel that the nsMemory glue code *must* call through 
> nsIMemory...

In my proposal above the nsMemory glue code wouldn't be used by embedders -- it
would be used only by those who link against XPCOM.

Furthermore, why is it so important for it to go through nsIMemory if the
nsIMemory implementation will always call malloc/free (which is what it seems
you're saying above -- that an embedder would change their allocator but not
ours, and would need to use ours explicitly in some places).
moving out a milestone.
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Well, the checkin of bug 112262 makes my suggestion in comment #19 a lot harder
to implement now.
we would need to rename 'xpcom linked' nsMemory users.
> i *really* do not think that inlining the calls to malloc/free on *any* client 
> code is a good idea because it makes it *very* difficult for embedding 
> applications to use their own memory managers.

As per Brendan's comment #10, the JS engine calls down into malloc(3), and is
unlikely to change that behaviour.  So embedders that can't use malloc(3) are
going to need to at the least re#define malloc to point at their own allocator.
 At that point, they can rebuild our tree with minimal additional pain. 
(Avantgo uses spidermonkey on the Palm, which to my knowledge doesn't have a
malloc(3), so there are existence proofs of the viability of this approach.)

If we just need a way for embedders and XPCOM components to use the same
allocator we use, then let's document malloc(3)/free(3) as that allocator, and
forget this whole nsMemory thing, no?

(I'm a little ticked that the glue-code stuff went ahead, without responses to
dbaron's counterindications in this bug, but I guess we can back it out later if
we need to?)
Maybe the advocates of removing nsIMemory and nsMemory are missing the point.

1.  Currently client replacement of the allocator via nsIMemory is NOT
SUPPORTED.  It may be at some point.  

2.  When you embed gecko, there are interfaces which pass buffers around.  These
buffers need to be allocated and freed from the same heap.  We provide an
interface to access the same heap.  

3.  Some day, the nsIMemory implementation may be better than malloc.  

4.  Why does everyone call into nsIMemory? We could have asked everyone to use
malloc/free but history has proven that this does not work.  Ask potts for the
number of headaches he has gotten from chasing down heap corruption caused by
mismatched allocations/deallocations.

5.  This change was requested by the mozilla embedding and plugin teams. 
dbaron's and dp's comments about speeding up startup if we inline free() don't
fly with me (the data is not conclusive and imo it is the wrong approach).

shaver, don't be ticked.  It is Friday :-)
I'm not complaining about that interface.  I'm complaining that all of our
internal code that's linked against XPCOM has to go through it.  Now that the
name that we're using throughout all our code is the name of what you put in
xpcom/glue, we either have to change all the Mozilla code or change the name of
what you put in xpcom/glue if we want to fix this.
If you want to mix allocators, you're going to get heap corruption.  Just like
you get corruption if people mix different system allocators, you'll get
corruption if people mix nsMemory:: with the wrong system allocator.  So don't
mix allocators.  Just don't.  I don't see why using the term "nsMemory" rather
than "free" makes the problem go away, or even mutate interestingly.

The history I have -- 5 years of the JS engine being used by hundreds of
embedders and extenders all over the place -- says that using malloc and free
works great.  If people are mixing up allocators, they will lose.  People who
can't use system malloc(3) are already faced with the effort of making their
other system-and-component libraries replace their malloc(3) calls with
something that points to their custom allocator.  Many environments (Palm, for
example) have existing bodies of work that make that quite a reasonable task.

Do all Windows applications have to use this extra layer of indirection?  Did
4.x have plugin troubles when people didn't use NS_Alloc or whatever?  Maybe I
just need to read more about Windows applications and heap management.  I'll go
and read up this weekend, so I don't look like an idiot.

But what sort of data would be conclusive?  What problems are there with
dbaron's data?

(I think this sucks just as much on a Friday as on any other day. =/)
i will be the first to back this out and strip mozilla of nsMemory altogether
if we can get complete agreement from all parties: rick potts, patrick beard,
comment please.  

(I think this sucks less on fridays since I am usually drinking. :-)  )
On the Mac, we always have had to use the browser's allocator within plugins.
There is no standard *FAST* allocator in the classic Mac OS. malloc just doesn't
exist. We don't need nsMemory, but we do need nsIMemory. Static linking is bad.
dougt, I'm gonna wrassle with ya:

>1.  Currently client replacement of the allocator via nsIMemory is NOT
>SUPPORTED.  It may be at some point.  

And monkeys might fly outta my butt.  In the order of things to do for 1.0 and
any milestone within sight, this is so far over the horizon that I bet donuts it
won't ever happen.  Are you citing it as a "might be nice" argument?  That's not
compelling.  "Might be nice" is too often a warning sound.

>2.  When you embed gecko, there are interfaces which pass buffers around. 
>These buffers need to be allocated and freed from the same heap.  We provide a
>interface to access the same heap.  

Good for embedders, but no reason to avoid doing what dbaron suggested.

>3.  Some day, the nsIMemory implemention may be better than malloc.  

Here I'll bet more than donuts.  If we put in a better allocator, it will be
called malloc, because of 1, above (too many naked malloc refs in our code). 
Better malloc is better, period.  If the embedder can't use it, then the
embedder will have to rebind our malloc refs, too -- or do some kind of
prelinking of our code against the better malloc, and the embedding code against
the other malloc.  In the latter case, nsIMemory helps the embedder, but it only
adds an unnecessary layer in our code.

>4.  Why does everyone call into nsIMemory? We could have asked everyone to use
>malloc/free but history has proven that this does not work.  Ask potts for the
> number of headaches he has gotten from chasing down heap corruption caused by
> mismatched allocations/deallocations.

Everyone does not call nsIMemory.  For example, nsCRT::strdup calls PL_strdup,
which calls malloc -- and too many XPCOM methods use nsCRT::strdup still (see
http://lxr.mozilla.org/mozilla/search?string=nsCRT%3A%3Astrdup).

> 5.  This change was requested by the mozilla embedding and plugin teams. 
>dbaron's and dp's comments about speeding up startup if we inline free() don't
>fly with me (the data is not conclusive and imo it is the wrong approach).

What does the first sentence have to do with the second?  You need to provide
some numbers as suggestive as dbaron's, but in the opposite sense, to dismiss
his data (which he himself has already qualified appropriately).  The change
being requested by embedding and plugin teams doesn't cut any ice against
dbaron's profiling results -- it's a different universe of discourse, it is not
technical data.

Of course we need to cut down on malloc/free calls (I'm to blame for our simple
COM-ish string/wstring-map-to-char-arrays decision as much as anyone -- warren
will never forgive me!).  But we also need to cut out gratuitous layering, which
blows modern, heavily pipelined CPUs' caches.  We don't need to field-upgrade
the malloc used by every part of Mozilla for 1.0 or any milestone being planned,
so I do not think dbaron's suggestion should be dismissed or opposed by your
code change in bug 112262, without all of us first agreeing about what we are
trying to do here, and how.

/be
LONG-WINDED PREFACE:

nsMemory::Alloc currently maps to the system malloc.  all of the allocations
mozilla generates result from a call to the system malloc.  mozilla does not
implement the system malloc.  the system malloc can change and mozilla would
still work just fine, provided the runtime linker can find a malloc for mozilla
when it's libraries are loaded.

what we need to do is ensure that people building apps against mozilla always
link their apps such that mozilla (as well as the embedder's app) pick up the
one (and hopefully only) global malloc symbol.  normally, this will just be in
libc (or MSCRT) and everything will just happily work.

of course this becomes difficult if the embedder #define's malloc to something
else... like my_own_private_malloc.  then, we've got the potential for mixing up
allocators, and the nightmare ensues.

IMO the solution to this problem is to make it very clear to embedders that they
must use the system malloc/free when interacting with mozilla because this is
the malloc/free that mozilla is going to use when it hands back strings and
what-not.

maybe i'm trivializing things here, but i see no reason why this can't be solved
with proper documentation.  if an embedder is using a special malloc for their
own stuff, then they should be careful to isolate the code that talks to mozilla
in such a way that they can ensure that the global malloc/free is always used.

of course, on windows there's the problem of debug MSCRT vs. optimized MSCRT,
which probably means on windows, we need to provide xpcom (glue or otherwise)
entry points to the malloc/free that mozilla likes.  these entry points MUST NOT
be inlined for the embedder because that would defeat the purpose.  they would
have to be implemented inside a library that we build, so that they use the same
malloc/free as the rest of mozilla.  this is IMO nsMemory::Alloc,
nsMemory::Free, etc.

so, i guess what i'm saying is that nsMemory::Alloc need not be implemented as a
call to some nsIMemory implementation.  the implementation of nsMemory::Alloc
should just call malloc.  if there are memory pressure things that we want to
do, we can still do them.  for example, if malloc fails, we could talk to some
interface to handle the memory pressure (failure) condition.

also, i'm not advocating getting rid of nsIMemory.  specific components may like
to abstract the memory allocator (eg. nsIPipe -- which is particularly
significant to me since i am working on a specialized allocator for necko that
is optimized for 4k allocations only).

CANDIDATE SOLUTION:

i think what's most important is that we make it possible for mozilla libraries
to be built such that nsMemory::Alloc turns into an inlined malloc (why? because
we can, and anything else is just overkill).

for embedders, we need a different solution.  the implementation could be
controlled with a preprocessor directive (much like -D__KERNEL__ in linux land).
for embedders that want to link to xpcom.dll, nsMemory::Alloc should be a DLL
export implemented as a call to malloc.  for embedders that don't want to link
to xpcom.dll, nsMemory::Alloc should be a call to a nsIMemory implementation
just as it currently is.

any problems with this solution?
ok... if everyone is hell-bent on making mozilla lightning fast by inlining all
calls to free, then far be it from me to stand in the way of genius :-)

my only line in the sand is to preserve the existing use of nsMemory by clients
outside of the mozilla codebase.  ever since we created the static nsMemory
class in XPCOM we have actively told external developers to call it!

i can live with dbaron's proposal to create a static class which calls free
directly -- i think that it is totally unnecessary and harmful because it
diverts attention from the real problem which is the frequency with which we
call free not the two extra function calls we make in the process...

but i cannot live with the external, public interface being 're-invented' in
such a way that embeddors can no longer use it...

i'm really shocked that brendan, of all people, is being so capricious about
changing a longstanding, public interface for no good reason.

if people feel strongly that calls to free() must be made faster, then by all
means, create a new static class, say nsMonkeysFlyOutOfBrendansButt, which has
inline versions of alloc/free and fix up everyplace in our codebase that
currently calls nsMemory::Free(...) to call nsMonkeysFlyOutOfBrendansButt::Free()

but *do not* make the existing nsMemory class unusable by external consumers!!

in the end, does anyone honestly expect to see a measurable performance
improvement by these changes that merits the amount of time (and emotion) that
has *already* been invested in this discussion?

-- rick
i like dbaron's suggestion (comment #19) except for the third part.  i agree
with rick in that we should not try to change what API embedders use.  in fact,
i don't see why it wouldn't be possible to allow everyone to call nsMemory::
methods.

as i said previously, why can't there be multiple versions of the nsMemory::
methods.  depending on some #define(s), you'd get different version(s).

for example, there could be some #define that is used when building mozilla...
say MOZ_INLINE_NSMEMORY that would cause all of our mozilla components to build
against an inline version of the nsMemory:: methods.  without this #define,
you'd just get the plain old nsMemory:: impl that we currently have.

this seems like a very minimal and sufficient solution.  or, am i missing something?
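
something like this is what i'm picturing (MOZ_INLINE_NSMEMORY is a name i'm
inventing for the example, and the inline bodies are just the obvious ones):

// nsMemory.h
class nsMemory
{
public:
#ifdef MOZ_INLINE_NSMEMORY
    // mozilla's own components: inline straight through to malloc/free
    static void* Alloc(size_t size) { return malloc(size); }
    static void  Free(void* ptr)    { free(ptr); }
#else
    // everyone else: today's out-of-line, nsIMemory-based implementations
    static void* Alloc(size_t size);
    static void  Free(void* ptr);
#endif
};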
> ok... if everyone is hell-bent on making mozilla lightning fast by
> inlining all calls to free, then far be it from me to stand in the way
> of genius

Everyone here who thinks that this, on its own, will make Mozilla
lightning-fast, please raise your hand.  We don't need any more strawmen here,
thanks; the actual issues are contentious enough.

I don't see an @FROZEN in nsMemory.h (though I do see one in nsIMemory.idl; but
we're not discussing changing the binary interface there, just not using it
anymore).  Who's been recommending unfrozen interfaces to embedding folk
_again_?  Haven't we been burned by that enough?  Who has a
released-to-the-world component that will survive all the other freezing-driven
changes but will be burned by this?

> i'm really shocked that brendan, of all people, is being so capricious about
> changing a longstanding, public interface for no good reason.

(Brendan, quite obviously, thinks that there is good reason to do this, as do
dbaron and I.  If you don't, that's fine, but don't cons up inconsistencies by
projecting your premise onto other people's reasoning.  This isn't some wanky
nsAllocator->nsMemory renaming.)

I don't think anyone's attention will be diverted from the frequency of
malloc/free calls by the reduction in cost of those calls -- unless it actually
takes the malloc/free patterns out of the profiler's crosshairs, in which case
we shall all rejoice.  But anyone who thinks that we're not going to need lots
of these sorts of maybe-a-percent wins to get in shape, even if some huge 10%
wins fly out of someone's butt, is deluding themselves, IMO.  We are _always_
going to have to allocate and free lots of memory as it crosses XPCOM interface
boundaries, and I don't foresee us shedding the over-componentization albatross
by 1.0, by a long shot.

And I don't care at all about sunk-cost economics arguments around how we
shouldn't bother to do this, because it won't be enough gain to have justified
the emotional energy already put into it.  (Surely getting _no_ gain by walking
away now is even worse, by that logic, not that I'm especially fond of this
variety of sunk-costing either.)  As far as measurable goes, I think dbaron has
already given us _measurements_, with responsible qualification and description,
of the kind of gain we're looking at here.

If you think that more time should be spent making us allocate/free less, then
why are you spending that time posting to this bug? =)
> for example, there could be some #define that is used when building mozilla...
> say MOZ_INLINE_NSMEMORY that would cause all of our mozilla components to build
> against an inline version of the nsMemory:: methods.  without this #define,
> you'd just get the plain old nsMemory:: impl that we currently have.

In some cases (the ones that provide the rationale for nsIMemory), an embedder
who does not link against XPCOM needs to use an nsMemory class that maintains a
pointer to an nsIMemory object so that the calls to malloc and free are made in
another library.  At least, I think that's the case if I understand the Windows
debug/opt issue.  As of this afternoon, nsMemory is in xpcom/glue, which is
meant to be statically linked in to any library that doesn't want to link
against XPCOM.  However, we don't want all the code in our browser to have to go
through the extra level of (virtual function) indirection.  |#define nsMemory
nsMemoryForMozillaItself| (or whatever) seems a bit ugly to me, so I'd rather
the callers who don't link against XPCOM be the ones who deal with a new name. 
After all, nsMemory has only been available to those callers since this
afternoon, so there shouldn't be huge compatibility issues to deal with.
dbaron: good point.  today, consumers of nsMemory:: are implicitly linked to
xpcom.dll, and so there really isn't anyone depending on the nsIMemory indirection.

however, making the nsMemory:: impl's inline is likely to cause problems for
embedders (even though this is not a frozen API).  is that something we're OK
with?  do we have any idea of how many embedders would even care?  like it or
not, i think it is something we should consider.

as a side note: why is it so important for embedders to be able to explicitly
link to xpcom.dll?  why does implicit linking not suffice?
>i'm really shocked that brendan, of all people, is being so capricious about
>changing a longstanding, public interface for no good reason.

Rick, I'm not advocating changing the public interface embedders have been told
to use, but I'm sure I blew it by not saying so more clearly, when I was (a)
defending the important part of dbaron's proposal ("inlining" calls within
Mozilla code) and (b) wrassling with dougt over the lack of agreement before he
took action.

If it isn't clear by now, I'm with darin: keep embedders using a public,
"indirect-able" API, but inline our internal calls that must always go to
malloc-the-symbol-we-link-against.  I guess that means (and this is what dbaron
was trying to avoid) that our internal calls will have to be renamed.  Or does
it (see dbaron's comment #48)?  Are embedders in fact using nsMemory, or only
nsIMemory?

I'm surprised by all the emotion too -- for my part, this is not a burning issue
that we need to rush along, or fret over excessively compared to any other
problem, and I hope people will believe me when I say that I'm not mad at anyone.

/be
the nsMemory class is used all over the place.  Take a look at the galeon code
as an example.  Just a small example is this:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/galeon/galeon/src/mozilla/ExternalProtocolService.cpp?rev=HEAD&content-type=text/vnd.viewcvs-markup

I have looked at many gecko embedded clients.  all of them use this class.
Throwing out pluggable allocators seems to imply throwing out our only current
hope for dealing with low-memory situations in any reasonable way (the memory
flusher stuff).  It seems unfortunate that folks are willing to do this without
proposing some alternate solution to the low-memory issue.
dmose: from my comment #44:

> so, i guess what i'm saying is that nsMemory::Alloc need not be implemented as a
> call to some nsIMemory implementation.  the implementation of nsMemory::Alloc
> should just call malloc.  if there are memory pressure things that we want to
> do, we can still do them.  for example, if malloc fails, we could talk to some
> interface to handle the memory pressure (failure) condition.

why isn't memory pressure handling orthogonal to memory allocation?  i mean
isn't it something done only when malloc fails?
As I said to dmose the other day, waiting till malloc fails usually means
waiting till the app has chewed up all swap and thrashed your system to its
knees.  That's too late.  Anyway, it has nothing to do with *how* one calls the
malloc-wrapper, which could still be inlined.

dougt: well, more _de facto_ standardization of nsMemory.h by its use among
embedders does make me want to preserve compatibility -- although if you feel so
strongly, then shaver's comment #47 should guilt you into putting a @status
FROZEN in nsMemory.h, no?

So, if we want to support nsMemory usage among current embedders' codebases,
then I think darin has the right solution: an #ifdef MOZILLA_CLIENT (JS uses
this macro already, defining it for itself when mozilla/js/src is built as part
of Mozilla-the-appsuite/platform) or similar test.

/be
things in xpcom/glue do not need to be "frozen".  Embedders are expected to take
snapshots of this code and link against that.  It might help if I finish the
document regarding this and post it to the newsgroup.  :-)

If someone wants to make this class inline when it is linked into xpcom, fine by
me.  Note that the class can not be inlined for anything compiled into the glue
library (namely mozilla/strings).  
On Friday, December 7, 2001, at 08:58  PM, shaver@mozilla.org wrote:

>I don't see an @FROZEN in nsMemory.h (though I do see one in
>nsIMemory.idl; but we're not discussing changing the binary interface
>there, just not using it anymore).  Who's been recommending unfrozen
>interfaces to embedding folk _again_?  Haven't we been burned by that
>enough?  Who has a released-to-the-world component that will survive
>all the other freezing-driven changes but will be burned by this?

As long as you mean by "just not using it anymore",  not using it
internally to implement the nsMemory class, I have no problem with this
statement. As long as we keep the nsIMemory interface useable via the
service manager, plugin developers should be unaffected.
dp, cathleen: do we need a spin-off bug, or should this bug be reassigned to one
of you (dp :-) to finish fixing, by inlining conditionally at compile time, for
"inside the embedded Mozilla" fast paths to malloc, etc.?

/be
I won't mind fixing this, just as you wouldn't mind fixing it, I think. Doug, why not
squish this when you have the chance? Reassign to me and I will take care of it.
Over to dp.  
Assignee: dougt → dp
I will hazard to summarize this:

- Callers of nsMemory:: that link with xpcom are changed to call malloc/free directly
  Intent: Save the two extra levels of call indirection that happens

- Everything else stays exactly the same.
Status: NEW → ASSIGNED
Priority: -- → P4
Target Milestone: mozilla0.9.8 → mozilla0.9.9
How about just doing the following:

* avoid the global service get and the virtual function call
 
* avoid using PR_Free() under the hood

* keep the existing methods in nsMemory so that the memory-pressure observer
stuff stays intact.

* inline the nsMemory::* methods
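
A sketch of how the fast path could be inline while the failure path still feeds
the memory-pressure observers (NS_MemoryPressure is an invented name; the real
notification plumbing would stay out of line in nsMemoryImpl):

// declared in an xpcom header, implemented out of line in nsMemoryImpl.cpp
extern void NS_MemoryPressure(size_t failedSize);

class nsMemory
{
public:
    static void* Alloc(size_t size)
    {
        void* ptr = malloc(size);        // no service lookup, no PR_Malloc
        if (!ptr)
            NS_MemoryPressure(size);     // rare path: notify "memory-pressure" observers
        return ptr;
    }
    static void Free(void* ptr) { free(ptr); }
};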


Moving past Mozilla 1.0
Target Milestone: mozilla0.9.9 → mozilla1.0.1
Target Milestone: mozilla1.0.1 → mozilla1.1
dup of 153215.  153215 has a better patch, which is why this is being dup'ed of
that. :-)

*** This bug has been marked as a duplicate of 153215 ***
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE