Closed Bug 113393 Opened 23 years ago Closed 23 years ago

zlib : 407 allocs account for about 8% of all allocation cost

Categories

(SeaMonkey :: UI Design, defect, P2)

x86
Windows 2000
defect

Tracking

(Not tracked)

RESOLVED FIXED
mozilla0.9.8

People

(Reporter: dp, Assigned: dp)

References

Details

(Keywords: perf)

Attachments

(1 file, 3 obsolete files)

[279,700] malloc : 0.896 sec [ 42.32%]
[  9,151] calloc : 0.211 sec [  9.94%] <-- zlib
[  3,493] realloc: 0.022 sec [  1.05%]
[287,080] free   : 0.989 sec [ 46.70%]
-------------------------------------
[579,424] total  : 2.118 sec [100.00%]

I instrumented trace-malloc to dump before and after timing on each malloc/free
in an attempt to find a pattern on expensive mallocs.

Here is the high level view. Total startup is about 6 secs (debug - once garrett
checks in tracemalloc for optimized builds I will switch over).

Overall malloc/free is certainly about 30% of startup cost.

The big depressing part is for every jar file read we do:
<size of file> + 32k + 11k + 3 more tiny allocs

In effect, allmost all the 9.94% of total alloc cost is from zlib. 80% of it is
caused by 413 allocs (207 32768, 206 11520) which are allocations done when
reading a file from jar by zlib. Buffer management into zlib could be a win.
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → mozilla0.9.7
InflateItem() could at least be cleaned up to use the passed-in buffer directly
in the inflate, instead of coping everything around twice (and probably more).
I'll write that separately: bug 113404
Blocks: 7251
Keywords: perf
Time consumed by the 32768 allocs and 11520 allocs from zlib

chetu> grep 'calloc 32768' alloc.out | perl bucket.pl
[     0] malloc : 0.000 sec [  0.00%]
[   207] calloc : 0.114 sec [100.00%]
[     0] realloc: 0.000 sec [  0.00%]
[     0] free   : 0.000 sec [  0.00%]
-------------------------------------
[   207] total  : 0.114 sec [100.00%]

chetu> grep 'calloc 11520' alloc.out | perl bucket.pl
[     0] malloc : 0.000 sec [  0.00%]
[   206] calloc : 0.032 sec [100.00%]
[     0] realloc: 0.000 sec [  0.00%]
[     0] free   : 0.000 sec [  0.00%]
-------------------------------------
[   206] total  : 0.032 sec [100.00%]
QA Contact: sairuh → jrgm
This eliminates 1200+ mallocs from zlib. Timing wise it improves startup by
about 1.5%
Minor code modifications from previous patch.

The only thing left is to figure out when we release the 48k of memory that the
allocator holds.
Attachment #60573 - Attachment is obsolete: true
performance improvement is about 4% of startup on a optimized builg (after I
killed the virus scan that kept running and screwing up my timging measurements)
Strategy for releasing the memory the allocator holds:

a) timer - set a timer and if allocator wasnt used within a specific time,
cleanup. I am thinking of 10 secs.

b) Once startup is over, release and disable this allocator all together.
Disadvantage is mail, mailcompose, aim windows wont get the benefit of the
performant allocator.

I am leaning towards (a). With (a) if we release, and mail startsup say, the
allocator will kick in, grab its 48k and 10sec after mail has started up will
release the 48k

Chris/Simon, would love your input.
After talking to Alec, I am going to wait to do the timer thing. Pav is moving
timers into xpcom and I can do it then. Right now jar using widget (where timer
lives) is a bad dependency

Total memory bloat caused by this would be 46k I am going to make module
shutdown clean the memory for now.
Comment on attachment 60785 [details] [diff] [review]
Eliminte 1200+ allocations from zlib

the current patch looks good to me :)

sr=darin

out of curiosity... wby the memset in zalloc?  is zalloc meant to resemble
calloc?
Yeah zalloc is replacing a calloc that zlib does.
Improvements from previous patch:
- uses a ptr as opposed to a static object
- releases alloced memory on shutdown
Attachment #60785 - Attachment is obsolete: true
dveditz/darin : r=/sr= ?
Comment on attachment 61120 [details] [diff] [review]
Final patch - implements zlib allocator

r/sr=darin, but how about putting the

extern nsZlibAllocator *gZlibAllocator;

in nsZlibAllocator.h ?
Attachment #61120 - Flags: superreview+
Comment on attachment 61120 [details] [diff] [review]
Final patch - implements zlib allocator

>+nsZlibAllocator::nsZlibAllocator()
>+{
>+  for (int i = 0; i < NBUCKETS; i++) {
>+    mMemBucket[i].ptr = NULL;
>+    mMemBucket[i].size = 0;
>+    mMemBucket[i].inUse = PR_FALSE;
>+  }
>+  return;
>+}

A memset() might be cheaper, and the constructor could be
moved inline into the .h file to reduce clutter.

>+int nsZlibAllocator::zClearBuckets()
   ...
>+    mMemBucket[i].ptr = NULL;
>+    mMemBucket[i].size = 0;
>+    mMemBucket[i].inUse = PR_TRUE;

why PR_TRUE? Is this to catch double clearing?

>+void * nsZlibAllocator::zAlloc(PRUint32 items, PRUint32 size)

This is totally not threadsafe -- how do you protect against collisions?
This needs to get tested on a dual processor machine, several thread problems
cropped up in nsJAR/nsZipArchive when used on dual-processor machines. The
locking already in nsJAR is likely to save you for the chrome case, but I worry
about other zip users who might not be on the UI thread. A quick glance through
lxr didn't turn up anything though.

>+    if (!mMemBucket[i].inUse && mMemBucket[i].ptr && mMemBucket[i].size == totalsize)
>+    {
>+      // zero out memory, increase refcnt and return it
>+      memset(mMemBucket[i].ptr, 0, totalsize);
>+      mMemBucket[i].inUse = PR_TRUE;
>+      return mMemBucket[i].ptr;
>+    }

So you memset if it's the right size

>+  // See if we have an allocated bucket
>+  if (freeAllocatedBucket)
>+  {
>+    // Mark it used
>+    freeAllocatedBucket->inUse = PR_TRUE;
>+    return freeAllocatedBucket->ptr;
>+  }

But not if it's too big.

>+  void *ptr = calloc(realitems, size);

And if it's too small it gets cleared, too.

>+  if (freeBucket)
>+  {
>+    // Found free slot. Store it
>+    freeBucket->inUse = PR_TRUE;
>+    freeBucket->ptr = ptr;
>+    freeBucket->size = realitems * size;
>+  }

But if it's too small you leak whatever used to be there (if anything).

>Index: nsZlibAllocator.h
>===================================================================
>RCS file: nsZlibAllocator.h
>diff -N nsZlibAllocator.h
>--- /dev/null	1 Jan 1970 00:00:00 -0000
>+++ nsZlibAllocator.h	10 Dec 2001 18:53:23 -0000
>@@ -0,0 +1,53 @@
>+/* -*- Mode: C; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
>+/*
>+ * The contents of this file are subject to the Netscape Public
>+ * License Version 1.1 (the "License"); you may not use this file

Need new boilerplate header from http://www.mozilla.org/MPL/

mozilla.org discourages new files from using the NPL, so use the *M*PL
tri-license version
http://www.mozilla.org/MPL/boilerplate-1.1/mpl-tri-license-c

Fill in the Original Code and  Initial Developer as you have here:
"Mozilla client code" and "Netscape Communications Corporation".

Copyright date is 2001
Attachment #61120 - Flags: superreview+
>A memset() might be cheaper, and the constructor could be
>moved inline into the .h file to reduce clutter.

Sure. will do.

>why PR_TRUE? Is this to catch double clearing?

That was a typo. It should have been PR_FALSE.

>>+  // See if we have an allocated bucket
>>+  if (freeAllocatedBucket)
>>+  {
>>+    // Mark it used
>>+    freeAllocatedBucket->inUse = PR_TRUE;
>>+    return freeAllocatedBucket->ptr;
>>+  }
>
>But not if it's too big.

Yes. Nice catch. Will add the memset here.

>But if it's too small you leak whatever used to be there (if anything).

That is true too. Lucky, that wasnt the intent and that case never happened.
freeBucket needs to point to a truely free bucket. Will make sure.

> Need new boilerplate header from http://www.mozilla.org/MPL/

Yup. Did that one too.

So the only thing is thread safeness. My logic was that there was locks in
nsJAR::GetInputStream() that does a lock and serializes calls to
nsJARInputStream::Init() -> nsZipArchive::ReadInit() -> nsZipArchive::InflateItem()

The comment here assumes nsZipArchive is not threadsafe. So I felt introducing
another level of locking was wasteful.

NS_IMETHODIMP
nsJAR::GetInputStream(const char* aFilename, nsIInputStream** result)
{
  // nsZipArchive and zlib are not thread safe
  // we need to use a lock to prevent bug #51267
  nsAutoLock lock(mLock);

  NS_ENSURE_ARG_POINTER(result);
  nsresult rv;
  nsJARInputStream* jis = nsnull;
  rv = nsJARInputStream::Create(nsnull, NS_GET_IID(nsIInputStream), (void**)&jis);
  if (!jis) return NS_ERROR_FAILURE;

  rv = jis->Init(this, aFilename);
....

BTW, a very nice review as usual. Thanks. I will attach new patch fixing all
other parts of the review.
I did some lxr hunting and am satisfied on the threadsafety front. Only the
native installers use nsZipArchive directly (using the STANDALONE version) and
they're both single-threaded and separate from Mozilla. Everything else goes
though the locks in nsJAR.
This includes all of dveditz's comments + moving extern into .h file per darin.
Attachment #61120 - Attachment is obsolete: true
Comment on attachment 61405 [details] [diff] [review]
Fixing comments from dveditz

>+  nsZlibAllocator() {
>+    memset(mMemBucket, 0, sizeof(mMemBucket));
>+    printf("sizeof(mMemBucket) = %d\n", sizeof(mMemBucket));
>+    return;
>+  }

printf?
Attachment #61405 - Flags: needs-work+
That was a stray debug printf. Will remove on checkin.
Comment on attachment 61405 [details] [diff] [review]
Fixing comments from dveditz

>+ * The Original Code is Suresh Duddi <dp@netscape.com>

"I am not a number", likewise you are not code. "Mozilla client code"
or some such will do.

>+ * The Initial Developer of the Original Code is
>+ * Suresh Duddi <dp@netscape.com>
>+ * Portions created by the Initial Developer are Copyright (C) 2001
>+ * the Initial Developer. All Rights Reserved.

As an employee everything you produce is copyright the company, and
the initial developer (a legal term of the license) should be
Netscape Communications Corporation

>+ * Contributor(s):

This is where names go. Sometimes people attach comments like
(original author) or some particularly big restructuring they did,
so if you like:

+ * Contributor(s):
+ *	Suresh Duddi <dp@netscape.com>	   (original author)

>+  nsZlibAllocator() {
>+    memset(mMemBucket, 0, sizeof(mMemBucket));
>+    printf("sizeof(mMemBucket) = %d\n", sizeof(mMemBucket));
>+    return;
>+  }

You've zapped the printf(), right?

sr=dveditz
Attachment #61405 - Flags: superreview+
r=darin with the fixups dveditz mentioned.
You can tell I hate reading the license. Anywhere there was a _____ I
religiously put my name :-) Thanks Guys. I made the changes. Yeah I zapped the
printf.
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Final cost analysis:

Before fix:
opt> perl e:/dp/bucket.pl < malloc-cost.out.sorted 
[239623] malloc :  57900 ticks - 1.035 sec [ 43.65%]
[  7808] calloc :  14318 ticks - 0.256 sec [ 10.79%]
[  3395] realloc:   2082 ticks - 0.037 sec [  1.57%]
[242391] free   :  58347 ticks - 1.043 sec [ 43.99%]
----------------------------------------------------
[493218] total  : 132647 ticks - 2.372 sec [100.00%]


After fix:
opt> perl e:/dp/bucket.pl < malloc-cost.out.sorted 
[252784] malloc :  51906 ticks - 0.928 sec [ 45.79%]
[  6910] calloc :   3866 ticks - 0.069 sec [  3.41%]
[  3452] realloc:   1690 ticks - 0.030 sec [  1.49%]
[259093] free   :  55892 ticks - 0.999 sec [ 49.31%]
----------------------------------------------------
[522241] total  : 113354 ticks - 2.027 sec [100.00%]

Overall improvement is about 14% of allocation cost. This would translate to
anywhere betn 4-8% of startup.
Comment on attachment 61405 [details] [diff] [review]
Fixing comments from dveditz

darin r=ed.
Attachment #61405 - Flags: needs-work+ → review+
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
timers are now in xpcom (let's hope they can stay there -- thread-based timers
have caused regressions, as expected; not all are in hand, but I will help to
fix them yet preserve thread-based/safe timers).  So, is there a bug on the 10
second timer idea?

/be
Blocks: 118061
Just filed bug 118061 for the 10sec timer to release memory
Product: Core → Mozilla Application Suite
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: