Closed Bug 394425 Opened 17 years ago Closed 15 years ago

nsCacheMetaData: alloc key & value & metadata in one malloc call

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1

People

(Reporter: alfredkayser, Assigned: alfredkayser)

References

Details

(Keywords: memory-footprint)

Attachments

(1 file, 3 obsolete files)

bug 391915 changed the key for nsCacheMetaData from Atom to nsCString. But nsCString allocates space for the key, while we could also allocate the key & value directly with the allocation of the metadataelement.
This saves a separate allocation for each key & value pair (and some bytes for nsCString). Furthermore by storing key&value together the 'flatten' operation can just memcpy the keyvalue in one step.

Even more, the objectsize of nsCacheMetaData is reduced by about 3K:
Before: 6477 Aug 31 10:47 nsCacheMetaData.obj
After:  3310 Aug 31 10:59 nsCacheMetaData.obj

See also bug 201455: where I applied this trick to the value part.
Attachment #279076 - Flags: review?(cbiesinger)
Status: NEW → ASSIGNED
Comment on attachment 279076 [details] [diff] [review]
Patch to combine key/value allocation

Hm... why remove the operator new?
Because the two locations where new was called have a different way to initialize the construct, and because both would require 4 parameters: key, keySize, value, valueSize.  In 'Unflatten' we can directly memcpy the key/value pair, but in SetElement, key and value are separatly set. If we take this memcpy's out of the new operator, the new operator only does size calculation and then new to allocate itself. This doesn't help much, and a simple malloc is good enough (and on the same level as memcpy).
Assignee: nobody → alfredkayser
Status: ASSIGNED → NEW
Biesinger, can you please review this one?
It saves in number of allocations (see the blog from Pavlov on this: http://www.pavlov.net/blog/archives/2007/11/windows_low_fra.html), and reduces codesize.
Keywords: footprint
To reduce memory fragmentation, and to more quickly flatten/unflatten the key/value metadata pair, keep them in the format as stored in disk or SQL database.

Total codesize savings is ±10K measured on Windows.

(this means we can later on also remove the 'md' buffer allocation in the OfflineCache part)
Attachment #279076 - Attachment is obsolete: true
Attachment #289008 - Flags: review?(cbiesinger)
Attachment #279076 - Flags: review?(cbiesinger)
Some statistics on codesize:
BEFORE:
suite-debug:
   27692 Nov 16 16:53 nsCacheMetaData.obj
 1831476 Nov 16 16:53 nkcache_s.lib

ff-opt:
   26853 Nov 17 09:41 nsCacheMetaData.obj
 1433122 Nov 17 09:41 nkcache_s.lib

AFTER:
suite-debug:
   18398 Nov 16 17:36 nsCacheMetaData.obj
 1821956 Nov 16 16:52 nkcache_s.lib

ff-opt:
   15200 Nov 17 09:42 nsCacheMetaData.obj
 1421042 Nov 17 09:42 nkcache_s.lib

So, net 'objectsize' savings for ff-opt is 11-12K.

Memory usage:
Assuming 2 key/values per cache entry, and 20 cache entries per page (page, css, js's, images, etc)
Before: 16 bytes per key/value, plus bytes for key & value 
strings: 16 bytes per key/value + key/value strings in 4 allocated parts: 80 allocations per pages, 2.5K overhead.
After: key/value strings in one allocated part per entry: 20 allocations per page, no overhead.
Attachment #289008 - Flags: review?(cbiesinger) → review?(dcamp)
From http://people.mozilla.com/%7Epavlov/frag/allocs8.txt.gz (from
http://blog.pavlov.net/2007/11/13/allocation-data/), the entries that are
covered by this bug are:
              XUL`nsCacheMetaData::MetaElement::operator new(unsigned long, char const*, unsigned int)+0x1c
              XUL`nsCacheMetaData::UnflattenMetaData(char const*, unsigned int)+0x79

           value  ------------- Distribution ------------- count    
              16 |@@@@@@@@@@@@@@@@@@@                      640      
              32 |@@@                                      105      
              64 |                                         2        
             128 |@@@                                      89       
             256 |@@@@@@@@@@@@@@@@                         542      
and:
              XUL`nsACString_internal::Assign(char const*, unsigned int)+0xc8
              XUL`nsCacheMetaData::UnflattenMetaData(char const*, unsigned int)+0xae

           value  ------------- Distribution ------------- count    
              16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   1325     
              32 |@@                                       53       
And ± 200 times a SetMetaDataElement and ± 150 GetMetaDataElement.

The interesting part is that Flatten/Unflatten are called ± 1400 times each, and Set/Get ± 200 times each.

So this patch rightfully optimizes the Flatten/Unflatten cases, by just memcpy'ing the data into one allocated memory chunk, and let Set/Get do the hard work.

Comment on attachment 289008 [details] [diff] [review]
V2: Allocate all key/values pairs in one malloc call per cache entry

>--- netwerk/cache/src/nsCacheMetaData.cpp	30 Aug 2007 17:51:07 -0000	1.27
>+++ netwerk/cache/src/nsCacheMetaData.cpp	16 Nov 2007 16:17:10 -0000

> nsresult
> nsCacheMetaData::SetElement(const char * key,
>                             const char * value)
> {
...
>+    PRUint32 keySize = strlen(key) + 1;
>+    PRUint32 valueSize = (value ? strlen(value) : 0) + 1;
>+    char * pos = (char *)GetElement(key);
>+    PRUint32 newSize;
>+
>+    if (pos) {
...
>+        // Update the value in place
>+        newSize = mMetaSize + valueSize - oldValueSize;
>+        nsresult rv = EnsureBuffer(newSize);
>+        NS_ENSURE_SUCCESS(rv, rv);
>+        pos = mBuffer + offset;
...
>+    } else if (value) {
...
>+        // Add after last element
>+        pos = mBuffer + mMetaSize;
>+        memcpy(pos, key, keySize);
>+        pos += keySize;
>+    }
>+    // Update value
>+    memcpy(pos, value, valueSize);
>+    mMetaSize = newSize;

This breaks if pos == 0 and value == 0.

Looks good to me.  Make sure biesi sr's.
Attachment #289008 - Flags: review?(dcamp) → review+
Attached patch V3: Addressed comment of dcamp (obsolete) — Splinter Review
Make sure that in SetElement with !value and !pos also works correctly.
Attachment #289008 - Attachment is obsolete: true
Attachment #293103 - Flags: superreview?(cbiesinger)
Requesting blocking1.9 as this is both a sizeable code saving as well as sizeable memory allocation saving.
Flags: blocking1.9?
definitely want this, but not blocking. nice work.
Flags: wanted1.9+
Flags: blocking1.9?
Flags: blocking1.9-
Target Milestone: --- → mozilla1.9beta4
Requesting 1.9.1 as it is ready to go, except for the last SR, and this is both a sizeable code saving as well as sizeable memory allocation saving.
Flags: wanted1.9.1?
Moving to 1.9.2...

biesi, can you do the sr of this one?
Flags: wanted1.9.1? → wanted1.9.2?
This is still a valid candidate for 1.9.1 I think.
Flags: wanted1.9.2? → wanted1.9.1?
Blocks: 459117
Linked to Fennec performance improvement bug, as this is a valid, working patch to optimizes cache memory and is a sizeable code reduction.
Comment on attachment 293103 [details] [diff] [review]
V3: Addressed comment of dcamp

refresh it against mozilla-central and I'll sr; thanks.
Attachment #293103 - Attachment is obsolete: true
Attachment #398849 - Flags: superreview?(shaver)
Comment on attachment 398849 [details] [diff] [review]
V4: Updated to current trunk (hg)
[Checkin: Comment 21]

Carrying forward r, and sr=shaver.

I'm always a little nervous when we compute pointer-math operands based on addition of unsigned quantities, but since everything seems to come from strlen I don't see any way for an attacker to cause us to overflow.

That said, an assertion or two about that wouldn't have utterly turned my stomach.

Nor, in fact, would some test cases for cache management, but the functionality here seems behaviour-neutral, so our existing (or, ahem, not) cache tests will cover it.

Alfred: how much memory does this save in your testing?
Attachment #398849 - Flags: superreview?(shaver)
Attachment #398849 - Flags: superreview+
Attachment #398849 - Flags: review+
Concerning memory savings: this prevents fragmentation by allocating all in one instead of multiple smaller pieces. See comment #5: Estimation is about 2-3K per page.
Keywords: checkin-needed
I was really asking what you'd measured, ideally on a codebase more recent than 2007 (that was from before jemalloc, no?) -- what memory effects do you see, under what test load?
I have instrumented a current build to count number of allocs and function calls.
Then started firefox and opened 4 pages, closed them, closed FF.
The results are as follows:
Before: 730 allocations, totalling 120601 bytes.
After: 348 (re)allocations, totalling 100768 bytes.
So, for 4 pages, the saving is 20K, and about 380 alloc's.
The second is more important, reducing fragmentation.
And saving is thus more like 5K per page.
Comment on attachment 398849 [details] [diff] [review]
V4: Updated to current trunk (hg)
[Checkin: Comment 21]


http://hg.mozilla.org/mozilla-central/rev/ba331f3a3daf
Attachment #398849 - Attachment description: V4: Updated to current trunk (hg) → V4: Updated to current trunk (hg) [Checkin: Comment 21]
Attachment #398849 - Flags: approval1.9.2?
Attachment #398849 - Flags: approval1.9.1.4?
Status: NEW → RESOLVED
Closed: 15 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: mozilla1.9beta4 → mozilla1.9.3a1
Version: unspecified → Trunk
Comment on attachment 398849 [details] [diff] [review]
V4: Updated to current trunk (hg)
[Checkin: Comment 21]

This seems wholly inappropriate for the 1.9.1 stability branch unless I'm completely missing the point of what it does.
Attachment #398849 - Flags: approval1.9.1.4? → approval1.9.1.4-
The wanted flag was from long time ago, and is no longer applicable.
Flags: wanted1.9.1?
Did any perf improvements show up on Talos? I don't think we need to take this on 1.9.2 unless it has some noticeable benefit.
No need to get this on 1.9.2, as perftastic doesn't any measurable benefit...
Attachment #398849 - Flags: approval1.9.2? → approval1.9.2-
Depends on: 529733
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: