Closed
Bug 394425
Opened 17 years ago
Closed 15 years ago
nsCacheMetaData: alloc key & value & metadata in one malloc call
Categories
(Core :: Networking: Cache, defect)
Core
Networking: Cache
Tracking
()
RESOLVED
FIXED
mozilla1.9.3a1
People
(Reporter: alfredkayser, Assigned: alfredkayser)
References
Details
(Keywords: memory-footprint)
Attachments
(1 file, 3 obsolete files)
10.26 KB,
patch
|
shaver
:
review+
shaver
:
superreview+
benjamin
:
approval1.9.2-
dveditz
:
approval1.9.1.4-
|
Details | Diff | Splinter Review |
bug 391915 changed the key for nsCacheMetaData from Atom to nsCString. But nsCString allocates space for the key, while we could also allocate the key & value directly with the allocation of the metadataelement. This saves a separate allocation for each key & value pair (and some bytes for nsCString). Furthermore by storing key&value together the 'flatten' operation can just memcpy the keyvalue in one step. Even more, the objectsize of nsCacheMetaData is reduced by about 3K: Before: 6477 Aug 31 10:47 nsCacheMetaData.obj After: 3310 Aug 31 10:59 nsCacheMetaData.obj See also bug 201455: where I applied this trick to the value part.
Attachment #279076 -
Flags: review?(cbiesinger)
Assignee | ||
Updated•17 years ago
|
Status: NEW → ASSIGNED
Comment 1•17 years ago
|
||
Comment on attachment 279076 [details] [diff] [review] Patch to combine key/value allocation Hm... why remove the operator new?
Assignee | ||
Comment 2•17 years ago
|
||
Because the two locations where new was called have a different way to initialize the construct, and because both would require 4 parameters: key, keySize, value, valueSize. In 'Unflatten' we can directly memcpy the key/value pair, but in SetElement, key and value are separatly set. If we take this memcpy's out of the new operator, the new operator only does size calculation and then new to allocate itself. This doesn't help much, and a simple malloc is good enough (and on the same level as memcpy).
Assignee: nobody → alfredkayser
Status: ASSIGNED → NEW
Assignee | ||
Comment 3•17 years ago
|
||
Biesinger, can you please review this one? It saves in number of allocations (see the blog from Pavlov on this: http://www.pavlov.net/blog/archives/2007/11/windows_low_fra.html), and reduces codesize.
Keywords: footprint
Assignee | ||
Comment 4•17 years ago
|
||
To reduce memory fragmentation, and to more quickly flatten/unflatten the key/value metadata pair, keep them in the format as stored in disk or SQL database. Total codesize savings is ±10K measured on Windows. (this means we can later on also remove the 'md' buffer allocation in the OfflineCache part)
Attachment #279076 -
Attachment is obsolete: true
Attachment #289008 -
Flags: review?(cbiesinger)
Attachment #279076 -
Flags: review?(cbiesinger)
Assignee | ||
Comment 5•17 years ago
|
||
Some statistics on codesize: BEFORE: suite-debug: 27692 Nov 16 16:53 nsCacheMetaData.obj 1831476 Nov 16 16:53 nkcache_s.lib ff-opt: 26853 Nov 17 09:41 nsCacheMetaData.obj 1433122 Nov 17 09:41 nkcache_s.lib AFTER: suite-debug: 18398 Nov 16 17:36 nsCacheMetaData.obj 1821956 Nov 16 16:52 nkcache_s.lib ff-opt: 15200 Nov 17 09:42 nsCacheMetaData.obj 1421042 Nov 17 09:42 nkcache_s.lib So, net 'objectsize' savings for ff-opt is 11-12K. Memory usage: Assuming 2 key/values per cache entry, and 20 cache entries per page (page, css, js's, images, etc) Before: 16 bytes per key/value, plus bytes for key & value strings: 16 bytes per key/value + key/value strings in 4 allocated parts: 80 allocations per pages, 2.5K overhead. After: key/value strings in one allocated part per entry: 20 allocations per page, no overhead.
Assignee | ||
Updated•17 years ago
|
Attachment #289008 -
Flags: review?(cbiesinger) → review?(dcamp)
Assignee | ||
Comment 6•17 years ago
|
||
From http://people.mozilla.com/%7Epavlov/frag/allocs8.txt.gz (from http://blog.pavlov.net/2007/11/13/allocation-data/), the entries that are covered by this bug are: XUL`nsCacheMetaData::MetaElement::operator new(unsigned long, char const*, unsigned int)+0x1c XUL`nsCacheMetaData::UnflattenMetaData(char const*, unsigned int)+0x79 value ------------- Distribution ------------- count 16 |@@@@@@@@@@@@@@@@@@@ 640 32 |@@@ 105 64 | 2 128 |@@@ 89 256 |@@@@@@@@@@@@@@@@ 542 and: XUL`nsACString_internal::Assign(char const*, unsigned int)+0xc8 XUL`nsCacheMetaData::UnflattenMetaData(char const*, unsigned int)+0xae value ------------- Distribution ------------- count 16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1325 32 |@@ 53 And ± 200 times a SetMetaDataElement and ± 150 GetMetaDataElement. The interesting part is that Flatten/Unflatten are called ± 1400 times each, and Set/Get ± 200 times each. So this patch rightfully optimizes the Flatten/Unflatten cases, by just memcpy'ing the data into one allocated memory chunk, and let Set/Get do the hard work.
Comment 7•17 years ago
|
||
Comment on attachment 289008 [details] [diff] [review] V2: Allocate all key/values pairs in one malloc call per cache entry >--- netwerk/cache/src/nsCacheMetaData.cpp 30 Aug 2007 17:51:07 -0000 1.27 >+++ netwerk/cache/src/nsCacheMetaData.cpp 16 Nov 2007 16:17:10 -0000 > nsresult > nsCacheMetaData::SetElement(const char * key, > const char * value) > { ... >+ PRUint32 keySize = strlen(key) + 1; >+ PRUint32 valueSize = (value ? strlen(value) : 0) + 1; >+ char * pos = (char *)GetElement(key); >+ PRUint32 newSize; >+ >+ if (pos) { ... >+ // Update the value in place >+ newSize = mMetaSize + valueSize - oldValueSize; >+ nsresult rv = EnsureBuffer(newSize); >+ NS_ENSURE_SUCCESS(rv, rv); >+ pos = mBuffer + offset; ... >+ } else if (value) { ... >+ // Add after last element >+ pos = mBuffer + mMetaSize; >+ memcpy(pos, key, keySize); >+ pos += keySize; >+ } >+ // Update value >+ memcpy(pos, value, valueSize); >+ mMetaSize = newSize; This breaks if pos == 0 and value == 0. Looks good to me. Make sure biesi sr's.
Attachment #289008 -
Flags: review?(dcamp) → review+
Assignee | ||
Comment 8•17 years ago
|
||
Make sure that in SetElement with !value and !pos also works correctly.
Attachment #289008 -
Attachment is obsolete: true
Attachment #293103 -
Flags: superreview?(cbiesinger)
Assignee | ||
Comment 9•16 years ago
|
||
Requesting blocking1.9 as this is both a sizeable code saving as well as sizeable memory allocation saving.
Flags: blocking1.9?
Comment 10•16 years ago
|
||
definitely want this, but not blocking. nice work.
Flags: wanted1.9+
Flags: blocking1.9?
Flags: blocking1.9-
Target Milestone: --- → mozilla1.9beta4
Assignee | ||
Comment 11•16 years ago
|
||
Requesting 1.9.1 as it is ready to go, except for the last SR, and this is both a sizeable code saving as well as sizeable memory allocation saving.
Flags: wanted1.9.1?
Assignee | ||
Comment 12•16 years ago
|
||
Moving to 1.9.2... biesi, can you do the sr of this one?
Flags: wanted1.9.1? → wanted1.9.2?
Assignee | ||
Comment 13•15 years ago
|
||
This is still a valid candidate for 1.9.1 I think.
Flags: wanted1.9.2? → wanted1.9.1?
Assignee | ||
Comment 14•15 years ago
|
||
Linked to Fennec performance improvement bug, as this is a valid, working patch to optimizes cache memory and is a sizeable code reduction.
Attachment #293103 -
Flags: superreview?(cbiesinger)
Comment on attachment 293103 [details] [diff] [review] V3: Addressed comment of dcamp refresh it against mozilla-central and I'll sr; thanks.
Assignee | ||
Comment 16•15 years ago
|
||
Attachment #293103 -
Attachment is obsolete: true
Attachment #398849 -
Flags: superreview?(shaver)
Comment on attachment 398849 [details] [diff] [review] V4: Updated to current trunk (hg) [Checkin: Comment 21] Carrying forward r, and sr=shaver. I'm always a little nervous when we compute pointer-math operands based on addition of unsigned quantities, but since everything seems to come from strlen I don't see any way for an attacker to cause us to overflow. That said, an assertion or two about that wouldn't have utterly turned my stomach. Nor, in fact, would some test cases for cache management, but the functionality here seems behaviour-neutral, so our existing (or, ahem, not) cache tests will cover it. Alfred: how much memory does this save in your testing?
Attachment #398849 -
Flags: superreview?(shaver)
Attachment #398849 -
Flags: superreview+
Attachment #398849 -
Flags: review+
Assignee | ||
Comment 18•15 years ago
|
||
Concerning memory savings: this prevents fragmentation by allocating all in one instead of multiple smaller pieces. See comment #5: Estimation is about 2-3K per page.
Keywords: checkin-needed
I was really asking what you'd measured, ideally on a codebase more recent than 2007 (that was from before jemalloc, no?) -- what memory effects do you see, under what test load?
Assignee | ||
Comment 20•15 years ago
|
||
I have instrumented a current build to count number of allocs and function calls. Then started firefox and opened 4 pages, closed them, closed FF. The results are as follows: Before: 730 allocations, totalling 120601 bytes. After: 348 (re)allocations, totalling 100768 bytes. So, for 4 pages, the saving is 20K, and about 380 alloc's. The second is more important, reducing fragmentation. And saving is thus more like 5K per page.
Comment 21•15 years ago
|
||
Comment on attachment 398849 [details] [diff] [review] V4: Updated to current trunk (hg) [Checkin: Comment 21] http://hg.mozilla.org/mozilla-central/rev/ba331f3a3daf
Attachment #398849 -
Attachment description: V4: Updated to current trunk (hg) → V4: Updated to current trunk (hg)
[Checkin: Comment 21]
Attachment #398849 -
Flags: approval1.9.2?
Attachment #398849 -
Flags: approval1.9.1.4?
Updated•15 years ago
|
Status: NEW → RESOLVED
Closed: 15 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: mozilla1.9beta4 → mozilla1.9.3a1
Version: unspecified → Trunk
Comment 22•15 years ago
|
||
Comment on attachment 398849 [details] [diff] [review] V4: Updated to current trunk (hg) [Checkin: Comment 21] This seems wholly inappropriate for the 1.9.1 stability branch unless I'm completely missing the point of what it does.
Attachment #398849 -
Flags: approval1.9.1.4? → approval1.9.1.4-
Assignee | ||
Comment 23•15 years ago
|
||
The wanted flag was from long time ago, and is no longer applicable.
Flags: wanted1.9.1?
Comment 24•15 years ago
|
||
Did any perf improvements show up on Talos? I don't think we need to take this on 1.9.2 unless it has some noticeable benefit.
Assignee | ||
Comment 25•15 years ago
|
||
No need to get this on 1.9.2, as perftastic doesn't any measurable benefit...
Updated•15 years ago
|
Attachment #398849 -
Flags: approval1.9.2? → approval1.9.2-
You need to log in
before you can comment on or make changes to this bug.
Description
•