Open
Bug 232927
Opened 21 years ago
Updated 2 years ago
investigate further performance improvements in strings
Categories
(Core :: XPCOM, defect, P4)
Core
XPCOM
Tracking
()
NEW
Future
People
(Reporter: dbaron, Unassigned)
References
Details
(Keywords: perf)
Attachments
(5 files, 1 obsolete file)
1.64 KB,
patch
|
Details | Diff | Splinter Review | |
2.25 KB,
patch
|
Details | Diff | Splinter Review | |
20.73 KB,
patch
|
Details | Diff | Splinter Review | |
3.39 KB,
patch
|
Details | Diff | Splinter Review | |
1.70 KB,
text/html
|
Details |
Once bug 231995 lands, there are some other performance improvements to strings
that I'd like to investigate (i.e., write them and test to see if they're really
improvements):
1. Remove threadsafe reference counting by having flag for strings accessed on
multiple threads -- and prevent them from sharing.
2. Have an nsSubstring::PromoteToShared() const method (use
|mutable_this| inside) that makes a dependent string, string with
adopted buffer, or auto string (probably not worth it until not
threadsafe) shared. Use this in Assign when the buffer isn't already
shared and it's not a stack-to-stack assignment.
3. Implement multiple sizes of nsAutoString (nsStackString16, nsStackString32,
etc.). (I think Stack is a better name than Auto, and it's a chance to
fix the ns*C*AutoString inconsistency with nsStack*C*String.)
4. Have nsAutoString set a flag that indicates that it has a stack buffer so
we have the ability to use that stack buffer in an assignment after we've
moved to a non-stack buffer (especially important if we do the nsAutoString
part of (2).
Comment 1•21 years ago
|
||
this list looks great.
Reporter | ||
Comment 2•21 years ago
|
||
5. investigate tweaks in the capacity doubling code:
* whether to double initially when going from immutable to mutable
* whether to continue up powers of two or use
max(2*current capacity, required capacity)
Comment 3•21 years ago
|
||
Maybe the malloc size should be 8-byte aligned. This is what win32 malloc does
so this is free and it should help reduce reallocation count. Maybe string data
access should occur on aligned memory so one should ensure that
sizeof(nsStringHeader) is always 8 like it seems to be now. This patch is
completely untested.
Comment 4•21 years ago
|
||
Thanks for the patch Bernard. I'll incorporate that into my patch for bug
231995 since that hasn't landed yet.
Comment 5•21 years ago
|
||
What if malloc can return an address that's 4 mod 8? If that's possible, then
rounding size up to a multiple of 8 is not going to align the address (alignment
usually refers to addresses, not sizes, so it seems the code must be assuming 0
mod 8 pointer alignment from malloc).
Is the idea simply to overallocate in case the string grows by 1-7 chars and
there's room in the rounded-up size for the growth? Why is it harmful (to
performance, presumably) to let realloc worry about that, if the malloc heap
uses 8-byte alignment?
What about other platforms?
The patch is against a file in the attic, and I'm rusty on string code anyway,
but I hope my comment here is helpful.
/be
Comment 6•21 years ago
|
||
(In reply to comment #5)
> What if malloc can return an address that's 4 mod 8?
Then this code will allocate a few more bytes than necessary.
> Is the idea simply to overallocate in case the string grows by 1-7 chars and
> there's room in the rounded-up size for the growth?
Yes it is exactly what is intended
> Why is it harmful (to
> performance, presumably) to let realloc worry about that, if the malloc heap
> uses 8-byte alignment?
I think that it's a liitle gain to avoid some realloc() just for a few bytes.
Those realloc() calls should be fast though because there's actually nothing to
realloc in this case because malloc() did round up the size previously. It just
avoids the "realloc" code path in the string code.
>
> What about other platforms?
The constant 7 used to round up the size should be 3 if malloc returns 4-byte
aligned adresses or could be changed to something like:
"2 * sizeof(size_t) - 1" (platform specific malloc returns 8-byte aligned
addresses) or "sizeof(size_t) - 1" (4-byte aligned adresses).
If 7 is used when 4 mod 8 addresses are returned by malloc then a few bytes will
be wasted by the rounded-up size (too much round-up).
Comment 7•21 years ago
|
||
Hmm... I'm going to hold off on this malloc alignment patch until after the
string branch lands.
Comment 8•21 years ago
|
||
Comment on attachment 141418 [details] [diff] [review]
align malloc size
some testing shows that this patch saves 50% of realloc() calls.
("ReallocCount" string statistics going from 196954 to 97751 browsing a few web
sites)
Comment 9•21 years ago
|
||
This is another variation.
It saves 15% more reallocations compared to the previous patch.
I think that it can be efficient especially when the platform malloc()
implementation uses links of blocks of size 8, 16, 32, ... for small blocks.
If the malloc() implementation is not efficient then another idea would be to
use several linked lists of arena allocated small blocks of size 8, 16, ... and
reuse the free blocks without actually calling free() (for instance using one
nsFixedSizeAllocator for each block size:
http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsFixedSizeAllocator.h#41).
Reporter | ||
Comment 10•21 years ago
|
||
We gained significant memory use improvements when we stopped doing that a
number of years ago. IIRC, something like 70% of strings are never modified.
Comment 11•21 years ago
|
||
Why should we optimize to reduce realloc count, and nothing else? realloc is
not the dominant cost, and you're not measure heap fragmentation or its effect,
virtual memory (vm size and even resident set size) bloat. Size-classifying
allocators, especially power-of-two ones, are fragmentation prone.
See waterson's research from several years ago into the Lea malloc, at
http://www.mozilla.org/projects/footprint/allocators.html.
Bernard, I don't think this patch is being subject to the right system tests.
/be
Comment 12•21 years ago
|
||
As far as I've seen, maxheap went up on brad as well after the checkin of the
string branch... Would this patch help that as well? Would the patch help
Txul/Ts even more? I guess that's the some of the really interesting questions...
Comment 13•21 years ago
|
||
please see my notes in bug 234864 concerning dbaron's point #4 in comment #0 of
this bug.
Comment 14•21 years ago
|
||
I think that to improve the string library you have to improve on sharing and/or
improve allocations/reallocation. So when I first noticed the high reallocation
count noted in bug 231995 I thought that something could be improved here. I
don't like my patch to round up to 2^n very much but I just wanted to see how
many reallocations I could cut down that way, especially when I saw the 50% gain
from the first patch . That patch (the 1st one) is different because I think
that it is free on most platforms and even if it suppresses some realloc() that
are almost no-ops it can avoid some noise in the malloc stats.
Theses two patches can be run against the pageloader tests to see if there are
improvements or not. If there's too little performance gain then we'll know for
sure that reallocation is not important.
Reporter | ||
Comment 15•21 years ago
|
||
I should probably do a more thorough audit rather than just waiting for
assertions and fixing them when I see them. And I should probably test the
installer...
Reporter | ||
Comment 16•21 years ago
|
||
same, but with better comments
Attachment #141870 -
Attachment is obsolete: true
Reporter | ||
Comment 17•21 years ago
|
||
Comment on attachment 141871 [details] [diff] [review]
preliminary patch for non-threadsafe refcounting
Note that, given that it may (given some of the other suggestions above) soon
be possible for any string to become a string with a shared buffer, it would be
better to assert when assigning from any string to any string. However, we
can't do that because there's no place to store the original thread for strings
without shared buffers since increasing the size of nsString breaks the
nsStringContainer code.
The weakness of the assertions is particularly problematic for the case of
nsAutoString, where only a large string would cause problems.
Comment 18•21 years ago
|
||
dbaron,
if i want to let JS share one of our sharable buffers, do i need to worry about
it releasing that buffer on a background thread? IOW, does this patch prevent
me from implementing sharing from C++ to JS in xpconnect?
Reporter | ||
Comment 19•21 years ago
|
||
Comment on attachment 141871 [details] [diff] [review]
preliminary patch for non-threadsafe refcounting
FWIW, I just tested this on btek, and it gains less than a percent (probably
about half) on pageload time.
Reporter | ||
Comment 20•21 years ago
|
||
Reporter | ||
Comment 21•21 years ago
|
||
Comment on attachment 141934 [details] [diff] [review]
patch for more sharing
This has no noticeable affect on pageload.
Comment 22•20 years ago
|
||
I tested changing the size of the ns[C]AutoString initial, stack-based, buffer.
My testcase first created a new profile, then loaded a +2M html-file (saved
bugzilla buglist) from localhost. When the page had loaded ready, I quit moz.
In this testcase there is a ~5.5% reduction of allocation and reallocation if
using a buffer of size 128 compared to a 64. I guess this could do a small
change on Tp?
There might be costs involved with increasing the size of an object that is as
commonly used as nsAutoString, even if it's stackbased. The reason is that on
some systems you'll end up forcing the OS to muck around with increasing the
stacksize when it runs out of what it originally had planned for.
I'm not sure if this is a problem on the platforms that moz runs on these days,
but I do think testing needs to be done across at least the most important
platforms before we do something like this.
IMHO it would be interesting to see some numbers on the patch from comment 3
though. Yes, number of realloc()s is in itself not something to optimize for,
however the benefit of reducing them might very well outweigh the cost of the
patch. Possibly align-code like that should be platform dependant and we'd only
use it on platforms where we know to benefit from it (like win32 apparently)
Updated•15 years ago
|
QA Contact: string
Updated•15 years ago
|
Assignee: dbaron → nobody
Priority: -- → P4
Target Milestone: --- → Future
Assignee | ||
Updated•4 years ago
|
Component: String → XPCOM
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•