Open Bug 232927 Opened 21 years ago Updated 2 years ago

investigate further performance improvements in strings

Tracking

()

Status:

NEW

Milestone:

Future

People

(Reporter: dbaron, Unassigned)

References

Details

(Keywords: perf)

Attachments

(5 files, 1 obsolete file)

align malloc size 21 years ago Bernard Alleysson 1.64 KB, patch		Details \| Diff \| Splinter Review
round up size to 2^n 20 years ago Bernard Alleysson 2.25 KB, patch		Details \| Diff \| Splinter Review
preliminary patch for non-threadsafe refcounting 20 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 20.07 KB, patch		Details \| Diff \| Splinter Review
preliminary patch for non-threadsafe refcounting 20 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 20.73 KB, patch		Details \| Diff \| Splinter Review
patch for more sharing 20 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 3.39 KB, patch		Details \| Diff \| Splinter Review
nsAutoString buffer size allocation compare 20 years ago Mikael Parknert 1.70 KB, text/html		Details

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Description

•

21 years ago

Once bug 231995 lands, there are some other performance improvements to strings
that I'd like to investigate (i.e., write them and test to see if they're really
improvements):

1. Remove threadsafe reference counting by having flag for strings accessed on
   multiple threads -- and prevent them from sharing.

2. Have an nsSubstring::PromoteToShared() const method (use
   |mutable_this| inside) that makes a dependent string, string with
   adopted buffer, or auto string (probably not worth it until not
   threadsafe) shared.  Use this in Assign when the buffer isn't already
   shared and it's not a stack-to-stack assignment.

3. Implement multiple sizes of nsAutoString (nsStackString16, nsStackString32,
   etc.).  (I think Stack is a better name than Auto, and it's a chance to
   fix the ns*C*AutoString inconsistency with nsStack*C*String.)

4. Have nsAutoString set a flag that indicates that it has a stack buffer so
   we have the ability to use that stack buffer in an assignment after we've
   moved to a non-stack buffer (especially important if we do the nsAutoString
   part of (2).

Darin Fisher

Comment 1

•

21 years ago

this list looks great.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 2

•

21 years ago

5. investigate tweaks in the capacity doubling code:
  * whether to double initially when going from immutable to mutable
  * whether to continue up powers of two or use 
    max(2*current capacity, required capacity)

Bernard Alleysson

Comment 3

•

21 years ago

Attached patch align malloc size — Details — Splinter Review

Maybe the malloc size should be 8-byte aligned. This is what win32 malloc does
so this is free and it should help reduce reallocation count. Maybe string data
access should occur on aligned memory so one should ensure that
sizeof(nsStringHeader) is always 8 like it seems to be now. This patch is
completely untested.

Darin Fisher

Comment 4

•

21 years ago

Thanks for the patch Bernard.  I'll incorporate that into my patch for bug
231995 since that hasn't landed yet.

Brendan Eich [:brendan]

Comment 5

•

21 years ago

What if malloc can return an address that's 4 mod 8?  If that's possible, then
rounding size up to a multiple of 8 is not going to align the address (alignment
usually refers to addresses, not sizes, so it seems the code must be assuming 0
mod 8 pointer alignment from malloc).

Is the idea simply to overallocate in case the string grows by 1-7 chars and
there's room in the rounded-up size for the growth?  Why is it harmful (to
performance, presumably) to let realloc worry about that, if the malloc heap
uses 8-byte alignment?

What about other platforms?

The patch is against a file in the attic, and I'm rusty on string code anyway,
but I hope my comment here is helpful.

/be

Bernard Alleysson

Comment 6

•

21 years ago

(In reply to comment #5)
> What if malloc can return an address that's 4 mod 8? 

Then this code will allocate a few more bytes than necessary.

> Is the idea simply to overallocate in case the string grows by 1-7 chars and
> there's room in the rounded-up size for the growth?

Yes it is exactly what is intended

> Why is it harmful (to
> performance, presumably) to let realloc worry about that, if the malloc heap
> uses 8-byte alignment?

I think that it's a liitle gain to avoid some realloc() just for a few bytes.
Those realloc() calls should be fast though because there's actually nothing to
realloc in this case because malloc() did round up the size previously. It just
avoids the "realloc" code path in the string code.

> 
> What about other platforms?

The constant 7 used to round up the size should be 3 if malloc returns 4-byte
aligned adresses or could be changed to something like:
"2 * sizeof(size_t) - 1" (platform specific malloc returns 8-byte aligned
addresses) or "sizeof(size_t) - 1" (4-byte aligned adresses). 
If 7 is used when 4 mod 8 addresses are returned by malloc then a few bytes will
be wasted by the rounded-up size (too much round-up).

Darin Fisher

Comment 7

•

20 years ago

Hmm... I'm going to hold off on this malloc alignment patch until after the
string branch lands.

Bernard Alleysson

Comment 8

•

20 years ago

Comment on attachment 141418 [details] [diff] [review]
align malloc size

some testing shows that this patch saves 50% of realloc() calls.
("ReallocCount" string statistics going from 196954 to 97751 browsing a few web
sites)

Bernard Alleysson

Comment 9

•

20 years ago

Attached patch round up size to 2^n — Details — Splinter Review

This is another variation.
It saves 15% more reallocations compared to the previous patch.
I think that it can be efficient especially when the platform malloc()
implementation uses links of blocks of size 8, 16, 32, ... for small blocks.
If the malloc() implementation is not efficient then another idea would be to
use several linked lists of arena allocated small blocks of size 8, 16, ... and
reuse the free blocks without actually calling free() (for instance using one
nsFixedSizeAllocator for each block size:
http://lxr.mozilla.org/seamonkey/source/xpcom/ds/nsFixedSizeAllocator.h#41).

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 10

•

20 years ago

We gained significant memory use improvements when we stopped doing that a
number of years ago.  IIRC, something like 70% of strings are never modified.

Brendan Eich [:brendan]

Comment 11

•

20 years ago

Why should we optimize to reduce realloc count, and nothing else?  realloc is
not the dominant cost, and you're not measure heap fragmentation or its effect,
virtual memory (vm size and even resident set size) bloat.  Size-classifying
allocators, especially power-of-two ones, are fragmentation prone.

See waterson's research from several years ago into the Lea malloc, at
http://www.mozilla.org/projects/footprint/allocators.html.

Bernard, I don't think this patch is being subject to the right system tests.

/be

Robert Kaiser

Comment 12

•

20 years ago

As far as I've seen, maxheap went up on brad as well after the checkin of the
string branch... Would this patch help that as well? Would the patch help
Txul/Ts even more? I guess that's the some of the really interesting questions...

Darin Fisher

Comment 13

•

20 years ago

please see my notes in bug 234864 concerning dbaron's point #4 in comment #0 of
this bug.

Bernard Alleysson

Comment 14

•

20 years ago

I think that to improve the string library you have to improve on sharing and/or
improve allocations/reallocation. So when I first noticed the high reallocation
count noted in bug 231995 I thought that something could be improved here. I
don't like my patch to round up to 2^n very much but I just wanted to see how
many reallocations I could cut down that way, especially when I saw the 50% gain
from the first patch . That patch (the 1st one) is different because I think
that it is free on most platforms and even if it suppresses some realloc() that
are almost no-ops it can avoid some noise in the malloc stats.
Theses two patches can be run against the pageloader tests to see if there are
improvements or not. If there's too little performance gain then we'll know for
sure that reallocation is not important.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 15

•

20 years ago

Attached patch preliminary patch for non-threadsafe refcounting (obsolete) — Details — Splinter Review

I should probably do a more thorough audit rather than just waiting for
assertions and fixing them when I see them.  And I should probably test the
installer...

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 16

•

20 years ago

Attached patch preliminary patch for non-threadsafe refcounting — Details — Splinter Review

same, but with better comments

Attachment #141870 - Attachment is obsolete: true

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 17

•

20 years ago

Comment on attachment 141871 [details] [diff] [review]
preliminary patch for non-threadsafe refcounting

Note that, given that it may (given some of the other suggestions above) soon
be possible for any string to become a string with a shared buffer, it would be
better to assert when assigning from any string to any string.	However, we
can't do that because there's no place to store the original thread for strings
without shared buffers since increasing the size of nsString breaks the
nsStringContainer code.

The weakness of the assertions is particularly problematic for the case of
nsAutoString, where only a large string would cause problems.

Darin Fisher

Comment 18

•

20 years ago

dbaron,

if i want to let JS share one of our sharable buffers, do i need to worry about
it releasing that buffer on a background thread?  IOW, does this patch prevent
me from implementing sharing from C++ to JS in xpconnect?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 19

•

20 years ago

Comment on attachment 141871 [details] [diff] [review]
preliminary patch for non-threadsafe refcounting

FWIW, I just tested this on btek, and it gains less than a percent (probably
about half) on pageload time.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 20

•

20 years ago

Attached patch patch for more sharing — Details — Splinter Review

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 21

•

20 years ago

Comment on attachment 141934 [details] [diff] [review]
patch for more sharing

This has no noticeable affect on pageload.

Markus Hübner

Updated

•

20 years ago

Keywords: perf

Mikael Parknert

Comment 22

•

20 years ago

Attached file nsAutoString buffer size allocation compare — Details

I tested changing the size of the ns[C]AutoString initial, stack-based, buffer.
My testcase first created a new profile, then loaded a +2M html-file (saved
bugzilla buglist) from localhost. When the page had loaded ready, I quit moz.

In this testcase there is a ~5.5% reduction of allocation and reallocation if
using a buffer of size 128 compared to a 64. I guess this could do a small
change on Tp?

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 23

•

20 years ago

There might be costs involved with increasing the size of an object that is as 
commonly used as nsAutoString, even if it's stackbased. The reason is that on 
some systems you'll end up forcing the OS to muck around with increasing the 
stacksize when it runs out of what it originally had planned for.

I'm not sure if this is a problem on the platforms that moz runs on these days, 
but I do think testing needs to be done across at least the most important 
platforms before we do something like this.

IMHO it would be interesting to see some numbers on the patch from comment 3 
though. Yes, number of realloc()s is in itself not something to optimize for, 
however the benefit of reducing them might very well outweigh the cost of the 
patch. Possibly align-code like that should be platform dependant and we'd only 
use it on platforms where we know to benefit from it (like win32 apparently)

Phil Ringnalda (:philor)

Updated

•

15 years ago

QA Contact: string

Benjamin Smedberg

Updated

•

15 years ago

Assignee: dbaron → nobody

Priority: -- → P4

Target Milestone: --- → Future

Nobody; OK to take it and work on it

Assignee

Updated

•

3 years ago

Component: String → XPCOM

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

You need to log in before you can comment on or make changes to this bug.