Closed
Bug 28221
Opened 25 years ago
Closed 16 years ago
[tracking bug] profile string usage; deploy new implementations where appropriate
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: warrensomebody, Assigned: jag+mozilla)
References
Details
(Keywords: helpwanted, perf, Whiteboard: HELPWANTED)
Attachments
(8 files)
558 bytes,
text/plain
|
Details | |
14.08 KB,
patch
|
Details | Diff | Splinter Review | |
15.92 KB,
patch
|
Details | Diff | Splinter Review | |
7.72 KB,
patch
|
Details | Diff | Splinter Review | |
39.32 KB,
patch
|
Details | Diff | Splinter Review | |
133.10 KB,
text/plain
|
Details | |
181.48 KB,
text/plain
|
Details | |
3.12 KB,
patch
|
Details | Diff | Splinter Review |
I still believe that an immutable nsIString interface coupled with appropriate
implementations could be a huge win for us in terms of both space and time.
There would need to be at least 4 implementations to make this work:
- nsUnicharString for double-byte encoding,
- nsCString for single-byte encoding,
- nsSubString would manage a lengh and offset into another nsIString to avoid
copying,
- nsConcatenatedString would manage a sequence of nsIStrings, treating them as a
single concatenated string.
To determine whether my hypothesis is correct, I think we can instrument
nsString and nsCString to gather statistics that indicate how many copies of
strings we make in the process of running our app. Specifically:
- Count the number of times each nsString constructs a char/PRUnichar array.
This is often done when passing them to IDL-generated interfaces. (This number
could be completely eliminated with nsIString.) [ToNewString, ToNewCString,
ToNewUnicode, ToCString]
- Count the number of times we construct nsStrings from char/PRUnichar arrays.
This is often done when we want to manipulate strings that come in from
IDL-generated interfaces. (Some number of these could be eliminated with
nsIString.) (How can we break down first-time constructions from copies -
histogram?)
- Count the number of times we assign the character sequence in a string. Also
count the percentage of strings which are actually assigned. (This number would
indicate the number of additional nsIStrings which would need to be created due
to immutability.) [SetString, Assign, operator=]
- Count the number of substring operations done on nsStrings. (This number could
be replaced by an allocation of an nsSubString object.) [SetLength, Truncate,
Trim, Left, Mid, Right, Cut]
- Count the number of concatenation operations done on nsStrings. Also count the
percentage of strings which are concatenated. (This number could be replaced by
an allocation of an nsConcatenatedString object, saving space.) [operator+,
operator+=, Append, Insert]
- Count the number of mutation operations done on nsStrings. Also count the
percentage of strings which are mutated. (This number would indicate how often
an actual string buffer (e.g. the existing nsString implementation) would
continue to be needed.) [SetCharAt, ToLowerCase, ToUpperCase, StripChars,
StripWhitespace, ReplaceChar, ReplaceSubstring, CompressSet, CompressWhitespace]
Once we have these counts, we can see can do some analysis to determine what
sort of ramifications nsIString might have: Will we allocate far fewer strings
because they're shared more? Will we need to allocate far more substring objects
because they're mutated too often? What sort of space might we expect to save
due to more sharing. What sort of space might we expect to loose due to more
copies made as a result of mutation.
Right now we're in the dark.
Reporter | ||
Updated•25 years ago
|
Keywords: perf
Summary: investigate nsIString → investigate nsIString
Comment 1•25 years ago
|
||
Just counting constructors wont be good I think. We should factor in how many
are destroyed to get a figure of how many will exist. That will influence the
space issue.
Comment 2•25 years ago
|
||
n.b. that vidur & troy are exploring some kind of BSTR-like stuff to reduce
copies in layout. Not sure if their stuff would cross interface boundaries...
Updated•25 years ago
|
Status: NEW → ASSIGNED
Comment 4•25 years ago
|
||
I have wanted this for a long while. With this we could get rid of the
nsXPIDLCString. Adding myself to cc.
Comment 5•25 years ago
|
||
We could get rid of *some* uses of nsXPIDL(c)String -- those where the string
is immutable. If the caller is going to manipulate the string anyway then it
doesn't buy anything.
Comment 6•25 years ago
|
||
Another plan for getting rid of |nsXPIDL[C]String| is to roll its functionality
into |nsString|. See related bugs
<http://bugzilla.mozilla.org/show_bug.cgi?id=28846> -- alecf
<http://bugzilla.mozilla.org/show_bug.cgi?id=28841> -- scc
Updated•25 years ago
|
Target Milestone: M17
Comment 7•25 years ago
|
||
I'm tinkering with instrumenting nsStr with the stack walking code n' stuff to
see what intersting statistics I can produce.
Comment 8•25 years ago
|
||
Ok, made a first cut at gathering some stats. It's not everything that you asked
for, warren, but most of the ones that were really easy to pick up. The data is
below for simple startup & shutdown with www.cnn.com as the homepage.
For each operation, I captured the number of times the operation occurred, the
total number of characters (one- or two-byte) that were involved in the
operation, and the mean and standard deviation of the number of characters per
operation (cuz I knew warren'd ask).
For ctors and Assign operations, I tried to deduce when copy-on-write (COW)
sharing would and would not occur; e.g., "nsString::nsString(const nsStr&) COW"
indicates that the incoming nsStr's buffer could be shared. "NOCOW" means that
the incoming nsStr's buffer was incompatible, and would need to be inflated to
two-byte.
ns[C]String::Append(const nsStr&) sometimes needs to inflate[deflate] the
inbound nsStr, so I factored out INFL[DEFL] appends from normal appends. (This
may be indicative of how useful a segmented buffer implementation would be.)
------ Characters ------
Operation Count Total Mean StdDev
nsCString::Append(char) 3081 3081 1 +/- 0
nsCString::Append(const char*) 33624 704152 21 +/- 85
nsCString::Append(const nsCString&) 1317 17939 14 +/- 11
nsCString::Append(const nsStr&) DEFL 1213 22236 18 +/- 17
nsCString::Assign(const PRUnichar*) 62 3553 57 +/- 19
nsCString::Assign(const char*) 2139 36108 17 +/- 22
nsCString::Assign(const nsStr&) COW 85 1720 20 +/- 4
nsCString::Assign(const nsStr&) NOCOW 7217 551108 76 +/- 48
nsCString::Cut() 570 5015 9 +/- 5
nsCString::ToNewCString 7494 357502 48 +/- 26
nsCString::ToNewUnicode 64 3559 56 +/- 21
nsCString::nsCString() 35033 0 0 +/- 0
nsCString::nsCString(const PRUnichar*) 62 3553 57 +/- 19
nsCString::nsCString(const char*) 105 5971 57 +/- 68
nsCString::nsCString(const nsCString&) COW 7655 364331 48 +/- 26
nsCString::nsCString(const nsStr&) NOCOW 6896 230701 33 +/- 36
nsString::Append(PRUnichar) 153883 153883 1 +/- 0
nsString::Append(char) 982 982 1 +/- 0
nsString::Append(const PRUnichar*) 31627 676237 21 +/- 202
nsString::Append(const char*) 32759 816672 25 +/- 409
nsString::Append(const nsStr&) 11694 71151 6 +/- 8
nsString::Append(const nsStr&) INFL 236 290 1 +/- 1
nsString::Append(const nsString&) 19554 292897 15 +/- 25
nsString::Cut() 4897 68723 14 +/- 228
nsString::Insert(PRUnichar) 1383 1383 1 +/- 0
nsString::Insert(const char*) 70 140 2 +/- 0
nsString::SetCharAt() 4591 4591 1 +/- 0
nsString::ToNewCString 2351 56711 24 +/- 23
nsString::ToNewUTF8String 4204 169205 40 +/- 41
nsString::ToNewUnicode 3674 56392 15 +/- 13
nsString::nsString() 133552 0 0 +/- 0
nsString::nsString(const PRUnichar*) 1055 14819 14 +/- 8
nsString::nsString(const char*) 1953 253280 130 +/- 1671
nsString::nsString(const nsStr&) NOCOW 230 6670 29 +/- 52
nsString::nsString(const nsString&) COW 10843 222664 21 +/- 526
Comment 9•25 years ago
|
||
Comment 10•25 years ago
|
||
Comment 11•25 years ago
|
||
Attached extra files and patches required to gather statistics.
Comment 12•25 years ago
|
||
Reporter | ||
Comment 13•25 years ago
|
||
Here's another run, with more functions accounted for, and visiting more sites:
------ Characters ------
Operation Count Total Mean StdDev
nsCString::Append(char) 8600 8600 1 +/- 0
nsCString::Append(const char*) 157749 2465740 16 +/- 51
nsCString::Append(const nsCString&) 2348 27977 12 +/- 10
nsCString::Append(const nsStr&) DEFL 3549 59550 17 +/- 17
nsCString::Assign(const PRUnichar*) 347 15652 45 +/- 14
nsCString::Assign(const char*) 5762 185482 32 +/- 43
nsCString::Assign(const nsStr&) COW 152 2978 20 +/- 3
nsCString::Assign(const nsStr&) NOCOW 33221 2086625 63 +/- 45
nsCString::Cut() 3306 26758 8 +/- 5
nsCString::SetCharAt() 196 196 1 +/- 0
nsCString::ToNewCString 35072 1739748 50 +/- 30
nsCString::ToNewUnicode 375 15521 41 +/- 18
nsCString::nsCString() 133167 0 0 +/- 0
nsCString::nsCString(const PRUnichar*) 347 15652 45 +/- 14
nsCString::nsCString(const char*) 496 57162 115 +/- 93
nsCString::nsCString(const nsCString&) COW 35300 1747820 50 +/- 30
nsCString::nsCString(const nsStr&) NOCOW 37316 1381001 37 +/- 42
nsString::Append(PRUnichar) 215192 215192 1 +/- 0
nsString::Append(char) 1996 1996 1 +/- 0
nsString::Append(const PRUnichar*) 146947 3483496 24 +/- 152
nsString::Append(const char*) 186147 3901793 21 +/- 195
nsString::Append(const nsStr&) 44255 318232 7 +/- 34
nsString::Append(const nsStr&) INFL 405 1578 4 +/- 4
nsString::Append(const nsString&) 100415 1785264 18 +/- 36
nsString::Assign(PRUnichar) 27116 27116 1 +/- 0
nsString::Assign(char) 322 322 1 +/- 0
nsString::Assign(const PRUnichar*) 28497 464960 16 +/- 52
nsString::Assign(const char*) 88503 3087143 35 +/- 281
nsString::Assign(const nsStr&) COW 133837 1256708 9 +/- 35
nsString::Assign(const nsStr&) NOCOW 11782 83533 7 +/- 16
nsString::Cut() 31814 578205 18 +/- 249
nsString::Insert(PRUnichar) 8674 8674 1 +/- 0
nsString::Insert(const char*) 1461 2932 2 +/- 0
nsString::SetCharAt() 41303 41303 1 +/- 0
nsString::ToNewCString 8461 224481 27 +/- 36
nsString::ToNewUTF8String 28337 1151229 41 +/- 43
nsString::ToNewUnicode 3177 53286 17 +/- 12
nsString::nsString() 705595 0 0 +/- 0
nsString::nsString(const PRUnichar*) 4675 83960 18 +/- 17
nsString::nsString(const char*) 16266 494636 30 +/- 651
nsString::nsString(const nsStr&) NOCOW 490 39884 81 +/- 114
nsString::nsString(const nsString&) COW 33812 515520 15 +/- 329
TOTAL 2326782 27657905
Here, 12.7% of the characters fall into the COW category. On a previous run for
just the mozilla.org page, I got 14.4%. Seems like we can safely assume
that we can save >10% by doing COW.
Comment 14•25 years ago
|
||
Three things to test:
(1) put a flag in |nsStr| to simulate COW semantics ... "this is a reference"
then charge subsequent mutators with the cost of an allocation/copy
this will help us better determine the value of adding COW
(2) count the number of times a string had mutators applied to it, this will
help us better determine the value of adding, e.g., an |nsIImutableString|
(3) put a time in the string, and whenever it changes size or when it gets
destroyed, add its duration to a bucket for that size. This will help
us better determine the value of adding, e.g., arena based allocation
for some capacities.
Comment 15•25 years ago
|
||
*** Bug 28842 has been marked as a duplicate of this bug. ***
Comment 16•25 years ago
|
||
*** Bug 26435 has been marked as a duplicate of this bug. ***
Reporter | ||
Comment 17•25 years ago
|
||
Interesting news! I implemented number (2) to determine how many strings are
mutated. Here are the results:
Allocated strings = 833582
Mutated strings = 551607
Unmutated strings = 281252
That's over 50%!! This was for a run visiting about 10-15 pages, including
tinderbox, a build log, cnn, yahoo, abcnews and others.
If you add my numbers up you'll see that there are 700+ strings unaccounted
for. This is because I only determine the number of mutated/unmutated strings
when they're destroyed, so the remining ones must be leaks.
Here's another longer run (60%):
Allocated strings = 1231811
Mutated strings = 777911
Unmutated strings = 467373
and here's one bringing up my mailbox (with 4000+ messages), and forwarding a
message with a lot of extra typing added to it (39%):
Allocated strings = 2170221
Mutated strings = 1594744
Unmutated strings = 621191
Immutable strings should be a huge win!
Comment 18•25 years ago
|
||
That's what I had informally determined as well, that immutable strings would
be a big win
Comment 19•25 years ago
|
||
And that's not even counting char* and PRUnichar* strings that are manually
alloc'd and are never in nsStrings. Right?
Comment 20•25 years ago
|
||
I instrumented |ns[C]String| to determine the amount of character copying and the
number of allocations that would be saved by implementing a Copy-On-Write [COW]
mechanism without necessarily changing the current interface. Here is a sample
run, typical of my results to date.
un-shared work: 15313192
COW work: 1183682
un-shared allocations: 416265
COW allocations: 460985 or about 10.74% more than un-shared
allocations
Yes, we save a lot on copying characters, but we actually end up doing _more_
allocations. The reason? One explanation is callers copying strings and making
small modifications in extant string variables explicitly for this purpose. So,
for example, I have a string variable into which I copy another string (the
allocation and copying are deferred), but now I modify it (and am charged for the
allocation, and some fraction of the copying, depending on the operation) and do
something with it, like compare. Then I copy another string into it (the current
value is released, but the allocation and copy are deferred again), and then, as
before, immediately make a change... now I'm charged for an allocation that I
wouldn't have been in the non-COW implementation.
Interesting results.
Comment 21•25 years ago
|
||
Reporter | ||
Comment 22•25 years ago
|
||
Comment 23•25 years ago
|
||
One comment about SCC's analysis: the 2nd copy would not necessarily require a
subsequent allocation. We can implement COW so that the underlying buffer is
retained until the string is deleted or resized. The original buffer could be
reused in the 2nd copy, so that the 2nd allocation (may not) be neccessary. Of
course a great deal depends on the size of the strings being operated upon.
Comment 24•25 years ago
|
||
With respect to the notion of a segmented string implementation: note that
|GetUnicode| is called in more than a thousand places. |GetBuffer| is called
less, but still quite a bit.
<http://lxr.mozilla.org/seamonkey/search?string=GetUnicode>
<http://lxr.mozilla.org/seamonkey/search?string=GetBuffer>
Both are obstacles to implementing a segmented string since callers expect the
resulting pointer to point to the entire buffer, and they do math with it or pass
it to things expecting an entire string.
Comment 25•25 years ago
|
||
I agree, Scott. It makes you wish we had iterators, doesn't it?
Reporter | ||
Comment 26•25 years ago
|
||
I did some more analysis... of how many strings have GetBuffer or GetUnicode
called for them. Here's the answer:
Allocated strings = 756941
Mutated strings = 521389 (68%)
Unmutated strings = 245669 (32%)
Contiguous buffers = 140836 (18%)
This was for visiting mozilla.org, cnn.com, abcnews.com, usatoday.com.
GetBuffer and GetUnicode were only called for 18% of the strings. So I think a
non-contiguous buffer implementation could still be a win. I didn't count how
many times GetBuffer/GetUnicode were called for the same string, but that would
be easy to add.
On another note... I must have been on crack when I reported the percentages
for unmutated strings. For the 3 runs I listed above, the percentages are
33.74%, 37.94% and 28.62% respectively. And for the above run, 32%. Still a
win, although not quite as spectacular as first reported.
Reporter | ||
Comment 27•25 years ago
|
||
P.S. My GetBuffer/GetUnicode analysis doesn't include places in the code that
use mStr directly, so it's an upper bound. If there's a critical place in the
code that uses mStr, then my numbers could be completely off.
Comment 28•25 years ago
|
||
I am posting the following set of recommendations to this bug to keep external
developers informed of the direction in which we are heading. The main players
in this impending change are already, as far as I know, all on the same page.
Rick Gessner <rickg@netscape.com> recently sent out his recommendations---which
I hope he will also post to this bug---which touch on the same themes. Rick's
recommendations paint a fairly good picture of the new world. I still think
the following evaluation is valuable, because it goes into detail as to what the
actual changes are/should-be, and _why_ those particular changes are important.
We want to move to a world where string clients can select from among a range of
implementations that embody different implementation strategies, e.g., a
general purpose string such as we have now, and specific-use implementations
like an immutable string that optimizes allocations, and a segmented string that
minimizes character copying over editing operations on very large datasets.
These new goals impose new requirements on our current string interfaces. Any
changes we make to the current interface must be source compatible with extant
clients, or we must be willing to pay the penalty of updating callers.
Note: our new goals fall out of our experiences using strings in our
application. They differ significantly from our original goals (which were all
about revealing the underlying implementation to clients for performance) and
so none of these recommendations can or should be taken as a criticism of the
current interface.
Specific recommendations fall into several categories (note: these are not the
the recommendations ... these are the categories). Very roughly in order of
importance with respect to this effort:
[A] removing from the interface any visible members that compromise the
abstraction allowing different underlying representations, else clients
won't be satisfied by alternate implementations
[B] removing from the interface any routines that aren't specifically about
manipulating the underlying representation, else alternate implementations
must re-implement identical functionality
[C] removing from the interface any i18n sensitive functionality, though
mostly instances of this recommendation will be covered by [B] above
[D] removing from the interface unused, unneeded, or unconventionally located
functionality, to reduce the burden on alternate implementations, and to
generally simplify
[E] adding to the interface any support machinery needed to enable changes
falling into one of the categories above, or simply to allow multiple
implementations at all, e.g., |virtual|
I believe rickg and I are already very much in agreement on these points. We
discussed them at length, and his recent message on redesign notes echoes these
sentiments. It is clear from his recent email messages that he has been
focused on these same key issues.
Here are my specific recommendations:
1 [A] Remove public inheritance from |nsStr|. Access to a specific
underlying buffer representation is prohibitive to alternate
implementations, e.g., a segmented string et al. It is also agreed that
having any public data members is a political impediment to crossing
XPCOM boundaries. According to rickg, visible inheritance from |nsStr|
is not exploited heavily, and should be easily removed. This is arguably
the most important thing we can do to enable further enhancements to our
string implementations and uses.
2 [A] |GetUnicode| and |GetBuffer| impose a prohibitive burden on
implementations in a multiple implementation world. As rickg points out,
this is another reason to add iterators. Unfortunately, these two
routines are very heavily used.
3 [E] Make the string interface abstract to allow multiple implementations.
We were already paying for a vtable, so no extra space requirements are
expected. The performance impact should be minimal.
4 [E] Split the abstract interface into layers encouraging read-only
implementations, e.g., an immutable string
5 [BCD] Make narrowing/widening an explicit operation done by constructors.
Do not allow implicit conversion in append and assign operations. Tests
show that we are not exploiting the `double interface' of string very
much, and this is good. Note that like |ToUpperCase| (et al)
functionality mentioned below, encoding conversions are properly in the
domain of i18n, and duplicating the functionality at the low-level in
string is suboptimal.
6 [BDE] Either remove operator overloading from the abstract interface, or
implement it conventionally, that is: non-virtual inlines using only the
abstract signatures for |Append| and |Assign|. Implement |operator+=()|
and |operator=()| as members; implement |operator+()| and relations as
non-members. Virtual assignment operators must be written carefully to
avoid slicing.
7 [BCDE] Remove |ToUpperCase|, |ToLowerCase|, |IsASCII|, |IsUnicode|,
|IsSpace|, and |IsAlpha| from the interface. Of these, only |ToLowerCase|
is heavily used, and i18n functionality like this must be pushed up into
the i18n layer, where, coincidentally, this functionality already happens
to exist.
8 [D] Remove (the little used) |ToNewString| from the interface. This
functionality is already available in the form of the copy-constructor.
In a multiple implementation world, the user will typically need to
select a specific implementation, in any case.
9 [BCD] Remove |IsOrdered| and |BinarySearch| from the interface. These are
not
general purpose routines, and can easily be implemented outside the
string class if they are deemed still needed.
10 [BCD] |EqualsIgnoreCase| and the |Compare| functions when the
|aIgnoreCase| parameter is true are problematic just as the other i18n
dependent routines are. Unfortunately, these routines are very heavily
used. Again, they are a burden in a multiple implementation world. They
should be implemented as non-members (based on extant i18n facilities)
that use iterators into the underlying string ... which also implies that
we will need string iterators.
11 [BCD] |ToFloat| and |ToInteger| should be removed from the interface.
Parsing should not be part of the required functionality for multiple
implementations. Given iterators, this functionality could be moved to
a non-member implementation, which, in any case, is again requires i18n
sensitivity. |ToFloat| is not heavily used. |ToInteger| is.
Similarly, the |Append|s that format a float or an integer are i18n
dependent. Some work may be required to provide similar functionality
that is factored into the i18n support.
12 [D] We probably don't need the power to say something like
myStr.Assign(yourStr).Append(herStr).Cut(20, 15);
It makes sense with operators, but we may want to simplify the client
interface with respect to named member functions. The |Assign|, |Append|,
|Insert|, |Cut|, and |SetString|, signatures should be changed to return
|void|.
13 [BCD] Turn the specialized modification and accessor functions |Trim|,
|CompressSet|, |StripChar|, |StripChars|, |StripWhitespace|,
|ReplaceChar|, |ReplaceSubstring|, and |CountChar| into non-member
`algorithms' that can be applied to any implementation.
14 [DE] Given the current copying signatures of |Left|, |Right|, and |Mid|,
they should probably be turned into non-member algorithms writing to an
iterator as well.
15 [E] Add iterators. Several of the points above are eased or solved by the
introduction of reasonable iterators.
16 [AD] Eliminate |nsSubsume[C]Str|. To much implementation knowledge is
currently required to reasonably utilize this in clients, and it presents
a burden to implementations to facilitate.
Updated•25 years ago
|
Priority: P3 → P1
Comment 30•25 years ago
|
||
fixing summary to better reflect our understanding
Summary: investigate nsIString → factor |ns[C]String| to allow specialized implementations
Comment 31•25 years ago
|
||
mass re-assigning to my new bugzilla account
Assignee: scc → scc
Status: ASSIGNED → NEW
Updated•25 years ago
|
Status: NEW → ASSIGNED
Comment 32•25 years ago
|
||
Well, NEW_STRING_APIS is now switched on. The factoring is accomplished. And
some new implementation exist to solve some problems. We need a replacement for
XPIDL string; we need a COW implementation; we need to deploy the new
implementations. I'm re-summarizing this bug for the work of measuring and
deploying the new implementations.
Keywords: helpwanted
Summary: factor |ns[C]String| to allow specialized implementations → profile string usage; deploy new implementations where appropriate
Whiteboard: HELPWANTED
Target Milestone: M16 → M20
Comment 33•25 years ago
|
||
Putting on nsbeta3 radar. warren say we really need to get this in for PR3, per
beta2 PDT reviews.
Keywords: nsbeta3
Comment 34•24 years ago
|
||
Simon, can you add your recent profiling work to this bug?
For everyone else, under discussion is the idea that `chunk' allocating strings
has turned out to be a bigger source of wasted space than it has been a
performance boon.
(oops, too many cc's, Bugzilla is making me remove one to add simon. Sorry dp)
Comment 35•24 years ago
|
||
Adding newsgroup postings on string usage:
<news:sfraser-F78927.13360202062000@secnews1.mcom.com>
It seems that nsString::SetCapacity() always buffers the string
size in 64-character chunks - this logic lives down in nsStr::Alloc.
So:
nsString foo;
foo.SetCapacity(1);
foo.Assign('a');
will eat up 128 bytes of heap space.
I have not found a way that I can set the capacity of an nsString
to exactly the length I know is needed. This of course has quite
an impact on bloat.
Seeing this leads me to question how often we actually need to
chunk changes in string length; what proportion of strings actually
change length during their lifetime? My guess is that it's < 50%,
which perhaps suggests that the normal behaviour should be to
not round up string sizes, and that we should have an API that
allows the caller to create a string with, or specify that an
existing string is likely to change length frequently from now
on.
<news:sfraser-0720CE.17500902062000@secnews1.mcom.com>
Some data on the bloat that results from string chunking (recall,
bloat = total memory every allocated, not a runtime high-water mark).
Numbers are K.
Test Allocated Used Waste % waste
--------------------------------------------------------------
Simple browser 2938.00 1548.98 1389.01 47.28%
Complex browser 5839.17 3214.73 2624.44 44.95%
Mail 6232.82 3369.89 2862.93 45.93%
So this chunking almost doubles the amount of memory that our
strings use.
<news:sfraser-D08810.16523102062000@secnews1.mcom.com>
In another post, waterson posed the question of how many strings with
identical contents are allocated, and whether we could use atoms for
these common strings. (He was, I think, talking about string usage in
a particular module/API, but the question can be generalized.)
So I put some debug code in nsStr::Destroy, that dumps out the contents
of strings just before they are deleted, if aDest.mOwnsBuffer == PR_TRUE
(which indicates that the buffer was heap-allocated). Some results are below.
These results can be used to find places is the code the might benefit from
shared strings, or cacheing of frequently used strings. Of course, I have
no data on call sites here.
The data look like this:
848 1 dummy:path
'848' is the count (# strings with these contents), '1' is the character width
(1 for char, 2 for PRUnichar), and the rest is the string itself.
Test 1:
Bring up browser, loading simple text-only HTML page, Quit.
<http://www.smfr.org/mozilla/sortedstrings.txt>
848 1 dummy:path
510 2 true
333 2 file:///Other%20stuff/Documents/Mozilla/Users50/Simon/localstore.rdf
303 2 monospace
278 2 component://netscape/layout/element-factory?namespace=http://
www.mozilla.org/xbl
277 1 component://netscape/layout/element-factory?namespace=http://
www.mozilla.org/xbl
255 2 geneva
205 2 component://netscape/layout/element-factory?namespace=http://
www.mozilla.org/keymaster/gatekeeper/there.is.only.xul
200 1 component://netscape/layout/element-factory?namespace=http://
www.mozilla.org/keymaster/gatekeeper/there.is.only.xul
167 1 file:///Other%20stuff/Documents/Mozilla/Users50/Simon/localstore.rdf
159 1 file:///Bleeding%20Edge/Mozilla%20tree/src/mozilla/dist/viewer_debug/
135 2 ISO-8859-1
123 2 UTF-8
117 2 serif
85 2 broadcaster
79 2 menuitem
78 2 menupopup
73 2 <string is a run of 28 spaces>
71 2 vertical
67 2 rdf:http://home.netscape.com/NC-rdf#Name
Test 2:
Bring up browser, surf to mozilla.org, tinderbox, bugzilla, load a bug,
open prefs dialog.
<http://www.smfr.org/mozilla/sortedbrowser.txt>
1797 2 monospace
1698 2 true
1007 2 serif
879 2 geneva
848 1 dummy:path
371 2 component://netscape/layout/element-factory?namespace=http://
www.mozilla.org/xbl
370 1 component://netscape/layout/element-factory?namespace=http://
www.mozilla.org/xbl
333 2 file:///Other%20stuff/Documents/Mozilla/Users50/Simon/localstore.rdf
322 2 ISO-8859-1
315 1 file:///Bleeding%20Edge/Mozilla%20tree/src/mozilla/dist/viewer_debug/
266 2 vertical
248 2 component://netscape/layout/element-factory?namespace=http://
www.mozilla.org/keymaster/gatekeeper/there.is.only.xul
243 2 white
236 1 component://netscape/layout/element-factory?namespace=http://
www.mozilla.org/keymaster/gatekeeper/there.is.only.xul
176 2 \
167 1 file:///Other%20stuff/Documents/Mozilla/Users50/Simon/localstore.rdf
159 2 UTF-8
128 1 css
125 2 never
121 2 black
Test 3:
Bring up browser, open mail-news, load 2 large IMAP folders
(including one of bugzilla mail)
<http://www.smfr.org/mozilla/sortedmail.txt>
7084 2 UTF-8
4785 2 us-ascii
2809 1 mozilla.org
2808 1 bugzilla-daemon
1304 2 true
1034 2 component://netscape/intl/unicode/decoder?charset=x-imap4-modified-utf7
1032 2 x-imap4-modified-utf7
1029 2 never
987 1 netscape.com
957 2 geneva
948 1 %S Receiving: message headers %lu of %lu
947 1 sfraser
941 2 Bugzilla
894 2 monospace
848 1 dummy:path
500 2 file:///Other%20stuff/Documents/Mozilla/Users50/Simon/localstore.rdf
488 2 autostretch
392 2 menuitem
358 2 component://netscape/layout/element-factory?namespace=http://
www.mozilla.org/xbl
357 1 component://netscape/layout/element-factory?namespace=http://
www.mozilla.org/xbl
Comment 36•24 years ago
|
||
Comment 37•24 years ago
|
||
Comment 38•24 years ago
|
||
Comment 39•24 years ago
|
||
Doug, I attached your patch and data to this bug ... which seems like the
appropriate place. What conclusions can we draw from this data? There are
certain very common strings, true, but this is not enough to know if they are
candidates for being replaced with |nsShared[C]String|s, since we don't know how
they were generated. What do you think? The other bug filed on this is bug
#46738. I commented there as well. We should consider marking that bug either a
duplicate or a blocker for this bug.
Updated•24 years ago
|
QA Contact: leger → kandrot
Updated•24 years ago
|
Component: XPCOM → String
OS: Windows NT → All
Hardware: PC → All
Target Milestone: --- → mozilla0.9.1
Comment 40•24 years ago
|
||
marking dependencies, turning this [officially] into a tracking bug for
deploying new string implementations
Updated•24 years ago
|
Summary: profile string usage; deploy new implementations where appropriate → [tracking bug] profile string usage; deploy new implementations where appropriate
Updated•24 years ago
|
Target Milestone: mozilla0.9.1 → ---
Updated•24 years ago
|
Priority: P1 → --
Assignee | ||
Updated•23 years ago
|
QA Contact: kandrot → jaggernaut
Comment 41•22 years ago
|
||
giving up ancient string bugs to the new string owner. jag, you'll want to sort
through these and see which ones still apply and go with or against the
direction in which you intend strings evolve
Assignee: scc → jaggernaut
Status: ASSIGNED → NEW
Assignee | ||
Comment 42•16 years ago
|
||
This work is no longer relevant, strings have had a new implementation for a while now :-)
If the new strings code needs performance tuning, please file new bugs.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX
Updated•4 years ago
|
Component: String → XPCOM
You need to log in
before you can comment on or make changes to this bug.
Description
•