558 bytes, text/plain
14.08 KB, patch
|Details | Diff | Splinter Review|
15.92 KB, patch
|Details | Diff | Splinter Review|
7.72 KB, patch
|Details | Diff | Splinter Review|
39.32 KB, patch
|Details | Diff | Splinter Review|
133.10 KB, text/plain
181.48 KB, text/plain
3.12 KB, patch
|Details | Diff | Splinter Review|
I still believe that an immutable nsIString interface coupled with appropriate implementations could be a huge win for us in terms of both space and time. There would need to be at least 4 implementations to make this work: - nsUnicharString for double-byte encoding, - nsCString for single-byte encoding, - nsSubString would manage a lengh and offset into another nsIString to avoid copying, - nsConcatenatedString would manage a sequence of nsIStrings, treating them as a single concatenated string. To determine whether my hypothesis is correct, I think we can instrument nsString and nsCString to gather statistics that indicate how many copies of strings we make in the process of running our app. Specifically: - Count the number of times each nsString constructs a char/PRUnichar array. This is often done when passing them to IDL-generated interfaces. (This number could be completely eliminated with nsIString.) [ToNewString, ToNewCString, ToNewUnicode, ToCString] - Count the number of times we construct nsStrings from char/PRUnichar arrays. This is often done when we want to manipulate strings that come in from IDL-generated interfaces. (Some number of these could be eliminated with nsIString.) (How can we break down first-time constructions from copies - histogram?) - Count the number of times we assign the character sequence in a string. Also count the percentage of strings which are actually assigned. (This number would indicate the number of additional nsIStrings which would need to be created due to immutability.) [SetString, Assign, operator=] - Count the number of substring operations done on nsStrings. (This number could be replaced by an allocation of an nsSubString object.) [SetLength, Truncate, Trim, Left, Mid, Right, Cut] - Count the number of concatenation operations done on nsStrings. Also count the percentage of strings which are concatenated. (This number could be replaced by an allocation of an nsConcatenatedString object, saving space.) [operator+, operator+=, Append, Insert] - Count the number of mutation operations done on nsStrings. Also count the percentage of strings which are mutated. (This number would indicate how often an actual string buffer (e.g. the existing nsString implementation) would continue to be needed.) [SetCharAt, ToLowerCase, ToUpperCase, StripChars, StripWhitespace, ReplaceChar, ReplaceSubstring, CompressSet, CompressWhitespace] Once we have these counts, we can see can do some analysis to determine what sort of ramifications nsIString might have: Will we allocate far fewer strings because they're shared more? Will we need to allocate far more substring objects because they're mutated too often? What sort of space might we expect to save due to more sharing. What sort of space might we expect to loose due to more copies made as a result of mutation. Right now we're in the dark.
Just counting constructors wont be good I think. We should factor in how many are destroyed to get a figure of how many will exist. That will influence the space issue.
n.b. that vidur & troy are exploring some kind of BSTR-like stuff to reduce copies in layout. Not sure if their stuff would cross interface boundaries...
adding myself to cc.
I have wanted this for a long while. With this we could get rid of the nsXPIDLCString. Adding myself to cc.
We could get rid of *some* uses of nsXPIDL(c)String -- those where the string is immutable. If the caller is going to manipulate the string anyway then it doesn't buy anything.
Another plan for getting rid of |nsXPIDL[C]String| is to roll its functionality into |nsString|. See related bugs <http://bugzilla.mozilla.org/show_bug.cgi?id=28846> -- alecf <http://bugzilla.mozilla.org/show_bug.cgi?id=28841> -- scc
I'm tinkering with instrumenting nsStr with the stack walking code n' stuff to see what intersting statistics I can produce.
Ok, made a first cut at gathering some stats. It's not everything that you asked for, warren, but most of the ones that were really easy to pick up. The data is below for simple startup & shutdown with www.cnn.com as the homepage. For each operation, I captured the number of times the operation occurred, the total number of characters (one- or two-byte) that were involved in the operation, and the mean and standard deviation of the number of characters per operation (cuz I knew warren'd ask). For ctors and Assign operations, I tried to deduce when copy-on-write (COW) sharing would and would not occur; e.g., "nsString::nsString(const nsStr&) COW" indicates that the incoming nsStr's buffer could be shared. "NOCOW" means that the incoming nsStr's buffer was incompatible, and would need to be inflated to two-byte. ns[C]String::Append(const nsStr&) sometimes needs to inflate[deflate] the inbound nsStr, so I factored out INFL[DEFL] appends from normal appends. (This may be indicative of how useful a segmented buffer implementation would be.) ------ Characters ------ Operation Count Total Mean StdDev nsCString::Append(char) 3081 3081 1 +/- 0 nsCString::Append(const char*) 33624 704152 21 +/- 85 nsCString::Append(const nsCString&) 1317 17939 14 +/- 11 nsCString::Append(const nsStr&) DEFL 1213 22236 18 +/- 17 nsCString::Assign(const PRUnichar*) 62 3553 57 +/- 19 nsCString::Assign(const char*) 2139 36108 17 +/- 22 nsCString::Assign(const nsStr&) COW 85 1720 20 +/- 4 nsCString::Assign(const nsStr&) NOCOW 7217 551108 76 +/- 48 nsCString::Cut() 570 5015 9 +/- 5 nsCString::ToNewCString 7494 357502 48 +/- 26 nsCString::ToNewUnicode 64 3559 56 +/- 21 nsCString::nsCString() 35033 0 0 +/- 0 nsCString::nsCString(const PRUnichar*) 62 3553 57 +/- 19 nsCString::nsCString(const char*) 105 5971 57 +/- 68 nsCString::nsCString(const nsCString&) COW 7655 364331 48 +/- 26 nsCString::nsCString(const nsStr&) NOCOW 6896 230701 33 +/- 36 nsString::Append(PRUnichar) 153883 153883 1 +/- 0 nsString::Append(char) 982 982 1 +/- 0 nsString::Append(const PRUnichar*) 31627 676237 21 +/- 202 nsString::Append(const char*) 32759 816672 25 +/- 409 nsString::Append(const nsStr&) 11694 71151 6 +/- 8 nsString::Append(const nsStr&) INFL 236 290 1 +/- 1 nsString::Append(const nsString&) 19554 292897 15 +/- 25 nsString::Cut() 4897 68723 14 +/- 228 nsString::Insert(PRUnichar) 1383 1383 1 +/- 0 nsString::Insert(const char*) 70 140 2 +/- 0 nsString::SetCharAt() 4591 4591 1 +/- 0 nsString::ToNewCString 2351 56711 24 +/- 23 nsString::ToNewUTF8String 4204 169205 40 +/- 41 nsString::ToNewUnicode 3674 56392 15 +/- 13 nsString::nsString() 133552 0 0 +/- 0 nsString::nsString(const PRUnichar*) 1055 14819 14 +/- 8 nsString::nsString(const char*) 1953 253280 130 +/- 1671 nsString::nsString(const nsStr&) NOCOW 230 6670 29 +/- 52 nsString::nsString(const nsString&) COW 10843 222664 21 +/- 526
Attached extra files and patches required to gather statistics.
Here's another run, with more functions accounted for, and visiting more sites: ------ Characters ------ Operation Count Total Mean StdDev nsCString::Append(char) 8600 8600 1 +/- 0 nsCString::Append(const char*) 157749 2465740 16 +/- 51 nsCString::Append(const nsCString&) 2348 27977 12 +/- 10 nsCString::Append(const nsStr&) DEFL 3549 59550 17 +/- 17 nsCString::Assign(const PRUnichar*) 347 15652 45 +/- 14 nsCString::Assign(const char*) 5762 185482 32 +/- 43 nsCString::Assign(const nsStr&) COW 152 2978 20 +/- 3 nsCString::Assign(const nsStr&) NOCOW 33221 2086625 63 +/- 45 nsCString::Cut() 3306 26758 8 +/- 5 nsCString::SetCharAt() 196 196 1 +/- 0 nsCString::ToNewCString 35072 1739748 50 +/- 30 nsCString::ToNewUnicode 375 15521 41 +/- 18 nsCString::nsCString() 133167 0 0 +/- 0 nsCString::nsCString(const PRUnichar*) 347 15652 45 +/- 14 nsCString::nsCString(const char*) 496 57162 115 +/- 93 nsCString::nsCString(const nsCString&) COW 35300 1747820 50 +/- 30 nsCString::nsCString(const nsStr&) NOCOW 37316 1381001 37 +/- 42 nsString::Append(PRUnichar) 215192 215192 1 +/- 0 nsString::Append(char) 1996 1996 1 +/- 0 nsString::Append(const PRUnichar*) 146947 3483496 24 +/- 152 nsString::Append(const char*) 186147 3901793 21 +/- 195 nsString::Append(const nsStr&) 44255 318232 7 +/- 34 nsString::Append(const nsStr&) INFL 405 1578 4 +/- 4 nsString::Append(const nsString&) 100415 1785264 18 +/- 36 nsString::Assign(PRUnichar) 27116 27116 1 +/- 0 nsString::Assign(char) 322 322 1 +/- 0 nsString::Assign(const PRUnichar*) 28497 464960 16 +/- 52 nsString::Assign(const char*) 88503 3087143 35 +/- 281 nsString::Assign(const nsStr&) COW 133837 1256708 9 +/- 35 nsString::Assign(const nsStr&) NOCOW 11782 83533 7 +/- 16 nsString::Cut() 31814 578205 18 +/- 249 nsString::Insert(PRUnichar) 8674 8674 1 +/- 0 nsString::Insert(const char*) 1461 2932 2 +/- 0 nsString::SetCharAt() 41303 41303 1 +/- 0 nsString::ToNewCString 8461 224481 27 +/- 36 nsString::ToNewUTF8String 28337 1151229 41 +/- 43 nsString::ToNewUnicode 3177 53286 17 +/- 12 nsString::nsString() 705595 0 0 +/- 0 nsString::nsString(const PRUnichar*) 4675 83960 18 +/- 17 nsString::nsString(const char*) 16266 494636 30 +/- 651 nsString::nsString(const nsStr&) NOCOW 490 39884 81 +/- 114 nsString::nsString(const nsString&) COW 33812 515520 15 +/- 329 TOTAL 2326782 27657905 Here, 12.7% of the characters fall into the COW category. On a previous run for just the mozilla.org page, I got 14.4%. Seems like we can safely assume that we can save >10% by doing COW.
Three things to test: (1) put a flag in |nsStr| to simulate COW semantics ... "this is a reference" then charge subsequent mutators with the cost of an allocation/copy this will help us better determine the value of adding COW (2) count the number of times a string had mutators applied to it, this will help us better determine the value of adding, e.g., an |nsIImutableString| (3) put a time in the string, and whenever it changes size or when it gets destroyed, add its duration to a bucket for that size. This will help us better determine the value of adding, e.g., arena based allocation for some capacities.
*** Bug 28842 has been marked as a duplicate of this bug. ***
*** Bug 26435 has been marked as a duplicate of this bug. ***
Interesting news! I implemented number (2) to determine how many strings are mutated. Here are the results: Allocated strings = 833582 Mutated strings = 551607 Unmutated strings = 281252 That's over 50%!! This was for a run visiting about 10-15 pages, including tinderbox, a build log, cnn, yahoo, abcnews and others. If you add my numbers up you'll see that there are 700+ strings unaccounted for. This is because I only determine the number of mutated/unmutated strings when they're destroyed, so the remining ones must be leaks. Here's another longer run (60%): Allocated strings = 1231811 Mutated strings = 777911 Unmutated strings = 467373 and here's one bringing up my mailbox (with 4000+ messages), and forwarding a message with a lot of extra typing added to it (39%): Allocated strings = 2170221 Mutated strings = 1594744 Unmutated strings = 621191 Immutable strings should be a huge win!
That's what I had informally determined as well, that immutable strings would be a big win
And that's not even counting char* and PRUnichar* strings that are manually alloc'd and are never in nsStrings. Right?
I instrumented |ns[C]String| to determine the amount of character copying and the number of allocations that would be saved by implementing a Copy-On-Write [COW] mechanism without necessarily changing the current interface. Here is a sample run, typical of my results to date. un-shared work: 15313192 COW work: 1183682 un-shared allocations: 416265 COW allocations: 460985 or about 10.74% more than un-shared allocations Yes, we save a lot on copying characters, but we actually end up doing _more_ allocations. The reason? One explanation is callers copying strings and making small modifications in extant string variables explicitly for this purpose. So, for example, I have a string variable into which I copy another string (the allocation and copying are deferred), but now I modify it (and am charged for the allocation, and some fraction of the copying, depending on the operation) and do something with it, like compare. Then I copy another string into it (the current value is released, but the allocation and copy are deferred again), and then, as before, immediately make a change... now I'm charged for an allocation that I wouldn't have been in the non-COW implementation. Interesting results.
Created attachment 5842 [details] [diff] [review] Here is a patch containing the changes I made to measure COW efficacy
Created attachment 5856 [details] [diff] [review] diffs for mutated/unmutated accounting (in addition to Chris' diffs)
One comment about SCC's analysis: the 2nd copy would not necessarily require a subsequent allocation. We can implement COW so that the underlying buffer is retained until the string is deleted or resized. The original buffer could be reused in the 2nd copy, so that the 2nd allocation (may not) be neccessary. Of course a great deal depends on the size of the strings being operated upon.
With respect to the notion of a segmented string implementation: note that |GetUnicode| is called in more than a thousand places. |GetBuffer| is called less, but still quite a bit. <http://lxr.mozilla.org/seamonkey/search?string=GetUnicode> <http://lxr.mozilla.org/seamonkey/search?string=GetBuffer> Both are obstacles to implementing a segmented string since callers expect the resulting pointer to point to the entire buffer, and they do math with it or pass it to things expecting an entire string.
I agree, Scott. It makes you wish we had iterators, doesn't it?
I did some more analysis... of how many strings have GetBuffer or GetUnicode called for them. Here's the answer: Allocated strings = 756941 Mutated strings = 521389 (68%) Unmutated strings = 245669 (32%) Contiguous buffers = 140836 (18%) This was for visiting mozilla.org, cnn.com, abcnews.com, usatoday.com. GetBuffer and GetUnicode were only called for 18% of the strings. So I think a non-contiguous buffer implementation could still be a win. I didn't count how many times GetBuffer/GetUnicode were called for the same string, but that would be easy to add. On another note... I must have been on crack when I reported the percentages for unmutated strings. For the 3 runs I listed above, the percentages are 33.74%, 37.94% and 28.62% respectively. And for the above run, 32%. Still a win, although not quite as spectacular as first reported.
P.S. My GetBuffer/GetUnicode analysis doesn't include places in the code that use mStr directly, so it's an upper bound. If there's a critical place in the code that uses mStr, then my numbers could be completely off.
I am posting the following set of recommendations to this bug to keep external developers informed of the direction in which we are heading. The main players in this impending change are already, as far as I know, all on the same page. Rick Gessner <email@example.com> recently sent out his recommendations---which I hope he will also post to this bug---which touch on the same themes. Rick's recommendations paint a fairly good picture of the new world. I still think the following evaluation is valuable, because it goes into detail as to what the actual changes are/should-be, and _why_ those particular changes are important. We want to move to a world where string clients can select from among a range of implementations that embody different implementation strategies, e.g., a general purpose string such as we have now, and specific-use implementations like an immutable string that optimizes allocations, and a segmented string that minimizes character copying over editing operations on very large datasets. These new goals impose new requirements on our current string interfaces. Any changes we make to the current interface must be source compatible with extant clients, or we must be willing to pay the penalty of updating callers. Note: our new goals fall out of our experiences using strings in our application. They differ significantly from our original goals (which were all about revealing the underlying implementation to clients for performance) and so none of these recommendations can or should be taken as a criticism of the current interface. Specific recommendations fall into several categories (note: these are not the the recommendations ... these are the categories). Very roughly in order of importance with respect to this effort: [A] removing from the interface any visible members that compromise the abstraction allowing different underlying representations, else clients won't be satisfied by alternate implementations [B] removing from the interface any routines that aren't specifically about manipulating the underlying representation, else alternate implementations must re-implement identical functionality [C] removing from the interface any i18n sensitive functionality, though mostly instances of this recommendation will be covered by [B] above [D] removing from the interface unused, unneeded, or unconventionally located functionality, to reduce the burden on alternate implementations, and to generally simplify [E] adding to the interface any support machinery needed to enable changes falling into one of the categories above, or simply to allow multiple implementations at all, e.g., |virtual| I believe rickg and I are already very much in agreement on these points. We discussed them at length, and his recent message on redesign notes echoes these sentiments. It is clear from his recent email messages that he has been focused on these same key issues. Here are my specific recommendations: 1 [A] Remove public inheritance from |nsStr|. Access to a specific underlying buffer representation is prohibitive to alternate implementations, e.g., a segmented string et al. It is also agreed that having any public data members is a political impediment to crossing XPCOM boundaries. According to rickg, visible inheritance from |nsStr| is not exploited heavily, and should be easily removed. This is arguably the most important thing we can do to enable further enhancements to our string implementations and uses. 2 [A] |GetUnicode| and |GetBuffer| impose a prohibitive burden on implementations in a multiple implementation world. As rickg points out, this is another reason to add iterators. Unfortunately, these two routines are very heavily used. 3 [E] Make the string interface abstract to allow multiple implementations. We were already paying for a vtable, so no extra space requirements are expected. The performance impact should be minimal. 4 [E] Split the abstract interface into layers encouraging read-only implementations, e.g., an immutable string 5 [BCD] Make narrowing/widening an explicit operation done by constructors. Do not allow implicit conversion in append and assign operations. Tests show that we are not exploiting the `double interface' of string very much, and this is good. Note that like |ToUpperCase| (et al) functionality mentioned below, encoding conversions are properly in the domain of i18n, and duplicating the functionality at the low-level in string is suboptimal. 6 [BDE] Either remove operator overloading from the abstract interface, or implement it conventionally, that is: non-virtual inlines using only the abstract signatures for |Append| and |Assign|. Implement |operator+=()| and |operator=()| as members; implement |operator+()| and relations as non-members. Virtual assignment operators must be written carefully to avoid slicing. 7 [BCDE] Remove |ToUpperCase|, |ToLowerCase|, |IsASCII|, |IsUnicode|, |IsSpace|, and |IsAlpha| from the interface. Of these, only |ToLowerCase| is heavily used, and i18n functionality like this must be pushed up into the i18n layer, where, coincidentally, this functionality already happens to exist. 8 [D] Remove (the little used) |ToNewString| from the interface. This functionality is already available in the form of the copy-constructor. In a multiple implementation world, the user will typically need to select a specific implementation, in any case. 9 [BCD] Remove |IsOrdered| and |BinarySearch| from the interface. These are not general purpose routines, and can easily be implemented outside the string class if they are deemed still needed. 10 [BCD] |EqualsIgnoreCase| and the |Compare| functions when the |aIgnoreCase| parameter is true are problematic just as the other i18n dependent routines are. Unfortunately, these routines are very heavily used. Again, they are a burden in a multiple implementation world. They should be implemented as non-members (based on extant i18n facilities) that use iterators into the underlying string ... which also implies that we will need string iterators. 11 [BCD] |ToFloat| and |ToInteger| should be removed from the interface. Parsing should not be part of the required functionality for multiple implementations. Given iterators, this functionality could be moved to a non-member implementation, which, in any case, is again requires i18n sensitivity. |ToFloat| is not heavily used. |ToInteger| is. Similarly, the |Append|s that format a float or an integer are i18n dependent. Some work may be required to provide similar functionality that is factored into the i18n support. 12 [D] We probably don't need the power to say something like myStr.Assign(yourStr).Append(herStr).Cut(20, 15); It makes sense with operators, but we may want to simplify the client interface with respect to named member functions. The |Assign|, |Append|, |Insert|, |Cut|, and |SetString|, signatures should be changed to return |void|. 13 [BCD] Turn the specialized modification and accessor functions |Trim|, |CompressSet|, |StripChar|, |StripChars|, |StripWhitespace|, |ReplaceChar|, |ReplaceSubstring|, and |CountChar| into non-member `algorithms' that can be applied to any implementation. 14 [DE] Given the current copying signatures of |Left|, |Right|, and |Mid|, they should probably be turned into non-member algorithms writing to an iterator as well. 15 [E] Add iterators. Several of the points above are eased or solved by the introduction of reasonable iterators. 16 [AD] Eliminate |nsSubsume[C]Str|. To much implementation knowledge is currently required to reasonably utilize this in clients, and it presents a burden to implementations to facilitate.
This is my primary focus at the moment.
fixing summary to better reflect our understanding
mass re-assigning to my new bugzilla account
Well, NEW_STRING_APIS is now switched on. The factoring is accomplished. And some new implementation exist to solve some problems. We need a replacement for XPIDL string; we need a COW implementation; we need to deploy the new implementations. I'm re-summarizing this bug for the work of measuring and deploying the new implementations.
Putting on nsbeta3 radar. warren say we really need to get this in for PR3, per beta2 PDT reviews.
Simon, can you add your recent profiling work to this bug? For everyone else, under discussion is the idea that `chunk' allocating strings has turned out to be a bigger source of wasted space than it has been a performance boon. (oops, too many cc's, Bugzilla is making me remove one to add simon. Sorry dp)
Adding newsgroup postings on string usage: <news:sfraser-F78927.firstname.lastname@example.org> It seems that nsString::SetCapacity() always buffers the string size in 64-character chunks - this logic lives down in nsStr::Alloc. So: nsString foo; foo.SetCapacity(1); foo.Assign('a'); will eat up 128 bytes of heap space. I have not found a way that I can set the capacity of an nsString to exactly the length I know is needed. This of course has quite an impact on bloat. Seeing this leads me to question how often we actually need to chunk changes in string length; what proportion of strings actually change length during their lifetime? My guess is that it's < 50%, which perhaps suggests that the normal behaviour should be to not round up string sizes, and that we should have an API that allows the caller to create a string with, or specify that an existing string is likely to change length frequently from now on. <news:sfraser-0720CE.email@example.com> Some data on the bloat that results from string chunking (recall, bloat = total memory every allocated, not a runtime high-water mark). Numbers are K. Test Allocated Used Waste % waste -------------------------------------------------------------- Simple browser 2938.00 1548.98 1389.01 47.28% Complex browser 5839.17 3214.73 2624.44 44.95% Mail 6232.82 3369.89 2862.93 45.93% So this chunking almost doubles the amount of memory that our strings use. <news:sfraser-D08810.firstname.lastname@example.org> In another post, waterson posed the question of how many strings with identical contents are allocated, and whether we could use atoms for these common strings. (He was, I think, talking about string usage in a particular module/API, but the question can be generalized.) So I put some debug code in nsStr::Destroy, that dumps out the contents of strings just before they are deleted, if aDest.mOwnsBuffer == PR_TRUE (which indicates that the buffer was heap-allocated). Some results are below. These results can be used to find places is the code the might benefit from shared strings, or cacheing of frequently used strings. Of course, I have no data on call sites here. The data look like this: 848 1 dummy:path '848' is the count (# strings with these contents), '1' is the character width (1 for char, 2 for PRUnichar), and the rest is the string itself. Test 1: Bring up browser, loading simple text-only HTML page, Quit. <http://www.smfr.org/mozilla/sortedstrings.txt> 848 1 dummy:path 510 2 true 333 2 file:///Other%20stuff/Documents/Mozilla/Users50/Simon/localstore.rdf 303 2 monospace 278 2 component://netscape/layout/element-factory?namespace=http:// www.mozilla.org/xbl 277 1 component://netscape/layout/element-factory?namespace=http:// www.mozilla.org/xbl 255 2 geneva 205 2 component://netscape/layout/element-factory?namespace=http:// www.mozilla.org/keymaster/gatekeeper/there.is.only.xul 200 1 component://netscape/layout/element-factory?namespace=http:// www.mozilla.org/keymaster/gatekeeper/there.is.only.xul 167 1 file:///Other%20stuff/Documents/Mozilla/Users50/Simon/localstore.rdf 159 1 file:///Bleeding%20Edge/Mozilla%20tree/src/mozilla/dist/viewer_debug/ 135 2 ISO-8859-1 123 2 UTF-8 117 2 serif 85 2 broadcaster 79 2 menuitem 78 2 menupopup 73 2 <string is a run of 28 spaces> 71 2 vertical 67 2 rdf:http://home.netscape.com/NC-rdf#Name Test 2: Bring up browser, surf to mozilla.org, tinderbox, bugzilla, load a bug, open prefs dialog. <http://www.smfr.org/mozilla/sortedbrowser.txt> 1797 2 monospace 1698 2 true 1007 2 serif 879 2 geneva 848 1 dummy:path 371 2 component://netscape/layout/element-factory?namespace=http:// www.mozilla.org/xbl 370 1 component://netscape/layout/element-factory?namespace=http:// www.mozilla.org/xbl 333 2 file:///Other%20stuff/Documents/Mozilla/Users50/Simon/localstore.rdf 322 2 ISO-8859-1 315 1 file:///Bleeding%20Edge/Mozilla%20tree/src/mozilla/dist/viewer_debug/ 266 2 vertical 248 2 component://netscape/layout/element-factory?namespace=http:// www.mozilla.org/keymaster/gatekeeper/there.is.only.xul 243 2 white 236 1 component://netscape/layout/element-factory?namespace=http:// www.mozilla.org/keymaster/gatekeeper/there.is.only.xul 176 2 \ 167 1 file:///Other%20stuff/Documents/Mozilla/Users50/Simon/localstore.rdf 159 2 UTF-8 128 1 css 125 2 never 121 2 black Test 3: Bring up browser, open mail-news, load 2 large IMAP folders (including one of bugzilla mail) <http://www.smfr.org/mozilla/sortedmail.txt> 7084 2 UTF-8 4785 2 us-ascii 2809 1 mozilla.org 2808 1 bugzilla-daemon 1304 2 true 1034 2 component://netscape/intl/unicode/decoder?charset=x-imap4-modified-utf7 1032 2 x-imap4-modified-utf7 1029 2 never 987 1 netscape.com 957 2 geneva 948 1 %S Receiving: message headers %lu of %lu 947 1 sfraser 941 2 Bugzilla 894 2 monospace 848 1 dummy:path 500 2 file:///Other%20stuff/Documents/Mozilla/Users50/Simon/localstore.rdf 488 2 autostretch 392 2 menuitem 358 2 component://netscape/layout/element-factory?namespace=http:// www.mozilla.org/xbl 357 1 component://netscape/layout/element-factory?namespace=http:// www.mozilla.org/xbl
Created attachment 12102 [details] [diff] [review] Here's the patch Doug used to generate this data...
Doug, I attached your patch and data to this bug ... which seems like the appropriate place. What conclusions can we draw from this data? There are certain very common strings, true, but this is not enough to know if they are candidates for being replaced with |nsShared[C]String|s, since we don't know how they were generated. What do you think? The other bug filed on this is bug #46738. I commented there as well. We should consider marking that bug either a duplicate or a blocker for this bug.
marking dependencies, turning this [officially] into a tracking bug for deploying new string implementations
giving up ancient string bugs to the new string owner. jag, you'll want to sort through these and see which ones still apply and go with or against the direction in which you intend strings evolve
This work is no longer relevant, strings have had a new implementation for a while now :-) If the new strings code needs performance tuning, please file new bugs.