Closed Bug 416411 Opened 14 years ago Closed 3 years ago

Analyze mozilla's usage of strings, dynamically and statically


(Firefox Build System :: Source Code Analysis, defect)

Not set


(Not tracked)



(Reporter: benjamin, Assigned: dmandelin)



(3 files)

To make some decisions about Mozilla2, we want to gather some data about Mozilla's usage of strings. I have written up a short proposal and some analyses at

I have still promised roc a more concrete set of possible string changes that we want to gather data against (prove/disprove), but I'm filing this so that dmandelin can start looking at it, poking holes, and asking questions.
For the dynamic analysis, I think I've figured out how to instrument the code to keep track of the strings. All it does is keep the set of nsStringBuffers in a hash table, logging when anything is allocated or deallocated. I attached my code as a patch in case anyone feels like looking to see if I made any obvious mistakes.

More importantly, I have a couple questions about nsStringBuffer contents, for the UTF8 size and ASCII-only measurements. 

1. I believe not all nsStringBuffers contain null-terminated data. Is there some easy place to look for a content size, or do I have to instrument the string classes that use non-null-terminated nsStringBuffers?

2. I'm not sure how to measure what the buffer size would be with UTF8 strings. One idea is to look at the content and see how much space that content takes encoded as UTF8, but that's not the buffer size. I could work up an estimate by looking at the content size ratios between now and UTF8 and assuming that buffer sizes would be scaled down proportionally. The other solution that comes to mind is to log all the string operations that affect buffer size. Then, the operations could be played back (offline, perhaps) using different assumptions about how buffer sizes will be adjusted with UTF8.
Here are some preliminary results. The test suite is starting Minefield and letting it load the default start page. To classify the contents as ASCII or UTF-16, I interpret them both ways, and take whichever gives a longer string, which seems to work. (The common cases are: the string is ASCII, so the ASCII version comes out twice as long; or the the string is UTF-16, so the ASCII version comes out length 1.)

There are 7134 instances of nsStringBuffer. 34.6% (2471) are ASCII. Here is a table of some sizes, giving first total size, second average size per nsStringBuffer. Explanation of the items follows.

  Content Size          310,764     43.6
  Storage Size          362,248     50.7
  *UTF8 Content Size    205,476     28.8
  *UTF8 Storage Size    217,213     30.4

Content Size is the number of bytes of data in the buffer, including string characters and the null terminator, but nothing that comes after that. 

Storage Size is the allocated size of the buffer, nsStringBuffer::mStorageSize.

*UTF8 Content Size is the size of the content if encoded in UTF8.

*UTF8 Storage Size is the storage size required if strings are encoded in UTF8, assuming that the allocated buffer would be smaller in proportion to the ratio of the UTF8 Content Size to the original Content Size.

The *s indicate that the numbers in that row are hypothetical and don't represent measures of real data inside FF. But UTF8 Content Size is an exact result.

A few ratios: The estimated UTF8 Storage Size is 60% of the current Storage Size. Estimated UTF8 Content Size is 66% of current Content Size. Current storage efficiency (content/storage) is 86%. Estimated UTF8 efficiency is 95%.
These are memory usage (bytes taken up for string character data--not counting headers or containers) for different kinds of strings taken as a snapshot after opening some pages. The UTF8 and UTF16 columns give estimated memory usage if that encoding was used exclusively. The test suite title gives language + #of pages opened for that test case.

I'm a little bit surprised at aspects of the CJK results--I'm going to look into them a little more closely.
Wow, this is great work! At some point you should write it up and at least post it to your blog, if not somewhere more formal.

Can you say more about the test suites? E.g. how many pages were open when you did the eng80 snapshot (80?), and which pages?

You probably don't need the UTF-16 results, I'm pretty sure we're not going there. Useful to have in our pocket, I guess.

These wins for UTF8 are bigger than I thought.

The CJK results are very interesting but I'm actually not surprised. Hypotheses:

-- Text fragment storage goes down because a lot of the DOM text fragments contain both CJK characters (forcing us to 16-bit storage) but also ASCII whitespace and punctuation?

-- The stringbuffer results don't change vs English/Euro because almost all of the page text is in DOM nodes and doesn't escape to persistent strings.
Actually thinking about the CJK results a bit more, I am a little surprised we got a significant win in DOM text fragments, since every CJK character gets 50% bigger with UTF8. IRC speculation suggests maybe there are mostly-ASCII test nodes (scripts or styles) with a few CJK characters in them (comments, font names???)?
(In reply to comment #5)
> IRC speculation suggests maybe there are mostly-ASCII test
> nodes (scripts or styles) with a few CJK characters in them (comments, font
> names???)?

I dug into the data (I logged the content of all strings in the snapshot) and found exactly that. Btw, the CJK test reported on here was front-page-heavy, so maybe the script/style-to-text ratio was high, but I reran with a set of longish Wikipedia articles and got similar results.

Here's a quick note with some stats on dynamic string conversions (i.e., profiling data on calls to AppendUTF8ToUTF16 and AppendUTF16toUTF8). I didn't want to spend too much time on it because we're not too interested in changing encodings at this point--this is really just a quick check on that decision for the conversion time cost aspect. If anyone needs more numbers or a broader test suite, let me know.

I profiled the 4 encoding conversion functions (there are versions with nsAString and char * input for each direction) while loading the 20 English pages used in my memory consumption tests. 

First, I profiled the browser in Shark while loading some pages and clicking on things. Those functions appeared in the list, but at the 0.0% level--fewer than 10 calls out of 10k+ calls sampled.

Second, I used dtrace to measuer the number of calls and the time taken for loading my 20 English pages (including using the menu bar to do that). Time taken is elapsed time from entry to return on that thread according to dtrace timestamp. I also profiled SearchTable as a sanity check: according to Shark, it is one of the top functions.

Function                     # calls    total time (s)     time/call (us) *

nsAString  UTF16 -> UTF8       15191       0.068              4.4
ushort *   UTF16 -> UTF8          14       0.0002            15
char *     UTF8  -> UTF16      13877       0.054              3.9
nsACString UTF8  -> UTF16        200       0.003             15
SearchTable                   656961       2.43               3.7

  * "us" == "microseconds"

I don't think these time values should be taken too seriously--the whole operation probably only took a few seconds, and going by Shark, SearchTable takes up about 1% of total time. I'd bet there's a lot of tracing overhead included in these times.

Instead, I'd trust Shark that SearchTable takes about 1% of the time, and then according to dtrace there is about 1 conversion per 20 SearchTables, and the times per call are about the same, so string conversion might be taking up about .05% of total time, which is roughly consistent with the direct Shark results for string conversions.
Out of curiousity, how many non-converting different-encoding compares (like the ones the atom table does) do we do on that workload?  I wonder how much faster those would be if the string we're atomizing were in the same encoding as the atom string....
(In reply to comment #8)
> Out of curiousity, how many non-converting different-encoding compares (like
> the ones the atom table does) do we do on that workload?  I wonder how much
> faster those would be if the string we're atomizing were in the same encoding
> as the atom string....

I tried some similar tests with Shark and found about 1/3000 or 1/4000 samples to be CompareUTF8toUTF16. The opposite direction didn't show up.
Assignee: dmandelin → nobody
Assignee: nobody → dmandelin
In this comment I have a few nsAutoString stats. I'm not exactly sure what bsmedberg was asking for, but I'm guessing that he wanted to know what fraction of nsAutoString dynamic allocations really save a heap allocation. It's tricky to define that property precisely enough for measurement. For now, I measured the fractions of nsAutoStrings that (a) are later passed as an argument to nsAString.Assign (i.e., copied to another string, presumably on the heap) or (b) reallocate their memory on the heap because the string grew too long. In theory, nsAutoStrings under (a) possibly could have been created as regular nsAStrings and passed directly without copying (assuming sharing is OK in that case). nsAutoStrings under (b) are probably fine to stay nsAutoStrings, but if there are a lot of them, then it means it's not really helping much to allocate them on the stack.

I used 2 test suites, ENG20Nav (20 pages clicked from popular English-language sites) and CJK24WP (24 CJK Wikipedia articles). The key results:

                 % assigned (a)          % reallocated (b)
ENG 20 Nav            2.3                      1.4
CJK 24 WP            25                        1.0

Most of the numbers are low, which is expected and accords with the rationale for creating nsAutoString in the first place. The surprising one is the 25% that were assigned to nsAStrings in the CJK test. I have no idea why that happened, including whether it's a bug in my profiling technique. I can look into it further if it seems significant.
My string fu isn't quite as strong these days, but generally speaking heap allocated buffers can be shared, while stack buffers would require a copy. So if we're actually seeing 25% (rounding up 25% of 99% stack) of nsAutoString buffers being copied into another string when the buffer could potentially have been shared (we should be able to detect that) then for those cases it might make sense to convert the  nsAutoString to nsString. Though the nsAutoString might be acting as a sort of StringBuilder, in which case we'd want to tell the nsString to pre-allocate a certain sized buffer.

On the other hand, if the nsAString.Assign() is directly followed by nsAString.Append() or something else that modifies the buffer, and for a shared buffer would force a second copy to be made, then for those cases it's awash.

I'm curious how well the current allocation strategy performs, and also whether there are hot-spots where pre-sizing the string's buffer makes a big difference.
The other possibly interesting case is when an nsAutoString is written to exactly once, and by assigning from a string whose buffer it can share.  In that situation, the stack buffer is unused.  I'm not sure whether the cost of setting up that buffer is something we need to worry about, though...
Oooh, didn't know nsAutoString would prefer to share the buffer over copying it into its stack buffer. Makes sense though.

Why written to exactly once? Wouldn't two consecutive Assign()s both have the nsAutoString share the other string's buffer? I don't think the cost of setup is interesting though.
Well, more precisely I meant "written to via Assign() and then not modified".
Dave, can you figure out where those "25% assigned" assignments are coming from? That's quite remarkable and it may indicate something that we should know about CJK pages ... maybe there's something we can optimize there.

It would be useful to have some perspective on how many heap allocations are being saved by nsAutoString. I'm not sure what to measure though ... perhaps the number of nsAutoStrings compared to the total number of strings? Hmm.
It shouldn't be that hard to "turn off" nsAutoStrings and get some idea of the difference in allocations. Just change the typedef to be nsTString_CharT in string-template-def-{,uni}char.h
I found a reduced test case where about 20% of ~110,000 nsAutoStrings are assigned to an nsString: loading

The highest frequency contexts for the assign are below, in this format:

[freq] [sourceline]
      [source code]
      [additional info on stack trace]

  freq = % of nsAutoString assigns accounted for in this item
  sourceline = source line of deepest non-libxpcom call on stack at assign
  source code = code of that line

44% modules/libpref/src/nsPrefBranch.cpp:343 (in nsPrefBranch::GetComplexValue)
      rv = theString->SetData(NS_ConvertUTF8toUTF16(utf8String));
      [Mostly called from AppendPrefFonts via CopyUnicharPref.]

15% gfx/thebes/src/gfxFont.cpp:671 (in ResolveData::ResolveData)
      [Mostly called from AppendPrefFonts via gfxFontGroup::ForEachFont.]

15% gfx/thebes/src/ 
      (in gfxQuartzFontCache::GenerateFontListKey)
      aResult = aKeyName;
      [Mostly called from AppendPrefFonts via gfxFontGroup::ResolveFontName.]

15% gfx/thebes/src/gfxPlatform.cpp:253 (in AppendGenericFontFromPref)
      [Mostly called from AppendPrefFonts.]

Overall, 88% of the nsAutoStrings that got assigned to another nsString had a call to AppendPrefFonts when Assign was called.
Filed comment 17 as bug 418536 because it looks like we could get some cheap wins.
> 44% modules/libpref/src/nsPrefBranch.cpp:343 (in nsPrefBranch::GetComplexValue)

Yoinks.  Is there any way we can just reduce the number of calls to this altogether?  That code is not anything resembling "fast".  The string copy is possibly the least of our worries, on top of the object creation, etc. that that code entails.  Probably worth a separate bug.

> 15% gfx/thebes/src/gfxFont.cpp:671 (in ResolveData::ResolveData)

I just looked at the consumers, and it looks to me like mGenericFamily could just be a |const nsACString&|.  Then we'd avoid the string copy altogether.  Probably worth a separate bug as well.

> 15% gfx/thebes/src/ 

This one is tough, since we copy and immediately modify.  This might or might not work better with the two-arg form of ToLowerCase.  It also might or might not work better if callers can sort out whether they already have a lowercased string...

> 15% gfx/thebes/src/gfxPlatform.cpp:253 (in AppendGenericFontFromPref)

The code here should be more like:

  NS_ConvertUTF16toUTF8 genericDotLang(genericName);

That would save an object construction and a string copy.

There's similar code above:

  279         genericName = NS_ConvertASCIItoUTF16(aGenericName);

That should likely be:

  CopyASCIItoUTF16(aGenericName, genericName);

But again, if we're spending any sort of serious time in AppendGenericFontFromPref, we would be better served by caching the pref values somehow.  Separate bug on all this too?
I'm starting up on the task of statically analyzing string mutation/immutability. See Roughly, the idea is to see how much work it would be to switch to an API where strings are immutable, or actually produce a patch to make the switch. 

To try to understand better how strings are mutated in FF, I prepared the attached list of call sites of non-const methods of nsAString and nsACString. (I would have liked to use MXR, but there are too many similarly named methods in other classes).
Attachment #305108 - Attachment mime type: application/octet-stream → application/zip
Product: Core → Firefox Build System
I found this old bug while spelunking. I suspect it is sufficiently out of date we can close it?
Flags: needinfo?(n.nethercote)
Yeah... profiling data from 10 years ago probably isn't that useful now, and the instrumentation patch is small without containing anything particularly notable.
Closed: 3 years ago
Flags: needinfo?(n.nethercote)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.