Open Bug 686989 Opened 13 years ago Updated 2 months ago

optimize JSString-to-nsAString conversion by having nsAString ref-count the JSString

Categories

(Core :: XPConnect, defect)

defect

Tracking

()

People

(Reporter: luke, Unassigned)

References

Details

(Whiteboard: [MemShrink:P2])

When marshalling a string parameter from JS to C++, we currently just create a new nsString which makes a new copy of the chars.  Ideally, we'd just pass the JSString to C++ directly without copying.  For "fixed" JSStrings (flat, null-terminated, not extensible -- the "stable" state of a string), we have a free word in the header.  We can use this to embed a reference count that is maintained by nsAString ctor/copy/assign/dtor just like normal string buffers.  Thus, in JS GC, we'd only sweep a string if it wasn't marked AND its ref count was 0.

When converting an nsAString to JS:
 - if the nsAString points to a (same-compartment) JSString, we can just pass that to JS
 - failing that, we create an external string; since its the same chars, we should be able to change the representation of the nsAString to reference this new external string, thereby optimizing future to-JS conversions of the same nsAString.

Naively, this requires the single-threadification of C++ strings.  sicking nominates jlebar ;)

bz's idea is to have a new nsAString subclass nsSingleThreadString (or something) whose constructor sets a bit in the flags "I am only used on a single thread".  When converting a JS string to a C++ nsAString outparam, only nsAString's with this bit set would receive the JSString, others would have to copy.  Furthermore, when copying an nsAString, if the source referenced a JSString and the destination didn't have the "single-thread" bit, there would be a clone.  This could be a shorter path than making C++ strings single-threaded although we'd have to be careful not to introduce cases where we do more cloning.  Also the optimization would only apply so far as we use nsSingleThreadString.
Hmm.  So right now all nsAStrings have a PRUnichar* mData member which points to the actual string data.  This would need to change to a union or something, right?
No, we'd still have that. But we'd also have a pointer to a separately allocated object which acts as root for the JS string. This separate object is located from from a thread-specific arena of string-roots so the allocation should be fast.

This means that we either have to merge flags and length into a single word, or increase the size of nsString to 4 words.
Hmm.  On 64-bit, at least, merging would work fine.  On 32-bit, we may have to take the extra one-word hit.  :(
Though, with enough smarts we might not need to. First off, between the length member and the root pointer we have at least 3 bits free for flags. 4 if we can ensure that the root object is always located at a 2-word boundary. Second, the the root is used we shouldn't need many flags. When it isn't used we'll have plenty of bits for flags.
Jonas: things have changed a bit since we last talked.  There is no root needed (nor pool to hold said roots): comment 0 proposes ref-counting the JSString header directly from the nsAString ctor/cctor/op=/dtor/etc and teaching the JS GC to respect this.

If length can be merged with flags (28 bits of length, 4 bits of flags, just like JSString :), then one word points to chars, one word points to the JSString header; that's 3 words total.
Oooh, I like this idea even more! Sorry for assuming that things hadn't changed since our conversation.

I don't however like the idea of having a subclass of nsAString which is single-threaded. This would require knowing on a callee side if a given nsAString is single threaded or not before passing it across threads.

I think we should simply make our strings single threaded instead and just have specific code to deal with passing strings across threads.
(In reply to Jonas Sicking (:sicking) from comment #6)
> I think we should simply make our strings single threaded instead and just
> have specific code to deal with passing strings across threads.

Me too, just recording ideas for completeness.
Blocks: 697332
Bobby, we're interested in knowing how much string data we're actually holding onto because of this. Could you look into instrumenting this code to add a number to about:memory for this? If we're not holding on to a huge amount of data here then I don't think we need to worry about fixing this from a memory point of view, but this might still be significant from a performance point of view.
Assignee: nobody → bobbyholley+bmo
Whiteboard: [MemShrink]
(In reply to Johnny Stenback (:jst, jst@mozilla.com) from comment #8)
> Bobby, we're interested in knowing how much string data we're actually
> holding onto because of this. Could you look into instrumenting this code to
> add a number to about:memory for this?

Is this the same as what njn was asking about? As I mentioned in bug 697041 comment 3, I don't think it would be very fruitful to add memory reporters for xpconnect string data (unless we care about, and can reliable track, peak usage). The string data is very ephemeral (it has stack scope), so what we'd see would depend entirely on what was on the stack when the memory analysis was done. Again, if we're just interested in peak usage, this is a different story.

> this might still be significant from a
> performance point of view.

I think this is really the main reason to do this. I can work on it (including the single-threadification of strings) if we think it's important enough.
We're just trying to get a handle on how much of it is really ephemeral, and how ephemeral it is.  If they are all stack-scoped, and the calls are not that long, then I think that's good enough to know that we don't need to worry about this for memory usage.
(In reply to Bobby Holley (:bholley) from comment #9)
> Is this the same as what njn was asking about? As I mentioned in bug 697041
> comment 3, I don't think it would be very fruitful to add memory reporters
> for xpconnect string data (unless we care about, and can reliable track,
> peak usage). The string data is very ephemeral (it has stack scope), so what
> we'd see would depend entirely on what was on the stack when the memory
> analysis was done. Again, if we're just interested in peak usage, this is a
> different story.

The interesting case here is when we're calling from JS into C++, where we can *not* share string data because we can not share the underlying string buffer past the lifetime of this call, but come to think of it, that's not something that we can actually measure in XPConnect. When we call from JS into C++, we wrap string data in dependent strings, and those strings can not share their underlying buffer, so if the called C++ needs to hold on to that string we're forced to copy at that point. But I don't think whether that happened is detectable in the XPConnect code, nor do I think a memory reporter is the right thing there.

Telemetry for string usage in general could be useful, and as part of that the amount of string data that's copied due to the underlying string buffer not being sharable could be a useful piece of information to have, but that's not this bug.

So yes, this bug is purely about the performance impact of the string handling here, not the memory usage.
A few notes:

1)  There _is_ a case we can measure in xpconnect: JS strings coming in that are actually wrapping an XPCOM string buffer.  Right now we still do the non-sharing dependent thing; we could see whether sharing those would help much.

2)  This is mostly about the performance impact, but there might be memory impact. Hard to say.
Whiteboard: [MemShrink] → [MemShrink:P2]
This is not something I'm going to get to any time in the reasonable future.
Assignee: bobbyholley+bmo → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.