Closed Bug 578205 Opened 11 years ago Closed 11 years ago

Keep string characters inline for small strings

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: alangpierce, Assigned: alangpierce)

References

(Blocks 1 open bug)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 2 obsolete files)

For small strings (<= 6 jschars on 32 bit, <= 12 jschars on 64-bit), we can avoid some malloc overhead by keeping the chars inline in the string header.

After the ropes patch (see bug 571549), here are some statistics on how often initFlat is called in all of sunspider (this is probably the same as the number of actual flat strings):

Length:        Number of initFlat calls:
<= 6           12266
>= 7, <= 12    14347
>= 13, <= 32   8509
>= 33, <= 150  9724
>= 150         129
Total          44975
Blocks: JaegerSpeed
Dup of bug 553541 (which was a dup of an older bug).

/be
Dup either way, I don't care -- good to fix, hope we can avoid losing bugs like this in the future.

/be
Assignee: general → apierce
Duplicate of this bug: 553541
Blocks: 553824
It turns out that this is harder than we thought. The plan was to keep the invariant that any string shorter than some fixed length would be a short string, and a short string must keep its chars inline. Ideally, callers would be changed to find ways around passing in a malloced buffer. If a malloced buffer must be passed in, though, we can keep the invariant by copying from the buffer and then freeing it. On GC, we would know that short strings don't have a buffer that needs to be freed.

I'm pretty sure JS_NewExternalString doesn't work with this model. In order to follow the API, we need to call the finalizer on the buffer when the string is GCed (we cause a refcounting bug in at least one call site if the finalizer is called during NewExternalString, so we can't free the buffer early), so we need to keep track of the pointer, probably in the string header. Losing a pointer of space means we can only hold strings of length 4 (or maybe 3) on 32-bit x86, which probably isn't worth it. It also seems bad to waste a pointer for a stupid special case like this.

I think a better approach is the one from bug 553541, where we allocate a double-size GC thing for these strings, and keep the string contents in the dummy string object (and possibly in the two unused pointers in the normal string header). That way, we can just have a mChars pointer that (except in the case of ropes) always points to the string characters.
(In reply to comment #4)
> In order to
> follow the API, we need to call the finalizer on the buffer when the string is
> GCed (we cause a refcounting bug in at least one call site if the finalizer is
> called during NewExternalString, so we can't free the buffer early)

Cannot we fix all the users NewExternalString not to depend on this or at least explicitly check if a string is short and avoid using NewExternalString if so?

If this is not that easy, what about just adding short external chars to a buffer and releasing them during the next GC?
Yeah, if we can copy something cheaper than we can refcnt it, we should do that, make a local copy and don't refcnt up (which requires an API change).
Is there a better API for external strings that would be easier?  it's not a complex API, so consumers should be able to update pretty easily.
As far as I can tell, the only thing that would break is XPCStringConvert::ReadableToJSVal in xpcstring.cpp. It creates a string, and then adds a ref on success. Calling the finalizer under JS_NewExternalString would decrement the refcount before it is incremented, so we would free to early and then try to use the freed memory.

We could fix this without changing any function signatures by moving the addRef call to before NewExternalString and then releasing on failure. Alternatively, we could avoid the increment/decrement by giving back a flag that says whether or not the string was actually used. It would then be the caller's responsibility to free the buffer.

I still think that keeping a double-size header for short strings (and have mChars point to the right place in the string header) is better though, since it avoids a branch in chars() (which is heavily-used) and lets us use strings with length up to 12 on 32-bit. It's also safer and less error-prone, since it doesn't add any string invariants; it just sometimes changes the way strings are created.

Of course, if avoiding refcounting in ReadableToJSVal is important, we could do both.
(In reply to comment #8)
> As far as I can tell, the only thing that would break is
> XPCStringConvert::ReadableToJSVal in xpcstring.cpp. It creates a string, and
> then adds a ref on success.

This is just a bug in that code: it should hold first, drop only on error.

/be
Status: NEW → ASSIGNED
Attached patch Patch (obsolete) — Splinter Review
This is based off of the backed-out patch from bug 553541, with a few changes. I fixed a bug with the original version (js_InflateStringToBuffer was being called incorrectly), and made it create short strings in js_ConcatStrings, and hopefully made the usage of short strings a bit cleaner.
Attached file SunSpider before/after
Attachment #458484 - Flags: review?(gal)
Comment on attachment 458484 [details] [diff] [review]
Patch

>+inline void
>+js_short_strncpy(jschar *dest, const jschar *src, size_t num)
>+{
>+    for (size_t i = 0; i < num; i++)
>+        dest[i] = src[i];
>+}
>+

I think this wants a switch with specialized cases:

switch (num) {
case 1: *dest = *src; break;
case 2: *(uint32 *)dest = *(uint32 *)src; break;
etc
}

Also, a bunch more asserts would be nice, i.e. that short_strncpy is only called for known, short nums.

> /*
>  * Return s advanced past any Unicode white space characters.
>  */
> static inline const jschar *
> js_SkipWhiteSpace(const jschar *s, const jschar *end)
> {
>     JS_ASSERT(s <= end);
>     while (s != end && JS_ISSPACE(*s))
Attachment #458484 - Flags: review?(gal) → review+
Attached patch Patch (obsolete) — Splinter Review
Attachment #458484 - Attachment is obsolete: true
This patch is on top of the one from bug 578189, so that one needs to get pushed before this one.
Depends on: 578189
Attached patch Rebased patchSplinter Review
Attachment #459620 - Attachment is obsolete: true
Blocks: 584493
http://hg.mozilla.org/mozilla-central/rev/096eadc75555
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.