Closed Bug 1330776 Opened 3 years ago Closed 5 months ago

Remove flat (null-terminated) strings


(Core :: JavaScript Engine, defect, P3)




Tracking Status
firefox72 --- fixed


(Reporter: Waldo, Assigned: jandem)



(Keywords: triage-deferred)


(1 file)

Strings are a pointer and a length.  Strings *can* contain '\0', so depending on null-termination in the general case can be lossy.  Ergo, people shouldn't depend on null termination, we shouldn't offer it, and they should use the explicit string length we have on hand for them to consult.

This also has the benefit of not requiring new-string allocations when a caller wants a null-terminated string, but the only thing handy is an arbitrary non-null-terminated string -- which is most any string that wasn't originally an atom.  Most callers can probably handle a pointer/length combo with a little massaging.  And the ones that can't?  Well, at least they're the only ones paying the cost of bespoke flattening, not everyone.

Searchfox sez there are ~265 uses of JSFlatString in the tree, so this will be a mini-slog, but definitely a good one.
Don't we depend on 3rd party libraries in places that use null terminated strings?
This is just inside the JS engine.  (For now, at least -- removing null termination from nsString would be wonderful, but out of scope here.)  If the JS engine itself needs to call something third-party that depends on null-termination, we can either fix it not to, or make a copy.  (And if the third-party code depends on null-termination, it's already broken in the presence of JS strings containing nulls.)

Note that ICU, as possibly the most notable third-party dependency, takes pointer-length combos in its APIs almost universally -- or at least in our use of it I remember that almost always being the case.
So I just looked at the DOM consumers.  There's some JSID_TO_FLAT_STRING and JS_ASSERT_STRING_IS_FLAT uses, mostly to call js::CopyFlatStringChars.  We could totally replace all that with linear strings, asserting things are linear, etc.

There's uses of JS_PutEscapedFlatString, but again that can totally be linear; the string is coming from JS_GetFunctionId to start with.  

There's the IsPermitted stuff, but again it doesn't care about flatness per se.  It just starts with a jsid and wants to call things like JS_FlatStringEqualsAscii and JS_GetFlatStringCharAt, both of which could totally work on linear strings.  Esp. because it just wants the char at index 0.

There's this lovely bit:

    char* propName =

(which then passes propName to nsPrintfCString; we could probably do something else here that never involved going through a char* at the expense of doing a bit more concatenation).  In any case, this is clearly not using flat strings for much of anything, except they're the only sort of string you can get out of a jsid!

That seems to be it in DOM code.  I'm happy to convert it all to better APIs once those exist.
This should also be a small win for ThinInline and FatInline strings because we will be able to store an extra character instead of \0. On 32-bit, a TwoByte ThinInlineString will be able to store strings with length <= 4 instead of the current <= 3. For Latin1 strings, we will be able to store 8 chars instead of 7.
Keywords: triage-deferred
Priority: -- → P3
Duplicate of this bug: 1584133
Summary: Remove flat strings → Remove flat (null-terminated) strings
Depends on: 1586683

Stealing.. I have this working in the JS shell; will start posting patches tomorrow.

Assignee: jwalden → jdemooij
Depends on: 1586991
Depends on: 1588810

It's mostly green at this point:

Still some minor known issues to fix the coming days and I need to do more auditing. We don't want to land this before the merge next week anyway. A ~10 KB memory reduction for "Base Content JS opt" and diffstat of +386/-660 (no more "undepended" strings for example).

Oh, wow. I was intending to get to this eventually (and had some starting patches for it), but postponed it and in fact intended to eliminate undepended strings in a different way first. This is great!

JS strings can contain null bytes so relying on null-termination generally isn't
the best idea. The (relatively few) places that did rely on null termination
have been fixed separately to copy their characters to a new buffer.

This saves memory and avoids malloc/free for strings that now fit inline
(because this bumps the length limits for thin-inline and fat-inline strings).

It also simplifies dependent strings and external strings because we can remove
both JSUndependedString and JSExternalString::ensureFlat.

Pushed by
Stop null-terminating JS strings, remove JSFlatString. r=jwalden,sfink
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
Regressions: 1591051

== Change summary for alert #23549 (as of Thu, 24 Oct 2019 10:37:32 GMT) ==


0.32% Base Content JS windows7-32 opt 3,124,744.00 -> 3,114,776.00
0.32% Base Content JS windows7-32-shippable opt 3,124,744.00 -> 3,114,776.00
0.32% Base Content JS windows7-32-shippable opt 3,124,738.67 -> 3,114,776.00

For up to date results, see:

You need to log in before you can comment on or make changes to this bug.