Closed
Bug 785551
Opened 12 years ago
Closed 12 years ago
Do not use internal string pointers in the frontend
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla18
People
(Reporter: terrence, Assigned: terrence)
References
Details
(Whiteboard: [js:t])
Attachments
(2 files, 4 obsolete files)
4.14 KB,
patch
|
Details | Diff | Splinter Review | |
66.61 KB,
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
The compiler frontend needs to work with raw jschar* for speed. If these jschar* are |JSString::isInline()|, then a moving gc triggered during compilation could invalidate the jschar*s we have stashed away all over the frontend. Since the vast majority of compilations are going to be for programs larger than our max inline size, it makes sense to just check if we are inline before compiling and make the string not inline in the rare case that it is.
I've split this off from bug 780309 because this is orthogonal to the changes there and it will be easier to test separately given the expensive asserts we require to check that we've caught all the users.
Assignee | ||
Comment 1•12 years ago
|
||
Assignee | ||
Comment 2•12 years ago
|
||
This patch + assertions is green on try:
https://tbpl.mozilla.org/?tree=Try&rev=f8e4f625b108
Attachment #656096 -
Flags: review?(luke)
Comment 3•12 years ago
|
||
Comment on attachment 656096 [details] [diff] [review]
v0
Review of attachment 656096 [details] [diff] [review]:
-----------------------------------------------------------------
Looking generally good.
::: js/src/jsapi.cpp
@@ +4666,5 @@
> }
> + if (JSID_IS_STRING(*idp)) {
> + JSString *str = JSID_TO_STRING(*idp);
> + if (!str->ensureNotInline(cx))
> + return false;
It seems like this one shouldn't be necessary: JSAPI callers should be using APIs that ensureNotInline when chars are requested.
@@ +6119,5 @@
> assertSameCompartment(cx, str);
> + JSFlatString *flat = str->ensureFlat(cx);
> + if (!flat)
> + return NULL;
> + JSNotInlineString *notInline = flat->ensureNotInline(cx);
Why do these two use a two-step ensureFlat->ensureNotInline while the one below only does ensureNotInline? The latter seems preferable.
@@ +6177,5 @@
>
> extern JS_PUBLIC_API(const jschar *)
> JS_GetFlatStringChars(JSFlatString *str)
> {
> + JSNotInlineString *notInline = str->ensureNotInline(NULL /* FIXME */);
These APIs were added recently and don't have a ton of uses. They are mostly for a few cases where we want getting chars to be infallible which exactly what this violates. I think we should just change JSAPI here and make up some new opaque never-defined type JSInfallibleString and use that instead of JSFlatString in JSAPI.
::: js/src/jsatom.cpp
@@ +342,3 @@
> return NULL;
> + const jschar *chars = notInline->chars();
> + size_t length = notInline->length();
Can you add a \n before these (after return NULL) and remove the \n after?
::: js/src/jsclone.cpp
@@ +808,3 @@
> return false;
> + size_t length = notInline->length();
> + const jschar *chars = notInline->chars();
ditto
::: js/src/jsfun.cpp
@@ +621,3 @@
> if (!src)
> return NULL;
> + const jschar *chars = src->chars();
ditto
::: js/src/json.cpp
@@ +70,5 @@
> } else {
> linear = cx->runtime->atomState.typeAtoms[JSTYPE_VOID];
> }
> JS::Anchor<JSString *> anchor(linear);
> + JSNotInlineString *notInline = linear->ensureNotInline(cx);
Can you change this to ensureNotInline higher up?
::: js/src/jsreflect.cpp
@@ +3370,3 @@
> return JS_FALSE;
> + const jschar *chars = notInline->chars();
> + size_t length = notInline->length();
ditto
::: js/src/vm/String.h
@@ +25,5 @@
> class JSExternalString;
> class JSLinearString;
> class JSFixedString;
> +class JSNotInlineString;
> +class JSInlineString;
I was thinking about these names more. not-x names are less than ideal since it leaves the reader wondering 'well, what *are* they?'. How about JSStableString? Also, could you make this new string type fit in with the rest of String.h? That is
- update the ascii-art hierarchy (JSStableString would derive JSFixedString)
- add a row in the comment below (for instance encoding and subtype predicate)
Also, "invariants+properties" comment for JSFixedString is now wrong, it should be the comment for JSStableString; JSFixedString should say "permanently null-terminated".
(By the way, are there any remaining uses of JSFixedString? It seems like anyone who used to want a JSFixedString would want a JSStableString.)
Attachment #656096 -
Flags: review?(luke)
Comment 4•12 years ago
|
||
> How about JSStableString?
"Stable" doesn't make me think "opposite of inline". "OutOfLine"? "Separate"? "Split"? "Long"? Not sure...
> Also, could you make this new string type fit in with the rest of String.h?
> That is
> - update the ascii-art hierarchy (JSStableString would derive JSFixedString)
> - add a row in the comment below (for instance encoding and subtype predicate)
Yes please! I live and die by those comments.
Comment 5•12 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #4)
> > How about JSStableString?
>
> "Stable" doesn't make me think "opposite of inline". "OutOfLine"?
> "Separate"? "Split"? "Long"? Not sure...
I don't think the goal is to say the opposite of inline but, rather, the positive properties that this type of string has which is, in this case, the stability of the address of the chars. "Stable" is a bit ambiguous because the string header itself moves, just not the chars. I was thinking of JSStableCharsString, but it was longer.
> Yes please! I live and die by those comments.
me too :)
Comment 6•12 years ago
|
||
I see. Ok, "Stable" sounds ok to me.
Assignee | ||
Comment 7•12 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #3)
>
> Looking generally good.
Thank you for the review.
> ::: js/src/jsapi.cpp
> @@ +4666,5 @@
> > }
> > + if (JSID_IS_STRING(*idp)) {
> > + JSString *str = JSID_TO_STRING(*idp);
> > + if (!str->ensureNotInline(cx))
> > + return false;
>
> It seems like this one shouldn't be necessary: JSAPI callers should be using
> APIs that ensureNotInline when chars are requested.
This one surprised me too. The caller that tripped this is at ctypes/CTypes.cpp:2318, in the middle of a 362 line function. I had assumed that we simply did not ever hand out the offending atom in *idp through another method. Do you see an efficient way that we can avoid this hack?
> @@ +6119,5 @@
> > assertSameCompartment(cx, str);
> > + JSFlatString *flat = str->ensureFlat(cx);
> > + if (!flat)
> > + return NULL;
> > + JSNotInlineString *notInline = flat->ensureNotInline(cx);
>
> Why do these two use a two-step ensureFlat->ensureNotInline while the one
> below only does ensureNotInline? The latter seems preferable.
Internal confusion about when exactly a string is flat -- fixed now.
> @@ +6177,5 @@
> >
> > extern JS_PUBLIC_API(const jschar *)
> > JS_GetFlatStringChars(JSFlatString *str)
> > {
> > + JSNotInlineString *notInline = str->ensureNotInline(NULL /* FIXME */);
>
> These APIs were added recently and don't have a ton of uses. They are
> mostly for a few cases where we want getting chars to be infallible which
> exactly what this violates. I think we should just change JSAPI here and
> make up some new opaque never-defined type JSInfallibleString and use that
> instead of JSFlatString in JSAPI.
I'll take a stab at it. I'm actually rather distressed to see that a FIXME slipped in: I really thought I had killed all of them when I made ensureNotInline take a maybecx.
> ::: js/src/vm/String.h
> @@ +25,5 @@
> > class JSExternalString;
> > class JSLinearString;
> > class JSFixedString;
> > +class JSNotInlineString;
> > +class JSInlineString;
>
> I was thinking about these names more. not-x names are less than ideal
> since it leaves the reader wondering 'well, what *are* they?'. How about
> JSStableString? Also, could you make this new string type fit in with the
> rest of String.h? That is
> - update the ascii-art hierarchy (JSStableString would derive JSFixedString)
> - add a row in the comment below (for instance encoding and subtype
> predicate)
_,_
;'._\
';) \._,
/ /`-'
~~( )/
)))
\\\
I shall ensure that none escape.
Is your preference for ensureNotInline or ensureStable: e.g. the property we wish to guarantee or the type we wish to convert to?
> (By the way, are there any remaining uses of JSFixedString? It seems like
> anyone who used to want a JSFixedString would want a JSStableString.)
js_NewString* and several caches return it. In the case of the DtoaCache, we know that these strings are explicitly _not_ JSStableStrings. We could probably use JSShortString for the DtoaCache. I'll look harder at the other users after I've updated this patch.
Comment 8•12 years ago
|
||
(In reply to Terrence Cole [:terrence] from comment #7)
> > It seems like this one shouldn't be necessary: JSAPI callers should be using
> > APIs that ensureNotInline when chars are requested.
>
> This one surprised me too. The caller that tripped this is at
> ctypes/CTypes.cpp:2318, in the middle of a 362 line function. I had assumed
> that we simply did not ever hand out the offending atom in *idp through
> another method. Do you see an efficient way that we can avoid this hack?
Take away JSID_TO_FLAT_STRING (leaving JSID_TO_STRING, which produces a JSString which will hit the normal barriers when reading the chars).
> Is your preference for ensureNotInline or ensureStable: e.g. the property we
> wish to guarantee or the type we wish to convert to?
ensureStable would match the "JSXString *ensureX()" pattern.
> js_NewString* and several caches return it. In the case of the DtoaCache,
> we know that these strings are explicitly _not_ JSStableStrings. We could
> probably use JSShortString for the DtoaCache. I'll look harder at the other
> users after I've updated this patch.
If there are no users that depend on JSFixedString (that can't be specialized to some other string type), then perhaps we could consider removing it (making all its subtypes derive JSFlatString).
Assignee | ||
Comment 9•12 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #8)
> > js_NewString* and several caches return it. In the case of the DtoaCache,
> > we know that these strings are explicitly _not_ JSStableStrings. We could
> > probably use JSShortString for the DtoaCache. I'll look harder at the other
> > users after I've updated this patch.
>
> If there are no users that depend on JSFixedString (that can't be
> specialized to some other string type), then perhaps we could consider
> removing it (making all its subtypes derive JSFlatString).
Well, I've now audited all the existing users of JSFixedString: I believe it is still very much in use. Specifically, the places where it is useful are places that can return either an inline or stable string, but where the string being returned *cannot* be extensible (e.g. a freshly malloced, exactly-sized region).
1) js_NewStringCopy
2) StringBuffer::finishString
3) ScriptSource::substring
4) dtoaCache (would be Short except for Number.toString with a small radix)
That said, there were quite a few places where it was valid to tighten up the string typing. I've attached a patch that updates the places I found in my FixedString audit. If you think this patch is basically sound, I will clean it up and fold it into the other string changes (or keep it separate if you would prefer).
Attachment #657453 -
Flags: feedback?(luke)
Assignee | ||
Comment 10•12 years ago
|
||
> 4) dtoaCache (would be Short except for Number.toString with a small radix)
Nevermind, js_NumberToStringWithRadix works on the full number range, not just int32, so we're doomed even without small radicies.
Comment 11•12 years ago
|
||
(In reply to Terrence Cole [:terrence] from comment #9)
> Specifically, the places where it is useful are
> places that can return either an inline or stable string, but where the
> string being returned *cannot* be extensible (e.g. a freshly malloced,
> exactly-sized region).
Yes I know where we might *return* it, but does anywhere *depend* on actually having a JSFixedString. Those are the cases I'd be suspicious of (b/c pretty much the only time we need null-terminated-ness is at JSAPI boundaries when we can probably GC and hence probably need a stable string). That is, the existence of JSFixedString seems like a footgun.
Assignee | ||
Comment 12•12 years ago
|
||
Right, sorry, was being dense there. Is this what you had in mind?
Attachment #657453 -
Attachment is obsolete: true
Attachment #657453 -
Flags: feedback?(luke)
Attachment #659302 -
Flags: feedback?(luke)
Updated•12 years ago
|
Whiteboard: [js:t]
Assignee | ||
Comment 13•12 years ago
|
||
Attachment #659302 -
Attachment is obsolete: true
Attachment #659302 -
Flags: feedback?(luke)
Attachment #661820 -
Flags: feedback?(luke)
Comment 14•12 years ago
|
||
Comment on attachment 661820 [details] [diff] [review]
v1: rebased and fixed up after Ion's landing.
Review of attachment 661820 [details] [diff] [review]:
-----------------------------------------------------------------
Exactly, thanks!
::: js/src/jsscript.cpp
@@ +1087,1 @@
> ScriptSource::substring(JSContext *cx, uint32_t start, uint32_t stop)
Can't be a JSFlatString? (also, sourceData)
::: js/src/vm/String.h
@@ +98,5 @@
> + * JSFlatString___ - / null terminated
> + * | \ \ \ \ \
> + * | \ \ \ \ JSStableString - / may have external pointers into char array
> + * | \ \ \ \
> + * | \ \ \ JSExternalString - / char array memory managed by embedding
This bikeshed is looking a bit goofy now with all the children. How about:
* JSFlatString
* | |
* | +-- JSStableString
* | |
* | +-- JSExternalString
* | |
* | +-- JSExtensibleString
* | |
* | +-- JSUndependedString
* | |
* | +-- JSInlineString
* | | \
* | | JSShortString
* | | |
* JSAtom | |
* | \ | |
* | JSInlineAtom |
* | \ |
* | JSShortAtom
* |
* js::PropertyName
@@ +546,5 @@
> + JS_ALWAYS_INLINE
> + const jschar *chars() const {
> + JS_ASSERT(!JSString::isInline());
> + return JSLinearString::chars();
> + }
Seems like you could just remove this whole member definition.
Attachment #661820 -
Flags: feedback?(luke) → feedback+
Assignee | ||
Comment 15•12 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #14)
> ::: js/src/jsscript.cpp
> ::: js/src/vm/String.h
> @@ +98,5 @@
> > + * JSFlatString___ - / null terminated
> > + * | \ \ \ \ \
> > + * | \ \ \ \ JSStableString - / may have external pointers into char array
> > + * | \ \ \ \
> > + * | \ \ \ JSExternalString - / char array memory managed by embedding
>
> This bikeshed is looking a bit goofy now with all the children. How about:
Wow, yeah, the flat layout is much better. I spent more time that I can reasonable justify trying to make the wedge formation look reasonably okay.
> @@ +546,5 @@
> > + JS_ALWAYS_INLINE
> > + const jschar *chars() const {
> > + JS_ASSERT(!JSString::isInline());
> > + return JSLinearString::chars();
> > + }
>
> Seems like you could just remove this whole member definition.
Ah, right you are! I love how much better the new factoring you came up with works.
I've rebased again and folded this together with the original patch. I think it's ready for another review pass now.
Also, one more try run is up at:
https://tbpl.mozilla.org/?tree=Try&rev=f108e2efccbf
Attachment #656096 -
Attachment is obsolete: true
Attachment #661820 -
Attachment is obsolete: true
Attachment #662635 -
Flags: review?(luke)
Comment 16•12 years ago
|
||
Comment on attachment 662635 [details] [diff] [review]
v2: rebased and folded
Review of attachment 662635 [details] [diff] [review]:
-----------------------------------------------------------------
Awesome, looks good.
::: js/src/json.cpp
@@ +59,5 @@
> CallArgs args = CallArgsFromVp(argc, vp);
>
> /* Step 1. */
> + JSString *str = (argc >= 1) ? ToString(cx, args[0])
> + : cx->names().undefined;
SM style is either all on one line (I think we have enough chars here) or, failing that:
cond
? then
: else
::: js/src/vm/String.cpp
@@ +369,5 @@
> +{
> + JS_ASSERT(isInline());
> + size_t n = length();
> + size_t nbytes = (n + 1) * sizeof(jschar);
> + jschar *news = maybecx ? (jschar *)maybecx->malloc_(nbytes) : js_pod_malloc<jschar>(n + 1);
maybecx->pod_malloc<jschar>(n + 1) and we can remove nbytes.
::: js/src/vm/String.h
@@ +108,5 @@
> + * | +-- JSInlineString - / chars stored in header
> + * | | \
> + * | | JSShortString - / header is fat
> + * | | |
> + * JSAtom | | - / string equality === pointer equality
I just realized that JSAtom may or may not be stable. That suggests, analogous to JSInlineAtom, we have a JSStableAtom that derives both JSStableString and JSAtom. While we could do this (grossing up the ASCII art diagram even more), I was thinking that there are zero uses of JSInlineAtom/JSShortAtom anyways; they serve no purpose but to complete the set of "most-derived types". How about instead just add the comment I suggest below and grep/delete any occurrence of "InlineAtom" and "ShortAtom"?
@@ +119,5 @@
> *
> * Classes marked with (abstract) above are not literally C++ Abstract Base
> * Classes (since there are no virtual functions, pure or not, in this
> * hierarchy), but have the same meaning: there are no strings with this type as
> * its most-derived type.
...right below this paragraph saying:
Technically, there are three additional most-derived types that satisfy the invariants of more than one of the abovementioned most-derived types:
- InlineAtom = JSInlineString + JSAtom (atom with inline chars)
- ShortAtom = JSShortString + JSAtom (atom with (more) inline chars)
- StableAtom = JSStableString + JSAtom (atom with out-of-line chars)
@@ +851,5 @@
> + }
> +
> + if (!isInline()) {
> + return &asStable();
> + }
No { }
Attachment #662635 -
Flags: review?(luke) → review+
Assignee | ||
Comment 17•12 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #16)
> Awesome, looks good.
This patch is indeed starting to look pretty awesome! Thanks for helping me get it into such good shape.
> @@ +119,5 @@
> > *
> > * Classes marked with (abstract) above are not literally C++ Abstract Base
> > * Classes (since there are no virtual functions, pure or not, in this
> > * hierarchy), but have the same meaning: there are no strings with this type as
> > * its most-derived type.
>
> ...right below this paragraph saying:
>
> Technically, there are three additional most-derived types that satisfy the
> invariants of more than one of the abovementioned most-derived types:
> - InlineAtom = JSInlineString + JSAtom (atom with inline chars)
> - ShortAtom = JSShortString + JSAtom (atom with (more) inline chars)
> - StableAtom = JSStableString + JSAtom (atom with out-of-line chars)
I've been wondering about why we had those, rather than just the comment in their bodies in a form like this, for awhile.
Green try run at:
https://tbpl.mozilla.org/?tree=Try&rev=f108e2efccbf
Pushed at:
https://hg.mozilla.org/integration/mozilla-inbound/rev/326ee1d5c9b0
Assignee | ||
Comment 18•12 years ago
|
||
Whoops! Forgot to kill nbytes when I update to pod_malloc.
https://hg.mozilla.org/integration/mozilla-inbound/rev/7d54dc3c254e
Comment 19•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/326ee1d5c9b0
https://hg.mozilla.org/mozilla-central/rev/7d54dc3c254e
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
relnote-firefox:
--- → ?
Updated•2 months ago
|
relnote-firefox:
? → ---
You need to log in
before you can comment on or make changes to this bug.
Description
•