Closed Bug 613457 Opened 9 years ago Closed 9 years ago

Clean up JSString

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: njn, Assigned: luke)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(3 files, 4 obsolete files)

JSString is a bloody mess.  I defy anyone to read its declaration in jsstr.h
and proclaim confidently (a) which of the 16 possible combinations of types
and flags are valid, and (b) which of the various fields (including both a
union and two unions-within-a-union) are used for each of the valid
type/flags combinations.  And JSShortString just adds to the fun.

We've had a lot of JSString-related crashes and problems lately (eg. bug
596988, bug 597886, bug 598695, bug 603554, bug 604756, bug 605852) and I
think this complexity is a significant cause.

This patch makes a start on fixing the mess by merging the separate 'type'
and 'flags' fields into a single 'mode' (I deliberately used a new word
because that made it easier to find all the places that needed changing).
There are six modes: FLAT_VANILLA, FLAT_ATOMIZED, FLAT_EXTENSIBLE,
DEPENDENT, ROPE_INTERIOR_NODE, ROPE_TOP_NODE.  (Thus answering question (a)
above, if we ignore JSShortString's abuses.)  Someone will undoubtedly whine
about my use of "vanilla", so I proffer "plain" as a back-up.

The patch is a draft, I was hoping to get feedback as to whether people
think this is a good idea, whether I should go further (by cleaning up the
union fields as well) and whether that should be done as part of this bug or
a follow-up (or multiple follow-ups).  I wonder if introducing a sub-class
for each of mode would help things.
Attachment #491799 - Flags: feedback?(lw)
Attachment #491799 - Flags: feedback?(bzbarsky)
Comment on attachment 491799 [details] [diff] [review]
draft patch (against TM 57585:3ce2cfd22078)

I'm probably the wrong person to ask here... but reducing the space of possible modes to the space of valid modes sounds good!
Attachment #491799 - Flags: feedback?(bzbarsky) → feedback+
Comment on attachment 491799 [details] [diff] [review]
draft patch (against TM 57585:3ce2cfd22078)

Nice start.  Since you are interested in doing this, I'd like to make some requests, mostly to the tune of "could JSString follow the direction of JSStackFrame?":

Make the data members private and replace offsetof usage with static offsetOfX members.

Right after 'public:', have a comment that says "these are the sorts of strings", and shortly describe each one.  Then under that comment put the bool isSortX() queries for each sort.  I called it 'sort' since 'mode' implies that a single frame may change its modes and 'type' is heavily overloaded.

After the sort section, JSStackFrame has section for the initialization of stack frames into each mode.

Then it seems like you could group the string accessors by the assumed string sort (separated by a comment block describing the sort in more detail).  JSStackFrame grouped them by field, not required sort, since most fields applied to most sorts, but string is different in this manner -- its basically an Algebraic Datatype.

Sorry for the dump, but you did ask for feedback ;-)
Attachment #491799 - Flags: feedback?(lw) → feedback+
(In reply to comment #2)
> 
> Sorry for the dump, but you did ask for feedback ;-)

No need to apologize, it's all good suggestions.  Except I don't much like "sort"... I read "string sort" in a sentence and I want to interpret it as talking about string sorting.  But then strings, unlike JSStackFrame, can change mode so maybe that fits with what you said.
(In reply to comment #3)
> But then strings, unlike JSStackFrame, can
> change mode so maybe that fits with what you said.

Ah, of course, mode sounds perfect then!
I've decided for now not to combine 'flags' and 'type' into 'mode', because
I can imagine a larger future clean-up in which the various types of string
actually become (C++) sub-classes, in which case 'flags' should belong to
the FLAT sub-class.

Instead, this patch greatly improves the documentation of JSString, and does
some minor clean-ups of it and JSShortString.  The following points are
listed in the order you'll encounter them if reading the patch from top to
bottom.  Specifically, the patch:

- Introduces the "flat plain" terminology to contrast with "flat atomized"
  and "flat extensible" strings.

- Renames externalStringType as mExternal for consistency.

- Removes dependentLength() and flatLength();  they're identical to
  length().

- Renames {TOP,INTERIOR}_NODE as ROPE_{TOP,INTERIOR}_NODE for clarity;
  likewise with related function names.

- Renamed mLengthAndFlags as mLengthFlagsType, which is more accurate.

- Introduces names for the hacky constants used in flatten(), and asserts
  that mChar and mLeft alias, which is crucial to flatten()'s operation.

- Removes finishTraversalConversion(), it's very similar to initDependent(),
  and using initDependent() is clearer.

- Adds the LENGTH_FLAGS_TYPE macro which makes setting mLengthFlagsType
  simpler.

- Greatly expands the comment at the top of JSString.  I'm pretty sure it
  covers all the cases now.  Likewise, the uses of the fields are now much
  clearer.

- Moves mBase into the union with mCapacity, mParent, mBufferWithInfo;  it's
  never used concurrently with any of them.

- Renames hasFlag() and hasBit() because it's used to match bits in both the
  'flags' and the 'type' sub-fields.

- Tightens/adds some assertions in some accessor functions.

- In the various initFoo() functions, now only the fields used by the
  relevant string type are initialized.

- Removes initShortString(), it ended up being identical to initFlatPlain().

- Removes flatSetExtensible(), which is dead.

- JSShortString consists of a JSString immediately followed by some extra
  space in which to hold additional chars.  Prior to this patch this extra
  space was itself another JSString.  But there's no need for that, and it's
  a bit confusing.  So this patch changes that space to a jschar array and
  adds comments to make things clearer.

This patch should go in after bug 609440 is done;  it'll need some minor
changes then.
Attachment #491799 - Attachment is obsolete: true
Summary: clean up JSString → Better documentation and minor clean-ups for JSString, JSShortString
Depends on: 609440
Blocks: 614459
This is similar to v2 of the patch, but updated for all the recent changes.
Attachment #492878 - Attachment is obsolete: true
Attachment #516806 - Flags: review?(luke)
Sorry for taking so long to get back to you.

The patch is a nice improvement.  As I was reading the big string comment, I liked how it was breaking things out into clean orthogonal categories.  However, due to the tricky nature of our strings, there are still a few things that aren't quite orthogonalized in the comment or in the C++ interface.  I started thinking about if there was any way to take it further, and I came to a conclusion we could use and extend the hierarchy of JSString/JSFlatString/JSLinearString/JSAtom and use that as the central way of describing/understanding strings.

Basically, what I'd like is for one ASCII-art diagram to cover all the relevant externally-facing string cases and, unless I'm missing one, I think this one achieves that:

/*
 *   C++ type                Supported operations / properties / invariants
 *
 * JSString                  length
 *    | \                    fallible operation to get an array of chars
 *    |  \
 *    |  JSRope              left and right child strings
 *    |                      fallible conversion to JSFlatString
 *    |
 * JSLinearString            (infallible) array of chars (not null-terminated)
 *    | \
 *    | JSDependentString    base string
 *    |                      fallible conversion to JSFlatString
 *    |
 * JSFlatString              chars[length] == 0
 *    | \    \
 *    |  \    \              external string GC type
 *    |   \ JSExternalString string header on GC heap, but char array allocation
 *    |    \                  managed by embedder
 *    |     \
 *    | JSShortString        chars stored in string header, allocated on the
 *    |                      GC heap
 *    |
 * JSAtom                    equality between atoms can be tested by testing
 *    | \                     identity of JSAtom*
 *    |  \
 *    | JSShortAtom          atomized JSShortString
 *    |
 * JSStaticString            short atom stored in static memory (not GC heap)
 */

The new type JS{DependentString,Rope} classes would allow us to pull rope/dependent stuff out of JSString. (Btw, 'rope' is a kindof a misnomer for ropes that got borrowed from JSC.  I don't think it'd be too much work to choose a different name.  I like "class LazyConcat", "isLazyConcat", etc.)

Second, the initX() members could also be hoisted into static member new_() functions of the X types (with the exception being init functions called on existing strings which we can rename (and should be private members since they are only called from flatten/undepend).

Then, all JSString would contain is (in addition to length/getChars) a set of isX()/asX()/morphToX() members (with 'morph' regularizing the names flatten/undepend) for moving down the hierarchy.  Then, as a separate comment near the private data section of JSString, we could describe the nitty gritty representation details of each type and we distinguish them (or, as you pointed out in the case of short/external strings, don't distinguish them).

Lastly (as discussed somewhere that escapes me) it would be nice to make JSStaticString not actually derive JSString (but rather have a named conversion) so that we could avoid the tyranny of struct initialization rues and (1) make string private data private, (2) be more free to tweak the private data decls, and (3) give strings constructors.

Sorry again for the big dump but the quality of your comment in the patch got me thinking.  If you'd rather just have me do it, feel free to assign the bug to me; I also really want to see strings cleaned up and documented.
(In reply to comment #7)
>  *    |   \ JSExternalString string header on GC heap, but char array
> allocation

Oh come on, fit in exactly 80 chars!
(In reply to comment #7)
> 
> Lastly (as discussed somewhere that escapes me) it would be nice to make
> JSStaticString not actually derive JSString (but rather have a named
> conversion) so that we could avoid the tyranny of struct initialization rues
> and (1) make string private data private, (2) be more free to tweak the private
> data decls, and (3) give strings constructors.

That's bug 614459, which depends on this one :)

> Sorry again for the big dump but the quality of your comment in the patch got
> me thinking.  If you'd rather just have me do it, feel free to assign the bug
> to me; I also really want to see strings cleaned up and documented.

That sounds easiest, you know exactly what you want and it saves me misinterpreting what you wrote in comment 7 :)  I'm happy to review.

Your suggestions all sound pretty good to me.  Though I'd keep "Rope" -- aren't we using it in the standard way? (http://en.wikipedia.org/wiki/Rope_%28computer_science%29)

Oh, one other thing:  finishTraversalConversion() really annoys me.  We could inline it, but really we should just use initDependent() there.
Assignee: nnethercote → luke
Attachment #516806 - Flags: review?(luke)
(In reply to comment #9)
> Your suggestions all sound pretty good to me.  Though I'd keep "Rope" -- aren't
> we using it in the standard way?
> (http://en.wikipedia.org/wiki/Rope_%28computer_science%29)

Hmmm, maybe you're right.  What I was getting at is that all our ropes do is represent a lazy concatenation -- as soon as you want to do anything you do the concatenation (less so after bug 615290) -- whereas general ropes are meant to be the object of manipulation: operations on ropes produce ropes.  But, now that I say that, the difference is a bit subtle.

> Oh, one other thing:  finishTraversalConversion() really annoys me.  We could
> inline it, but really we should just use initDependent() there.

Can do (inline); its really a private detail of strings and the conversion routine.
Depends on: 640274
Depends on: 640294
Depends on: 640852
Depends on: 616562
While I was here and confused anyway...
Attachment #519301 - Flags: review?(dvander)
Attached patch cleanup patch (obsolete) — Splinter Review
...and finally the big cleanup patch that spawned all these smaller patches.
Attachment #516806 - Attachment is obsolete: true
Attachment #519304 - Flags: review?(nnethercote)
Attached patch cleanup patchSplinter Review
Remove overzealous assert, factor accounting logic into JS*String::new_, fix bug in JSExternalString::_new (s/js_NewGCString/js_NewGCExternalString/).
Attachment #519304 - Attachment is obsolete: true
Attachment #519528 - Flags: review?(nnethercote)
Attachment #519304 - Flags: review?(nnethercote)
While I was investigating said overzealous assert: use *ById functions in storage to avoid atomizing atoms.  (Re: the removal of PRUnichar cast, bug 578340 sync'd em :)
Attachment #519530 - Flags: review?(sdwilsh)
Green on try.
Comment on attachment 519530 [details] [diff] [review]
use *ById APIs in storage

>+++ b/storage/src/mozStorageStatementParams.cpp
>@@ -213,23 +213,22 @@ StatementParams::NewResolve(nsIXPConnect
>   else if (JSID_IS_STRING(aId)) {
>     JSString *str = JSID_TO_STRING(aId);
>     size_t nameLength;
>     const jschar *nameChars = JS_GetStringCharsAndLength(aCtx, str, &nameLength);
>     NS_ENSURE_TRUE(nameChars, NS_ERROR_UNEXPECTED);
> 
>     // Check to see if there's a parameter with this name, and if not, let
>     // the rest of the prototype chain be checked.
>-    NS_ConvertUTF16toUTF8 name(reinterpret_cast<const PRUnichar *>(nameChars),
>-                               nameLength);
>+    NS_ConvertUTF16toUTF8 name(nameChars, nameLength);
The reinterpret_cast is actually important here for compiling on mingw (bug 505734)

r=sdwilsh
Attachment #519530 - Flags: review?(sdwilsh) → review+
(In reply to comment #16)
Shawn, did you see bug 578340 that I referenced in comment 14?  That bug was targeted specifically at mingw type equivalence.
(In reply to comment #17)
> (In reply to comment #16)
> Shawn, did you see bug 578340 that I referenced in comment 14?  That bug was
> targeted specifically at mingw type equivalence.
Nope; I neglected to read the bug!  You can ignore my comment then.
Comment on attachment 519528 [details] [diff] [review]
cleanup patch

Nice cleanup.  Lots of comments below, but that's because it's a big patch.  r=me with them addressed.

BTW, you've fixed bug 614459 (privatize fields in JSString) as part of this change, cool!


>-#define STRING_TO_ATOM(str)       (JS_ASSERT(str->isAtom()),                  \
>-                                   (JSAtom *)str)
>-#define ATOM_TO_STRING(atom)      (atom)
>-#define ATOM_TO_JSVAL(atom)       STRING_TO_JSVAL(ATOM_TO_STRING(atom))

Good riddance to those!


>diff --git a/js/src/jsatominlines.h b/js/src/jsatominlines.h
>--- a/js/src/jsatominlines.h
>+++ b/js/src/jsatominlines.h
>@@ -51,23 +51,23 @@ js_ValueToAtom(JSContext *cx, const js::
> {
>     JSString *str;
>     JSAtom *atom;
> 
>     /*
>      * Optimize for the common case where v is an already-atomized string. The
>      * comment in jsstr.h before JSString::flatSetAtomized explains why this is
>      * thread-safe. The extra rooting via lastAtom (which would otherwise be
>-     * done in js_js_AtomizeString) ensures the caller that the resulting id at
>+     * done in js_AtomizeString) ensures the caller that the resulting id at
>      * is least weakly rooted.

"resulting id at is least"?  Reword.


>diff --git a/js/src/json.cpp b/js/src/json.cpp
>--- a/js/src/json.cpp
>+++ b/js/src/json.cpp
>@@ -59,16 +59,17 @@
> #include "jsatominlines.h"
> #include "jsobjinlines.h"
>+#include "jsstrinlines.h"
> 
>@@ -1257,17 +1258,17 @@ js_ConsumeJSONText(JSContext *cx, JSONPa
> #if JS_HAS_TOSOURCE
> static JSBool
> json_toSource(JSContext *cx, uintN argc, Value *vp)
> {
>-    vp->setString(ATOM_TO_STRING(CLASS_ATOM(cx, JSON)));
>+    vp->setString(CLASS_ATOM(cx, JSON));
>     return JS_TRUE;
> }
> #endif

If that's the only change to this file, it looks wrong that jsstrinlines.h
must be included.  Can you double check?


>+JSFlatString *
>+JSRope::flatten(JSContext *maybecx)
> {
>-    JS_ASSERT(isRope());
>-
>     /*
>      * Perform a depth-first dag traversal, splatting each node's characters
>      * into a contiguous buffer. Visit each rope node three times:
>      *  1. record position in the buffer and recurse into left child;
>      *  2. recurse into the right child;
>      *  3. transform the node into a dependent string.
>      * To avoid maintaining a stack, tree nodes are mutated to indicate how
>      * many times they have been visited. Since ropes can be dags, a node may
>      * be encountered multiple times during traversal. However, step 3 above
>-     * leaves a valid dependent string, so everythings works out.  This
>+     * leaves a valid dependent string, so everythings works out. This

s/everythings/everything/


>         if (str->isAtom()) {
>-            atom = STRING_TO_ATOM(str);
>+            atom = &str->asAtom();

+1 for killing STRING_TO_ATOM and replacing it with something checked.


>-static JS_ALWAYS_INLINE JSFlatString *
>+static JS_ALWAYS_INLINE JSFixedString *
> NewShortString(JSContext *cx, const jschar *chars, size_t length)
> 
>-static JSFlatString *
>+static JSFixedString *
> NewShortString(JSContext *cx, const char *chars, size_t length)

Why isn't the return type JSShortString?


> JSLinearString *
>-js_NewDependentString(JSContext *cx, JSString *baseArg, size_t start,
>-                      size_t length)
>+js_NewDependentString(JSContext *cx, JSString *baseArg, size_t start, size_t length)

Similarly, why isn't the return type JSDependentString?


>@@ -3854,46 +3873,38 @@ EqualStrings(JSContext *cx, JSString *st
>     }
> 
>     size_t length1 = str1->length();
>     if (length1 != str2->length()) {
>         *result = false;
>         return true;
>     }
> 
>-    if (length1 == 0) {
>-        *result = true;
>-        return true;
>-    }

Do you know that this saves some instructions, or are you assuming it
doesn't help much in practice?


> bool
> EqualStrings(JSLinearString *str1, JSLinearString *str2)
> {
>     if (str1 == str2)
>         return true;
> 
>     size_t length1 = str1->length();
>     if (length1 != str2->length())
>         return false;
> 
>-    if (length1 == 0)
>-        return true;

Same question here.


>+ * JS strings

This is a really nice comment.  It's also really important, so please excuse
the close nit-picking below.


>+ * Conceptually, a JS string is just an array of chars and a length. To improve
>+ * performance of common string operations, the following optimizations are
>+ * made which affect the engine's representation of strings:

Flat strings should be mentioned first in this list, as the vanilla case.

And the header/chars separation is important (esp. for explaining inline
short strings), and worth mentioning up front.


>+ *  - To avoid copying a substring of an existing "base" string , a "dependent"
>+ *    string (JSDependentString) can be created which points into the base
>+ *    string's char array.
>+ *
>+ *  - To avoid eager copying, a "rope" node (JSRope) can be created to represent
>+ *    a delayed string concatenation. This concatenation (called flattening) is
>+ *    performed if and when a linear char array is requested.

This doesn't explain what a rope is, ie. a binary tree with internal nodes
that only contain pointers to children, and leaf nodes that are flat
strings.  Please add this info.


>+ *  - To avoid copying the left-hand side when flattening, the left-hand side's
>+ *    string's buffer may be grown to make space for a copy of the right-hand

"left-hand side's string's" reads badly.  "left-hand side's buffer"?


>+ *  - To avoid allocating small char arrays, short strings can be stored inline
>+ *    in the string header. To increase the max size of such inline strings,
>+ *    double-wide string headers (JSShortString) can be used.
>+ *
>+ *  - To avoid dynamic creation of common very-short strings, string headers and
>+ *    their char arrays are allocated in static memory (JSStaticString).

That is unclear about exactly which strings are static.  Maybe: "To avoid
dynamic creation of certain common very-short strings (eg. all single-letter
strings such as "a", "b", "c", etc), the headers and char arrays of these
strings are pre-allocated in static memory (JSStaticString)."


>+ * C++ type                     operations / fields / invariants
>+ *
>+ * JSString (abstract)          'length' field
>+ *    | \                       (fallible) 'getCharsZ'/'getChars' operations
>+ *    |  \
>+ *    |  JSRope                 'leftChild' and 'rightChild' fields
>+ *    |
>+ * JSLinearString (abstract)    (not null terminated) 'chars' field
>+ *    | \
>+ *    | JSDependentString       'base' field
>+ *    |
>+ * JSFlatString (abstract)      (null terminated) 'chars' field
>+ *    |                         'extensible' field
>+ *    |
>+ * JSFixedString                !extensible
>+ *    | \  \
>+ *    |  \  JSShortString       chars stored in header
>+ *    |   \
>+ *    |    JSExternalString     'external string GC type' field
>+ *    |
>+ * JSAtom                       string equality === pointer equality
>+ *    | \
>+ *    | JSShortAtom             atomized JSShortString
>+ *    |
>+ * JSAtom (static)              header and chars statically allocated

This picture is really good, but I find the RHS hard to follow.  You have
ops/fields/invariants as the header, but don't follow that layout in most of
the entries.  Here's how I'd rewrite it:

>+ * C++ type                     operations / fields / invariants+properties
>+ *
>+ * JSString (abstract)          (fallible) 'getCharsZ', 'getChars' / 'length' / - 
>+ *    | \                       
>+ *    |  \
>+ *    |  JSRope                 - / 'leftChild', 'rightChild' / -
>+ *    |
>+ * JSLinearString (abstract)    - / 'chars' / not null-terminated
>+ *    | \
>+ *    | JSDependentString       - / 'base' / -
>+ *    |
>+ * JSFlatString (abstract)      - / 'chars', 'extensible' / not null-terminated
>+ *    |
>+ * JSFixedString                - / - / not extensible
>+ *    | \  \
>+ *    |  \  JSShortString       - / - / chars stored in header
>+ *    |   \
>+ *    |    JSExternalString     - / 'externalStringType' field / -
>+ *    |
>+ * JSAtom                       - / - / string equality === pointer equality
>+ *    | \
>+ *    | JSShortAtom             - / - / atomized JSShortString
>+ *    |
>+ * JSAtom (static)              - / - / header and chars statically allocated

Note that I renamed "invariants" as "invariants+properties".

Are all static strings atoms?  If so, that should be mentioned in the
string descriptions above.

It's weird having "JSAtom" and "JSAtom (static)" as separate nodes in the
graph.  Should it be "JSStaticAtom"?  Can a JSShortAtom be static?

In JSFlatString, what is the 'extensible' field?  Do you mean a flag?  You
haven't mentioned the flag settings for the other string kinds, so it feels
out of place.

JSShortAtom doesn't exist as an actual class.  Is that intended?  It's
confusing.

Most operations aren't mentioned.  A couple of examples:
JSLinearString::{chars,mark}, JSLinearString::undepend.  In fact, JSString
is the only entry with operations listed.  Maybe just omit that column?

The fields are actually all getter methods, AFAICT.  This confused me for a
bit.  Clarify?


>+ * Classes marked with (abstract) above are not literally C++ Abstract Base
>+ * Classes (since there are no virtual functions, pure or not, in this
>+ * hierarchy), but have the same meaning: there are no instances with this

Some text got cut here?


>+  public:
>+    /* Flags exposed only for jits */

Can you make the relevant JIT classes friends and keep these members private?


>+    /*
>+     * Instead of using a dense index to represent the most-derived type, string
>+     * types are encoded to allow single-op tests for hot queries (isRope,
>+     * isDependent, isFlat, isAtom, isStaticAtom):
>+     *
>+     *  JSRope                       xxx1
>+     *  JSLinearString               xxx0
>+     *  JSDependentString            xx1x
>+     *  JSFlatString                 xx00
>+     *  JSFlatString && isExtensible 1100
>+     *  JSFixedString                xy00 where xy != 11
>+     *  JSShortString                0100 and in FINALIZE_SHORT_STRING arena
>+     *  JSExternalString             0100 and in FINALIZE_EXTERNAL_STRING arena
>+     *  JSAtom                       x000
>+     *  JSAtom (static)              0000
>+     *
>+     * NB: this scheme takes advantage of the fact that there are no string
>+     * instances whose most-derived type is JSString, JSLinearString, or
>+     * JSFlatString.
>+     */

Lovely comment!  But we seem to have lost the information about the layout
of lengthAndFlags, ie. that the length is the high 28 bits and the flags are
the low 4 bits.  You should mention that.


>+    JS_ALWAYS_INLINE
>+    bool isRope() const {
>+        bool rope = d.lengthAndFlags & ROPE_BIT;
>+        JS_ASSERT_IF(rope, (d.lengthAndFlags & FLAGS_MASK) == ROPE_BIT);
>         return rope;
>     }

The assertion doesn't seem necessary.  Likewise in isDependent().


>+    JS_ALWAYS_INLINE
>+    bool isStaticAtom() const {
>+        return (d.lengthAndFlags & FLAGS_MASK) == STATIC_ATOM_FLAGS;
>     }

Why did you add the Data struct -- is there a benefit I'm missing?  
If not, remove it, it's just more typing -- all those "d." prefixes.  


>+/* See JSString comment */
>+class JSRope : public JSString
> {

These "See JSString comment" comments aren't necessary, here and below.


>+class JSShortString : public JSFixedString
> {
>-    JSString mHeader;
>-    JSString mDummy;
>+    jschar inlineStorageExtension[sizeof(JSString::Data) / sizeof(jschar)];

There's no reason why inlineStorageExtension must have the same length as
sizeof(JSString::Data), and in my patch I made this clear.  Can you
introduce a new constant (which can be given the value
|sizeof(JSString::Data)|), and say something like "the length of this is
arbitrary, except it must be a multiple of 8 for GC purposes".


>+    static const size_t MAX_SHORT_LENGTH = JSString::NUM_INLINE_CHARS +
>+                                           sizeof(JSString::Data) / sizeof(jschar)
>+                                           -1 /* null terminator */;

That new constant can be used as the middle term in this addition.


>+class JSAtom : public JSFixedString
>+{
>+  public:
>+    /* Exposed only for jits. */

Again, use friend classes?


>+JS_ALWAYS_INLINE JSFixedString *
>+JSString::ensureFixed(JSContext *cx)
>+{
>+    if (!ensureFlat(cx))
>+        return NULL;
>+    if (isFlatAndExtensible()) {
>+        JS_ASSERT((d.lengthAndFlags & FLAT_MASK) == 0);

This assertion seems unnecessary.


>         /*
>-         * flatChars for stillborn string is null, but cx->free checks
>-         * for a null pointer on its own.
>+         * Stillborn strings may have null chars, but cx->free checks for a null
>+         * pointer on its own.
>          */

What is a stillborn string?


>+template <class T>
>+JS_ALWAYS_INLINE static bool
>+PodEqual(T *one, T *two, size_t len)
>+{
>+    if (len < 128) {
>+        T *p1end = one + len;
>+        for (T *p1 = one, *p2 = two; p1 != p1end; ++p1, ++p2) {
>+            if (*p1 != *p2)
>+                return false;
>+        }
>+        return true;
>+    }
>+
>+    return !memcmp(one, two, len * sizeof(T));
>+}

Why not just call memcmp() straight off?
Attachment #519528 - Flags: review?(nnethercote) → review+
Attachment #519301 - Flags: review?(dvander) → review+
Duplicate of this bug: 614459
(In reply to comment #19)

Thanks for taking the time to dig into this!  There are lots of great comments.

> >@@ -51,23 +51,23 @@ js_ValueToAtom(JSContext *cx, const js::
> "resulting id at is least"?  Reword.

Your comment led me to find that this whole comment is out of date and find yet another one of these preexisting JSString/chars rooting errors.

> >@@ -3854,46 +3873,38 @@ EqualStrings(JSContext *cx, JSString *st
> >-    if (length1 == 0) {
> >-        *result = true;
> >-        return true;
> >-    }

> > EqualStrings(JSLinearString *str1, JSLinearString *str2)
> Same question here.

My thinking was: extra code must justify its existence, and previously this code may have (avoiding a call to memcmp), but given that a length 0 string will immediately branch out (inline), it doesn't now.  I should mention that I wasn't able to measure any SS or V8 changes for this whole patch queue.

> This picture is really good, but I find the RHS hard to follow.  You have
> ops/fields/invariants as the header, but don't follow that layout in most of
> the entries.  Here's how I'd rewrite it:

Sold!  (with tweaks as motivated below)

> It's weird having "JSAtom" and "JSAtom (static)" as separate nodes in the
> graph.  Should it be "JSStaticAtom"?  Can a JSShortAtom be static?

Ah, thanks, this one was in flux and I forgot to resolve it (you can even see JSStaticString lingering in the comments).  Yeah, I'll make an empty JSStaticAtom subtype for regularity.

At the moment, all static atoms are non-short.

> In JSFlatString, what is the 'extensible' field?  Do you mean a flag?  You
> haven't mentioned the flag settings for the other string kinds, so it feels
> out of place.

Yeah, I don't know why I didn't just make JSExtensibleString.  Fixed.

> JSShortAtom doesn't exist as an actual class.  Is that intended?  It's
> confusing.

Well, I was lazy; its not used anywhere and has no operations.  But if it bugs you then it will probably bug others so I should add it for consistency with a comment as to why its not explicitly mentioned elsewhere.  It also adds a fun diamond in the is-a graph.

> Most operations aren't mentioned.  A couple of examples:
> JSLinearString::{chars,mark}, JSLinearString::undepend.  In fact, JSString
> is the only entry with operations listed.  Maybe just omit that column?
>
> The fields are actually all getter methods, AFAICT.  This confused me for a
> bit.  Clarify?

Well, most operations are impl details that don't help with the high-level understanding, which is the point of the diagram.  Incidentally, undepend is a private operation, so it definitely doesn't belong.  JSLinearString::chars() is mentioned in the diagram (as a "field", since its infallible, so, effectively, a field).  But your observation makes me want to go from three columns to two (operations+fields / invariants+properties).

> >+  public:
> >+    /* Flags exposed only for jits */
> 
> Can you make the relevant JIT classes friends and keep these members private?

Well, friendship with jits is scary because then you are saying "the jit can do anything" when you really want to make a more restricted statement like "the jit needs to know the offset of X and the value of flag Y".  Also there's not a single class to cover the mjit.  I tried doing this through various static public member functions but it ultimately didn't seem any cleaner and it was more syntactic noise.

> >+    JS_ALWAYS_INLINE
> >+    bool isRope() const {
> >+        bool rope = d.lengthAndFlags & ROPE_BIT;
> >+        JS_ASSERT_IF(rope, (d.lengthAndFlags & FLAGS_MASK) == ROPE_BIT);
> >         return rope;
> >     }
> 
> The assertion doesn't seem necessary.  Likewise in isDependent().

True, these tests are implied by the encoding rules.  However, I like these asserts because, in my experience with other code, it catches memory-corruption/GC related bugs.  The assert also explains the optimization at hand.

> >+    JS_ALWAYS_INLINE
> >+    bool isStaticAtom() const {
> >+        return (d.lengthAndFlags & FLAGS_MASK) == STATIC_ATOM_FLAGS;
> >     }
> 
> Why did you add the Data struct -- is there a benefit I'm missing?  
> If not, remove it, it's just more typing -- all those "d." prefixes.  

Data is the element type of the static string arrays (instead of JSString).  This is an alternative to the different-but-isomorphic-struct idea.  Since use of "d." is necessarily only done by a few string members, the syntactic overhead seems minimal.

> >         /*
> >-         * flatChars for stillborn string is null, but cx->free checks
> >-         * for a null pointer on its own.
> >+         * Stillborn strings may have null chars, but cx->free checks for a null
> >+         * pointer on its own.
> >          */
> 
> What is a stillborn string?

GC'd before initialized.  Now that I actually think about this code, strings cannot be GC'd before initialized: all sites that call js_NewGC(Short)String init the result before any potential GC.  So, removing comment.

> >+template <class T>
> >+JS_ALWAYS_INLINE static bool
> >+PodEqual(T *one, T *two, size_t len)
> >+{
> >+    if (len < 128) {
> >+        T *p1end = one + len;
> >+        for (T *p1 = one, *p2 = two; p1 != p1end; ++p1, ++p2) {
> >+            if (*p1 != *p2)
> >+                return false;
> >+        }
> >+        return true;
> >+    }
> >+
> >+    return !memcmp(one, two, len * sizeof(T));
> >+}
> 
> Why not just call memcmp() straight off?

memcmp() overhead (call, unrolled loop setup) makes it slower for short loops.  This has been measured and confirmed several times, although I don't have a bug on hand.  You can see the same thing in PodCopy.
(In reply to comment #21)
> > 
> > Why did you add the Data struct -- is there a benefit I'm missing?  
> > If not, remove it, it's just more typing -- all those "d." prefixes.  
> 
> Data is the element type of the static string arrays (instead of JSString). 
> This is an alternative to the different-but-isomorphic-struct idea.  Since use
> of "d." is necessarily only done by a few string members, the syntactic
> overhead seems minimal.

Oh, that allows static initialization of JSStrings?  Nice.

Thanks for the other responses, it all sounds good.
http://hg.mozilla.org/tracemonkey/rev/0e8639e561f5
http://hg.mozilla.org/tracemonkey/rev/4290338c3956
http://hg.mozilla.org/tracemonkey/rev/6afe562a0cb8

hg rebase apparently decided I wanted to delete nsISSLStatusProvider.idl which I, in fact, did not want to do, so:
http://hg.mozilla.org/tracemonkey/rev/ebd7cc5d0daa

fix windows warnings:
http://hg.mozilla.org/tracemonkey/rev/41ed083faf4a

and then a workaround for an Apple gcc bug (filed bug 644250 for explanation and for a better fix):
http://hg.mozilla.org/tracemonkey/rev/663a3afb238e
Summary: Better documentation and minor clean-ups for JSString, JSShortString → Clean up JSString
Whiteboard: fixed-in-tracemonkey
Depends on: 646304
You need to log in before you can comment on or make changes to this bug.