Closed
Bug 613457
Opened 14 years ago
Closed 14 years ago
Clean up JSString
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: n.nethercote, Assigned: luke)
References
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(3 files, 4 obsolete files)
1.36 KB,
patch
|
dvander
:
review+
|
Details | Diff | Splinter Review |
252.87 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
2.74 KB,
patch
|
sdwilsh
:
review+
|
Details | Diff | Splinter Review |
JSString is a bloody mess. I defy anyone to read its declaration in jsstr.h and proclaim confidently (a) which of the 16 possible combinations of types and flags are valid, and (b) which of the various fields (including both a union and two unions-within-a-union) are used for each of the valid type/flags combinations. And JSShortString just adds to the fun. We've had a lot of JSString-related crashes and problems lately (eg. bug 596988, bug 597886, bug 598695, bug 603554, bug 604756, bug 605852) and I think this complexity is a significant cause. This patch makes a start on fixing the mess by merging the separate 'type' and 'flags' fields into a single 'mode' (I deliberately used a new word because that made it easier to find all the places that needed changing). There are six modes: FLAT_VANILLA, FLAT_ATOMIZED, FLAT_EXTENSIBLE, DEPENDENT, ROPE_INTERIOR_NODE, ROPE_TOP_NODE. (Thus answering question (a) above, if we ignore JSShortString's abuses.) Someone will undoubtedly whine about my use of "vanilla", so I proffer "plain" as a back-up. The patch is a draft, I was hoping to get feedback as to whether people think this is a good idea, whether I should go further (by cleaning up the union fields as well) and whether that should be done as part of this bug or a follow-up (or multiple follow-ups). I wonder if introducing a sub-class for each of mode would help things.
Attachment #491799 -
Flags: feedback?(lw)
Reporter | ||
Updated•14 years ago
|
Attachment #491799 -
Flags: feedback?(bzbarsky)
Comment 1•14 years ago
|
||
Comment on attachment 491799 [details] [diff] [review] draft patch (against TM 57585:3ce2cfd22078) I'm probably the wrong person to ask here... but reducing the space of possible modes to the space of valid modes sounds good!
Attachment #491799 -
Flags: feedback?(bzbarsky) → feedback+
Assignee | ||
Comment 2•14 years ago
|
||
Comment on attachment 491799 [details] [diff] [review] draft patch (against TM 57585:3ce2cfd22078) Nice start. Since you are interested in doing this, I'd like to make some requests, mostly to the tune of "could JSString follow the direction of JSStackFrame?": Make the data members private and replace offsetof usage with static offsetOfX members. Right after 'public:', have a comment that says "these are the sorts of strings", and shortly describe each one. Then under that comment put the bool isSortX() queries for each sort. I called it 'sort' since 'mode' implies that a single frame may change its modes and 'type' is heavily overloaded. After the sort section, JSStackFrame has section for the initialization of stack frames into each mode. Then it seems like you could group the string accessors by the assumed string sort (separated by a comment block describing the sort in more detail). JSStackFrame grouped them by field, not required sort, since most fields applied to most sorts, but string is different in this manner -- its basically an Algebraic Datatype. Sorry for the dump, but you did ask for feedback ;-)
Attachment #491799 -
Flags: feedback?(lw) → feedback+
Reporter | ||
Comment 3•14 years ago
|
||
(In reply to comment #2) > > Sorry for the dump, but you did ask for feedback ;-) No need to apologize, it's all good suggestions. Except I don't much like "sort"... I read "string sort" in a sentence and I want to interpret it as talking about string sorting. But then strings, unlike JSStackFrame, can change mode so maybe that fits with what you said.
Assignee | ||
Comment 4•14 years ago
|
||
(In reply to comment #3) > But then strings, unlike JSStackFrame, can > change mode so maybe that fits with what you said. Ah, of course, mode sounds perfect then!
Reporter | ||
Comment 5•14 years ago
|
||
I've decided for now not to combine 'flags' and 'type' into 'mode', because I can imagine a larger future clean-up in which the various types of string actually become (C++) sub-classes, in which case 'flags' should belong to the FLAT sub-class. Instead, this patch greatly improves the documentation of JSString, and does some minor clean-ups of it and JSShortString. The following points are listed in the order you'll encounter them if reading the patch from top to bottom. Specifically, the patch: - Introduces the "flat plain" terminology to contrast with "flat atomized" and "flat extensible" strings. - Renames externalStringType as mExternal for consistency. - Removes dependentLength() and flatLength(); they're identical to length(). - Renames {TOP,INTERIOR}_NODE as ROPE_{TOP,INTERIOR}_NODE for clarity; likewise with related function names. - Renamed mLengthAndFlags as mLengthFlagsType, which is more accurate. - Introduces names for the hacky constants used in flatten(), and asserts that mChar and mLeft alias, which is crucial to flatten()'s operation. - Removes finishTraversalConversion(), it's very similar to initDependent(), and using initDependent() is clearer. - Adds the LENGTH_FLAGS_TYPE macro which makes setting mLengthFlagsType simpler. - Greatly expands the comment at the top of JSString. I'm pretty sure it covers all the cases now. Likewise, the uses of the fields are now much clearer. - Moves mBase into the union with mCapacity, mParent, mBufferWithInfo; it's never used concurrently with any of them. - Renames hasFlag() and hasBit() because it's used to match bits in both the 'flags' and the 'type' sub-fields. - Tightens/adds some assertions in some accessor functions. - In the various initFoo() functions, now only the fields used by the relevant string type are initialized. - Removes initShortString(), it ended up being identical to initFlatPlain(). - Removes flatSetExtensible(), which is dead. - JSShortString consists of a JSString immediately followed by some extra space in which to hold additional chars. Prior to this patch this extra space was itself another JSString. But there's no need for that, and it's a bit confusing. So this patch changes that space to a jschar array and adds comments to make things clearer. This patch should go in after bug 609440 is done; it'll need some minor changes then.
Attachment #491799 -
Attachment is obsolete: true
Reporter | ||
Updated•14 years ago
|
Summary: clean up JSString → Better documentation and minor clean-ups for JSString, JSShortString
Reporter | ||
Comment 6•14 years ago
|
||
This is similar to v2 of the patch, but updated for all the recent changes.
Attachment #492878 -
Attachment is obsolete: true
Attachment #516806 -
Flags: review?(luke)
Assignee | ||
Comment 7•14 years ago
|
||
Sorry for taking so long to get back to you. The patch is a nice improvement. As I was reading the big string comment, I liked how it was breaking things out into clean orthogonal categories. However, due to the tricky nature of our strings, there are still a few things that aren't quite orthogonalized in the comment or in the C++ interface. I started thinking about if there was any way to take it further, and I came to a conclusion we could use and extend the hierarchy of JSString/JSFlatString/JSLinearString/JSAtom and use that as the central way of describing/understanding strings. Basically, what I'd like is for one ASCII-art diagram to cover all the relevant externally-facing string cases and, unless I'm missing one, I think this one achieves that: /* * C++ type Supported operations / properties / invariants * * JSString length * | \ fallible operation to get an array of chars * | \ * | JSRope left and right child strings * | fallible conversion to JSFlatString * | * JSLinearString (infallible) array of chars (not null-terminated) * | \ * | JSDependentString base string * | fallible conversion to JSFlatString * | * JSFlatString chars[length] == 0 * | \ \ * | \ \ external string GC type * | \ JSExternalString string header on GC heap, but char array allocation * | \ managed by embedder * | \ * | JSShortString chars stored in string header, allocated on the * | GC heap * | * JSAtom equality between atoms can be tested by testing * | \ identity of JSAtom* * | \ * | JSShortAtom atomized JSShortString * | * JSStaticString short atom stored in static memory (not GC heap) */ The new type JS{DependentString,Rope} classes would allow us to pull rope/dependent stuff out of JSString. (Btw, 'rope' is a kindof a misnomer for ropes that got borrowed from JSC. I don't think it'd be too much work to choose a different name. I like "class LazyConcat", "isLazyConcat", etc.) Second, the initX() members could also be hoisted into static member new_() functions of the X types (with the exception being init functions called on existing strings which we can rename (and should be private members since they are only called from flatten/undepend). Then, all JSString would contain is (in addition to length/getChars) a set of isX()/asX()/morphToX() members (with 'morph' regularizing the names flatten/undepend) for moving down the hierarchy. Then, as a separate comment near the private data section of JSString, we could describe the nitty gritty representation details of each type and we distinguish them (or, as you pointed out in the case of short/external strings, don't distinguish them). Lastly (as discussed somewhere that escapes me) it would be nice to make JSStaticString not actually derive JSString (but rather have a named conversion) so that we could avoid the tyranny of struct initialization rues and (1) make string private data private, (2) be more free to tweak the private data decls, and (3) give strings constructors. Sorry again for the big dump but the quality of your comment in the patch got me thinking. If you'd rather just have me do it, feel free to assign the bug to me; I also really want to see strings cleaned up and documented.
Assignee | ||
Comment 8•14 years ago
|
||
(In reply to comment #7) > * | \ JSExternalString string header on GC heap, but char array > allocation Oh come on, fit in exactly 80 chars!
Reporter | ||
Comment 9•14 years ago
|
||
(In reply to comment #7) > > Lastly (as discussed somewhere that escapes me) it would be nice to make > JSStaticString not actually derive JSString (but rather have a named > conversion) so that we could avoid the tyranny of struct initialization rues > and (1) make string private data private, (2) be more free to tweak the private > data decls, and (3) give strings constructors. That's bug 614459, which depends on this one :) > Sorry again for the big dump but the quality of your comment in the patch got > me thinking. If you'd rather just have me do it, feel free to assign the bug > to me; I also really want to see strings cleaned up and documented. That sounds easiest, you know exactly what you want and it saves me misinterpreting what you wrote in comment 7 :) I'm happy to review. Your suggestions all sound pretty good to me. Though I'd keep "Rope" -- aren't we using it in the standard way? (http://en.wikipedia.org/wiki/Rope_%28computer_science%29) Oh, one other thing: finishTraversalConversion() really annoys me. We could inline it, but really we should just use initDependent() there.
Assignee: nnethercote → luke
Reporter | ||
Updated•14 years ago
|
Attachment #516806 -
Flags: review?(luke)
Assignee | ||
Comment 10•14 years ago
|
||
(In reply to comment #9) > Your suggestions all sound pretty good to me. Though I'd keep "Rope" -- aren't > we using it in the standard way? > (http://en.wikipedia.org/wiki/Rope_%28computer_science%29) Hmmm, maybe you're right. What I was getting at is that all our ropes do is represent a lazy concatenation -- as soon as you want to do anything you do the concatenation (less so after bug 615290) -- whereas general ropes are meant to be the object of manipulation: operations on ropes produce ropes. But, now that I say that, the difference is a bit subtle. > Oh, one other thing: finishTraversalConversion() really annoys me. We could > inline it, but really we should just use initDependent() there. Can do (inline); its really a private detail of strings and the conversion routine.
Assignee | ||
Comment 11•14 years ago
|
||
While I was here and confused anyway...
Attachment #519301 -
Flags: review?(dvander)
Assignee | ||
Comment 12•14 years ago
|
||
...and finally the big cleanup patch that spawned all these smaller patches.
Attachment #516806 -
Attachment is obsolete: true
Attachment #519304 -
Flags: review?(nnethercote)
Assignee | ||
Comment 13•14 years ago
|
||
Remove overzealous assert, factor accounting logic into JS*String::new_, fix bug in JSExternalString::_new (s/js_NewGCString/js_NewGCExternalString/).
Attachment #519304 -
Attachment is obsolete: true
Attachment #519528 -
Flags: review?(nnethercote)
Attachment #519304 -
Flags: review?(nnethercote)
Assignee | ||
Comment 14•14 years ago
|
||
While I was investigating said overzealous assert: use *ById functions in storage to avoid atomizing atoms. (Re: the removal of PRUnichar cast, bug 578340 sync'd em :)
Attachment #519530 -
Flags: review?(sdwilsh)
Assignee | ||
Comment 15•14 years ago
|
||
Green on try.
Comment 16•14 years ago
|
||
Comment on attachment 519530 [details] [diff] [review] use *ById APIs in storage >+++ b/storage/src/mozStorageStatementParams.cpp >@@ -213,23 +213,22 @@ StatementParams::NewResolve(nsIXPConnect > else if (JSID_IS_STRING(aId)) { > JSString *str = JSID_TO_STRING(aId); > size_t nameLength; > const jschar *nameChars = JS_GetStringCharsAndLength(aCtx, str, &nameLength); > NS_ENSURE_TRUE(nameChars, NS_ERROR_UNEXPECTED); > > // Check to see if there's a parameter with this name, and if not, let > // the rest of the prototype chain be checked. >- NS_ConvertUTF16toUTF8 name(reinterpret_cast<const PRUnichar *>(nameChars), >- nameLength); >+ NS_ConvertUTF16toUTF8 name(nameChars, nameLength); The reinterpret_cast is actually important here for compiling on mingw (bug 505734) r=sdwilsh
Attachment #519530 -
Flags: review?(sdwilsh) → review+
Assignee | ||
Comment 17•14 years ago
|
||
(In reply to comment #16) Shawn, did you see bug 578340 that I referenced in comment 14? That bug was targeted specifically at mingw type equivalence.
Comment 18•14 years ago
|
||
(In reply to comment #17) > (In reply to comment #16) > Shawn, did you see bug 578340 that I referenced in comment 14? That bug was > targeted specifically at mingw type equivalence. Nope; I neglected to read the bug! You can ignore my comment then.
Reporter | ||
Comment 19•14 years ago
|
||
Comment on attachment 519528 [details] [diff] [review] cleanup patch Nice cleanup. Lots of comments below, but that's because it's a big patch. r=me with them addressed. BTW, you've fixed bug 614459 (privatize fields in JSString) as part of this change, cool! >-#define STRING_TO_ATOM(str) (JS_ASSERT(str->isAtom()), \ >- (JSAtom *)str) >-#define ATOM_TO_STRING(atom) (atom) >-#define ATOM_TO_JSVAL(atom) STRING_TO_JSVAL(ATOM_TO_STRING(atom)) Good riddance to those! >diff --git a/js/src/jsatominlines.h b/js/src/jsatominlines.h >--- a/js/src/jsatominlines.h >+++ b/js/src/jsatominlines.h >@@ -51,23 +51,23 @@ js_ValueToAtom(JSContext *cx, const js:: > { > JSString *str; > JSAtom *atom; > > /* > * Optimize for the common case where v is an already-atomized string. The > * comment in jsstr.h before JSString::flatSetAtomized explains why this is > * thread-safe. The extra rooting via lastAtom (which would otherwise be >- * done in js_js_AtomizeString) ensures the caller that the resulting id at >+ * done in js_AtomizeString) ensures the caller that the resulting id at > * is least weakly rooted. "resulting id at is least"? Reword. >diff --git a/js/src/json.cpp b/js/src/json.cpp >--- a/js/src/json.cpp >+++ b/js/src/json.cpp >@@ -59,16 +59,17 @@ > #include "jsatominlines.h" > #include "jsobjinlines.h" >+#include "jsstrinlines.h" > >@@ -1257,17 +1258,17 @@ js_ConsumeJSONText(JSContext *cx, JSONPa > #if JS_HAS_TOSOURCE > static JSBool > json_toSource(JSContext *cx, uintN argc, Value *vp) > { >- vp->setString(ATOM_TO_STRING(CLASS_ATOM(cx, JSON))); >+ vp->setString(CLASS_ATOM(cx, JSON)); > return JS_TRUE; > } > #endif If that's the only change to this file, it looks wrong that jsstrinlines.h must be included. Can you double check? >+JSFlatString * >+JSRope::flatten(JSContext *maybecx) > { >- JS_ASSERT(isRope()); >- > /* > * Perform a depth-first dag traversal, splatting each node's characters > * into a contiguous buffer. Visit each rope node three times: > * 1. record position in the buffer and recurse into left child; > * 2. recurse into the right child; > * 3. transform the node into a dependent string. > * To avoid maintaining a stack, tree nodes are mutated to indicate how > * many times they have been visited. Since ropes can be dags, a node may > * be encountered multiple times during traversal. However, step 3 above >- * leaves a valid dependent string, so everythings works out. This >+ * leaves a valid dependent string, so everythings works out. This s/everythings/everything/ > if (str->isAtom()) { >- atom = STRING_TO_ATOM(str); >+ atom = &str->asAtom(); +1 for killing STRING_TO_ATOM and replacing it with something checked. >-static JS_ALWAYS_INLINE JSFlatString * >+static JS_ALWAYS_INLINE JSFixedString * > NewShortString(JSContext *cx, const jschar *chars, size_t length) > >-static JSFlatString * >+static JSFixedString * > NewShortString(JSContext *cx, const char *chars, size_t length) Why isn't the return type JSShortString? > JSLinearString * >-js_NewDependentString(JSContext *cx, JSString *baseArg, size_t start, >- size_t length) >+js_NewDependentString(JSContext *cx, JSString *baseArg, size_t start, size_t length) Similarly, why isn't the return type JSDependentString? >@@ -3854,46 +3873,38 @@ EqualStrings(JSContext *cx, JSString *st > } > > size_t length1 = str1->length(); > if (length1 != str2->length()) { > *result = false; > return true; > } > >- if (length1 == 0) { >- *result = true; >- return true; >- } Do you know that this saves some instructions, or are you assuming it doesn't help much in practice? > bool > EqualStrings(JSLinearString *str1, JSLinearString *str2) > { > if (str1 == str2) > return true; > > size_t length1 = str1->length(); > if (length1 != str2->length()) > return false; > >- if (length1 == 0) >- return true; Same question here. >+ * JS strings This is a really nice comment. It's also really important, so please excuse the close nit-picking below. >+ * Conceptually, a JS string is just an array of chars and a length. To improve >+ * performance of common string operations, the following optimizations are >+ * made which affect the engine's representation of strings: Flat strings should be mentioned first in this list, as the vanilla case. And the header/chars separation is important (esp. for explaining inline short strings), and worth mentioning up front. >+ * - To avoid copying a substring of an existing "base" string , a "dependent" >+ * string (JSDependentString) can be created which points into the base >+ * string's char array. >+ * >+ * - To avoid eager copying, a "rope" node (JSRope) can be created to represent >+ * a delayed string concatenation. This concatenation (called flattening) is >+ * performed if and when a linear char array is requested. This doesn't explain what a rope is, ie. a binary tree with internal nodes that only contain pointers to children, and leaf nodes that are flat strings. Please add this info. >+ * - To avoid copying the left-hand side when flattening, the left-hand side's >+ * string's buffer may be grown to make space for a copy of the right-hand "left-hand side's string's" reads badly. "left-hand side's buffer"? >+ * - To avoid allocating small char arrays, short strings can be stored inline >+ * in the string header. To increase the max size of such inline strings, >+ * double-wide string headers (JSShortString) can be used. >+ * >+ * - To avoid dynamic creation of common very-short strings, string headers and >+ * their char arrays are allocated in static memory (JSStaticString). That is unclear about exactly which strings are static. Maybe: "To avoid dynamic creation of certain common very-short strings (eg. all single-letter strings such as "a", "b", "c", etc), the headers and char arrays of these strings are pre-allocated in static memory (JSStaticString)." >+ * C++ type operations / fields / invariants >+ * >+ * JSString (abstract) 'length' field >+ * | \ (fallible) 'getCharsZ'/'getChars' operations >+ * | \ >+ * | JSRope 'leftChild' and 'rightChild' fields >+ * | >+ * JSLinearString (abstract) (not null terminated) 'chars' field >+ * | \ >+ * | JSDependentString 'base' field >+ * | >+ * JSFlatString (abstract) (null terminated) 'chars' field >+ * | 'extensible' field >+ * | >+ * JSFixedString !extensible >+ * | \ \ >+ * | \ JSShortString chars stored in header >+ * | \ >+ * | JSExternalString 'external string GC type' field >+ * | >+ * JSAtom string equality === pointer equality >+ * | \ >+ * | JSShortAtom atomized JSShortString >+ * | >+ * JSAtom (static) header and chars statically allocated This picture is really good, but I find the RHS hard to follow. You have ops/fields/invariants as the header, but don't follow that layout in most of the entries. Here's how I'd rewrite it: >+ * C++ type operations / fields / invariants+properties >+ * >+ * JSString (abstract) (fallible) 'getCharsZ', 'getChars' / 'length' / - >+ * | \ >+ * | \ >+ * | JSRope - / 'leftChild', 'rightChild' / - >+ * | >+ * JSLinearString (abstract) - / 'chars' / not null-terminated >+ * | \ >+ * | JSDependentString - / 'base' / - >+ * | >+ * JSFlatString (abstract) - / 'chars', 'extensible' / not null-terminated >+ * | >+ * JSFixedString - / - / not extensible >+ * | \ \ >+ * | \ JSShortString - / - / chars stored in header >+ * | \ >+ * | JSExternalString - / 'externalStringType' field / - >+ * | >+ * JSAtom - / - / string equality === pointer equality >+ * | \ >+ * | JSShortAtom - / - / atomized JSShortString >+ * | >+ * JSAtom (static) - / - / header and chars statically allocated Note that I renamed "invariants" as "invariants+properties". Are all static strings atoms? If so, that should be mentioned in the string descriptions above. It's weird having "JSAtom" and "JSAtom (static)" as separate nodes in the graph. Should it be "JSStaticAtom"? Can a JSShortAtom be static? In JSFlatString, what is the 'extensible' field? Do you mean a flag? You haven't mentioned the flag settings for the other string kinds, so it feels out of place. JSShortAtom doesn't exist as an actual class. Is that intended? It's confusing. Most operations aren't mentioned. A couple of examples: JSLinearString::{chars,mark}, JSLinearString::undepend. In fact, JSString is the only entry with operations listed. Maybe just omit that column? The fields are actually all getter methods, AFAICT. This confused me for a bit. Clarify? >+ * Classes marked with (abstract) above are not literally C++ Abstract Base >+ * Classes (since there are no virtual functions, pure or not, in this >+ * hierarchy), but have the same meaning: there are no instances with this Some text got cut here? >+ public: >+ /* Flags exposed only for jits */ Can you make the relevant JIT classes friends and keep these members private? >+ /* >+ * Instead of using a dense index to represent the most-derived type, string >+ * types are encoded to allow single-op tests for hot queries (isRope, >+ * isDependent, isFlat, isAtom, isStaticAtom): >+ * >+ * JSRope xxx1 >+ * JSLinearString xxx0 >+ * JSDependentString xx1x >+ * JSFlatString xx00 >+ * JSFlatString && isExtensible 1100 >+ * JSFixedString xy00 where xy != 11 >+ * JSShortString 0100 and in FINALIZE_SHORT_STRING arena >+ * JSExternalString 0100 and in FINALIZE_EXTERNAL_STRING arena >+ * JSAtom x000 >+ * JSAtom (static) 0000 >+ * >+ * NB: this scheme takes advantage of the fact that there are no string >+ * instances whose most-derived type is JSString, JSLinearString, or >+ * JSFlatString. >+ */ Lovely comment! But we seem to have lost the information about the layout of lengthAndFlags, ie. that the length is the high 28 bits and the flags are the low 4 bits. You should mention that. >+ JS_ALWAYS_INLINE >+ bool isRope() const { >+ bool rope = d.lengthAndFlags & ROPE_BIT; >+ JS_ASSERT_IF(rope, (d.lengthAndFlags & FLAGS_MASK) == ROPE_BIT); > return rope; > } The assertion doesn't seem necessary. Likewise in isDependent(). >+ JS_ALWAYS_INLINE >+ bool isStaticAtom() const { >+ return (d.lengthAndFlags & FLAGS_MASK) == STATIC_ATOM_FLAGS; > } Why did you add the Data struct -- is there a benefit I'm missing? If not, remove it, it's just more typing -- all those "d." prefixes. >+/* See JSString comment */ >+class JSRope : public JSString > { These "See JSString comment" comments aren't necessary, here and below. >+class JSShortString : public JSFixedString > { >- JSString mHeader; >- JSString mDummy; >+ jschar inlineStorageExtension[sizeof(JSString::Data) / sizeof(jschar)]; There's no reason why inlineStorageExtension must have the same length as sizeof(JSString::Data), and in my patch I made this clear. Can you introduce a new constant (which can be given the value |sizeof(JSString::Data)|), and say something like "the length of this is arbitrary, except it must be a multiple of 8 for GC purposes". >+ static const size_t MAX_SHORT_LENGTH = JSString::NUM_INLINE_CHARS + >+ sizeof(JSString::Data) / sizeof(jschar) >+ -1 /* null terminator */; That new constant can be used as the middle term in this addition. >+class JSAtom : public JSFixedString >+{ >+ public: >+ /* Exposed only for jits. */ Again, use friend classes? >+JS_ALWAYS_INLINE JSFixedString * >+JSString::ensureFixed(JSContext *cx) >+{ >+ if (!ensureFlat(cx)) >+ return NULL; >+ if (isFlatAndExtensible()) { >+ JS_ASSERT((d.lengthAndFlags & FLAT_MASK) == 0); This assertion seems unnecessary. > /* >- * flatChars for stillborn string is null, but cx->free checks >- * for a null pointer on its own. >+ * Stillborn strings may have null chars, but cx->free checks for a null >+ * pointer on its own. > */ What is a stillborn string? >+template <class T> >+JS_ALWAYS_INLINE static bool >+PodEqual(T *one, T *two, size_t len) >+{ >+ if (len < 128) { >+ T *p1end = one + len; >+ for (T *p1 = one, *p2 = two; p1 != p1end; ++p1, ++p2) { >+ if (*p1 != *p2) >+ return false; >+ } >+ return true; >+ } >+ >+ return !memcmp(one, two, len * sizeof(T)); >+} Why not just call memcmp() straight off?
Attachment #519528 -
Flags: review?(nnethercote) → review+
Updated•14 years ago
|
Attachment #519301 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 21•14 years ago
|
||
(In reply to comment #19) Thanks for taking the time to dig into this! There are lots of great comments. > >@@ -51,23 +51,23 @@ js_ValueToAtom(JSContext *cx, const js:: > "resulting id at is least"? Reword. Your comment led me to find that this whole comment is out of date and find yet another one of these preexisting JSString/chars rooting errors. > >@@ -3854,46 +3873,38 @@ EqualStrings(JSContext *cx, JSString *st > >- if (length1 == 0) { > >- *result = true; > >- return true; > >- } > > EqualStrings(JSLinearString *str1, JSLinearString *str2) > Same question here. My thinking was: extra code must justify its existence, and previously this code may have (avoiding a call to memcmp), but given that a length 0 string will immediately branch out (inline), it doesn't now. I should mention that I wasn't able to measure any SS or V8 changes for this whole patch queue. > This picture is really good, but I find the RHS hard to follow. You have > ops/fields/invariants as the header, but don't follow that layout in most of > the entries. Here's how I'd rewrite it: Sold! (with tweaks as motivated below) > It's weird having "JSAtom" and "JSAtom (static)" as separate nodes in the > graph. Should it be "JSStaticAtom"? Can a JSShortAtom be static? Ah, thanks, this one was in flux and I forgot to resolve it (you can even see JSStaticString lingering in the comments). Yeah, I'll make an empty JSStaticAtom subtype for regularity. At the moment, all static atoms are non-short. > In JSFlatString, what is the 'extensible' field? Do you mean a flag? You > haven't mentioned the flag settings for the other string kinds, so it feels > out of place. Yeah, I don't know why I didn't just make JSExtensibleString. Fixed. > JSShortAtom doesn't exist as an actual class. Is that intended? It's > confusing. Well, I was lazy; its not used anywhere and has no operations. But if it bugs you then it will probably bug others so I should add it for consistency with a comment as to why its not explicitly mentioned elsewhere. It also adds a fun diamond in the is-a graph. > Most operations aren't mentioned. A couple of examples: > JSLinearString::{chars,mark}, JSLinearString::undepend. In fact, JSString > is the only entry with operations listed. Maybe just omit that column? > > The fields are actually all getter methods, AFAICT. This confused me for a > bit. Clarify? Well, most operations are impl details that don't help with the high-level understanding, which is the point of the diagram. Incidentally, undepend is a private operation, so it definitely doesn't belong. JSLinearString::chars() is mentioned in the diagram (as a "field", since its infallible, so, effectively, a field). But your observation makes me want to go from three columns to two (operations+fields / invariants+properties). > >+ public: > >+ /* Flags exposed only for jits */ > > Can you make the relevant JIT classes friends and keep these members private? Well, friendship with jits is scary because then you are saying "the jit can do anything" when you really want to make a more restricted statement like "the jit needs to know the offset of X and the value of flag Y". Also there's not a single class to cover the mjit. I tried doing this through various static public member functions but it ultimately didn't seem any cleaner and it was more syntactic noise. > >+ JS_ALWAYS_INLINE > >+ bool isRope() const { > >+ bool rope = d.lengthAndFlags & ROPE_BIT; > >+ JS_ASSERT_IF(rope, (d.lengthAndFlags & FLAGS_MASK) == ROPE_BIT); > > return rope; > > } > > The assertion doesn't seem necessary. Likewise in isDependent(). True, these tests are implied by the encoding rules. However, I like these asserts because, in my experience with other code, it catches memory-corruption/GC related bugs. The assert also explains the optimization at hand. > >+ JS_ALWAYS_INLINE > >+ bool isStaticAtom() const { > >+ return (d.lengthAndFlags & FLAGS_MASK) == STATIC_ATOM_FLAGS; > > } > > Why did you add the Data struct -- is there a benefit I'm missing? > If not, remove it, it's just more typing -- all those "d." prefixes. Data is the element type of the static string arrays (instead of JSString). This is an alternative to the different-but-isomorphic-struct idea. Since use of "d." is necessarily only done by a few string members, the syntactic overhead seems minimal. > > /* > >- * flatChars for stillborn string is null, but cx->free checks > >- * for a null pointer on its own. > >+ * Stillborn strings may have null chars, but cx->free checks for a null > >+ * pointer on its own. > > */ > > What is a stillborn string? GC'd before initialized. Now that I actually think about this code, strings cannot be GC'd before initialized: all sites that call js_NewGC(Short)String init the result before any potential GC. So, removing comment. > >+template <class T> > >+JS_ALWAYS_INLINE static bool > >+PodEqual(T *one, T *two, size_t len) > >+{ > >+ if (len < 128) { > >+ T *p1end = one + len; > >+ for (T *p1 = one, *p2 = two; p1 != p1end; ++p1, ++p2) { > >+ if (*p1 != *p2) > >+ return false; > >+ } > >+ return true; > >+ } > >+ > >+ return !memcmp(one, two, len * sizeof(T)); > >+} > > Why not just call memcmp() straight off? memcmp() overhead (call, unrolled loop setup) makes it slower for short loops. This has been measured and confirmed several times, although I don't have a bug on hand. You can see the same thing in PodCopy.
Reporter | ||
Comment 22•14 years ago
|
||
(In reply to comment #21) > > > > Why did you add the Data struct -- is there a benefit I'm missing? > > If not, remove it, it's just more typing -- all those "d." prefixes. > > Data is the element type of the static string arrays (instead of JSString). > This is an alternative to the different-but-isomorphic-struct idea. Since use > of "d." is necessarily only done by a few string members, the syntactic > overhead seems minimal. Oh, that allows static initialization of JSStrings? Nice. Thanks for the other responses, it all sounds good.
Assignee | ||
Comment 23•14 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/0e8639e561f5 http://hg.mozilla.org/tracemonkey/rev/4290338c3956 http://hg.mozilla.org/tracemonkey/rev/6afe562a0cb8 hg rebase apparently decided I wanted to delete nsISSLStatusProvider.idl which I, in fact, did not want to do, so: http://hg.mozilla.org/tracemonkey/rev/ebd7cc5d0daa fix windows warnings: http://hg.mozilla.org/tracemonkey/rev/41ed083faf4a and then a workaround for an Apple gcc bug (filed bug 644250 for explanation and for a better fix): http://hg.mozilla.org/tracemonkey/rev/663a3afb238e
Summary: Better documentation and minor clean-ups for JSString, JSShortString → Clean up JSString
Whiteboard: fixed-in-tracemonkey
Comment 24•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/0e8639e561f5 http://hg.mozilla.org/mozilla-central/rev/4290338c3956 http://hg.mozilla.org/mozilla-central/rev/6afe562a0cb8 http://hg.mozilla.org/mozilla-central/rev/ebd7cc5d0daa http://hg.mozilla.org/mozilla-central/rev/41ed083faf4a http://hg.mozilla.org/mozilla-central/rev/663a3afb238e
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•