Last Comment Bug 609440 - make JSString::chars / JS_GetStringChars fallible
: make JSString::chars / JS_GetStringChars fallible
Status: RESOLVED FIXED
fixed-in-tracemonkey
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal with 2 votes (vote)
: ---
Assigned To: Luke Wagner [:luke]
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on: 607292 621464 627685
Blocks: 593659 551077 JaegerSpeed 608782 613457
  Show dependency treegraph
 
Reported: 2010-11-03 16:45 PDT by Luke Wagner [:luke]
Modified: 2012-12-31 03:16 PST (History)
23 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
betaN+


Attachments
don't need atomic ops in JSString anymore (?) (2.36 KB, patch)
2010-11-30 18:27 PST, Luke Wagner [:luke]
igor: review+
Details | Diff | Splinter Review
ropes redux, asserting we don't oom (73.94 KB, patch)
2010-11-30 18:35 PST, Luke Wagner [:luke]
n.nethercote: review+
Details | Diff | Splinter Review
remove JS_GetStringChars, get mozilla compiling with fallible alternatives (18.79 KB, patch)
2010-12-03 01:33 PST, Luke Wagner [:luke]
no flags Details | Diff | Splinter Review
just JSAPI interface changes for review (4.16 KB, patch)
2010-12-03 01:36 PST, Luke Wagner [:luke]
no flags Details | Diff | Splinter Review
just JSAPI interface changes for review (v.2) (4.95 KB, patch)
2010-12-03 17:19 PST, Luke Wagner [:luke]
brendan: review+
Details | Diff | Splinter Review
union of patches, almost done (509.69 KB, patch)
2010-12-10 18:10 PST, Luke Wagner [:luke]
no flags Details | Diff | Splinter Review
fallibile-ize JSString use in ctypes (19.69 KB, patch)
2010-12-13 17:27 PST, Luke Wagner [:luke]
dwitte: review+
Details | Diff | Splinter Review
web worker changes (10.99 KB, patch)
2010-12-13 17:56 PST, Luke Wagner [:luke]
bent.mozilla: review+
Details | Diff | Splinter Review
qsgen.py changes (2.83 KB, patch)
2010-12-13 17:56 PST, Luke Wagner [:luke]
jorendorff: review+
Details | Diff | Splinter Review
dom / content / NPAPI changes (52.60 KB, patch)
2010-12-13 17:57 PST, Luke Wagner [:luke]
jst: review+
Details | Diff | Splinter Review
xpconnect changes (46.23 KB, patch)
2010-12-13 17:57 PST, Luke Wagner [:luke]
mrbkap: review+
Details | Diff | Splinter Review
storage / idb changes (14.33 KB, patch)
2010-12-13 17:57 PST, Luke Wagner [:luke]
sdwilsh: review+
Details | Diff | Splinter Review
JSAPI (tests, impl) changes (in view of brendan's API r+) (34.99 KB, patch)
2010-12-13 17:58 PST, Luke Wagner [:luke]
jwalden+bmo: review+
Details | Diff | Splinter Review
regexp changes (10.66 KB, patch)
2010-12-13 18:08 PST, Luke Wagner [:luke]
cdleary: review+
Details | Diff | Splinter Review
tracer / mjit changes (41.37 KB, patch)
2010-12-13 18:09 PST, Luke Wagner [:luke]
no flags Details | Diff | Splinter Review
xml changes (66.94 KB, patch)
2010-12-13 18:09 PST, Luke Wagner [:luke]
igor: review-
Details | Diff | Splinter Review
jsstr changes (83.45 KB, patch)
2010-12-13 18:10 PST, Luke Wagner [:luke]
n.nethercote: review+
Details | Diff | Splinter Review
json changes (2.46 KB, patch)
2010-12-13 18:10 PST, Luke Wagner [:luke]
sayrer: review+
Details | Diff | Splinter Review
various changes (91.85 KB, patch)
2010-12-13 18:10 PST, Luke Wagner [:luke]
jwalden+bmo: review+
Details | Diff | Splinter Review
xml changes v.2 (87.38 KB, patch)
2010-12-14 10:23 PST, Luke Wagner [:luke]
igor: review+
Details | Diff | Splinter Review
tracer / mjit changes v.2 (38.12 KB, patch)
2010-12-14 11:45 PST, Luke Wagner [:luke]
dvander: review+
Details | Diff | Splinter Review

Description Luke Wagner [:luke] 2010-11-03 16:45:13 PDT
Currently, to keep JSString::chars() infallible, js_ConcatStrings eagerly malloc's the memory needed by JSString::flatten().  Bug 608776 measured a 2% SS speedup by (fallibly) malloc'ing in flatten() instead.  Bug 608776 also has a patch for an obscenely large ballast against oom which is not pretty, can waste memory, and doesn't achieve the full 2% speedup.

An alternative is to bite the bullet and make chars() fallible (take a cx, possibly return null).  This also involves breaking JSAPI to make JS_GetStringChars fallible, which has been a long-standing API sore spot (judging by bug 373152).  Since JS_GetStringBytes is going away, it seems like a good time to make the change.  If no immediate disagreement, I can ping the newsgroup.
Comment 1 Boris Zbarsky [:bz] (still a bit busy) 2010-11-03 18:59:56 PDT
Do we have any idea what the performance impact of having to null-check JS_GetStringChars everywhere would be, both within the js engine and outside?
Comment 2 Luke Wagner [:luke] 2010-11-03 20:49:00 PDT
We will after I write the patch ;-)

I have been repeatedly impressed, though, by how cheap/free well-predicted branches are.
Comment 3 Boris Zbarsky [:bz] (still a bit busy) 2010-11-03 21:02:26 PDT
Well, except for their icache effects, right?
Comment 4 Luke Wagner [:luke] 2010-11-03 23:39:25 PDT
(In reply to comment #3)
> Well, except for their icache effects, right?

Sure, icache density goes down, but assuming the failure branch is "return NS_SOME_ERROR", it shouldn't hurt too much.  And, as always, on Windows PGO should be moving all these error paths out of the fast path.
Comment 5 Igor Bukanov 2010-11-04 01:43:10 PDT
(In reply to comment #0)
> 
> An alternative is to bite the bullet and make chars() fallible (take a cx,
> possibly return null).

IMO we should also consider infallible API that would allow not to flatten the string. Something like a char iterator that is either initialized with a pointer to the char array or to some support structure to work over the rope. Its next char will look then like:

if (cursor != charEnd) { ++cursor; }
else { doRopeNextChar(); }

That should avoid a penalty for the non-rope case.
Comment 6 Luke Wagner [:luke] 2010-11-04 10:47:34 PDT
(In reply to comment #5)
> IMO we should also consider infallible API that would allow not to flatten the
> string. Something like a char iterator that is either initialized with a
> pointer to the char array or to some support structure to work over the rope.

In the abstract, I totally agree; instead of passing around String-types and jschar*'s, we should pass around types modelling the Range/Iterator/View concepts so that we can decouple algorithms from data structure.  Actually, bhackett and I were discussing yesterday how most (hot) uses of str->chars() could be made to avoid flattening by iterating instead, and that was the plan, e.g., for js_StringIsIndex.

Scanning through mxr hits for JS_GetStringChars, though, iterator-ification would require an enormous effort to do completely.  On the bright side, probably 1/3 of JS_GetStringChars uses could easily be removed entirely by using the JS_*ById variant.  Another 1/4 are calls to NS_ConvertUTF16toUTF8, which it seems like could be iterator-ified.  However, there are a bunch of places that pass the jschar* to some API expecting jschar*/PRUnichar*.  E.g., probably 1/4 of the calls shove the results into an nsDependentString/nsString.

So then a more detailed proposal would be to:
 - make rope char iterator and try to use it throughout SM
 - for the remaining (cold) uses in SM, change infallible "chars()" to fallible "flattenChars(cx)"
 - change as many JS_GetStringChars to use JS_*ById APIs as I can
 - add a version of NS_ConvertUTF16toUTF8 that doesn't need JS_GetStringChars
 - make JS_GetStringChars fallible, change remaining callers
Comment 7 Boris Zbarsky [:bz] (still a bit busy) 2010-11-04 12:07:21 PDT
For what it's worth, xpcom strings used to support such an iterator pattern, multifragment strings, etc.

We removed it all because it led to over-complicated code that was much slower than just indexing into an array and because in practice we ended up with most strings being flat....

Maybe the tradeoffs are different here, but it's worth talking to Benjamin about our XPCOM string experience, especially because I still think we should have a unified storage strategy for strings for JS and XPCOM.  That would let us stop copying at the JS-to-XPCOM boundary in a bunch of cases where we copy now, but it seems like that would almost certainly require flattening at the boundary, right?
Comment 8 Benjamin Smedberg [:bsmedberg] 2010-11-04 12:45:33 PDT
The XPCOM iterator pattern performed horribly partly because it relied on calling through vtables and such, and then the API iterated by character, instead of by fragment, which meant common code optimizations were impossible.

I'm ambivalent, although the swinging back and forth is unfortunate!
Comment 9 Luke Wagner [:luke] 2010-11-04 14:18:48 PDT
(In reply to comment #8)
> I'm ambivalent, although the swinging back and forth is unfortunate!

If it eases your mind, this isn't quite swinging back: in addition to, as you mentioned, not being polymorphic beasts, (despite the name...) ropes are primarily intended to delay concatenation, not  act as a normal/steady-state alternative representation of strings.  In particular, string flatten when observed and it is only special cases that operate on ropes as ropes (for large potential speedup, of course).  Perhaps you are thinking more long term and worrying that JS strings will mutate into old XPCOM strings, but I can't imagine it getting anywhere close without regressing the benchmarks we monitor.
Comment 10 Boris Zbarsky [:bz] (still a bit busy) 2010-11-04 18:27:59 PDT
Ah, perhaps I misread comment 5.  It seemed to be proposing that the only API available API be the one that preserves ropeness...
Comment 11 Igor Bukanov 2010-11-05 01:54:40 PDT
(In reply to comment #10)
> Ah, perhaps I misread comment 5.  It seemed to be proposing that the only API
> available API be the one that preserves ropeness...

The proposal is to a have an infallible API for character enumeration in JS strings that would indeed preserve ropes yet would allow to enumerate non-ropes efficiently.
Comment 12 Luke Wagner [:luke] 2010-11-05 11:01:58 PDT
(In reply to comment #11)
> The proposal is to a have an infallible API for character enumeration in JS
> strings that would indeed preserve ropes yet would allow to enumerate non-ropes
> efficiently.

I was planning on adding such an API (initially, for a non-flattening version of NS_ConvertUTF16toUTF8), but, just to be clear: that would supplement, not replace, JS_GetStringChars, yes?
Comment 13 Igor Bukanov 2010-11-05 14:50:44 PDT
(In reply to comment #12)
> (In reply to comment #11)
> > The proposal is to a have an infallible API for character enumeration in JS
> > strings that would indeed preserve ropes yet would allow to enumerate non-ropes
> > efficiently.
> 
> I was planning on adding such an API (initially, for a non-flattening version
> of NS_ConvertUTF16toUTF8), but, just to be clear: that would supplement, not
> replace, JS_GetStringChars, yes?

Yes, that should supplement a fallible version of JS_GetStringChars with infallible iterator.
Comment 14 Igor Bukanov 2010-11-08 07:56:20 PST
Another usefull infallible API would be one that copies string chars into a buffer. That should take care of rather few cases of using str->chars like js_ValueToCharBuffer.
Comment 15 Luke Wagner [:luke] 2010-11-23 09:42:10 PST
This should fix bug 551077.
Comment 16 Nicholas Nethercote [:njn] 2010-11-23 14:50:42 PST
I think this change will allow the two kinds of rope node (TOP_NODE and INTERIOR_NODE) to be merged.
Comment 17 Luke Wagner [:luke] 2010-11-29 09:47:09 PST
Another bonus: without the need for a "top node" holding the rope's buffer, ropes can be full dags (which removes flattening tests in js_ConcatStrings) and mParent is unnecessary (so length and flags can be unpacked (again)).
Comment 18 Nicholas Nethercote [:njn] 2010-11-29 11:33:43 PST
(In reply to comment #17)
> Another bonus: without the need for a "top node" holding the rope's buffer,
> ropes can be full dags (which removes flattening tests in js_ConcatStrings) and
> mParent is unnecessary (so length and flags can be unpacked (again)).

Hmm... my patch for bug 613457 moves mBase into the first union-within-a-union.  With mParent gone, mRight can also be moved into the first union-within-a-union, leaving the second union-within-a-union empty, thus potentially shortening JSString by one word.  Except there's the issue of mInlineStorage, which would be reduced to only 2 jschars on 32-bit platforms, which means that the short static strings would have to become larger JSShortStrings, which is manageable.  But then, does JSString need to have a size that's a multiple of 8 so it can be treated as a FreeCell?
Comment 19 Luke Wagner [:luke] 2010-11-29 11:49:40 PST
Expanding mLengthAndFlags into 2 words would keep things at 4 words, all used by all string modes.  It does steal a word from short strings, though (which is annoying, since js_IntToString wants 11+1 chars in order to fit the biggest/smallest ints into short strings which it can no longer do on x86).
Comment 20 Nicholas Nethercote [:njn] 2010-11-29 15:53:28 PST
Perhaps that's reason enough to keep mLengthAndFlags packed?
Comment 21 Luke Wagner [:luke] 2010-11-29 16:24:49 PST
Its just the addition of an 'if' to js_IntToString, so no big deal.  Incidentally, I just wrote a tiny speed test to compare the difference.  If length/flags are only used to control a well-predicted branch, then its actually slightly (4%) faster to have the packed representation (since its one less store and the result isn't part of the critical path).  However if you add two string lengths (like, e.g., js_ConcatStrings), then I found it faster (40%) to have the unpacked representation.  It will easy enough to test for real after the patch.
Comment 22 Luke Wagner [:luke] 2010-11-29 18:28:19 PST
... but our static length2StringTable and hundredStringTable use 3 and 4 characters of mInlineStorage so, in lieu of bloating said tables with JSShortStrings, I'll keep 'em packed.
Comment 23 Luke Wagner [:luke] 2010-11-30 18:27:14 PST
Created attachment 494274 [details] [diff] [review]
don't need atomic ops in JSString anymore (?)

(Preliminary patch) Igor, is this valid?  With the exception of atoms in the default compartment, strings are accessed by a single-threaded at a time.  For the atoms, a lock is held when the bit is flipped.
Comment 24 Luke Wagner [:luke] 2010-11-30 18:35:27 PST
Created attachment 494278 [details] [diff] [review]
ropes redux, asserting we don't oom

This patch makes all the core JSString logic changes without actually handling oom, which I will do in the next patch.  Thus, this is all the win, with none of the checking overhead.  On OS X 10.5, I get a 10ms speedup on SS (3.7%) and a 65ms speedup on V8 (4.4%).

As already mentioned, there is also a fundamental improvement: concatenating two strings never flattens (since ropes can be dags).
Comment 25 Igor Bukanov 2010-12-01 04:26:33 PST
Comment on attachment 494274 [details] [diff] [review]
don't need atomic ops in JSString anymore (?)

Nice find - the calls to JS_ATOMIC_SET_MASK should have been removed right after the compartment GC and rewraping changes.

As EXTENSIBLE is never set, the patch should also remove all traces of it including flatClearExtensible and the flag definition itself. r+ with that fixed.
Comment 26 Luke Wagner [:luke] 2010-12-01 09:30:26 PST
(In reply to comment #25)
> As EXTENSIBLE is never set, the patch should also remove all traces of it
> including flatClearExtensible and the flag definition itself.

It is still being set in JSString::flatten by initFlatExtensible.  (The use case for EXTENSIBLE is now commented in the patch at the head of JSString::flatten.)
Comment 27 Igor Bukanov 2010-12-01 12:39:12 PST
(In reply to comment #26)
> It is still being set in JSString::flatten by initFlatExtensible.  (The use
> case for EXTENSIBLE is now commented in the patch at the head of
> JSString::flatten.)

Right, I have missed that EXTENSIBLE is still used to prevent dependent string optimization. Still this suggests to rename the flag to NON_EXTENSIBLE and set it when leaking the char array through API or when atomizing strings.

This also points out that if the char array pointer would never be exposed through api, then EXTENSIBLE can be removed, but this is for another bug.
Comment 28 Luke Wagner [:luke] 2010-12-01 13:48:10 PST
(In reply to comment #27)
> Right, I have missed that EXTENSIBLE is still used to prevent dependent string
> optimization. Still this suggests to rename the flag to NON_EXTENSIBLE and set
> it when leaking the char array through API or when atomizing strings.

Well, since having EXTENSIBLE means the 'capacity' field has been set, extensibility is opt in (in particular, only by strings initialized by JSString::flatten).
Comment 29 Nicholas Nethercote [:njn] 2010-12-01 15:13:02 PST
Comment on attachment 494278 [details] [diff] [review]
ropes redux, asserting we don't oom

This patch is such an improvement, I love it.  In particular,
js_ConcatStrings() is *so* much better now.

r=me with various minor things addressed below.

I was going to do a Cachegrind run to see the instruction count improvement
on Sunspider but I get lots of failures when applying the patch, more than I
cared to fix by hand.  Please post an updated patch if you want to see
these figures.


>diff --git a/js/src/jsgcinlines.h b/js/src/jsgcinlines.h
>--- a/js/src/jsgcinlines.h
>+++ b/js/src/jsgcinlines.h
>@@ -267,18 +267,16 @@ MarkChildren(JSTracer *trc, JSObject *ob
> }
> 
> static inline void
> MarkChildren(JSTracer *trc, JSString *str)
> {
>     if (str->isDependent())
>         MarkString(trc, str->dependentBase(), "base");
>     else if (str->isRope()) {
>-        if (str->isInteriorNode())
>-            MarkString(trc, str->interiorNodeParent(), "parent");
>         MarkString(trc, str->ropeLeft(), "left child");
>         MarkString(trc, str->ropeRight(), "right child");
>     }
> }

I guess some of the lower nodes in a rope may be live but some of the upper
ones might be dead, and the old code would mark those upper ones as live?


>+namespace detail {

Does "detail" have any significance?  It doesn't mean anything to me.

I didn't look too closely at the rest of the GC stuff as I don't know much
about GC in general.


>+    if (u.left->isExtensible() && u.left->s.capacity >= wholeLength) {
>+        wholeCapacity = u.left->s.capacity;
>+        wholeChars = u.left->u.chars;
>+        pos = wholeChars + u.left->length();
>+        u.left->finishTraversalConversion(this, wholeChars, pos);
>+        goto visit_right_child;
>+    }

So this case doesn't just fall out of the normal flatten algorithm?  Is
there a fundamental reason why not?


>+    wholeCapacity = RopeCapacityFor(wholeLength);
>+    wholeChars = (jschar *)js_malloc((wholeCapacity + 1) * sizeof(jschar));

Don't you need to check for OOM here?


> JSString * JS_FASTCALL
> js_ConcatStrings(JSContext *cx, JSString *left, JSString *right)
> {
>-    size_t length, leftLen, rightLen;
>-    bool leftRopeTop, rightRopeTop;
>-
>-    leftLen = left->length();
>+    size_t leftLen = left->length();
>     if (leftLen == 0)
>         return right;
>-    rightLen = right->length();
>+
>+    size_t rightLen = right->length();
>     if (rightLen == 0)
>         return left;
> 
>-    length = leftLen + rightLen;
>+    size_t length = leftLen + rightLen;

You used "wholeLength" elsewhere -- can you do likewise (or something
similar, eg. "concatLength", "finalLength") here?


>-static jsint
>-RopeMatch(JSString *textstr, const jschar *pat, jsuint patlen)
>+static bool
>+RopeMatch(JSContext *cx, JSString *textstr, const jschar *pat, jsuint patlen, jsint *match)

I'd like a comment explaining what the args and return value represent for
this function, please.


> #define R(c) {                                                                \
>-    JSString::FLAT | JSString::ATOMIZED | (1 << JSString::FLAGS_LENGTH_SHIFT),\
>+    JSString::FLAT | JSString::ATOMIZED | (1 << JSString::LENGTH_SHIFT),      \

You added buildLengthAndFlags(), but it can't be used here.  In my patch for
bug 613457 I did likewise but I used a macro instead of a function.  You
should do that here -- sometimes a macro is the right thing to use :)


>+ public:
>+    size_t                 lengthAndFlags;      /* in all strings */

I figure you're moving away from the mFoo member naming convention because
it's not standard within SM?  And I was just getting used to it...

 
>     inline void flatClearExtensible() {
>+        /* N.B. This may be called on static strings, which are in write-protected memory. */
>         JS_ASSERT(isFlat());
>-
>-        /*
>-         * We cannot eliminate the flag check before writing to mLengthAndFlags as
>-         * static strings may reside in write-protected memory. See bug 599481.
>-         */
>-        if (mLengthAndFlags & EXTENSIBLE)
>-            mLengthAndFlags &= ~EXTENSIBLE;
>+        if (lengthAndFlags & EXTENSIBLE)
>+            lengthAndFlags &= ~EXTENSIBLE;

I find the new comment too brief, and the old one wasn't really clear
either.  How about this:  "You might think that we could just clear the
EXTENSIBLE flag without first checking if it's set.  However, this function
may be called on strings that aren't extensible, and that includes static
strings which are stored in write-protected memory and cannot be modified.
Therefore, we check.  (We could instead check with isStatic(), but that's
slower.)"
Comment 30 Luke Wagner [:luke] 2010-12-01 16:05:17 PST
(In reply to comment #29)
> I guess some of the lower nodes in a rope may be live but some of the upper
> ones might be dead, and the old code would mark those upper ones as live?

Before, if any node in a rope stayed a live, the containing tree needed to stay alive (at the very least, so that mParent wasn't a dangling pointer).  That is no longer the case; rope nodes no longer care about if and how many parents they have.

> >+namespace detail {
> 
> Does "detail" have any significance?  It doesn't mean anything to me.

We have started using 'detail' as the namespace to stick helpers that you don't want to litter the enclosing namespace with.  If you grep "namespace detail" you can see that its in a couple of places.  Short form "implementation details", I suppose.

> So this case doesn't just fall out of the normal flatten algorithm?  Is
> there a fundamental reason why not?

Speed, as always :)  We want to avoid copying that left flat node onto itself.

> >+    wholeCapacity = RopeCapacityFor(wholeLength);
> >+    wholeChars = (jschar *)js_malloc((wholeCapacity + 1) * sizeof(jschar));
> 
> Don't you need to check for OOM here?

This is an intermediate patch.  The patches I'm working on now is to deal with fallibility.  That's why this patch is labeled "ropes redux, asserting we don't oom" ;)

> I figure you're moving away from the mFoo member naming convention because
> it's not standard within SM?  And I was just getting used to it...

Yeah, mFoo was never the style, it just appeared in a few places (JSString and (copying JSString) js::Vector) before everyone agreed to stop: https://wiki.mozilla.org/JavaScript:SpiderMonkey:C%2B%2B_Coding_Style#Classical_OOP.

Other comments addressed, thanks for the review!  Still two more patches (intra-SM and extra-SM fallibility changes).
Comment 31 Igor Bukanov 2010-12-02 05:09:27 PST
(In reply to comment #28)
> Well, since having EXTENSIBLE means the 'capacity' field has been set,
> extensibility is opt in (in particular, only by strings initialized by
> JSString::flatten).

What about removing EXTENSIBLE and dropping API that returns 0-terminated jschar array? Since \0 can legally present in JS strings such API can not represent strings in full. So lets remove that and direct the current API users to JS_GetStringCharsAndLength and JS_GetStringCharArray with the promise that the returned pointer will stay valid until the string is GC-ed.
Comment 32 Luke Wagner [:luke] 2010-12-02 09:14:15 PST
(In reply to comment #31)
That sounds reasonable, but as a followup bug.  Btw, I haven't checked whether uses of nsDependentString actually depend on null-terminated-ness, but it asserts null-terminated-ness in its constructor.
Comment 33 Luke Wagner [:luke] 2010-12-03 01:33:13 PST
Created attachment 494967 [details] [diff] [review]
remove JS_GetStringChars, get mozilla compiling with fallible alternatives

This patch switches mozilla to use fallible string APIs and removes JS_GetStringChars.  It ended up being bigger than expected b/c nsDependentJSString and all its users also had to change.  Despite adding a bunch of null-check branches, almost without exception, every piece of code touched should be faster because:
 - JS_GetStringChars/JS_GetStringLength pairs were replaced with a single JS_GetStringChars[Z]AndLength calls
 - In the cases where JS_GetStringCharsAndLength (no Z) replaced JS_GetStringChars (i.e., where there was no null-term assumption), we avoid un-depending strings
 - Places using a string from a jsid (i.e., an atom) can use a (new) infallible API JS_GetStringIdCharsAndLength
 - In the fatval patch I changed like 20 JS_*UC*() calls to JS_*ById() in nsDOMClassInfo.  This patch changed like 10 more in other places, mostly NPAPI.
Comment 34 Luke Wagner [:luke] 2010-12-03 01:36:02 PST
Created attachment 494968 [details] [diff] [review]
just JSAPI interface changes for review

Brendan, do these new APIs make sense?
Comment 35 Igor Bukanov 2010-12-03 02:57:25 PST
(In reply to comment #34)
> Created attachment 494968 [details] [diff] [review]
> just JSAPI interface changes for review

JS_GetStringIdChars should take jsid parameter, not string. Otherwise it would be way to easy to use it on non-id string. Since literal strings are atomized it would be very easy to miss such bad usage.
Comment 36 Luke Wagner [:luke] 2010-12-03 09:05:36 PST
(In reply to comment #35)
> JS_GetStringIdChars should take jsid parameter, not string. Otherwise it would
> be way to easy to use it on non-id string.

I considered that, but that, of course, introduces the mistake of passing a non-string id (which would be a tempting mistake: I saw quite a few cases of wanting to get string chars out of an id and might choose this API instead of the path JS_IdToValue/JS_ValueToString/JS_GetStringChars*)).  Ultimately, I decided "well, the name says string id".

> Since literal strings are atomized it would be very easy to miss such bad usage.

That would be fine.  I guess the name "StringId" is too strict; I could instead rename it to JS_GetInternedStringChars.
Comment 37 Igor Bukanov 2010-12-03 11:14:25 PST
(In reply to comment #36)
> I considered that, but that, of course, introduces the mistake of passing a
> non-string id

IMO it is more likely to hit a debug assert about wrong id passed to the function rather then a debug assert about non-atomized string. The latter requires non-literal string so if code would be cut-and-pasted or just evolved into non-id context the bug may not be exposed immediately.
Comment 38 Luke Wagner [:luke] 2010-12-03 11:34:45 PST
(In reply to comment #37)
Based on the realization in comment 36, I'd like to change the name to JS_GetInternedStringChars, so string seems right.  Additionally, "interned strings" are already an API concept.
Comment 39 Nicholas Nethercote [:njn] 2010-12-03 12:01:16 PST
What happens if you call JS_GetStringCharsInfallible() without first calling JS_MakeGetStringCharsInfallible()?  It would be nice if such a thing was impossible, eg. maybe introduce a new type that is identical to JSString but indicates that it's been infallibilized?

BTW, I think you attached the wrong patch for the "remove JS_GetStringChars..." patch.
Comment 40 Luke Wagner [:luke] 2010-12-03 13:34:41 PST
(In reply to comment #39)
> What happens if you call JS_GetStringCharsInfallible() without first calling
> JS_MakeGetStringCharsInfallible()?  It would be nice if such a thing was
> impossible, eg. maybe introduce a new type that is identical to JSString but
> indicates that it's been infallibilized?

It would be a dynamic error.  Fun idea with the type; I generally like using types to prove things, but this might be a bit overkill for such a restricted use case.

> BTW, I think you attached the wrong patch for the "remove JS_GetStringChars..."

Thanks, I'll attach the right one after try server results and fixing some windows build bustage.
Comment 41 Nicholas Nethercote [:njn] 2010-12-03 13:42:09 PST
(In reply to comment #40)
> 
> It would be a dynamic error.

Can you be more specific?

> Fun idea with the type; I generally like using
> types to prove things, but this might be a bit overkill for such a restricted
> use case.

If you just make it a sub-class of JSString the declaration is short and it can be used anywhere a JSString can.  I'll leave it up to you, but it doesn't seem like overkill to me.
Comment 42 Luke Wagner [:luke] 2010-12-03 13:54:41 PST
(In reply to comment #41)
> (In reply to comment #40)
> > 
> > It would be a dynamic error.
> 
> Can you be more specific?

If you call JS_GetStringCharsInfallible on a string that has not had JS_MakeGetStringCharsInfallible, it would assert in debug mode if the string was not flat or produce undefined behavior in release mode.

> If you just make it a sub-class of JSString the declaration is short and it can
> be used anywhere a JSString can.

This is a C API; you'd need an opaque typedef and probably a conversion macro.
Comment 43 Luke Wagner [:luke] 2010-12-03 16:35:43 PST
(In reply to comment #41)
> I'll leave it up to you, but it doesn't seem like overkill to me.

Actually, I think you're right; it felt like a pretty rare case until I returned to fallible-izing SM.
Comment 44 Luke Wagner [:luke] 2010-12-03 17:19:28 PST
Created attachment 495183 [details] [diff] [review]
just JSAPI interface changes for review (v.2)
Comment 45 Brendan Eich [:brendan] 2010-12-06 15:26:43 PST
Comment on attachment 495183 [details] [diff] [review]
just JSAPI interface changes for review (v.2)

Looks good.

I grumbled about the MatchStringAndAscii name, I should have looked more closely when I blessed the "Ascii" spelling. That is a fine spelling in this API but the "And" seems wrong. Just JS_MatchAsciiString or JS_MatchStringAgainstAscii, in case anyone has an opinion.

/be
Comment 46 Boris Zbarsky [:bz] (still a bit busy) 2010-12-06 16:44:47 PST
JS_StringEqualsAscii ?
Comment 47 Robert Sayre 2010-12-09 11:23:57 PST
Looks like this blocker is ready to land.
Comment 48 Nicholas Nethercote [:njn] 2010-12-09 14:00:04 PST
(In reply to comment #47)
> Looks like this blocker is ready to land.

Not quite, the "remove JS_GetStringChars..." patch needs review, but the wrong patch was attached (see comment 39).

Also, I'd love some up-to-date copies of all the patches in order to do some measurements with Cachegrind.

It'd be great to knock this bug over, it's blocking two other blockers.
Comment 49 Luke Wagner [:luke] 2010-12-09 14:32:58 PST
This is what I have been doing all week.  There is a lot of code to change.  In particular, every time I add an early return, I need to take time to understand the local code to make sure I'm doing the right thing.  I would post updated patches, but I am changing them as I go.
Comment 50 Nicholas Nethercote [:njn] 2010-12-09 16:05:19 PST
(In reply to comment #49)
> This is what I have been doing all week.  There is a lot of code to change.

I didn't mean to imply that you've been slacking or anything :)  It is a big change.
Comment 51 Luke Wagner [:luke] 2010-12-09 21:20:30 PST
(In reply to comment #45)
> I grumbled about the MatchStringAndAscii name, I should have looked more
> closely when I blessed the "Ascii" spelling. That is a fine spelling in this
> API but the "And" seems wrong. Just JS_MatchAsciiString or
> JS_MatchStringAgainstAscii, in case anyone has an opinion.

Well I'm going to fallible-ize this too (and add an infallible JSFlatString-taking version), so I might as well rename it proper.

JS_MatchAsciiString was my favorite, but there is no clean place to insert "Flat" in that name, so how about Boris's suggestion JS_StringEqualsAscii?  "Equal" pairs well with js_EqualStrings.
Comment 52 Luke Wagner [:luke] 2010-12-10 18:10:53 PST
Created attachment 497006 [details] [diff] [review]
union of patches, almost done

This patch is the rebased (on top of TM fb3b0fd656bf) and updated union of previous patches and a new patch of actually handling oom in JSString::chars().  What remains is some stragglers outside js/src that use JSString directly (ctypes...).  Also I left some TODO's that need a nice explanatory comment.

Inside SM, I ended up with an implicitly-convertible hierarchy of string types:
  JSAtom > JSFlatString > JSLinearString > JSString
where the new "JSLinearString" means that you have an infallible chars(), but that chars()[length()] may not be 0.  This tended to be the type needed most often in js/src, since we practically never depend on null-terminatedness.  This helped carve out some broad paths that only deal with flattened strings.

So, almost done.  The speedup remains; on OSX10.5, I measured 2.4%/5.4% (6ms/80ms) speedups on SS/V8 in the shell.
Comment 53 Jeff Walden [:Waldo] (remove +bmo to email) 2010-12-11 01:12:51 PST
Wouldn't the right new-world names be js::FlatString and js::LinearString?  Parity with js::Value and all that.  Happily, that seems like the sort of change which would be mechanical even with only a shell one-liner (plus a minor bit to change the actual definition).

Mozilla at large has used the "get" prefix to imply fallibility, and lack thereof to imply infallibility.  If we did this here it wouldn't be following a SpiderMonkey convention, because there isn't one on the matter.  But I can't think of an argument *against* chars() for infallible and getChars() for fallible even if it's not adherence to any SpiderMonkey convention, and it would be a clear differentiator to readers (reviewers especially).
Comment 54 Luke Wagner [:luke] 2010-12-11 14:17:18 PST
(In reply to comment #53)
> Wouldn't the right new-world names be js::FlatString and js::LinearString? 

Like JSString, JSFlatString is in the public jsapi.h.  JSLinearString isn't, but its just irregular to have
  JSAtom > JSFlatString > js::LinearString > JSString
Also, some day we may want to to expose JSLinearString and JS_LinearizeString for some perf reason.
Comment 55 Robert Sayre 2010-12-11 15:01:42 PST
In our blocker-focused environment, I think naming arguments are frowned upon.
Comment 56 Jeff Walden [:Waldo] (remove +bmo to email) 2010-12-13 16:19:18 PST
(In reply to comment #54)
> Like JSString, JSFlatString is in the public jsapi.h.  JSLinearString isn't,
> but its just irregular to have
>   JSAtom > JSFlatString > js::LinearString > JSString
> Also, some day we may want to to expose JSLinearString and JS_LinearizeString
> for some perf reason.

Okay, sounds good -- just making sure there's a reason for it while you're modifying things.

I understand and generally agree with the sentiment behind comment 55.  At the same time, I don't think the two previous comments rose to the level of a naming argument.
Comment 57 Luke Wagner [:luke] 2010-12-13 17:27:26 PST
Created attachment 497383 [details] [diff] [review]
fallibile-ize JSString use in ctypes

With this, everything is fallible-ized.  Going to split into smaller patches for review, fill in TODO comments.
Comment 58 Luke Wagner [:luke] 2010-12-13 17:56:16 PST
Created attachment 497392 [details] [diff] [review]
web worker changes
Comment 59 Luke Wagner [:luke] 2010-12-13 17:56:37 PST
Created attachment 497393 [details] [diff] [review]
qsgen.py changes
Comment 60 Luke Wagner [:luke] 2010-12-13 17:57:04 PST
Created attachment 497394 [details] [diff] [review]
dom / content / NPAPI changes
Comment 61 Luke Wagner [:luke] 2010-12-13 17:57:26 PST
Created attachment 497395 [details] [diff] [review]
xpconnect changes
Comment 62 Luke Wagner [:luke] 2010-12-13 17:57:55 PST
Created attachment 497396 [details] [diff] [review]
storage / idb changes
Comment 63 Luke Wagner [:luke] 2010-12-13 17:58:40 PST
Created attachment 497397 [details] [diff] [review]
JSAPI (tests, impl) changes (in view of brendan's API r+)
Comment 64 Luke Wagner [:luke] 2010-12-13 18:08:42 PST
Created attachment 497400 [details] [diff] [review]
regexp changes
Comment 65 Luke Wagner [:luke] 2010-12-13 18:09:16 PST
Created attachment 497401 [details] [diff] [review]
tracer / mjit changes
Comment 66 Luke Wagner [:luke] 2010-12-13 18:09:39 PST
Created attachment 497402 [details] [diff] [review]
xml changes
Comment 67 Luke Wagner [:luke] 2010-12-13 18:10:02 PST
Created attachment 497403 [details] [diff] [review]
jsstr changes
Comment 68 Luke Wagner [:luke] 2010-12-13 18:10:23 PST
Created attachment 497404 [details] [diff] [review]
json changes
Comment 69 Luke Wagner [:luke] 2010-12-13 18:10:49 PST
Created attachment 497406 [details] [diff] [review]
various changes
Comment 70 Luke Wagner [:luke] 2010-12-13 18:12:40 PST
Note to reviewers: hopefully the patches are independently understandable, but if not, see the "union of patches" patch for context (in particular, to see the new interface of nsDependentJSString / JSString).
Comment 71 Igor Bukanov 2010-12-13 19:03:57 PST
Comment on attachment 497402 [details] [diff] [review]
xml changes

>+inline JSLinearString *
>+JSObject::getNamePrefix() const
>+{
>+    JS_ASSERT(isNamespace() || isQName());
>+    const js::Value &v = getSlot(JSSLOT_NAME_PREFIX);
>+    return v.isString() ? v.toString()->assertIsLinear() : NULL;

This together with the GetPrefix change below loses assert coverage as code no longer verifies the slot is undefined if it is not a string. Changing v.isString() to !v.isUndefined() should fix this.


>+inline JSLinearString *
>+JSObject::getNameURI() const
>+{
>+    JS_ASSERT(isNamespace() || isQName());
>+    const js::Value &v = getSlot(JSSLOT_NAME_URI);
>+    return v.isString() ? v.toString()->assertIsLinear() : NULL;

The same here.

>+inline JSLinearString *
>+JSObject::getQNameLocalName() const
>+{
>+    JS_ASSERT(isQName());
>+    const js::Value &v = getSlot(JSSLOT_QNAME_LOCAL_NAME);
>+    return v.isString() ? v.toString()->assertIsLinear() : NULL;

The same here.

>diff --git a/js/src/jsxml.cpp b/js/src/jsxml.cpp
>-static JS_INLINE JSString *
>+static JS_INLINE JSLinearString *
> GetPrefix(const JSObject *obj)
> {
>+    return obj->getNamePrefix();
>+}

Nit: replace GetPrefix call sites with obj->getNamePrefix() or at least add some FIXME comments to do it later.

> static JSFunctionSpec namespace_methods[] = {
>     JS_FN(js_toString_str,  namespace_toString,        0,0),
>     JS_FS_END
> };
> 
> static JSObject *
>-NewXMLNamespace(JSContext *cx, JSString *prefix, JSString *uri, JSBool declared)
>+NewXMLNamespace(JSContext *cx, JSLinearString *prefix, JSLinearString *uri, JSBool declared)
> {
>     JSObject *obj;
> 
>     obj = NewBuiltinClassInstanceXML(cx, &js_NamespaceClass);
>     if (!obj)
>         return JS_FALSE;
>-    JS_ASSERT(JSVAL_IS_VOID(obj->getNamePrefix()));
>-    JS_ASSERT(JSVAL_IS_VOID(obj->getNameURI()));
>+    JS_ASSERT(JSVAL_IS_VOID(obj->getNamePrefixValue()));
>+    JS_ASSERT(JSVAL_IS_VOID(obj->getNameURIValue()));

Nit: rename getSomethingValue into getSomethingVal as the functions returns jsval, not js::Value.


>@@ -492,17 +481,20 @@ qname_toString(JSContext *cx, uintN argc
>         return JS_FALSE;
> 
>     if (str && clasp == &js_AttributeNameClass) {
>         length = str->length();
>         chars = (jschar *) cx->malloc((length + 2) * sizeof(jschar));
>         if (!chars)
>             return JS_FALSE;
>         *chars = '@';
>-        js_strncpy(chars + 1, str->chars(), length);
>+        const jschar *strChars = str->getChars(cx);
>+        if (!strChars)
>+            return JS_FALSE;

Bug: missing cx->free(chars) on failure.

>@@ -1736,17 +1750,19 @@ ParseXMLSource(JSContext *cx, JSString *
>     js_InflateStringToBuffer(cx, prefix, constrlen(prefix), chars, &dstlen);
>     offset = dstlen;
>     js_strncpy(chars + offset, uri->chars(), urilen);
>     offset += urilen;
>     dstlen = length - offset + 1;
>     js_InflateStringToBuffer(cx, middle, constrlen(middle), chars + offset,
>                              &dstlen);
>     offset += dstlen;
>-    srcp = src->chars();
>+    srcp = src->getChars(cx);
>+    if (!srcp)
>+        return NULL;

Bug: missing cx->free(chars) on failure. There is also same preexisting bug in ParseXMLSource when calls GetScopeChain and does not release the buffer on failure as well.

I will promptly r+ a patch with this fixed.
Comment 72 Luke Wagner [:luke] 2010-12-14 10:23:38 PST
Created attachment 497532 [details] [diff] [review]
xml changes v.2
Comment 73 Luke Wagner [:luke] 2010-12-14 11:45:17 PST
Created attachment 497561 [details] [diff] [review]
tracer / mjit changes v.2
Comment 74 Jason Orendorff [:jorendorff] 2010-12-14 15:28:08 PST
Comment on attachment 497393 [details] [diff] [review]
qsgen.py changes

In qsgen.py:
>     '[astring]':
>-          "    XPCReadableJSStringWrapper ${name}(${argVal});\n",
>+          "    XPCReadableJSStringWrapper ${name};\n"
>+          "    if (!${name}.init(cx, ${argVal})) {\n"
>+          "${error}\n",

No \n at the end of that.

r=me with that.
Comment 75 Shawn Wilsher :sdwilsh 2010-12-14 15:40:39 PST
Comment on attachment 497396 [details] [diff] [review]
storage / idb changes

r=sdwilsh
Comment 76 Nicholas Nethercote [:njn] 2010-12-14 15:58:00 PST
Comment on attachment 497403 [details] [diff] [review]
jsstr changes

I'm happy with the correctness of the patch so I'm giving an r+.  I have
some suggestions below, mostly about names of things, that I'd like you to
consider carefully but, because this is a blocker I don't want to hold
things up, so I'll leave acting on them up to you (though I'm happy to
discuss).



>     wholeCapacity = RopeCapacityFor(wholeLength);
>-    wholeChars = (jschar *)js_malloc((wholeCapacity + 1) * sizeof(jschar));
>+    wholeChars = AllocChars(maybecx, wholeCapacity);
>+    if (!wholeCapacity)
>+        return NULL;

There was no null check here previously?  That's a good thing to have
fixed...



>@@ -463,20 +478,22 @@ js_str_escape(JSContext *cx, JSObject *o
>         }
>     }
> 
>     if (newlength >= ~(size_t)0 / sizeof(jschar)) {
>         js_ReportAllocationOverflow(cx);
>         return JS_FALSE;
>     }
> 
>-    newchars = (jschar *) cx->malloc((newlength + 1) * sizeof(jschar));
>+    jschar *newchars = (jschar *) cx->malloc((newlength + 1) * sizeof(jschar));

I think you can use AllocChars() here?



>     /* Don't bother allocating less space for the new string. */
>-    newchars = (jschar *) cx->malloc((length + 1) * sizeof(jschar));
>+    jschar *newchars = (jschar *) cx->malloc((length + 1) * sizeof(jschar));

Use AllocChars() here, too?  And some other places below in jsstr.cpp.


> JSBool JS_FASTCALL
>-js_EqualStrings(JSString *str1, JSString *str2)
>+js_EqualStringsOnTrace(JSContext *cx, JSString *str1, JSString *str2)
> {
>-    size_t n;
>-    const jschar *s1, *s2;
>-
>+    JSBool result;
>+    return EqualStrings(cx, str1, str2, &result) ? result : JS_NEITHER;
>+}
>+JS_DEFINE_CALLINFO_3(extern, BOOL, js_EqualStringsOnTrace,
>+                     CONTEXT, STRING, STRING, 1, nanojit::ACCSET_NONE)

The definition of JS_NEITHER says it shouldn't be used in new code, and fair
enough, too, it's dastardly!  Can you change js_EqualStringsOnTrace() to
return an int where 0==equal, 1=unequal, 2==oom?  Ie. emulate JS_NEITHER but
in an explicit (and less error-prone) way.



>+ * To allow static type-based checking that a given JSString* always points
>+ * to a flat or non-rope string, the JSLinearString and JSFlatString types may
>+ * be use. Instead of casting, callers should use ensureX() and assertIsX().

s/use/used.

Shouldn't "JSLinearString" and "JSFlatString" be reordered in that sentence to
match the order of "flat" and "non-rope"?

Re assertIsX() -- assertions are normally debug-only, so having a function
with "assert" in the name is weird.  Besides, aren't convert-to-a-subclass
functions usually called asX()?  That's how we're planning to do it for
converting JSObject to sub-classes such as JSDateObject (eg. bug 566789),
right?

Also, the "ensureX()" form -- would "toX()" be better?  ensureX() sounds a
bit to me like it checks, but not necessarily converts.


>+    JS_ALWAYS_INLINE const jschar *getChars(JSContext *cx) {
>+        if (isRope())
>+            return flatten(cx);
>+        return nonRopeChars();
>     }
> 
>+    JS_ALWAYS_INLINE const jschar *getCharsZ(JSContext *cx) {
>+        if (!isFlat())
>+            return undepend(cx);
>+        return flatChars();
>     }

These two functions are making my head spin.  It's getting hard to keep
track of the different kinds of string, the different names for them (and
equivalent names, eg "linear" == "non-rope"), and
which kinds of string different functions operate on.  For example,
flatten() can only be called on Ropes, so flattenRope() would be a better
name.  Actually, flattenRopeAndGetChars() would be even better, because it
makes it clear that it both flattens the rope and gets the chars -- I would
expect the return value of a function called flattenRope() to be a
JSFlatString.

As for getCharsZ()... undepend() in an increasingly awful name;  I guess it
now should be called flattenAndGetChars()?  Since it converts any kind of
string into a flat string and extracts the chars.  Or maybe
flattenAnyAndGetChars()?

Those changes would result in the following, which spins my head much less:

    JS_ALWAYS_INLINE const jschar *getChars(JSContext *cx) {
        if (isRope())
            return flattenRopeAndGetChars(cx);
        return nonRopeChars();
    }

    JS_ALWAYS_INLINE const jschar *getCharsZ(JSContext *cx) {
        if (!isFlat())
            return flattenAndGetChars(cx);
        return flatChars();
    }

BTW, does the "get" prefix imply fallibility?  That's worth mentioning in a
comment.



>+/*
>+ * A "linear" string may or may not be null-terminated, but it provide
>+ * infallible access to a linear array of characters.
>+ */

s/provide/provides/

So a linear string is a non-rope?  Can you add that to the comment?

Alternatively, I wonder if JSNonRopeString or JSFlatZString would be better
names than JSLinearString?  Either would avoid introducing yet another word
to describe a subset of string kinds.  JSNonRopeString is a helpful name for
those working on the implementation itself (and we already have eg.
nonRopeChars()), whereas JSFlatZString is more orientied towards those who use
the implementation but don't care about the internal details.


>+struct JSLinearString : JSString
> {
>+    const jschar *chars() const { return JSString::nonRopeChars(); }
>+};
>+
>+JS_STATIC_ASSERT(sizeof(JSLinearString) == sizeof(JSString));
>+
>+/*
>+ * A linear string where, additionally, chars()[length()] == '\0'.
>+ */

So that excludes dependent strings, right?  Can you add that to the comment?



>+struct JSFlatString : JSLinearString
>+{
>+    const jschar *charsZ() const { return chars(); }
> };
> 
> JS_STATIC_ASSERT(sizeof(JSFlatString) == sizeof(JSString));

Re the naming of chars() -- I kept forgetting what kind of string it applied
to.  If JSLinearString is renamed JSNonRopeString, then nonRopeChars() would
be enough, chars() wouldn't be needed.  Or, if JSLinearString isn't renamed,
maybe linearChars() would be better?



> inline void
> JSString::finalize(JSContext *cx) {
>     JS_ASSERT(!JSString::isStatic(this));
>     JS_RUNTIME_UNMETER(cx->runtime, liveStrings);
>     if (isDependent()) {
>-        JS_ASSERT(dependentBase());

Why remove the assertion?
Comment 77 Blake Kaplan (:mrbkap) (PTO until Jan. 2, 2017) 2010-12-14 16:19:59 PST
Comment on attachment 497395 [details] [diff] [review]
xpconnect changes

>-    return reinterpret_cast<PRUnichar*>(JS_GetStringChars(str));
>+    return JS_GetStringCharsZ(cx, str);

You've gotten rid of a bunch of these reinterpret_casts, but as far as I know, they're still required for MinGW to build on Windows. I think you're going to have to add them back in (or get a MinGW buddy to verify that they're truly not needed anymore, but I don't think that's the case). I'm not going to bother pointing out all of the places that you removed them, but there's a bunch.

> static void
> getUTF8StringArgument(JSContext *cx, JSObject *obj, PRUint16 argNum,
>                       uintN argc, jsval *argv, nsCString& aRetval)
> {
>+    JSString *str;
>+    const PRUnichar *data;

Why declare these C-style all the way up here?

Also, if you move the aRetval.Truncate to here, then you can get rid of the goto and just return in the error cases.

r=mrbkap with those comments addressed.
Comment 78 Luke Wagner [:luke] 2010-12-14 21:49:32 PST
(In reply to comment #76)
> >+    wholeChars = AllocChars(maybecx, wholeCapacity);
> >+    if (!wholeCapacity)
> >+        return NULL;
> 
> There was no null check here previously?  That's a good thing to have
> fixed...

This is the same unchecked allocation you complained about in comment 29.  This is the fix promised by comment 30.

> >     /* Don't bother allocating less space for the new string. */
> >-    newchars = (jschar *) cx->malloc((length + 1) * sizeof(jschar));
> >+    jschar *newchars = (jschar *) cx->malloc((length + 1) * sizeof(jschar));
> 
> Use AllocChars() here, too?  And some other places below in jsstr.cpp.

That sounds like a nice follow-up factoring.  AllocChars atm is simply trying to make flatten() easier to read.

> >+    return EqualStrings(cx, str1, str2, &result) ? result : JS_NEITHER;
> >+}
> >+JS_DEFINE_CALLINFO_3(extern, BOOL, js_EqualStringsOnTrace,
> >+                     CONTEXT, STRING, STRING, 1, nanojit::ACCSET_NONE)
> 
> The definition of JS_NEITHER says it shouldn't be used in new code, and fair
> enough, too, it's dastardly!  Can you change js_EqualStringsOnTrace() to
> return an int where 0==equal, 1=unequal, 2==oom?  Ie. emulate JS_NEITHER but
> in an explicit (and less error-prone) way.

JS_NEITHER is an out-of-band bool for tracer purposes and this is a traceable native; this is the same pattern as the originating traceable native that wanted JS_NEITHER.  Using explicit integers seems strictly less readable/understandable.

> Also, the "ensureX()" form -- would "toX()" be better?  ensureX() sounds a
> bit to me like it checks, but not necessarily converts.

ensureX() *does* check and its fallible.  toX() sounds infallible.

> BTW, does the "get" prefix imply fallibility?  That's worth mentioning in a
> comment.

Yes, see previous comments.  Taking a JSContext* implies fallibility.

As for the other naming suggestions, I didn't fuss too much since I knew you were planning to give JSString a makeover, so I'll just leave this be for now.

> So a linear string is a non-rope?  Can you add that to the comment?

Yes

> Alternatively, I wonder if JSNonRopeString or JSFlatZString would be better
> names than JSLinearString?

I think it is more understandable to describe positive properties vs. set-negation.

> > inline void
> > JSString::finalize(JSContext *cx) {
> >     JS_ASSERT(!JSString::isStatic(this));
> >     JS_RUNTIME_UNMETER(cx->runtime, liveStrings);
> >     if (isDependent()) {
> >-        JS_ASSERT(dependentBase());
> 
> Why remove the assertion?

dependentBase() does return s.base->assertIsLinear().  In the finalizer, the base may have been finalized and its bytes trashed so the assert isn't valid.  Also, it didn't seem very useful to assert that a dependent string's dependent base is not null; that would almost certainly explode earlier.  I could just add JS_ASSERT(s.base) instead... I didn't because the assertion doesn't seem particularly useful; if a NULL is passed we'll explode much much earlier.
Comment 79 Luke Wagner [:luke] 2010-12-15 10:13:39 PST
(In reply to comment #77)
> You've gotten rid of a bunch of these reinterpret_casts, but as far as I know,
> they're still required for MinGW to build on Windows. I think you're going to
> have to add them back in (or get a MinGW buddy to verify that they're truly not
> needed anymore, but I don't think that's the case). I'm not going to bother
> pointing out all of the places that you removed them, but there's a bunch.

I believe the types were officially synced by bug 578340.  (It says 'Windows' in the title, but I see references to MinGW in the comments.)

> > static void
> > getUTF8StringArgument(JSContext *cx, JSObject *obj, PRUint16 argNum,
> >                       uintN argc, jsval *argv, nsCString& aRetval)
> > {
> >+    JSString *str;
> >+    const PRUnichar *data;
> 
> Why declare these C-style all the way up here?

because of the ever-annoying "goto skips initialization of local" rule and:

  JSString *str;
  str = foo():

looks kinda lame.

> Also, if you move the aRetval.Truncate to here, then you can get rid of the
> goto and just return in the error cases.

I was reticent to hoist Truncate, since then its called twice on the success path, but if you're telling me its not hot enough to care, that's sounds nice.
Comment 80 Luke Wagner [:luke] 2010-12-15 10:17:29 PST
s/reticent/reluctant/ :)
Comment 81 Jeff Walden [:Waldo] (remove +bmo to email) 2010-12-15 14:02:24 PST
Comment on attachment 497406 [details] [diff] [review]
various changes

== Easy things to fix ==

jsreflect.cpp: do I see a getChars() without fallibility check, then flowing into js_DeflateString?  It looks like js_DeflateString doesn't handle that and would deref null if that failed.

In obj_toSource, you have |bool idIsLexicalIdentifier = js_IsIdentifier(idstr->assertIsLinear());|.  But idstr is JSLinearString*, so is that assert necessary?  Seems not to me.

(Aside: I agree with njn that the name should be asLinear or asLinearString, or something, but I can live with what you have for now.  I very slightly prefer toLinear to ensureLinear, but I don't much care.  I don't mind inline methods named assert*, because then the exact mechanics of the assert can be hidden [and #ifdef DEBUG'd in the method, of course].  But this latter point is moot here and so not worth dwelling on.)

== Worrisome issues to fix ==

Is js_OutofBand implemented correctly?  It's a NaN value, so comparing it to anything, even js_OutOfBand, via == is always false.  Doesn't this mean OOM handling won't work?

(Aside: at what central location are the meanings of all non-canonical NaN values like this documented?  Please start one if there isn't already; the list won't stop here.)

== Future things to fix ==

Perhaps lstr/fstr should be the canonical names (if informative names aren't actually useful) for instances of the new types, going forward?  But for now it's not much trouble to change incrementally and deal with less-precise names now.

An aside, but PodCopy should assert non-overlappingness.  Also, as usual, I prefer |src < srcend| to using !=, which here in particular is more conceivably worrisome than in some places where I suggest this, nelem being user-provided and somewhat susceptible to integer overflow.  File a followup or something on this?

jsscope.cpp/jsgcstats.cpp: I noticed first with these two instances, but it seems to me that PutEscapedString is called with (sizeof buf - 1) in some places, (sizeof buf) in others.  At the least this is inconsistent!  It appears sizeof buf is what's actually correct (rather, makes the most efficient use of space), but double-check me on that, the PutEscapedString stuff was a little confusing to me.  Making those correct can also be a followup, since the off-by-one is in the conservative direction and shouldn't result in overflows.

== Verdict ==

All in all, great stuff.  I've wanted JSAtom to be something real and not just completely faked up for awhile, but it was going to be a lot of work without obvious gain -- great to see that happen, and win at the same time.

It's a definite r- for the js_OutOfBand problem, but that's it -- the few other needed tweaks should be easy and not particularly interesting.
Comment 82 Luke Wagner [:luke] 2010-12-15 14:26:14 PST
As noted IRL, js_OutOfBand has already been removed in dvander's tracer changes patch; it shouldn't have been included in waldo's.
Comment 83 Jeff Walden [:Waldo] (remove +bmo to email) 2010-12-15 14:54:41 PST
Comment on attachment 497406 [details] [diff] [review]
various changes

I told him to just flip the bit with r=irl noting that the issue had been fixed, but nooo, he had to go bug me again with a review request.  :-P
Comment 84 Nicholas Nethercote [:njn] 2010-12-15 15:31:46 PST
(In reply to comment #78)
> 
> This is the same unchecked allocation you complained about in comment 29.  This
> is the fix promised by comment 30.

Hopefully my consistency makes up for my poor memory :)

 
> Yes, see previous comments.  Taking a JSContext* implies fallibility.

I've been told it implies fallibility if it's the first argument, infallibility if it's the last argument, and if it's the only argument that it needs a comment to make things clear.
Comment 85 David Anderson [:dvander] 2010-12-15 15:57:30 PST
Comment on attachment 497561 [details] [diff] [review]
tracer / mjit changes v.2

>+    if (length == 1) {
>+        jschar c = chars[0];
>+        if ('0' <= c && c <= '9') {
>+            *result = NumberTraits<T>::toSelfType(T(c - '0'));
>+            return true;
>+        }
>+        else if (JS_ISSPACE(c)) {

Weird alignment on this "else"

>-        args[0] = r_ins, args[1] = cx_ins;
>+        LIns* ok_ins = w.allocp(sizeof(JSBool));
>+        args[0] = ok_ins, args[1] = r_ins, args[2] = cx_ins;
>         r_ins = w.call(&js_StringToNumber_ci, args);
>-        cond = (l.toNumber() == js_StringToNumber(cx, r.toString()));
>+        guard(false,
>+              w.name(w.eqi0(w.ldiAlloc(args[0])), "guard(oom)"),
>+              OOM_EXIT);

At one point, nanojit mutated the contents of args in call(). Not sure if it still does that, but if so, it could bite here.
Comment 86 Blake Kaplan (:mrbkap) (PTO until Jan. 2, 2017) 2010-12-15 16:06:07 PST
(In reply to comment #79)
> I was reticent to hoist Truncate, since then its called twice on the success
> path, but if you're telling me its not hot enough to care, that's sounds nice.

You're in setCanEnablePrivilege here. You could put a sleep() in and I wouldn't complain :-).
Comment 87 dwitte@gmail.com 2010-12-15 16:36:26 PST
Comment on attachment 497383 [details] [diff] [review]
fallibile-ize JSString use in ctypes

>diff --git a/js/src/ctypes/CTypes.cpp b/js/src/ctypes/CTypes.cpp

> static JS_ALWAYS_INLINE bool
>-IsEllipsis(jsval v)
>+IsEllipsis(JSContext* cx, jsval v, bool* isEllipsis)

Guh. Is there a way to just iterate the three chars of the string infallibly?

>@@ -4732,17 +4745,20 @@ NewFunctionInfo(JSContext* cx,

>-    if (IsEllipsis(argTypes[i])) {
>+    bool isEllipsis;
>+    if (!IsEllipsis(cx, argTypes[i], &isEllipsis))
>+      return false;

This needs to JS_ReportError, or IsEllipsis needs to do so for all failure modes internally (and return a JSBool per convention, to imply that it's reported).

r=dwitte with that.
Comment 88 Nicholas Nethercote [:njn] 2010-12-15 16:41:06 PST
(In reply to comment #85)
> 
> At one point, nanojit mutated the contents of args in call(). Not sure if it
> still does that, but if so, it could bite here.

Really?  Yuk.  It doesn't do that any more.
Comment 89 Luke Wagner [:luke] 2010-12-15 21:47:35 PST
(In reply to comment #87)
> > static JS_ALWAYS_INLINE bool
> >-IsEllipsis(jsval v)
> >+IsEllipsis(JSContext* cx, jsval v, bool* isEllipsis)
> 
> Guh. Is there a way to just iterate the three chars of the string infallibly?

Technically, since we are traversing a single rope in an uninterrupted manner, we could use the same string-mutating dag-traversal I put in the GC.  In this case, it seems like the string will need to be flattened anyways later on, so there is not much perf to be had (albeit a little interface simplicity for the used-once IsEllipsis function).

> >@@ -4732,17 +4745,20 @@ NewFunctionInfo(JSContext* cx,
> 
> >-    if (IsEllipsis(argTypes[i])) {
> >+    bool isEllipsis;
> >+    if (!IsEllipsis(cx, argTypes[i], &isEllipsis))
> >+      return false;
> 
> This needs to JS_ReportError, or IsEllipsis needs to do so for all failure
> modes internally (and return a JSBool per convention, to imply that it's
> reported).

str->getChars(cx) does report errors (on the given cx), hence so does IsEllipsis.  I will change the return type to JSBool though (that's not a SM convention that I know of).
Comment 90 dwitte@gmail.com 2010-12-15 22:08:13 PST
(In reply to comment #89)
> str->getChars(cx) does report errors (on the given cx), hence so does
> IsEllipsis.  I will change the return type to JSBool though (that's not a SM
> convention that I know of).

Ah, cool. The convention thing is something jorendorff and I have done for ctypes at least; I wasn't meaning to speak for SM :)
Comment 91 Nicholas Nethercote [:njn] 2010-12-16 11:47:49 PST
OMG you got 13 reviews done in less than 3 days! :)
Comment 92 Luke Wagner [:luke] 2010-12-16 11:52:47 PST
There was some poking :)  Just waiting for try server results to land.
Comment 94 Igor Bukanov 2010-12-17 09:59:18 PST
http://hg.mozilla.org/tracemonkey/rev/dd1ae2902505 - followup to fix compilation errors under MOZ_CALLGRIND
Comment 96 neil@parkwaycc.co.uk 2011-01-13 03:38:32 PST
Any idea what's wrong with my build? I'm getting this error:
>jsutil.h(362) : error C2668: 'abs' : ambiguous call to overloaded function
>        math.h(539): could be 'long double abs(long double)'
>        math.h(491): or 'float abs(float)'
>        math.h(487): or 'double abs(double)'
>        math.h(485): or 'long abs(long)'
>        stdlib.h(415): or 'int abs(int)'
>        while trying to match the argument list '(__int64)'
>        jsstr.cpp(195) : see reference to function template instantiation 'void js::PodCopy<jschar>(T *,const T *,size_t)' being compiled
>        with
>        [
>            T=jschar
>        ]
Comment 97 Luke Wagner [:luke] 2011-01-13 09:22:30 PST
Yeah, we had to nix that abs() in bug 624218 since apparently the stdlib doesn't give us a 64-bit abs().  Not merged to m-c yet, but as a quickfix you can comment out the assertion.
Comment 98 neil@parkwaycc.co.uk 2012-12-31 03:16:59 PST
Comment on attachment 497395 [details] [diff] [review]
xpconnect changes

>+        new(static_cast<nsDependentString *>(this)) nsDependentString(chars, length);
Eww. This should have used Rebind(chars, length) instead.

I can't seem to find any callers for this, so the relevant code should probably switch to nsDependentString directly; I looked and found a similar construct in nsJSUtils.h but couldn't work out which attachment introduced it.

Note You need to log in before you can comment on or make changes to this bug.