Closed Bug 1052839 Opened 10 years ago Closed 10 years ago

selfhost string substr/slice/substring

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: h4writer, Assigned: h4writer)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 5 obsolete files)

I'm testing selfhosting more of our RegExp code. For this we need str.substr inlined. Offcourse this also helps our normal str.substr code.

I get a speed-up from 400ms to 58ms on the code given below. This is more than expected, so potentially there is still some error in the code. But I think 120ms finally wouldn't be far fetched. A speedup of 3x - 4x.


function test(a) {
  var output = ""
  for (var i=0; i<10000; i++) {
      var b = i%100;
      output += a.substr(b,b+10);
  }
  return output;
}

var start = new Date();
var a = "jfqkmsreazuidsqivcmkjfdsqpoiuezapoiufdsqjfqkmsreazuidsqivcmkjfdsqpoiuezapoiufdsqjfqkmsreazuidsqivcmkjfdsqpoiuezapoiufdsqjfqkmsreazuidsqivcmkjfdsqpoiuezapoiufdsqjfqkmsreazuidsqivcmkjfdsqpoiuezapoiufdsqjfqkmsreazuidsqivcmkjfdsqpoiuezapoiufdsqjfqkmsreazuidsqivcmkjfdsqpoiuezapoiufdsqjfqkmsreazuidsqivcmkjfdsqpoiuezapoiufdsqjfqkmsreazuidsqivcmkjfdsqpoiuezapoiufdsqjfqkmsreazuidsqivcmkjfdsqpoiuezapoiufdsqjfqkmsreazuidsqivcmkjfdsqpoiuezapoiufdsq"
for (var i=0; i<1000; i++) {
  test(a);
}
print(new Date() - start);
Attached patch inline-substr (obsolete) — Splinter Review
Assignee: nobody → hv1989
Attached patch inline-substr (obsolete) — Splinter Review
Found some faults already. Now it is 150ms. Which looks like a correct runtime.
Attachment #8471880 - Attachment is obsolete: true
Is it viable to inline String.substring & String.slice with the same machinery? Maybe possible to inline one and implement the other two in terms of the inlined one?
(In reply to Dave Garrett from comment #3)
> Is it viable to inline String.substring & String.slice with the same
> machinery? Maybe possible to inline one and implement the other two in terms
> of the inlined one?

Good question, I'll look into doing that.
Maybe we could self-host substr/substring/slice and have a very simple substr(string, int, int) intrinsic we can inline?
(In reply to Jan de Mooij [:jandem] from comment #5)
> Maybe we could self-host substr/substring/slice and have a very simple
> substr(string, int, int) intrinsic we can inline?

Yeah. That's better than the idea I currently had :D.
Attached patch Inline substr/slice/substring (obsolete) — Splinter Review
Attachment #8471943 - Attachment is obsolete: true
Attachment #8472957 - Flags: review?(till)
This gives 170ms on the given benchmark. I've added a note in the comments on how it would be possible to squeeze a bit more performance out of it. But I don't think it is worth the complexity atm.
Comment on attachment 8472957 [details] [diff] [review]
Inline substr/slice/substring

Review of attachment 8472957 [details] [diff] [review]:
-----------------------------------------------------------------

In general: very nice!

Still cancelling review because I'd much prefer implementations of the specced functions that contain "// Step N."-style comments and references to the spec paragraph being implemented. See String_codePointAt for an example.

I added some comments to the current version, but want to see a new one, too.

Oh, and please ask a JIT person for review for the inlining parts. I think I understand it and it looks good to me, but I'm certainly not confident my review is enough in that area.

::: js/src/builtin/String.js
@@ +3,5 @@
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  /*global intl_Collator: false, */
>  
> +function String_Substring(arg_begin, arg_end) { 

I'd prefer using `begin` and `end` as names here, but not *that* strongly. I do feel strongly about not using an `arg_` prefix, though. You could use `begin_` and `end_` instead.

nit: whitespace

@@ +10,5 @@
> +
> +    if (arguments.length == 0)
> +        return str;
> +
> +    var begin = ClampToInt32(arg_begin);

There isn't really a need to first clamp this and `end` to int32 and then to [0,strLength]. Please only do the latter for both.

@@ +11,5 @@
> +    if (arguments.length == 0)
> +        return str;
> +
> +    var begin = ClampToInt32(arg_begin);
> +    var strLength = str.length;

I wonder if the ordering of these calls is visible. Per spec, the length should be retrieved first, but I don't know if "ToString" strips off any String object we might have, so there cannot be a `length` getter. In any case, we should match the spec's ordering.

@@ +19,5 @@
> +    else if (begin > strLength)
> +        begin = strLength;
> +
> +    var end = strLength;
> +    if (arguments.length >= 2) {

This isn't correct: per spec, `undefined` triggers the default value, not only the argument missing completely. If we don't yet have a test for this, please add one.

@@ +35,5 @@
> +            }
> +        }
> +    }
> +
> +    return DoSubstr(str, begin | 0, (end - begin) | 0);

No need to coerce the parameters to int32: we already did so above, right? Or is this a hint to the range analysis which might not have gotten enough information above? If so, we should try improving that. Perhaps the range analysis could monitor the result of "ToInteger"?

@@ +38,5 @@
> +
> +    return DoSubstr(str, begin | 0, (end - begin) | 0);
> +}
> +
> +function String_Substr(arg_begin, arg_len) {

Please also change these parameter names.

@@ +42,5 @@
> +function String_Substr(arg_begin, arg_len) {
> +    CheckObjectCoercible(this);
> +    var str = ToString(this);
> +
> +    if (arguments.length == 0)

Is this path likely at all? I'd think it's not, so let's remove the `if` check.

@@ +57,5 @@
> +            begin = 0;
> +    }
> +
> +    var len = 0;
> +    if (arguments.length >= 2) {

Same as above: this has to check for `undefined`. Please also add a test if it's missing.

@@ +58,5 @@
> +    }
> +
> +    var len = 0;
> +    if (arguments.length >= 2) {
> +        len = ClampToInt32(arg_len);

This coercion has to be done before any early returns: it might throw, and returning early would make us not throw.

@@ +67,5 @@
> +    } else {
> +        len = strLength - begin;
> +    }
> +
> +    return DoSubstr(str, begin | 0, len | 0);

See comment on "|" above.

@@ +70,5 @@
> +
> +    return DoSubstr(str, begin | 0, len | 0);
> +}
> +
> +function String_Slice(arg_begin, arg_end) {

Mostly, the same comments as for the above two functions apply.

::: js/src/jit-test/tests/ion/bug928423.js
@@ +1,3 @@
>  function f(o, p) {
>      try {} catch(e) {};
> +    print("-", p);

I guess this is debugging code you forgot to remove?

::: js/src/jit/CodeGenerator.cpp
@@ +5233,5 @@
> +    Register temp = ToRegister(lir->temp());
> +
> +    Label isLatin1, fatInline, done, nonZero, isFatLatin1;
> +
> +    // For every edge case use the c++ variant.

uber-nit: "C++"

@@ +5234,5 @@
> +
> +    Label isLatin1, fatInline, done, nonZero, isFatLatin1;
> +
> +    // For every edge case use the c++ variant.
> +    // Note: we also use this upon allocationg failure in newGCString and

"allocation"

@@ +5235,5 @@
> +    Label isLatin1, fatInline, done, nonZero, isFatLatin1;
> +
> +    // For every edge case use the c++ variant.
> +    // Note: we also use this upon allocationg failure in newGCString and
> +    // newGCFatInlineString. To squeeze even more performance those failures

nit: "to squeeze out"

@@ +5236,5 @@
> +
> +    // For every edge case use the c++ variant.
> +    // Note: we also use this upon allocationg failure in newGCString and
> +    // newGCFatInlineString. To squeeze even more performance those failures
> +    // can be handled by ool allocate and returning to jit code to fill in all

I'm not 100% sure I understand what this means. Perhaps "by allocating in ool code and returning [...]"?

@@ +5251,5 @@
> +    masm.movePtr(ImmGCPtr(names.empty), output);
> +    masm.jump(&done);
> +    masm.bind(&nonZero);
> +
> +    // Use ool call for Ropes

nit: missing "." here and twice below

::: js/src/vm/SelfHosting.cpp
@@ +112,5 @@
>      return true;
>  }
>  
>  bool
> +js::intrinsic_DoSubstr(JSContext *cx, unsigned argc, Value *vp) {

I get that the String function is called this, but I really, truly dislike "DoFoo" names. We started naming intrinsics with an "_" prefix, so how about calling the C++ function "intrinsic_Substr" and the JS reflection "_Substr"?
Attachment #8472957 - Flags: review?(till)
Summary: Inline str.substr → Inline string substr/slice/substring
Comment on attachment 8472957 [details] [diff] [review]
Inline substr/slice/substring

Review of attachment 8472957 [details] [diff] [review]:
-----------------------------------------------------------------

Excited/spurred on by till's comments I decided to driveby some.  :-)

I'd admit to feeling comfortable reviewing the JIT changes for JS semantic correctness (not necessarily for JIT-usage-correctness), but I didn't do any reading of it here.

::: js/src/builtin/String.js
@@ +3,5 @@
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  /*global intl_Collator: false, */
>  
> +function String_Substring(arg_begin, arg_end) { 

Naming consistency suggests String_slice, String_substring, String_substr.

@@ +11,5 @@
> +    if (arguments.length == 0)
> +        return str;
> +
> +    var begin = ClampToInt32(arg_begin);
> +    var strLength = str.length;

ToString returns a string value, so no length getter.  (And even if it were a String object, the length property on such is non-writable non-configurable, so no worries about it.)

@@ +35,5 @@
> +            }
> +        }
> +    }
> +
> +    return DoSubstr(str, begin | 0, (end - begin) | 0);

Add some assertions that begin is actually an integer in the int32_t range, and that end - begin is non-negative (or positive, if the code above ensures that).

    assert(-0x80000000 <= begin && begin <= 0x7fffffff, "begin isn't an int32");
    assert(-0x80000000 <= (end - begin) && (end - begin) <= 0x7fffffff, "begin isn't an int32");

All the methods deserve similar sorts of things, before the bits that guarantee we're not screwing up int32_t-ness.

::: js/src/builtin/Utilities.js
@@ +96,5 @@
> +        return 0x7FFFFFFF;
> +    else if (v < -0x80000000) // INT32_MIN
> +        return -0x80000000;
> +    else
> +        return v;

Else after return non-sequitur thingamabob.

::: js/src/jit/CodeGenerator.cpp
@@ +5244,5 @@
> +                                    StoreRegisterTo(output));
> +    if (!ool)
> +        return false;
> +
> +    // Zero length, return emptystring

Is it possible this test might be better written in the methods as JS?  My second inclination (thinking yes originally) is no, that forces it to be written everywhere versus just once.  Verification of that assessment would be useful, to confirm we're on the same page.

::: js/src/jsstr.cpp
@@ +579,2 @@
>  {
>      /*

This method could use some sanity assertions, like begin < str->length(), begin + len < str->length(), etc.  Some of this probably happens in the methods ultimately called by this, but it's good documentation to have those requirements double-checked here.
Attached patch inline-substr (obsolete) — Splinter Review
Addresses review comments and I added Waldo to review the jit changes.

- s/DoSubstr/SubstrImpl/. I thought Substr was too close to str.substr. Which is totally different.
- I kept the |0 but added an explanation.
- @Waldo the if (length == 0) check doesn't really make a difference. It was just more convenient to do it in the codegen, instead of adding it before calling SubstrImpl everywhere.
Attachment #8472957 - Attachment is obsolete: true
Attachment #8478253 - Flags: review?(till)
Attachment #8478253 - Flags: review?(jwalden+bmo)
Comment on attachment 8478253 [details] [diff] [review]
inline-substr

Review of attachment 8478253 [details] [diff] [review]:
-----------------------------------------------------------------

Nice! Still a couple of nits to address, but looks very good.

General nit: "//"-style comments in self-hosted code, please. I know the code isn't 100% consistent on this, but it's the prevailing style.

Oh, and I didn't review the Ion stuff too closely, as my knowledge there hasn't improved meaningfully since the last review ;)

::: js/src/builtin/String.js
@@ +3,5 @@
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  /*global intl_Collator: false, */
>  
> +/* ES5 15.5.4.15 */

Please refer to the latest spec draft here and for the other functions. It's highly unlikely that much has changed, but it also doesn't make sense to implement an outdated spec. (And fix the steps accordingly, of course.)

@@ +26,5 @@
> +
> +    /* Step 7. */
> +    var finalEnd = std_Math_min(std_Math_max(intEnd, 0), len);
> +
> +    /* Step 8-9. */

uber-nit: "Steps"

@@ +37,5 @@
> +        to = finalStart;
> +    }
> +
> +    /* Step 10. */
> +    /* Remark: from and to - from are definitely in the range of int32,

can you mark code as such in this comment? E.g. with backticks: "Remark: `from` and `to - from` are [...]"

@@ +39,5 @@
> +
> +    /* Step 10. */
> +    /* Remark: from and to - from are definitely in the range of int32,
> +       since maximum string length is inside int32 range, but could be typed as
> +       double. Eagerly truncate since SubstrImpl only accepts int32. */

The first sentence isn't entirely clear, I think. Perhaps "While `from` and `to - from` are bounded to the length of `S` and thus definitely in the int32 range, they can still be typed as double."

I still wonder why the above logic doesn't give the range analysis enough information to guarantee that we treat them as int32, though.

@@ +52,5 @@
> +
> +    /* Step 2. */
> +    var intStart = ToInteger(start);
> +
> +    /* Step 3-4. */

uber-nit: "Steps"

@@ +70,5 @@
> +
> +    /* Step 8. */
> +    /* Remark: intStart and intLength are definitely in the range of int32,
> +       since maximum string length is inside int32 range, but could be typed as
> +       double. Eagerly truncate since SubstrImpl only accepts int32. */

Same comment as above.

@@ +101,5 @@
> +    /* Step 8. */
> +    var span = std_Math_max(to - from, 0);
> +
> +    /* Step 9. */
> +    /* Remark: intStart and span are definitely in the range of int32,

s/intStart/from/

@@ +103,5 @@
> +
> +    /* Step 9. */
> +    /* Remark: intStart and span are definitely in the range of int32,
> +       since maximum string length is inside int32 range, but could be typed as
> +       double. Eagerly truncate since SubstrImpl only accepts int32. */

Same comment as above.

::: js/src/jsstr.cpp
@@ +579,4 @@
>  {
> +    MOZ_ASSERT(0 <= begin);
> +    MOZ_ASSERT(begin < str->length());
> +    MOZ_ASSERT(begin + len < str->length());

Given that this checks almost exactly the same things as `intrinsic_SubstrImpl` does, can you move the asserts from the latter here, potentially applying the cleanups I suggested there?

Or does this function handle negative values for `len` and thus behaves slightly differently from the intrinsic? If so, why the difference?

::: js/src/vm/SelfHosting.cpp
@@ +129,5 @@
> +
> +    DebugOnly <int32_t> str_length = str->length();
> +    JS_ASSERT(0 <= begin && begin <= str_length);
> +    JS_ASSERT(0 <= length && length <= str_length);
> +    JS_ASSERT(0 <= begin + length && begin + length <= str_length);

I think it makes sense to simplify these asserts somewhat:

JS_ASSERT(begin >= 0);
JS_ASSERT(length >= 0);
JS_ASSERT(begin + length <= str->length());

This checks all the conditions we really care about, and doesn't check more than one condition per assert. (It also allows you to drop the DebugOnly.)
Attachment #8478253 - Flags: review?(till) → review+
Attached patch Inline substr/slice/substring (obsolete) — Splinter Review
- Use ES6 draft numbering
- Adjust comments
- Added extra remark on deviation to spec (with reasoning why it doesn't make a difference)
- Only check being unsigned in the intrinsic
- Reordered the compares in SubstrImpl to avoid overflow

Note: about why "| 0" is needed. I haven't investigated it yet. But I wouldn't be suprised it is because something could overflow. When I tested in the beginning I needed it. Could be it isn't needed anymore. Though the "| 0" might still help to improve codegeneration ...
Attachment #8478253 - Attachment is obsolete: true
Attachment #8478253 - Flags: review?(jwalden+bmo)
Attachment #8478347 - Flags: review?(till)
Attachment #8478347 - Flags: review?(jwalden+bmo)
(In reply to Hannes Verschore [:h4writer] from comment #14)
> Note: about why "| 0" is needed. I haven't investigated it yet. But I
> wouldn't be suprised it is because something could overflow. When I tested
> in the beginning I needed it. Could be it isn't needed anymore. Though the
> "| 0" might still help to improve codegeneration ...

In the best case we would have debug-only "JS_ASSERT" in JS testing the number is indeed an int32...
Comment on attachment 8478347 [details] [diff] [review]
Inline substr/slice/substring

Review of attachment 8478347 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.

::: js/src/vm/SelfHosting.cpp
@@ +128,5 @@
> +    int32_t length = args[2].toInt32();
> +
> +    DebugOnly <int32_t> str_length = str->length();
> +    JS_ASSERT(0 <= begin);
> +    JS_ASSERT(0 <= str_length);

I suppose you wanted to check `length` instead of `str_length` here? Also, I'm not sure why you distributed the asserts across this function and js:SubstrImpl. Keeping them all together seems to make more sense to me. Your call, though.
Attachment #8478347 - Flags: review?(till) → review+
Comment on attachment 8478347 [details] [diff] [review]
Inline substr/slice/substring

Review of attachment 8478347 [details] [diff] [review]:
-----------------------------------------------------------------

Here are comments on everything *except* CodeGenerator.cpp.  I'll do that later today.

::: js/src/builtin/String.js
@@ +3,5 @@
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  /*global intl_Collator: false, */
>  
> +/* ES6 Draft July 18, 2014 21.1.3.19 */

I'm reviewing against the August 24 draft, so update these references, please.  (If your steps aren't consistent with it, I'll say something.)

@@ +7,5 @@
> +/* ES6 Draft July 18, 2014 21.1.3.19 */
> +function String_substring(start, end) {
> +    // Steps 1-3.
> +    CheckObjectCoercible(this);
> +    var S = ToString(this);

While the spec might use S/O for string/object, I think we've generally considered these a bit too terse/unreadable and gone with str/obj instead for such variables: a rare exception to adhering to spec naming.

@@ +35,5 @@
> +        to = finalStart;
> +    }
> +
> +    // Step 11.
> +    // Remark: `from` and `to - from` are definitely in the range of int32,

|| seems somewhat more commonly used, to me, for code delineation in comments.

@@ +45,5 @@
> +/* ES6 Draft July 18, 2014 B.2.3.1 */
> +function String_substr(start, length) {
> +    // Steps 1-2.
> +    CheckObjectCoercible(this);
> +    var S = ToString(this);

str

@@ +54,5 @@
> +    // Steps 5-7.
> +    var size = S.length;
> +    // Remark: Using `size` instead of +Infinity. Doesn't make a difference
> +    // resultwise, but avoids having doubles.
> +    var intLength = (length === undefined) ? size : ToInteger(length);

Why not |end| for the name here?

@@ +61,5 @@
> +    if (intStart < 0)
> +        intStart = std_Math_max(intStart + size, 0);
> +
> +    // Step 9.
> +    var intLength = std_Math_min(std_Math_max(intLength, 0), size - intStart)

And |resultLength| here.

@@ +78,5 @@
> +/* ES6 Draft July 18, 2014 21.1.3.16 */
> +function String_slice(start, end) {
> +    // Steps 1-3.
> +    CheckObjectCoercible(this);
> +    var S = ToString(this);

str

@@ +102,5 @@
> +    // Step 10.
> +    // Remark: While `from` and `to - from` are bounded to the length of `S`
> +    // and thus definitely in the int32 range, they can still be typed as
> +    // double. Eagerly truncate since SubstrImpl only accepts int32.
> +    return SubstrImpl(S, from | 0, span | 0);

You could check for |span === 0| or |from > to|, perhaps, and bail early, for consistency with the other methods.

::: js/src/jit/MCallOptimize.cpp
@@ +1457,5 @@
>  IonBuilder::InliningStatus
> +IonBuilder::inlineSubstrImpl(CallInfo &callInfo)
> +{
> +    if (callInfo.argc() != 3 || callInfo.constructing())
> +        return InliningStatus_NotInlined;

Could we just flat-out assert that argc() == 3, and that !constructing()?  We control all users.  We don't want anyone using this any other way than in the JITtable way.

@@ +1465,5 @@
> +        return InliningStatus_NotInlined;
> +
> +    // This: String.
> +    if (callInfo.getArg(0)->type() != MIRType_String)
> +        return InliningStatus_NotInlined;

Comment is wrong -- it's checking argument 0, not |this|.  Your subsequent comments are all thus off by one, too.

Again, tho, is there a reason we can't just assert all the argument types are right?  (I probably wouldn't assert return type, that I'd let be non-inlinable.  But the rest is clearly/easily enforced by callers.)

::: js/src/jit/MIR.h
@@ +6230,5 @@
>  };
>  
> +class MSubstr
> +    : public MTernaryInstruction,
> +      public Mix3Policy<StringPolicy<0>, IntPolicy<1>, IntPolicy<2>>

class MSubstringInteral
  : public MTernaryInstruction,
    public Mix3Policy<StringPolicy<0>, IntPolicy<1>, IntPolicy<2>>
{
...

::: js/src/jsstr.cpp
@@ +574,5 @@
>      return true;
>  }
>  
> +JSString *
> +js::SubstrImpl(JSContext *cx, JSString *str, size_t begin, size_t len)

Minor nitpick, but could we please name this with complete words?  "substr" just to save a few characters, or for consistency with Perl that none of us know or care about (spec is all that matters any more), isn't worth it.  SubstringKernel or SubstringInternal or something.

::: js/src/vm/SelfHosting.cpp
@@ +19,5 @@
>  #include "builtin/SelfHostingDefines.h"
>  #include "builtin/TypedObject.h"
>  #include "builtin/WeakSetObject.h"
>  #include "gc/Marking.h"
> +#include "mozilla/DebugOnly.h"

I believe this goes just underneath #include "vm/SelfHosting.h" in its own includes section.  Check that using |make -C objdir check-style|.

@@ +116,5 @@
>      return true;
>  }
>  
> +bool
> +js::intrinsic_SubstrImpl(JSContext *cx, unsigned argc, Value *vp) {

{ on new line.

@@ +120,5 @@
> +js::intrinsic_SubstrImpl(JSContext *cx, unsigned argc, Value *vp) {
> +    CallArgs args = CallArgsFromVp(argc, vp);
> +    JS_ASSERT(args[0].isString());
> +    JS_ASSERT(args[1].isInt32());
> +    JS_ASSERT(args[2].isInt32());

MOZ_ASSERT, please.

@@ +128,5 @@
> +    int32_t length = args[2].toInt32();
> +
> +    DebugOnly <int32_t> str_length = str->length();
> +    JS_ASSERT(0 <= begin);
> +    JS_ASSERT(0 <= str_length);

Yeah, no need for these assertions if SubstringKernel is just going to do them again.

@@ +130,5 @@
> +    DebugOnly <int32_t> str_length = str->length();
> +    JS_ASSERT(0 <= begin);
> +    JS_ASSERT(0 <= str_length);
> +
> +    args.rval().setString(SubstrImpl(cx, str, begin, length));

You can't just assign the result of calling the method here.  If you OOM and can't allocate a string, this will return null, but setString asserts that its input isn't null.  Instead compute this value, null-check and return false if null, then do this setString and return true if the substring computation succeeded.

::: js/src/vm/String.h
@@ +684,5 @@
>    public:
>      static inline JSLinearString *new_(js::ExclusiveContext *cx, JSLinearString *base,
>                                         size_t start, size_t length);
> +
> +    inline static size_t offsetOfBase() {

static before inline, please, for general consistency in having static/virtual/nothing first.
Comment on attachment 8478347 [details] [diff] [review]
Inline substr/slice/substring

Review of attachment 8478347 [details] [diff] [review]:
-----------------------------------------------------------------

I would possibly r+ this, but the patch doesn't work for a few reasons trivially pointed out by jstests.  The total changes to fix that and address remaining nits are small enough I could maybe let it slide, but I'd rather r- in a fit of pique over your failure to run jstests at all.  :-P  (But's pique with a purpose, dangit!  jstests are real.  They point out bugs.  Run them.)

And it probably wouldn't hurt to have one last look at the CodeGenerator.cpp adjustments, in any case.  But mostly this is the jstests thing.  :-)

::: js/src/jit/CodeGenerator.cpp
@@ +5288,5 @@
>      masm.ret();
>  }
>  
> +
> +typedef JSString *(*SubstrImplFn)(JSContext *cx, JSString *str, size_t begin, size_t len);

Hmm.  I'm surprised this doesn't have to be a handle.  Maybe should be for general consistency.

@@ +5311,5 @@
> +    OutOfLineCode *ool = oolCallVM(SubstrImplInfo, lir,
> +                                   (ArgList(), input, begin, length),
> +                                    StoreRegisterTo(output));
> +    if (!ool)
> +        return false;

Given the number of slow-path and error-path exits in this code, I would probably introduce

    Label *slowPath = ool->entry();

and use that everywhere here.

@@ +5322,5 @@
> +    masm.bind(&nonZero);
> +
> +    // Use ool call for Ropes.
> +    masm.load32(Address(input, JSString::offsetOfFlags()), temp);
> +    masm.branchTest32(Assembler::Zero, temp, Imm32(JSString::TYPE_FLAGS_MASK), ool->entry());

Add this:

    static_assert(JSString::ROPE_FLAGS == 0,
                  "rope flags must be zero for (flags & TYPE_FLAGS_MASK) == 0 "
                  "to be a valid is-rope check");

@@ +5324,5 @@
> +    // Use ool call for Ropes.
> +    masm.load32(Address(input, JSString::offsetOfFlags()), temp);
> +    masm.branchTest32(Assembler::Zero, temp, Imm32(JSString::TYPE_FLAGS_MASK), ool->entry());
> +
> +    masm.branchTest32(Assembler::NonZero, temp, Imm32(JSString::INLINE_CHARS_BIT), &fatInline);

Hmm.  Is it truly faster/smaller code to load the flags into a register, then perform tests on them?  Or is it better to have

    Address flags(input, JSString::offsetOfFlags());
    masm.branchTest32(Assembler::Zero, flags, Imm32(JSString::TYPE_FLAGS_MASK), ool->entry());
    masm.branchTest32(Assembler::NonZero, flags, Imm32(JSString::INLINE_CHARS_BIT), &fatInline);

I'm not sure.  Micro-ops are probably going to be identical for the two cases, or near enough as makes no difference.  Please add a comment justifying the right choice, whatever it happens to be.

The name |fatInline| is confusing, because "fat" typically is a qualifier, not an expansion of meaning.  Make this |createInline| or something so that it's clear it's handling inline strings writ large (those including both fat and non-fat variants).

It seems clearer to me to have the cases handled this way:

    test for str->length() == 0, jump to &nonZero if so
    bind "";
    jump to &done;

    nonZero:
    test for str->isRope(), jump to OOL if so

    test for str->isInline(), jump to &notInline if not so
    handle inline case;
    jump to &done;

    notInline:
    handle dependent case;

    done:

This places each condition immediately adjacent to the code that computes the returned string in that case.

@@ +5326,5 @@
> +    masm.branchTest32(Assembler::Zero, temp, Imm32(JSString::TYPE_FLAGS_MASK), ool->entry());
> +
> +    masm.branchTest32(Assembler::NonZero, temp, Imm32(JSString::INLINE_CHARS_BIT), &fatInline);
> +
> +    // Create DependentString.

It's probably worth a note along these lines here:

    // One minor hazard of the dependent-string concept arises if dependent
    // strings may themselves be depended upon.  A very small string might
    // depend upon successively larger dependent strings, all the way to a
    // large base string.
    //
    // We have to keep the base string.  But there's no reason the intermediate
    // strings need be depended upon.  In fact, JSDependentString::new_ only
    // returns dependent strings *not* dependent upon another dependent string,
    // by iterating from the input dependent string through its bases to the
    // ultimate base string (adjusting start index accordingly).
    //
    // For simplicity here, we don't worry about creating dependent-string
    // chains.  If such chains become a problem, we should revisit this
    // decision.

For that matter, did we ever deliberately reach the conclusion posited in this comment?  In the short run I'm all for landing what we have.  But if we didn't deliberately choose to let this code create long dependency chains, we should have a followup to evaluate the tradeoffs (number of strings created, GC latency effects, how many multiply-depended strings are actually created as a fraction of total dependent-string population, possibly others) in depending upon the input as opposed to its ultimate base.  It wouldn't surprise me if the loop-to-base behavior had to be added here, if this case ends up being where most dependent strings are created in practice.

@@ +5335,5 @@
> +    masm.load32(Address(input, JSString::offsetOfFlags()), temp);
> +    masm.branchTest32(Assembler::NonZero, temp, Imm32(JSString::LATIN1_CHARS_BIT), &isLatin1);
> +    {
> +        masm.store32(Imm32(JSString::DEPENDENT_FLAGS), Address(output, JSString::offsetOfFlags()));
> +        masm.loadPtr(Address(input, JSString::offsetOfNonInlineChars()), temp);

Could you add a static_assert(offsetof(JSString, d.s.u2.nonInlineCharsTwoByte) == offsetof(JSString, d.s.u2.nonInlineCharsLatin1), "callers assume both inline chars start at the same location"); to JSString::offsetOfNonInlineChars() to verify the assumption we're making here and below?

@@ +5346,5 @@
> +    {
> +        masm.store32(Imm32(JSString::DEPENDENT_FLAGS | JSString::LATIN1_CHARS_BIT),
> +                     Address(output, JSString::offsetOfFlags()));
> +        masm.loadPtr(Address(input, JSString::offsetOfNonInlineChars()), temp);
> +        masm.computeEffectiveAddress(BaseIndex(temp, begin, ScaleFromElemWidth(sizeof(char))), temp);

Either do both of these in a single line as here, or do both of them on two lines as in the non-Latin1 case.

@@ +5373,5 @@
> +    {
> +        masm.store32(Imm32(JSString::INIT_FAT_INLINE_FLAGS | JSString::LATIN1_CHARS_BIT),
> +                           Address(output, JSString::offsetOfFlags()));
> +        masm.computeEffectiveAddress(Address(input, JSInlineString::offsetOfInlineStorage()), temp);
> +        masm.computeEffectiveAddress(BaseIndex(temp, begin, ScaleFromElemWidth(sizeof(char))), begin);

Again, either both on a single line, or both on two lines.  I'd go with two lines, this is getting unwieldy.

@@ +5375,5 @@
> +                           Address(output, JSString::offsetOfFlags()));
> +        masm.computeEffectiveAddress(Address(input, JSInlineString::offsetOfInlineStorage()), temp);
> +        masm.computeEffectiveAddress(BaseIndex(temp, begin, ScaleFromElemWidth(sizeof(char))), begin);
> +        masm.computeEffectiveAddress(Address(output, JSInlineString::offsetOfInlineStorage()), temp);
> +        CopyStringChars(masm, temp, begin, length, input, sizeof(char), sizeof(char));

Looking at CopyStringChars, I'm surprised there isn't some super-awesome string instruction or something we should be using some of the time on x86/x64 or something.  Maybe it got punted to later or something, in favor of simple to start.

@@ +5376,5 @@
> +        masm.computeEffectiveAddress(Address(input, JSInlineString::offsetOfInlineStorage()), temp);
> +        masm.computeEffectiveAddress(BaseIndex(temp, begin, ScaleFromElemWidth(sizeof(char))), begin);
> +        masm.computeEffectiveAddress(Address(output, JSInlineString::offsetOfInlineStorage()), temp);
> +        CopyStringChars(masm, temp, begin, length, input, sizeof(char), sizeof(char));
> +        masm.store8(Imm32(0), Address(temp, 0));

Blah, null-termination.  We need to get rid of null-termination as a JSAPI feature.  :-(

::: js/src/jsstr.cpp
@@ +579,3 @@
>  {
> +    MOZ_ASSERT(begin < str->length());
> +    MOZ_ASSERT(len < str->length() - begin);

On a whim I brushed off the patch to do some manual testing, and coincidentally the very first test I tried showed this is buggy:

js> "01234567890123456789012".substring(0)
Assertion failure: len < str->length() - begin, at /home/jwalden/moz/slots/js/src/jsstr.cpp:581

These both need to be <=.  Please add tests for both cases:

"foo".substring(0);
"foo".substring(3);

Or, um.  No, we have plenty of jstests for these, and basically every jstest failure with this patch is due to this concern.  So no new tests are wanted.  Please run jstests before posting patches -- particularly when touching standard library sorts of things.  jit-test coverage is extremely spotty when it comes to anything not testing the JITs.

@@ +4007,5 @@
>      JS_FN("match",             str_match,             1,JSFUN_GENERIC_NATIVE),
>      JS_FN("search",            str_search,            1,JSFUN_GENERIC_NATIVE),
>      JS_FN("replace",           str_replace,           2,JSFUN_GENERIC_NATIVE),
>      JS_FN("split",             str_split,             2,JSFUN_GENERIC_NATIVE),
> +    JS_SELF_HOSTED_FN("substr", "String_substr",      2,JSFUN_GENERIC_NATIVE),

It's not valid to use JSFUN_GENERIC_NATIVE with a self-hosted function.  You have to have a separate self-hosted static version, instead, registered in the static-methods JSFunctionSpec[].  Do something like what ArrayStaticIndexOf does to address this.

This is another thing that would have been pointed out by running jstests.  Do it!
Attachment #8478347 - Flags: review?(jwalden+bmo) → review-
This should address the comments. Good point about running jstests, I never had, since my changes are in most cases jit changes (which don't really get triggered in jstests). This patch is off course something totally different.
Attachment #8478347 - Attachment is obsolete: true
Attachment #8494103 - Flags: review?(jwalden+bmo)
Has this slipped from the radar? Asking a review ping.
Comment on attachment 8494103 [details] [diff] [review]
Inline substr/slice/substring

Review of attachment 8494103 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/String.js
@@ +3,5 @@
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  /*global intl_Collator: false, */
>  
> +/* ES6 Draft Aug 24, 2014 21.1.3.19 */

There's an October 14, 2014 draft now, so bump the reference.

@@ +7,5 @@
> +/* ES6 Draft Aug 24, 2014 21.1.3.19 */
> +function String_substring(start, end) {
> +    // Steps 1-3.
> +    CheckObjectCoercible(this);
> +    var S = ToString(this);

It's not quite as bad a name as |O| as appears in some of the object spec algorithms, but I think typically we've named these |str| or similar.  Slight deviation from spec naming, but it gives a better handle for referring to things, and for reading.  And someone doing necessarily-careful implementation/spec algorithm comparisons can internalize the name difference fairly readily, I think.

@@ +51,5 @@
> +/* ES6 Draft Aug 24, 2014 B.2.3.1 */
> +function String_substr(start, length) {
> +    // Steps 1-2.
> +    CheckObjectCoercible(this);
> +    var S = ToString(this);

str

@@ +59,5 @@
> +
> +    // Steps 5-7.
> +    var size = S.length;
> +    // Remark: Using |size| instead of +Infinity. Doesn't make a difference
> +    // resultwise, but avoids having doubles.

"Remark: " is redundant, please remove.  And don't have sentence fragments:

// Use |size| instead of +Infinity to avoid performing calculations with doubles.  (The result's the same either way.)

Or something like that.

@@ +90,5 @@
> +/* ES6 Draft Aug 24, 2014 21.1.3.16 */
> +function String_slice(start, end) {
> +    // Steps 1-3.
> +    CheckObjectCoercible(this);
> +    var S = ToString(this);

str

@@ +113,5 @@
> +
> +    // Step 10.
> +    // Remark: While |from| and |span| are bounded to the length of |S|
> +    // and thus definitely in the int32 range, they can still be typed as
> +    // double. Eagerly truncate since SubstrImpl only accepts int32.

Again no Remark:.

::: js/src/jit/CodeGenerator.cpp
@@ +5314,5 @@
> +    FunctionInfo<SubstringKernelFn>(SubstringKernel);
> +
> +bool CodeGenerator::visitSubstr(LSubstr *lir)
> +{
> +    Register input = ToRegister(lir->string());

Hm, maybe |string| or something is a better name for this.  "input" is quite vague, particularly when there are multiple "inputs".

@@ +5375,5 @@
> +        masm.store32(Imm32(JSString::INIT_FAT_INLINE_FLAGS | JSString::LATIN1_CHARS_BIT),
> +                     Address(output, JSString::offsetOfFlags()));
> +        masm.computeEffectiveAddress(inputStorage, temp);
> +        BaseIndex chars(temp, begin, ScaleFromElemWidth(sizeof(char)));
> +        masm.computeEffectiveAddress(chars, begin);

Again (I'm reviewing this a little out of order) seems nicer to static_assert that scaling is unnecessary.

@@ +5403,5 @@
> +        masm.store32(Imm32(JSString::DEPENDENT_FLAGS | JSString::LATIN1_CHARS_BIT),
> +                     Address(output, JSString::offsetOfFlags()));
> +        masm.loadPtr(Address(input, JSString::offsetOfNonInlineChars()), temp);
> +        BaseIndex chars(temp, begin, ScaleFromElemWidth(sizeof(char)));
> +        masm.computeEffectiveAddress(chars, temp);

I might eliminate the scaling here to eliminate runtime (even if possibly only at codegen time) computation costs.  That, and add a static_assert(sizeof(char) == 1, "begin index shouldn't need scaling"); for clarity.

@@ +5405,5 @@
> +        masm.loadPtr(Address(input, JSString::offsetOfNonInlineChars()), temp);
> +        BaseIndex chars(temp, begin, ScaleFromElemWidth(sizeof(char)));
> +        masm.computeEffectiveAddress(chars, temp);
> +        masm.store32(temp, Address(output, JSString::offsetOfNonInlineChars()));
> +        masm.jump(done);

Fall through (with a comment to that effect) -- again, avoid JIT generation costs.

::: js/src/jit/MCallOptimize.cpp
@@ +1470,5 @@
>  
>  IonBuilder::InliningStatus
> +IonBuilder::inlineSubstringKernel(CallInfo &callInfo)
> +{
> +    MOZ_ASSERT(callInfo.argc() == 3 && !callInfo.constructing());

Make this two assertions, so a failure will report a more-precise error.

::: js/src/jsstr.cpp
@@ +575,5 @@
>  }
>  
> +bool
> +js::SubstringKernel(JSContext *cx, HandleString str, int32_t signed_begin, int32_t signed_len,
> +                    MutableHandleString substr)

Erm, why is this returning a bool that's seemingly a success/failure code, *and* storing a string value into an outparam?  Why isn't it returning the string directly, as before?  Doing it this way makes it possible for the outparam and return value to be inconsistent.

Or is this a necessary consequence of making |str| into a handle?  :-(  If so, revert back to what it was before, I guess (with an explicit Rooted to preserve the string through the various operations in this method).  A non-handle string is better than potential inconsistencies here.

Preference for beginInt and lenInt or so instead of underscore_names, which are mostly the exception to the rule of camelCaps in SpiderMonkey.

@@ +600,5 @@
>  
>          /* Substring is totally in leftChild of rope. */
>          if (begin + len <= rope->leftChild()->length()) {
> +            substr.set(NewDependentString(cx, rope->leftChild(), begin, len));
> +            return true;

...as they may be inconsistent here.  NewDependentString can OOM and return null, in which case it's not correct to return true here.

@@ +609,3 @@
>              begin -= rope->leftChild()->length();
> +            substr.set(NewDependentString(cx, rope->rightChild(), begin, len));
> +            return true;

Or here.

@@ +623,5 @@
>  
>          Rooted<JSRope *> ropeRoot(cx, rope);
>          RootedString lhs(cx, NewDependentString(cx, ropeRoot->leftChild(), begin, lhsLength));
>          if (!lhs)
> +            return false;

But note how here, a NewDependentString OOM *is* treated as failure through the return value.  Inconsistent, not cool.

::: js/src/jsstr.h
@@ +248,5 @@
>      }
>      return true;
>  }
>  
> +bool

Please add a quick overview of what the function does (with a note about the senses of the begin/length arguments for anyone who thinks they might be allowed to get away with anything but the obvious meanings):

/*
 * Computes |str|'s substring for the range [beginInt, beginInt + lengthInt).
 * Negative, overlarge, swapped, etc. |beginInt| and |lengthInt| are forbidden
 * and constitute API misuse.
 */

::: js/src/vm/Shape.h
@@ +1061,5 @@
>      void fixupAfterMovingGC();
>  #endif
>  
>      /* For JIT usage */
> +    inline static size_t offsetOfBase() { return offsetof(Shape, base_); }

Not sure why this change.  And in any event, we're very consistent about the ordering of stuff in a method declaration being [static or virtual] inline? T methodName(), so this is wrong anyway.

::: js/src/vm/String.h
@@ +685,5 @@
>    public:
>      static inline JSLinearString *new_(js::ExclusiveContext *cx, JSLinearString *base,
>                                         size_t start, size_t length);
> +
> +    inline static size_t offsetOfBase() {

static inline
Attachment #8494103 - Flags: review?(jwalden+bmo) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/88041cfff520
Summary: Inline string substr/slice/substring → selfhost string substr/slice/substring
Hannes, when you land this again, can you please fix the signed/unsigned comparison in the MOZ_ASSERT at the top of SubstringKernel. Thanks!
Depends on: 1094827
Depends on: 1099463
https://hg.mozilla.org/mozilla-central/rev/34859490061a
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Blocks: 1102001
Depends on: 1108007
You need to log in before you can comment on or make changes to this bug.