Closed Bug 1480819 Opened 6 years ago Closed 6 years ago

Clean up for generateRegExpMatcherStub

Categories

(Core :: JavaScript Engine: JIT, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: anba, Assigned: anba)

References

Details

Attachments

(7 files, 2 obsolete files)

      No description provided.
JitRealm::generateRegExpMatcherStub() needs five temporary registers, but on x86 we can only get up to four temporary registers. To get around that limitation, generateRegExpMatcherStub() tries to reuse an existing register as a temporary register and later restores the register to its original value.

Except that the actual code doesn't really restore the register in all cases! But that's actually fine and only shows that the whole supporting code to restore the register can be removed.

In detail:
- These two lines [1] are never executed, because the preceding loop never falls through.
- These lines [2] don't work correctly for Latin-1 strings, because for Latin-1 strings we directly jump [3] over the code to save the |lastIndex| register [4].

[1] https://searchfox.org/mozilla-central/rev/e52cd92858800a69b74cb97d26d9bdb960d611ca/js/src/jit/CodeGenerator.cpp#2081-2082
[2] https://searchfox.org/mozilla-central/rev/e52cd92858800a69b74cb97d26d9bdb960d611ca/js/src/jit/CodeGenerator.cpp#2084-2092
[3] https://searchfox.org/mozilla-central/rev/e52cd92858800a69b74cb97d26d9bdb960d611ca/js/src/jit/CodeGenerator.cpp#2035
[4] https://searchfox.org/mozilla-central/rev/e52cd92858800a69b74cb97d26d9bdb960d611ca/js/src/jit/CodeGenerator.cpp#2043-2044



Test case to show the incorrect behaviour:
---
var RegExpMatcher = getSelfHostedValue("RegExpMatcher")

function f() {
    var n = 100;
    var grp = 4;
    var a = ["aaaA" + "a".repeat(n * grp), "bbbB" + "b".repeat(n * grp)];
    var r = RegExp("[AB]" + ("(" + ".".repeat(n) + ")").repeat(grp));

    oomTest(function(){
        for (var i = 0; i < 500; ++i) {
            var ret = RegExpMatcher(r, a[i&1], 1);
        }
    });
}

while (true) {
    f();
}
---

When run under gdb, install a conditional break point |br js::RegExpMatcherRaw if lastIndex!=1|. Eventually gdb should stop with |lastIndex| unequal to 1.
Attachment #8997485 - Flags: review?(mgaudet)
When we switch |base| and |temp2| in CreateDependentString::generate(), we no longer have to special case |temp5| in generateRegExpMatcherStub(), because |base| is always a register which supports single byte instructions (|base| is |input| from generateRegExpMatcherStub(), which is pinned to CallTempReg1 [1], and CallTempReg1 in turn is pinned to eax [2]). And even without all this exact register selection, AutoEnsureByteRegister is nowadays available to ensure the correct register for single-byte instructions is used.

And this change is also necessary for part 6.


[1] https://searchfox.org/mozilla-central/rev/e52cd92858800a69b74cb97d26d9bdb960d611ca/js/src/jit/x86/Assembler-x86.h#120
[2] https://searchfox.org/mozilla-central/rev/e52cd92858800a69b74cb97d26d9bdb960d611ca/js/src/jit/x86/Assembler-x86.h#57
[3] https://searchfox.org/mozilla-central/rev/e52cd92858800a69b74cb97d26d9bdb960d611ca/js/src/jit/x86-shared/MacroAssembler-x86-shared.h#255
Attachment #8997489 - Flags: review?(mgaudet)
Renames the temporary registers so they start at |temp1| again. And separates the parameter names in CreateMatchResultFallback() from the names in generateRegExpMatcherStub().
Attachment #8997492 - Flags: review?(mgaudet)
From what I've understood about volatile registers, it's not necessary to manually save non-volatile registers when performing an ABI call. That's correct, right?
Attachment #8997493 - Flags: review?(mgaudet)
Instead of setting the CreateDependentString fields when calling CreateDependentString::generate(), directly set them in the constructor, so it's easier to spot for the reader which registers are owned by CreateDependentString.
Attachment #8997494 - Flags: review?(mgaudet)
Add some helpers to MacroAssembler for loading and storing character pointers based on CharEncoding to avoid code duplication. And move some additional code into lambda functions to avoid even more code duplication.

And removed these casts [1] because AFAICT they are no longer required by our supported compilers. See [2] for the relevant C++ standards change.

Also removed the unused MacroAssembler::leaNewDependentStringBase() method.

[1] https://searchfox.org/mozilla-central/rev/e52cd92858800a69b74cb97d26d9bdb960d611ca/js/src/jit/CodeGenerator.cpp#1717-1718,1726-1727
[2] http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#712
Attachment #8997503 - Flags: review?(mgaudet)
The last patch for this bug.

- Add named constants for slot numbers and match pair result.
- Add comments about the stack layout to the RegExp stubs.
- Change [1] from subPtr to sub32, because other code [2] already treats |lastIndex| as an int32, so in my opinion it's clearer to use sub32 here.
- Change access to match-result template object to use only TemplateObject.
- Change some variable names for clarity. 
- Make MatchPair a final struct, because the JIT code relies on MatchPair being a simple struct with two int32 fields and sub-classed structs can't really be handled.


[1] https://searchfox.org/mozilla-central/rev/e52cd92858800a69b74cb97d26d9bdb960d611ca/js/src/jit/CodeGenerator.cpp#1505
https://searchfox.org/mozilla-central/rev/e52cd92858800a69b74cb97d26d9bdb960d611ca/js/src/jit/CodeGenerator.cpp#1483,1485
Attachment #8997508 - Flags: review?(mgaudet)
Comment on attachment 8997485 [details] [diff] [review]
bug1480819-part1-remove-register-restore.patch

Review of attachment 8997485 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, and a very helpful explanation of the change.

::: js/src/jit/CodeGenerator.cpp
@@ +2078,5 @@
>              masm.jump(&matchLoop);
>          }
>  
> +#ifdef DEBUG
> +        masm.assumeUnreachable("The match string loop doesn't fall through.");

I appreciate this as a way of enforcing your invariant!
Attachment #8997485 - Flags: review?(mgaudet) → review+
Comment on attachment 8997489 [details] [diff] [review]
bug1480819-part2-switch-temp-registers.patch

Review of attachment 8997489 [details] [diff] [review]:
-----------------------------------------------------------------

My only comment was going to be what turns out to be part 3.
Attachment #8997489 - Flags: review?(mgaudet) → review+
Attachment #8997492 - Flags: review?(mgaudet) → review+
Comment on attachment 8997493 [details] [diff] [review]
bug1480819-part4-dont-save-non-volatile.patch

Review of attachment 8997493 [details] [diff] [review]:
-----------------------------------------------------------------

You are definitely right about the definition of volatile registers. 

Looking at the set of callers to LiveRegisterSet::addUnchecked [1], makes me wonder if this wouldn't be worth an audit/interface change to avoid these kinds of issues (i.e. creating a class for saving a subset of the volatile registers that doesn't provide a facility for adding non-voltile registers to the set; perhaps after we decide on Bug 1475001). 

[1]: https://searchfox.org/mozilla-central/search?q=symbol:_ZN2js3jit16LiveSetAccessorsINS0_11RegisterSetEE12addUncheckedENS0_8RegisterE&redirect=false
Attachment #8997493 - Flags: review?(mgaudet) → review+
Comment on attachment 8997494 [details] [diff] [review]
bug1480819-part5-create-dependent-members.patch

Review of attachment 8997494 [details] [diff] [review]:
-----------------------------------------------------------------

Great improvements :)

::: js/src/jit/CodeGenerator.cpp
@@ +1874,5 @@
>  void
>  CreateDependentString::generateFallback(MacroAssembler& masm)
>  {
> +    JitSpew(JitSpew_Codegen, "# Emitting CreateDependentString fallback (encoding=%s)",
> +            (encoding_ == CharEncoding::Latin1 ? "Latin-1" : "Two-Byte"));

Hooray for better spew :)

@@ +2036,5 @@
> +    // depending on whether the input is a Two-Byte or a Latin-1 string.
> +    CreateDependentString depStrs[] {
> +        { CharEncoding::TwoByte, temp3, temp4, temp5, &oolEntry },
> +        { CharEncoding::Latin1, temp3, temp4, temp5, &oolEntry },
> +    };

So much nicer.

@@ +2043,5 @@
>          Label isLatin1, done;
>          masm.branchLatin1String(input, &isLatin1);
>  
> +        for (auto& depStr : depStrs) {
> +            if (depStr.encoding() == CharEncoding::Latin1)

Yes!
Attachment #8997494 - Flags: review?(mgaudet) → review+
Comment on attachment 8997503 [details] [diff] [review]
bug1480819-part6-masm-char-helpers.patch

Review of attachment 8997503 [details] [diff] [review]:
-----------------------------------------------------------------

Really nice improvements with the lambdas.

::: js/src/jit/CodeGenerator.cpp
@@ +2641,5 @@
> +    masm.bind(&ok);
> +#endif
> +
> +    Register chars = temp0;
> +    masm.loadStringChars(str, chars, encoding);

I like this naming pattern to help keep track of what points where and when.

@@ +8302,5 @@
>      {
>          masm.loadStringChars(input, temp2, CharEncoding::TwoByte);
>          masm.movePtr(temp2, input);
> +        CopyStringChars(masm, destChars, input, temp1, temp2, CharEncoding::TwoByte,
> +                        CharEncoding::TwoByte);

The second encoding argument here can be dropped to use the new 6-arity version defined above right?

@@ +8382,5 @@
> +    // Copy rhs chars. Clobbers the rhs register.
> +    copyChars(rhs);
> +
> +    // Null-terminate.
> +    masm.storeChar(Imm32(0), Address(temp2, 0), encoding);

Nice improvement.
Attachment #8997503 - Flags: review?(mgaudet) → review+
Unfortunately I've run out of time today, and so I'll have to come back to Part 7 on Tuesday. If that's too late, feel free to re-assign.
Comment on attachment 8997508 [details] [diff] [review]
bug1480819-part7-finishing-touches.patch

Review of attachment 8997508 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good :)

::: js/src/jit/CodeGenerator.cpp
@@ +1422,5 @@
>  {
>      JitSpew(JitSpew_Codegen, "# Emitting PrepareAndExecuteRegExp");
>  
> +    /*
> +     * Stack layout:

Maybe annotate these stack layout diagrams with [SMDOC] and give a longer name for indexability?
Attachment #8997508 - Flags: review?(mgaudet) → review+
Updated part 6 per review comments, carrying r+.
Attachment #8997503 - Attachment is obsolete: true
Attachment #8999226 - Flags: review+
Updated part 7 per review comments, carrying r+.
Attachment #8997508 - Attachment is obsolete: true
Attachment #8999227 - Flags: review+
Andre, hi. i tried to land this and got:


patching file js/src/builtin/RegExp.cpp
Hunk #2 FAILED at 1024
1 out of 3 hunks FAILED -- saving rejects to file js/src/builtin/RegExp.cpp.rej
patching file js/src/jit/CodeGenerator.cpp
Hunk #3 FAILED at 1554
Hunk #5 FAILED at 1944
Hunk #6 FAILED at 2005
Hunk #7 FAILED at 2027
Hunk #8 FAILED at 2082
5 out of 9 hunks FAILED -- saving rejects to file js/src/jit/CodeGenerator.cpp.r                                                                                                                                                             ej
abort: patch failed to apply
Flags: needinfo?(andrebargull)
Keywords: checkin-needed
The error in comment #18 occurs when part 7 is applied on its own. When all seven patches are applied in order, no error was reported to me when trying it with current inbound (d0fd03694e57).
Flags: needinfo?(andrebargull) → needinfo?(apavel)
Keywords: checkin-needed
Pushed by btara@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/09d8a8e24648
Part 1: Remove dead and unnecessary code to restore registers from generateRegExpMatcherStub. r=mgaudet
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ad24c33a9b5
Part 2: Switch the temporary variables used for CopyStringChars in CreateDependentString::generate. r=mgaudet
https://hg.mozilla.org/integration/mozilla-inbound/rev/34165b194cf9
Part 3: Reorder temporary register names in generateRegExpMatcherStub. r=mgaudet
https://hg.mozilla.org/integration/mozilla-inbound/rev/215a50752a49
Part 4: Don't save non-volatile registers. r=mgaudet
https://hg.mozilla.org/integration/mozilla-inbound/rev/68db5a68d65a
Part 5: Add registers and encoding type as members to CreateDependentString. r=mgaudet
https://hg.mozilla.org/integration/mozilla-inbound/rev/411fa809129a
Part 6: Add helper methods to MacroAssembler to work with CharEncoding and reduce code duplication. r=mgaudet
https://hg.mozilla.org/integration/mozilla-inbound/rev/b9a687b9bbe2
Part 7: Add comments, constants, and better variable names to RegExp stubs. r=mgaudet
Keywords: checkin-needed
Flags: needinfo?(apavel)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: