Fully inline RegExp.exec in jitcode

RESOLVED FIXED in mozilla35



JavaScript Engine: JIT
3 years ago
3 years ago


(Reporter: bhackett, Assigned: bhackett)


Mac OS X
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)



(1 attachment, 3 obsolete attachments)



3 years ago
Created attachment 8488834 [details] [diff] [review]

RegExp.exec is specialized a small amount by IonMonkey, but almost all work done by this function still happens in C++ and in irregexp jitcode.  There is quite a bit of overhead involved in doing things this way, as accessing the various objects involved, triggering barriers, and, particularly, creating the result objects and strings is much slower in C++ than in Ion jitcode.

Fully inlining RegExp.exec --- so that everything is done in Ion and irregexp jitcode --- is pretty complicated given everything that RegExp.exec needs to do and check, but is worth doing.  The attached patch is a WIP which does this, and improves our octane-regexp score on my machine (x64 10.9) from 2500 to 3300.
Could self-hosting + intrinsics help here?

Comment 2

3 years ago
(In reply to Jan de Mooij [:jandem] from comment #1)
> Could self-hosting + intrinsics help here?

It's hard to say.  RegExp.exec basically just runs the regexp and then stuffs the result into an object.  If we self hosted the first part we would need to use that self hosted implementation for all regexp functions to avoid duplicate implementations, and that self hosted implementation would need to handle various regexp corner cases, some of which we've self inflicted like sticky regexps and PreserveRegExpStatics.  This patch just does a slow call in these cases, which should make it more efficient than a self hosted implementation, at least when these corner cases aren't hit.

Still, the RegExp.exec CodeGenerator implementation is a 200 line wad of code, and needs refactoring.  I'd like to be able to inline RegExp.test too, and doing them both in this patch would force the implementation to be written in a compartmentalized and reusable way.

Comment 3

3 years ago
Thinking about this some more, I think it would be pretty hard to do a good job self hosting this.  The main problem is that the RegExp execution involves placing a MatchPairs, its data, and an InputOutputData on the stack, and holding this data on the stack across some span of self hosted JS code would I think be a substantial change to how IonMonkey manages the stack (and which is code that I don't know at all.)  Doing everything within CodeGenerator means that this extra stack data is local to one LIR operation.

Comment 4

3 years ago
Created attachment 8490282 [details] [diff] [review]

Updated WIP.  This uses a per-compartment stub (like the string concat stub) to avoid generating lots of code if we call RegExp.exec from Ion in a lot of places.  The speedup this gives is similar to the previous patch.
Assignee: nobody → bhackett1024
Attachment #8488834 - Attachment is obsolete: true

Comment 5

3 years ago
Created attachment 8491077 [details] [diff] [review]

Patch to inline both RegExp.exec and RegExp.test.  This passes all jit-tests on all platforms, but doesn't give as big a speedup for octane-regexp on x64 as the previous patches.  The problem is that we have a restriction that dependent strings can't depend on inline strings, so we end up needing to create inline strings in jitcode, which is slower to do than dependent strings.  On x86 and x64 I get a 20-25% improvement in our octane-regexp score, though.  The speedup on RegExp.test is very good too, on a microbenchmark based on the peacekeeper stringFilter test I get an improvement from 200ms to 70ms.
Attachment #8490282 - Attachment is obsolete: true
Attachment #8491077 - Flags: review?(jdemooij)

Comment 6

3 years ago
(In reply to Brian Hackett (:bhackett) from comment #5)

with the patch, firefox fails at http://dromaeo.com/?cssquery-yui
FAIL YUI - div > div 1TypeError: combinators[token.previous.combinator] is not a function
Comment on attachment 8491077 [details] [diff] [review]

Review of attachment 8491077 [details] [diff] [review]:

Looks pretty nice! It's a lot of code, but also nice speedups.

Reviewing this patch requires a lot of time though; I didn't look closely at all code but I wanted to submit the comments so far before I leave on PTO until Thursday.

::: js/src/jit/CodeGenerator.cpp
@@ +1022,5 @@
> +static const size_t RegExpReservedStack = sizeof(irregexp::InputOutputData)
> +                                        + sizeof(MatchPairs)
> +                                        + RegExpMaxPairCount * sizeof(MatchPair);
> +
> +static size_t RegExpPairsVectorStartOffset(size_t inputOutputDataStartOffset)

Nit: \n after size_t

@@ +1027,5 @@
> +{
> +    return inputOutputDataStartOffset + sizeof(irregexp::InputOutputData) + sizeof(MatchPairs);
> +}
> +
> +static Address RegExpPairCountAddress(size_t inputOutputDataStartOffset)

And here.

@@ +1093,5 @@
> +
> +    if (mode == RegExpShared::Normal) {
> +        // Don't handle RegExps with excessive parens.
> +        masm.load32(Address(temp1, RegExpShared::offsetOfParenCount()), temp2);
> +        masm.branch32(Assembler::GreaterThanOrEqual, temp2, Imm32(RegExpMaxPairCount), failure);

Nit: AboveOrEqual since it's unsigned.

@@ +1164,5 @@
> +    // Lazily update the RegExpStatics.
> +    masm.movePtr(ImmPtr(res), temp1);
> +
> +    Address pendingInputAddress(temp1, RegExpStatics::offsetOfPendingInput());
> +    Address matchesInputAddress(temp1, RegExpStatics::offsetOfPendingInput());

This should be offsetOfMatchesInput?

@@ +1167,5 @@
> +    Address pendingInputAddress(temp1, RegExpStatics::offsetOfPendingInput());
> +    Address matchesInputAddress(temp1, RegExpStatics::offsetOfPendingInput());
> +    Address lazySourceAddress(temp1, RegExpStatics::offsetOfLazySource());
> +
> +    masm.patchableCallPreBarrier(pendingInputAddress, MIRType_String);

These need to be toggled/enabled somewhere.

@@ +1203,5 @@
> +    masm.sub32(temp2, temp1);
> +
> +    Label done, nonEmpty;
> +
> +    // Zero length matches use the empty string. 

Nit: trailing whitespace

@@ +1360,5 @@
> +    BaseIndex stringAddress(object, matchIndex, TimesEight, elementsOffset);
> +    BaseIndex stringIndexAddress(StackPointer, matchIndex, TimesEight,
> +                                 pairsVectorStartOffset);
> +    BaseIndex stringLimitAddress(StackPointer, matchIndex, TimesEight,
> +                                 pairsVectorStartOffset + sizeof(int32_t));

Could we avoid hardcoding sizeof(int32_t) somehow?

@@ +1405,5 @@
> +    masm.loadPtr(Address(object, JSObject::offsetOfSlots()), temp2);
> +
> +    masm.load32(pairsVectorAddress, temp3);
> +    masm.storeValue(JSVAL_TYPE_INT32, temp3, Address(temp2, 0));
> +    masm.storeValue(JSVAL_TYPE_STRING, input, Address(temp2, sizeof(Value)));

It'd be nice if we could use LAST_INDEX_SLOT / SOURCE_SLOT constants here.

::: js/src/jit/MCallOptimize.cpp
@@ +1372,5 @@
>          return InliningStatus_NotInlined;
> +    JSContext *cx = GetIonContext()->cx;
> +    if (!cx->compartment()->jitCompartment()->ensureRegExpExecStubExists(cx))
> +        return InliningStatus_Error;

If we really want to do this here, we might as well bring IonBuilder::cx back IMO. This is doing about the same thing but it's slower.
Attachment #8491077 - Flags: review?(jdemooij)

Comment 8

3 years ago
Created attachment 8493455 [details] [diff] [review]

Updated per comments.  I don't think we should have an IonBuilder::cx as it opens the door to doing more effectful stuff, which we should really avoid during Ion compilation.  Using IonContext::cx here is a shortcut, another approach would be to set some flags which are passed back up to the main thread function performing the Ion compilation, which has a cx and can ensure the stubs exist.  I think that whatever happens for RegExp.exec/test should happen for the String concat stubs too, as it seems silly to generate these stubs in every compartment that does Ion compilation, even if no Ion code actually does string concatenation.  But doing that here too is feature creep, and doing the same simple thing for RegExp stubs just seemed distasteful.
Attachment #8491077 - Attachment is obsolete: true
Attachment #8493455 - Flags: review?(jdemooij)
Comment on attachment 8493455 [details] [diff] [review]

Review of attachment 8493455 [details] [diff] [review]:

Sorry for the delay.

::: js/src/jit/CodeGenerator.cpp
@@ +1301,5 @@
> +        masm.branchTest32(Assembler::Zero, Address(base, JSString::offsetOfFlags()),
> +                          Imm32(JSString::FLAT_BIT), &noBase);
> +        masm.loadPtr(Address(base, JSDependentString::offsetOfBase()), temp1);
> +        masm.storePtr(temp1, Address(string, JSDependentString::offsetOfBase()));
> +        masm.bind(&noBase);

Either make this a loop, like in JSDependentString::new_, or add some #ifdef DEBUG + masm.assumeUnreachable code to assert temp1 is not a dependent string.

::: js/src/jit/JitCompartment.h
@@ +536,5 @@
> +    bool ensureRegExpExecStubExists(JSContext *cx) {
> +        if (regExpExecStub_)
> +            return true;
> +        regExpExecStub_ = generateRegExpExecStub(cx);
> +        return regExpExecStub_ != nullptr;

We should also toggle the prebarriers here like this:

    if (!regExpExecStub_)
        return false;

    if (cx->zone()->needsIncrementalBarrier())

    return true;

As a followup we should probably move this call into IonLinker.h though, so that we don't have to do this every time we allocate a JitCode with barriers...

@@ +547,5 @@
> +    bool ensureRegExpTestStubExists(JSContext *cx) {
> +        if (regExpTestStub_)
> +            return true;
> +        regExpTestStub_ = generateRegExpTestStub(cx);
> +        return regExpTestStub_ != nullptr;

Same here.
Attachment #8493455 - Flags: review?(jdemooij) → review+

Comment 10

3 years ago
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35


3 years ago
Blocks: 1078194


3 years ago
No longer blocks: 1078194
Depends on: 1078194


3 years ago
Depends on: 1080542
You need to log in before you can comment on or make changes to this bug.