Closed Bug 919052 Opened 6 years ago Closed 6 years ago

OdinMonkey: support short-circuiting ternaries in if statements

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla30

People

(Reporter: sunfish, Assigned: bbouvier)

References

Details

Attachments

(3 files, 8 obsolete files)

asm.js currently has no convenient way to represent code translated from short-circuit && and || operators in the general case.

For example, emscripten currently compiles this C code:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
  if (getenv("A") || getenv("B")) {
      printf("hello world\n");
  }
  printf("and that's that\n");
  return 0;
}

to this asm.js code with -O2:

function _main($argc, $argv) {
 $argc = $argc | 0;
 $argv = $argv | 0;
 var label = 0;
 if ((_getenv(48) | 0) == 0) {
  if ((_getenv(40) | 0) != 0) {
   label = 3;
  }
 } else {
  label = 3;
 }
 if ((label | 0) == 3) {
  _puts(24) | 0;
 }
 _puts(8) | 0;
 return 0;
}

This then translates to machine code in the obvious way, taking up an extra register and branching. Similar things happen with the attached testcase using && (though it's slightly more complex to foil emscripten's cleverness).

This is an example where allowing || and &&, perhaps only at the top level of an if statement controlling expression, would allow asm.js to represent such code in a convenient form, which also ought to translate pretty directly into efficient machine code.

Related to this, LLVM's optimizer has the habit of converting 'if (x && y)' into 'if (x & y)' (effectively) according to heuristics aimed at enabling high-level optimization rather than low-level efficiency. The LLVM machine code generator typically balances by applying different heuristics and converting such code back to '&&' with branches when it looks favorable. If asm.js supported &&, the emscripten backend could implement a similar optimization.
This sounds great.  Ideally, Odin could trust the code generator to generate &,| when possible and &&,|| only when actual branching was desired.
I second this. I had to implement a similar transformation in LLJS, and now it outputs code like this: https://github.com/jlongster/lljs-minecraft/blob/gh-pages/render.js#L78. Does that have a significant performance hit?

Are there clear heuristics for conditionals when I can simply convert it using & and |?
(In reply to James Long (:jlongster) from comment #2)
> I second this. I had to implement a similar transformation in LLJS, and now
> it outputs code like this:
> https://github.com/jlongster/lljs-minecraft/blob/gh-pages/render.js#L78.
> Does that have a significant performance hit?

That code will take a register and three branches. With ||, it could easily be rewritten to skip the register, take two branches, and sometimes short-circuit out one the sides.

> Are there clear heuristics for conditionals when I can simply convert it
> using & and |?

Do you mean && and ||? We may need to refine this later, but I'd guess "whenever you can" is likely to be a pretty good initial heuristic.
(In reply to Dan Gohman [:sunfish] from comment #3)
> > Are there clear heuristics for conditionals when I can simply convert it
> > using & and |?
> 
> Do you mean && and ||? We may need to refine this later, but I'd guess
> "whenever you can" is likely to be a pretty good initial heuristic.

I was initially trying to figure out a way to manipulate integers with & and | to get the same result as && and ||, like if(x & y) {}, but that won't work. So yeah, it'd be great to have && and || in asm.js.
Theoretically, you could do this:

if( ((x != 0) & (y != 0) | (foo != 0) ) {

}

right?
(In reply to James Long (:jlongster) from comment #5)
> Theoretically, you could do this:
> 
> if( ((x != 0) & (y != 0) | (foo != 0) ) {
> 
> }
> 
> right?

The main motivation for adding && and || is the short-circuiting behavior. It's not possible to emulate that using any combination of &, |, or any other non-short-circuiting operators.
Btw, I'm all go for adding this to Odin, if anyone wants to take.
(When it's landed and near a release, we can update the spec and start having Emscripten generate it by default.)
I'd be interested to take it, but as I won't have time before next week, I'll let it to anybody who wants to implement it first.
IIUC, we would like the graph flow from |if (A && B)| like this:

if (!A) j elseOrRejoin
if (!B) j elseOrRejoin
ifBody();
j rejoin;
label else;
elseBody();
label rejoin;

(this is the solution shown in 913935)

And |if (A || B)| like this:

if (A) j ifBody
if (!B) j elseOrRejoin
label ifBody;
ifBody()
j rejoin
label else;
elseBody()
label rejoin;

(that's how gcc -O2 generates it)

Do we want this only in if statements (i.e. not in |var x = A && B;|)?
I see bug 913935. Could we do this after MIR generation instead, so that Ion and Odin directly benefit from that optimization?
(In reply to Benjamin Bouvier [:bbouvier] from comment #10)
> IIUC, we would like the graph flow from |if (A && B)| like this:
> 
> if (!A) j elseOrRejoin
> if (!B) j elseOrRejoin
> ifBody();
> j rejoin;
> label else;
> elseBody();
> label rejoin;
> 
> (this is the solution shown in 913935)
> 
> And |if (A || B)| like this:
> 
> if (A) j ifBody
> if (!B) j elseOrRejoin
> label ifBody;
> ifBody()
> j rejoin
> label else;
> elseBody()
> label rejoin;
> 
> (that's how gcc -O2 generates it)

Yes.

> Do we want this only in if statements (i.e. not in |var x = A && B;|)?

If statements are the most important here.

While/do/for statements would be conceptually straightforward and somewhat nice, but I'm not aware of any cases where they're actually needed to avoid label variables, so they're less important.

Arbitrary expressions like |var x = A && B| are unimportant, because emscripten can always generate good code for them using other techniques. And, they'd add complexity to asm.js, since they don't behave like ordinary expressions. I recommend against them.

> I see bug 913935. Could we do this after MIR generation instead, so that Ion
> and Odin directly benefit from that optimization?

It's actually a separate issue. This bug is about adding && and || to the asm.js spec (and therefore Odin) because there are constructs that emscripten can't easily encode without them; see my examples in comment 0. The kind of code emscripten currently has to emit for && and || in the worst cases is awkward to optimize, is slower in browsers that don't special-case asm.js, uses more temporary variables, and takes up more space.
Attached patch WIP (obsolete) — Splinter Review
This WIP allows to handle the |if (A && B) {}| case only (no else, no else if, no ||, no more than two conditions, there's plenty to do still). Names and code aren't final. Oh boy, CFG generation is fun.
Assignee: general → benj
Status: NEW → ASSIGNED
Attached patch Patch (obsolete) — Splinter Review
I am glad I could make it "less" complicated than it looked like in the first place and as I imagined it. Complex conditions (i.e. with && and ||) and atomic conditions can be checked in the same recursive CheckConditionPart function. I couldn't make it tail-recursive, as |A && B && C| (note, no parenthesis) is emitted as And(And(A, B), C), and not And(A, And(B,C)) by the parser (so we have to go in depth first, to reach the first leaf). The key to understanding this patch is that 1) we try to reuse as many blocks as we can, so as not to create blocks that would only contain a goto and 2) the recursive function always changes the current block to the "then" block, at the end. The latter has the nice advantage to fit well in the CheckIf function without modifying anything :)

There is no test cases right now but I have tested it on several small samples (2 AND'ed conditions with or without an else, 2 OR'd conditions with or without an else, 2 AND'ed conditions with an else if with N conditions and with or without an else, etc.) and also a big test case that summarizes a lot of combinations. I will rewrite them to make them more integrated to the current test suite. All tests pass on my machine (asm.js/ subdirectory + the ones I've written).

Short-circuiting works out of the box, thanks to the controlled CFG generation and Ion cleverness:

function f() {
    "use asm";
    function g(x) {
        x=x|0;
        var z = 0;
        if ((x|0) > 1 && (x|0) < 10) {
            z = 2;
        } else {
            z = 7;
        }
        return z|0;
    }
    return g;
}

   0x7ffff7ff5013:	cmp    $0x1,%edi
   0x7ffff7ff5016:	jle    0x7ffff7ff502f
   0x7ffff7ff501c:	cmp    $0xa,%edi
   0x7ffff7ff501f:	jge    0x7ffff7ff502f
   0x7ffff7ff5025:	mov    $0x2,%eax
   0x7ffff7ff502a:	jmpq   0x7ffff7ff5034
   0x7ffff7ff502f:	mov    $0x7,%eax
   0x7ffff7ff5034:	retq
Attachment #8378374 - Attachment is obsolete: true
Attachment #8379033 - Flags: review?(luke)
Attached patch bug919052.patch (obsolete) — Splinter Review
Just saw that I missed a small but nice factoring opportunity.
Attachment #8379033 - Attachment is obsolete: true
Attachment #8379033 - Flags: review?(luke)
Attachment #8379036 - Flags: review?(luke)
Just needed replacing the condition check in CheckConditional.
The ternary operator is neat, but I'm not aware of a need for it here. I suggest leaving it out for now, to keep the language simpler, but I'm open to persuasion otherwise.
Comment on attachment 8379036 [details] [diff] [review]
bug919052.patch

Review of attachment 8379036 [details] [diff] [review]:
-----------------------------------------------------------------

Very nice!  The one big thing missing is a bunch of torture tests involving all combinations of if/else-if/else with && and || (nested or not).

::: js/src/jit/AsmJS.cpp
@@ +2502,5 @@
>          curBlock_->end(ins);
>          curBlock_ = nullptr;
>      }
>  
> +    bool branchAndStartThenElse(MDefinition *cond, MBasicBlock **thenBlock, MBasicBlock **elseBlock,

If I'm reading correctly, this function still always changes curBlock to thenBlock, so I think the name should stay branchAndStartThen.

@@ +2551,5 @@
>          mirGraph().moveBlockToEnd(curBlock_);
>          return true;
>      }
>  
> +    void switchToBlock(MBasicBlock *block)

I think this function can keep the same name; the one new use added does indeed switch to the 'else' block (of the or).

@@ +4948,5 @@
>  }
>  
>  static bool
> +CheckConditionPart(FunctionCompiler &f, ParseNode *list, MBasicBlock **thenBlock,
> +                   MBasicBlock **elseOrJoinBlock, ParseNode *thenPn, ParseNode *nextPn)

Very nice job with this recursive definition!

Perhaps you could rename CheckConditionPart to just CheckCondition?  Also, can you move the thenBlock and elseOrJoinBlock outparams to the end of the arg list (outparams usually go last)?

Also, instead of the comments about the current block throughout the function, perhaps you could add an f.assertCurrentBlockIs(b) and use that as an executable comment?

Lastly, it'd be good to precisely document the postconditions of this function.  The ones that helped me reason about it are:
 1. if *thenBlock/*elseBlock are non-null on entry, their value is not changed by this function
 2. *thenBlock and *elseOrJoinBlock are non-null on exit
 3. f.assertIsCurrentBlock(*thenBlock) on exit

@@ +4949,5 @@
>  
>  static bool
> +CheckConditionPart(FunctionCompiler &f, ParseNode *list, MBasicBlock **thenBlock,
> +                   MBasicBlock **elseOrJoinBlock, ParseNode *thenPn, ParseNode *nextPn)
> +{

You'll need a JS_CHECK_RECURSION_DONT_REPORT here.

@@ +4950,5 @@
>  static bool
> +CheckConditionPart(FunctionCompiler &f, ParseNode *list, MBasicBlock **thenBlock,
> +                   MBasicBlock **elseOrJoinBlock, ParseNode *thenPn, ParseNode *nextPn)
> +{
> +    if (list->isKind(PNK_AND) || list->isKind(PNK_OR)) {

Perhaps prefix this with a comment explaining the optimization we're doing here?

@@ +4951,5 @@
> +CheckConditionPart(FunctionCompiler &f, ParseNode *list, MBasicBlock **thenBlock,
> +                   MBasicBlock **elseOrJoinBlock, ParseNode *thenPn, ParseNode *nextPn)
> +{
> +    if (list->isKind(PNK_AND) || list->isKind(PNK_OR)) {
> +        MBasicBlock *testBlock = nullptr;

This 'testBlock' is a bit confusing and not used after the if/else so you could move the declaration into the branches and rename it appropriately.  In the PNK_AND branch, I'd just name it andThenBlock (since it's ignored; we don't have to return the then block).  In the PNK_OR branch, I'd name it orElseBlock.

@@ +4958,5 @@
> +
> +        if (list->isKind(PNK_AND)) {
> +            // current block == last block
> +            if (!CheckConditionPart(f, left, &testBlock, elseOrJoinBlock, right, nextPn))
> +                return false;

After this you could f.assertCurrentBlockIs(andThenBlock);
Attachment #8379036 - Flags: review?(luke) → review+
Attachment #8379093 - Flags: review+
Oh, hah, I didn't see comment 13 or comment 16 before reviewing.  Re: ternary: since we basically get it for free b/c of the nice way CheckCondition is factored, seems nice to have.
Oh WAIT, one thing I totally missed was that this isn't an optimization; these patches are actually extending the language to *only* allow && and || in if/ternary form.  I understand Dan's comments better now.

It seems more regular to me to allow && and || in any expression context and that the cases in the above two patches are purely optimizations.  The additional complexity for handling the non-conditional case of && and || seems small; the only downside I can see is that asm.js authors/generators might be enticed to use && and || when & and | were more efficient.  For that, we can add a non-normative note to the spec that "the intended impl of && and || is branching; use & and | if you don't need/benefit from short-circuiting".

Do you guys agree?  If so, then we'll need a third patch to add general support.
And by "extending the language" I mean "extending the asm.js subset" :)
Attached patch General support of && and || (obsolete) — Splinter Review
Adding general support for && and || makes sense. The patch is fairly easy to understand. The only trick (commented in the code) is to see that in the expression A && B, if A is false (i.e. 0), we'll return false, which is the value of A. So that makes it easy to branch on the right value with respect to the operator we're using.
Attachment #8380733 - Flags: review?(luke)
IIUC, this patch would execute both branches of the &&/|| instead of the intended short-circuiting behavior.  (Could you add a testcases for this?  It'd also be good to add some testcases for && and || in if-condition-position to make sure we got short-circuiting right there too.)
Attached patch Another attempt - WIP (obsolete) — Splinter Review
Duh, can't believe I didn't notice that. Thanks Luke for the catch!

So the problem with the general support of && and || is that it created a circular dependency between the phis. I wonder whether the only way to break it was to introduce the second test and if it is the reason why it is done this way in general Ion.

Another approach suggested by sunfish was just to special case lowering of ternary conditions. Indeed, a ? b : 0 <=> a && b, and a ? 1 : b <=> a || b, so asm.js generators would just have to generate this form in conditions and Odin would just generate the short-circuiting CFG.

This patch implements it that way. It adds up above attachment 8379036 [details] [diff] [review] for now, lacks of tests (but passes simple tests on my machine). Let's see whether we can also short-circuit a ? 0 : b (i.e. !a && b) and a ? b : 1 (i.e. !a || b), before asking for review.
Attachment #8379093 - Attachment is obsolete: true
Attachment #8380733 - Attachment is obsolete: true
Attachment #8380733 - Flags: review?(luke)
Attached patch bug919052-ternary-complete.patch (obsolete) — Splinter Review
The other attempt patch is finally complete, and supports control flow optimizations in If statments, whenever in a?b:c, b and c are not literals with the same semantics (i.e. b==c==0, or b!=0 and c!=0). As I found it more complicated to think about than the initial patch (that didn't support (!a && b) and (!a || b) optimizations, I've tried to comment it A LOT and I finally think it's clear now.

This patch works as expected on all the basic cases I have tried (all combination of literals, sub conditions in the AND or OR conditions), but I will still fuzz it now to have sub-conditions in each of the three positions.
Attachment #8382307 - Attachment is obsolete: true
Attachment #8385203 - Flags: feedback?(luke)
Attached patch bug919052-ternary-complete.patch (obsolete) — Splinter Review
I made a program that generates a lot of possible test cases, compiles them and compares the results obtained in the interpreter (by not adding the "use asm" hint). It found one edge case I didn't think about, but now that I've fixed it, it's been running for two hours and hasn't failed since.
Attachment #8385203 - Attachment is obsolete: true
Attachment #8385203 - Flags: feedback?(luke)
Attachment #8385432 - Flags: feedback?(luke)
It's impressive you've got this working to handle all (except for 0:0, 1:1) the literal cases, but this algorithm is way more complicated than the nice little CheckConditionPart you wrote earlier.  If there is no simpler version of this algorithm, then I feel like the right complexity/optimization tradeoff would be to basically just do what CheckConditionPart did, exception swapping "a&&b" for "a?b:0" and "a||b" for "a?1:b".

It'd also be nice if we were able to do general CFG optimizations as an Ion optimization pass (to catch more cases than we do in the frontend).  It'd still be nice to catch the canonical &&/|| in the frontend since, in theory, once we tweak Emscripten, it'll emit a ton of these.
Attachment #8379036 - Attachment is obsolete: true
Attachment #8386280 - Flags: feedback?(luke)
Luke: this patch is the same as the previous one, easier to follow, as there are no more cases where thenBlock or elseOrJoinBlock could contain the wrong value (and thus needed to be reassigned - the trick was to use a double pointer, duh). As this is still more code and slightly harder to follow than the simplest one (that optimizes only the a?b:0 and a?1:b cases), I'd be fine if we choose the simplest one. Both have been locally fuzzed and don't assert nor return the wrong result.

There is another third possibility which is investigating the circular dependency in the && and || patch, but the simple ternary optimization as is seems sufficient. What do you think?
Attachment #8385432 - Attachment is obsolete: true
Attachment #8385432 - Flags: feedback?(luke)
Attachment #8386285 - Flags: feedback?(luke)
Comment on attachment 8386285 [details] [diff] [review]
Simpler patch that optimizes a?X:Y for all X, Y literals

Review of attachment 8386285 [details] [diff] [review]:
-----------------------------------------------------------------

Wow, great job distilling this down!  I totally get this one, let's do it.

Just nits:

::: js/src/jit/AsmJS.cpp
@@ +4874,5 @@
>      return f.bindLabeledBreaks(&labels, labeledStmt);
>  }
>  
>  static bool
> +CheckLeafCondition(FunctionCompiler &f, ParseNode *list, ParseNode *thenExpr, ParseNode *nextExpr,

Can you rename 'thenExpr' to 'thenStmt' and 'nextExpr' to 'elseOrJoinStmt' here and in CheckCondition?  The 'expr' was confusing me with the then/else expressions of ?:.  Also, 'elseOrJoinStmt' is more symmetric with elseOrJoinBlock.

@@ +4900,5 @@
> + * - *thenBlock and *elseOrJoinBlock are non-null on exit.
> + * - the current block on exit is the *thenBlock.
> + */
> +static bool
> +CheckCondition(FunctionCompiler &f, ParseNode *list, ParseNode *thenExpr, ParseNode *nextExpr,

Perhaps this could now be renamed to CheckIfCondition since it's highly specific to if.  Also, can you rename 'list' to 'cond'?  That conflicts with the 'cond' below, but I was thinking we could split this function into two so that the majority of CheckCondition wasn't indented under an if branch.

CheckIfConditional(FunctionCompiler &f, ParseNode *conditional, ...)
{
    JS_ASSERT(conditional->isKind(PNK_CONDITIONAL));
    ParseNode *cond = TernaryKid1(conditional);
    ...
    CheckIfCondition(f, ...);
    ...
}

CheckIfCondition(FunctionCompiler &f, ParseNode *cond, ...)
{
    JS_CHECK_RECURSION_DONT_REPORT(...);

    if (cond->isKind(PNK_CONDITIONAL))
        return CheckIfConditional(f, cond, ...);
    return CheckLeafCondition(f, cond, ...);
}

@@ +4915,5 @@
> +        ParseNode *rhs = TernaryKid3(list);
> +
> +        MBasicBlock *maybeAndTest = nullptr, *maybeOrTest = nullptr;
> +        MBasicBlock **ifTrueBlock = &maybeAndTest, **ifFalseBlock = &maybeOrTest;
> +        ParseNode *ifTrueExpr = lhs, *ifFalseExpr = rhs;

Could you name this ifTrueBlockNode and ifFalseBlockNode to indicate that these are only being passed as "the node of this block for profiling data".  Otherwise I was getting confused that we had this expression that was either lhs/rhs expression or the then/else statement.

@@ +4924,5 @@
> +
> +        if (IsLiteralInt(f.m(), lhs, &andTestLiteral)) {
> +            skipAndTest = true;
> +            if (andTestLiteral == 0) {
> +                // (a ? 0: b) is equivalent to !a && b

The usual SM style is to put a space before the : in (a ? 0 : b).

@@ +4973,5 @@
> +        if (!skipAndTest) {
> +            // There is a distinct block for the AND section that will contain the AND test,
> +            // so ifTrueBlock can't be the then or the else block.
> +            JS_ASSERT(maybeAndTest);
> +            JS_ASSERT(ifTrueBlock != thenBlock && ifTrueBlock != elseOrJoinBlock);

While these conditions are true, they didn't really help me so I'd be fine leaving them out (here and for the 3 analogous assertion pairs below) to make the function a bit smaller.
Attachment #8386285 - Flags: feedback?(luke) → feedback+
Attachment #8386285 - Flags: review+
Attachment #8386280 - Flags: feedback?(luke)
Awesome! Fixed the nits and landed, with a bunch of tests (I could reuse the tests in the &&/|| patch by changing the forms + I added the test cases that my home-made fuzzer found out, like a?1:1 and a?0:0).

FTR, it doesn't affect existing tracked asm.js benchmarks, which is expected as emscripten doesn't generate this particular form. However, now it can :)

https://hg.mozilla.org/integration/mozilla-inbound/rev/6048059d6ea1
Summary: asm.js may wish to support && and || in if statements → OdinMonkey: support short-circuiting ternaries in if statements
https://hg.mozilla.org/mozilla-central/rev/6048059d6ea1
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Depends on: 981314
You need to log in before you can comment on or make changes to this bug.