Closed Bug 561963 Opened 14 years ago Closed 6 years ago

Inline relational and equality comparisons more agressively in the JIT

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P3)

defect

Tracking

(Not tracked)

RESOLVED WONTFIX
Q2 12 - Cyril

People

(Reporter: wmaddox, Assigned: wmaddox)

References

Details

(Whiteboard: PACMAN)

Attachments

(6 files, 5 obsolete files)

Speculatively inline the intptr/intptr case for untyped arguments.
Perform some specializations for known numeric argument types.
Generate fast inline test for strict equality with a known null value.

This is the second of a series of patches based on the investigations detailed in Bug 552542.
Assignee: nobody → wmaddox
Depends on: 561249
Whiteboard: PACMAN
Attachment #441715 - Attachment is patch: true
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → flash10.2
Attachment #441715 - Attachment is obsolete: true
Attachment #442129 - Flags: review?(edwsmith)
Attachment #442129 - Flags: feedback?(rreitmai)
Attachment #442129 - Attachment is patch: true
Blocks: 562458
No longer blocks: 562458
Depends on: 562458
Blocks: 552542
Comment on attachment 442129 [details] [diff] [review]
Inline relational and equality comparisons more agressively in the JIT

R- Until we mitigate the problem of inserted labels degrading effectiveness of VarTracker and CseFilter.  structure & quality of the patch are generally good.
Attachment #442129 - Flags: review?(edwsmith) → review-
Component: Virtual Machine → JIT Compiler (NanoJIT)
QA Contact: vm → nanojit
Comment on attachment 442129 [details] [diff] [review]
Inline relational and equality comparisons more agressively in the JIT

removing self from feedback until issues addressed.
Attachment #442129 - Flags: feedback?(rreitmai)
Depends on: 591556
Rebased.  Added calls to suspendCSE()/resumeCSE().
This is a -Dverifyonly run of 16k brightspot SWFs and traps each call the JIT is emitting to equals() and what traits are being used for the call.
same analysis for 'compare' calls.
(In reply to comment #5)
> Created attachment 475063 [details]
> static brightspot analysis of all 'equals' JIT calls
> 
> This is a -Dverifyonly run of 16k brightspot SWFs and traps each call the JIT
> is emitting to equals() and what traits are being used for the call.

Yow.  {int,uint,Number} * {int,uint,Number} appears /nowhere/ in this list.
(In reply to comment #6)
> Created attachment 475064 [details]
> static brightspot analysis of all 'compare' JIT calls
> 
> same analysis for 'compare' calls.

Nor in this list, which is even more surprising.
They get converted to inline operations by CodegenLIR::cmpOptimization.
Attachment #475064 - Attachment is obsolete: true
Attachment #475063 - Attachment is obsolete: true
Rebased, with comments showing pseudocode for the generated code for speculative inlining.
Attachment #442129 - Attachment is obsolete: true
Attachment #472750 - Attachment is obsolete: true
Attachment #483272 - Flags: superreview?(edwsmith)
Attachment #483272 - Flags: review?(rreitmai)
Attachment #483272 - Flags: review?(rreitmai) → review?(wsharp)
I have not done a review of this patch but I did apply it to my local source and run both the acceptance (which passed) and the performance tests on it.

One strange result in performance:

Metric: v8 custom v8 normalized metric (hardcoded in the test)
Dir: v8.5/optimized/
  richards                   3361    3960   17.8   17.8

Why is this test 18% slower with this patch applied?  VTUNE shows MORE time spent in equals() with VMCFG_FASTPATH_CMP defined than without which seems odd.

Is this something you want to investigate Bill before I do more review?
This is primarily a rebasing of a patch that has shown good performance previously, so you may continue to review it for overall code quality.  I do want to take a look at this, however.  I did have to merge it with some changes you had made, and perhaps I botched something.  This patch does (slightly) violate the principle of not repeating any work done on the fastpath on the slowpath, and that is suspect.
Comment on attachment 483272 [details] [diff] [review]
Specialization and inlining of relational and equality comparisons (rebased, with comments)

The performance bug is from this line which is missing an else:

+            branchToAbcPos(br, binaryIns(LIR_eqp, lhs, rhs), target);
+        } if ((lht == rht) && (lht == STRING_TYPE)) {

This coding style is used in cmpEq as well and I find it confusing:

             return binaryIns(LIR_eqp, lhs, rhs);
        } if ((lht == rht) && (lht == STRING_TYPE)) {

I have never seen a "} if" that like before and it should be on a separate line from the }.

Since we are duplicating this complex logic, we should put it into a helper function with the comment block:

         if (((lht == NULL_TYPE) && (rht && (!rht->hasComplexEqualityRules() || rht == STRING_TYPE))) ||
             ((rht == NULL_TYPE) && (lht && (!lht->hasComplexEqualityRules() || lht == STRING_TYPE))) ||
             ((rht && !rht->hasComplexEqualityRules()) && (lht && !lht->hasComplexEqualityRules()))) {


In general, I think we are overemphasizing the intptr case in these fastpaths.  I have done a small brightspot pass that I will add as an attachment analyzing our cmpEq usage and you can see we are emitting the intptr fasthpath for cases that will always fail the checks.  

2656 equals: String * intptr==intptr
860 equals: null Object intptr==intptr (probably fails?)
248 equals: Object * intptr==intptr (probably fails?)
234 equals: String Object intptr==intptr 

I'm wondering whether we should only inline this fastpath for (*,*) trait pairs (when both are null) beyond the (int, *)(*, int)(Number, *)(*, Number) checks we already do.
Attachment #483272 - Flags: review?(wsharp) → review-
This is a small brightspot run with the latest patch applied and debug output added to cmpEq to dump each case we are taking.  "intptr==intptr" means that we are inlining the intptr JIT code to first check for integer atoms.  

It might be interesting to do some dynamic analysis of running content of what percentage of (*,*) pairs are really integers.
cmple dump.  No reason for string < string to be try integers first

227 cmple: String String intptr==intptr
189 cmple: Number * LIR_led
90 cmple: int * LIR_led
87 cmple: * Number LIR_led
48 cmple: * int LIR_led
30 cmple: * * intptr==intptr
16 cmple: Object * intptr==intptr
8 cmple: uint * LIR_led
5 cmple: * Date intptr==intptr
5 cmple: Date * intptr==intptr
4 cmple: * uint LIR_led
3 cmple: String * intptr==intptr
3 cmple: * String intptr==intptr
(In reply to comment #16)

> The performance bug is from this line which is missing an else:
> 
> +            branchToAbcPos(br, binaryIns(LIR_eqp, lhs, rhs), target);
> +        } if ((lht == rht) && (lht == STRING_TYPE)) {
> 
> This coding style is used in cmpEq as well and I find it confusing:
> 
>              return binaryIns(LIR_eqp, lhs, rhs);
>         } if ((lht == rht) && (lht == STRING_TYPE)) {
> 
> I have never seen a "} if" that like before and it should be on a separate line
> from the }.

Yes, I botched the merge with your recent change to specialize the string/string case.  There is a missing 'else', as you observe.  I'm not shure what you mean in the last sentence, however.  The reason you haven't seen "} if" before is that indeed one would never sensibly write it that way, but it is common K&R style to write "} else if".  I do see that you have written some code like this:

if () {
...
}
else if () {
...
}

Do you like that better?  I prefer the more compact style, particularly for a lengthy chain of tests.  There are places in CodegenLIR where both styles are mixed even in the same function, so real consistency is going to require some janitorial work, but we can establish a preference for new code if you feel strongly about it.
The top case is in branchEq and is broken.  The 2nd case is in cmpEq and works because of the return in the previous block.  I wasn't sure this was on purpose and you were trying to save space with something like:

if (a) {
   ...
   return
} if (b) {
   ...
   return
}

Any existing if/else in there is fine and the above example would be:

if (a) {
   ...
   return
} 

if (b) {
   ...
   return
}
(In reply to comment #18)
> cmple dump.  No reason for string < string to be try integers first
> 
> 227 cmple: String String intptr==intptr

cmpLe currently specializes only comparisons involving at least one argument of a (statically-known) numeric type.  All other cases get promoted to atoms, and compared generically.  For true untyped comparisons, an intptr/intptr fastpath should be a huge win if experience with Lisp and Smalltalk means anything.  What you are saying here, is that (statically-known) String/String comparisons are common and should be specialized as you have done for the equality comparison case, correct?  (I could just drop the intptr/intptr fastpath, but why not specialize too?)

If I understand the lsat part of comment #16, what you are saying is that cases in which we know that at least one argument is neither a number nor an atom, we should omit the intptr/intptr fastpath, even if we do the actual comparison by coercing to atom and using the general compare helper.  Eminently reasonable.
In cmpLe after the isNumeric path, in the VMCFG_FASTPATH_CMP define, you emit integer checking code with this pseudocode:

if (((lhs ^ kIntptrType) | (rhs ^ kIntptrType) & kAtomTypeMask) != 0) goto fallback;

This will fail (and we know it at JIT time) for all of these cases:

227 cmple: String String 
5 cmple: * Date 
5 cmple: Date * 
3 cmple: String * 
3 cmple: * String 

Leaving these possibilities:

30 cmple: * * intptr==intptr
16 cmple: Object * intptr==intptr

Going from the full results, (*.*) might very well be two strings or two doubles instead of two integers.  And Objects are very rarely numeric types but it's a possibility.  Without some dynamic analysis we won't know for sure.  But perhaps we should not be emitting the integer checks at all in this case and just be calling compare().  

Same idea for the various other cmp/br operations.
(In reply to comment #22)

> Leaving these possibilities:
> 
> 30 cmple: * * intptr==intptr
> 16 cmple: Object * intptr==intptr
> 
> Going from the full results, (*.*) might very well be two strings or two
> doubles instead of two integers.  And Objects are very rarely numeric types but
> it's a possibility.  Without some dynamic analysis we won't know for sure.  But
> perhaps we should not be emitting the integer checks at all in this case and
> just be calling compare().

The intptr/intptr fastpath should not be inserted if it is not possible, given what is known at JIT time, for both arguments to be integer.  If the type of neither is known, however, we speculate that both are integers.  This is done out-of-line in the interpreter and unspecialized op_add helper.  This is the only mechanism that we have to speed up the important case of numeric addition in untyped code, although it can penalize some code that has typed its numeric operations but punted on others.
Attachment #483272 - Flags: superreview?(edwsmith)
The patch has gotten rather large and unwieldy, so I think it will be best to try to land it in at least two pieces.  The first natural division is between the relational comparisons and the equality comparisons.

While examining the code, I observed that the slowpath case for emitBranchNumericRelAtom() invokes coerceToNumber().  When originally written and tested, this code generated a call to the 'number' helper function, and, in aggregate, the fastpaths implemented showed good benchmark results.  Currently, however, with VMCFG_FAST_FROMATOM configured, we generates code with a redundant fastpath that will never be taken.  In fact, with coerceToNumber() implemented as it is now, we should get about the same code from the slowpath, particularly since the "fastpath" converts the intptr argument to a double, rather than doing an integer comparison as it should. (d'oh!).  A new version tries to be a bit smarter about this.  Surprisingly, with the VMCFG_FASTFROM_ATOM in place, however, there is little or no improvement from this, even though the code is slightly better to the eye, avoiding an extra memory store/load.  It is quite likely that the results will be different with a non-x86 processor, but the experience was a bit disappointing considering the quantity of additional code.  Speculatively inlining untyped intptr/inptr comparisions pays off on a few benchmarks, but not nearly as well or as often as for addition.  The results in preliminary x86-only tests are noisy and inconclusive enough that I don't feel comfortable with this patch as it stands.  I'd like to do some microbenchmarking of comparisons in isolation, and tests on non-x86 platforms to make sure there aren't any unexpected regressions.
Intuitively, the non-speculative inlining should always pay off, even if only slightly.  I expect to be tied up for a few days with higher-priority tasks, however, so I'm going to post the work in progress just to enter it into the record.  This patch has some hacked-up command line options to enable various optimization scenarios, permitting comparative benchmarking without code-layout artifacts due to building multiple test executables.
Flags: flashplayer-bug-
Depends on: Andre
No longer depends on: Andre
Flags: flashplayer-injection-
Target Milestone: Q3 11 - Serrano → Q1 12 - Brannan
Depends on: Andre
Target Milestone: Q1 12 - Brannan → Q2 12 - Cyril
Removing Andre blocker, as Dan assigned it to Cyril
No longer depends on: Andre
Tamarin is a dead project now. Mass WONTFIX.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Tamarin isn't maintained anymore. WONTFIX remaining bugs.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: