Last Comment Bug 615279 - JM: Undo ADD and INC/DEC in overflow path to avoid unnecessary syncs/moves
: JM: Undo ADD and INC/DEC in overflow path to avoid unnecessary syncs/moves
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: x86 Mac OS X
: -- normal (vote)
: ---
Assigned To: general
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-11-29 11:19 PST by Brian Hackett (:bhackett)
Modified: 2011-10-21 04:15 PDT (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
JM patch for ADD (4.23 KB, patch)
2010-11-29 11:28 PST, Brian Hackett (:bhackett)
no flags Details | Diff | Splinter Review

Description Brian Hackett (:bhackett) 2010-11-29 11:19:54 PST
When doing an ADD under heavy register pressure, JM reuses an operand register for the result.  In order to ensure that the operand can be rematerialized in the slow path if the ADD overflows, JM ensures that operand is synced, issuing a store before the operation if necessary.  In the JM branch, this happens three times in the am3 loop on v8-crypto.  These stores are unnecessary, as if the ADD overflows the original operands can still be recovered:

add x y
jo
> neg x
> add x y
> neg x
> sync x, y, call stub

This works even if the negs overflow (x == 0x80000000).

This also applies to INC/DEC operations, which will use an extra register and issue an extra move to hold the old value in case of overflow.  Not such a big deal for am3, maybe wait for INC/DEC ops to be coalesced into adds (bug 610618?).  This should also be doable for SUB, but that case is more complicated (not commutative) and doesn't bite on am3.
Comment 1 Brian Hackett (:bhackett) 2010-11-29 11:28:58 PST
Created attachment 493743 [details] [diff] [review]
JM patch for ADD

Applied this patch to JM, but could apply to TM too.  This only handles ADD operations.

http://hg.mozilla.org/projects/jaegermonkey/rev/e18996c2a36f
Comment 2 David Mandelin [:dmandelin] 2010-11-29 13:12:16 PST
(In reply to comment #1)
> Created attachment 493743 [details] [diff] [review]
> JM patch for ADD
> 
> Applied this patch to JM, but could apply to TM too.  This only handles ADD
> operations.

Does the patch speed up v8-crypto or some other benchmark of interest?
Comment 3 Brian Hackett (:bhackett) 2010-11-29 13:23:50 PST
With type inference this gets me a couple ms on v8-crypto (will apply to TM to measure without inference).  This plus a fix for types in arginc (http://hg.mozilla.org/projects/jaegermonkey/rev/30ffdc01adf2) removed 9 memory accesses from the loop in am3, and cut 6ms from my time on v8-crypto (62ms -> 56ms).

This loop has very high register pressure, so is a good regalloc test case.  It also has a pretty long linear body, so the current regalloc does a good job with it.  After the above two patches, there are still 33 memory accesses in this loop, 13 of which are necessary for getelem/setelems.  Of the remaining 20, 7 can be removed with a linear scan regalloc (WIP), several more by using ebp as a GPR.

Note You need to log in before you can comment on or make changes to this bug.