Last Comment Bug 869525 - Use xor+setcc instead of setcc+movzbl in BaselineIC-x86-shared.cpp
: Use xor+setcc instead of setcc+movzbl in BaselineIC-x86-shared.cpp
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: x86_64 All
: -- enhancement (vote)
: mozilla23
Assigned To: Dan Gohman [:sunfish]
:
Mentors:
Depends on: 869532
Blocks:
  Show dependency treegraph
 
Reported: 2013-05-07 09:53 PDT by Dan Gohman [:sunfish]
Modified: 2013-05-10 01:24 PDT (History)
2 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
a patch containing a proposed fix (1.16 KB, patch)
2013-05-07 09:54 PDT, Dan Gohman [:sunfish]
no flags Details | Diff | Splinter Review
refresh patch to apply to trunk (1.15 KB, patch)
2013-05-09 07:37 PDT, Dan Gohman [:sunfish]
nicolas.b.pierron: review+
Details | Diff | Splinter Review

Description Dan Gohman [:sunfish] 2013-05-07 09:53:21 PDT
BaselineIC-x86-shared.cpp has code that does this:

  ucomisd %xmm0, %xmm1
  seta    %cl
  movzbl  %cl, %ecx

Since it happens to be convenient to do so, this would be better:

  xorl    %ecx, %ecx
  ucomisd %xmm0, %xmm1
  seta    %cl

because it has smaller code size and on common processors today, and the xor of a register with itself is handled specially.
Comment 1 Dan Gohman [:sunfish] 2013-05-07 09:54:09 PDT
Created attachment 746482 [details] [diff] [review]
a patch containing a proposed fix
Comment 2 Dan Gohman [:sunfish] 2013-05-07 10:06:56 PDT
This patch depends on the patch in bug 869532.
Comment 3 Dan Gohman [:sunfish] 2013-05-09 07:37:03 PDT
Created attachment 747421 [details] [diff] [review]
refresh patch to apply to trunk
Comment 4 Nicolas B. Pierron [:nbp] 2013-05-09 09:53:58 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/0f05638c8f26
Comment 5 Jeff Muizelaar [:jrmuizel] 2013-05-09 12:14:48 PDT
Are you sure that this is better? Both gcc and clang use the mozbl variant.
Comment 6 Dan Gohman [:sunfish] 2013-05-09 14:38:08 PDT
GCC (4.6 and 4.8) on my system uses the xor variant, for this C code with -O2 for example:

long foo(double x) { return x > 0; }

The xor variant is part of the "peephole2" pass in GCC, which specifically rewrites setcc+movzbl to xor+setcc when it can.

Also, the xor variant is recommended first by Agner's "Optimizing subroutines in assembly 
language", section "Replacing conditional jumps with conditional set instructions" [0].

Also, Intel uses the xor variant in their x86 optimizing manual [1]. It isn't a specific recommendation, but they do use it in an example (Example 3-2. "Code Optimization to Eliminate Branches").

Also, the xor trick has a smaller overall encoding.

[0] http://www.agner.org/optimize/optimizing_assembly.pdf
[1] http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf
Comment 7 Jeff Muizelaar [:jrmuizel] 2013-05-09 14:48:40 PDT
(In reply to Dan Gohman from comment #6)
> GCC (4.6 and 4.8) on my system uses the xor variant, for this C code with
> -O2 for example:


Very true. I forgot my gcc was very old. I've filed a bug against clang to add the optimization there:
http://llvm.org/bugs/show_bug.cgi?id=15946
Comment 8 Ed Morley [:emorley] 2013-05-10 01:24:23 PDT
https://hg.mozilla.org/mozilla-central/rev/0f05638c8f26

Note You need to log in before you can comment on or make changes to this bug.