Peculiar code in DivideBCBorderSize

RESOLVED FIXED

Status

()

Core
Layout: Tables
--
trivial
RESOLVED FIXED
15 years ago
14 years ago

People

(Reporter: mats, Assigned: mats)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(1 attachment)

(Assignee)

Description

15 years ago
From nsTableFrame.cpp:

static void
DivideBCBorderSize(nscoord  aPixelSize,
                   nscoord& aSmallHalf,
                   nscoord& aLargeHalf)
{
  aSmallHalf = aPixelSize / 2;
  aLargeHalf = ((aSmallHalf + aSmallHalf) < aPixelSize) ? aSmallHalf + 1 :
aSmallHalf;
}


I think it would be more readable (and faster) as:

  aSmallHalf = aPixelSize / 2;
  aLargeHalf = aPixelSize - aSmallHalf;
Comment on attachment 136205 [details] [diff] [review]
Patch rev. 1

r=bzbarsky, marking the sr=dbaron.

Would be good to not lose this before 1.7a opens...
Attachment #136205 - Flags: superreview+
Attachment #136205 - Flags: review+

Comment 4

15 years ago
fix checked in
Status: NEW → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → FIXED
(Assignee)

Updated

15 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Updated

15 years ago
Assignee: table → mats.palmgren
Status: REOPENED → NEW
(Assignee)

Updated

15 years ago
Status: NEW → RESOLVED
Last Resolved: 15 years ago15 years ago
Resolution: --- → FIXED

Comment 5

15 years ago
Why not convert this function into a macro (or at least an 'inline') call?

In most of the cases only aSmallHalf or aLargeHalf is used:
2770       DivideBCBorderSize(propData->mTopBorderWidth, smallHalf, largeHalf);
2771       border.top += NSToCoordRound(p2t * (float)smallHalf);
2772 
2773       DivideBCBorderSize(propData->mRightBorderWidth, smallHalf, largeHalf);
2774       border.right += NSToCoordRound(p2t * (float)largeHalf);

Why not:
#define DIV_SMALL(aPixelSize ) ((aPixelSize) / 2)
#define DIV_LARGE(aPixelSize ) ((aPixelSize) - DIV_SMALL(aPixelSize) )

This prevents function calls, and the code overhead of the function itself.
The DIV_SMALL compiles to one instruction, the DIV_LARGE to two cpu instructions
which is still way less than a function call with three arguments.

Comment 6

14 years ago
On -O2 -GL builds, largeHalf gets optimized out if it isn't needed. But we
don't compile that way on Windows so Inline has merit. I think that

aSmallHalf = aPixelSize >> 1;
aLargeHalf = aSmallHalf + (aPixelSize & 1);

is better code as you don't have the baggage associated with division. The
code generated (-O2 -GL) with the divide and subtract is a bit odd:

        movzx	edi, BYTE PTR [ecx+17]
....
	mov	eax, edi  (clock 1)
	cdq               (clock 2)
	sub	eax, edx  (clock 3)
	sar	eax, 1    (clock 4)
	sub	edi, eax  (clock 5)

The shift+add should look like:

        movzx   edi, BYTE PTR [ecx+17]
....
        mov     eax, edi  (clock 1)
        sar     eax, 1    (clock 2)
        and     edi, 1    (clock 2)
        add     eax, edi  (clock 3)

The first can't do instruction parallelism due to dependencies from instruction
to instruction. The latter can do parallelism because instructions 2 and 3
aren't dependent.

(In reply to comment #5)
> Why not convert this function into a macro (or at least an 'inline') call?
> 
> In most of the cases only aSmallHalf or aLargeHalf is used:
> 2770       DivideBCBorderSize(propData->mTopBorderWidth, smallHalf, largeHalf);
> 2771       border.top += NSToCoordRound(p2t * (float)smallHalf);
> 2772 
> 2773       DivideBCBorderSize(propData->mRightBorderWidth, smallHalf, largeHalf);
> 2774       border.right += NSToCoordRound(p2t * (float)largeHalf);
> 
> Why not:
> #define DIV_SMALL(aPixelSize ) ((aPixelSize) / 2)
> #define DIV_LARGE(aPixelSize ) ((aPixelSize) - DIV_SMALL(aPixelSize) )
> 
> This prevents function calls, and the code overhead of the function itself.
> The DIV_SMALL compiles to one instruction, the DIV_LARGE to two cpu instructions
> which is still way less than a function call with three arguments.

Comment 7

14 years ago
Summary of some testing:
No changes to the code let to 10 instructions, 2 calls (excluding the float
calculations for the following code:
      nscoord smallHalf, largeHalf;

      DivideBCBorderSize(propData->mTopBorderWidth, smallHalf, largeHalf);
      border.top += NSToCoordRound(p2t * (float)smallHalf);

      DivideBCBorderSize(propData->mRightBorderWidth, smallHalf, largeHalf);
      border.right += NSToCoordRound(p2t * (float)largeHalf);

Variations tested:
Added inline: 
     15 instr, 0 calls (excl. fmul till __ftol)
With 'inline', and the smallHalf=x>>1, smallHalf+x&1 trick: 
     14 instructions, 0 calls (excl. fmul till __ftol)
With macros DIV_SMALL/DIV_LARGE:
#define DIV_SMALL(aPixelSize ) ((aPixelSize) >> 1)
#define DIV_LARGE(aPixelSize ) ((aPixelSize) - DIV_SMALL(aPixelSize) )
Code becomes:
      border.top    += NSToCoordRound(p2t *
(float)DIV_SMALL(propData->mTopBorderWidth));
      border.right  += NSToCoordRound(p2t *
(float)DIV_LARGE(propData->mRightBorderWidth));
     13 instructions, 0 calls (excl. fmul till __ftol)

Alternative Macro's:
#define DIV_SMALL(aPixelSize ) ((aPixelSize) >> 1)
#define DIV_LARGE(aPixelSize ) (DIV_SMALL(aPixelSize) + ((aPixelSize)&1) )
     14 instructions, 0 calls (excl. fmul till __ftol)

Probably the easiest change with the most results is to add 'inline' to the 
DivideBCBorderSize function. Changing the calculations as per last comment,
results then in 14 instructions. 
(Using macros one could further reduce to 13, but than means changes over a lot
of places (in one file however)).

So 'inline' please, less code per call and overall, and faster as well...
You need to log in before you can comment on or make changes to this bug.