Closed Bug 226619 Opened 20 years ago Closed 20 years ago

Peculiar code in DivideBCBorderSize

Categories

(Core :: Layout: Tables, defect)

defect
Not set
trivial

Tracking

()

RESOLVED FIXED

People

(Reporter: MatsPalmgren_bugz, Assigned: MatsPalmgren_bugz)

References

()

Details

Attachments

(1 file)

From nsTableFrame.cpp:

static void
DivideBCBorderSize(nscoord  aPixelSize,
                   nscoord& aSmallHalf,
                   nscoord& aLargeHalf)
{
  aSmallHalf = aPixelSize / 2;
  aLargeHalf = ((aSmallHalf + aSmallHalf) < aPixelSize) ? aSmallHalf + 1 :
aSmallHalf;
}


I think it would be more readable (and faster) as:

  aSmallHalf = aPixelSize / 2;
  aLargeHalf = aPixelSize - aSmallHalf;
sr=dbaron
Attached patch Patch rev. 1Splinter Review
Comment on attachment 136205 [details] [diff] [review]
Patch rev. 1

r=bzbarsky, marking the sr=dbaron.

Would be good to not lose this before 1.7a opens...
Attachment #136205 - Flags: superreview+
Attachment #136205 - Flags: review+
fix checked in
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: table → mats.palmgren
Status: REOPENED → NEW
Status: NEW → RESOLVED
Closed: 20 years ago20 years ago
Resolution: --- → FIXED
Why not convert this function into a macro (or at least an 'inline') call?

In most of the cases only aSmallHalf or aLargeHalf is used:
2770       DivideBCBorderSize(propData->mTopBorderWidth, smallHalf, largeHalf);
2771       border.top += NSToCoordRound(p2t * (float)smallHalf);
2772 
2773       DivideBCBorderSize(propData->mRightBorderWidth, smallHalf, largeHalf);
2774       border.right += NSToCoordRound(p2t * (float)largeHalf);

Why not:
#define DIV_SMALL(aPixelSize ) ((aPixelSize) / 2)
#define DIV_LARGE(aPixelSize ) ((aPixelSize) - DIV_SMALL(aPixelSize) )

This prevents function calls, and the code overhead of the function itself.
The DIV_SMALL compiles to one instruction, the DIV_LARGE to two cpu instructions
which is still way less than a function call with three arguments.
On -O2 -GL builds, largeHalf gets optimized out if it isn't needed. But we
don't compile that way on Windows so Inline has merit. I think that

aSmallHalf = aPixelSize >> 1;
aLargeHalf = aSmallHalf + (aPixelSize & 1);

is better code as you don't have the baggage associated with division. The
code generated (-O2 -GL) with the divide and subtract is a bit odd:

        movzx	edi, BYTE PTR [ecx+17]
....
	mov	eax, edi  (clock 1)
	cdq               (clock 2)
	sub	eax, edx  (clock 3)
	sar	eax, 1    (clock 4)
	sub	edi, eax  (clock 5)

The shift+add should look like:

        movzx   edi, BYTE PTR [ecx+17]
....
        mov     eax, edi  (clock 1)
        sar     eax, 1    (clock 2)
        and     edi, 1    (clock 2)
        add     eax, edi  (clock 3)

The first can't do instruction parallelism due to dependencies from instruction
to instruction. The latter can do parallelism because instructions 2 and 3
aren't dependent.

(In reply to comment #5)
> Why not convert this function into a macro (or at least an 'inline') call?
> 
> In most of the cases only aSmallHalf or aLargeHalf is used:
> 2770       DivideBCBorderSize(propData->mTopBorderWidth, smallHalf, largeHalf);
> 2771       border.top += NSToCoordRound(p2t * (float)smallHalf);
> 2772 
> 2773       DivideBCBorderSize(propData->mRightBorderWidth, smallHalf, largeHalf);
> 2774       border.right += NSToCoordRound(p2t * (float)largeHalf);
> 
> Why not:
> #define DIV_SMALL(aPixelSize ) ((aPixelSize) / 2)
> #define DIV_LARGE(aPixelSize ) ((aPixelSize) - DIV_SMALL(aPixelSize) )
> 
> This prevents function calls, and the code overhead of the function itself.
> The DIV_SMALL compiles to one instruction, the DIV_LARGE to two cpu instructions
> which is still way less than a function call with three arguments.
Summary of some testing:
No changes to the code let to 10 instructions, 2 calls (excluding the float
calculations for the following code:
      nscoord smallHalf, largeHalf;

      DivideBCBorderSize(propData->mTopBorderWidth, smallHalf, largeHalf);
      border.top += NSToCoordRound(p2t * (float)smallHalf);

      DivideBCBorderSize(propData->mRightBorderWidth, smallHalf, largeHalf);
      border.right += NSToCoordRound(p2t * (float)largeHalf);

Variations tested:
Added inline: 
     15 instr, 0 calls (excl. fmul till __ftol)
With 'inline', and the smallHalf=x>>1, smallHalf+x&1 trick: 
     14 instructions, 0 calls (excl. fmul till __ftol)
With macros DIV_SMALL/DIV_LARGE:
#define DIV_SMALL(aPixelSize ) ((aPixelSize) >> 1)
#define DIV_LARGE(aPixelSize ) ((aPixelSize) - DIV_SMALL(aPixelSize) )
Code becomes:
      border.top    += NSToCoordRound(p2t *
(float)DIV_SMALL(propData->mTopBorderWidth));
      border.right  += NSToCoordRound(p2t *
(float)DIV_LARGE(propData->mRightBorderWidth));
     13 instructions, 0 calls (excl. fmul till __ftol)

Alternative Macro's:
#define DIV_SMALL(aPixelSize ) ((aPixelSize) >> 1)
#define DIV_LARGE(aPixelSize ) (DIV_SMALL(aPixelSize) + ((aPixelSize)&1) )
     14 instructions, 0 calls (excl. fmul till __ftol)

Probably the easiest change with the most results is to add 'inline' to the 
DivideBCBorderSize function. Changing the calculations as per last comment,
results then in 14 instructions. 
(Using macros one could further reduce to 13, but than means changes over a lot
of places (in one file however)).

So 'inline' please, less code per call and overall, and faster as well...
You need to log in before you can comment on or make changes to this bug.