Closed Bug 226619 Opened 21 years ago Closed 21 years ago

Peculiar code in DivideBCBorderSize

Categories

(Core :: Layout: Tables, defect)

defect
Not set
trivial

Tracking

()

RESOLVED FIXED

People

(Reporter: MatsPalmgren_bugz, Assigned: MatsPalmgren_bugz)

References

()

Details

Attachments

(1 file)

From nsTableFrame.cpp: static void DivideBCBorderSize(nscoord aPixelSize, nscoord& aSmallHalf, nscoord& aLargeHalf) { aSmallHalf = aPixelSize / 2; aLargeHalf = ((aSmallHalf + aSmallHalf) < aPixelSize) ? aSmallHalf + 1 : aSmallHalf; } I think it would be more readable (and faster) as: aSmallHalf = aPixelSize / 2; aLargeHalf = aPixelSize - aSmallHalf;
Attached patch Patch rev. 1Splinter Review
Comment on attachment 136205 [details] [diff] [review] Patch rev. 1 r=bzbarsky, marking the sr=dbaron. Would be good to not lose this before 1.7a opens...
Attachment #136205 - Flags: superreview+
Attachment #136205 - Flags: review+
fix checked in
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: table → mats.palmgren
Status: REOPENED → NEW
Status: NEW → RESOLVED
Closed: 21 years ago21 years ago
Resolution: --- → FIXED
Why not convert this function into a macro (or at least an 'inline') call? In most of the cases only aSmallHalf or aLargeHalf is used: 2770 DivideBCBorderSize(propData->mTopBorderWidth, smallHalf, largeHalf); 2771 border.top += NSToCoordRound(p2t * (float)smallHalf); 2772 2773 DivideBCBorderSize(propData->mRightBorderWidth, smallHalf, largeHalf); 2774 border.right += NSToCoordRound(p2t * (float)largeHalf); Why not: #define DIV_SMALL(aPixelSize ) ((aPixelSize) / 2) #define DIV_LARGE(aPixelSize ) ((aPixelSize) - DIV_SMALL(aPixelSize) ) This prevents function calls, and the code overhead of the function itself. The DIV_SMALL compiles to one instruction, the DIV_LARGE to two cpu instructions which is still way less than a function call with three arguments.
On -O2 -GL builds, largeHalf gets optimized out if it isn't needed. But we don't compile that way on Windows so Inline has merit. I think that aSmallHalf = aPixelSize >> 1; aLargeHalf = aSmallHalf + (aPixelSize & 1); is better code as you don't have the baggage associated with division. The code generated (-O2 -GL) with the divide and subtract is a bit odd: movzx edi, BYTE PTR [ecx+17] .... mov eax, edi (clock 1) cdq (clock 2) sub eax, edx (clock 3) sar eax, 1 (clock 4) sub edi, eax (clock 5) The shift+add should look like: movzx edi, BYTE PTR [ecx+17] .... mov eax, edi (clock 1) sar eax, 1 (clock 2) and edi, 1 (clock 2) add eax, edi (clock 3) The first can't do instruction parallelism due to dependencies from instruction to instruction. The latter can do parallelism because instructions 2 and 3 aren't dependent. (In reply to comment #5) > Why not convert this function into a macro (or at least an 'inline') call? > > In most of the cases only aSmallHalf or aLargeHalf is used: > 2770 DivideBCBorderSize(propData->mTopBorderWidth, smallHalf, largeHalf); > 2771 border.top += NSToCoordRound(p2t * (float)smallHalf); > 2772 > 2773 DivideBCBorderSize(propData->mRightBorderWidth, smallHalf, largeHalf); > 2774 border.right += NSToCoordRound(p2t * (float)largeHalf); > > Why not: > #define DIV_SMALL(aPixelSize ) ((aPixelSize) / 2) > #define DIV_LARGE(aPixelSize ) ((aPixelSize) - DIV_SMALL(aPixelSize) ) > > This prevents function calls, and the code overhead of the function itself. > The DIV_SMALL compiles to one instruction, the DIV_LARGE to two cpu instructions > which is still way less than a function call with three arguments.
Summary of some testing: No changes to the code let to 10 instructions, 2 calls (excluding the float calculations for the following code: nscoord smallHalf, largeHalf; DivideBCBorderSize(propData->mTopBorderWidth, smallHalf, largeHalf); border.top += NSToCoordRound(p2t * (float)smallHalf); DivideBCBorderSize(propData->mRightBorderWidth, smallHalf, largeHalf); border.right += NSToCoordRound(p2t * (float)largeHalf); Variations tested: Added inline: 15 instr, 0 calls (excl. fmul till __ftol) With 'inline', and the smallHalf=x>>1, smallHalf+x&1 trick: 14 instructions, 0 calls (excl. fmul till __ftol) With macros DIV_SMALL/DIV_LARGE: #define DIV_SMALL(aPixelSize ) ((aPixelSize) >> 1) #define DIV_LARGE(aPixelSize ) ((aPixelSize) - DIV_SMALL(aPixelSize) ) Code becomes: border.top += NSToCoordRound(p2t * (float)DIV_SMALL(propData->mTopBorderWidth)); border.right += NSToCoordRound(p2t * (float)DIV_LARGE(propData->mRightBorderWidth)); 13 instructions, 0 calls (excl. fmul till __ftol) Alternative Macro's: #define DIV_SMALL(aPixelSize ) ((aPixelSize) >> 1) #define DIV_LARGE(aPixelSize ) (DIV_SMALL(aPixelSize) + ((aPixelSize)&1) ) 14 instructions, 0 calls (excl. fmul till __ftol) Probably the easiest change with the most results is to add 'inline' to the DivideBCBorderSize function. Changing the calculations as per last comment, results then in 14 instructions. (Using macros one could further reduce to 13, but than means changes over a lot of places (in one file however)). So 'inline' please, less code per call and overall, and faster as well...
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: