413787 - NSToIntRound, NSToCoordRound code generation

Assignee

Description

•

17 years ago

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20080101 BonEcho/2.0.0.11 (mmoy CE K8C-X03)
Build Identifier: Trunk

At -O1 optimization, NSToIntRound generates this code in nsAccessible:

?NS_lroundf@@YAHM@Z PROC				; NS_lroundf, COMDAT

; 62   :     return x >= 0.0f ? PRInt32(x + 0.5f) : PRInt32(x - 0.5f);

	fldz
	fld	DWORD PTR _x$[esp-4]
	fcom	ST(1)
	fnstsw	ax
	fstp	ST(1)
	test	ah, 1
	jne	SHORT $LN3@NS_lroundf
	fadd	QWORD PTR __real@3fe0000000000000
$LN10@NS_lroundf:
	jmp	__ftol2_sse
$LN3@NS_lroundf:
	fsub	QWORD PTR __real@3fe0000000000000
	jmp	SHORT $LN10@NS_lroundf
?NS_lroundf@@YAHM@Z ENDP				; NS_lroundf
_TEXT	ENDS

I'd guess that __ftol2_sse checks for sse support to determine
whether to use SSE or f87 to convert the result to an integer.

so that an actual invocation results in this:

; 337  :   return NSToIntRound(float(aAppUnits) / aAppUnitsPerPixel);

	fild	DWORD PTR _aAppUnits$[ebp]
	push	ecx
	fidiv	DWORD PTR _aAppUnitsPerPixel$[ebp]
	fstp	DWORD PTR tv79[ebp]
	fld	DWORD PTR tv79[ebp]
	fstp	DWORD PTR [esp]
	call	?NS_lroundf@@YAHM@Z			; NS_lroundf
	pop	ecx

The source code for NS_lroundf in nsmathutils.h is:

inline NS_HIDDEN_(PRInt32) NS_lroundf(float x)
{
    return x >= 0.0f ? PRInt32(x + 0.5f) : PRInt32(x - 0.5f);
}

with the compiler choosing not to inline the code despite the inline tag in the source. The thing is that f87 has hardware float to integer with rounding support as does SSE2 but there's no language support to get to the f87 instructions. The rounding mode is controlled by the RC field of the FPU control word but in actual practice with Mozilla, it's always been normal rounding from what I've seen. This would be one way to do it in MSVC and it should be faster than what is there now, even with the additional stack overhead of inline assembler. 

inline PRInt32 NSToIntRound(float aValue)
{
int a;
  __asm{
     fld      aValue
     fistp    a
     }
  return a;
 }

On Mac OSX (Intel) and Windows x64, this could be used:

inline nscoord NSToCoordRound(float aValue)
{
  return (_mm_cvtss_si32(_mm_load_ss(&aValue)));
 }

as these platforms guarantee SSE2.

Reproducible: Always

Steps to Reproduce:
1.
2.
3.

Reed Loden [:reed]

Updated

•

17 years ago

Status: UNCONFIRMED → NEW

Component: XPCOM → GFX

Ever confirmed: true

QA Contact: xpcom → general

Version: unspecified → Trunk

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 1

•

17 years ago

Cripes! That gets called a lot. Will the asm or intrinsics inhibit optimization significantly?

Michael Moy

Assignee

Comment 2

•

17 years ago

There's a good paper at http://ldesoras.free.fr/doc/articles/rounding_en.pdf for hardware, MSVC and C on ceil, floor and on rounding in general.

The inline assembler above is very cheap but this paper indicates that the default rounding mode is really round towards the even integer. But it does provide ceil, floor and rounding functions which are pretty efficient. The round_int routine that it provides matches NSToCoordRound in my testing. I didn't see an equivalent of NSToIntRound though and I'm curious if there's a way to do it without using any conditionals.

These are the results that I found. Quick is the inline assembler above. lroundf is used by NSToIntRound, and round_int is from the paper. Perhaps a starting approach would be to use round_int for NSToCoordRound on Win32.

Number   Quick  lroundf roundint NSToCoordRound
 -3.00     -3     -3       -3          -3
 -2.75     -3     -3       -3          -3
 -2.50     -2     -3       -2          -2
 -2.25     -2     -2       -2          -2
 -2.00     -2     -2       -2          -2
 -1.75     -2     -2       -2          -2
 -1.50     -2     -2       -1          -1
 -1.25     -1     -1       -1          -1
 -1.00     -1     -1       -1          -1
 -0.75     -1     -1       -1          -1
 -0.50      0     -1        0           0
 -0.25      0      0        0           0
  0.00      0      0        0           0
  0.25      0      0        0           0
  0.50      0      1        1           1
  0.75      1      1        1           1
  1.00      1      1        1           1
  1.25      1      1        1           1
  1.50      2      2        2           2
  1.75      2      2        2           2
  2.00      2      2        2           2
  2.25      2      2        2           2
  2.50      2      3        3           3
  2.75      3      3        3           3
  3.00      3      3        3           3

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 3

•

17 years ago

The rounding behaviour is actually quite important, see
https://bugzilla.mozilla.org/show_bug.cgi?id=410748#c14
So we need round_int.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 4

•

17 years ago

In particular, rounding towards the nearest even integer would mean a 1px-wide border covering [0.5, 1.5) would be drawn 2px wide covering [0, 2).

lroundf is rounding away from 0 and has a similar problem, [-0.5, 0.5) becomes [-1, 1].

Jeremy Lea

Comment 5

•

16 years ago

cairo has a special lround() which claims to do what we want and fast, and already in the tree...

http://lxr.mozilla.org/seamonkey/source/gfx/cairo/cairo/src/cairo-misc.c#218

It only works on doubles, but float to double conversion should be cheap?  Maybe similar functions could be done for floor() and ceil().

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 6

•

16 years ago

Oooh, nice catch. I bet a float version of it is possible, even actually slightly more efficient. Anyone want to take that on? It's tricky but on the bright side you can exhaustively test all 2^32 possible float values :-).

Michael Moy

Assignee

Comment 7

•

16 years ago

I took a look at the generated assembler code for Win32 at -O1 for lround and found that it took around 50 cycles on AMD K8. roundint took about 30 cycles. I think that things might be a little closer on Intel processors as Intel has higher memory latencies and roundint has one more memory access compared to lround.

I'm going to try to write a non inline assembler version of roundint to get rid of the inline assembler stack overhead to trim a few cycles.

Michael Moy

Assignee

Comment 8

•

16 years ago

I may have been comparing apples to oranges on the latencies if lroundf is comparable to lround instead of NSToCoordRound.

Michael Moy

Assignee

Comment 9

•

16 years ago

NSToCoordRound is used in 666 modules so it seems that it would be worthwhile to use the routine referenced in the paper. I sent an email to the author (he puts out open source software) to see if he has any restrictions on the code in his paper.

An SSE2 solution using intrinsics for Mac OSX looks possible.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 10

•

16 years ago

_cairo_lround is basically the same as NSToCoordRound and roundint. lroundf is not what we want.

So it sounds like you're saying roundint wins. The only thing is that _cairo_lround works on doubles whereas we only need a version that works on floats, so it may be possible to shave some cycles off the _cairo_lround approach.

Michael Moy

Assignee

Comment 11

•

16 years ago

The roundint approach takes six instructions compared to about 36 for lroundf and I got roundint down to about 21 cycles. Unfortunately I just realized that roundint doesn't cover one billion to two billion as adding the number to itself results in losing a bit. The fix costs nine cycles so I'm back to 30 again.

I have the code for NSToIntRound too and that's about 30 cycles.

I'll take a look at cairo_lround for single-precision.

Michael Moy

Assignee

Comment 12

•

16 years ago

I had a closer look at _cairo_lround and it looks to me to take 38 cycles not including a few pushes and pops and the call overhead. I think that the routine is too large to be inlined by the compiler without using __forceinline. Going from double to single precision looks like it would save about 5 cycles. Three for not having to do put the lower doubleword and higher doubleword together (2 shifts and and 1 or) and the partial savings of a move doubleword as the entire number is in one doubleword for a cost of about 33 cycles.

roundint uses about 20 cycles but doesn't support numbers above 2^30. roundint does inline on -O1 and the code is pretty small so there should be minimal impact on image size. The current NSToCoordRound routine gets inlined but calls out to ftol which is the expensive part. The inlined part alone is probably slower than roundint. It may be valid using roundint for coordinates even if it doesn't support numbers above 2^30 as the mantissa in a single precision floating point number is only 23 bits.

I modified roundint with a fix for numbers between 2^30 and 2^31 by writing out the integer as a 64-bit quadword and then grabbing the lowest bit off the high quad and the upper 31 bits of the low quad and or-ing them together at a cost of three cycles or 23 in all. The resulting routine is not inlined at -O1 optimization unless __forceinline is used.

And I did an NSToIntRound which should be pretty efficient compared to what we are doing now. I'm going to post the patch with the accurate routines for those that want to take a look. I think that roundint should work fine but would be interested in the opinions of others.

The _cairo_lround just requires figuring out the correct constants to use for single instead of double and getting rid of the 64-bit stuff that isn't needed with single precision.

Michael Moy

Assignee

Comment 13

•

16 years ago

Attached patch Accurate version of roundint (obsolete) — Details — Splinter Review

This version of roundint takes about 23 cycles on an AMD K8 chip. An NSToIntRound solution is also provided in the patch and it too is an accurate version.

Michael Moy

Assignee

Comment 14

•

16 years ago

Here's the single precision version of -cairo_lround. It has a limit somewhere between 1,000,000 and 10,000,000. Numbers higher (in absolute value) than the limit return zero. The double version doesn't do this because it has more than 32 bits to work with in the mantissa. Single precision only has 23 bits so it can't represent numbers over the limit. The routine only does right-shift even though the shift_amount can be negative which would make the numbers bigger instead of smaller. A modification could be put in to check the sign of the shift_amount and shift to the left if the shift_amount is negative.

The routine here is 32 cycles compiled with -O1.

int cairo_lround(float x)
{
  unsigned int *i = (int *) &x;
  unsigned int j, k, top, shift_amount;

  j = *i;
  top = j >> 23;
  shift_amount = 149 - (top & 0xff);
  top >>= 8;
  j &= 0x7fffff;   /* mantissa */
  j |= 0x800000;   /* implicit 1 */
  j -= top;
  top--;
  j >>= shift_amount;
  j = (j >> 1) + (j & 1);
  j &= ((shift_amount > 31) - 1);
  j = (j & top) - (j & ~top);
  return j;
}

The patch that I attached doesn't compile in Mozilla as it apparently doesn't handle the float already in the stack so I need to play around with the routines a little.

Michael Moy

Assignee

Comment 15

•

16 years ago

Attached patch Accurate version of roundint (obsolete) — Details — Splinter Review

Mixed up the pointer nomenclature between destination and source operands.

Attachment #307535 - Attachment is obsolete: true

Michael Moy

Assignee

Comment 16

•

16 years ago

Attached patch roundint patches for NSToIntRound and NSToCoordRound (obsolete) — Details — Splinter Review

This is the patch that doesn't handle values above 2^30.

Assignee: nobody → mmoy

Status: NEW → ASSIGNED

Mike Schroepfer

Comment 17

•

16 years ago

Noming since this looks like a good perf win.

Flags: blocking1.9?

Michael Moy

Assignee

Comment 18

•

16 years ago

Attached patch Update for accurate version of NSIntToRound (obsolete) — Details — Splinter Review

Coords are limited to 2^30 so the issue with NSToCoordRound isn't an issue. I changed the NSToIntRound back to use the accurate version

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 19

•

16 years ago

Wouldn't block the release for this, but do want to get it in, especially since Michael has a patch -- ask for review, probably from roc?  I had a look as well, and the patch looked fine.  Do we know whether this will get inlined for sure?  If not, do we want to do the force_inline bits to make sure it is?

After roc's review, request approval-1.9 on it to get it in, thanks!

Flags: wanted1.9+

Flags: blocking1.9?

Flags: blocking1.9-

Michael Moy

Assignee

Updated

•

16 years ago

Attachment #307781 - Flags: review?(roc)

Michael Moy

Assignee

Comment 20

•

16 years ago

Here's a snippet from the assembler listing of nsHTMLReflowState at -O1 which shows that NSToCoordRound is inlined. NSToIntRound isn't inlined unless -O2 is used.

; 2002 :       normalLineHeight = NSToCoordRound(emHeight * NORMAL_LINE_HEIGHT_FACTOR);

	fild	DWORD PTR _emHeight$[ebp]
	fmul	QWORD PTR __real@3ff3333340000000
	fstp	DWORD PTR $T48755[ebp]
	fld	DWORD PTR $T48755[ebp]
	fadd	ST(0), ST(0)
	fadd	DWORD PTR _round_to_nearest
	fistp	DWORD PTR _i$48753[ebp]
	mov	eax, DWORD PTR _i$48753[ebp]
	sar	eax, 1

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 21

•

16 years ago

If you replace
+    mov     eax, dword ptr i
+    sar     eax, 1
with "return i/2", does that generate just as good code but create more potential for optimization, maybe?

+#if defined(XP_WIN32) && defined(_M_IX86) && !defined(__GNUC__)

Would be nice to factor this out so we have #defines identifying various inline assemblers we support. And it would be nice to have this code on Linux and Mac too ... these can be followup issues.

+    mov     eax, dword ptr i
+    shr     eax, 1
+    test    aValue, -2147483648
+    je      positive
+    neg     eax
Likewise, I wonder if writing C code to implement this would give as good code, while giving the compiler more freedom to optimize around.

We should really hunt down users of NSToIntRound and see if they'd be as well off or better using the other rounding mode.

Maybe it would be a good idea to define these function in nsMathUtils and let NSToCoordRound and NSToIntRound be wrappers around them. This isn't really anything to do with coords.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 22

•

16 years ago

Yeah it looks to me like pretty much all users of NSToIntRound should be using the other rounding mode.
http://mxr.mozilla.org/seamonkey/search?string=NSToIntRound
Maybe we can just get rid of it.

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

16 years ago

Blocks: 421165

Michael Moy

Assignee

Comment 23

•

16 years ago

Good points on reverting to C code for as much work as possible. I only wish that Microsoft had intrinsics for f87. It would have been easier to do this in intrinsics and that's what I plan to do with Mac OSX and Win64 as they guarantee SSE2. I'm not that good with GCC-style inline assembler but it's a good thing to learn.

I used NS_lroundf2 for the nsMathUtils.h file and am running a build right now with the modified code for the first inline assembler routine. I'll do the other one later on.

One difference between NSToIntRound and NSToCoordRound is that the latter isn't supported above 2^30 while the former is. There are comments in nsCoord.h to this affect on doing conversions between the two.

On the #ifs for the different kinds of inline assembler and intrinsics, perhaps a bug should be opened for this.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 24

•

16 years ago

Hmm, good point. I guess we need two functions, one which works on the full range of integers and one that only works on valid nscoords.

Michael Moy

Assignee

Comment 25

•

16 years ago

Attached patch Updated patch for nsCoord.h and nsMathUtils.h (obsolete) — Details — Splinter Review

I tried using the compiler for the division stage and the compiler inserts additional instructions (CDQ and SUB) and uses an additional register in the process so I reverted to the inline assembler-only code. If you would still prefer the other code, let me know and I'll change it back.

Attachment #307544 - Attachment is obsolete: true

Attachment #307610 - Attachment is obsolete: true

Attachment #307781 - Attachment is obsolete: true

Attachment #308240 - Flags: review?(roc)

Attachment #307781 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 26

•

16 years ago

Bummer. Oh well.

The code looks fine, but we need better names for these things, and some documentation.

Instead of NS_lroundf2, how about NS_lroundup30? 30 meaning 30 bits of precision, up meaning 0.5 always goes up.

And instead of NS_roundf2, why aren't we just replacing NS_lroundf?

Michael Moy

Assignee

Comment 27

•

16 years ago

Instead of dividing by 2, I can just do

return i >> 1;

for NSToCoordRound and something similar for NSToIntRound.

Michael Moy

Assignee

Comment 28

•

16 years ago

Attached patch Updated with comments and routine name changes (obsolete) — Details — Splinter Review

Attachment #308240 - Attachment is obsolete: true

Attachment #308612 - Flags: review?(roc)

Attachment #308240 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 29

•

16 years ago

+/*
 inline NS_HIDDEN_(PRInt32) NS_lroundf(float x)
 {
     return x >= 0.0f ? PRInt32(x + 0.5f) : PRInt32(x - 0.5f);
 }
+*/

I would just remove this.

+#if defined(XP_WIN32) && defined(_M_IX86) && !defined(__GNUC__)
+const float round_to_nearest = 0.5f;
+#endif

Move this into NS_lroundf and NS_lroundup30 and make it static in each one. Avoids namespace pollution and excessive #ifdefs.

+    if (*xp & -2147483648)

Would be more readable in hex.

+        return -(*ip >> 1);
+    else
+        return (*ip >> 1);

I'd share the *ip >> 1 in a local variable.

Otherwise great.

Michael Moy

Assignee

Comment 30

•

16 years ago

Attached patch Final patch (hopefully) — Details — Splinter Review

I removed the replacement for NSToIntRound. VS2005 improves the conversion from float to int quite a bit and I couldn't beat it. The replacement for NSToCoordRound takes about one-seventh the time than the current version.

Attachment #308612 - Attachment is obsolete: true

Attachment #310454 - Flags: review?(roc)

Attachment #308612 - Flags: review?(roc)

Mike Schroepfer

Comment 31

•

16 years ago

We should get this in for beta5 if we can.

Michael Moy

Assignee

Comment 32

•

16 years ago

This code on Mac OSX uses about 1/3rd the time on my tests compared to NSToCoordRound.

__inline int roundint(float x)
{
  static const float round_to_nearest[1] = {0.5};
 int i;
 __m128 a, b, c;

 a = _mm_load_ss(&x);
 b = _mm_load_ss(round_to_nearest);
 a = _mm_add_ss(a,a);
 a = _mm_add_ss(a,b);

 return (_mm_cvtss_si32(a)) >> 1;
}

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

16 years ago

Attachment #310454 - Flags: superreview+

Attachment #310454 - Flags: review?(roc)

Attachment #310454 - Flags: review+

Reed Loden [:reed]

Updated

•

16 years ago

Attachment #310454 - Flags: approval1.9b5?

Attachment #310454 - Flags: approval1.9?

Jesse Ruderman

Updated

•

16 years ago

Keywords: perf

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 33

•

16 years ago

Michael, one more thing that would be nice is an NS_ASSERTION that the input value is a legal nscoord and not NS_UNCONSTRAINEDSIZE.

I'm kinda wondering whether the fact that this mangles NS_UNCONSTRAINEDSIZE into something negative whereas the old function passed it through could cause problems when it interacts with code that would be technically wrong but currently works. Maybe we should slip it to the next release because of that.

Keywords: perf

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 34

•

16 years ago

It would be good if we had Talos perf or codesize numbers to help make that decision.

Reed Loden [:reed]

Updated

•

16 years ago

Keywords: perf

Mike Connor [:mconnor]

Comment 35

•

16 years ago

Comment on attachment 310454 [details] [diff] [review]
Final patch (hopefully)

I don't think we want to take this post-freeze, the risk isn't well-understood enough at this point.

Attachment #310454 - Flags: approval1.9b5? → approval1.9b5-

Mike Schroepfer

Comment 36

•

16 years ago

(In reply to comment #33)
> Michael, one more thing that would be nice is an NS_ASSERTION that the input
> value is a legal nscoord and not NS_UNCONSTRAINEDSIZE.
> 
> I'm kinda wondering whether the fact that this mangles NS_UNCONSTRAINEDSIZE
> into something negative whereas the old function passed it through could cause
> problems when it interacts with code that would be technically wrong but
> currently works. Maybe we should slip it to the next release because of that.
> 

Is there a way to mitigate that?  

Any ideas on the possible perf impacts here?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 37

•

16 years ago

(In reply to comment #36)
> Is there a way to mitigate that?

Not without eating some of the perf improvement.

> Any ideas on the possible perf impacts here?

Not yet.

Mike Schroepfer

Comment 38

•

16 years ago

Can we do a try server build to test impact?

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 39

•

16 years ago

All tests and reftests pass with this patch on win32; it doesn't compile out of the box on linux/OSX:

../../dist/include/xpcom/nsMathUtils.h: In function 'PRInt32 NS_lroundup30(float)':
../../dist/include/xpcom/nsMathUtils.h:96: error: 'NS_floorf' was not declared in this scope
../../dist/include/xpcom/nsMathUtils.h:96: error: 'nscoord' was not declared in this scope

but that looks like a missing #include.

Tp may have dropped slightly, but the tryserver talos numbers aren't in line with the normal tinderbox ones (for this patch, Tp was 444.. previous 2 runs on the tryserver were 446 and 448; main tinderbox Tp is in the 480 range).

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 40

•

16 years ago

For the #include issue, I'll have to move the arch/compiler selection up to nscoord.h instead of nsMathUtils.h, but it's straightforward.  I'm thinking not for b5 at this point, though.

Michael Moy

Assignee

Comment 41

•

16 years ago

The problem with the patch in OSX and Linux is that the definitions for the non-Windows case needs functions defined later in the module.

Single-precision floating point numbers only have accuracy to about 8 million or about 4 million when you include the sign. Beyond that, you get coarse rounding.

I came up with another alternative that handles numbers from -(2^31) to +(2^31) with performance comparable to the previous assembler routine without any assembler code that inlines at -O1 on Windows.

inline nscoord NSToCoordRound(float aValue)
{
  return nscoord(((unsigned int)(x + 2147483648.5) - (unsigned int) 0x80000000));
}

It produces this code as a function. Of course the first and last instructions can be optimized inline by the compiler.

	fld	DWORD PTR _x$[esp-4]
	fadd	QWORD PTR __real@41e0000000100000
	call	__ftol2
	sub	eax, -2147483648			; 80000000H

The approach is simple for rounding: add 2^31 and then round and then subtract out the 2^31. This will provide the floor behaviour for positive and negative numbers in the range as it makes negative numbers positive.

I need to do some testing on the Mac OSX side with this.

As an interesting aside, I tried this with -O2 optimization and found that the compiler generates slower code for converting floating points to longs with truncation. It uses floating point control word code with -O2 instead of using the faster library code. With -O1, my bench took 3 seconds, with -O2, it too 6 seconds and with the current floorf approach, it took 14 seconds.

Michael Moy

Assignee

Comment 42

•

16 years ago

Some statistics on Windows at -O1 and -O2. The program also checks values from -(2^31) to 2^31. The test program is the next attachment. I'm going to try it on Mac OSX next.

F:\mozilla>round1
249999999
249999999
start = 0, end1 = 3765, end2 = 30640
roundint = 3765 milliseconds  NSToCoordRound = 26875 milliseconds
roundint percentage of NSToCoordRound = 14.009302

checking from -2147483648.000000 to -1000000000.000000 by 0.25
checking from -1000000000.000000 to 0.000000 by 0.25
checking from 0.000000 to 1000000000.000000 by 0.25
checking from 1000000000.000000 to 2147483648.000000 by 0.25

F:\mozilla>cl -FAs -O2 round1.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

round1.c
Microsoft (R) Incremental Linker Version 8.00.50727.42
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:round1.exe
round1.obj

F:\mozilla>round1
249999999
249999999
start = 0, end1 = 3796, end2 = 15125
roundint = 3796 milliseconds  NSToCoordRound = 11329 milliseconds
roundint percentage of NSToCoordRound = 33.506929

Michael Moy

Assignee

Comment 43

•

16 years ago

Attached file Test program for timing and checking results — Details

Mike Schroepfer

Comment 44

•

16 years ago

Michael - does this new version produce results exactly the same as the old one?

Mike Schroepfer

Comment 45

•

16 years ago

Comment on attachment 312209 [details]
Test program for timing and checking results

I'd like to try the new version to see affects on Tp - if anything goes screwy let's back it out asap and differ to .next..

Attachment #312209 - Flags: approval1.9+

Reed Loden [:reed]

Updated

•

16 years ago

Keywords: checkin-needed

Reed Loden [:reed]

Comment 46

•

16 years ago

hmm, schrep, did you perhaps mean to approve attachment 310454 [details] [diff] [review] instead?

Keywords: checkin-needed

Mike Schroepfer

Updated

•

16 years ago

Attachment #310454 - Flags: approval1.9? → approval1.9+

Mike Schroepfer

Comment 47

•

16 years ago

Yes - my greasemonkey script got the best of me..

Reed Loden [:reed]

Updated

•

16 years ago

Keywords: checkin-needed

Michael Moy

Assignee

Comment 48

•

16 years ago

A few comments:

I ran a comparison of values from -2^31 to 2^31 on Windows and found no differences in values. The test checks values in increments of 0.25. That's what the test program is for besides benchmarking. I modified the program this morning to directly use floats as the current attachment program uses a double in the loop and I needed to test with floats as that's the argument type. Everything was fine.

Things are quite different on Mac OSX though. On Mac OSX, my code is 2% slower because the codegen adds the floating point control stuff to my code and the floor function is apparently much faster on Mac OSX. I also ran the verification stuff and I think that the Mac OSX results are incorrect for the current code - that is the rounding isn't done correctly for some intervals below about -8 million and above +8 million. I'd have to do this code for just Windows as I think that I want to understand better what's going on with Mac OSX.

Reed Loden [:reed]

Comment 49

•

16 years ago

Checking in gfx/public/nsCoord.h;
/cvsroot/mozilla/gfx/public/nsCoord.h,v  <--  nsCoord.h
new revision: 1.13; previous revision: 1.12
done
Checking in xpcom/ds/nsMathUtils.h;
/cvsroot/mozilla/xpcom/ds/nsMathUtils.h,v  <--  nsMathUtils.h
new revision: 3.2; previous revision: 3.1
done

Status: ASSIGNED → RESOLVED

Closed: 16 years ago

Keywords: checkin-needed

Resolution: --- → FIXED

Target Milestone: --- → mozilla1.9

Bob Clary [:bc] (inactive)

Comment 50

•

16 years ago

I think this busted me on linux and several on the Firefox tbox:

In file included from ../../dist/include/layout/nsIPresShell.h:58,
                 from /work/mozilla/builds/1.9.0/mozilla/uriloader/base/nsDocLoader.cpp:58:
../../dist/include/gfx/nsCoord.h: In function ?nscoord NSToCoordRound(float)?:
../../dist/include/gfx/nsCoord.h:298: error: ?x? was not declared in this scope

Reed Loden [:reed]

Comment 51

•

16 years ago

No updated patch was attached that addressed comment #39, so this definitely burned when it hit the tree since I didn't see that comment. I took vlad's advice in comment #40 and fixed the problem that way, but I forgot to rename a variable, so that's what that issue is. I've renamed the variable, so it should (hopefully) work now.

Rick Stockton

Comment 52

•

16 years ago

WE'RE NOW FAILING ACID2, and I think that this change to round-off calculations did it.

On the small black squares on either side of the chin (not the forehead),, we now display TALLER than the reference image. The way the chin div height is calculated is influenced by the div inside of it- we round up to 13 px.

Rick Stockton

Comment 53

•

16 years ago

I see that new bug # 426616 has been created for this regression. I'm not sure that this code caused the change, but let's have someone smarter than me check it out...

Dão Gottwald [:dao]

Updated

•

16 years ago

Depends on: 426616

Reed Loden [:reed]

Updated

•

16 years ago

No longer depends on: 426616

Michael Moy

Assignee

Comment 54

•

16 years ago

I think that something is changing the FPU rounding mode and isn't changing it back which means that this assumption (that noone changes it) is incorrect. Could someone point me how I can run ACID2?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 55

•

16 years ago

This bug did not cause the problem. See https://bugzilla.mozilla.org/show_bug.cgi?id=426616#c9

Rick Stockton

Comment 56

•

16 years ago

Thanks, ROC, for taking a look and determining that this WASN'T the cause. I was unsure if the "hopefully final" patch of March 19 had ever actually been applied. And yes, you're right that I DID NOT not the failure occur right around the 4/1 patch-- it's been present for at least a week, and maybe months.

I'll head over to 426616 and see what smarter heads than mine have been brought to bear over there. :))

Nobody; OK to take it and work on it

Updated

•

16 years ago

Product: Core → Core Graveyard

(Away)

Updated

•

10 years ago

Depends on: 1024274

Accurate version of roundint 16 years ago Michael Moy 2.22 KB, patch		Details \| Diff \| Splinter Review
Accurate version of roundint 16 years ago Michael Moy 2.22 KB, patch		Details \| Diff \| Splinter Review
roundint patches for NSToIntRound and NSToCoordRound 16 years ago Michael Moy 2.46 KB, patch		Details \| Diff \| Splinter Review
Update for accurate version of NSIntToRound 16 years ago Michael Moy 2.51 KB, patch		Details \| Diff \| Splinter Review
Updated patch for nsCoord.h and nsMathUtils.h 16 years ago Michael Moy 3.37 KB, patch		Details \| Diff \| Splinter Review
Updated with comments and routine name changes 16 years ago Michael Moy 4.65 KB, patch		Details \| Diff \| Splinter Review
Final patch (hopefully) 16 years ago Michael Moy 3.06 KB, patch	roc : review+ roc : superreview+ mconnor : approval1.9b5- mtschrep : approval1.9+	Details \| Diff \| Splinter Review
Test program for timing and checking results 16 years ago Michael Moy 1.34 KB, text/plain	mtschrep : approval1.9+	Details