Closed Bug 61898 Opened 24 years ago Closed 14 years ago

JS number ops not optimized for integer domain

Category: Core :: JavaScript Engine (defect, P3)
Status: VERIFIED FIXED

Reporter: brendan; Assignee: Unassigned
Keywords: js1.5

Attachments

(15 files, all patches: 82.86 KB, 83.15 KB, 29.78 KB, 113.55 KB, 114.13 KB,
31.00 KB, 118.10 KB, 32.91 KB, 499 bytes, 118.09 KB, 96.53 KB, 2.07 KB,
5.91 KB, 5.86 KB, 2.65 KB)
The trick is testing whether an op will overflow and require a double, and
testing cheaply enough that the integer speedup outweighs the test and branch
penalties.

See bug 54743, the early comments.

/be
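The kind of cheap test meant here can be sketched in C (an illustrative sketch, not the actual patch; the function name and the widen-then-check approach are just one way to do it):

```c
#include <stdint.h>

/* Illustrative sketch: add two 32-bit ints, take the fast integer
 * path when the sum fits, and fall back to a double on overflow. */
static int add_maybe_overflow(int32_t a, int32_t b, int32_t *iout, double *dout)
{
    int64_t r = (int64_t)a + b;    /* widen so the sum can't wrap */
    if (r == (int32_t)r) {         /* still fits in 32 bits? */
        *iout = (int32_t)r;
        return 1;                  /* fast integer path taken */
    }
    *dout = (double)a + (double)b; /* rare slow path: promote to double */
    return 0;
}
```

A real engine tags small ints in its value representation, so the check must also verify the result still fits the tagged range; the principle (one compare plus a rarely-taken branch) is the same.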
Status: NEW → ASSIGNED
Keywords: js1.5, mozilla0.8
Target Milestone: --- → mozilla0.8
I'm likely to morph this bug into a JS performance catch-all, or (if I get more
scrupulous) file a bunch of bugs that the next patch fixes.  This patch brings
SpiderMonkey performance to the front rank in nombas.com's (questionable, to say
the least) benchmark mentioned at
http://support.nombas.com/scripts/rightnow.exe/getfaq?11-000818-0003.

/be
Oh, one more optimization, long overdue (and I wish I had let JSClass hooks be
stubbed by NULL, as that's even faster to test than == JS_PropertyStub): getter
and setter calls to JS_PropertyStub are avoided, to preserve any caller-saved
registers and pay for a branch rather than a function call.

The last patch broke XBL calls to JS_EvaluateUCScriptForPrincipals when cx->fp
was non-null.  Better patch coming up.

/be
Probably another patch coming, to fix the error reporting in jsemit.c (other
than JS_ReportOutOfMemory) to cite filename and lineno in an error report
struct.  There are two messages in js.msg that use "{0}, line {1}: " to imitate
parse-time errors reported by js_ReportCompileErrorNumber.

The better fix is to make js_ReportCompileErrorNumber take either a ts, or a cg,
and get filename and lineno from whichever is not null.  Then user-defined error
reporters receive report structs with useful filename and lineno members.

/be
The JSOP_ARGUMENTS, JSOP_ARGSUB, and JSOP_ARGCNT bytecodes optimize uses (but
not sets) of arguments, arguments[i], and arguments.length, respectively.  But
if someone assigns to arguments in a function, that predefined local variable
binds to a new value (the right-hand side of the assignment).  Formerly, the
compiler looked for such assignments and set TCF_FUN_SETS_ARGS in tc->flags
(also setting TCF_FUN_HEAVYWEIGHT, so that activations of the function get a
Call object).

But such compile-time analysis couldn't catch eval(someStr) where someStr at
run-time was set to 'arguments=42' or 'arguments=[1,2,3]'.  What's more, Call
object setter code in jsfun.c knows to set an override bit in fp->overrides for
this case, and the getter cooperates.  So the previous patch in this bug fixed
the runtime code in jsinterp.c for JSOP_ARGUMENTS to check whether arguments was
overridden.  If so, it gets the value (object or not!) from fp->callobj; if not,
it creates (if necessary) and gets fp->argsobj.

The same drill is needed for JSOP_ARGSUB and JSOP_ARGCNT, and the latest patch
fixes these cases, and eliminates the insufficient (because of potential eval)
TCF_FUN_SETS_ARGS flags, simplifying the compiler slightly.

Please review, this is all but done.

/be
Ok, now I'm done (I forgot to rerun the testsuite after the last change, which,
by removing the TCF_FUN_SETS_ARGS flag, "optimized" 'arguments = 42' into a
JSOP_ARGUMENTS call, leaving the stack unbalanced).

/be
To recap: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=23334 is the
full diff -u patch;
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=23333 is the two-liner
relative to the previous patch's jsemit.c, showing the JSOP_ARGUMENTS
specialization fix; and
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=23325 is the
still-valid diff -wu of jsinterp.c, where -w helps show the significant changes
(a lot of local macros got JS_BEGIN/END_MACRO brackets for consistency and
slight future-proofing).

Still looking for r= from rogerl and/or jband, then I'll mail shaver and cc:
reviewers@mozilla.org.

/be
I read it thru, applied it on Windows and ran the test suite. Everything looked 
fine. I'd prefer it if jband r='d also since this is so big.
Did you consider detecting integer binary operands and performing an integer 
operation, falling back to double only on overflow? 
Seems a waste to change reportCompilerError only to pass NULL most of the time?
>Did you consider detecting integer binary operands and performing an integer 
>operation, falling back to double only on overflow? 

That's what the patch does, for ++ and --.  It's harder (not by much for +) to
do the same for the binary operators, but it could be done.  Are you doing any
such for JS2?

>Seems a waste to change reportCompilerError only to pass NULL most of the time?

Yes, but I swag the code bloat of that extra argument as small, and the cycle
cost penalizes only error cases.  We could abstract file/line into some common
"base class" (poor man's MI in C using nested structs) but we would still want
to pass ts for the line buffer.  Unless we abstracted that too, and made the
cg-derivation get the line, somehow.  Hmm.  For now I'd like to get this in, but
I may file a bug asking for line buffer excerpting for error reports from the
code generator.

jband, can you r=?  Thanks.

/be
> Are you doing any such for JS2?

Well, sort of - when we can infer or know the type. In the general case our 
current prototype doesn't do anything different.

By the way, I stepped through the assembly for JSOP_SUB, and the typical path 
looks pretty good (even in the debug build).  The only sad part is the call to
_ftol (which is generated by '(jsint)d').  This routine is very careful to
handle pending FP exceptions as it manipulates the control word to set the 
rounding mode. If we could know we didn't care, we could reduce the code to:

fld   'd'           ; load the double value
fldcw gRZValue      ; some global variable containing the  round-to-zero mode
fistp 'v'           ; convert the double value to an integer 

If that could be inlined in place of the _ftol call, you'd save about 15
instructions each time JSDOUBLE_IS_INT is called.
rogerl: awesome, thanks for that (I don't know x86 fpu instructions, but I have
a book at home).  More in a bit.

/be
r=jband.  Any numbers on the artificial benchmarks to show that this makes much
difference?
jband: see my posting the other day in m.jseng.  Nombas engine numbers on my
Dell 500 MHz:

Global looping speed:      341327.99344650254352 repetitions / second
Local looping speed:       721709.006928406539373 repetitions / second
Call empty function:       256410.256410256406525 repetitions / second
Call function:             67759.8590594931592932 repetitions / second
Calculation speed:         75063.8042335985664977 repetitions / second
Object manipulation speed: 2752.09158960810236749 repetitions / second
ECMA string speed:         7012.62272089761609095 repetitions / second
Total time for test:       100.957 seconds 

SpiderMonkey numbers without the patch in this bug:


Global looping speed:      300144.06915319356 repetitions / second
Local looping speed:       483870.96774193546 repetitions / second
Call empty function:       366533.085719871 repetitions / second
Call function:             24669.429642786657 repetitions / second
Calculation speed:         57188.608029280564 repetitions / second
Object manipulation speed: 2655.1962190005843 repetitions / second
ECMA string speed:         11248.59392575928 repetitions / second
Total time for test:       113.261 seconds 

Results with this bug's patch:

Global looping speed:      446561.47662994935 repetitions / second
Local looping speed:       855578.3709787816 repetitions / second
Call empty function:       450342.26011768944 repetitions / second
Call function:             367107.19530102785 repetitions / second
Calculation speed:         166500.1665001665 repetitions / second
Object manipulation speed: 3032.140691328078 repetitions / second
ECMA string speed:         11296.882060551288 repetitions / second
Total time for test:       67.518 seconds 

I am gonna look hard at rogerl's suggestion, but after checking in the current
patch.  jband, can I take your r= as an sr=, or should I get shaver to sr=?

/be
Cool!

Sure. sr=jband or get a shaver-sized scrutiny first. Your choice.

Checked in, but keeping this open for followup work to avoid ftol.

/be
Sorry I couldn't review in time to help.  Great work!  Wanna see if the Nombas 
guys are willing to re-run their benchmark?  =) We should get this in a 1.51 
release -- or 1.5rc3, though they make it sound like we've already released 1.5.

I'll try to be more responsive if further (binary-op) optimizations come 
through.
brendan, I had to back this change out to fix a blocker bug.  see #66545.
I can't believe I busted 'for (var i in o) ...;' loops.  Phil, the testsuite
seems not to test this case.  I broke it badly.  Say i is the first local var in
the function: the loop doesn't iterate i, it iterates the first non-argument,
non-local-variable name in the function!  i has value undefined and never gets
set to the enumerable properties' ids.

The fix is to remove the check at the top of LookupArgOrVar that returns early
if pn->pn_slot >= 0 || pn->pn_op == JSOP_ARGUMENTS.  Because the parser already
specialized JSOP_NAME to JSOP_GETVAR and set pn_slot for all 'var i' parse trees
(including the 'var i' in 'for (var i in o)'), LookupArgOrVar will fail to
re-specialize from JSOP_FORNAME to JSOP_FORVAR.

I'm going to burn a reminder to test prefs UI every time before checking in any
scary JS changes.

The testsuite should cover this case several different ways, including testing
that the function converts to a string that evals to a function that has the
same output as the original.

/be
I will add coverage to the testsuite for these issues - 
I don't pretend to understand anything you wrote in your comments, Brendan :-), 
so maybe I missed it.
But has the memory usage also been reduced by this patch/other patches?
If you take a look at the Nombas page, they state that Mozilla's JS uses 3x as
much memory as their engine does in this case.
Talking to shaver (it was just like old times, I was getting verklempt) helped
me remember why the last patch, which made LookupArgOrVar idempotent, broke
things.  In the old days, JS compiled bytecode in one pass (with a little code
reordering to handle for(;;i++) loops -- the i++ can't be emitted in source
order).  The JSOP_GETVAR for JSOP_NAME, etc. bytecode specializations were done
in the parser.

Then came exception handling, which required a tree-building parser and bytecode
emitting code generator, for two pass compilation.  Some time after that change,
bytecode specialization moved to the back end to handle 'function f(){alert(x);
var x = 42}; var x = 2; f()', which must alert 'undefined', not 2 or 42, per
ECMA and original JS.  But the specializations done while parsing a 'var'
statement never moved, and they cause pn_slot to be set non-negative before any
call to LookupArgOrVar.

With the bad patch that made LookupArgOrVar idempotent by having it return early
if pn_slot >= 0, this left the code generator's necessary attempt to specialize
the 'var i' in 'for (var i in o)...' from a JSOP_FORNAME bytecode to JSOP_FORVAR
bound to fail, because the parser had already set i's pn_slot to 0 (assuming i
is the first local variable).  If the loop body ('...') is 'print(i)', and print
is the first non-arg/var name, then JSOP_FORNAME will iterate 'print' and not
'i' over the properties of o!

I hope this is all clear.  I'm attaching a patch to the patch (i.e., to this
morning's trunk before sspitzer backed me out), for easiest review.

/be
I didn't measure memory use in the Nombas vs. Mozilla head-to-head, but see bug
33390 for a fix to compiler bloat that should help.  If anyone wants to measure
and file bugs (on Mozilla and/or Nombas! ;-), please feel free.  I'll try to
measure after bug 33390 has been fixed, and after all this cycle-time
optimization dust has settled.

/be
roger, shaver, jband: bless my change, I need your benedictions!

/be
{r,sr}=jband Obviously Correct :)
{sr,r}=shaver.

I'm with jband, and a little verklempt myself.
I am being very careful now!

New patch-to-patch coming up, this one properly specializes JSOP_SETCONST to
JSOP_SETVAR (constant initialization inside a function needs no specialized
bytecode: local consts are readonly local vars, so SETVAR suffices; what's more,
the JSOP_SETCONST bytecode works only at top-level, where a specialized variant
of JSOP_SETNAME is required to overcome the JSPROP_READONLY attribute).

It also reports a new error, if you try to redeclare a formal parameter as a
const inside the function to which the formal belongs.

/be
Patched patch checked in.  On to ftol elimination.

/be
If (shudder) something goes wrong, please call me (home or cell, lots of people
have those numbers, they're in the AOL phonebook).  I can't take another
backout!  Seriously, I don't expect any backout-worthy problems.  Now where's my
pasture to be put out to?

/be
Hi!

I ran the tests on my Celeron 550 with the latest build and these are my 
results:

Spidermonkey:

Global looping speed:      319013.18587834964 repetitions / second
Local looping speed:       662544.1696113074 repetitions / second
Call empty function:       353773.58490566036 repetitions / second
Call function:             294117.64705882355 repetitions / second
Calculation speed:         162337.66233766233 repetitions / second
Object manipulation speed: 2112.3785382340516 repetitions / second
ECMA string speed:         7645.25993883792 repetitions / second
Total time for test:       91.02 seconds

Nombas:

Global looping speed:      521195.27449617785 repetitions / second
Local looping speed:       1091703.0567685589 repetitions / second
Call empty function:       464396.28482972138 repetitions / second
Call function:             104602.51046025104 repetitions / second
Calculation speed:         79239.302694136291 repetitions / second
Object manipulation speed: 4440.4973357015988 repetitions / second
ECMA string speed:         8833.9222614840983 repetitions / second
Total time for test:       65.42 seconds

IE

Global looping speed:      343092.406221409 repetitions / second
Local looping speed:       629194.6308724832 repetitions / second
Call empty function:       171115.67419575635 repetitions / second
Call function:             13175.230566534914 repetitions / second
Calculation speed:         92936.8029739777 repetitions / second
Object manipulation speed: 2225.1891410769917 repetitions / second
ECMA string speed:         10706.638115631691 repetitions / second
Total time for test:       148.14 seconds

While we now perform better than IE, I can't reproduce the improvement Brendan
saw in his benchmarks.  Is this because of the bug that was still in the version
Brendan used?  The checkin of the fix for bug 33390 might also have something to
do with it, although I'm not sure it's in the latest build yet.

On a side note: How much do these optimizations do for Real World Application 
(aka Mozilla)?
Christian: how are you building (debug or optimized), and what (js shell,
xpcshell, mozilla)?  Also, exactly how do you run the Nombas test?  I use the
samples/simple1 sample program.

/be
I downloaded build 2001012620, downloaded all the attachments from 
http://support.nombas.com/scripts/rightnow.exe/getfaq?11-000818-0003
and replaced the JS32.DLL with JS3250.DLL from the nightly.
I then used the commands described in the readme from Nombas, i.e.

jsshell32 speed.jse
secon32 speed.jse
cscript //nologo speed.jse

I made two test cycles, both gave me similar results.
I'm running on Linux, but that shouldn't help my numbers.  I am building the JS
shell from source, optimized.  Is it possible the jsshell32.exe you're running
is DEBUG?  Can you pull mozilla/js/src from cvs.mozilla.org and try making an
optimized shell yourself, using Release rather than Debug configuration?

/be
I built js from todays source with VC6 and nspr4 with nmake - both optimized.

Still no difference.
Perhaps someone else on the CC list using Win98 could try?
Ok. Now I'm annoyed! I thought I'd be nice and compare a current build with a 
timestamp pull from last week.  After building, I see it abort silently in
jsshell, both debug and release builds.  With some trial-and-error print debugging
in the speed test I see that it is aborting in the first loop test after j == 
4194304 but before the next loop body. Now 4194304 == 0x400000 and that looks 
interesting. I traced through the interpreter loop a bit and saw it bail in a 
JSOP_GOTO. It turns out that js.c sets a branch callback that looks like:

    if ((++branch_count & 0x3fffff) == 0)
        return JS_FALSE;
    return JS_TRUE;

I expect this is here to stop errant tests or something. But there is not even a 
printf or JS_ReportError or anything. Just silent abort of the script. I'm 
thinking this is not doing anyone any good. What do you users of jsshell want 
out of this callback?
Here is what I get on a 400 mhz PII running NT4 using:
  NMAKE /f "js.mak" CFG="jsshell - Win32 Release"
  Release\jsshell.exe speed.jse

With the current tree:

Global looping speed:      351765.8646404953 repetitions / second
Local looping speed:       674096.7104080532 repetitions / second
Call empty function:       354274.9173358526 repetitions / second
Call function:             283607.48723766307 repetitions / second
Calculation speed:         177683.01350390902 repetitions / second
Object manipulation speed: 1786.9907076483203 repetitions / second
ECMA string speed:         8448.800270361608 repetitions / second
Total time for test:       92.102 seconds

With a timestamp pull using cvs up -D 2001-1-23 :

Global looping speed:      254824.68061973364 repetitions / second
Local looping speed:       381718.2410423453 repetitions / second
Call empty function:       311655.9318512362 repetitions / second
Call function:             13101.008777675881 repetitions / second
Calculation speed:         41538.58934950569 repetitions / second
Object manipulation speed: 1613.1634134537828 repetitions / second
ECMA string speed:         8594.018563080097 repetitions / second
Total time for test:       160.16 seconds

This is similar to what brendan sees on Linux. The only change I made was to 
make my_BranchCallback never return JS_FALSE.
jband: thanks, and sorry for not mentioning that I too had to comment out the
branch callback (I did it a while ago and forgot).  I think the JS shell should
not even set that callback unless given an option.  I'll come up with a patch
and attach it here.

christian: you mention nspr4 -- are you compiling using js.mak, or makefile.win?
If the latter, perhaps JS_THREADSAFE is costing us too much again (although I
sweated bullets to make thread-safety nearly free, see bug 54743).  To verify,
how about trying js.mak?  Or compare xpcshell to jsshell?

/be
Brendan, if the JS shell user sets a branch limit, will there be any 
warning when the limit is attained? Or will it fail silently? 
If you are going to go to the trouble to fix up this branch callback stuff then 
it would be good to output a message when it bails (as Phil noted). Also, it is 
odd that gBranchCount never gets reset - maybe it should get set to 0 in 
Process?
with or without those changes... sr=jband. Thanks.
I reset gBranchCount and output a message to gErrFile blaming the script's file
and first line.

/be
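A callback along those lines might look roughly like this (an illustrative sketch: the signature is simplified for the example, since the real JSBranchCallback takes a JSContext and script, and gBranchLimit is an invented knob):

```c
#include <stdio.h>

/* Sketch of a shell branch callback that reports and resets its
 * counter instead of silently aborting the script.  gBranchCount
 * and gBranchLimit are illustrative names, not the checked-in code. */
static unsigned long gBranchCount = 0;
static unsigned long gBranchLimit = 0x400000;

static int my_BranchCallback(const char *filename, unsigned lineno)
{
    if (++gBranchCount == gBranchLimit) {
        fprintf(stderr, "%s:%u: script runs too long, terminating\n",
                filename, lineno);
        gBranchCount = 0;   /* reset so the next script gets a full budget */
        return 0;           /* JS_FALSE: stop execution */
    }
    return 1;               /* JS_TRUE: keep going */
}
```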
I used js.mak for JS and makefile.win for NSPR4.

I made the same comparison as jband did with the tree from 23.1.2001. My results 
were about the same, but with a faster PC.
I just saw that my DLLs are 2x as large as the downloaded ones, although I'm 
definitely building optimized. (No MOZ_DEBUG set and it builds in 
dist/WIN32_O.OBJ). Is there any other optimization setting than this?
If you're building the JS shell, you don't need NSPR at all.  If you are
building an optimized ("Release") JS shell on windows, the .exe ends up in the
Release subdirectory of js/src, IIRC.  Something's amiss; I still suspect you
are building a debug shell and perhaps uselessly linking it with an optimized
NSPR library.

/be
Christian, any news?

I'm moving the just-the-conversion-part-of-ftol inlining that rogerl suggested
to mozilla0.9.  0.8 has most of the fixings for this bug.

/be
Keywords: mozilla0.8mozilla0.9
Target Milestone: mozilla0.8 → mozilla0.9
I upgraded my PC and unfortunately, JSSHELL has become somewhat weird after 
that.
I compiled exactly as jband did, with
NMAKE /f "js.mak" CFG="jsshell - Win32 Release"
and used the JSSHELL and JS32.DLL out of the release-directory.
When I now start the benchmark, it says:
"starting test."
Then it sits there for a moment and after 10 secs or so it just exits, without 
giving any further info. Is there a way to get some more debug-output without 
changing the code of the benchmark?

XPCSHELL from one of the Mozilla nightlies works fine, though.
Christian: I'm betting you don't have the newest js.c with brendan's 
branchCallback changes mentioned earlier in this bug.
After a fresh pull it worked, but the results haven't changed:
Nombas:
Global looping speed:      1050420.1680672269 repetitions / second
Local looping speed:       2010723.8605898123 repetitions / second
Call empty function:       881316.09870740306 repetitions / second
Call function:             221238.93805309737 repetitions / second
Calculation speed:         159744.40894568691 repetitions / second
Object manipulation speed: 8849.5575221238923 repetitions / second
ECMA string speed:         18518.518518518518 repetitions / second
Total time for test:       33.12 seconds

Today's Seamonkey:
Global looping speed:      505731.6250842886 repetitions / second
Local looping speed:       1111111.111111111 repetitions / second
Call empty function:       601443.4643143545 repetitions / second
Call function:             480769.23076923075 repetitions / second
Calculation speed:         292397.6608187135 repetitions / second
Object manipulation speed: 3138.7319522912744 repetitions / second
ECMA string speed:         16556.29139072848 repetitions / second
Total time for test:       55.75 seconds

Note that this is on a Thunderbird 1 Ghz.

JS32.DLL is 442.368 bytes,
JSSHELL.EXE is 20.480 bytes.
Christian: It is not clear to me that your numbers show "the results haven't
changed".  You show old numbers from a machine different from the one you are
using now, and the new numbers only compare SM and Nombas.  We are looking
for SM improvements in the specific areas where the code has been changed:
integers used in loop counters, etc.  Can you compare a recent SpiderMonkey
build with an old one on the same machine?  That is what I did, and I saw real
improvement.  Also, are you sure you are really using your newly built DLL when
you run?  If you have path issues, the exe might load the old one without
your knowing it.

The file size of 442.368 bytes you show for the dll looks odd to me. In a build
today I get:

> Release:
>  2/19/01  14:49         335,872  js32.dll
>  2/19/01  14:49          20,480  jsshell.exe
> 
> Debug:
>  2/19/01  14:46         696,375  js32.dll
>  2/19/01  14:46          45,114  jsshell.exe
My point was that according to Brendan's first test, the patched JS should be 
faster than Nombas by ~33%, while on my PC, it was still slower (although the 
patch certainly made things faster).
But you're probably correct when you say that something is wrong with my build. 
When I build Mozilla myself, it's about 1.5 times as large as the nightlies.
It's definitely not a debug-build I'm doing, but something is wrong. Could this 
be because I'm on 98 and not NT?
Are we interested in optimizing the addition and subtraction cases as well? 
They seem relatively straightforward, at first glance.
This one is good for kenton -- what's his bugzilla id?  I can't find one by
searching for his first name.  Reassigning to beard to pass along.

/be
Assignee: brendan → beard
Status: ASSIGNED → NEW
Keywords: mozilla0.9mozilla0.9.1
Target Milestone: mozilla0.9 → mozilla0.9.1
Kenton, sorry to dump such a long bug on you.  See rogerl's comments dated
2001-01-24 14:57 and let me know if I can help.

/be
Assignee: beard → khanson
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Moving TFV.
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Target Milestone: mozilla0.9.4 → mozilla0.9.5
I'm reading a book now that has an interesting tidbit about float->int
conversion (Game Programming Gems II; the article is "Floating-Point Tricks:
Improving Performance with IEEE Floating Point" by Yossarian King).

The trick in question manages to get "(int)f" from 60 cycles on a P2 down to
just 5 -- faster, he reports, than using the FPU without changing the rounding
mode.  I'm not sure if we can massage it into working for our double-precision
numbers, but I thought I'd throw it out for consideration by our IEEE gurus.

typedef union {
    int   i;
    float f;
} IntOrFloat;

IntOrFloat bias; // magic bias value to force the bits around nicely
bias.i = (23 + 127) << 23;

#define FAST_FTOI(f, i)                                                       \
{                                                                             \
    IntOrFloat n; n.f = f;                                                    \
    n.f += bias.f;                                                            \
    n.i -= bias.i;                                                            \
    i = n.i;                                                                  \
}

"Why does this work?  Adding 1 x 2**23 as a floating-point number pushes the
mantissa into the lower 23 bits, setting the exponent to a known value (23 +
127).  Subtracting the known exponent as an integer removes these unwanted upper
bits, leaving the desired integer in the low bits of the result."

(For more detail, including diagrams, people should probably read the book.)

Here's what I get with gcc 2.96-85 on -O2:
    int i;
    IntOrFloat n;
    n.f = f;
    n.f += bias.f;
 8048614:       d9 05 4c 98 04 08       flds   0x804984c
 804861a:       d8 45 08                fadds  0x8(%ebp)
 804861d:       d9 5d fc                fstps  0xfffffffc(%ebp)
 8048620:       8b 55 fc                mov    0xfffffffc(%ebp),%edx
 8048623:       89 d0                   mov    %edx,%eax
    n.i -= bias.i;
 8048625:       2b 05 4c 98 04 08       sub    0x804984c,%eax
And for the simple cast:

 8048636:       d9 45 08                flds   0x8(%ebp)
    int i = (int)f;
 8048639:       d9 7d fc                fnstcw 0xfffffffc(%ebp)
 804863c:       8b 55 fc                mov    0xfffffffc(%ebp),%edx
 804863f:       c6 45 fd 0c             movb   $0xc,0xfffffffd(%ebp)
 8048643:       d9 6d fc                fldcw  0xfffffffc(%ebp)
 8048646:       89 55 fc                mov    %edx,0xfffffffc(%ebp)
 8048649:       db 5d f8                fistpl 0xfffffff8(%ebp)
 804864c:       d9 6d fc                fldcw  0xfffffffc(%ebp)
 804864f:       8b 45 f8                mov    0xfffffff8(%ebp),%eax
It will take a better man than me to count the execution cycles for those
(taking into account, as the author suggests, integer-vs-FP pipelining
behaviours), but it does seem to work well for floats.
This trick works for positive numbers less than 2^23.  However, to be robust,
one would have to add logic to handle 1) negative values, 2) magnitudes greater
than 2^23, 3) infinities, and 4) NaNs.  Also, the rounding direction may cause
unanticipated behavior: typically floats are truncated when converted to
integers, but the default rounding mode would round the result to nearest.
When all these special cases are included, the speedup may not be so great.
BTW, the algorithm works for 64-bit integers and doubles.

Tricks like this are typically used in game software for pixel graphics
calculations, where the float is known to be a positive number less than 2^23
and the desired rounding is round-to-nearest.
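For what it's worth, the double-precision analogue mentioned above can be sketched like this (an illustrative sketch: it assumes IEEE 754 doubles, |d| < 2^31, rounds to nearest rather than truncating, and the function name is invented):

```c
#include <stdint.h>
#include <string.h>

/* Bias trick for doubles: adding 2^52 + 2^51 forces the integer part
 * of d into the low mantissa bits, so the low 32 bits of the result's
 * bit pattern hold the rounded (to nearest) integer.  Valid only for
 * |d| < 2^31; no handling of NaN or Infinity. */
static int32_t fast_dtoi(double d)
{
    double biased = d + 6755399441055744.0;  /* 2^52 + 2^51 */
    int64_t bits;
    memcpy(&bits, &biased, sizeof bits);     /* reinterpret without UB */
    return (int32_t)bits;                    /* low 32 bits = result */
}
```

Note the same caveats apply as for the float version: round-to-nearest, not truncation, and no range or NaN checks, so a robust JSDOUBLE_IS_INT replacement would still need guard logic around it.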
This bug is marked 0.9.5, and that time is now.

Does anybody think that any of these patches are important to land now?

If not, let's mark this 0.9.7, which is the next milestone for performance
fixes.
Steve: all patches in this bug have landed, early this year.  Where did you read
that 0.9.7 is the next milestone for performance fixes?  I don't see that in the
Mozilla roadmap.

/be
Brendan:

I'm not sure if I've presumed too much or not.  This was from an 8/29 e-mail from
Steve Elmer, the subject of which was "Mach V Roadmap *1st DRAFT*", and which was
asking for buy-in. Upon re-reading, it also says

" Note:  Mozilla feedback is still pending.  The milestones represented here are
projections that have not been confirmed by Mozilla ... I'm still working on
lining this schedule up against the Mozilla 1.0 effort, it looks like we're
pretty close based on Brendan's recent email."

I have a vague, unsubstantiated, and hopefully wrong feeling that I have
unwittingly stepped into something... (hint: four letters, and it begins with an
`s').  I'm just trying to do the right thing with whatever information I have,
however old or unfinalized that information may be, and am willing to be
educated on whatever scheduling information I do need.

Thanks for the other info. If all patches are landed, should this bug be marked
as fixed? Or do we want to keep this as a performance catch-all? (If so, which
target milestone should we mark it for?)

Steve
This bug is overlong; you guys should feel free to file a sequel bug.  However,
as I morphed it, it is not a catch-all: it asks for the interpreter to use the
integer ALU and avoid hitting the FPU, where possible.  Currently, SpiderMonkey
uses doubles freely, for simplicity.

/be
Target Milestone: mozilla0.9.5 → mozilla0.9.6
retargeting for future.
Target Milestone: mozilla0.9.6 → mozilla0.9.7
This doesn't belong in 9.7, 9.8 or 1.0
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.7 → Future
Blocks: 149801
Blocks: 156155
No longer blocks: 149801
Assignee: khanson → general
Status: ASSIGNED → NEW
QA Contact: pschwartau → general
Target Milestone: Future → ---
I believe this is fixed... Brendan, can you reopen if I'm missing something here?
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Yes, this is fixed by the JITs and fatvals.

/be
Status: RESOLVED → VERIFIED