Closed Bug 61898 Opened 24 years ago Closed 14 years ago

JS number ops not optimized for integer domain

Category: Core :: JavaScript Engine (defect, P3)
Status: VERIFIED FIXED

Reporter: brendan; Assignee: Unassigned
Keywords: js1.5

Attachments

(15 files, all patches: 82.86 KB, 83.15 KB, 29.78 KB, 113.55 KB, 114.13 KB,
31.00 KB, 118.10 KB, 32.91 KB, 499 bytes, 118.09 KB, 96.53 KB, 2.07 KB,
5.91 KB, 5.86 KB, 2.65 KB)
The trick is testing whether an op will overflow and require a double, and
testing cheaply enough that the integer speedup outweighs the test and branch
penalties.

See bug 54743, the early comments.

/be
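The kind of cheap test meant here can be sketched in C (an illustrative sketch, not the actual patch; the function name and the widen-then-check approach are just one way to do it):

```c
#include <stdint.h>

/* Illustrative sketch: add two 32-bit ints, take the fast integer
 * path when the sum fits, and fall back to a double on overflow. */
static int add_maybe_overflow(int32_t a, int32_t b, int32_t *iout, double *dout)
{
    int64_t r = (int64_t)a + b;    /* widen so the sum can't wrap */
    if (r == (int32_t)r) {         /* still fits in 32 bits? */
        *iout = (int32_t)r;
        return 1;                  /* fast integer path taken */
    }
    *dout = (double)a + (double)b; /* rare slow path: promote to double */
    return 0;
}
```

A real engine tags small ints in its value representation, so the check must also verify the result still fits the tagged range; the principle (one compare plus a rarely-taken branch) is the same.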
Status: NEW → ASSIGNED
Keywords: js1.5, mozilla0.8
Target Milestone: --- → mozilla0.8
I'm likely to morph this bug into a JS performance catch-all, or (if I get more
scrupulous) file a bunch of bugs that the next patch fixes.  This patch brings
SpiderMonkey performance to the front rank in nombas.com's (questionable, to say
the least) benchmark mentioned at
http://support.nombas.com/scripts/rightnow.exe/getfaq?11-000818-0003.

/be
Oh, one more optimization, long overdue (and I wish I had let JSClass hooks be
stubbed by NULL, as that's even faster to test than == JS_PropertyStub): getter
and setter calls to JS_PropertyStub are avoided, to preserve any caller-saved
registers and pay for a branch rather than a function call.

The last patch broke XBL calls to JS_EvaluateUCScriptForPrincipals when cx->fp
was non-null.  Better patch coming up.

/be
Probably another patch coming, to fix the error reporting in jsemit.c (other
than JS_ReportOutOfMemory) to cite filename and lineno in an error report
struct.  There are two messages in js.msg that use "{0}, line {1}: " to imitate
parse-time errors reported by js_ReportCompileErrorNumber.

The better fix is to make js_ReportCompileErrorNumber take either a ts, or a cg,
and get filename and lineno from whichever is not null.  Then user-defined error
reporters receive report structs with useful filename and lineno members.

/be
The JSOP_ARGUMENTS, JSOP_ARGSUB, and JSOP_ARGCNT bytecodes optimize uses (but
not sets) of arguments, arguments[i], and arguments.length, respectively.  But
if someone assigns to arguments in a function, that predefined local variable
binds to a new value (the right-hand side of the assignment).  Formerly, the
compiler looked for such assignments and set TCF_FUN_SETS_ARGS in tc->flags
(also setting TCF_FUN_HEAVYWEIGHT, so that activations of the function get a
Call object).

But such compile-time analysis couldn't catch eval(someStr) where someStr at
run-time was set to 'arguments=42' or 'arguments=[1,2,3]'.  What's more, Call
object setter code in jsfun.c knows to set an override bit in fp->overrides for
this case, and the getter cooperates.  So the previous patch in this bug fixed
the runtime code in jsinterp.c for JSOP_ARGUMENTS to check whether arguments was
overridden.  If so, it gets the value (object or not!) from fp->callobj; if not,
it creates (if necessary) and gets fp->argsobj.

The same drill is needed for JSOP_ARGSUB and JSOP_ARGCNT, and the latest patch
fixes these cases, and eliminates the insufficient (because of potential eval)
TCF_FUN_SETS_ARGS flags, simplifying the compiler slightly.

Please review, this is all but done.

/be
Ok, now I'm done (I forgot to rerun the testsuite after the last change, which,
by removing the TCF_FUN_SETS_ARGS flag, "optimized" 'arguments = 42' into a
JSOP_ARGUMENTS call, leaving the stack unbalanced).

/be
To recap: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=23334 is the
full diff -u patch;
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=23333 is the two-liner
relative to the previous patch's jsemit.c, showing the JSOP_ARGUMENTS
specialization fix; and
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=23325 is the
still-valid diff -wu of jsinterp.c, where -w helps show the significant changes
(a lot of local macros got JS_BEGIN/END_MACRO brackets for consistency and
slight future-proofing).

Still looking for r= from rogerl and/or jband, then I'll mail shaver and cc:
reviewers@mozilla.org.

/be
I read it thru, applied it on Windows and ran the test suite. Everything looked 
fine. I'd prefer it if jband r='d also since this is so big.
Did you consider detecting integer binary operands and performing an integer 
operation, falling back to double only on overflow? 
Seems a waste to change reportCompilerError only to pass NULL most of the time?
>Did you consider detecting integer binary operands and performing an integer 
>operation, falling back to double only on overflow? 

That's what the patch does, for ++ and --.  It's harder (not by much for +) to
do the same for the binary operators, but it could be done.  Are you doing any
such for JS2?

>Seems a waste to change reportCompilerError only to pass NULL most of the time?

Yes, but I swag the code bloat of that extra argument as small, and the cycle
cost penalizes only error cases.  We could abstract file/line into some common
"base class" (poor man's MI in C using nested structs) but we would still want
to pass ts for the line buffer.  Unless we abstracted that too, and made the
cg-derivation get the line, somehow.  Hmm.  For now I'd like to get this in, but
I may file a bug asking for line buffer excerpting for error reports from the
code generator.

jband, can you r=?  Thanks.

/be
> Are you doing any such for JS2?

Well, sort of - when we can infer or know the type. In the general case our 
current prototype doesn't do anything different.

By the way, I stepped through the assembly for JSOP_SUB, and the typical path 
looks pretty good (even in the debug build).  The only sad part is the call to
_ftol (which is generated by '(jsint)d').  This routine is very careful to
handle pending FP exceptions as it manipulates the control word to set the 
rounding mode. If we could know we didn't care, we could reduce the code to:

fld   'd'           ; load the double value
fldcw gRZValue      ; some global variable containing the  round-to-zero mode
fistp 'v'           ; convert the double value to an integer 

If that could be inlined in place of the _ftol call, you'd save about 15
instructions each time JSDOUBLE_IS_INT is called.
rogerl: awesome, thanks for that (I don't know x86 fpu instructions, but I have
a book at home).  More in a bit.

/be
r=jband.  Any numbers on the artificial benchmarks to show that this makes much
difference?
jband: see my posting the other day in m.jseng.  Nombas engine numbers on my
Dell 500 MHz:

Global looping speed:      341327.99344650254352 repetitions / second
Local looping speed:       721709.006928406539373 repetitions / second
Call empty function:       256410.256410256406525 repetitions / second
Call function:             67759.8590594931592932 repetitions / second
Calculation speed:         75063.8042335985664977 repetitions / second
Object manipulation speed: 2752.09158960810236749 repetitions / second
ECMA string speed:         7012.62272089761609095 repetitions / second
Total time for test:       100.957 seconds 

SpiderMonkey numbers without the patch in this bug:


Global looping speed:      300144.06915319356 repetitions / second
Local looping speed:       483870.96774193546 repetitions / second
Call empty function:       366533.085719871 repetitions / second
Call function:             24669.429642786657 repetitions / second
Calculation speed:         57188.608029280564 repetitions / second
Object manipulation speed: 2655.1962190005843 repetitions / second
ECMA string speed:         11248.59392575928 repetitions / second
Total time for test:       113.261 seconds 

Results with this bug's patch:

Global looping speed:      446561.47662994935 repetitions / second
Local looping speed:       855578.3709787816 repetitions / second
Call empty function:       450342.26011768944 repetitions / second
Call function:             367107.19530102785 repetitions / second
Calculation speed:         166500.1665001665 repetitions / second
Object manipulation speed: 3032.140691328078 repetitions / second
ECMA string speed:         11296.882060551288 repetitions / second
Total time for test:       67.518 seconds 

I am gonna look hard at rogerl's suggestion, but after checking in the current
patch.  jband, can I take your r= as an sr=, or should I get shaver to sr=?

/be
Cool!

Sure. sr=jband or get a shaver-sized scrutiny first. Your choice.

Checked in, but keeping this open for followup work to avoid ftol.

/be
Sorry I couldn't review in time to help.  Great work!  Wanna see if the Nombas 
guys are willing to re-run their benchmark?  =) We should get this in a 1.51 
release -- or 1.5rc3, though they make it sound like we've already released 1.5.

I'll try to be more responsive if further (binary-op) optimizations come 
through.
brendan, I had to back this change out to fix a blocker bug.  see #66545.
I can't believe I busted 'for (var i in o) ...;' loops.  Phil, the testsuite
seems not to test this case.  I broke it badly.  Say i is the first local var in
the function: the loop doesn't iterate i, it iterates the first non-argument,
non-local-variable name in the function!  i has value undefined and never gets
set to the enumerable properties' ids.

The fix is to remove the check at the top of LookupArgOrVar that returns early
if pn->pn_slot >= 0 || pn->pn_op == JSOP_ARGUMENTS.  Because the parser already
specialized JSOP_NAME to JSOP_GETVAR and set pn_slot for all 'var i' parse trees
(including the 'var i' in 'for (var i in o)'), LookupArgOrVar will fail to
re-specialize from JSOP_FORNAME to JSOP_FORVAR.

I'm going to burn a reminder to test prefs UI every time before checking in any
scary JS changes.

The testsuite should cover this case several different ways, including testing
that the function converts to a string that evals to a function that has the
same output as the original.

/be
I will add coverage to the testsuite for these issues - 
I don't pretend to understand anything you wrote in your comments, Brendan :-), 
so maybe I missed it.
But has the memory usage also been reduced by this patch/other patches?
If you take a look at the Nombas page, they state that Mozilla's JS uses 3x as
much memory as their engine does in this case.
Talking to shaver (it was just like old times, I was getting verklempt) helped
me remember why the last patch, which made LookupArgOrVar idempotent, broke
things.  In the old days, JS compiled bytecode in one pass (with a little code
reordering to handle for(;;i++) loops -- the i++ can't be emitted in source
order).  The JSOP_GETVAR for JSOP_NAME, etc. bytecode specializations were done
in the parser.

Then came exception handling, which required a tree-building parser and bytecode
emitting code generator, for two pass compilation.  Some time after that change,
bytecode specialization moved to the back end to handle 'function f(){alert(x);
var x = 42}; var x = 2; f()', which must alert 'undefined', not 2 or 42, per
ECMA and original JS.  But the specializations done while parsing a 'var'
statement never moved, and they cause pn_slot to be set non-negative before any
call to LookupArgOrVar.

With the bad patch that made LookupArgOrVar idempotent by having it return early
if pn_slot >= 0, this left the code generator's necessary attempt to specialize
the 'var i' in 'for (var i in o)...' from a JSOP_FORNAME bytecode to JSOP_FORVAR
bound to fail, because the parser had already set i's pn_slot to 0 (assuming i
is the first local variable).  If the loop body ('...') is 'print(i)', and print
is the first non-arg/var name, then JSOP_FORNAME will iterate 'print' and not
'i' over the properties of o!

I hope this is all clear.  I'm attaching a patch to the patch (i.e., to this
morning's trunk before sspitzer backed me out), for easiest review.

/be
I didn't measure memory use in the Nombas vs. Mozilla head-to-head, but see bug
33390 for a fix to compiler bloat that should help.  If anyone wants to measure
and file bugs (on Mozilla and/or Nombas! ;-), please feel free.  I'll try to
measure after bug 33390 has been fixed, and after all this cycle-time
optimization dust has settled.

/be
roger, shaver, jband: bless my change, I need your benedictions!

/be
{r,sr}=jband Obviously Correct :)
{sr,r}=shaver.

I'm with jband, and a little verklempt myself.
I am being very careful now!

New patch-to-patch coming up, this one properly specializes JSOP_SETCONST to
JSOP_SETVAR (constant initialization inside a function needs no specialized
bytecode: local consts are readonly local vars, so SETVAR suffices; what's more,
the JSOP_SETCONST bytecode works only at top-level, where a specialized variant
of JSOP_SETNAME is required to overcome the JSPROP_READONLY attribute).

It also reports a new error, if you try to redeclare a formal parameter as a
const inside the function to which the formal belongs.

/be
Patched patch checked in.  On to ftol elimination.

/be
If (shudder) something goes wrong, please call me (home or cell, lots of people
have those numbers, they're in the AOL phonebook).  I can't take another
backout!  Seriously, I don't expect any backout-worthy problems.  Now where's my
pasture to be put out to?

/be
Hi!

I ran the tests on my Celeron 550 with the latest build and these are my 
results:

Spidermonkey:

Global looping speed:      319013.18587834964 repetitions / second
Local looping speed:       662544.1696113074 repetitions / second
Call empty function:       353773.58490566036 repetitions / second
Call function:             294117.64705882355 repetitions / second
Calculation speed:         162337.66233766233 repetitions / second
Object manipulation speed: 2112.3785382340516 repetitions / second
ECMA string speed:         7645.25993883792 repetitions / second
Total time for test:       91.02 seconds

Nombas:

Global looping speed:      521195.27449617785 repetitions / second
Local looping speed:       1091703.0567685589 repetitions / second
Call empty function:       464396.28482972138 repetitions / second
Call function:             104602.51046025104 repetitions / second
Calculation speed:         79239.302694136291 repetitions / second
Object manipulation speed: 4440.4973357015988 repetitions / second
ECMA string speed:         8833.9222614840983 repetitions / second
Total time for test:       65.42 seconds

IE

Global looping speed:      343092.406221409 repetitions / second
Local looping speed:       629194.6308724832 repetitions / second
Call empty function:       171115.67419575635 repetitions / second
Call function:             13175.230566534914 repetitions / second
Calculation speed:         92936.8029739777 repetitions / second
Object manipulation speed: 2225.1891410769917 repetitions / second
ECMA string speed:         10706.638115631691 repetitions / second
Total time for test:       148.14 seconds

While we now perform better than IE, I can't reproduce the improvement Brendan
saw in his benchmarks.  Is this because of the bug that was still in the version
Brendan used?  The checkin of the fix for bug 33390 might also have something to
do with it, although I'm not sure it's in the latest build yet.

On a side note: How much do these optimizations do for Real World Application 
(aka Mozilla)?
Christian: how are you building (debug or optimized), and what (js shell,
xpcshell, mozilla)?  Also, exactly how do you run the Nombas test?  I use the
samples/simple1 sample program.

/be
I downloaded build 2001012620, downloaded all the attachments from 
http://support.nombas.com/scripts/rightnow.exe/getfaq?11-000818-0003
and replaced the JS32.DLL with JS3250.DLL from the nightly.
I then used the commands described in the readme from Nombas, i.e.

jsshell32 speed.jse
secon32 speed.jse
cscript //nologo speed.jse

I made two test cycles, both gave me similar results.
I'm running on Linux, but that shouldn't help my numbers.  I am building the JS
shell from source, optimized.  Is it possible the jsshell32.exe you're running
is DEBUG?  Can you pull mozilla/js/src from cvs.mozilla.org and try making an
optimized shell yourself, using Release rather than Debug configuration?

/be
I built js from todays source with VC6 and nspr4 with nmake - both optimized.

Still no difference.
Perhaps someone else on the CC list using Win98 could try?
Ok. Now I'm annoyed! I thought I'd be nice and compare a current build with a 
timestamp pull from last week.  After building, I see it abort silently in
jsshell, both debug and release builds.  With some trial-and-error print debugging
in the speed test I see that it is aborting in the first loop test after j == 
4194304 but before the next loop body. Now 4194304 == 0x400000 and that looks 
interesting. I traced through the interpreter loop a bit and saw it bail in a 
JSOP_GOTO. It turns out that js.c sets a branch callback that looks like:

    if ((++branch_count & 0x3fffff) == 0)
        return JS_FALSE;
    return JS_TRUE;

I expect this is here to stop errant tests or something. But there is not even a 
printf or JS_ReportError or anything. Just silent abort of the script. I'm 
thinking this is not doing anyone any good. What do you users of jsshell want 
out of this callback?
Here is what I get on a 400 mhz PII running NT4 using:
  NMAKE /f "js.mak" CFG="jsshell - Win32 Release"
  Release\jsshell.exe speed.jse

With the current tree:

Global looping speed:      351765.8646404953 repetitions / second
Local looping speed:       674096.7104080532 repetitions / second
Call empty function:       354274.9173358526 repetitions / second
Call function:             283607.48723766307 repetitions / second
Calculation speed:         177683.01350390902 repetitions / second
Object manipulation speed: 1786.9907076483203 repetitions / second
ECMA string speed:         8448.800270361608 repetitions / second
Total time for test:       92.102 seconds

With a timestamp pull using cvs up -D 2001-1-23 :

Global looping speed:      254824.68061973364 repetitions / second
Local looping speed:       381718.2410423453 repetitions / second
Call empty function:       311655.9318512362 repetitions / second
Call function:             13101.008777675881 repetitions / second
Calculation speed:         41538.58934950569 repetitions / second
Object manipulation speed: 1613.1634134537828 repetitions / second
ECMA string speed:         8594.018563080097 repetitions / second
Total time for test:       160.16 seconds

This is similar to what brendan sees on Linux. The only change I made was to 
make my_BranchCallback never return JS_FALSE.
jband: thanks, and sorry for not mentioning that I too had to comment out the
branch callback (I did it a while ago and forgot).  I think the JS shell should
not even set that callback unless given an option.  I'll come up with a patch
and attach it here.

christian: you mention nspr4 -- are you compiling using js.mak, or makefile.win?
If the latter, perhaps JS_THREADSAFE is costing us too much again (although I
sweated bullets to make thread-safety nearly free, see bug 54743).  To verify,
how about trying js.mak?  Or compare xpcshell to jsshell?

/be
Brendan, if the JS shell user sets a branch limit, will there be any 
warning when the limit is attained? Or will it fail silently? 
If you are going to go to the trouble to fix up this branch callback stuff then 
it would be good to output a message when it bails (as Phil noted). Also, it is 
odd that gBranchCount never gets reset - maybe it should get set to 0 in 
Process?
with or without those changes... sr=jband. Thanks.
I reset gBranchCount and output a message to gErrFile blaming the script's file
and first line.

/be
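A callback along those lines might look roughly like this (an illustrative sketch: the signature is simplified for the example, since the real JSBranchCallback takes a JSContext and script, and gBranchLimit is an invented knob):

```c
#include <stdio.h>

/* Sketch of a shell branch callback that reports and resets its
 * counter instead of silently aborting the script.  gBranchCount
 * and gBranchLimit are illustrative names, not the checked-in code. */
static unsigned long gBranchCount = 0;
static unsigned long gBranchLimit = 0x400000;

static int my_BranchCallback(const char *filename, unsigned lineno)
{
    if (++gBranchCount == gBranchLimit) {
        fprintf(stderr, "%s:%u: script runs too long, terminating\n",
                filename, lineno);
        gBranchCount = 0;   /* reset so the next script gets a full budget */
        return 0;           /* JS_FALSE: stop execution */
    }
    return 1;               /* JS_TRUE: keep going */
}
```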
I used js.mak for JS and makefile.win for NSPR4.

I made the same comparison as jband did with the tree from 23.1.2001. My results 
were about the same, but with a faster PC.
I just saw that my DLLs are 2x as large as the downloaded ones, although I'm 
definitely building optimized. (No MOZ_DEBUG set and it builds in 
dist/WIN32_O.OBJ). Is there any other optimization setting than this?
If you're building the JS shell, you don't need NSPR at all.  If you are
building an optimized ("Release") JS shell on windows, the .exe ends up in the
Release subdirectory of js/src, IIRC.  Something's amiss; I still suspect you
are building a debug shell and perhaps uselessly linking it with an optimized
NSPR library.

/be
Christian, any news?

I'm moving the just-the-conversion-part-of-ftol inlining that rogerl suggested
to mozilla0.9.  0.8 has most of the fixings for this bug.

/be
Keywords: mozilla0.8mozilla0.9
Target Milestone: mozilla0.8 → mozilla0.9
I upgraded my PC and unfortunately, JSSHELL has become somewhat weird after 
that.
I compiled exactly as jband did, with
NMAKE /f "js.mak" CFG="jsshell - Win32 Release"
and used the JSSHELL and JS32.DLL out of the release-directory.
When I now start the benchmark, it says:
"starting test."
Then it sits there for a moment and after 10 secs or so it just exits, without 
giving any further info. Is there a way to get some more debug-output without 
changing the code of the benchmark?

XPCSHELL from one of the Mozilla nightlies works fine, though.
Christian: I'm betting you don't have the newest js.c with brendan's 
branchCallback changes mentioned earlier in this bug.
After a fresh pull it worked, but the results haven't changed:
Nombas:
Global looping speed:      1050420.1680672269 repetitions / second
Local looping speed:       2010723.8605898123 repetitions / second
Call empty function:       881316.09870740306 repetitions / second
Call function:             221238.93805309737 repetitions / second
Calculation speed:         159744.40894568691 repetitions / second
Object manipulation speed: 8849.5575221238923 repetitions / second
ECMA string speed:         18518.518518518518 repetitions / second
Total time for test:       33.12 seconds

Today's Seamonkey:
Global looping speed:      505731.6250842886 repetitions / second
Local looping speed:       1111111.111111111 repetitions / second
Call empty function:       601443.4643143545 repetitions / second
Call function:             480769.23076923075 repetitions / second
Calculation speed:         292397.6608187135 repetitions / second
Object manipulation speed: 3138.7319522912744 repetitions / second
ECMA string speed:         16556.29139072848 repetitions / second
Total time for test:       55.75 seconds

Note that this is on a Thunderbird 1 Ghz.

JS32.DLL is 442.368 bytes,
JSSHELL.EXE is 20.480 bytes.
Christian: It is not clear to me that your numbers show "the results haven't
changed".  You show old numbers from a machine different from the one you are
using now, and the new numbers only compare SM and Nombas.  We are looking
for SM improvements in the specific areas where the code has been changed:
integers used in loop counters, etc.  Can you compare a recent SpiderMonkey
build with an old one on the same machine?  That is what I did, and I saw real
improvement.  Also, are you sure you are really using your newly built DLL when
you run?  If you have path issues, the exe might load the old one without
your knowing it.

The file size of 442.368 bytes you show for the dll looks odd to me. In a build
today I get:

> Release:
>  2/19/01  14:49         335,872  js32.dll
>  2/19/01  14:49          20,480  jsshell.exe
> 
> Debug:
>  2/19/01  14:46         696,375  js32.dll
>  2/19/01  14:46          45,114  jsshell.exe
My point was that according to Brendan's first test, the patched JS should be 
faster than Nombas by ~33%, while on my PC, it was still slower (although the 
patch certainly made things faster).
But you're probably correct when you say that something is wrong with my build. 
When I build Mozilla myself, it's about 1.5 times as large as the nightlies.
It's definitely not a debug-build I'm doing, but something is wrong. Could this 
be because I'm on 98 and not NT?
Are we interested in optimizing the addition and subtraction cases as well? 
They seem relatively straightforward, at first glance.
This one is good for kenton -- what's his bugzilla id?  I can't find one by
searching for his first name.  Reassigning to beard to pass along.

/be
Assignee: brendan → beard
Status: ASSIGNED → NEW
Keywords: mozilla0.9mozilla0.9.1
Target Milestone: mozilla0.9 → mozilla0.9.1
Kenton, sorry to dump such a long bug on you.  See rogerl's comments dated
2001-01-24 14:57 and let me know if I can help.

/be
Assignee: beard → khanson
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Moving TFV.
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Target Milestone: mozilla0.9.4 → mozilla0.9.5
I'm reading a book now that has an interesting tidbit about float->int
conversion (Game Programming Gems II; the article is "Floating-Point Tricks:
Improving Performance with IEEE Floating Point" by Yossarian King).

The trick in question manages to get "(int)f" from 60 cycles on a P2 down to
just 5 -- faster, he reports, than using the FPU without changing the rounding
mode.  I'm not sure if we can massage it into working for our double-precision
numbers, but I thought I'd throw it out for consideration by our IEEE gurus.

typedef union {
    int   i;
    float f;
} IntOrFloat;

IntOrFloat bias; // magic bias value to force the bits around nicely
bias.i = (23 + 127) << 23;

#define FAST_FTOI(f, i)                                                       \
{                                                                             \
    IntOrFloat n; n.f = f;                                                    \
    n.f += bias.f;                                                            \
    n.i -= bias.i;                                                            \
    i = n.i;                                                                  \
}

"Why does this work?  Adding 1 x 2**23 as a floating-point number pushes the
mantissa into the lower 23 bits, setting the exponent to a known value (23 +
127).  Subtracting the known exponent as an integer removes these unwanted upper
bits, leaving the desired integer in the low bits of the result."

(For more detail, including diagrams, people should probably read the book.)

Here's what I get with gcc 2.96-85 on -O2:
    int i;
    IntOrFloat n;
    n.f = f;
    n.f += bias.f;
 8048614:       d9 05 4c 98 04 08       flds   0x804984c
 804861a:       d8 45 08                fadds  0x8(%ebp)
 804861d:       d9 5d fc                fstps  0xfffffffc(%ebp)
 8048620:       8b 55 fc                mov    0xfffffffc(%ebp),%edx
 8048623:       89 d0                   mov    %edx,%eax
    n.i -= bias.i;
 8048625:       2b 05 4c 98 04 08       sub    0x804984c,%eax
And for the simple cast:

 8048636:       d9 45 08                flds   0x8(%ebp)
    int i = (int)f;
 8048639:       d9 7d fc                fnstcw 0xfffffffc(%ebp)
 804863c:       8b 55 fc                mov    0xfffffffc(%ebp),%edx
 804863f:       c6 45 fd 0c             movb   $0xc,0xfffffffd(%ebp)
 8048643:       d9 6d fc                fldcw  0xfffffffc(%ebp)
 8048646:       89 55 fc                mov    %edx,0xfffffffc(%ebp)
 8048649:       db 5d f8                fistpl 0xfffffff8(%ebp)
 804864c:       d9 6d fc                fldcw  0xfffffffc(%ebp)
 804864f:       8b 45 f8                mov    0xfffffff8(%ebp),%eax
It will take a better man than me to count the execution cycles for those
(taking into account, as the author suggests, integer-vs-FP pipelining
behaviours), but it does seem to work well for floats.
This trick works for positive numbers less than 2^23.  However, to be robust,
one would have to add logic to handle 1) negative values, 2) magnitudes greater
than 2^23, 3) infinities, and 4) NaNs.  Also, the rounding direction may cause
unanticipated behavior: typically floats are truncated when converted to
integers, but the default rounding mode would round the result to nearest.
When all these special cases are included, the speedup may not be so great.
BTW, the algorithm works for 64-bit integers and doubles.

Tricks like this are typically used in game software for pixel graphics
calculations, where the float is known to be a positive number less than 2^23
and the desired rounding is round-to-nearest.
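For what it's worth, the double-precision analogue mentioned above can be sketched like this (an illustrative sketch: it assumes IEEE 754 doubles, |d| < 2^31, rounds to nearest rather than truncating, and the function name is invented):

```c
#include <stdint.h>
#include <string.h>

/* Bias trick for doubles: adding 2^52 + 2^51 forces the integer part
 * of d into the low mantissa bits, so the low 32 bits of the result's
 * bit pattern hold the rounded (to nearest) integer.  Valid only for
 * |d| < 2^31; no handling of NaN or Infinity. */
static int32_t fast_dtoi(double d)
{
    double biased = d + 6755399441055744.0;  /* 2^52 + 2^51 */
    int64_t bits;
    memcpy(&bits, &biased, sizeof bits);     /* reinterpret without UB */
    return (int32_t)bits;                    /* low 32 bits = result */
}
```

Note the same caveats apply as for the float version: round-to-nearest, not truncation, and no range or NaN checks, so a robust JSDOUBLE_IS_INT replacement would still need guard logic around it.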
This bug is marked 0.9.5, and that time is now.

Does anybody think that any of these patches are important to land now?

If not, let's mark this 0.9.7, which is the next milestone for performance
fixes.
Steve: all patches in this bug have landed, early this year.  Where did you read
that 0.9.7 is the next milestone for performance fixes?  I don't see that in the
Mozilla roadmap.

/be
Brendan:

I'm not sure if I've presumed too much or not.  This was from an 8/29 e-mail from
Steve Elmer, the subject of which was "Mach V Roadmap *1st DRAFT*", and which was
asking for buy-in. Upon re-reading, it also says

" Note:  Mozilla feedback is still pending.  The milestones represented here are
projections that have not been confirmed by Mozilla ... I'm still working on
lining this schedule up against the Mozilla 1.0 effort, it looks like we're
pretty close based on Brendan's recent email."

I have a vague, unsubstantiated, and hopefully wrong feeling that I have
unwittingly stepped into something... (hint: four letters, and it begins with an
`s').  I'm just trying to do the right thing with whatever information I have,
however old or unfinalized that information may be, and am willing to be
educated on whatever scheduling information I do need.

Thanks for the other info. If all patches are landed, should this bug be marked
as fixed? Or do we want to keep this as a performance catch-all? (If so, which
target milestone should we mark it for?)

Steve
This bug is overlong; you guys should feel free to file a sequel bug.  However,
as I morphed it, it is not a catch-all: it asks for the interpreter to use the
integer ALU and avoid hitting the FPU, where possible.  Currently, SpiderMonkey
uses doubles freely, for simplicity.

/be
Target Milestone: mozilla0.9.5 → mozilla0.9.6
retargeting for future.
Target Milestone: mozilla0.9.6 → mozilla0.9.7
This doesn't belong in 9.7, 9.8 or 1.0
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.7 → Future
Blocks: 149801
Blocks: 156155
No longer blocks: 149801
Assignee: khanson → general
Status: ASSIGNED → NEW
QA Contact: pschwartau → general
Target Milestone: Future → ---
I believe this is fixed... Brendan, can you reopen if I'm missing something here?
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Yes, this is fixed by the JITs and fatvals.

/be
Status: RESOLVED → VERIFIED