421864 - Interpreter creates too many doubles

Reporter

Description

•

17 years ago

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.12) Gecko/20080302 SeaMonkey/1.1.8 Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9b5pre) Gecko/2008031006 Minefield/3.0b5pre A expression like x += a + b + c + d + e + f + g + h; where a .. h are doubles creates a new double for every intermediate step. Instead, it could modify most of them in place. I was dissatisfied with floating point performance in Mozilla, looked at the code and found this possible optimization. Actually floating point performance isn't that bad anymore; it has improved very much since beta 3. Reproducible: Always Steps to Reproduce: Code & testcase Actual Results: Interpreter creates too much doubles Expected Results: Should recycle the doubles

x0

Reporter

Comment 1

•

17 years ago

Attached patch patch (obsolete) — Details — Splinter Review

This patch passes the JS test suite and works in the browser, but I don't know much about GC, so I'm not sure if it is really safe. It assumes that doubles returned by functions can be modified in place, but I had to fix several functions where this wasn't true. After I wrote the patch I noticed that it will collide with bug 419632. I also found bug 161110, which will make it obsolete. So I don't know if this patch is useful.

x0

Reporter

Comment 2

•

17 years ago

Attached file testcase — Details

Timings on my Athlon64, 2.2 GHz, Linux amd64 system: Seamonkey 1.1.8 14.0 s (built with Gentoo ebuild, amd64) Firefox 3 alpha 7 11.7 s (x86 binary) Firefox 3 beta 1 11.0 s (x86 binary) Firefox 3 beta 3 10.2 s (x86 binary) Trunk 1.28 s (CVS build, amd64) Trunk + patch .74 s (CVS build, amd64) Trunk + bug 419632 1.36 s (CVS build, amd64) Opera 9.10 1.69 s (x86 binary) Konqueror 3.5.6 6.48 s (built with Gentoo ebuild, amd64) Improvement with the patch is high, but the testcase tests exactly the optimizations of the patch. Real world scripts won't benefit that much. You can change the loop count of the testcase by adding a fragment like #x:param(1e6) to the URL (1e6 is the default).

Brendan Eich [:brendan]

Comment 3

•

17 years ago

Good thinking -- return values are single-reference temporaries by definition. Igor, can you help get this proven safe and into landable shape? /be

Assignee: general → igor

Status: UNCONFIRMED → NEW

Ever confirmed: true

Brian Crowder

Updated

•

17 years ago

Summary: Interpreter creates too much doubles → Interpreter creates too many doubles

Brian Crowder

Updated

•

17 years ago

Keywords: perf

Version: unspecified → Trunk

Brendan Eich [:brendan]

Comment 4

•

17 years ago

In a rush at a conference, but I've thought about such schemes before. Is this patch correct when you do the following: x = a + b; y = x; x += c + d + e; given a-e in 1-5, is y still 3 after the += completes? With a copy-on-write bit in gcflags, we could avoid copying NaN, etc. In general we'd need a write barrier, though. We have one for objects now, but not for stack frame slots. It is not free to add to, so the cost might undermine the win. /be

Igor Bukanov

Comment 5

•

17 years ago

AFAICS the idea from this patch is only applicable to cases of chained math operations like in x + y + z where it can eliminate one temporary. Cases like x + y can not be optimized since there is no guarantee that the result of x or y is not aliased due to SpiderMonkey API compatibility. On the other hand numerical-intensive code does contain various forms of multiply-add. nbody and ray-tracing tests from the sunspider benchmark are littered with samples like: vals[0] * v[0] + vals[1] * v[1] + vals[2] * v[2] + vals[3]; v1[1] * v2[2] - v1[2] * v2[1] bodyi.vx -= dx * bodyj.mass * mag; This creates a lot of temporaries.

Igor Bukanov

Updated

•

17 years ago

Severity: normal → enhancement

x0

Reporter

Comment 6

•

17 years ago

(In reply to comment #4) > x = a + b; > y = x; > x += c + d + e; > > given a-e in 1-5, is y still 3 after the += completes? Yes, the last line translates to x = x + ((c + d) + e); and the patch only reuses temporary results on the left hand side of an expression. So it saves just 1 double here. However, it could be extended to also reuse temporary results on the right hand side. I hadn't thought about that.

x0

Reporter

Comment 7

•

17 years ago

I found a case where the patch fails: var x = .5; var y = .5; var z = (function () { return x })() + .5; x, y and z will all three be 1 afterwards. Interpreted functions do not always return temporary values as I had assumed. NB: If you test, you have to use values that are not small integers. The patch doesn't affect integer arithmetic (but .5 + .5 won't be converted to an integer).

Brendan Eich [:brendan]

Comment 8

•

17 years ago

Comment 7 shows the aliasing problem I gave a different example of in comment 4. The copy-on-write idea is good, it requires a write barrier. Simple flow-unaware static analysis could be used to select a prefix opcode for arithmetic and other ops, to try to mutate left or right operands. The + operator can apply to strings too, so this idea could be used to optimize string concatenation too. /be

Mike Shaver (:shaver emeritus)

Comment 9

•

17 years ago

The oh-so-close patch in https://bugzilla.mozilla.org/show_bug.cgi?id=419743 does similar static analysis to pick an optimized bytecode for string concatenation, so as to amortize allocation and copy costs better. Might be interesting to do the same thing here, and just look for multiple runs of temporary-generating ops, and then picking some different bytecodes that operate on a pigeonhole unrooted double instead of putting a jsval on the stack? The addition cases are troublesome, but at least in cases like this vals[0] * v[0] + vals[1] * v[1] + vals[2] * v[2] + vals[3]; we can know that the addition's operands are going to be numbers, since * always produces a number -- except for possibly vals[3], but even then the in-place operate case would be OK. Yeah, if we could get some JSOP_LOADTEMP and JSOP_ADDTEMP/MULTEMP/DIVTEMP things going, then we could probably work well for strings as well, since the string in the temporary slot could be safely realloced. Oh, but that wouldn't work as well for subexpressions, I guess, since the pigeonhole would get reused. Hrmph.

Brendan Eich [:brendan]

Comment 10

•

17 years ago

Attached patch 7.5% win for non-PGO Mac GCC sunspider (obsolete) — Details — Splinter Review

I had based this on Igor's patch for bug 421274, and would like to do so again if that patch re-lands soon. It's a small change. This approach can be used for strings too. The virtual register numbers would map at runtime to either the double accumulators, or the string accumulators, for +. It may also win to let locals (var and let in functions) refer to tagged pointers to accumulator registers, but this requires escape analysis to avoid imposing a write barrier at runtime. /be

Assignee: igor → brendan

Status: NEW → ASSIGNED

Attachment #309282 - Flags: review?(igor)

Brendan Eich [:brendan]

Comment 11

•

17 years ago

Igor, mind if I steal this bug? /be

Brendan Eich [:brendan]

Comment 12

•

17 years ago

Hrm, I already did -- sorry, didn't mean to. /be

Brendan Eich [:brendan]

Comment 13

•

17 years ago

Attached patch picked a few nits (obsolete) — Details — Splinter Review

This passes js tests (so did the previous patch). /be

Attachment #308379 - Attachment is obsolete: true

Attachment #309282 - Attachment is obsolete: true

Attachment #309293 - Flags: review?(igor)

Attachment #309282 - Flags: review?(igor)

Brendan Eich [:brendan]

Comment 14

•

17 years ago

Attached patch fixed perf regression from first patch (obsolete) — Details — Splinter Review

js> function f() {x+=a*b + c} js> dis(f) main: 00000: bindname "x" 00003: dup 00004: getxprop "x" 00007: name "a" 00010: name "b" 00013: mul 1 00015: name "c" 00018: add 1 00020: add 0 00022: setname "x" 00025: pop 00026: stop Source notes: 0: 20 [ 20] xdelta 1: 20 [ 0] assignop Without this patch, the add at 18 had regno 0 as its immediate. The fix is to FreeNodeReg, it should not reset pn_reg. Each accumulator is single-use, so we record uses in tree nodes. Freeing a register from busy state does not require erasing recorded uses in the tree. /be

Attachment #309293 - Attachment is obsolete: true

Attachment #309450 - Flags: review?(igor)

Attachment #309293 - Flags: review?(igor)

Brendan Eich [:brendan]

Comment 15

•

17 years ago

Igor, jimb, anyone else: I would appreciate tips on compiler options or pragmas to guarantee double alignment of JSStackFrame.dregs (soon JSFrameRegs.dregs), to avoid the runtime alignment done within the overallocated fp->dregs space by FRAME_DREGP. We'd like to cover GCC, ICC, and MSVC of course. /be

Brendan Eich [:brendan]

Comment 16

•

17 years ago

Attached patch fix decompilation of op= (assignment ops) (obsolete) — Details — Splinter Review

Surprised we don't have a testcase for these. Decompiling *=, etc., requires one-opcode look-behind. The ancient decompiler ass-u-med that lastop was exactly one byte behind. /be

Attachment #309450 - Attachment is obsolete: true

Attachment #309452 - Flags: review?(igor)

Attachment #309450 - Flags: review?(igor)

Igor Bukanov

Comment 17

•

17 years ago

Comment on attachment 309452 [details] [diff] [review] fix decompilation of op= (assignment ops) >@@ -2664,24 +2664,36 @@ js_TraceStackFrame(JSTracer *trc, JSStac .... >+ for (vp = fp->spbase, end = vp + nslots; vp < end; vp++) { >+ v = *vp; >+ if ((size_t)(v - (jsval)fp->dregs) < sizeof fp->dregs) >+ continue; >+ if (JSVAL_IS_TRACEABLE(v)) { >+ JS_SET_TRACING_INDEX(trc, "operand", vp - fp->spbase); >+ JS_CallTracer(trc, JSVAL_TO_TRACEABLE(v), >+ JSVAL_TRACE_KIND(v)); >+ } >+ } > } This is too hacky. Given the amount of checks in JSVAL_IS_TRACEABLE(v) it is better to write the body of the loop as: if (JSVAL_IS_OBJECT(v) && v != JSVAL_NULL) { traceKind = JSTRACE_OBJECT; } else if (JSVAL_IS_DOUBLE(v)) { jddouble *dp = JSVAL_TO_DOUBLE(v); if ((size_t) (dp - fp->dregs) < JS_ARRAY_LENGTH(fp->dregs)) continue; traceKind = JSTRACE_DOUBLE; } else if (JSVAL_IS_STRING(v)) { traceKind = JSTRACE_STRING; } else { JS_ASSERT(!JSVAL_IS_TRACEABLE(v)); continue; } JS_SET_TRACING_INDEX(trc, "operand", vp - fp->spbase); JS_CallTracer(trc, JSVAL_TO_TRACEABLE(v), traceKind); It also makes things ready for string registers. >Index: jsinterp.c >=================================================================== >RCS file: /cvsroot/mozilla/js/src/jsinterp.c,v >retrieving revision 3.477 >diff -p -u -8 -r3.477 jsinterp.c >--- jsinterp.c 13 Mar 2008 18:03:02 -0000 3.477 >+++ jsinterp.c 14 Mar 2008 17:11:26 -0000 >@@ -2221,16 +2221,20 @@ js_DumpOpMeters() > * reference. > */ > #define STORE_NUMBER(cx, n, d) \ > JS_BEGIN_MACRO \ > jsint i_; \ > \ > if (JSDOUBLE_IS_INT(d, i_) && INT_FITS_IN_JSVAL(i_)) { \ > sp[n] = INT_TO_JSVAL(i_); \ >+ } else if (OP_IS_ARITH(op) && pc[1]) { \ >+ jsdouble *dp_ = FRAME_DREGP(fp, pc[1]); \ >+ *dp_ = d; \ >+ sp[n] = DOUBLE_TO_JSVAL(dp_); \ > } else { \ > SAVE_SP_AND_PC(fp); \ > if (!js_NewDoubleInRootedValue(cx, d, &sp[n])) \ > goto error; \ > } \ > JS_END_MACRO This clearly demonstrates how optimizing parser can improve things with almost no touches to the interpreter. >diff -p -u -8 -r3.86 jsinterp.h >- * NB: This struct is manually initialized in jsinterp.c and jsiter.c. If you >- * add new members, update both files. But first, try to remove members. The >- * sharp* and xml* members should be moved onto the stack as local variables >- * with well-known slots, if possible. >+ * NB: This struct is manually initialized in jsinterp.c and jsiter.c. If you >+ * add new members, update both files to initialize appropriately (the dregs >+ * member does not need to be initialized). And please try to remove members. >+ * The sharp* and xml* members should be moved into magic local variables with >+ * well-known slots, if possible. > */ > struct JSStackFrame { > jsval *sp; /* stack pointer */ > jsbytecode *pc; /* program counter */ .... >+ char dregs[3 * sizeof(jsdouble)]; The plea to remove members and bloating frames that do not benefit from the patch is self-contradictory. The buffer for doubles can be allocated together with stack slots and the script can be marked as such. Then dregs can become a local variable in js_Interpret. Now, that would require to change the serialization format of jsscript to recover the number of double regs. Effectively it means to bump JSXDR_MAGIC_SCRIPT_CURRENT. But this will be a good thing since it would also allow to store the version immediately after the magic and treat both magic and version mismatch the same way. The code is not going to support older versions of xdr scripts in any case and pretending that it does leads to bugs and waste of time. Bumping magic during betas also gives better chance of testing the code. >Index: jsparse.c ... > JSBool >-js_FoldConstants(JSContext *cx, JSParseNode *pn, JSTreeContext *tc) >+js_FoldConstants(JSContext *cx, JSParseNode *pn, JSParseNode *parent, >+ JSTreeContext *tc) Nit: this is no longer a constant folder. Name like ExpressionOptimizer would better reflect the new meaning of the function. >+ if (PN_IS_PURE(parent) && PN_IS_ARITH(pn)) { ... >+ if (PN_IS_PURE(parent) && PN_IS_ARITH(pn)) { The new macros is only used in this combination. This suggests to have one single macro instead. Another note is that the patch excludes the bit operation from the optimization. The cryptography-related benchmarks from SunSpider contains sequences like (a << shift | b) with a << shift. They generates ints that do not fit jsval in 50% of cases. Thus even bit-manipulation code may benefit from dval registers. But this is for another bug.

x0

Reporter

Comment 18

•

17 years ago

With the latest patch: js> x Assertion failure: pn->pn_reg <= 2, at jsparse.c:259 Aborted Every test of the test suite fails with the same assertion for me.

Brendan Eich [:brendan]

Comment 19

•

17 years ago

x00000000, you need to clobber when structs used in multiple .c files change size. The patch works, please confirm that rebuilding from source fixed what you reported in comment 18. /be

Brendan Eich [:brendan]

Comment 20

•

17 years ago

(In reply to comment #17) > This is too hacky. I am a hacker. :-) > Given the amount of checks in JSVAL_IS_TRACEABLE(v) it is > better to write the body of the loop as: ... > It also makes things ready for string registers. Thanks, will do. > This clearly demonstrates how optimizing parser can improve things with almost > no touches to the interpreter. I've been trying to tag non-GC-page doubles for a while, in several bugs. Trick to avoiding a write barrier entails locality and instruction selection plus register allocation. More to come. > >+ * The sharp* and xml* members should be moved into magic local variables with > >+ * well-known slots, if possible. > > */ > > struct JSStackFrame { > > jsval *sp; /* stack pointer */ > > jsbytecode *pc; /* program counter */ > .... > >+ char dregs[3 * sizeof(jsdouble)]; > > The plea to remove members and bloating frames that do not benefit from the > patch is self-contradictory. That's a bit harsh -- getting sharp* and xml* out is good no matter what. The double registers need to be double-aligned (0 mod 8 addresses), however, which is not guaranteed by jsval-aligned stack allocation. > The buffer for doubles can be allocated together > with stack slots and the script can be marked as such. Then dregs can become a > local variable in js_Interpret. But then it becomes another pointer to keep alive in a register, or spill and load. I'd rather it be a small offset from an existing pointer or known on-stack buffer such as JSFrameRegs. Then the only issue is alignment, hence my plea to you, JimB, et al. > Now, that would require to change the serialization format of jsscript to > recover the number of double regs. Effectively it means to bump > JSXDR_MAGIC_SCRIPT_CURRENT. But this will be a good thing since it would also > allow to store the version immediately after the magic and treat both magic and > version mismatch the same way. The code is not going to support older versions > of xdr scripts in any case and pretending that it does leads to bugs and waste > of time. Bumping magic during betas also gives better chance of testing the > code. I'd rather not do this now. It is a good idea to reform script magic vs. XDR bytecode version, but time is short and this patch introduces exactly 2 double accumulators, intentionally to keep things simple and satisfy the 90% (99.9%) use-case where there are few parenthesized sub-expressions requiring more than one accumulator. > >Index: jsparse.c > ... > > JSBool > >-js_FoldConstants(JSContext *cx, JSParseNode *pn, JSTreeContext *tc) > >+js_FoldConstants(JSContext *cx, JSParseNode *pn, JSParseNode *parent, > >+ JSTreeContext *tc) > > Nit: this is no longer a constant folder. Name like ExpressionOptimizer would > better reflect the new meaning of the function. Good idea, will do. > >+ if (PN_IS_PURE(parent) && PN_IS_ARITH(pn)) { > ... > >+ if (PN_IS_PURE(parent) && PN_IS_ARITH(pn)) { > > The new macros is only used in this combination. This suggests to have one > single macro instead. Sure, but layered on these. > Another note is that the patch excludes the bit operation from the > optimization. The cryptography-related benchmarks from SunSpider contains > sequences like (a << shift | b) with a << shift. They generates ints that do > not fit jsval in 50% of cases. Thus even bit-manipulation code may benefit from > dval registers. But this is for another bug. That too is simply done in this patch, due to adjacency of opcodes and existing structure of js_FoldConstants. New patch shortly. /be

Brendan Eich [:brendan]

Comment 21

•

17 years ago

Attached patch per Igor's review (obsolete) — Details — Splinter Review

** TOTAL **: 1.108x as fast 3539.3ms +/- 1.7% 3195.0ms +/- 0.2% significant Unaccountably except due to TLB and cache conflicts, the only test that slowed down was: 3bit-bits-in-byte: *1.042x as slow* 57.8ms +/- 1.8% 60.2ms +/- 0.5% significant This test stays in the interpreter and uses only int jsvals. Between PGO and the still-beckoning change to organize interpreter "cases" by "temperature" and the dynamic-successor relation, we should be able to make this go away. Interested in how much this wins in PGO'ed builds. Easiest way to tell is to get the patch in ;-). /be

Attachment #309502 - Flags: review?(igor)

x0

Reporter

Comment 22

•

17 years ago

(In reply to comment #19) > x00000000, you need to clobber when structs used in multiple .c files change > size. The patch works, please confirm that rebuilding from source fixed what > you reported in comment 18. I still get the error. At the end of js_FoldConstants: case PN_UNARY: case PN_NAME: /* Our kid may be null (e.g. return; vs. return e;). */ pn1 = pn; do { pn1 = pn1->pn_kid; } while (pn1 && PN_TYPE(pn1) == TOK_RP); FreeNodeReg(pn1, tc); break; *pn1->pn_kid is mostly zero, but pn_reg is 32. Seems to be a missing initialization of pn_reg.

Igor Bukanov

Comment 23

•

17 years ago

(In reply to comment #20) > > The buffer for doubles can be allocated together > > with stack slots and the script can be marked as such. Then dregs can become a > > local variable in js_Interpret. > > But then it becomes another pointer to keep alive in a register, or spill and > load. I'd rather it be a small offset from an existing pointer or known > on-stack buffer such as JSFrameRegs. Then the only issue is alignment, hence my > plea to you, JimB, et al. An alternative for doubles in JSFrame is to have dregs[3] local in js_Interpret. For inlined calls a copy would have to be stored in JSInlinedFrame, of cause, but the coping can be done only if caller's frame needs doubles. This will have an extra benefit of avoiding access fp during the optimized operations helping the compiler to optimize code. But I agree that this can be done in a separated bug.

Igor Bukanov

Updated

•

17 years ago

Attachment #309452 - Attachment is obsolete: true

Attachment #309452 - Flags: review?(igor)

Brendan Eich [:brendan]

Updated

•

17 years ago

Priority: -- → P1

Target Milestone: --- → mozilla1.9beta5

Brendan Eich [:brendan]

Comment 24

•

17 years ago

jimb, mmoy, anyone: could use help with minimal compiler options or pragmas to double-align JSStackFrame.dregs (and well-type it, and remove padding overhead from it). GCC's -malign-double is a bit too big of a hammer; care about MSVC and ICC too as noted in a previous comment. Thoughts? /be

Igor Bukanov

Comment 25

•

17 years ago

Comment on attachment 309502 [details] [diff] [review] per Igor's review >@@ -2664,24 +2664,48 @@ js_TraceStackFrame(JSTracer *trc, JSStac >+ } else if (JSVAL_IS_DOUBLE(v)) { >+ jsdouble *dp = JSVAL_TO_DOUBLE(v); >+ if ((size_t)((char *)dp - fp->dregs) < sizeof fp->dregs) >+ continue; ... >RCS file: /cvsroot/mozilla/js/src/jsinterp.h,v ... > struct JSStackFrame { ... >+ /* >+ * GC-thing-aligned double accumulators used to store intermediate results >+ * to avoid allocating from the GC heap. Use FRAME_DREGP(fp,regno) to find >+ * the address of the accumulator identified by regno (which is one-based, >+ * note well) in the frame at fp. >+ */ >+ char dregs[3 * sizeof(jsdouble)]; >+ >+#define FRAME_DREGP(fp,regno) \ >+ (JS_ASSERT((regno) != 0), \ >+ (jsdouble *)(((jsuword)((fp)->dregs + ((regno) - 1) * sizeof(jsdouble) \ >+ + sizeof(jsdouble) - 1)) \ >+ & ~JSVAL_TAGMASK)) Why use char dregs[3 * sizeof(jsdouble)] and not just jsdouble dregs[3]? With the patch from bug 421274 that array would not even be double-aligned at the current position. Preparations for string regs should wait its own bug without complicating this one. r+ with this and the issue from the comment 22 addressed.

Brendan Eich [:brendan]

Comment 26

•

17 years ago

(In reply to comment #23) > (In reply to comment #20) > An alternative for doubles in JSFrame is to have dregs[3] local in > js_Interpret. For inlined calls a copy would have to be stored in > JSInlinedFrame, of cause, but the coping can be done only if caller's frame > needs doubles. This will have an extra benefit of avoiding access fp during the > optimized operations helping the compiler to optimize code. Still need something like JSFrameRegs *regs in JSStackFrame pointing to the local drefs, so the GC can avoid scanning 'em. That suggests to me that we prioritize alignment and co-location in JSFrameRegs ahead of minimizing compiler-selected/variable save/restore code and data space (which I agree we should do -- the bug 389214 might serve already). /be

Igor Bukanov

Comment 27

•

17 years ago

(In reply to comment #26) > Still need something like JSFrameRegs *regs in JSStackFrame pointing to the > local drefs, so the GC can avoid scanning 'em. Right, I have missed that.

Brendan Eich [:brendan]

Comment 28

•

17 years ago

(In reply to comment #25) > Why use char dregs[3 * sizeof(jsdouble)] and not just jsdouble dregs[3]? Because the wider type serves no purpose and may even mislead readers and (more important) debugger-drivers. True, the alignment could just happen to be good and the view of dregs[0] and dregs[1] valid -- but I found less than 50-50 odds of misalignment on Mactel, and looking at half a double in dregs[0] where the other half is in dregs[1] does no one any favors. > With > the patch from bug 421274 that array would not even be double-aligned at the > current position. Nothing guarantees double alignment except the code in this patch; whether dregs moves to JSFrameRegs is independent. > Preparations for string regs should wait its own bug without > complicating this one. > r+ with this and the issue from the comment 22 addressed. I do not know why x00000000 is seeing a crash. I'm not -- the patches pass the js testsuite and have since v1. Need more crash info, or joint debugging over IRC. x00000000, do you have a name? Can you get onto irc.mozilla.org and /query me? /be

Brendan Eich [:brendan]

Comment 29

•

17 years ago

I meant to write "worse than 50-50 odds of misalignment". Not scientific but it was clear from some printfs I added. /be

Igor Bukanov

Comment 30

•

17 years ago

(In reply to comment #28) > Because the wider type serves no purpose and may even mislead readers and (more > important) debugger-drivers. Right, this is another thing I have missed. But then the patch should use 3*sizeof(jsdouble) + sizeof(jsuword) bytes for the array to ensure that the array always has room for 3 8-byte aligned doubles. I would also prefer uint8, not char, as the type for the array. [Note I have extremely unstable internet connection today and IRC is useless for me as most of the time the message would not go through due to timeouts :(. With bugzilla I can click reload if that happens.] True, the alignment could just happen to be good > and the view of dregs[0] and dregs[1] valid -- but I found less than 50-50 odds > of misalignment on Mactel, and looking at half a double in dregs[0] where the > other half is in dregs[1] does no one any favors. > > > With > > the patch from bug 421274 that array would not even be double-aligned at the > > current position. > > Nothing guarantees double alignment except the code in this patch; whether > dregs moves to JSFrameRegs is independent. > > > Preparations for string regs should wait its own bug without > > complicating this one. > > > r+ with this and the issue from the comment 22 addressed. > > I do not know why x00000000 is seeing a crash. I'm not -- the patches pass the > js testsuite and have since v1. Need more crash info, or joint debugging over > IRC. x00000000, do you have a name? Can you get onto irc.mozilla.org and /query > me? > > /be >

x0

Reporter

Comment 31

•

17 years ago

The problem is that pn_kid is accessed for PN_NAME, which has no pn_kid. See code snippet of comment 22.

Brendan Eich [:brendan]

Comment 32

•

17 years ago

There are only 2 double registers, hence the 3 * sizeof(jsdouble). I can use uint8 for sure, tend to be old-school about char (C guarantees sizeof(char) == 1 but I recall days of old when char could be a 36-bit word! on PDP-10s!). But, I really want to believe, like Agent Mulder, that some pragma can be injected at just the right spot to ensure alignment without this over-allocating/aligning in the patch, and without changing global options. /be

Brendan Eich [:brendan]

Comment 33

•

17 years ago

(In reply to comment #31) > The problem is that pn_kid is accessed for PN_NAME, which has no pn_kid. > See code snippet of comment 22. Argh, I misremembered pn_expr as overlaying pn_kid. Sorry about that. Fixing in a minute, and thanks for finding it (I just re-ran the suite and did not have any unexpected failures -- what platform are you using?). /be

x0

Reporter

Comment 34

•

17 years ago

(In reply to comment #33) > what platform are you using? Linux, amd64.

Brendan Eich [:brendan]

Comment 35

•

17 years ago

Attached patch with that fixed (obsolete) — Details — Splinter Review

x00000000, thanks again -- I would appreciate it if you could test this too. /be

Attachment #309502 - Attachment is obsolete: true

Attachment #309521 - Flags: review?(igor)

Attachment #309502 - Flags: review?(igor)

Brendan Eich [:brendan]

Comment 36

•

17 years ago

Attached patch tweak to regBusy so it's JSPackedBool[2], etc. (obsolete) — Details — Splinter Review

No need for a dummy regBusy[0] for the no-reg case. /be

Attachment #309521 - Attachment is obsolete: true

Attachment #309535 - Flags: review?(igor)

Attachment #309521 - Flags: review?(igor)

x0

Reporter

Comment 37

•

17 years ago

About 1/3 of the test suite still crashes (with attachment 309521 [details] [diff] [review]; now segfaults without assertion failure). A minimal testcase that crashes is: var x = .5 x * 1 + 1; Crash is in BINARY_OP(*), but |x * 1| does not crash.

Brendan Eich [:brendan]

Comment 38

•

17 years ago

Crashing where? Might have to macro-expand not only BINARY_OP for the JSOP_MUL case, but also of course STORE_* macros. That should let you see the crashing line in context. /be

x0

Reporter

Comment 39

•

17 years ago

Crash is at *dp_ = d in STORE_DOUBLE. dp_ seems to be quite random, but non-null.

Brendan Eich [:brendan]

Comment 40

•

17 years ago

dp_ is assigned to on the previous line, from FRAME_DREGP(fp, pc[1]). That macro does not produce a random value unless the compiler is broken. Can you disassemble that line's instructions in gdb, and the *dp_ = d; line's instructions too? Thanks, /be

x0

Reporter

Comment 41

•

17 years ago

dp_ looses its upper 32 bits. The reason is probably that JSVAL_TAGMASK is 32 bit.

Brendan Eich [:brendan]

Comment 42

•

17 years ago

Attached patch workaround JSVAL_TAGMASK not 64-bit clean bug (obsolete) — Details — Splinter Review

x000000000 (mind if I call you x0 for short? ;-), thanks again. This is an ancient 64-bit porting hazard in jsapi.h, over 11 years old. Working around it in FRAME_DREGP the obvious way should fix things for you. Please confirm. /be

Attachment #309535 - Attachment is obsolete: true

Attachment #309563 - Flags: review?(igor)

Attachment #309535 - Flags: review?(igor)

x0

Reporter

Comment 43

•

17 years ago

All tests pass now (assuming the patch just changes ~JSVAL_TAGMASK to ~(jsuword)JSVAL_TAGMASK, which was my correction of the last patch when I ran the tests). Additionally, also js1_5/extensions/regress-335700.js ("Object Construction with getter closures should be O(N)") passes, which failed before.

Brendan Eich [:brendan]

Comment 44

•

17 years ago

(In reply to comment #43) > All tests pass now (assuming the patch just changes ~JSVAL_TAGMASK to > ~(jsuword)JSVAL_TAGMASK, which was my correction of the last patch when I ran > the tests). That was it -- if you click on the diff link for the latest attachment, at the top of the colorized side-by-side diff is an interdiff link, which confirms this. > Additionally, also js1_5/extensions/regress-335700.js ("Object Construction > with getter closures should be O(N)") passes, which failed before. That's a false negative -- that test flaps a bit in my experience. It's doing a least-squares fit over perhaps not enough data to find the quadratic curve. Thanks for all the testing help, and the original bug report and approach. /be

x0

Reporter

Comment 45

•

17 years ago

Some basic arithmetic is totally wrong now: js> var a = .1, b = .9; js> (a + a) * (2 * b); 3.24 js> (a + a) / (2 * b); 1

Brendan Eich [:brendan]

Comment 46

•

17 years ago

Comment on attachment 309563 [details] [diff] [review] workaround JSVAL_TAGMASK not 64-bit clean bug Registers aren't marked busy when they should be. New patch (with __attribute__ / __declspec based alignment of dregs) later tonight. /be

Attachment #309563 - Attachment is obsolete: true

Attachment #309563 - Flags: review?(igor)

Brendan Eich [:brendan]

Comment 47

•

17 years ago

Attached patch passes mochitests and js tests (obsolete) — Details — Splinter Review

I can't get __attribute__ ((aligned (8))) to do what it is supposed to do when used after a member declaration such as jsdouble dregs[2] and before the ; that ends the declaration. Any have a clue to loan? I'll file followup bugs as needed and note them in comments before checking in (at least one in jsinterp.h about using alignment pragmas), but this patch is ready for review. BTW, the fix from the last patch was to propagate pn_reg up from the kid to its TOK_RP (parenthesized group expression) node in js_OptimizeExprs. /be

Attachment #309595 - Flags: review?(igor)

Brendan Eich [:brendan]

Comment 48

•

17 years ago

Bugzilla interdiff between the last two patches works, BTW. /be

x0

Reporter

Comment 49

•

17 years ago

The patch still has problems. In a + b * c * (d + e) the intermediate result of b * c will be destroyed by (d + e). It also misses some optimizations, e.g. a * b * c * d * e won't be optimized. I'll post a patch that uses just a count of the used registers instead of busy bits. I think that's a better approach, because they work like a stack. BTW, __attribute__ ((aligned (8))) does work for me in gcc 4.2.3. ... uint8 misalign; jsdouble dregs[2]; is already correctly aligned without an attribute, but ... uint8 misalign; jsdouble dregs[2] __attribute__ ((aligned (16))); gives 16-byte alignment as expected.

x0

Reporter

Comment 50

•

17 years ago

Attached patch alternative patch, based on previous one (obsolete) — Details — Splinter Review

Passes the js test suite and works in the browser.

Brendan Eich [:brendan]

Comment 51

•

17 years ago

x0, you rock -- I was trying to reason "bottom-up" based on singular use of intermediates, but that's unsound. Also, my __attribute__ ((aligned (8))) problems were self-inflicted. The hard case is the "inline_call" code in the interpreter, where JSInlineFrame is allocated on the interpreter stack. GCC does not even try to warn about infeasibility of enforcing the aligned attribute here. New patch shortly, a mostly-stylistic-nit-picking pass over yours plus alignment pragmas. /be

x0

Reporter

Comment 52

•

17 years ago

A remaining issue: After (x += .1) + 1 x will be corrupt. We need to exclude assignment ops from using registers.

Brendan Eich [:brendan]

Comment 53

•

17 years ago

Yes, pn_type will be TOK_ASSIGN where pn_op satisfies PN_IS_ARITH. I've got that alignment patch almost ready (frustrating dealing with case proliferation under the inline call code for JSOP_CALL in js_Interpret), I'll fix that too. Thanks, /be

x0

Reporter

Comment 54

•

17 years ago

Attached file Perl script to generate random arithmetic js code — Details

With the issue in comment 52 fixed, I get always the same output as in an unpatched build.

x0

Reporter

Updated

•

17 years ago

Attachment #309800 - Attachment mime type: application/octet-stream → text/plain

Brendan Eich [:brendan]

Updated

•

17 years ago

Blocks: 423300

Brendan Eich [:brendan]

Comment 55

•

17 years ago

Attached patch proposed brendan+x0 love-child fix (obsolete) — Details — Splinter Review

Passes mochi, js, and x0's tcgen.pl tests (I had to fold in the r+ patch for bug 423300). Many thanks to x00000000@freenet.de for all the help and timely assists! /be

Attachment #309595 - Attachment is obsolete: true

Attachment #309718 - Attachment is obsolete: true

Attachment #310190 - Flags: review?(igor)

Attachment #309595 - Flags: review?(igor)

Igor Bukanov

Comment 56

•

17 years ago

Comment on attachment 310190 [details] [diff] [review] proposed brendan+x0 love-child fix >Index: jsgc.c ... >@@ -2664,25 +2664,49 @@ js_TraceStackFrame(JSTracer *trc, JSStac ... > if (fp->regs) { > nslots = (uintN) (fp->regs->sp - fp->spbase); > JS_ASSERT(nslots <= fp->script->depth); >- TRACE_JSVALS(trc, nslots, fp->spbase, "operand"); >+ for (vp = fp->spbase, end = vp + nslots; vp < end; vp++) { >+ jsval v; >+ uintN traceKind; >+ >+ v = *vp; >+ if (!JSVAL_IS_PRIMITIVE(v)) { >+ traceKind = JSTRACE_OBJECT; >+ } else if (JSVAL_IS_DOUBLE(v)) { >+ jsdouble *dp = JSVAL_TO_DOUBLE(v); >+ if (IS_FRAME_DREGP(fp, dp)) >+ continue; This is wrong and I should have seen this earlier. For callers of an inline frame fp->regs will point to JSInlineFrame.callerRegs but the double jsvals will still point to REGS.dregs local from JS_Interpret native stack. The situation is worse for generator frames as for them jsvals pointing to registers may well point to the stack of JS_Interpret that has finished its execution. The following test case may expose the problem: function gen() { yield (Math.PI * 2) * (yield 0); } var x = gen(); x.next(); gc(); var result = x.send(1); if (result !== Math.PI * 2) throw "Bad result: "+ result; This problem with generators pretty much requires to move dregs back to JSStackFrame.

Attachment #310190 - Flags: review?(igor)

Brendan Eich [:brendan]

Comment 57

•

17 years ago

Argh, I should have seen that. I need a vacation! Ok, it is not hard to move dregs back to the frame. /be

x0

Reporter

Comment 58

•

17 years ago

The JSOP_NEG case in jsinterp.c should have a STORE_NUMBER() instead of js_NewNumberInRootedValue() to actually use its register. That was already missing in my patch. IMHO there should be 3 registers at least. Most scripts won't use the 3rd register much, but they are not expensive. The simplest expression that would use a 3rd register is something like a + b + c * d * -e Some additional operators could be declared "pure". Most important are JSOP_STRICTEQ and JSOP_STRICTNE. The other missing optimizations are only edge cases, AFAICT (examples are JSOP_GETELEM where only integers are common or conditionals without relational operator like if (x + y) ...). The TOK_ASSIGN case in js_OptimizeExprs() doesn't make much sense at all. Any constant expression (except the rightmost) should cause a syntax error anyways (and does so), thus there's nothing to optimize. I don't understand the (old) comment there: case TOK_ASSIGN: /* * Compound operators such as *= should be subject to folding, in case * the left-hand side is constant, and so that the decompiler produces * the same string that you get from decompiling a script or function * compiled from that same string. As with +, += is special. */

Brendan Eich [:brendan]

Comment 59

•

17 years ago

(In reply to comment #58) > The JSOP_NEG case in jsinterp.c should have a STORE_NUMBER() instead of > js_NewNumberInRootedValue() to actually use its register. That was already > missing in my patch. Thanks, will fix in next patch. > IMHO there should be 3 registers at least. Most scripts won't use the 3rd > register much, but they are not expensive. Hard to measure but they take up (or will take up, in the new patch) interpreter (not C) stack frame space, which is not cost free. > The simplest expression that would > use a 3rd register is something like > > a + b + c * d * -e Does anything use three registers in the microbenchmarks we study? Just curious, not saying that those micros are decisive. I will instrument the new patch lightly to find out. > Some additional operators could be declared "pure". Most important are > JSOP_STRICTEQ and JSOP_STRICTNE. These are not that common, although some JS gurus push them over == and !=. It would be awkward to reorder jsopcode.tbl heavily at this stage, but I'll see what I can do. > The TOK_ASSIGN case in js_OptimizeExprs() doesn't make much sense at all. Any > constant expression (except the rightmost) should cause a syntax error anyways > (and does so), thus there's nothing to optimize. I don't understand the (old) > comment there: > > case TOK_ASSIGN: > /* > * Compound operators such as *= should be subject to folding, in case > * the left-hand side is constant, and so that the decompiler produces > * the same string that you get from decompiling a script or function > * compiled from that same string. As with +, += is special. > */ > To answer this question, use cvs annotate and log: $ cvs annotate jsparse.c | grep 'Compound operators' Annotations for jsparse.c *************** 3.210 (brendan% 25-Aug-06): * Compound operators such as *= should be subject to folding, in case $ cvs log -r3.210 jsparse.c RCS file: /cvsroot/mozilla/js/src/jsparse.c,v Working file: jsparse.c head: 3.336 branch: locks: strict access list: keyword substitution: kv total revisions: 506; selected revisions: 1 description: ---------------------------- revision 3.210 date: 2006/08/25 18:01:35; author: brendan%mozilla.org; state: Exp; lines: +15 -0 Fold assign-ops harder, to uphold decompiler round-trip invariance (349663, r=shaver). ============================================================================= Then see the cited bug 349663. The reason is due to the strange nature of const in JS1.5: you can assign to a const once per evaluation of its initialiser, after which assignment ops targeting it must evaluate it and the right operand but not set the const. /be

x0

Reporter

Comment 60

•

17 years ago

Thanks for the explanation, I understand it now (although I do not understand why a const doesn't simply throw on re-assignment, but there are more specifications in JavaScript and ECMAScript that make little sense to me). Complex arithmetic calculations are common in spherical trigonometry. I'm developing a script that calculates map projections of the earth and displays them with SVG, with the user being able to change the view interactively. That was much slower in Mozilla than in Opera until recently, which caused me to file this bug. The script would use more than 2 registers, but I checked that I could rewrite almost all such expressions to be more stack efficient and using only 2 registers. For most of the script, 2 registers are enough anyways. So if there's a real cost for the 3rd register, then it's not worth it. Really complex scripts are rare (OTOH, they tend to be the performance critical ones). Not optimizing JSOP_STRICTEQ and JSOP_STRICTNE is ok, because floating point is much more likely to use relational than equality operators. It just seemed a little odd to me that they will be treated differently than their classical counterparts.

Igor Bukanov

Comment 61

•

17 years ago

(In reply to comment #60) >I'm > developing a script that calculates map projections of the earth and displays > them with SVG, with the user being able to change the view interactively. Does this involve calls to trigonometric functions from Math?

x0

Reporter

Comment 62

•

17 years ago

(In reply to comment #61) > (In reply to comment #60) > >I'm > > developing a script that calculates map projections of the earth and displays > > them with SVG, with the user being able to change the view interactively. > > Does this involve calls to trigonometric functions from Math? Yes, but their results are cached in local vars and reused many times. But the slow function calls affect overall performance nevertheless (of course I use |var sin= Math.sin| etc. to accelerate them, but function call overhead is still high (in other browsers too)).

Brendan Eich [:brendan]

Comment 63

•

17 years ago

(In reply to comment #60) > Thanks for the explanation, I understand it now (although I do not understand > why a const doesn't simply throw on re-assignment, but there are more > specifications in JavaScript and ECMAScript that make little sense to me). It's all water under the bridge, but ECMA-262 Edition 1 was 1997, based on JS1 in Netscape, beta'ed in 1995. No try/catch, so errors were fatal. Attempts to set ReadOnly properties resulted in uncatchable error in Netscape (IIRC, my memory is failing at this distance), but the ECMA working group felt that silence, or at most a boolean status result, were better outcomes. Hence the true or false result for the delete operator, as well as the result of an assignment to a ReadOnly property being the result of the right-hand side. > Complex arithmetic calculations are common in spherical trigonometry. I'm > developing a script that calculates map projections of the earth and displays > them with SVG, with the user being able to change the view interactively. That > was much slower in Mozilla than in Opera until recently, which caused me to > file this bug. The script would use more than 2 registers, but I checked that I > could rewrite almost all such expressions to be more stack efficient and using > only 2 registers. For most of the script, 2 registers are enough anyways. So if > there's a real cost for the 3rd register, then it's not worth it. Really > complex scripts are rare (OTOH, they tend to be the performance critical ones). Working on a new patch that puts the accumulators in the JSContext, and can have many more. > Not optimizing JSOP_STRICTEQ and JSOP_STRICTNE is ok, because floating point is > much more likely to use relational than equality operators. It just seemed a > little odd to me that they will be treated differently than their classical > counterparts. It's a matter of time -- the freeze for Firefox 3 beta 5 is upon us. /be

Mike Schroepfer

Comment 64

•

17 years ago

(In reply to comment #60) > Complex arithmetic calculations are common in spherical trigonometry. I'm > developing a script that calculates map projections of the earth and displays > them with SVG, with the user being able to change the view interactively. That > was much slower in Mozilla than in Opera until recently, which caused me to > file this bug. The script would use more than 2 registers, but I checked that I > could rewrite almost all such expressions to be more stack efficient and using > only 2 registers. For most of the script, 2 registers are enough anyways. So if > there's a real cost for the 3rd register, then it's not worth it. Really > complex scripts are rare (OTOH, they tend to be the performance critical ones). > Would you be willing to share a link to the app?

x0

Reporter

Comment 65

•

17 years ago

Attached image SVG with floating point intensive script (obsolete) — Details

(In reply to comment #64) > Would you be willing to share a link to the app? It's not ready yet, but the attached file is a mostly working example. It contours pure grid data first and projects it then (Lambert azimuthal equal-area). On click the map will be re-projected with a new center point. (Data is temperature trend of the last 30 years as calculated by NCEP/DOE reanalysis 2 (which is not reliable everywhere).)

x0

Reporter

Updated

•

17 years ago

Attachment #310406 - Attachment mime type: application/octet-stream → image/svg+xml

x0

Reporter

Comment 66

•

17 years ago

Attached image SVG with floating point intensive script — Details

Same file unzipped. Bugzilla doesn't seem to be able to handle compressed SVGs.

Attachment #310406 - Attachment is obsolete: true

Michael Moy

Comment 67

•

17 years ago

That page performs quite well on a 2.5 Ghz MacBook Pro on the trunk. It only takes a second to repaint the new globe. On Firefox 2.0.0.12 and a 2.8 Ghz AMD X2 machine running Windows, it takes almost six seconds to repaint after a click.

x0

Reporter

Comment 68

•

17 years ago

(In reply to comment #67) > That page performs quite well on a 2.5 Ghz MacBook Pro on the trunk. It only > takes a second to repaint the new globe. > > On Firefox 2.0.0.12 and a 2.8 Ghz AMD X2 machine running Windows, it takes > almost six seconds to repaint after a click. Yes, there was a huge improvement in performance somewhere between beta 3 and now. If I knew that before I started to search for possible improvements in the code, I probably had not filed this bug...

Mike Shaver (:shaver emeritus)

Comment 69

•

17 years ago

Glad you didn't find out, then. :)

x0

Reporter

Comment 70

•

17 years ago

I traced down when the improvement of projection calculation in the attached file happened. It was between the 2008-02-15 and 2008-02-16 nightlies, when the consumed time dropped from 3.42 s to 1.44 s on my machine. So it must have been bug 395993. A non-null this pointer has been passed to Math.sin etc. before, even when called as a local var. Since then, timings improved further, but in multiple smaller steps. 2008-02-26 was at 1.17 s, beta 4 at 0.97 s, and the current nightly is at 0.82 s. I haven't traced them down further.

Brendan Eich [:brendan]

Comment 71

•

17 years ago

I have a new patch coming some time in the next week. This won't make mozilla1.9 / firefox3 of course. /be

Target Milestone: mozilla1.9beta5 → ---

Damon Sicore (:damons)

Comment 72

•

17 years ago

getting this on the 1.9.1 triage queue.

Flags: blocking1.9.1?

Brendan Eich [:brendan]

Comment 73

•

17 years ago

Attached patch snapshot — Details — Splinter Review

Shaver kindly offered to take this one across the finish line. /be

Brendan Eich [:brendan]

Updated

•

17 years ago

Attachment #310190 - Attachment is obsolete: true

Brendan Eich [:brendan]

Updated

•

17 years ago

Assignee: brendan → shaver

Status: ASSIGNED → NEW

Mike Shaver (:shaver emeritus)

Comment 74

•

17 years ago

Comment on attachment 324346 [details] [diff] [review] snapshot >+#define OP_IN_RANGE(op,lo,hi) ((uintN)((op)-(lo)) <= (uintN)((hi)-(lo))) >+#define OP_IS_PURE(op) OP_IN_RANGE(op, JSOP_EQ, JSOP_BITNOT) >+#define OP_IS_ARITH(op) OP_IN_RANGE(op, JSOP_BITOR, JSOP_NEG) Maybe it's just my early morning brain, but won't OP_IN_RANGE report false positives for opcodes lower than lo? OP_IN_RANGE(3,5,10) 3 - 5 <= 10 - 5 -2 <= 5 ?

Igor Bukanov

Comment 75

•

17 years ago

(In reply to comment #74) > >+#define OP_IN_RANGE(op,lo,hi) ((uintN)((op)-(lo)) <= (uintN)((hi)-(lo))) ... > OP_IN_RANGE(3,5,10) > 3 - 5 <= 10 - 5 > -2 <= 5 Note the uintN part which gives: (uintN)(3 - 5) <= (uintN)(10 - 5) 2**32 - 2 <= 5 false

Robert Sayre

Updated

•

17 years ago

Flags: blocking1.9.1? → blocking1.9.1+

Robert Sayre

Comment 76

•

17 years ago

Igor is going to take this, aiui

Assignee: shaver → igor

Igor Bukanov

Comment 77

•

17 years ago

I focus on this bug from now on.

Igor Bukanov

Comment 78

•

17 years ago

(In reply to comment #73) > Created an attachment (id=324346) [details] > snapshot My current thinking is to use stack slots for double registers. That is, the code would reserve some space before vars to store the doubles eliminating the need for any special storage in JSStackFrame or JSContext for the price of an extra uint8 field counting the number of double registers in JSScript. But that comes for free as JSScript has unused uint8 field.

Depends on: 441686

Igor Bukanov

Comment 79

•

17 years ago

Attached patch using slots for dregs v1 (obsolete) — Details — Splinter Review

This patch is based on the patch from bug 441686 comment 18 and allocates the double registers between vars and stack slots. Except for these dregs allocation changes this is essentially the patch from the comment 73 ported to the trunk. The patch just compiles and I have not done any testing with it.

Igor Bukanov

Comment 80

•

17 years ago

Attached patch using slots for dregs v2 (obsolete) — Details — Splinter Review

work in progress

Attachment #329363 - Attachment is obsolete: true

Igor Bukanov

Comment 81

•

17 years ago

Attached patch using slots for dregs v3 (obsolete) — Details — Splinter Review

yet another incremental update that properly initializes the number of double registers in the script.

Igor Bukanov

Updated

•

17 years ago

Attachment #330875 - Attachment is obsolete: true

Igor Bukanov

Comment 82

•

17 years ago

Attached patch using slots for dregs v4 (obsolete) — Details — Splinter Review

Changes compared with the patch from the comment 73: 1. The double registers are stored in frame slots between function variables (or global/regexps in the script case) and the stack base. In STORE_DOUBLE macro the base for double registers is calculated from script->nfixed and fp->slots. Although it adds a few instructions, an alternative of caching dregsEnd_ from the macro on function entrance made things slower both on Pentium-M 1.1 GHz and Xeon 2.66 GHz (64 bit). 2. I added JSParseOptimizer struct as a convenient way to pass to the code generator the ndregs counter for the compiled functions.

Attachment #330882 - Attachment is obsolete: true

Attachment #330952 - Flags: review?(brendan)

Igor Bukanov

Comment 83

•

17 years ago

Attached patch using slots for dregs v5 (obsolete) — Details — Splinter Review

The new version of the patch bumps XDR version to reflect bytecode changes and makes JSOP_BITNOT to use dregs to optimize cases like x & ~y.

Attachment #331009 - Flags: review?(brendan)

Igor Bukanov

Updated

•

17 years ago

Attachment #330952 - Attachment is obsolete: true

Attachment #330952 - Flags: review?(brendan)

Igor Bukanov

Comment 84

•

17 years ago

Attached patch using slots for dregs v6 (obsolete) — Details — Splinter Review

The patch is a sync with the trunk to remove unrelated changes for editline makefile. I fixed that in the bug 447705.

Attachment #331069 - Flags: review?(brendan)

Igor Bukanov

Updated

•

17 years ago

Attachment #331009 - Attachment is obsolete: true

Attachment #331009 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 85

•

17 years ago

Comment on attachment 331069 [details] [diff] [review] using slots for dregs v6 Looks like globalUses and loopyGlobalUses crept back in. Since you are adding more uses, could you please rename JOF_2BYTE to JOF_UINT8? This will help tracemonkey merging too (from TM's jsopcode.h: #define JOF_UINT8 13 /* uint8 immediate, e.g. top 8 bits of 24-bit atom index */ use this for best merge unity ;-). What are the perf effects with the latest m-c shell builds and this patch? I will review and stamp promptly the next patch. Thanks again, sorry for delays. /be

Igor Bukanov

Comment 86

•

17 years ago

Comment on attachment 331069 [details] [diff] [review] using slots for dregs v6 New patch is comming

Attachment #331069 - Attachment is obsolete: true

Attachment #331069 - Flags: review?(brendan)

Robert Sayre

Updated

•

17 years ago

Priority: P1 → P2

Alfred Kayser

Comment 87

•

17 years ago

Any progress on this? This will probably also help to improve on the v8 benchmarks, as there is a lot of math in there.

Robert Sayre

Comment 88

•

17 years ago

going to have to unblock on this, given lack of activity. moving to WANTED.

Flags: blocking1.9.1+ → wanted1.9.1+

Igor Bukanov

Comment 89

•

17 years ago

This goes back to the pool.

Assignee: igor → general

Nochum Sossonko [:Natch]

Comment 90

•

16 years ago

This was a 1.9.1 blocker once, is this still an issue? Nominating for blocking-1.9.2 to get it back on the radar...

Flags: blocking1.9.2?

Robert Sayre

Updated

•

16 years ago

Flags: wanted1.9.2+

Flags: blocking1.9.2?

Flags: blocking1.9.2-

Luke Wagner [:luke]

Comment 91

•

13 years ago

fatvals made this not a problem

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → WORKSFORME

patch 17 years ago x0 33.71 KB, patch		Details \| Diff \| Splinter Review
testcase 17 years ago x0 754 bytes, text/html		Details
7.5% win for non-PGO Mac GCC sunspider 17 years ago Brendan Eich [:brendan] 43.35 KB, patch		Details \| Diff \| Splinter Review
picked a few nits 17 years ago Brendan Eich [:brendan] 43.39 KB, patch		Details \| Diff \| Splinter Review
fixed perf regression from first patch 17 years ago Brendan Eich [:brendan] 43.35 KB, patch		Details \| Diff \| Splinter Review
fix decompilation of op= (assignment ops) 17 years ago Brendan Eich [:brendan] 46.18 KB, patch		Details \| Diff \| Splinter Review
per Igor's review 17 years ago Brendan Eich [:brendan] 55.62 KB, patch		Details \| Diff \| Splinter Review
with that fixed 17 years ago Brendan Eich [:brendan] 55.79 KB, patch		Details \| Diff \| Splinter Review
tweak to regBusy so it's JSPackedBool[2], etc. 17 years ago Brendan Eich [:brendan] 55.78 KB, patch		Details \| Diff \| Splinter Review
workaround JSVAL_TAGMASK not 64-bit clean bug 17 years ago Brendan Eich [:brendan] 55.79 KB, patch		Details \| Diff \| Splinter Review
passes mochitests and js tests 17 years ago Brendan Eich [:brendan] 57.63 KB, patch		Details \| Diff \| Splinter Review
alternative patch, based on previous one 17 years ago x0 52.57 KB, patch		Details \| Diff \| Splinter Review
Perl script to generate random arithmetic js code 17 years ago x0 5.36 KB, text/plain		Details
proposed brendan+x0 love-child fix 17 years ago Brendan Eich [:brendan] 171.65 KB, patch		Details \| Diff \| Splinter Review
SVG with floating point intensive script 17 years ago x0 41.79 KB, image/svg+xml		Details
SVG with floating point intensive script 17 years ago x0 128.15 KB, image/svg+xml		Details
snapshot 17 years ago Brendan Eich [:brendan] 66.19 KB, patch		Details \| Diff \| Splinter Review
using slots for dregs v1 17 years ago Igor Bukanov 76.89 KB, patch		Details \| Diff \| Splinter Review
using slots for dregs v2 17 years ago Igor Bukanov 84.20 KB, patch		Details \| Diff \| Splinter Review
using slots for dregs v3 17 years ago Igor Bukanov 106.36 KB, patch		Details \| Diff \| Splinter Review
using slots for dregs v4 17 years ago Igor Bukanov 110.94 KB, patch		Details \| Diff \| Splinter Review
using slots for dregs v5 17 years ago Igor Bukanov 111.84 KB, patch		Details \| Diff \| Splinter Review
using slots for dregs v6 17 years ago Igor Bukanov 110.04 KB, patch		Details \| Diff \| Splinter Review