++ and -- cases in the interpreter generates too much code

VERIFIED FIXED

Status

()

Core
JavaScript Engine
--
enhancement
VERIFIED FIXED
10 years ago
10 years ago

People

(Reporter: Igor Bukanov, Assigned: Igor Bukanov)

Tracking

({memory-footprint})

Trunk
memory-footprint
Points:
---
Bug Flags:
in-testsuite +
in-litmus -

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 5 obsolete attachments)

(Assignee)

Description

10 years ago
Currently in the interpreter loop all the code implementing various pre- and post-increment/decrement bytecodes uses NONINT_INCREMENT_OP_MIDDLE macro to implement non-integer case of the operation. Since the macro contains a lot of code, its expansion at three places leads to bloat in js_Interpret just to support this the rarely executed case.

It would be nice to replace the macro by a function to shrink the code.
(Assignee)

Updated

10 years ago
No longer depends on: 418128
(Assignee)

Comment 1

10 years ago
Created attachment 304510 [details]
test case to check the correctness of ++/-- operations

The test cases covers most of ++/-- bytecodes to ensure the operation correctness. The only exception is fast global bytecodes.
(Assignee)

Comment 2

10 years ago
Created attachment 304578 [details] [diff] [review]
v1

The patch moves NONINT_INCREMENT_OP_MIDDLE into separated function NonIntIncDec and reorganizes fast ++/-- cases to share more code. 

With GCC 4.3 on Fedora-9 alpha the patch shrinks the size of jsinterp.o from 54756 to 52288 or by 2468 bytes (4.5%).
(Assignee)

Comment 3

10 years ago
To implement the bug 419632 the code would need stronger rooting. So I will update this patch so the new implementation of ++/-- can provide all the necessary strong roots for allocating the double numbers.
Blocks: 419632
(Assignee)

Comment 4

10 years ago
Created attachment 305838 [details] [diff] [review]
v2

the new version makes ++/-- code not to rely on weak rooting of numbers at all so that can be switched of. For that it was necessary to have extra 2 slots on the stack so the patch added JS_TMPSLOT2. The patch also changes the meaning of the extra slots to mean an extra space besides JSCodeSpec.uses that should exist on the stack. This is the reason the patch has added JS_TMPSLOT2 for ++name-like operations despite the fact that JSCodeSpec.uses is zero and JSCodeSpec.defs is one for them. The extra slot just makes sure that the space is available after nuses. If some of it will be used for defs, then fine.
Attachment #304578 - Attachment is obsolete: true
Attachment #305838 - Flags: review?(brendan)
Version: unspecified → Trunk
(Assignee)

Comment 5

10 years ago
Created attachment 307360 [details] [diff] [review]
v3

The new version re-syncs v2 with trunk changes.
Attachment #305838 - Attachment is obsolete: true
Attachment #307360 - Flags: review?(brendan)
Attachment #305838 - Flags: review?(brendan)
Comment on attachment 307360 [details] [diff] [review]
v3

>-    jsval *vp, lval, rval, ltmp, rtmp;
>+    jsval *vp, lval, rval, ltmp, rtmp, limit, delta, delta2;

Suggest putting novelties in blocks, if necessary blocks around numerous "cases" (computed goto target / computed goto opcode implementations). We have some evidence (IIRC you did) that top-level locals hurt optimization, and anyway these additions are a bit too special to deserve to take up such space at the top ;-).

Same for op2.

>+                    sp[-1] = v + delta;

Instead of delta, perhaps incr or adjust? Just a thought.

>+          finish_prop_incop:
>+            if (cs->nuses == 0) {
>+                /* sp[-1] already contains the result of name increment. */
>+            } else {
>+                rtmp = sp[-1];
>+                sp -= cs->nuses;
>+                sp[-1] = rtmp;
>+            }
>+            len = cs->length;
>             DO_NEXT_OP(len);

The vaguer convention for label name is end_*op and it can be a label at the very end, or at an epilog such as this. Perhaps finish_*op is better, but end_*op is the current convention. Your call.

>+            if (JS_LIKELY(JSVAL_IS_INT(rval) && rval != limit)) {

In the rest of the code, "max" is greatest possible value, limit is one beyond or fencepost. Suggest maxval rather than limit.

>+ * Find the results of incrementing or decrementing *vp when it is expected
>+ * that it will not fit JS int. On exit for pre-increments both vp and vp2

s/JS int/jsval int/

/be
(Assignee)

Comment 7

10 years ago
Created attachment 307433 [details] [diff] [review]
v4

The new version addresses the nits and restricts the range of fast inc/dec operations to [-2*29, 2**29), not (-2*30, 2**30). Such int jsvals do not require overflow checks and can be tested via:

#define CAN_DO_FAST_INC_DEC(v)     (((((v) << 1) ^ v) & 0x80000001) == 1)

With GCC it saved 97 bytes of code as the new checks avoids loading of max/min constants for overflow detection.
Attachment #307360 - Attachment is obsolete: true
Attachment #307433 - Flags: review?(brendan)
Attachment #307360 - Flags: review?(brendan)
(Assignee)

Comment 8

10 years ago
Created attachment 307453 [details] [diff] [review]
v5

The new version adds static asserts to verify that CAN_DO_FAST_INC_DEC(v) is false when v is INT_TO_JSVAL(JSVAL_INT_MIN|MAX) or JSVAL_VOID.
Attachment #307433 - Attachment is obsolete: true
Attachment #307453 - Flags: review?(brendan)
Attachment #307433 - Flags: review?(brendan)
Comment on attachment 307453 [details] [diff] [review]
v5

>+#define CAN_DO_FAST_INC_DEC(v)     (((((v) << 1) ^ v) & 0x80000001) == 1)

Good to know it not de-optimize any increment ops we care about in at least one popular benchmark ;-).

>+JS_STATIC_ASSERT(JSVAL_INT == 1);
>+JS_STATIC_ASSERT(!CAN_DO_FAST_INC_DEC(JSVAL_VOID));
>+JS_STATIC_ASSERT(!CAN_DO_FAST_INC_DEC(INT_TO_JSVAL(JSVAL_INT_MIN)));
>+JS_STATIC_ASSERT(!CAN_DO_FAST_INC_DEC(INT_TO_JSVAL(JSVAL_INT_MAX)));
>+JS_STATIC_ASSERT(!CAN_DO_FAST_INC_DEC(JSVAL_VOID));

Righteous.

>+ * Find the results of incrementing or decrementing *vp. On exit for
>+ * pre-increments both vp and vp2 will contain the result. For post-increments

"For pre-increments, both *vp and *vp2 will contain the result on return." etc.

Need caffeine and I'll do another review, just wanted to note this -- great patch AFAICT at this point. Let's optimize with r+ if it passes mochitest and js tests.

/be
Attachment #307453 - Flags: review?(brendan) → review+
(Assignee)

Comment 10

10 years ago
Created attachment 307501 [details] [diff] [review]
v6

The new version addresses the nits from the comment 9.
Attachment #307453 - Attachment is obsolete: true
(Assignee)

Comment 11

10 years ago
Comment on attachment 307501 [details] [diff] [review]
v6

The patch shrinks the interpreter by 2.5K and allows to implement faster path for allocating double values.
Attachment #307501 - Flags: review+
Attachment #307501 - Flags: approval1.9?
(Assignee)

Comment 12

10 years ago
The patch from the comment 10 passed mochi tests (Linux build) and the test suite in jsshell (thread-safe debug build).
Comment on attachment 307501 [details] [diff] [review]
v6

a1.9+=damons
Attachment #307501 - Flags: approval1.9? → approval1.9+
(Assignee)

Comment 14

10 years ago
I checked in the patch from comment 10 to the trunk:

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1204748522&maxdate=1204748925&who=igor%25mir2.org
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED

Updated

10 years ago
Keywords: footprint
/cvsroot/mozilla/js/tests/js1_7/regress/regress-418641.js,v  <--  regress-418641.js
initial revision: 1.1
Flags: in-testsuite+
Flags: in-litmus-
v 1.9.0
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.