Closed Bug 256728 Opened 21 years ago Closed 21 years ago

Suggestion: eliminating TSF_REGEXP

Categories

(Core :: JavaScript Engine, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla1.8alpha4

People

(Reporter: igor, Assigned: brendan)

Details

(Keywords: js1.5)

Attachments

(1 file, 1 obsolete file)

To distinguish between / denoting division and regexp start jsparse.c instructs the scanner via special TSF_REGEXP flag to mark the regexp context for the scanner. The scanner then uses this flag during scanning of /. But this leads to numerous flag set/clear: currently there are 15 such pairs which are executed even if there are no regexp in the source. To eliminate this flag set/clear I suggest to always scan / as devision and then during parsing of primary expression treat / as regexp start in the literal content and read the rest of regexp there.
Attached patch Implementation (obsolete) — Splinter Review
The patch moves regexp scanning to the separated js_ReadRegExpToken function and then changes PrimaryExpr to call the function after it gets /. The only complication here is that the scanner will /= read as a special token and in that case js_ReadRegExpToken adds the initial = to the buffer. The the patch removes 15 pairs of TSF_REGEXP set/clear.
Sorry, this is going to be WONTFIXED. The TSF_REGEXP flag is being generalized into a TSF_OPERAND flag, and used for E4X and possibly operator precedence unary +/- disambiguation _a la_ Narcissus (js/narcissus/jsparse.js) changes that are coming. It's true that it need not be set and cleared so often. That could be what this bug could stand for, but I don't need this bug to make that change. Igor, are you ok with WONTFIX? /be
WONTFIX is OK but it was exactly E4X implementation in Rhino that triggered that. Rhino also had TSF_REGEXP but after merging E4X patch from Bea/AgileDelta folks which always scanned < as TOK_LT and then changed that to XML literal in the unary expression context I realized that doing the same trick with / would save a few lines of code. And it did.
The reason I prefer a scanner flag (or several, in the case of E4X -- the work in progress patch in bug 256691 adds TSF_XMLTAGMODE, TSF_XMLTEXTMODE, and TSF_XMLONLYMODE -- the last is for pure XML 1.0 parsing without E4X's embedded {expressions}) is that it means you get a whole token from the scanner, and you can continue to combine DIVOPs (distinguished by pn_op; *, / and % are divops). Another reason is that <, <?, </, and <! (or perhaps <!-- and <![CDATA[) are all distinct tokens, with spaces not allowed between parts, so either the scanner must expose its GetChar primitive to the parser, or the parser must call into alternative entry points than the js_GetToken one -- or just use flag bits, as I'm doing. For the operator precedence parser in Narcissus, there are five places that must set t.scanOperand = false, and five corresponding t.scanOperand = true settings. The savings in stack space and runtime (if optimized fully, e.g. by hand in a C translation in SpiderMonkey) motivate this approach. Operator precedence parsing requires scanning unary +/- as distinct tokens, so requires an operand vs. operator mode-flag. I'll keep this bug open to minimize the bit-twiddling in a patch that can be landed ahead of the E4X patch, to make shaver's life easier. /be
Assignee: general → brendan
> The reason I prefer a scanner flag (or several, in the case of E4X -- the work > in progress patch in bug 256691 adds TSF_XMLTAGMODE, TSF_XMLTEXTMODE, and .... Oops, I meant bug 246441. /be
Status: NEW → ASSIGNED
Keywords: js1.5
OS: Linux → All
Priority: -- → P2
Hardware: PC → All
Target Milestone: --- → mozilla1.8alpha4
> can continue to combine DIVOPs (distinguished by pn_op; *, / and % are divops). * is TOK_STAR, of course. The point is not specific to TOK_DIVOP -- TOK_UNARYOP and other cases combine tokens that have the same precedence and associativity, to reduce the parser size. /be
So this patch actually increases net code size: 28c28 < 23427 8 0 23435 5b8b Linux_All_OPT.OBJ/jsparse.o --- > 23419 8 0 23427 5b83 Linux_All_OPT.OBJ/jsparse.o 31c31 < 11029 960 0 11989 2ed5 Linux_All_OPT.OBJ/jsscan.o --- > 11189 960 0 12149 2f75 Linux_All_OPT.OBJ/jsscan.o This is the standalone, optimized (gcc 3.3.2 -O2) build, unpatched trunk on left and with patch on right. The parser got slightly smaller, but the scanner got a bit bigger from the alternative entry point. Although the bit-twiddling may seem ugly (at first ;-), code sizes trump aesthetics if the two are in conflict (not sure they are, here). I'll put the TSF_OPERAND patch here next. /be
In Java the case was the opposite: the overhead of new function was buried under saving from eliminations of byte on/off. BTW, if js_GetToken would be replaced by the pair: JSTokenType GetTokenImpl(JSContext *cx, JSTokenStream *ts) { Current code with RETURN(x) replaced by return x } JSTokenType js_GetToken(JSContext *cx, JSTokenStream *ts) { JSTokenType tt; JSToken *tp; tt = GetTokenImpl(cx, ts); tp = &CURRENT_TOKEN(ts); Expansion of the current RETURN if (tt == TOK_ERROR) ts->flags |= TSF_ERROR; tp->pos.end.index = ts->linepos + (ts->linebuf.ptr - ts->linebuf.base) - ts->ungetpos; return (tp->type = tt); } then jsscan.o build with gcc 3.3.3 -O2 would shrink by 396 bytes from 17604 to 17208 or 3.5% ...
Igor, see the e4x patch in progress in bug 246441 -- it gets rid of the over-aggressive use of macros in js_GetToken. Bogus stuff especially on modern super-scalar architectures, based at one point in 1997 or so, on profile data gathered on an SGI Indy(!). /be
And saves 628 bytes: 31c31 < 11029 960 0 11989 2ed5 Linux_All_OPT.OBJ/jsscan.o --- > 10401 960 0 11361 2c61 Linux_All_OPT.OBJ/jsscan.o /be
Attachment #156894 - Attachment is obsolete: true
Attachment #156936 - Flags: review?(shaver)
Comment on attachment 156936 [details] [diff] [review] proposed fix, paves the way for e4x patch OK, anything to reduce the review-pain of E4X!
Attachment #156936 - Flags: review?(shaver) → review+
Patch checked in. /be
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: