Closed Bug 598322 Opened 14 years ago Closed 14 years ago

Optimize String.charAt

Categories

(Tamarin Graveyard :: Virtual Machine, defect)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lhansen, Assigned: wsharp)

References

Details

(Whiteboard: PACMAN)

Attachments

(3 files, 4 obsolete files)

Now that we have a good grip on String.charCodeAt (in bug #593383), we should start thinking about how to turn uses of String.charAt into uses of String.charCodeAt.  After all, most straightforward code will use the former, not the latter.

Here are some ideas.

A switch on String.charAt that compares the switch value to only characters (one-char strings) can be turned into a switch on String.charCodeAt that compares to character values.  It would be good for ASC to do this since it's a heavyweight optimization in the JIT, and ASC might have the early-binding info (since String is a final class).  Also, ASC might not compile the switch on a string value as a lookupswitch, but as an 'if' nest that the JIT would need to recover.

In general, though, comparing a String.charAt value to a single-character string can probably be optimized on its own; that is easier than the switch optimization and may be a good place to start.
Stashing this away for future work.  Depends on the patch for the charCodeAt bug (bug 593383) landing first.

Two parts.  First, direct binding to optimized charAt routines instead of going through a thunk.

Second, if we are in a compare with charAt on one side and a single-character string constant on the other, switch to charCodeAt and an integer compare.  This is about 2x faster than calling charAt and doing the string.equals call.
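
For illustration, here is a minimal C++ sketch of the shape of that rewrite as it might look inside the JIT.  This is not the actual patch: the method name rewriteCharAtCompare and the helper singleCharCode() are made up, while callIns/binaryIns/InsConst and the String_charCodeAtII CallInfo name follow the patches in this bug.

    // Sketch only: rewrite  s.charAt(i) == "z"  into  s.charCodeAt(i) == 0x7a
    // when one side of the compare is a known single-character string constant.
    LIns* CodegenLIR::rewriteCharAtCompare(LIns* strIns, LIns* indexIns,
                                           Stringp constStr, LOpcode eqOp)
    {
        if (constStr->length() != 1)
            return NULL;                 // only single-character constants qualify

        // call the int-returning charCodeAt variant instead of charAt
        LIns* code = callIns(FUNCTIONID(String_charCodeAtII), 2, strIns, indexIns);

        // compare integer char codes instead of allocating and comparing strings
        return binaryIns(eqOp, code, InsConst(singleCharCode(constStr)));
    }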

A small brightspot analysis:

281 calls to charAt with an integer arg
263 calls to charAt with an integer const
234 calls to charAt with a float arg
47 calls to charAt with an unsigned arg

267 charAtI const compares that can convert to charCodeAt
94 charAtF const compares that can convert to charCodeAt
Ignore the jit-calls.h changes from the last patch; my hand-editing to strip out unrelated changes went wrong there.  The new charAt entries should be:

    PUREMETHOD(STRINGADDR(String::_charAt),  SIG2(P,P,I), String_charAtI)
    PUREMETHOD(STRINGADDR(String::AS3_charAt),  SIG2(P,P,F), String_charAtF)
(In reply to comment #1)
> Created attachment 477250 [details] [diff] [review]
> patch which does direct binding, plus can change to charCodeAt

Awesome.

> If we are in compare and have charAt on one side and a single character string
> on the other, switch to charCodeAt and an integer compare.   This is about 2x
> faster than calling charAt and doing the string.equals call.

Even when the character read from the string is outside the ASCII range?  IIRC we precompute strings for the ASCII chars, but out in the Unicode weeds we cons up a new string every time.

(Admittedly for most current scanners ASCII would be the main focus.)
I am American.  All I think in is ASCII. :-)

I have not tested the high-character (non-ASCII) case specifically, but it is undoubtedly much more than 2x faster.
(In reply to comment #4)
> I am American.  All I think in is ASCII. :-)

Tell that to French Canadians and Mexicans -- last I checked, they are in "America" too :-)
Flags: flashplayer-qrb+
Attached patch updated patch (obsolete) — Splinter Review
Still a work in progress.  The switch-statement test case shows multiple calls because of late-stage specialization; we should only call charCodeAt once...

function testCharAtSwitch(s2:String):int
{
    var c:int = 0;
    for (var i:int = 1; i < 100000000; i++)
    {
        switch (s2.charAt(0))
        {
        case 'z':
            c = 1;
            break;
        case 'b':
            c = 2;
            break;
        case 'c':
            c = 4;
            break;
        case 'd':
            c = 5;
            break;
        case 'e':
            c = 8;
            break;
        default:
            c = 0;
        }
    }
    return c;
}

The Unicode test is 30x faster than the old code.
Assignee: nobody → wsharp
Attached patch latest patch (obsolete) — Splinter Review
1. Support direct binding to three charAt variants (int, uint, double)
2. Switch to charCodeAt when charAt is being compared with a single character constant string.
3. Track our specialized function and re-use the old LIns* where possible.  (Avoids duplicate function calls in switches, etc.; a sketch of this bookkeeping follows the list.)
4. Faster _charAt functions (inline charAt, etc.)
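
A minimal sketch of the bookkeeping behind item 3, assuming a simple member array; apart from getSpecializedFunc/addSpecializedFunc, the member names here are illustrative rather than the patch's actual data structure.

    // Remember each call we have already specialized so later uses (e.g. the
    // repeated compares generated for a switch) reuse the same LIns* instead
    // of emitting a second charCodeAt call.
    LIns* CodegenLIR::getSpecializedFunc(LIns* call)
    {
        for (int i = 0; i < numSpecialized; i++)      // numSpecialized / specialized[]
            if (specialized[i].original == call)      // are assumed members
                return specialized[i].replacement;
        return NULL;
    }

    void CodegenLIR::addSpecializedFunc(LIns* call, LIns* newCall)
    {
        specialized[numSpecialized].original = call;
        specialized[numSpecialized].replacement = newCall;
        numSpecialized++;
    }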
Attachment #477250 - Attachment is obsolete: true
Attachment #480730 - Attachment is obsolete: true
Attachment #480914 - Flags: superreview?(edwsmith)
Attachment #480914 - Flags: review?(wmaddox)
Attachment #480914 - Attachment is obsolete: true
Attachment #480920 - Flags: superreview?(edwsmith)
Attachment #480920 - Flags: review?(wmaddox)
Attachment #480914 - Flags: superreview?(edwsmith)
Attachment #480914 - Flags: review?(wmaddox)
Performance numbers for microbenchmarks (a 100-million-iteration loop in an isolated function):

With patch:

charcodeat switch elapsed time is 1130
charat switch elapsed time is 1056
charAt(0) == 'a' elapsed time is 449
charAt(0) == 'a' (unicode) elapsed time is 444
charAt(int) == 'a' elapsed time is 482
charAt(uint) == 'a' elapsed time is 482
charAt(Number) == 'a' elapsed time is 1019
s=charAt(int) elapsed time is 606
s=charAt(uint) elapsed time is 606
s=charAt(Number) elapsed time is 1165

Old code:

charcodeat switch elapsed time is 2452
charat switch elapsed time is 5662
charAt(0) == 'a' elapsed time is 1215
charAt(0) == 'a' (unicode) elapsed time is 15085
charAt(int) == 'a' elapsed time is 1922
charAt(uint) == 'a' elapsed time is 2826
charAt(Number) == 'a' elapsed time is 2595
s=charAt(int) elapsed time is 1105
s=charAt(uint) elapsed time is 2037
s=charAt(Number) elapsed time is 1835
Comment on attachment 480920 [details] [diff] [review]
add charAtU support to charCodeAt switch

Since getSpecializedFunc is essentially a global routine within the context of the CodegenLIR class, it would be best to drop the redundant explicit "this->".
There are cases where the redundancy usefully marks the object as an important parameter, but in this case the dependence is pervasive and not noteworthy.

           LIns *priorFunc = this->getSpecializedFunc(arg);

If you factor out the code guarded by the "op == LIR_calld" test in CodegenLIR::coerceNumberToInt(), you can cleanly separate the specialization logic itself from the boilerplate that reuses previous specializations.  Only one call to addSpecializedFunc() will be needed.  The same applies to CodegenLIR::optimizeStringCmpWithStringCall().

   bool CodegenLIR::specializeOneArgStringFunction(Traits *result,
        const CallInfo *ciInt, const CallInfo *ciUint, const CallInfo *ciNumber)

The function has nothing in particular to do with strings.  It is suitable for specializing any function that takes a single numeric argument, so the name should probably reflect this.  Stylistically, lengthy argument lists are awkward, particularly when the actual arguments are likely to be lengthy as well.  The function is encapsulating a case analysis, and the fact that there's a parameter for every case is the underlying issue.  It's not entirely clear what can be done here, and there is precedent in CodegenLIR::cmpOptimization() for this sort of thing.  Still, it's worth seeking an alternative.  One possibility is to wrap up the CallInfos for the specializations in a struct, one for each function to specialize.  Then the struct appropriate for the operation to be specialized would be passed as the argument.
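
A minimal sketch of that struct idea (names are hypothetical; the eventual patch uses a similar, though not identical, Specialization table):

    // Bundle the specialized entry points for one original function so the
    // specializer takes a single struct argument instead of three CallInfos.
    struct OneArgSpecializations
    {
        const CallInfo* forInt;       // variant bound when the argument is int
        const CallInfo* forUint;      // ... when the argument is uint
        const CallInfo* forNumber;    // ... when the argument is double
    };

    static const OneArgSpecializations charAtVariants = {
        FUNCTIONID(String_charAtI),
        FUNCTIONID(String_charAtU),
        FUNCTIONID(String_charAtF)
    };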

Down the road, we will likely have some framework, e.g., ABC-level expression tree traversal, that will want to drive a specializer with both an expected result type (i.e., type coerced to) and an argument signature.  It makes sense to factor the specializer in terms of the shape of the calls being specialized rather than the semantic family relationships among the specialized functions.


CodegenLIR.cpp:  There are Windows-style line terminators here.
(In reply to comment #10)
> Down the road, we will likely have some framework, e.g., ABC-level expression
> tree traversal, that will want to drive a specializer with both an expected
> result type (i.e., type coerced to) and an argument signature.  It makes sense
> to factor the specializer in terms of the shape of the calls being specialized
> rather than the semantic family relationships among the specialized functions.

I think I agree, but you really lost me.  Could you give an example of what you mean by "semantic family relationships" and what the alternate way would be?
Make changes suggested by Bill.

As the code in inlineBuiltinFunction grows, it will make sense to switch to some sort of table-driven format to match function signatures and perform the optimizations.  I will research this for future bugs in this area, such as String.indexOf/lastIndexOf.
Attachment #480920 - Attachment is obsolete: true
Attachment #483984 - Flags: superreview?(edwsmith)
Attachment #483984 - Flags: review?(wmaddox)
Attachment #480920 - Flags: superreview?(edwsmith)
Attachment #480920 - Flags: review?(wmaddox)
(In reply to comment #11)
> (In reply to comment #10)
> > Down the road, we will likely have some framework, e.g., ABC-level expression
> > tree traversal, that will want to drive a specializer with both an expected
> > result type (i.e., type coerced to) and an argument signature.  It makes sense
> > to factor the specializer in terms of the shape of the calls being specialized
> > rather than the semantic family relationships among the specialized functions.

One way to do a specialization is to recognize that we are calling a specific function, and then to examine its arguments and expected result (after coercion) in order to determine which specialization to generate.  The other way is to recognize the relevant space of result/argument contexts, and then select the appropriate code depending on the function called.  The first approach, in which we slice the code first along the lines of the function being specialized (i.e., the family of semantically-related specializations of the same function), seems a bit more natural and cleaner when there are only a few ad-hoc specializations.  As the number of specializations increases, however, we will end up repeating the same logic over and over again to detect the contexts.

During a tree walk, we can pass in a coercion target type as we recurse down the tree, or call an entirely different method for each coercion target.  Case analysis for the argument types can be a single decision tree.

The suggestion is to try to factor the code for what we are doing today in a manner that will more easily fit into such a framework.  This will be easier if the form of the call is determined prior to the identity of the function.
The comment above was intended to be a reply to comment #11.
(In reply to comment #13)
> (In reply to comment #11)
> > (In reply to comment #10)
> This will be easier if the form of the call is determined prior to the identity of the function.

I'm not sure why this would be a requirement, but I'll wait to see how this pans out.
Our inputs in inlineBuiltinFunction are:

nativeId
argCount
argTypes[argCount]

Based upon a table approach, we would have an array with various properties:

nativeId
argCount
argTypes[argCount]
CallInfo * to use
FunctionHelper to use?

Perhaps a hash of the function signature will find a match in the table below (do we need a HashTable, or can we get away with a linear array?).  If we find a match, we use the CallInfo instead, or perhaps call a helper function to emit the correct code (in the case of isNaN, for example).

Then everything is table driven and expansion is more or less adding an entry to the table.

If we are supporting 20ish native ids, I'm not sure we want the overhead of computing a hash every time we are called for the 850+ builtins.  A pre-check of the nativeId might be warranted.

Table example:

{String_AS3_charCodeAt, 1, intArg, String_charCodeAtFI, 0}
{String_AS3_charCodeAt, 1, uintArg, String_charCodeAtFU, 0}
{String_AS3_charCodeAt, 1, numArg, String_charCodeAtFF, 0}

If reviewers think this is important at this stage, I can work on implementing this.  With only three functions handled currently, I feel it's a bit premature to worry about this level of complexity.  But future work will include various Math calls and more String calls, and switching to a table approach would make sense then.
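
A rough C++ sketch of what such a table and its entries might look like; all struct, field, and enum names here are illustrative, not an actual Tamarin API, and the nativeId and CallInfo names are taken from the example above.

    // One row per specializable signature; a NULL helper means "just swap in
    // the CallInfo", a non-NULL helper emits custom code (e.g. for isNaN).
    enum ArgKind { intArg, uintArg, numArg };

    struct InlineEntry
    {
        int             nativeId;        // e.g. String_AS3_charCodeAt
        int             argCount;
        ArgKind         argTypes[3];
        const CallInfo* callInfo;        // direct replacement, or NULL
        void (CodegenLIR::*helper)();    // optional FunctionHelper
    };

    static const InlineEntry inlineTable[] = {
        { String_AS3_charCodeAt, 1, { intArg  }, FUNCTIONID(String_charCodeAtFI), NULL },
        { String_AS3_charCodeAt, 1, { uintArg }, FUNCTIONID(String_charCodeAtFU), NULL },
        { String_AS3_charCodeAt, 1, { numArg  }, FUNCTIONID(String_charCodeAtFF), NULL }
    };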
(In reply to comment #16)

> If reviewers think this is important at this stage, I can work on implementing
> this.  With only three functions handled currently, I feel its a bit premature
> to worry about this level of complexity.

Agreed.  For one thing, we currently specialize only on argument type, or only on coercion context, but not both at the same time.  (We can achieve the effect of such a specialization, but it has to be written as two separate ones, in which a call generated by one specialization is again rewritten.)  We also currently lack complete ABC-level type information in some cases, and must rely on what can be inferred from the LIR types.  The investment in a table-driven scheme will be better justified when we no longer need to hack around these deficiencies, but have an ABC-level IR.

I am inclined to backpedal a bit now on my earlier claims in comment #13 as to the manner in which the specialization case analysis should be organized.  It seems a natural approach for numeric specialization, where almost every <expected result type, arg1 type, arg2 type> triple is an interesting case for specialization.  Looking at the charAt/charCodeAt specializations, however, we see that the context can be rather specific to a particular function -- for example, charAt() calls compared to an integer.  We don't want to check for these very idiosyncratic scenarios before we know they will be relevant to the function at hand.  In a table-driven scheme, we'd need some provision to attach an ad-hoc procedural escape, either in performing the transformation or in testing the context for its applicability.  I gather this is what Werner is after with FunctionHelper.
Here is my take on making this code seem a bit more like a foundation to build on than a spot-fix for the charAt/charCodeAt case at hand.  The main substantive change is that specializeFunction() does not presume to know a single mapping from original function to specialization, but allows each call site to use a different one appropriate to the context.  I've changed some names to be more descriptive, and hopefully they suggest a pattern to follow as we extend the set of specializations.  The fundamental division of labor is unchanged, and a single specialization takes into account either the context of use or the (pre-coercion) argument types, but not both.  Function arity remains "baked in", as we are only specializing unary functions at present, with the expectation that we'd add a binary and trinary case at the most.
(In reply to comment #18)
> Created attachment 484598 [details] [diff] [review]
> Patch refactored for somewhat greater reusability, more descriptive names

One thing that I don't like about this patch is that, while I could rely on the return value from specializeIntCall() to determine if a specialization is applicable,  as in coerceNumberToInt(), there's enough crud that comes before the call in optimizeIntCmpWithNumberCall() and optimizeStringCmpWithStringCall() that I keep the explicit guards for specializable functions, e.g.:

    static Specialization stringCmpWithString[] = {
        { FUNCTIONID(String_charAtI),    FUNCTIONID(String_charCodeAtII) },
        { FUNCTIONID(String_charAtU),    FUNCTIONID(String_charCodeAtIU) },
        { FUNCTIONID(String_charAtF),    FUNCTIONID(String_charCodeAtIF) },
        { 0, 0 }
    };

    // ...

    if (ci == FUNCTIONID(String_charAtI) || 
        ci == FUNCTIONID(String_charAtU) || 
        ci == FUNCTIONID(String_charAtF)) {

    // ... lots of code ...

    callSide = specializeIntCall(callSide, stringCmpWithString);
    AvmAssert(callSide != NULL);

    // ...

        return binaryIns(icmp, strSide, callSide);

    // ...

It would be much cleaner if 'stringCmpWithString' controlled both the decision to analyze the context further and the actual specialization.  The easiest way to do this would be to split specializeIntCall() into two parts.  The first would simply search for an appropriate entry in the Specialization vector and return it (or NULL if not applicable).  The second would actually rewrite the call.
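
A minimal sketch of that split, assuming the Specialization entries pair the original CallInfo with its replacement; the field names original/replacement and the helper emitReplacementCall are guesses, not the patch's actual names.

    // Part 1: lookup only.  Returns the matching table entry, or NULL so the
    // caller can bail out before doing any of the surrounding context analysis.
    static const Specialization* findSpecialization(const CallInfo* ci,
                                                    const Specialization* table)
    {
        for (const Specialization* s = table; s->original != NULL; s++)
            if (s->original == ci)
                return s;
        return NULL;
    }

    // Part 2: rewrite only, once the context checks have passed.  Call-site sketch:
    //
    //     const Specialization* s = findSpecialization(callSide->callInfo(),
    //                                                  stringCmpWithString);
    //     if (!s)
    //         return NULL;           // not a specializable charAt call
    //     // ... context analysis on the string operand ...
    //     callSide = emitReplacementCall(callSide, s->replacement);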
Your patch is fine by me (as are my earlier variants, of course), but I do think we are worrying a bit too much about generalizing this logic when there are very few functions that this will work for.  I can't think of one function beyond charCodeAt (returning a Number that is really an int 99.9% of the time) and charAt (where single-character compares are common) that will benefit from this logic.  Yes, inlineBuiltinFunction will need to handle 20-30 functions at some point, and we need some efficient logic there, but name me one other function that returns a Number that is really an integer that we want to specialize to a faster integer variant?

For FunctionHelper, I'm envisioning specific routines to emit code for something like Math.min (integer, integer)...

(bogus pseudocode from code in bug 588922)	
CodegenLIR::emitMathMinInteger()
{
    x = x->oprnd1();
    y = y->oprnd1();
    LIns *s1 = binaryIns(LIR_lti, x, y);
    LIns *s2 = lirout->insChoose(s1, x, y, use_cmov);
    // the specializer will remove a d2i/i2d roundtrip
    localSet(op1-2, i2dIns(s2), result);
}

Its table entry would be avmplus::NativeID::Math_private__min, 2 args, integer args, and a function ptr to the above function.
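
For concreteness, that entry might look like the following row in the hypothetical InlineEntry table sketched earlier (again, purely illustrative):

    static const InlineEntry mathMinIntEntry = {
        avmplus::NativeID::Math_private__min,    // nativeId
        2,                                       // argCount
        { intArg, intArg },                      // argTypes
        NULL,                                    // no direct CallInfo replacement
        &CodegenLIR::emitMathMinInteger          // FunctionHelper from the pseudocode above
    };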
(In reply to comment #20)
> You patch is fine by me (as are my earlier variants of course) but I do think
> we are worrying a bit too much about generalizing this logic when there are
> very few functions that this will work for.  I can't think of one function
> beyond charCodeAt (returning a Number that is really an int 99.9% of time) and
> charAt (where single character compares are common) that will benefit from this
> logic.  Yes, inlineBuiltinFunction will need to handle 20-30 functions at some
> point and we need some efficient logic there but name me one other function
> that returns a Number that is really an integer that we want to specialize to a
> faster integer variant?

Some of the cases you specialize are rather idiosyncratic -- e.g. in a relational comparison with an integer.  I presumed, however, that *any* function returning a Number could be specialized in a context where the result is coerced to integer.  In principle, such opportunities to fuse an operation and a subsequent coercion appear frequently, though Number->int coercions stand out as particularly costly and profitable to optimize.
Are there really no other numeric functions worth optimizing?  (This is a serious question -- you've been looking at the brightspot data.)

Otherwise, I don't really have any objections.  I was encouraging you to think a bit more generally, because it looked to me like you were producing an ad-hoc spot-fix for a single family of builtins without thinking ahead to the impact of the next N such fixes.  This sort of specialization looks like a pattern where you just turn the crank and keep iterating through dozens of functions.  If you've thought it through further than I have, then that's fine.

Since my concerns are entirely matters of architecture and style, rather than correctness, I'll go ahead and R+ your patch, and leave these concerns for SR.
Attachment #483984 - Flags: review?(wmaddox) → review+
Blocks: 606956
Bug 606561 covers future work to use a table-driven approach in inlineBuiltinFunction.
Comment on attachment 483984 [details] [diff] [review]
incorporate suggestions from Bill

This is all complicated enough that I'm only reviewing for code shape.  The logic looks sound.  

Frankly, I don't think we need to go bikeshedding on how to predict the best pattern for choosing functions.  I care that the code is readable and that we have a naming convention for helpers, but it's okay to refactor the way specializers are chosen and optimized as we add more cases.

Tests that cover the new JIT code (100% line coverage for new code) and the new String helper functions should go along with this patch.

nits:
 * spaces between function name and '(' in getSpecializedFunc, specializeFunction, addSpecializedFunc in CodegenLIR.cpp/h, and maybe more cases; grep for /[a-z][A-Z] (/

 * missing indentation near CodegenLIR.cpp:3797

Comments on new functions (what is this function for?) please.
Attachment #483984 - Flags: superreview?(edwsmith) → superreview+
Attached file testcase
Attachment #486404 - Attachment mime type: application/octet-stream → text/plain
http://hg.mozilla.org/tamarin-redux/rev/6b65d845ad7d

The patch has some of Bill's code and some of mine.  The specializeOneArgFunction function will be removed when the code pattern described in bug 606561 lands.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED