Closed Bug 460904 Opened 11 years ago Closed 9 years ago

TM: We don't trace regexp replace with lambda functions

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED WONTFIX
Tracking Status
status1.9.2 --- beta1-fixed

People

(Reporter: gal, Assigned: luke)

References

Details

Attachments

(4 files, 11 obsolete files)

4.05 KB, application/x-javascript
Details
25.95 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
58.81 KB, patch
jorendorff
: review+
Details | Diff | Splinter Review
125.09 KB, patch
Details | Diff | Splinter Review
We can't call the lambda function from the regexp engine when on trace.

Instead, we should add two hidden natives. One that generates an array with match information (coordinates and string value), and another one that does the replace operation. The second native could maybe also be done in script. 

When on trace, call a script that uses these native primitives but a script for-in loop to call the lambda function instead of calling the fully native function that would try to call the lambda function from native code.
Note: this blocks tracing string-unpack-code.
Part 1 of the patch. match2 of course will need to be hidden probably and we need some scripted code to go along with this to actually implement replace.
Assignee: general → gal
Status: NEW → ASSIGNED
Attachment #345007 - Attachment is patch: true
Attachment #345007 - Attachment mime type: application/octet-stream → text/plain
Attached file replace implementation in script (obsolete) —
Attached patch patch (obsolete) — Splinter Review
Attachment #345007 - Attachment is obsolete: true
Attachment #345019 - Attachment is patch: false
Brendan, do you think this is the right approach? (I am doing the wrapper script right now, its very straight forward).
Attachment #346994 - Attachment is obsolete: true
Attachment #354543 - Flags: review?(brendan)
Attached patch v2 (obsolete) — Splinter Review
Attachment #354543 - Attachment is obsolete: true
Attachment #354543 - Flags: review?(brendan)
Attachment #354544 - Flags: review?(brendan)
#undef PUSH_RESULT before final return from fill_result_array.

Use OBJ_DEFINE_PROPERTY to avoid proto-setter hazards.

The comment bitching about crufty in/out gcroot+result parameter hackage could be replaced by cleaning up the internal APIs to propagate match/mismatch result in a separate JSBool out param. Why not now?

Looking forward to consolidated patch as discussed on IRC. Approach seems good.

/be
Attached patch patch (obsolete) — Splinter Review
Attachment #345019 - Attachment is obsolete: true
Attachment #354544 - Attachment is obsolete: true
Attachment #354544 - Flags: review?(brendan)
Attached patch patch (obsolete) — Splinter Review
This doesn't actually trace yet (aborts with callee is not an object). Probably calling replace on a primitive string aborts us.
Attachment #385251 - Attachment is obsolete: true
Depends on: 412571
With 502145 this patch works and stays on trace. We might still be slower than necessary until 412571 is fixed since we have to shrink-wrap the string into a String object at the moment.
Depends on: 502145
I'm interested in testing this fix to see if it helps us stay on trace in our JS code. Is there a version of this patch available that applies against a current revision of TM?
Attached patch Semi-incomplete patch (obsolete) — Splinter Review
This patch will apply against r31053. Getting it to build is another story. The build process appears to require a new function in the js shell that is implemented by a portion of this patch, so my strategy was to first apply the portion of the patch that updated the shell to include the encode() function, and use that to manually generate the required .c.out files. This required some makefile hackage that I've omitted here in the patch since it was sort of a dynamic process to get it to work. Once I got it to build, I found that the traces would abort with the "callee is not an object" message. I've uploaded my testcase and my version of patch; hopefully it will be helpful.
Reassigned.  After talking a bit with Brendan, I'm thinking about changing the strategy a wee bit and I'd be happy to get input from all CC'd.

Currently, the patch does trickery swapping in the self-hosted String.replace at CALLPROP-time.  The self-hosted version calls the native code when there are no lambdas.  This will slow down all non-JIT uses of String.replace.  The new strategy is:
- When interpreting and not recording, don't do anything different.
- When recording, call str_replace in JSOP_CALL.
- If the replace string is a lambda, str_replace returns 'false' and sets a new jsval field 'selfHosted' in JSContext to the self-hosted JSFunction* (lazily created/cached the same way as in the current patch).
- The meaning of 'selfHosted' is "if a fast native fails and selfHosted is != JSVAL_NULL, replace the old function jsval on the stack with selfHosted and then act like an interpreted JSOP_CALL.
- The self-hosted 'replace' calls a new String member, 'matchAll', that returns an array-of-arrays and then does the lambda-replacement in JS (like the previous patch).
- In the recorder (before all the above happens), we generate either a call to the native str_replace (if !lambda) or a call to the self-hosted version (if lambda).  This can be done generically by having all fast-natives that possibly use JSContext::selfHosted supplying a simple decision procedure "should I self-host?" that the tracer calls in TraceRecorder::callFunction.

Comments?
Assignee: gal → lw
The main problem I had with the patch is the primitive this issue. If we just implement the native function as scripted code and leave the string as this, we have to teach the interpreter to tolerate primitive this values, which is very much not trivial. I was thinking changing the semantics of the self hosted functions and hand in the host object as this (the host object is a magic object that contains functions like matchAll), and the string becomes a regular argument. That avoids the whole 'this' headache. As for #14, it sounds about as good or bad as the previous approach. I have no objects if its easier to implement.
The new approach seems strictly better than the existing patch's approach in that it should not regress interpreter-only perf, which we still must worry about when tracing misfires.

Formalizing the cx->selfHosted setting + tracing decision alg a little bit seems good. Could do after this bug is patched, on the second instance, or now if it's clear how to formalize. JSTraceableNative extension?

/be
(In reply to comment #16)
> JSTraceableNative extension?

That would be an ideal way to map from fast native -> fast native's self-hosting decision procedure.  I'm thinking just store the function pointer or NULL.
Why are we exposing a new non-standard method?  It should be possible to use it without exposing it (except perhaps via Function.caller, but that's non-standard already).  Are we trying to get buy-in from other vendors on this at all?
(In reply to comment #18)
> Why are we exposing a new non-standard method?

Because (a) matchAll is generally useful and (b) it's easier to share than keep private to the implementation.

> Are we trying to get buy-in from other vendors on this at all?

Yes, I am going to -- in due course.

/be
The current plan, as I understand it, is to make matchAll DontDelete|ReadOnly to make it easy to call it.  This is different from what most (all?) other ES-native function properties have, and I expect you will get pushback on this when you try to sell it to other vendors, under the water-under-the-bridge theory -- and I'm not sure I'd necessarily disagree with them.
Good point -- I had assumed matchAll would have {DontEnum} as its attributes.

But then you can mess with replace's behavior.

One way out is to make the original value of String.prototype.matchAll available as an rt->builtinFunctions[i] element, but then you'd need to write an imacro or otherwise generate JSOP_CALLBUILTIN by hand to invoke it.

/be
If we precompile the self-hosted replace in a special global object, it can have matchAll as a private helper, even on String.prototype. More developer work, code size, and runtime startup cost, though.

/be
Unrelated to the current discussion, replace_glob is doing the same realloc-every-time dance as Array.join used to.  There is a lot more computation, so the overall speedup isn't as dramatic.  Swapping in JSTempVector and measuring with varying sizes of strings, regexps, replacement, and other parameters, I'm able to get 5% to 50% speedup.  Perhaps I should land this as a separate patch.
(In reply to comment #23)
> Swapping in JSTempVector and measuring with varying sizes of strings,
> regexps, replacement, and other parameters, I'm able to get 5% to 50% speedup.

Great!

> Perhaps I should land this as a separate patch.

Go for it!

Talking to Waldo, a special global object and a root context for bootstrapping seem like useful things as we self-host more. We'll see a hit in benchmark perf unless we initialize lazily, and then we'll see a hit anyway in all likelihood (if the lambda replace first-time costs are big enough relative to the runtime of the benchmark excluding those costs).

XDR or better loom beyond this point.

I wonder if we shouldn't beef up imacros (loops, or backward jumps) and not yet self-host.

/be

/be
(In reply to comment #23)
New bug 509725
So, if you follow the String.replace path, there is this cute case where, at runtime, match_or_replace looks at the bytecode stream and if it sees that the array returned by 'match' isn't needed, it just does a test.  gal has a 15 line comment complaining about it.  So, my question is: is this an important optimization in real life?  I removed it and v8 and sunsider show no significant change.  If it is important, I think this optimization belongs in codegen, which would make it faster still.

If no objections, I have a patch that cleans up the whole area in preparation for the self-hosted String.replace.
For the benefit of future generations, here is what I was complaining about:

+    /*
+     * This is a good example of the over-micro-optimization of the engine
+     * that causes really weird corner cases and hard-to-understand states.
+     * A single match is handled separately below, instead of directly
+     * being registered (via find_replen or fill_result_array) from within
+     * the matching loop. *vp is used to root the match object, but
+     * match_or_replace returns JSVAL_TRUE to indicate that at least one
+     * match was found. Leading up to that function call we use *vp to
+     * protect the result object from the GC. ExecuteRegExp called from
+     * within match_or_replace overwrites *vp with JSVAL_TRUE, so we have to
+     * put the result object back here and save the original matchval to be
+     * used below to decide whether a match occured. As of the time of this
+     * checkin I verified by hand that the path between ExecuteRegExp and
+     * this point does not trigger the GC, but this is going to break at
+     * some point in the future for sure. Its really time to re-write some
+     * of the horrible cruft regexp has accumulated.
+     */
Attached patch simplify/clean String regexp ops (obsolete) — Splinter Review
This patch goes over the patch in bug 509826.

The patch removes the strange case warranting Andreas's comment.  It turns out, when I got to understand it, that the next-bytecode testing was unrelated.  However, if it doesn't help benchmarketing, and normal web devs know to use 'test', it seems like unnecessary complexity.  If it does turn out to be wanted, I think we could build a slightly more general solution where an operation can ask "how will the value I am producing be used?".  It seems like many operations could use this information to optimize.

Other changes include naming/style updates.  "Glob" was weird.
Yeah, rip out that old bytecode-inspection code.

We need a metabug to track concrete bugs to remove *all* bytecode inspection. Jason, did you file any such meta or nonmeta bugs?

/be
Filed bug 510058 - Meta: Remove all bytecode inspection.
With the above weirdness removed, I realized there was a still a lot of convoluted logic which was further confounded by adding matchAll.  So, this patch is a fairly substantial refactoring of the logic.  After that, adding matchAll was easy.  Currently matchAll is affixed to String, but that is completely temporary.

For now, I'm going to start adding the feature described in bug 485949 comment 13, as an independent patch, so that the new self-hosted String.prototype.replace can call the matchAll in this patch directly without going through name lookup madness.
Attachment #394112 - Attachment is obsolete: true
Blocks: 511777
I would like to commit this sooner because it will be a while before the rest of the self-hosted String.replace stuff can commit and its a burden to rebase this patch.
Attachment #394978 - Attachment is obsolete: true
Attachment #395889 - Flags: review?(jwalden+bmo)
Oops, s/StringRegExpOp/DoMatch in the comments in that patch.
Attached patch WIP, for comment (obsolete) — Splinter Review
This patch adds the ability to write:

"aaabcdaabcd".#builtin#StringMatchAll(/(a+)b/g);

which returns [["aaab","aaa"],["aab","aa"]].  This only compiles using the js/shell with the new -h flag ('h' for self 'h'osted).

I'm posting this now for any comments on my scanner/parser/emitter hacks.

Next up is the actual self-hosting feature (and answering some of the questions in TODO comments).
Attachment #385494 - Attachment is obsolete: true
Attachment #392955 - Attachment is obsolete: true
Comment on attachment 395889 [details] [diff] [review]
refactor String regexp ops in preparation for matchAll

We could use some inline isGlobal and such methods in JSRegExp sometime, rather than this pick-a-flag-out-of-the-bitset non-obviousness...


>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp
>+    /*
>+     * For ECMA Edition 3, the first argument is to be converted to a string to
>+     * match in a "flat" sense (without regular expression metachars having
>+     * special meanings) UNLESS the first arg is a RegExp object.
>+     */
>+    bool
>+    initFromArgs(JSContext *cx, uintN optarg, bool flat, uintN argc, jsval *vp)

This could use some description of the meanings of the arguments, optarg and flat most particularly.  The latter is semi-implicitly-documented, the former not at all.

Please add a comment by the optarg < argc case noting that interpreting the optional argument as specifying flags is extension-behavior not codified in any version of ECMAScript.

I cut out the guts before noticing this, but in the flat case with an object argument the GC-safety of ArgToRootedString depends on JSObjectOps.defaultValue not overwriting *vp early.  File a bug on addressing this with a tvr, or on making documentation explicitly note this defaultValue hazard?


>+        mCx = cx;

Do this at the top of the method, please.  For a long time reading this I thought this testcase might crash, until I remembered mRe is zeroed at construction time.

  "funky".match({ toString: function() { throw 17; } });


>+Matched(bool test, jsval v)
>+{
>+    return (test && v == JSVAL_TRUE) || (!test && !JSVAL_IS_NULL(v));
>+}

Equivalent and more readable:

  return test ? v == JSVAL_TRUE : !JSVAL_IS_NULL(v);


>+/* Keep the logic for the last three arguments to DoMatch in one place. */
>+#define MATCH_ARGS    true,  false, false
>+#define MATCHALL_ARGS false, false, true
>+#define REPLACE_ARGS  true,  true,  true

I'd prefer a composed bitfield of enumerated values, please.


>+/* Factor out looping and matching logic. */
>+static bool
>+DoMatch(JSContext *cx, jsval *vp, JSString *str, const RegExpGuard &g,
>+        DoMatchCallback callback, void *data,
>+        bool testGlobal, bool testSingle, bool callbackOnSingle)
>+{
>+    if (g.re()->flags & JSREG_GLOB) {
>+        /* global matching ('g') */
>+        if (g.reobj() && !js_SetLastIndex(cx, g.reobj(), 0))
>+            return false;
>+        for (size_t ct = 0, i = 0, length = str->length(); i <= length; ++ct) {

Mr idtmc nd lss trs nm wld b "count".


>+    JS_ASSERT(count <= JSVAL_INT_MAX);
>     return arrayobj->setProperty(cx, INT_TO_JSID(count), &v);

This is that issue you mentioned earlier, right?  Were we going to address this in followup territory?  Refresh my memory.
Attachment #395889 - Flags: review?(jwalden+bmo) → review+
(In reply to comment #35)
> Mr idtmc nd lss trs nm wld b "count".

s/tm/mt/ f crs!
(In reply to comment #35)
> I cut out the guts before noticing this, but in the flat case with an object
> argument the GC-safety of ArgToRootedString depends on JSObjectOps.defaultValue
> not overwriting *vp early.

Since vp is an in-out parameter it is up to defaultValue implementations to root a second value if necessary. This should not be something defaultValue callers have to consider.

But since js_DefaultValue carefully sets *vp only on the way out, it's not a bug. A comment about this for js_DefaultValue would be fine.

/be
Someone (Vlad IIRC) was blanching at #builtin#. Maybe we should use something prettier.

If we used builtin:: then the threat would be that an attacker could run our code in their environment with a fake builtin namespace, but (a) so what? and (b) let's defeat that by making the builtin:: namespace be compile-time and unshadowable in code that compiles with support for it.

/be
(In reply to comment #38)
> Someone (Vlad IIRC) was blanching at #builtin#. Maybe we should use something
> prettier.
> 
> If we used builtin:: then the threat would be that an attacker could run our
> code in their environment with a fake builtin namespace, but (a) so what? and
> (b) let's defeat that by making the builtin:: namespace be compile-time and
> unshadowable in code that compiles with support for it.

So, if I understand (b) correctly, all I need to do is switch "#builtin#" for "builtin::" in the scanner?
Not quite, because builtin could still be used to shadow the magic. Maybe that's ok, but I was considering a more spoof-proof scheme since we are bothering with defense in depth.

You want to let the parser that would normally build a TOK_DBLCOLON AST with a TOK_NAME node for the 'builtin' atom on the left and the qualified expression on the right (note we allow namespaced expressions) build a TOK_BUILTIN tree with just the expression as its kid.

To outlaw shadowing, when self-hosting we report a compile-time error on any binding (function, formal parameter, var, const, let, catch variable, or array comprehension implicit let) named 'builtin'. You can put this check in Define.

/be
(In reply to comment #37)
> Since vp is an in-out parameter it is up to defaultValue implementations to
> root a second value if necessary. This should not be something defaultValue
> callers have to consider.

Isn't the standard assumption that any value passed to a function is already rooted, for the duration of that function call?  obj in the defaultValue hook is rooted through vp and nothing else.  I don't think I'm happy forcing functions to think before modifying vp[i] in the off chance that that value is the sole root keeping obj alive.
(In reply to comment #41)
> (In reply to comment #37)
> > Since vp is an in-out parameter it is up to defaultValue implementations to
> > root a second value if necessary. This should not be something defaultValue
> > callers have to consider.
> 
> Isn't the standard assumption that any value passed to a function is already
> rooted, for the duration of that function call?

Yes.

> obj in the defaultValue hook
> is rooted through vp and nothing else.

Yes.

> I don't think I'm happy forcing
> functions to think before modifying vp[i] in the off chance that that value is
> the sole root keeping obj alive.

That's how the protocol works. If you waste cycles on extra rooting instead of ordering stores, you lose perf to the competition. If you order stores and comment, you hang on by the skin of your teeth. If we get conservative stack scanning then some of these cases go away. We aim to get conservative stack scanning soon.

There's no bug at present, and no need to add a tvr just in case someone breaks js_DefaultValue some day.

We would prefer automation, through the type system, a static analysis, native stack scanning, or something. In the mean time, adding overhead for non-bugs is a bad idea. It slows down the code and takes time from getting to the automatic promised land.

/be
I forget that JSObjectOps is not public API!  Given that, I retract all objections, as anyone not following that closely has no leg to stand on (although I remain worried about footguns).

However: the comment should go in jsobj.h within the definition of JSObjectOps (NB: not next to the definition of JSConvertOp, as it's also used for JSClass::convert which doesn't appear to have this problem).
No longer blocks: 511777
To trace replace, I need to attach some information to the str_replace fast native.  Annotation is currently done a la JSTraceableNatives, which annotate fast natives with a list of alternative type-specialized versions.  The current data layout is that JSFunction::trcinfo points to this array directly, which is not helpful if I want to annotations just to the FastNative, not each of the specializations.  This patch introduces an intermediary data structure, JSNativeTraceInfo, that points to the array of type specializations and is pointed to by JSFunction::trcinfo.  As a nice bonus, I can factor the JSFastNative pointer, held by each JSTraceableNative, into the JSNativeTraceInfo.

I also renamed "JSTraceableNative" to "JSSpecializedNative", since that is what it is.  The bulk of this patch is just doing all the related renaming.
Attachment #398430 - Flags: review?(jorendorff)
Blocks: 514511
Comment on attachment 398430 [details] [diff] [review]
tweak/rename JSTraceableNative in preparation for tracing replace

>+ * 'sns' points to a static array of available specializations terminated by
>+ * the lack of having the JSTN_MORE flag set.

'specializations'.

>+            JSNativeTraceInfo *trcinfo;  /* tracer metadata; can be first
>                                             element of array */

The comment is no longer true.

r=me with those trivial fixes, or you can go for bonus points:

>+#define JSFUN_TRCINFO       0x2000  /* when set, u.n.trcinfo is non-null,
>+                                       JSFunctionSpec::call points to a
>+                                       JSNativeTraceInfo. */

It would be nice to get rid of this flag, since you can just look at trcinfo itself. (This wasn't the case when the flag was added.) I don't see any places where it's needed. In fact jsdtracef.cpp is misusing it.
(In reply to comment #45)
> >+#define JSFUN_TRCINFO       0x2000  /* when set, u.n.trcinfo is non-null,
> >+                                       JSFunctionSpec::call points to a
> >+                                       JSNativeTraceInfo. */
> 
> It would be nice to get rid of this flag, since you can just look at trcinfo
> itself. (This wasn't the case when the flag was added.) I don't see any places
> where it's needed. In fact jsdtracef.cpp is misusing it.

I was totally on a warpath to do that... until I got to jsapi.cpp, js_generic_fast_native_method_dispatcher.  There it looks like the native is pulled out of the JSFuncSpec, and the only way you know whether to interpret JSFuncSpec::call as a JSNative or a JSNativeTraceInfo is that flag.  Getting around that seemed like it involved bigger changes.  (Although it certainly would be nice if we could lay of this "oh you think its a function pointer?  think again!" road.)
Attachment #398430 - Flags: review?(jorendorff) → review+
Attached patch works, but slowSplinter Review
Well, I got it working (passing trace-tests), but its slow.  As in, unpack-code is 60% slower, and validate-input is 14% slower.

Looking at a micro-benchmark (String.replace in a loop) under callgrind (kcachegrind is awesome, btw), the patch spends 4x more cycles than the interpreted str_replace.  From what I can see, this is largely due to (1) generation of the slow array returned by js_ExecuteRegExp, (2) the intermediary Array-of-Arrays-of-match-results used to communicate match results to the self-hosted code.  str_replace simply calls the RegExp.test version of js_ExecuteRegExp (no array built), calls the lambda directly, and splats the results in a js::Vector.  This can be fixed by changing the algorithm to be more like str_replace.

Past the micro-benchmark there is also the problem that each distinct lambda argument will generate a new branch exit inside the self-hosted function, hence the cost of a String.replace will go up linearly as it is called with different lambda arguments.  Fixing this requires a significant new feature that we've discussed (the first step is bug 516901) where we lookup the fragment to call based on the caller's fragment pc and the callee's identity.

So, definitely not ready.  On the bright side, getting this to trace and work well seems to be yielding generally useful features:
 - the (caller,callee)-keyed dispatch just described should be widely useful in fighting branch explosion
 - the self-hosting mechanism in the patch
 - the "advise" mechanism (for redirecting calls during recording)
 - the #builtin# mechanism is the beginning of static namespaces (bug 514511)

So, perhaps its worth it to push on.
Attachment #395892 - Attachment is obsolete: true
Zero-perf-regression tolerance says no push till win -- but pieces could go in if they individually do not regress anything.

/be
To be clear, above I meant "push on" figuratively, not mercurially :)

OK, so developments: the more I think about ways to improve the self-hosted speed, the more that I see it is an uphill battle: I'm trying to match -O3-compiled C++ with traced JS.  There is a tiny advantage self-hosted replace has, which is that the lambda gets inlined, but other than that, its pure slowdown compared to str_replace.  That means any benchmark that does String.replace(lambda) in a loop without anything else to amortize the cost will show a big slowdown.

So, the (caller,callee)-keyed dispatch will be needed regardless (it seems essential to the type of higher-order function use done by String.replace), so let's ignore that; it needs to be implemented separately.

The idea is to trace into natives like normal, and then, when we encounter the js_Invoke inside str_replace, we trace a new tree, and stitch the two together.  This new tree would, conceptually, be an extension of the caller, hence the call from native to trace would not do the slow js_ExecuteTree type-matching, but the faster js_CallTree fixup.  Clearly, this is a big feature, but (1) it could be used for other situations that abort (Andreas points out eval), and (2) it seems to solve the essential problem of this bug: we don't want to self host because str_replace is so complicated, we just want to stay on trace.

I'll file a new bug.
Strange text alignment in that last comment.

I put the two features described above in bug 517174 and bug 517149.
oh um, I'll back this out of the branch, I thought this was a different bug.
Flags: wanted1.9.2+
well, hold on, I only landed the first one. That should be ok, and just prevent merge pain, right?
Flags: wanted1.9.2+
Depends on: 517174
These bugs are all part of a search I made for js bugs that are getting lost in transit:

http://tinyurl.com/jsDeadEndBugs

They all have a review+'ed, non-obsoleted patch and are not marked fixed-in-tracemonkey or checkin-needed but have not seen any activity in 300 days. Some of these got lost simply because the assignee/patch provider never requested a checkin, or just because they were forgotten about.
The r+'d pre-patch cleanups were landed. comment 47 and comment 51 describe why self-hosting is not the way to go for str_replace + lambda, so WONTFIX'ing.  Note, we also tried a non-selfhosted approach in bug 519567, which was also WONTFIX'd due to a poor complexity/benefit ratio.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.