Closed Bug 724315 Opened 13 years ago Closed 11 years ago

Providing a RegExp object does not need to prevent a flat match

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 965712

People

(Reporter: cdleary, Unassigned)

Details

(Whiteboard: [good first bug][mentor=jorendorff][lang=c++])

Attachments

(2 files)

We have an optimization for string builtins (like js::str_match and js::str_replace) so that calling String.prototype.match with a "regular string" -- one without fancy regexp meta-characters (like '^') -- doesn't need to build a one-shot regexp object. Example: someText.match("foo"); We can search for "foo" in someText (without ever creating a RegExp object) simply using a function like strstr. However, you can also have a user-created RegExp object with the same contents: someText.match(/foo/) In which case we'll use the compiled RegExp JIT code to execute the match instead of doing the strstr-based flat match. The purpose of this bug is three-fold! 1. Investigate whether the strstr-based flat match is faster than the equivalent JIT-compiled RegExp code. (And, if it *is* faster and you're feeling adventurous, find out why!) 2. Use some simple fprintfs to see if this situation comes up in benchmarks or typical web browsing. 2. If strstr *is* faster and the situation comes up with enough regularity, make a patch to add a IsFlat flag on RegExp objects that causes them to use strstr instead of JIT compiled code.
Whiteboard: [good first bug][mentor=cdleary] → [good first bug][mentor=cdleary][lang=c++]
Hi Chris, I'm new here and was looking at this bug for my first bug and had a question. It looks like ECMA-262 suggests that for string.prototype.match: 3. If Type(regexp) is Object and the value of the [[Class]] internal property of regexp is "RegExp", then let rx be regexp; 4. Else, let rx be a new RegExp object created as if by the expression new RegExp(regexp) where RegExp is the standard built-in constructor with that name. As far as Firefox is concerned, Is it better to adhere to the exact description of the function in the spec or to implement an optimization that breaks from the spec algorithm as long as it doesn't interfere with the end result of the function?
(In reply to Mark Adams from comment #1) > I'm new here and was looking at this bug for my first bug and had a > question. Welcome, Mark! > As far as Firefox is concerned, Is it better to adhere to the exact > description of the function in the spec or to implement an optimization that > breaks from the spec algorithm as long as it doesn't interfere with the end > result of the function? Adhering to the exact description in the spec is generally useful if a) the description is efficient enough and b) it'll help the next person who comes along and tries to verify that we're doing the correct thing according to the spec. The reason that the second purpose of this bug is "Use some simple fprintfs to see if this situation comes up in benchmarks or typical web browsing," is to see whether or not it'll be worthwhile to deviate from the exact description in the spec a little more in order to get performance wins that matter. In other words: we like performance wins that matter, and we're willing to pay some amount of direct correspondence to the specification in order to get it. Like you note, optimizations shouldn't create observable differences from the specified behavior -- usually we're creating a "fast path" for a common or more-easily-handled usage pattern that falls back to a more general "slow path". Also noteworthy is that we rip out optimizations that we *thought* might have a positive performance impact in the past, but that were later measured to be useless.
Hi, I am new to development on spidermonkey and mozilla. I would like to work on this bug, and would be grateful if this bug could be assigned to me.
Assignee: general → madhavan93
Whiteboard: [good first bug][mentor=cdleary][lang=c++] → [good first bug][mentor=jorendorff][lang=c++]
Can you confirm that you're still working on this bug?
Flags: needinfo?(madhavan93)
Yes, I would like to resume work on this. Sorry for the lathargy.
Flags: needinfo?(madhavan93)
Hi, I'm new here and I'm interested in this bug. If you are still working on it I'd like to help you, if you aren't I'd be glad to be assigned on this task.
Flags: needinfo?(madhavan93)
I think we're safe in making you the owner of this task now.
Assignee: madhavan93 → mxs
Flags: needinfo?(madhavan93)
Thanks! I've made three tiny benchmarks: - 100 runs of a few calls to match() against a small string (50 chars), the difference is not relevant; - 100 runs of a few calls to match() against a medium string (500 chars), looks like there's a gain of 10 %; - 100 runs of a few calls to match() against a large string (3000 chars), here the gain is noticable, about 60%. I've chosen totally arbitrarily the length of those strings so maybe it's not relevant, but the tendancy is clear: the longer is the matched string, the better is this optimization. I've attached the code of the bench and the curves I get, if you want to reproduce those curves I can also provide you with the script to generate them. The tests were done on a laptop (i5) running Linux, with a js shell built with optimizations, I pinned the process to a dedicated CPU to avoid noise. I see some regular pikes on the curve, I think it's the gc but I haven't found a way to disable it, is there a way to do so? I hope I haven't missed something, and that the bench is not flawed by some external magic that I don't understand yet. If you think it's ok, I guess I can investigate why it's better, and continue to step 2 to have an idea of the average lengths involved in "normal" condition, so as to see if this really worth it to implement it.
Status: NEW → ASSIGNED
Hey Sébastien, these results look promising! There are two things you could do next: The first is to slightly clean up the benchmark by creating two different functions for the regexp and the string matching cases and calling those once with a larger loop count in the function itself. This way, you decrease the overhead of the benchmarking harness. That might not matter too much, but the difference will probably be slightler larger. The second, more involved but also more interesting step is to instrument the matching code to print information to stdout on whether a match has been given a regexp but could be a flat match, and how long the input string was. Then do a browser build and use it for a while, collecting the output for analysis. This should give a good first approximation of how worthwhile this optimization is. Thanks for working on this!
Sébastien: are you still working on this RegExp optimization? The results from your benchmark looked promising. :)
Flags: needinfo?(mxs)
Hey, I've moved to something else, I think I'm going to resume my work on this soon, but feel free to reassign it if someone else is interested.
Flags: needinfo?(mxs)
Assignee: mxs → nobody
I want to mention bug 965712. That bug adjust regexp to use our builtin string matching for patterns that are actually strings.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: