Closed
Bug 673188
Opened 13 years ago
Closed 13 years ago
Compile regexps lazily
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
mozilla10
People
(Reporter: bzbarsky, Assigned: cdleary)
References
Details
(Whiteboard: [MemShrink:P1])
Attachments
(4 files, 6 obsolete files)
11.93 KB,
patch
|
Details | Diff | Splinter Review | |
20.82 KB,
image/png
|
Details | |
25.63 KB,
image/png
|
Details | |
71.02 KB,
patch
|
mrbkap
:
review+
|
Details | Diff | Splinter Review |
Eager compilation ends up using more time (see bug 672064) and probably more memory (see bug 670689).
This is not quite the same as bug 463545, which was using the nanojit-based regexp jit. Might still be wontfix, of course.
Updated•13 years ago
|
Whiteboard: [MemShrink]
Could we have an optimization for regexps like we do for the mjit, where the regexp is interpreted a few times before it's compiled? Is this possible with the Yarr interpreter?
Assignee | ||
Comment 2•13 years ago
|
||
(In reply to comment #1)
> Could we have an optimization for regexps like we do for the mjit, where the
> regexp is interpreted a few times before it's compiled? Is this possible
> with the Yarr interpreter?
That's an interesting question. I think the concept doesn't apply to our current architecture for a number of reasons:
- We actually JIT compile directly from the pattern (AST) instead of requiring bytecode first. We only use bytecode for those regexps that are too complex for the JIT compiler to handle (or too difficult to optimize, however you want to look at it).
- We can't easily cross from the interpreter to JIT code at "safe points" -- the looping concept isn't as well defined, so large text inputs on initial runs would suffer. The YARR-JIT also uses the C stack to hold backtracking information last I checked, so you'd have to do a transform there.
So I think hypothetically it's possible and a cool idea, but practically it's not feasible.
The second problem seems like something we could overcome. If the input is too large, we could compile immediately. Or, if we're spending too much time in the interpreter, we could just start over in the JIT (assuming there are no side effects; perhaps I'm being naive...).
I don't really understand the first problem. When does the bytecode actually get generated? Would it be difficult to generate it first instead of only as a fallback?
I guess this isn't really a clear win, since the bytecode might actually be bigger than the generated machine code. We'd have to measure.
Updated•13 years ago
|
Whiteboard: [MemShrink] → [MemShrink][mentor=cdleary]
Assignee | ||
Comment 4•13 years ago
|
||
(In reply to comment #3)
> Or, if we're spending too much time
> in the interpreter, we could just start over in the JIT (assuming there are
> no side effects; perhaps I'm being naive...).
No, that's correct. I think that's a viable approach. I'll think it over some more, but the current regexp setup is that a pattern either gets compiled directly to native (if it's simple enough) or to directly to bytecode if it's too complex. I'm thinking that flushing memory for the unused regexps (bug 673189) plus lazy compilation may have the best usefulness/complexity ratio -- we can flush bytecode as well since it can be regenerated from the regexp source text.
Assignee | ||
Comment 5•13 years ago
|
||
dukeleto from #jsapi said that he'd be interested in taking this as a first bug. Some background information: regexps are currently compiled eagerly, on creation. See |js::RegExp::create*| in jsregexpinlines.h. The interface for these C++ objects is quite limited -- the primary way outside code interacts with them is through |js::RegExp::execute|.
|js::RegExp|s contain the code required to execute a regular expression -- it's either native code xor bytecode (i.e. when the fallback bytecode value is set, there is no native code).
Let me know if you have any questions -- IMO the |js::RegExp| interface is pretty clean and it generally just calls into the appropriate part of the YARR regexp engine component.
Updated•13 years ago
|
Whiteboard: [MemShrink][mentor=cdleary] → [MemShrink:P1][mentor=cdleary]
Updated•13 years ago
|
Assignee: general → jonathan
Comment 6•13 years ago
|
||
From what I gather from previous comments, js::RegExp's should be only be compiled if used at least N times, where N is some compile-time constant. This could be implemented by moving code which is currently in |js::RegExp::create*| to |js::RegExp::execute| which is only triggered on the Nth call to|js::RegExp::execute| . This would not require any API changes.
Does that sound like a viable plan?
Status: NEW → ASSIGNED
Comment 7•13 years ago
|
||
In #jsapi:
<cdleary-lappy> dukeleto: the bug in its current form is just to lazily compile, not to switch from bytecode over to JITted code
IOW, instead of compiling a regex when it's created, compile it just before its first use.
Assignee | ||
Comment 8•13 years ago
|
||
(In reply to comment #7)
> IOW, instead of compiling a regex when it's created, compile it just before
> its first use.
Yup, and when this is implemented we can put bug 673189 on top of this to clear out unused-but-compiled regexps at each GC.
Comment 9•13 years ago
|
||
This change may cause some substantial memory savings, so it's good to see you taking it on, Duke!
Comment 10•13 years ago
|
||
Just wanted to give an update on this, since jlebar asked me about it on IRC:
I spent a few days familiarizing myself with the RegExp code and I identified the actual line of code that compiles RegExp's on creation, which gave me a small sense of accomplishment.
Currently I am working on a test so that I can verify my patch works correctly. At first, I was going to write the tests in C++, because there is no way to tell if a RegExp is compiled from Javascript. But I had the pleasure of talking to jimb about this during a Mozilla PDX lunch, and he suggested that I create a private API for the shell which exposes a single function, something like isCompiled(), which will allow Javascript tests to detect if a RegExp is compiled or not.
That is my current plan of attack. If anyone has comments, suggestions or examples for how to implement a shell-only private API, that would be most appreciated.
Comment 11•13 years ago
|
||
> If anyone has comments, suggestions or
> examples for how to implement a shell-only private API, that would be most
> appreciated.
I'm no expert, but I think you should take a look at |shell_functions| in js/src/shell/js.cpp.
Comment 12•13 years ago
|
||
This patch compiles RegExp lazily, but because the TokenStream is not available at execution time, it passes in NULL currently. Many RegExp tests still pass, but tests which deal with multiple matched items fail, because only a single match is returned.
Comment 13•13 years ago
|
||
Assignee | ||
Comment 14•13 years ago
|
||
This is the right idea. I think the match result count is off because parenCount is initialized by |RegExp::compileHelper| which isn't being invoked until after you pull that value out in |RegExp::executeInternal|.
However, there are also a few trickier things lurking.
- Not all |RegExp|s are backed by the bytecode pattern -- looking at |RegExp::compileHelper|, we "try" to JIT compile the pattern, but if we can't, then we fall back on bytecode compilation. For the patterns that are successfully JIT compiled, |byteCode| will always be null.
- Because regular expressions can create "early errors" (errors which must be reported at parse-time, per the ECMAScript specification), we have to at least invoke the YARR parser with a nop-like delegate (see YarrParser.h for information on the parser delegate) in order to check for error codes.
- The token stream is only necessary for early error reporting, so we should be able to create a separate method that just checks for early errors, independent of lazy compilation. (By checking up front, we ensure that no early errors exist at lazy compilation time.)
Duke, let me know what your bandwidth is like for iterating on this patch -- unfortunately, I may have to steal it away to get an interdependent cluster of these regexp enhancements in for the next Firefox release train. I lacked that foresight when I said this was a good first bug, and I'm sorry!
Comment 15•13 years ago
|
||
cdleary: I don't give up easily, so I would like to continue iterating on this patch armed with the knowledge of comment #14. I feel confident that I can fix bullet #1 your comment, because I remember reading useful comments about how to check if JIT'ed code is available. I think I know what I need to do to get parentCount correct as well.
I grasp the "early errors" bit, and I will read the relevant parts of the spec to make sure I fully understand how they should work, but that is the part where I feel like I still need to feel around the code a bit more to see what actually needs to be done.
I have tuits to work with you this week to get this bug fixed, but I understand if you need to steal it away. Do you have an estimate for when you will be forced to steal this interesting bug away from me?
Comment 16•13 years ago
|
||
By hoisting the call to compile to before we calculate matchItemCount in |RegExp::executeInternal|, I reduced the number of failing test files from 77 down to just these 6:
ecma_3/RegExp/regress-188206.js
ecma_3/RegExp/regress-223273.js
ecma_3/RegExp/regress-57631.js
ecma_5/Global/cross-global-implicit-this.js
js1_5/Exceptions/regress-257751.js
js1_5/Exceptions/regress-332472.js
Comment 17•13 years ago
|
||
The list of failing test files for this patch is:
ecma_3/RegExp/regress-188206.js
ecma_3/RegExp/regress-223273.js
ecma_3/RegExp/regress-57631.js
ecma_5/Global/cross-global-implicit-this.js
js1_5/Exceptions/regress-257751.js
js1_5/Exceptions/regress-332472.js
which mostly seem to be tests for "early errors" not being thrown for invalid RegExps.
Attachment #551289 -
Attachment is obsolete: true
Attachment #551291 -
Attachment is obsolete: true
Comment 18•13 years ago
|
||
jorendorff gave me these instructions for finishing this up:
1) Remove Tokenstream param from compile() and compileHelper()
2) Change compileHelper() to assert yarrError is 0 instead of checking for it
3) Learn the secrets of Yarr::YarrPattern to detect early errors
4) WIN
Assignee | ||
Comment 19•13 years ago
|
||
Duke, I think if we get it done in the next couple of weeks it will be fine. Changes like these (lazifying stuff, changing GC) generally need time to bake in the nightly channel, so we're not likely to get it into the current train given that there's only a week or so left.
Some advice: be sure you're running the jit-tests as well as the jstests -- because it's so much easier to add tests there, regexp tests end up in both suites.
I would also recommend, as you gain understanding from the spec, adding tests specifically to test for early errors and such. Increasing our test suite and learning from the spec at the same time is a double win! For jit-tests you effectively just need to drop a file into the directory and use the |assertEq| shell builtin.
Comment 20•13 years ago
|
||
In addition to the above currently failing normal tests, these JIT tests are failing (due to the lack of early syntax errors being thrown):
FAILURES:
-m -j -p /home/leto/hg/mozilla-central/js/src/jit-test/tests/basic/bug576837-regexp.js
-m -j -p /home/leto/hg/mozilla-central/js/src/jit-test/tests/basic/bug593663-regexp.js
Comment 21•13 years ago
|
||
This patch implements the API changes recommended, and starts down the path of reporting early errors in RegExps from the parser.
Attachment #551674 -
Attachment is obsolete: true
Comment 22•13 years ago
|
||
Attachment #553584 -
Attachment is obsolete: true
Comment 23•13 years ago
|
||
The newest patch cheats a bit by making reportYarrError a public method so that I can call it from the parser. I plan to fix this by moving reportYarrError to the parser, since it is not used anywhere else.
The bigger problem is, not all the early error failing tests have been resolved. Here is an example of some of the passing/failing tests from ecma_3/RegExp/regress-57631.js :
FAILED! [reported from test()] Testing for error creating illegal RegExp object on pattern '(?)'and flag 'i' : Expected value 'SyntaxError', Actual value 'not a SyntaxError'
FAILED! [reported from test()] Testing for error creating illegal RegExp object on pattern '(?)'and flag 'g' : Expected value 'SyntaxError', Actual value 'not a SyntaxError'
FAILED! [reported from test()] Testing for error creating illegal RegExp object on pattern '(?)'and flag 'm' : Expected value 'SyntaxError', Actual value 'not a SyntaxError'
FAILED! [reported from test()] Testing for error creating illegal RegExp object on pattern '(?)'and flag 'undefined' : Expected value 'SyntaxError', Actual value 'not a SyntaxError'
FAILED! [reported from test()] Testing for error creating illegal RegExp object on pattern '(a'and flag 'i' : Expected value 'SyntaxError', Actual value 'not a SyntaxError'
FAILED! [reported from test()] Testing for error creating illegal RegExp object on pattern '(a'and flag 'g' : Expected value 'SyntaxError', Actual value 'not a SyntaxError'
FAILED! [reported from test()] Testing for error creating illegal RegExp object on pattern '(a'and flag 'm' : Expected value 'SyntaxError', Actual value 'not a SyntaxError'
FAILED! [reported from test()] Testing for error creating illegal RegExp object on pattern '(a'and flag 'undefined' : Expected value 'SyntaxError', Actual value 'not a SyntaxError'
FAILED! [reported from test()] Testing for error creating illegal RegExp object on pattern '( ]'and flag 'i' : Expected value 'SyntaxError', Actual value 'not a SyntaxError'
FAILED! [reported from test()] Testing for error creating illegal RegExp object on pattern '( ]'and flag 'g' : Expected value 'SyntaxError', Actual value 'not a SyntaxError'
FAILED! [reported from test()] Testing for error creating illegal RegExp object on pattern '( ]'and flag 'm' : Expected value 'SyntaxError', Actual value 'not a SyntaxError'
FAILED! [reported from test()] Testing for error creating illegal RegExp object on pattern '( ]'and flag 'undefined' : Expected value 'SyntaxError', Actual value 'not a SyntaxError'
PASSED! Testing for error creating illegal RegExp object on pattern '(?)'and flag 'a'
PASSED! Testing for error creating illegal RegExp object on pattern '(?)'and flag '123'
PASSED! Testing for error creating illegal RegExp object on pattern '(?)'and flag '/(?:)/'
PASSED! Testing for error creating illegal RegExp object on pattern '(a'and flag 'a'
PASSED! Testing for error creating illegal RegExp object on pattern '(a'and flag '123'
PASSED! Testing for error creating illegal RegExp object on pattern '(a'and flag '/(?:)/'
PASSED! Testing for error creating illegal RegExp object on pattern '( ]'and flag 'a'
PASSED! Testing for error creating illegal RegExp object on pattern '( ]'and flag '123'
PASSED! Testing for error creating illegal RegExp object on pattern '( ]'and flag '/(?:)/'
Anybody have any ideas why some invalid regexp early errors are detected, but not others?
Comment 24•13 years ago
|
||
My hunch about the remaining failing RegExp tests is that JSC::Yarr::checkSyntax() does not correctly detect some invalid RegExps (such as the ones in bug576837-regexp) because they are valid to parse, but are invalid for other reasons, i.e. that builtin character classes within ranges produce syntax errors.
Attachment #555661 -
Attachment is obsolete: true
Assignee | ||
Comment 25•13 years ago
|
||
Just wanted to get some preliminary numbers on how many destructed |RegExpPrivate|s were never executed.
Highest number-of-times-executed counts:
[(0, 2468),
(1, 753),
(13, 45),
(2, 42),
(35, 40),
(3, 24),
(4, 24),
(9, 23),
(5, 15),
(85, 14)]
So laziness seems very useful, and I'll grab memory overhead size for those unused guys.
Assignee: jonathan → cdleary
Assignee | ||
Comment 26•13 years ago
|
||
Doing some pretty lame browsing around GMail, GReader, and Twitter I got the following usage numbers for zero-times-executed |RegExpPrivate|s:
Total: 739108 bytes
Max: 6904
Mode: 363
Mean: 525.68
Stddev: 27533.82
So we generate under a meg of stuff for no real reason.
Comment 27•13 years ago
|
||
And we throw these regexps away when we destroy the compartment?
If so, I don't think this is MemShrink:P1, unless we can find a site which generates 10 times as much memory from regexps...
Assignee | ||
Comment 28•13 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #27)
It's also a speed concern for compilation overhead of huge regexps (note the measurements in bug 672064).
Assignee | ||
Comment 29•13 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #27)
Oh, and also making the |RegExpObject| lazy allows us to purge them all on GC (i.e. from idle compartments), which is probably the more MemShrink:P1 issue.
Assignee | ||
Comment 30•13 years ago
|
||
Duke, just wanted to reiterate that I'm sorry to have to steal this one away! (Per comment 14.)
Whiteboard: [MemShrink:P1][mentor=cdleary] → [MemShrink:P1]
Comment 31•13 years ago
|
||
I'm definitely not saying you shouldn't fix this bug. :) It just seems that the expected memory implications aren't particularly large.
Assignee | ||
Comment 32•13 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #31)
Oh yeah, I was just pointing out that the prolonged usage results are different from my one-off browsing session results. bz posted a troubling piece of long-term about:memory data WRT RegExp code that got me jumpstarted back on finishing this stuff.
Reporter | ||
Comment 33•13 years ago
|
||
> And we throw these regexps away when we destroy the compartment?
In my case I had several hundred megs of regexp code in the system compartment. So no destroying of the compartment going on.
Assignee | ||
Comment 34•13 years ago
|
||
Depends on patches in bug 691695 and bug 692069. This looks like it may be passing all my shell tests. It lazifies the RegExpObject creating a RegExpPrivate. (The other option is to lazify the compilation but eagerly create the RegExpPrivate, but this way makes it easier to install a RegExpPrivate cache and probably reduces refcount bumping.)
Assignee | ||
Comment 35•13 years ago
|
||
Unless I'm doing something wrong, it looks like the twitter-text.js from bug 672064 comment 2 is an effective microbenchmark: the execution time is reduced from 1600ms to 10ms.
Looks green on try. (Should also be easy to add a RegExpPrivate cache in front of the lazy creation. Huzzah for well factored code.)
Attachment #565477 -
Attachment is obsolete: true
Attachment #565677 -
Flags: review?(mrbkap)
Comment 36•13 years ago
|
||
Comment on attachment 565677 [details] [diff] [review]
Lazify RegExpPrivate creation.
This looks really good. I complained to cleary on IRC about parenthesizing the conditionals on ?: expressions and he said he'd fix this.
Attachment #565677 -
Flags: review?(mrbkap) → review+
Assignee | ||
Comment 37•13 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/61dd23c012ee
https://hg.mozilla.org/integration/mozilla-inbound/rev/d50bd6d5b097
Target Milestone: --- → mozilla10
Comment 38•13 years ago
|
||
were Dromaeo regressions expected? tree-management is showing CSS and jslib regressions on linux and linux64
Comment 39•13 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/61dd23c012ee
https://hg.mozilla.org/mozilla-central/rev/d50bd6d5b097
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 40•13 years ago
|
||
(In reply to Marco Bonardo [:mak] from comment #38)
> were Dromaeo regressions expected? tree-management is showing CSS and jslib
> regressions on linux and linux64
Kind of. Bug 694527 *should* quickly restore normalcy, which is why I don't think it's worth a back out -- what page / mailing list should I be watching to verify that we bounce back to our previous value when that lands today?
Comment 41•13 years ago
|
||
Comment 42•13 years ago
|
||
In graphs-new you have to select Custom Chart to see inbound graphs.
Comment 43•13 years ago
|
||
(In reply to Chris Leary [:cdleary] from comment #40)
> (In reply to Marco Bonardo [:mak] from comment #38)
> > were Dromaeo regressions expected? tree-management is showing CSS and jslib
> > regressions on linux and linux64
>
> Kind of. Bug 694527 *should* quickly restore normalcy, which is why I don't
> think it's worth a back out -- what page / mailing list should I be watching
> to verify that we bounce back to our previous value when that lands today?
Can I please suggest that we should back this out now and reland it with bug 694527 at the same time? That's the only way to be sure that nothing is regressed, by not seeing any regression emails on dev.tree-management...
Comment 44•13 years ago
|
||
(In reply to Ehsan Akhgari [:ehsan] from comment #43)
> (In reply to Chris Leary [:cdleary] from comment #40)
> > (In reply to Marco Bonardo [:mak] from comment #38)
> > > were Dromaeo regressions expected? tree-management is showing CSS and jslib
> > > regressions on linux and linux64
> >
> > Kind of. Bug 694527 *should* quickly restore normalcy, which is why I don't
> > think it's worth a back out -- what page / mailing list should I be watching
> > to verify that we bounce back to our previous value when that lands today?
>
> Can I please suggest that we should back this out now and reland it with bug
> 694527 at the same time? That's the only way to be sure that nothing is
> regressed, by not seeing any regression emails on dev.tree-management...
Except that I just saw that bug 694527 landed on inbound. :((
Comment 45•13 years ago
|
||
Believed to have caused bug 694752.
I have a backout based on m-c tip going to try (see https://bugzilla.mozilla.org/show_bug.cgi?id=694752#c8), and if that helps, I'll be backing this out.
This would also resolve the problems in comment 43, since everything could be re-landed at once, making the dev.tree-management results clearer.
Assignee | ||
Comment 46•13 years ago
|
||
Nick mentioned in his blog post that the ultimate state of this bug was unclear. The regressions were resolved in a followup fix -- there was confusion because I didn't land that on mozilla-central post-merge. Everything is cool, and we should get some nice thread-data-level consolidation wins out of bug 634654.
You need to log in
before you can comment on or make changes to this bug.
Description
•