Closed Bug 904887 Opened 11 years ago Closed 3 years ago

regular expression performance much slower in FF than Chrome

Categories

(Core :: JavaScript Engine, defect)

25 Branch
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox42 --- wontfix
firefox43 --- affected
firefox44 --- affected
firefox45 --- affected

People

(Reporter: getify, Unassigned)

References

()

Details

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:25.0) Gecko/20100101 Firefox/25.0 (Beta/Release) Build ID: 20130813004004 Steps to reproduce: Check the code and performance test here: http://jsperf.com/regex-lexing-perf Basically, I've got a big string of various characters and number literals in it, then I loop through that string with a /g global regex, one match at a time, using `pattern.exec(..)`. Actual results: The code functions correctly in all browsers, but it is 6 times slower in FF than in Chrome. Expected results: I expect some variations but nowhere near that much difference in performance. I expect this raw JS, indeed mostly inside the regex, to perform say within 10% or less difference, on average, with various modern browsers. Also, I've observed that the difference in performance gets much worse (more than linearly) as the length of the lexed string increases. I have full complex heuristic-lexing code in a project called "literalizer" which can analyze all 274kb of jQuery's unminified code in around 350ms in Chrome, but that same code takes over *20 seconds* in FF. That's nearly 100x slower. :( I suspect the regex handling is the causal difference. But my basic understanding of regex mechanics and "catastrophic backtracking" doesn't ring any major alarm bells with this regex. Moreover, I'd like to understand why Chrome seems to have far less trouble with this (kind of) regex than FF. http://www.regular-expressions.info/catastrophic.html
Component: Untriaged → JavaScript Engine
OS: Mac OS X → All
Product: Firefox → Core
Hardware: x86 → All
Summary: regular expression performance → regular expression performance much slower in FF than Chrome
My bet is that array allocation and/or GC cost are the culprits for the performance drawback. Using the `if (!pattern.exec(code)) {...}` form which is special cased by the compiler to avoid allocating an array, performance is on a par with V8.
Thanks for the analysis, Andre. This is obviously a vastly reduced case of my actual code. The "literalizer" lexing code uses this pattern of regex "traversal" as I show it, so that I can at each token inspect its "context" heuristically to see how to interpret it. In that code, I obviously actually use the array output of exec(), though I rarely ever rely on matching groups TBH. When you consider the more complex code in an "isolated" test, like here: http://jsperf.com/literalizer-perf-1 You can see that the perf diff between FF and Chrome gets even more pronounced, almost 30x. Obviously there's lots more moving parts inside of literalizer that the original perf test, and I'm sure some of them are partly to blame for the 30x. For instance, I just realized I'm using several `match(..)` calls when I should probably be doing `test(..)`, since I'm not using the return value from `match(..)`. I'll be fixing that, and we'll see if that improves the 30x at all. I concur that GC might very well be a big concern with this coding pattern. I'm going to keep looking for hidden GC (like not using return values) I can get rid of.
Update: I made the change from match() to test() in literalizer, and it improved the literalizer-perf test by about 1 op/sec... so a tiny bump, but almost imperceptible. It improved the average run time for literalizing jQuery from 20.5 seconds to 19.5 seconds.
You may get another performance improvement when you use `RegExp.lastMatch` to obtain the matched part of the input string. That means replacing [1] with [2]. And don't tell anyone I've recommended to use RegExp statics ;-) [1] https://github.com/getify/literalizer/blob/master/lib/lit.js#L681-L706 [2] https://gist.github.com/anba/ed404bec7cd5642110c2
@Andre Thanks again for the tip. I have chosen the techniques I use specifically because I was worried about these: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Deprecated_and_obsolete_features I didn't want to write code based on them, if those were indeed subject to being removed at some point (even "soon"). In particular, as you can see from the code, I manually calculate `left_context` instead of using `leftContext`, for that exact reason.
Still seems like there is an issue worth profiling here, right? The same code runs dramatically faster in V8 than in Spidermonkey. Workarounds using deprecated features should not be needed.
Also, it's probably worth profiling the original testcase rather than or in addition to a simplified version of it; simplifying a testcase can often change the performance problem.
@David -- that's why I provided both: Reduced test-case: http://jsperf.com/regex-lexing-perf Complete test-case: http://jsperf.com/literalizer-perf-1 Note that the reduced test case shows that's there's a substantial perf problem just with plan vanilla "global-regex traversal". The complete-test-case code basically is just more complex usage of that same sort of technique (with a lot of other regex tests mixed in), so I would (naively, I'm sure) expect a major perf improvement to the complete test case if the reduced test-case were "fixed". But, you're right, if the isolated/reduced test case improves but my actual code still runs too slow, we won't have met my "goal" here. :)
(In reply to André Bargull from comment #4) > You may get another performance improvement when you use `RegExp.lastMatch` > to obtain the matched part of the input string. That means replacing [1] > with [2]. And don't tell anyone I've recommended to use RegExp statics ;-) I am giving you the evil eye. /o\ (In reply to Kyle Simpson from comment #5) > I have chosen the techniques I use specifically because I was worried > about these: > > https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/ > Deprecated_and_obsolete_features And I am giving you a gold star. \o/
With the reduced test case from comment 8, Chrome 46 is still 3x faster than Firefox Beta 43 on OS X. Reduced test-case: 62,920 vs 21,040 ops/sec Complete test-case: 746 vs 572 ops/sec Note that Nightly 45 is slower than Beta 42, even with JSGC_DISABLE_POISONING=1: 15,819 vs 21,040 ops/sec.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Hannes can you take a look at this? :)
Flags: needinfo?(hv1989)
(In reply to Jan de Mooij [:jandem] from comment #11) > Hannes can you take a look at this? :) Definitely. Bug 1187438 is related. We cannot execute a fast jit to jit transition for global patterns. I created a dirty patch there and tested it. It halves our execution time. locally: 573ms to 120ms d8 is 151ms Now to make it a decent patch there was need for some rather hacky things. At the same time Arai was looking into selfhosting all this code. (In order to make use ES6 compatible). That also makes doing this patch easier. Bug 887016. I'll look into coordinating to get that landed. There were still some performances issues that patch had, before it could land.
Flags: needinfo?(hv1989)
Depends on: 1187438

Hello,

was this ever solved? I think that I got hit by something similar ... big one line string and function with regular expression on it. When I append "\n" on it it seems to be quite okay (fast enough), but when it is missing it is about much slower than Chrome (500ms vs 7s on FF). Maybe it is also bug in library which is used there (I did not dig deep enough as just "\n" on end was okay), but I am wondering if it can be connected.

The RegExp engine was replaced by the one from v8 a year ago.
So the issue you are seeing might be unrelated to the original issue.

Would you mind opening a new bug with a test case which reproduces the issue you are seeing?

Flags: needinfo?(zdnexnet)

Ok, if I will manage to make testcase I will create new one.

Flags: needinfo?(zdnexnet)

This issue will not be fixed, because it concerns the previous RegExp engine which is no longer used in SpiderMonkey.
The regexp engine got replaced by v8 regexp engine, while performance differences might exists, they should no longer be attached to this old bug.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.