Closed Bug 1691284 Opened 5 years ago Closed 5 years ago

Regex performance issue

Categories

(Core :: JavaScript Engine, defect)

Firefox 87
defect

Tracking

()

RESOLVED DUPLICATE of bug 1391654

People

(Reporter: mailnew4ster+github, Unassigned)

Details

(Keywords: perf)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.23 Safari/537.36 Edg/89.0.774.18

Steps to reproduce:

Run:
'google.com/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'.match(new RegExp('^([a-z]+.)*google.com/'))

Actual results:

It runs for a long time and then terminates with the following error:
Uncaught InternalError: too much recursion

Expected results:

I encountered a bug, and it took me a while to realize it's a problem with this regex. I simplified it to the one above. That's strange to me since intuitively, it's not ambiguous and shouldn't recurse too much.

Moreover, if the following is used, the result is returned instantly:
'google.com/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'.match(/^([a-z]+.)*google.com//)

In general, the more 'x'-es you add, the slower it gets. The following takes about three seconds to complete on my machine:
'google.com/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'.match(new RegExp('^([a-z]+.)*google.com/'))

Chrome and Node have this issue as well, but they're not killed and just keep running (maybe they will be killed eventually). After looking around I understand that both browsers share regex engines. All online regex debuggers I found online just work instantly.

Chromium V8 issue:
https://bugs.chromium.org/p/v8/issues/detail?id=11406

Component: Untriaged → DOM: Core & HTML
Product: Firefox → Core

To save some time for whoever has to triage this: The difference between the new RegExp() and the literal calls is that the new RegExp case uses \. instead of \\.. The result is a backtrack from the quantifier instead of matching a period. The rendering in bugzilla hides this but it is clear in the chromium issue.

Component: DOM: Core & HTML → JavaScript Engine

As mentioned above, the problem with this regexp is that the . in ([a-z]+.)* is being interpreted as a wildcard, rather than a literal period. As a result, the regexp engine must backtrack to consider all possible partitions of xxx...xxx, which is exponential.

The short-term solution is to fix the regexp. A long-term solution will have to wait for the new non-backtracking engine.

Status: UNCONFIRMED → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.