bugzilla.mozilla.org will be intermittently unavailable on Saturday, March 24th, from 16:00 until 20:00 UTC.

Regex runtime depends on input size when it shouldn't




JavaScript Engine
4 years ago
2 years ago


(Reporter: Peter Isza, Unassigned)


7 Branch

Firefox Tracking Flags

(Not tracked)



(1 attachment)



4 years ago
Created attachment 831337 [details]

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:25.0) Gecko/20100101 Firefox/25.0 (Beta/Release)
Build ID: 20131025151332

Steps to reproduce:

This bug makes JISON very slow, which makes compiling CoffeeScript on Firefox very slow. Slower than IE.

Jsfiddle: http://jsfiddle.net/eP2TR/10/.

I have run the following regular expressions on strings that contain only letter 'a':
  1. /^b+/
  2. /^(b)+/
  3. /^(?:b+)/
  4. /^(?:(?:b)+)/

Obviously, there should be no matches. The regex engine should quit after reading the first character, so the running time shouldn't depend on what's after the first letter in the string.

Safari 6 also contains the same bug (I guess it is using the same sh*tty regex compiler), while Chrome, IE10 and Opera don't cont.

Actual results:

In cases #2 and #4 the running times depended (linearly) on the size of the input.

Expected results:

The regex engine should have quit after seeing the first character of the input.


4 years ago
Priority: -- → P3

Comment 1

4 years ago
Regression window for cases #2 and #4(m-c)
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0a1) Gecko/20110606 Firefox/7.0a1 ID:20110606110214
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0a1) Gecko/20110606 Firefox/7.0a1 ID:20110606132754

Regressed by:
	cc36a234d0d6	David Mandelin — Bug 625600: Update Yarr import to WebKit rev 86639, r=cdleary,dvander
Blocks: 625600
Component: JavaScript Engine: JIT → JavaScript Engine
Ever confirmed: true
OS: Mac OS X → All
Version: 25 Branch → 7 Branch
Hmm.  Cases 2 and 4 run in the Yarr interpreter, I would think, due to https://bugs.webkit.org/show_bug.cgi?id=122891

That said, it's odd that the Yarr interpreter would take linear time on those.

Peter, just to make sure, are the regexps in comment 0 representative of the ones JISON is using, in that fixing that case will actually help JISON?
Priority: P3 → --
Flags: needinfo?(luke)

Comment 3

4 years ago
Hi Boris,

Yes, we parse JavaScript code with jison (http://almafa.org), and that's how we discovered the bug. We didn't try it with CoffeeScript though, but we're pretty sure the bug slows that down too.

By the way I thought the regexp engine used a finite automaton to parse the string.
I'm not familiar with the details of how Yarr works.  I know some regexps get compiled into optimized jitcode that does the match and some run in effectively a regexp interpreter.  How that last is implemented, I don't know.

Comment 5

4 years ago
I disavow any knowledge of this; I'm not sure who maintains regexps now...
Flags: needinfo?(luke)

Comment 6

4 years ago
Is Yarr a separate project? Should I file a bug there?

Comment 8

4 years ago
I have reported the bug here as well: https://bugs.webkit.org/show_bug.cgi?id=124319#c0.

Comment 9

4 years ago
Scrolling in CodeMirror is also extremely slow due to this bug when the window is big and syntax highlighting is turned on.
We don't use yarr anymore, we have irregexp instead. The jsfiddle in the first comment shows ~constant time for all the cases, so I suspect that as the result of the exec() call isn't used, it's dce'd by Ion and we are effectively measuring an empty loop.

If I change the initial code to do more runs and remember what the last result of exec() is (and print it to make sure it's not dce'd), I can see that we still have ~constant time. So closing as worksforme, feel free to open a new bug if you think the contrary.

Last Resolved: 2 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.