Closed Bug 406271 Opened 16 years ago Closed 14 years ago

JavaScript Regular Expression Extremely Slow


(Core :: JavaScript Engine, defect)

1.8 Branch
Not set





(Reporter: areagan, Assigned: luke)




(Keywords: perf)


(2 files, 6 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20071115 Firefox/
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20071115 Firefox/

I have Javascript code used to highlight text.  When the JavaScript executes over a large document it is extremely slow.
In my test the sample took about 80 seconds to execute.
There is an alert at the beginning and end of the script to show the user when highlighting begins and ends.  
When I tested the same code in Internet Explorer it completed in about 14 seconds.
There is now "slow script" notice printed so the user does not have the option to stop execution of this script.

Reproducible: Always

Steps to Reproduce:
1.Load Web Page
2.Watch for "start" and "done" alert to observer time taken.
3.Load the same page in Internet Explorer and watch how quick it is
Actual Results:  
Execution of the script is extremely slow, about 8-10x slower than IE.

Expected Results:  
Complete in a similar time to internet explorer.
I would expect the typical, this script is unresponsive error, where I can select to continue exection, stop, or debug options, to appear sometime during the execution of this slow script.
Firefox trunk 2007113004 on Linux completes in 6 seconds for me.

Andy, please test a trunk build and report your results:

FYI, Firefox 2.x only takes security and crash fixes (and regressions from
those fixes) at this point.
Assignee: nobody → general
Component: General → JavaScript Engine
Keywords: perf
Product: Firefox → Core
QA Contact: general → general
Version: unspecified → 1.8 Branch
I downloaded the latest trunk as suggested and the sample code completed in 41 seconds.

The user-agent from the about for the trunk is: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b2pre) Gecko/2007120305 Minefield/3.0b2pre

I tested on a co-workers machine on the generally released version of FireFox and he had the same results.
I get a 404 not found on the url. Is there another location for the test script?
Sorry about that.  I had a server fail and the site switched to a backup server that didn't have the example.  The server is back up so the sample is available.  I also put the sample on the backup server in case I get another unexpected fail-over.
My test results on Windows XPSP2:
Firefox trunk 2007120505  6 seconds
Opera 9.21                2 seconds
IE7                       2 seconds
Does this still happen with the recent nightlys?

Also, link is not accessible.
I can confirm that Regular Expressions are slower than expected.

FireFox: 3.1b2 (20081201)
OS: Windows Vista SP 0
1 GB RAM, 1.66GHZ Dual Core

While developing a tokenizer I came across the following results with 500 lines of JavaScript:

Chrome    22ms
IE7:                 359ms
FireFox 3.1b2:     1,083ms
Safari 3.2.1:      4,573ms
Opera 9.63:        6,260ms

URL to test with:

Just paste about 500 or so lines of JS into the first textarea and hit "Tokenize".

Hope this helps
Sorry, looks like I was wrong on those numbers and the cause. Here's a reduced test with results:

onload = function(){
   var str = "qwdqw !@%^&*hrty4567756223!dewbghgN ER  .<.TYu5  ";
   var re = /\w+/
   var m;
   var d1 = new Date()
   for(var i = 0;i < 1000000;i++){
      m = re.exec(str)
   alert(new Date() - d1 + "ms")

Chrome    1,173ms
Safari 3.2.1:         1,450ms
Opera 9.63:           2,683ms
FireFox 3.1b2:        5,326ms
IE7:                  7,987ms

On MBP, I got:

Minefield today nightly: 4200ms
recent webkit nightly: 459ms
Ever confirmed: true
This testcase points out two problems: we don't trace RegExp.prototype.exec, and we don't native-compile any of the naked classes.
Dave, quick fix to the regexp compilation issue possible?

Traceable native for exec should be a no-brainer, zero-risk to boot -- separate bug blocking this one seems best.

Competitive issue, not a blocker by some criteria but close to it in my book.

Flags: wanted1.9.1?
OS: Windows XP → All
QA Contact: general → dmandelin
Hardware: x86 → All
If I read everything right the regex is /\w+/, and in that case a cheesy quick fix is possible. I had hoped to switch to a new compiler structure before doing quantifiers, but it could be done either way.
Flags: wanted1.9.1? → wanted1.9.1+
Are all of these the same fundamental issue?

bug 406271
bug 480759
bug 459783
Flags: blocking1.9.2?
Updated numbers from my last post:

Chrome 2	 409ms
Safari 4b	 429ms
Opera 10a	1228ms
Fx 3.5b4	1306ms
IE8		2794ms
Blocks: 503107
Attached patch WIP v.1 (obsolete) — Splinter Review
Not quite ready for review, but this patch passes the js tests and my unit tests.  First, I added compilation for \w \W \s \S \d \D and '.'.  Next, to /\w+/ at all, it is necessary to determine that no backtracking is needed when greedily consuming the quantifier body.  So, /a*/ /a*b/ /\w*#/ /.*\n/ and /\d*\s/ don't need backtracking, while /a*a/ /\w*a/ /\w*\d/ and /.*s/ do need backtracking and thus can't be compiled.

A big part of this patch is enumerating the characters that can be consumed along a path, representing this set efficiently, and testing set intersection.  Once this was done, I was able to generalize quantifiers and alternatives to support nesting (as long as they use non-capturing parenthesis).

Current limitations:
 - '.' includes almost everything, so the "no-backtracking" limitation will abort compilation on practically every regexp where .* isn't last.  (e.g., /.*star.*/ in bug 503107 comment 1)
 - user-defined character classes don't have an obvious efficient way to implement enumeration/intersection, so regexps like /(?:[a-c]|[d-f])x/ will conservatively assume [a-c] and [d-f] intersect and fail compilation.
 - only *, +, and ? quantifiers are supported because general {n,k} requires a stack of loop counters.  (It would be easy to support {n,}, with a fixed upper bound on n, through "inlining")
 - only greedy quantifiers are supported
 - no capturing parenthesis

Comments welcome
Assignee: general → lw
Attached file Performance test
Performance comparison:

Without JITing (trunk): 2624ms
With JITing (patch):    111ms
QA Contact: dmandelin → general
Attached patch WIP v.2 (obsolete) — Splinter Review
Bug fixes and performance tweaks.

SunSpider performance improved:
dna:         1.126x as fast    39.4ms +/- 1.3%    35.0ms +/- 0.0%
tagcloud:    1.013x as fast    95.3ms +/- 0.8%    94.1ms +/- 0.7%
unpack-code: 1.050x as fast    115.2ms +/- 1.2%   109.7ms +/- 0.9%

I'm not sure how much of this is regexps being compiled now that weren't before and how much is the tweaks to what gets passed in and gets kept live.
Attachment #387485 - Attachment is obsolete: true
Attached patch v.4 (obsolete) — Splinter Review
Attachment #387577 - Attachment is obsolete: true
Attachment #387988 - Flags: review?(dmandelin)
Comment on attachment 387988 [details] [diff] [review]

I thought it was kind of unfortunate to require all the character set testing code, since it would be temporary until we get backtracking, but then I realized it could continue to be useful in other algorithms or optimizations, so that seems fine.

Also, very cool C++ usage in general. Mostly I just have a bunch of nits about names and comments.

>+ * Return whether the given subexpression has a valid path consuming no
>+ * character (epsilon edge).  'next' indicates whether the algorithm should
>+ * consider the next expression in concatenation order.  The conservative answer
>+ * is 'true'.
>+ */
>+static bool epsPath(RENode *node, bool next = true)

There are some ambiguities in this comment (what does 'consider' mean? and 'conservative answer to what?' (context suggests 'next', which is of course wrong)). I recommend a revision like

 * Return true if the given subexpression may match the empty string. The conservative
 * answer is |true|. If |rest| is true, then the subexpression is considered to be the
 * rest of the regexp, starting from node. Otherwise, the subexpression is considered to
 * be |node| in isolation.
static bool mayMatchEmpty(RENode *node, bool rest = true)

But I don't know if |rest| is a particularly great name. 

Note that I also recommend a rename to |mayMatchEmpty|, because mention of epsilon edges may be confusing or distracting in this context (see also below).

>-    JSBool targetCurrentPoint(LIns* ins) 
>-    {
>-        if (fragment->lirbuf->outOMem()) 
>-            return JS_FALSE;
>-        ins->setTarget(lir->ins0(LIR_label)); 
>-        return JS_TRUE;
>-    }
>+    void targetCurrentPoint(LIns *ins)
>+    {
>+        ins->setTarget(lir->ins0(LIR_label));
>+    }

I noticed you removed the OOM checks from these items. I remember having to add them to fix some bug. Are you sure that the LIR buffer cannot be OOM before these are called? If it is, I think you can end up trying to patch random stuff and going very bad.

>+    /* Factor out common code to index js_alnum. */
>+    LIns *compileIndexTable(LIns *chr, const bool *tbl)

When I saw this I thought I would be reading a function that builds an index table. I recommend renaming to something like |compileTableRead|.

>+    {
>+        if (sizeof(bool) != 1) {
>+            LIns *sizeLog2 = lir->insImm(StaticLog2<sizeof(bool)>::result);

I like the template thing very much.

>+    LIns *compileAlt(RENode *node, LIns *pos, bool atEnd, LInsList &fails)
>+    {
>+        RENode *leftRe = (RENode *)node->kid, *rightRe = (RENode *)node->u.kid2;
>+        /*
>+         * If the RE continues after the alternative, we need to ensure that no
>+         * backtracking is required.  Recursive calls to compileNode will fail
>+         * on capturing parens, so we guard here on non-deterministic branches
>+         * ("left branch succeded, continue to next or try other branch?") in
>+         * the underlying NFA.
>+         */

*Please* avoid the terms "NFA" and "nondeterministic" in the context of JS regular expression matching. In my experience, they open up endless confusing discussions because those terms are strongly overloaded in this context. (In particular, people tend to start thinking about automata theory constructs that we are totally not using here.) Also, they are mostly incorrect. Perl- and ECMA-style regular expression matching is not nondeterministic and is not implementable by NFAs. It is a deterministic backtracking search, which is a PDA in automata theory terms. So please rewrite all those comments.

>+        if (!atEnd) {
>+            /* Catch non-deterministic branch between left and right */

The part that really needs explanation is why we test this condition. I think you want something like: 

              // We don't support backtracking yet, so we are correct only if
              // for any given text string there is only one way to match this alt.

By and large, though, the code itself in this section is pretty clear.

>+        /*
>+         * This might be the only LIR_live in Mozilla, so I will explain its
>+         * sinister semantics.  LIR_lives must appear immediately following a
>+         * backwards jump and describe what is live immediately at the *target*
>+         * of the back-edge.  Thus, these instructions answer the question "what
>+         * is live at the top of the loop?", which makes sense, because the
>+         * backwards scan has not yet seen the top of the loop and needs this
>+         * information to continue working backwards up the inside of the loop.
>+         *
>+         * Here, 'cpend' and 'state' get defined before the loop, and used
>+         * inside, so they are live at 'loopTop'.  While 'iterBegin' is used
>+         * after the loop, making it live in on loop exit, it gets defined after
>+         * 'loopTop', which "kills" its liveness.
>+         */
>+        lir->ins1(LIR_live, state);
>+        lir->ins1(LIR_live, cpend);

Thanks for figuring this out. I assume variables only need to be listed this way if they are only live because of a read in the loop body, and that if they are read after the loop, these usually aren't needed. Or is that not true?
Attached patch v.5 (obsolete) — Splinter Review
(In reply to comment #19)


> I noticed you removed the OOM checks from these items. I remember having to add
> them to fix some bug. Are you sure that the LIR buffer cannot be OOM before
> these are called? If it is, I think you can end up trying to patch random stuff
> and going very bad.

I talked to Andreas a bit about the OOM situation in nanojit before writing this and, if I understand correctly, the idea is that nanojit OOMs early so that there is enough memory "in reserve" to satisfy another bunch of instructions (e.g., enough to compile a normal JSOP).  Thus, you only have to check OOM if you have a loop or recursion or other particularly long codegen.  This is why, if you search jstracer.cpp, the only outOMem calls are in loops and pinch points like monitorRecording.  In jsregexp, the only loops/recursion pass through compileNode, so that's where I put the main OOM check, with special cases in compileFlat and compileClass.

> Thanks for figuring this out. I assume variables only need to be listed this
> way if they are only live because of a read in the loop body, and that if they
> are read after the loop, these usually aren't needed. Or is that not true?

That's what I initially thought too, but the resulting assembly begged to differ quite forcefully ;).  The basic issue is that, since the regalloc works backwards, when it reaches a backwards edge, it is *completely* reliant on the LIR_live insns to tell it what is now live.  So even if you have a program:

1  x = ...
2  top:
3    ... no use/def of x
4    jt after
5    ... no use/def of x
6  goto top
7  after:
8  ... = x

if you don't make x LIR_live at 6, the regalloc will assume x is not live until is sees 4 (and pulls in the live state from 7).  However, between 6 and 4, the regalloc will not know x is live and insn 5 is free to clobber x's storage.
Attachment #387988 - Attachment is obsolete: true
Attachment #388232 - Flags: review?(dmandelin)
Attachment #387988 - Flags: review?(dmandelin)
Comment on attachment 388232 [details] [diff] [review]

>+ * This table allows efficient testing for \w during a trace.  ECMA-262
>+ * defines \w to be [0-9A-Z_a-z].  The index must be < 128.

Is JS_ISWORD redundant now? I see it still used in the patch, but it uses the ctype.h lookup table.

>+        if (JS7_ISDEC(*p)) {
>+            if (classes & Digit)
>+                return false;
>+        }
>+        else if (JS_ISWORD(*p)) {

Nit here and throughout: cuddle else with preceding } (dmr, K&R original style).

>+            if (classes & OtherAlnum)
>+                return false;
>+        }
>+        else if (RE_IS_LINE_TERM(*p)) {
>+            if (classes & LineTerms)
>+                return false;
>+        }
>+        else if (JS_ISSPACE(*p)) {
>+            if (classes & OtherSpace)
>+                return false;
>+        }
>+        else {
>+            if (classes & Other)
>+                return false;
>+        }
>+    }
>+    return true;
>+ * Predicate version of the STL's set_intersection.  Assumes both ranges are
>+ * sorted and thus runs in linear time.
>+ *
>+ * FIXME: This is a reusable algorithm, perhaps it should be put somewhere.
>+ */
>+template <class InputIterator1, class InputIterator2>
>+bool set_disjoint(InputIterator1 p1, InputIterator1 end1,
>+                  InputIterator2 p2, InputIterator2 end2)

Super-nit, hate to pick it but we've kept consistent even in recent C++ code: type on preceding line, function declarator starts in new line at column 1.

>+    if (p1 == end1 || p2 == end2)
>+        return true;
>+    while (true) {
>+        if (*p1 < *p2) {
>+            ++p1;
>+            if (p1 == end1)
>+                return true;
>+        } else if (*p2 < *p1) {
>+            ++p2;
>+            if (p2 == end2)
>+                return true;
>+        }
>+        else {
>+            return false;
>+        }
>+    }

Seems better to avoid the appearance of falling off the end of the function, by putting the *p1 == *p2 test in the loop condition instead of uselessly-faux-iloopy true, and return false; before closing brace of function.

>+    switch (node->op) {
>+        case REOP_EMPTY:  return true;
>+        case REOP_FLAT:   return false;
>+        case REOP_CLASS:  return false;
>+        case REOP_ALNUM:  return false;
>+        case REOP_ALT:    return (mayMatchEmpty((RENode *)node->kid) ||
>+                                  mayMatchEmpty((RENode *)node->u.kid2)) &&
>+                                 (!next || mayMatchEmpty(node->next));
>+        case REOP_QUANT:  return (node->u.range.min == 0 ||
>+                                  mayMatchEmpty((RENode *)node->kid)) &&
>+                                 (!next || mayMatchEmpty(node->next));
>+        default:          return true;

Prevailing style half-indents labels (case, default and goto target) like the next major hunk:

>+    switch (node->op) {
>+      /* Record as bitflags. */
>+      case REOP_DOT:       set.addClass(CharSet::Dot);     return true;

Nice to see more idiomatic high-class C++ style code over time!

(In reply to comment #21)

Thanks; don't worry, I'll get this style eventually :)

> (From update of attachment 388232 [details] [diff] [review])
> >+/*
> >+ * This table allows efficient testing for \w during a trace.  ECMA-262
> >+ * defines \w to be [0-9A-Z_a-z].  The index must be < 128.
> Is JS_ISWORD redundant now? I see it still used in the patch, but it uses the
> ctype.h lookup table.

Perhaps I should place "extern js_alnum" in jsstr.h, define js_alnum in jsstr.cpp, and change #define JS_ISWORD to use js_alnum instead of of std::isalnum?
Comment on attachment 388232 [details] [diff] [review]

On the OOM checking, I know that initially, I did not check OOM in the branch patch functions (and returned void, just like you did), but found it necessary to avoid some crashes. Nanojit OOMs have changed since then, so maybe it's OK. I should look up those bugs and see if they have any good test cases.
Attachment #388232 - Flags: review?(dmandelin) → review+
Attached patch v.6 (obsolete) — Splinter Review
Other than the stylistic changes, this patch moves js_alnum from jsregexp.cpp to js_Alnum in jsstr.h/jsstr.cpp and redefines JS_ISWORD to use js_Alnum.  In a  micro-benchmark (gcc -O3) of this new table lookup vs. std::isalnum, the table lookup is 10x faster because (1) its inlined and (2) isalnum doesn't know we only care about [0,127] and so it does locale-ish things.
Attachment #388232 - Attachment is obsolete: true
Attached patch rebased (obsolete) — Splinter Review
Attachment #388294 - Attachment is obsolete: true
Attachment #389022 - Attachment is obsolete: true
Attachment 389036 [details] [diff] pushed to TM as b837948c1daf.
Depends on: 505083
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 505400
Blocks: 506201
Flags: blocking1.9.2? → wanted1.9.2+
You need to log in before you can comment on or make changes to this bug.