Last Comment Bug 98409 - (regexpliteralflaw) literal global regular expression (regexp) instance remembers lastIndex
(regexpliteralflaw)
: literal global regular expression (regexp) instance remembers lastIndex
Status: RESOLVED FIXED
fixed-in-tracemonkey
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- normal with 1 vote (vote)
: mozilla1.9.3a1
Assigned To: Brendan Eich [:brendan]
:
Mentors:
: 112376 185415 225408 237111 252363 286259 291175 296610 330221 330334 334795 339687 347748 367388 379341 386028 386049 387683 393544 402046 418408 423155 431515 446182 449758 451404 453635 455041 474412 483196 484933 507074 520534 534951 571875 576124 596466 616798 646002 (view as bug list)
Depends on: 540306 540985
Blocks: es5
  Show dependency treegraph
 
Reported: 2001-09-05 14:46 PDT by fstmaurice
Modified: 2013-06-14 09:47 PDT (History)
60 users (show)
brendan: wanted1.9.1?
brendan: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Test Case of the bug (469 bytes, text/html)
2001-09-05 14:46 PDT, fstmaurice
no flags Details
Reduced testcase (1.08 KB, text/html)
2001-09-07 12:42 PDT, Phil Schwartau
no flags Details
draft patch, late-in-the-day-of xmas gift (9.70 KB, patch)
2009-12-25 23:24 PST, Brendan Eich [:brendan]
no flags Details | Diff | Review
shorter draft patch (9.12 KB, patch)
2009-12-26 13:00 PST, Brendan Eich [:brendan]
igor: review+
Details | Diff | Review
passes tests (10.96 KB, patch)
2009-12-29 12:16 PST, Brendan Eich [:brendan]
igor: review+
Details | Diff | Review
fix the test to track the semantic change (13.19 KB, patch)
2010-01-06 07:09 PST, Brendan Eich [:brendan]
no flags Details | Diff | Review

Description fstmaurice 2001-09-05 14:46:15 PDT
When testing a regular expression such as in an email validation, the
RegExp.test() method returns the right value only half the time.

Reproducible: Always
Steps to Reproduce:

- Load the following attachement
- Click several times on the RegExp.test() button to validate the email

Actual Results:  The email is found to be valid only half the time

Expected Results:  The test method should always return true.

Other comments: This testcase works as expected in Internet Explorer, but not in
NS4.
Comment 1 fstmaurice 2001-09-05 14:46:59 PDT
Created attachment 48326 [details]
Test Case of the bug
Comment 2 fstmaurice 2001-09-05 14:49:15 PDT
For a workaround, just take out the comment on line 8 of the test case.
Comment 3 Boris Zbarsky [:bz] (Out June 25-July 6) 2001-09-05 22:45:39 PDT
seeing this on current linux trunk as well
Comment 4 Phil Schwartau 2001-09-07 12:37:11 PDT
You can see this behavior with a simpler example: 

                     var RE = /a/g;
                     var STR = 'aaa';


Then do RE.test(STR) in a loop. I will attach this test below -
Comment 5 Phil Schwartau 2001-09-07 12:42:03 PDT
Created attachment 48609 [details]
Reduced testcase
Comment 6 Phil Schwartau 2001-09-07 12:49:18 PDT
The reduced testcase runs the test in two loops of five turns each.

It shows that each time RE.test(STR) is executed, RE.lastIndex increments,
and the next RE.test(STR) is done from this position in STR. Once we have 
reached the end of STR, RE.lastIndex == STR.length and there is no match
from this position. At this point, RE.lastIndex gets set back to 0.
Thus when we test again, we get a match again, at the beginning of STR.
Comment 7 Phil Schwartau 2001-09-07 13:10:39 PDT
From the ECMA-262 Final Spec (December 1999):

15.10.6.3 RegExp.prototype.test(string) 
Equivalent to the expression RegExp.prototype.exec(string) != null. 


15.10.6.2 RegExp.prototype.exec(string) 
Performs a regular expression match of string against the regular expression
and returns an Array object containing the results of the match, or null
if the string did not match 

The string ToString(string) is searched for an occurrence of the regular 
expression pattern as follows: 

1.  Let S be the value of ToString(string). 
2.  Let length be the length of S. 
3.  Let lastIndex be the value of the lastIndex property. 
4.  Let i be the value of ToInteger(lastIndex). 
5.  If the global property is false, let i = 0. 
6.  If I < 0 or I > length then set lastIndex to 0 and return null. 
7.  Call [[Match]], giving it the arguments S and i. If [[Match]] returned             
    failure, go to step 8; otherwise let r be its State result & go to step 10. 
8.  Let i = i+1. 
9.  Go to step 6. 
10. Let e be r's endIndex value. 
11. If the global property is true, set lastIndex to e. 
12. Let n be the length of r's captures array.
    (This is the same value as 15.10.2.1's NCapturingParens.) 
13. Return a new array with the following properties: 
  • The index property is set to the position of the matched substring
    within the complete string S. 
  • The input property is set to S. 
  • The length property is set to n + 1. 
  • The 0 property is set to the matched substring
    (i.e. the portion of S between offset i inclusive and offset e exclusive). 
  • For each integer i such that I > 0 and I •n, set the property named                     
    ToString(i) to the i th element of r's captures array.
  
Comment 8 Phil Schwartau 2001-09-07 13:13:07 PDT
Here is the output of the reduced testcase:


TESTING:   RE = /a/g
           STR = 'aaa'
          (two loops of five turns)

RE.test(STR) = true    RE.lastIndex = 1    RegExp.lastMatch = a    
RE.test(STR) = true    RE.lastIndex = 2    RegExp.lastMatch = a    
RE.test(STR) = true    RE.lastIndex = 3    RegExp.lastMatch = a    
RE.test(STR) = false   RE.lastIndex = 0    RegExp.lastMatch = a    
RE.test(STR) = true    RE.lastIndex = 1    RegExp.lastMatch = a    

RE.test(STR) = true    RE.lastIndex = 2    RegExp.lastMatch = a    
RE.test(STR) = true    RE.lastIndex = 3    RegExp.lastMatch = a    
RE.test(STR) = false   RE.lastIndex = 0    RegExp.lastMatch = a    
RE.test(STR) = true    RE.lastIndex = 1    RegExp.lastMatch = a    
RE.test(STR) = true    RE.lastIndex = 2    RegExp.lastMatch = a    
Comment 9 Phil Schwartau 2001-09-07 13:19:01 PDT
FINDINGS

I get the same results above in both NN4.7 and Mozilla.
In IE4.7 I get


TESTING:   RE = /a/g
           STR = 'aaa'
          (two loops of five turns)

RE.test(STR) = true    RE.lastIndex = undefined    RegExp.lastMatch = undefined    
RE.test(STR) = true    RE.lastIndex = undefined    RegExp.lastMatch = undefined    
RE.test(STR) = true    RE.lastIndex = undefined    RegExp.lastMatch = undefined    
RE.test(STR) = true    RE.lastIndex = undefined    RegExp.lastMatch = undefined    
RE.test(STR) = true    RE.lastIndex = undefined    RegExp.lastMatch = undefined    

RE.test(STR) = true    RE.lastIndex = undefined    RegExp.lastMatch = undefined    
RE.test(STR) = true    RE.lastIndex = undefined    RegExp.lastMatch = undefined    
RE.test(STR) = true    RE.lastIndex = undefined    RegExp.lastMatch = undefined    
RE.test(STR) = true    RE.lastIndex = undefined    RegExp.lastMatch = undefined    
RE.test(STR) = true    RE.lastIndex = undefined    RegExp.lastMatch = undefined    
Comment 10 Phil Schwartau 2001-09-07 14:16:32 PDT
I think that NN4.x, Mozilla, and N6 are behaving according to spec here.
In particular, note ECMA steps 6 and 11:


6.  If i < 0 or i > length then set lastIndex to 0 and return null. 

11. If the global property is true, set lastIndex to e.


So if the global property is true (as in this bug report),
lastIndex gets incremented. Once we have lastIndex > length, the
method is supposed to return null by step 6. When this happens,
then RE.test() == (RE.prototype.exec(STR) != null) == false,
and lastIndex gets set back to 0.
Comment 11 Phil Schwartau 2001-09-07 14:18:00 PDT
Have to mark this one invalid. Thank you for this report, however -
please continue using Bugzilla. We depend on contributors like you
to catch the things we miss!

Will ask rogerl to verify this one -
Comment 12 Phil Schwartau 2001-09-07 14:22:28 PDT
rogerl agrees - marking Verified. 

Note that RE.test(STR)== false can actually be utilized as information -
it tells you when you've gone beyond the length of the string STR.
Comment 13 Phil Schwartau 2001-11-28 10:09:04 PST
*** Bug 112376 has been marked as a duplicate of this bug. ***
Comment 14 Boris Zbarsky [:bz] (Out June 25-July 6) 2002-12-14 22:06:16 PST
*** Bug 185415 has been marked as a duplicate of this bug. ***
Comment 15 Garrett Smith 2002-12-15 12:02:00 PST
I apologize. Invalid bug.

6.  If I < 0 or I > length then set lastIndex to 0 and return null. 


So \g matches the last index. Should really be "g", though, which is why I
didn't get the behavior in IE. With the RegExp constructor, you don't escape flags.
Comment 16 Phil Schwartau 2003-11-24 11:56:12 PST
*** Bug 225408 has been marked as a duplicate of this bug. ***
Comment 17 Brendan Eich [:brendan] 2004-07-22 11:08:29 PDT
Better summary for duping.

/be
Comment 18 Brendan Eich [:brendan] 2004-07-22 11:10:20 PDT
*** Bug 252363 has been marked as a duplicate of this bug. ***
Comment 19 Brendan Eich [:brendan] 2004-07-22 13:53:46 PDT
*** Bug 252363 has been marked as a duplicate of this bug. ***
Comment 20 Brendan Eich [:brendan] 2004-07-22 13:57:09 PDT
Note also that a literal regexp causes one object to be created at runtime when
the script or function containing the literal regexp is evaluated, even if the
literal is in a loop body.  If you call new RegExp, you'll get one object per
new expression evaluated, which in a loop body will be as many as the loop iterates.

/be
Comment 21 Erik Fabert 2005-04-20 11:25:06 PDT
*** Bug 291175 has been marked as a duplicate of this bug. ***
Comment 22 Erik Fabert 2005-06-04 06:43:53 PDT
*** Bug 296610 has been marked as a duplicate of this bug. ***
Comment 23 shutdown 2006-03-13 02:11:34 PST
*** Bug 330221 has been marked as a duplicate of this bug. ***
Comment 24 Erik Fabert 2006-03-13 07:26:30 PST
*** Bug 330334 has been marked as a duplicate of this bug. ***
Comment 25 :Gavin Sharp [email: gavin@gavinsharp.com] 2006-03-23 12:38:00 PST
*** Bug 286259 has been marked as a duplicate of this bug. ***
Comment 26 Bob Clary [:bc:] 2006-03-23 15:20:35 PST
*** Bug 237111 has been marked as a duplicate of this bug. ***
Comment 27 :Gavin Sharp [email: gavin@gavinsharp.com] 2006-04-20 03:21:59 PDT
*** Bug 334795 has been marked as a duplicate of this bug. ***
Comment 28 Frank Wein [:mcsmurf] 2006-05-30 10:46:43 PDT
*** Bug 339687 has been marked as a duplicate of this bug. ***
Comment 29 Ryan Flint [:rflint] (ping via IRC for reviews) 2006-08-07 08:32:43 PDT
*** Bug 347748 has been marked as a duplicate of this bug. ***
Comment 30 Ryan Flint [:rflint] (ping via IRC for reviews) 2007-05-01 04:52:04 PDT
*** Bug 379341 has been marked as a duplicate of this bug. ***
Comment 31 Dave Townsend [:mossop] 2007-06-27 03:33:04 PDT
*** Bug 386028 has been marked as a duplicate of this bug. ***
Comment 32 Dave Townsend [:mossop] 2007-06-27 06:53:36 PDT
*** Bug 386049 has been marked as a duplicate of this bug. ***
Comment 33 Adam Guthrie 2007-07-11 10:09:19 PDT
*** Bug 387683 has been marked as a duplicate of this bug. ***
Comment 34 Simon Bünzli 2007-08-24 05:49:46 PDT
*** Bug 393544 has been marked as a duplicate of this bug. ***
Comment 35 Brendan Eich [:brendan] 2007-10-15 14:44:27 PDT
See http://wiki.ecmascript.org/doku.php?id=proposals:bug_fixes for the good news that this bug will no longer be INVALID in ES4. Should we reopen and fix ahead of the spec for JS1.8 in Firefox 3, to get real-world testing? I think so, and to avoid forgetting this issue, I'm making this bug blog the js1.8 meta-bug.

/be
Comment 36 Brian Crowder 2007-10-17 10:39:55 PDT
"[REGEXP.CONSTRUCTION] new RegExp(”pattern”) and /pattern/ mean different things: the latter is not created anew every time it is evaluated. This is usually surprising and is probably a historical accident? [Brendan says it was a mistake that optimized the wrong good.] In the context of E4X it will be more surprising still, since evaluating <tag>content</tag> yields a new XML object every time."

So we want /pattern/ to instantiate (or behave as though it is) a new RegExp object on every iteration?  For global regexp literals this means setting lastIndex = 0 before every match?  What other differences should it yield?
Comment 37 Brian Crowder 2007-10-17 10:41:35 PDT
Oh, and doesn't changing this behavior potentially break the web in a noticeable way?
Comment 38 Magnus Kristiansen 2007-10-17 15:07:27 PDT
The main problem is that lastIndex doesn't reset, but a full fix would require things like removing custom properties as well.

As for breakage, it depends on the use. IE recreates the regexp, so any application depending on this behavior would break there. When stored in a variable, the current behavior is also contrary to basic JS expectations (no static variables). 

Direct use, e.g. while(res=/re/g.exec(str)){}, is trickier. It's still static and breaks IE, but the staticness is less obvious here. There could be non-cross-browser scripts depending on it. So the public web would be safe, but some isolated sections might not.
Comment 39 Brendan Eich [:brendan] 2007-10-17 18:08:27 PDT
(In reply to comment #37)
> Oh, and doesn't changing this behavior potentially break the web in a
> noticeable way?

Not likely. The third most duped bug when I looked (see my blog, first post on
js1, js2 and in between) was about this. People have latent bugs due to it. The
sorts of workarounds you would add (resetting lastIndex too much) become
harmless with the incompatible "bug fix".

As Magnus points out, IE doesn't follow ES3 spec here. That's a strong clue that we can change without breaking the web, but user-agent control flow forks could bite us.

We are planning to make the "bug fix" for JS2/ES4: /hi/ evaluates each time to
a fresh object. Late to think about trying it out in 1.9/fx3 but we could take
that chance. Or do it soon after.

/be
Comment 40 :Gavin Sharp [email: gavin@gavinsharp.com] 2007-11-01 21:45:11 PDT
*** Bug 402046 has been marked as a duplicate of this bug. ***
Comment 41 Phil Ringnalda (:philor) 2008-02-19 09:09:08 PST
*** Bug 418408 has been marked as a duplicate of this bug. ***
Comment 42 Dave Townsend [:mossop] 2008-03-15 09:31:56 PDT
*** Bug 423155 has been marked as a duplicate of this bug. ***
Comment 43 Dave Townsend [:mossop] 2008-04-30 06:03:13 PDT
*** Bug 431515 has been marked as a duplicate of this bug. ***
Comment 44 Aiko 2008-07-19 08:47:18 PDT
*** Bug 446182 has been marked as a duplicate of this bug. ***
Comment 45 Sylvain Pasche 2008-08-08 08:57:14 PDT
*** Bug 449758 has been marked as a duplicate of this bug. ***
Comment 46 Aiko 2008-08-20 11:44:20 PDT
*** Bug 451404 has been marked as a duplicate of this bug. ***
Comment 47 Dave Townsend [:mossop] 2008-09-04 06:42:27 PDT
*** Bug 453635 has been marked as a duplicate of this bug. ***
Comment 48 Brendan Eich [:brendan] 2008-11-16 11:24:12 PST
*** Bug 367388 has been marked as a duplicate of this bug. ***
Comment 49 Brendan Eich [:brendan] 2009-01-20 14:24:35 PST
*** Bug 474412 has been marked as a duplicate of this bug. ***
Comment 50 Aiko 2009-03-11 06:28:08 PDT
*** Bug 455041 has been marked as a duplicate of this bug. ***
Comment 51 Spocke 2009-03-12 02:20:53 PDT
WebKit also does this the same as IE. So Gecko and Opera is the only ones doing it "the right way" right now. So I think it's safe to switch this behavior since normally users tends to make things cross browser these days and not Firefox only.
Comment 52 Simon Bünzli 2009-03-13 12:33:17 PDT
*** Bug 483196 has been marked as a duplicate of this bug. ***
Comment 53 Aiko 2009-03-24 05:47:02 PDT
*** Bug 484933 has been marked as a duplicate of this bug. ***
Comment 54 Brendan Eich [:brendan] 2009-03-24 15:11:10 PDT
IE, grrr. For years I had ass-u-med they followed ES3. That alone does not raise the temperature on this bug, but in conjunction with WebKit and (separately, also raising temp) ES3.1, I think it's time to morph.

The patch should be small. If it looks like trouble, 1.9.2. If not, let's please consider it with a careful risk analysis.

Magic-8-ball says "No doubt!" to "Waldo?".

/be
Comment 55 Jim Blandy :jimb 2009-07-01 10:43:31 PDT
Hi --- this bug is blocking for ECMAScript 5.  Any progress here?
Comment 56 Jeff Walden [:Waldo] (remove +bmo to email) 2009-07-06 17:39:07 PDT
No, haven't even looked at it in months.  Maybe soon.
Comment 57 Ryan Flint [:rflint] (ping via IRC for reviews) 2009-07-28 19:18:50 PDT
*** Bug 507074 has been marked as a duplicate of this bug. ***
Comment 58 John P Baker 2009-10-05 05:28:57 PDT
*** Bug 520534 has been marked as a duplicate of this bug. ***
Comment 59 Brendan Eich [:brendan] 2009-12-15 20:57:41 PST
*** Bug 534951 has been marked as a duplicate of this bug. ***
Comment 60 Jeff Walden [:Waldo] (remove +bmo to email) 2009-12-25 21:14:03 PST
I've started working on the patch, also doing a little thinking on what the parse tree and bytecode should look like.  Nothing testable yet, but this should be fairly straightforward to finish up.
Comment 61 Brendan Eich [:brendan] 2009-12-25 23:24:41 PST
Created attachment 419184 [details] [diff] [review]
draft patch, late-in-the-day-of xmas gift

Shouldn't require new bytecode or AST or anything, just adjustment to the meaning of JSOP_REGEXP and some nice reduction in code that optimized for the old lexical singleton way. Igor knows this code, please have him review your version of this, if this is close enough to the mark. Merry Christmas!

/be
Comment 62 Brendan Eich [:brendan] 2009-12-26 13:00:07 PST
Created attachment 419210 [details] [diff] [review]
shorter draft patch

Still "gifting" (not regifting ;-) this patch. Soliciting Igor's review since he has been involved in this code.

I removed more old, overlong, and now-bogus comments. Not only can't we hoist the cloning without a ton of analysis, we shouldn't try. Also, js_CloneRegExpObject calls js_NewObject, etc., which bootstrap parent for us, no need for that loop up the scope chain.

Unless there's some code (more likely stale comments) I'm forgetting that knows about script->nfixed for global code counting regexps in addition to gvars, or some similar code that knows about function reserved slots counting upvars and regexps lazily cloned as singleton-per-lexeme objects, this should do it.

/be
Comment 63 Igor Bukanov 2009-12-27 01:06:46 PST
(In reply to comment #62)
> I removed more old, overlong, and now-bogus comments. Not only can't we hoist
> the cloning without a ton of analysis, we shouldn't try.

The cloning avoidance makes sense currently because the script stores not only the regexp itself but also the object that holds it. If that object would be eliminated and script be taught to store the regexp directly, then there would be no point in doing JSOP_REGEXP->JSOP_OBJECT.
Comment 64 Brendan Eich [:brendan] 2009-12-29 12:16:02 PST
Created attachment 419446 [details] [diff] [review]
passes tests

There were a couple more places that knew about global scripts' nfixed members counting regexps as well as gvars.

/be
Comment 65 Brendan Eich [:brendan] 2009-12-29 22:05:36 PST
(In reply to comment #63)
> (In reply to comment #62)
> > I removed more old, overlong, and now-bogus comments. Not only can't we hoist
> > the cloning without a ton of analysis, we shouldn't try.
> 
> The cloning avoidance makes sense currently because the script stores not only
> the regexp itself but also the object that holds it. If that object would be
> eliminated and script be taught to store the regexp directly, then there would
> be no point in doing JSOP_REGEXP->JSOP_OBJECT.

Are you describing the pre-ES5 semantics here, or something about the patch that could be improved?

With ES5 there is a useful optimization still: compile-n-go scripts expressing regexp literals in statements executed at most once need not clone at all -- they can use the compiler-created "exemplar". The patch selects JSOP_OBJECT over JSOP_REGEXP for this case.

Otherwise ES5 semantics require JSOP_REGEXP's cloning behavior.

/be
Comment 66 Igor Bukanov 2009-12-30 01:27:29 PST
(In reply to comment #65)
> Are you describing the pre-ES5 semantics here, or something about the patch
> that could be improved?

That is for another bug.

> 
> With ES5 there is a useful optimization still: compile-n-go scripts expressing
> regexp literals in statements executed at most once need not clone at all --
> they can use the compiler-created "exemplar". The patch selects JSOP_OBJECT
> over JSOP_REGEXP for this case.

JSOP_REGEXP->JSOP_OBJECT currently makes sense because, to store JSRegExp, JSScript also needs to store JSObject holding it. If the latter can be avoided and JSScript would hold JSRegExp directly, the would be no need to do the optimization.

> Otherwise ES5 semantics require JSOP_REGEXP's cloning behavior.
> 
> /be
Comment 67 Brendan Eich [:brendan] 2009-12-30 07:50:06 PST
(In reply to comment #66)
> JSOP_REGEXP->JSOP_OBJECT currently makes sense because, to store JSRegExp,
> JSScript also needs to store JSObject holding it. If the latter can be avoided
> and JSScript would hold JSRegExp directly, the would be no need to do the
> optimization.

You need two data structures, one the compiler creates (expensive to duplicate) for the regular expression, the other a lightweight to hold lastIndex and any other mutable state (ad-hoc properties) -- the first is JSRegExp, the second is JSObject. You need one of the first, one or more of the latter.

Ah, but are you suggesting fusing the two as we did with JSFunction/JSObject for similar reasons, so the first has a "free" object (JSRegExp is-a JSObject)? That would only save a GC thing allocation at compile time, though. You'd still want two data structures and bytecodes, AFAICT.

/be
Comment 68 Igor Bukanov 2009-12-30 10:10:14 PST
(In reply to comment #67)
> You need two data structures, one the compiler creates (expensive to duplicate)
> for the regular expression, the other a lightweight to hold lastIndex and any
> other mutable state (ad-hoc properties) -- the first is JSRegExp, the second is
> JSObject. You need one of the first, one or more of the latter.

Currently the compiler creates both the object and JSRegExp even if the object is not used as the regexp would always be cloned. This extra object can be eliminated if the compiler creates and stores in JSScript just an instance of JSRegExp without this extra object. Then JSOP_REGEXP needs just to create a new object and set its private data to the stored JSRegExp. This way the object is created only when JSOP_REGEXP is executed avoiding any need in doing JSOP_REGEXP->JSOP_OBJECT optimization.
Comment 69 Brendan Eich [:brendan] 2009-12-30 14:02:43 PST
Gotcha -- seems like a good followup bug.

/be
Comment 70 Brendan Eich [:brendan] 2009-12-30 20:26:32 PST
This bug's patch needs a test, and needs to be committed. Waldo, do the es5 tests from MSFT cover this change? Do we have those integrated somehow, or will we? Or if not, where would you add the test to js/src/tests?

/be
Comment 71 Jeff Walden [:Waldo] (remove +bmo to email) 2009-12-31 10:40:24 PST
The MSFT es5 tests don't cover it -- or at least, assuming such tests would and should fit under 7.8.5, they don't exist.  We don't have them integrated yet; last I looked the harness was buggy -- since fixed, but I haven't checked on it recently (and, at the moment, the only harness interface only runs on Windows).  I assume we will integrate them sometime, but we haven't, they're not as comprehensive as our own tests in my estimation, and I'm wary of punting on testing until some unspecified time when we do integrate them anyway.

ecma_5/RegExp/ is as good a place as any for a new test, I guess.
Comment 72 Brendan Eich [:brendan] 2010-01-04 12:26:29 PST
http://hg.mozilla.org/tracemonkey/rev/3862a7e48e79

/be
Comment 73 Jason Orendorff [:jorendorff] 2010-01-05 10:53:03 PST
Backed out changeset 3862a7e48e79 due to tinderbox failures on js1_5/GC/regress-324278.js.

http://hg.mozilla.org/tracemonkey/rev/ee9b5d13cbaf

The error is JIT-only and looks like this:

$ ../d-objdir/js -j -f shell.js -f js1_5/shell.js -f js1_5/GC/shell.js js1_5/GC/regress-324278.js
begin test: js1_5/GC/regress-324278.js
BUGNUMBER: 324278
STATUS: GC without recursion
STATUS: BUILT
js1_5/GC/regress-324278.js:71: TypeError: chainTop is not a function
Comment 74 Brendan Eich [:brendan] 2010-01-05 21:35:23 PST
(In reply to comment #73)
> Backed out changeset 3862a7e48e79 due to tinderbox failures on
> js1_5/GC/regress-324278.js.
> 
> http://hg.mozilla.org/tracemonkey/rev/ee9b5d13cbaf
> 
> The error is JIT-only and looks like this:

Not JIT-only, from my gdb'ing the `jsDriver.pl -t` command line with and without -j.

> $ ../d-objdir/js -j -f shell.js -f js1_5/shell.js -f js1_5/GC/shell.js
> js1_5/GC/regress-324278.js
> begin test: js1_5/GC/regress-324278.js
> BUGNUMBER: 324278
> STATUS: GC without recursion
> STATUS: BUILT
> js1_5/GC/regress-324278.js:71: TypeError: chainTop is not a function

This is precisely because of the ES5-formalized change to semantics. The test has:

function build(N) {
  // Explore the fact that regexp literals are shared between
  // function invocations. Thus we build the following chain:
  // chainTop: function->regexp->function->regexp....->null
  // to check how GC would deal with this chain.

  var chainTop = null;
  for (var i = 0; i != N; ++i) {
    var f = Function('some_arg'+i, ' return /test/;');
    var re = f();
    re.previous = chainTop;
    chainTop = f;
  }
  return chainTop;
}

and later, check invokes chainTop:

function check(chainTop, N) {
  for (var i = 0; i != N; ++i) {
    var re = chainTop();
...

The second invocation of the synthesized (by Function) function evaluates /test/ into a fresh RegExp object, with no shared singleton-per-lexeme so no .previous property.

Igor, thoughts on how to fix the test?

/be
Comment 75 Brendan Eich [:brendan] 2010-01-05 21:36:34 PST
The comment means "Exploit", not "Explore". I'd like to re-land the patch for this bug with the test disabled. Comments?

/be
Comment 76 Brendan Eich [:brendan] 2010-01-06 07:09:24 PST
Created attachment 420320 [details] [diff] [review]
fix the test to track the semantic change
Comment 77 Jeff Walden [:Waldo] (remove +bmo to email) 2010-01-06 13:36:34 PST
Test change looks good to me.
Comment 78 Brendan Eich [:brendan] 2010-01-06 15:14:08 PST
http://hg.mozilla.org/tracemonkey/rev/7599965304e6

I landed again -- Igor, please feel free to ping me with any test changes you want and I'll make them.

/be
Comment 79 Mark S. Miller 2010-01-06 15:58:34 PST
Seems related to http://code.google.com/p/google-caja/issues/detail?id=528
Once this is fixed, would it then be safe to whitelist RegExp.prototype.test and RegExp.prototype.exec in Caja and draft SES[1]?

[1] http://code.google.com/p/es-lab/source/browse/trunk/src/ses/initSES.js#106
Comment 80 Brendan Eich [:brendan] 2010-01-06 16:37:06 PST
(In reply to comment #79)
> Seems related to http://code.google.com/p/google-caja/issues/detail?id=528

No, this bug (bug 98409) is not related to the issue mentioned there, specifically as described at:

http://code.google.com/p/google-caja/wiki/RegexpsLeakMatchGlobally

This bug is about a regexp literal evaluating to the same singleton per lexeme no matter how often evaluated, per ES3.

The issue described by the caja doc is about RegExp.input being the default arg to exec and test. I don't see a bugzilla bug on file. There is only a starting attempt at document the "RegExp static properties" here:

http://wiki.whatwg.org/wiki/Web_ECMAScript#RegExp

Before filing a bugzilla bug it would help to have a full testsuite and results for all the major browsers. In any case, we should take this elsewhere.

/be
Comment 81 Igor Bukanov 2010-01-07 03:31:00 PST
(In reply to comment #76)
> Created an attachment (id=420320) [details]
> fix the test to track the semantic change

The main purpose of that test from the bug 324278 is to show that the Deutsch-Schorr-Waite algorithm only works if it covers *all* possible navigation paths through the object graph. In particular, the test shows how to expose a bug in the DSW implementation via constructing a deep GC thing chain with links like JSFunction->JSScript->regexp_object_accessible_from_scripts->JSFunction. Since the DSW implementation was not aware that JSScript can point to an arbitrary object graph, that lead to stack overflow during the GC. 

Changes in this bug removes the possibility of having JSFunction->JSScript->regexp_object_accessible_from_scripts. So, if this bug were fixed prior that bug, the would be no way to show the problems in the DSW code using regexps in function literals.

To restore the original test coverage the test may use something like:

var cursor = null;
for (var i = 0; i != N; ++i) {
    var iter = Function('eval("yield /re/")')();
    var re = iter.next();
    re.prev = cursor;
    cursor = iter;
    iter = null;
}


This builds a chain like JSGenerator->JSScript->JSRegExp->JSGenerator.
Comment 83 Brendan Eich [:brendan] 2010-07-01 00:18:27 PDT
*** Bug 576124 has been marked as a duplicate of this bug. ***
Comment 84 John P Baker 2010-09-15 01:10:12 PDT
*** Bug 596466 has been marked as a duplicate of this bug. ***
Comment 85 Nickolay_Ponomarev 2010-11-06 11:43:46 PDT
*** Bug 571875 has been marked as a duplicate of this bug. ***
Comment 86 Boris Zbarsky [:bz] (Out June 25-July 6) 2010-12-07 09:02:58 PST
*** Bug 616798 has been marked as a duplicate of this bug. ***
Comment 87 John P Baker 2011-03-29 04:11:42 PDT
*** Bug 646002 has been marked as a duplicate of this bug. ***
Comment 88 Vitaliy Filippov 2011-04-26 05:36:00 PDT
Hello all!

I've found this bug while searching for problem in WikEd MediaWiki JS editor extension (http://en.wikipedia.org/wiki/User:Cacycle/wikEd), read all the comments and did not fully understand - which behaviour you think is the correct one?
1) /a/g.match('aaa') must remember position?
or
2) /a/g.match('aaa') must not remember position?

Now, Firefox 4 behaves like (2), so the following widely-used in WikEd code:
 while(/.../g.match(string) != null)
 {
 }
gives us an infinite loop.

My question is - is this Firefox 4 bug? Or do you think /.../g must create new Regexp object on each iteration?
Comment 89 Jeff Walden [:Waldo] (remove +bmo to email) 2011-04-26 06:45:36 PDT
(2) is correct behavior per ECMAScript 5.  (1) was specified by ECMAScript 3, but that's old and busted now, ES5 is the new hotness, and everyone implements (2) now (except maybe Opera, but if they haven't changed as we did I'd be very surprised).
Comment 90 Vitaliy Filippov 2011-04-27 02:35:26 PDT
Ok, thanks for the information.
It's really "except Google Chrome"...
Just checked, Opera does the same, and Chrome does ES3 behaviour...
Comment 91 Mark S. Miller 2011-04-27 11:29:12 PDT
(In reply to comment #90)
> Ok, thanks for the information.
> It's really "except Google Chrome"...
> Just checked, Opera does the same, and Chrome does ES3 behaviour...

What version of Chrome? I'm not seeing this bug on Chrome 12.0.742.9 dev.
Comment 92 Vitaliy Filippov 2011-04-27 12:45:05 PDT
Oops, that was some mystics. Now I also don't see this bug (on the same version). Maybe it was 11.x...
Comment 93 altieres 2012-01-13 09:22:26 PST
Hello, why this bug still happens in firefox 9? I tought that with the new especification, this bug should be fixed.
Chrome also have this bug...
Comment 94 qazuor 2013-06-14 09:43:14 PDT
Hi. In Firefox 21, on Ubuntu 64bits still happens.

I has been tested with:

var regExp = /ba/ig;
for (var i = 0; i < 10; ++i) {
    console.log(regExp.test('a'));
}

// Result:

// true
// false
// true
// false
// true
// false
// true
// false
// true
// false
Comment 95 Boris Zbarsky [:bz] (Out June 25-July 6) 2013-06-14 09:47:14 PDT
Yes, that's the spec-required behavior.  This bug was about using a regexp literal, which you are not.  As in, it was about this testcase:

  for (var i = 0; i < 10; ++i) {
    console.log(/ba/ig.test('a'));
  }

which conceptually creates a new RegExp object on each iteration.  You, on the other hand, are using the same RegExp object over and over.

Note You need to log in before you can comment on or make changes to this bug.