Closed Bug 237111 Opened 20 years ago Closed 18 years ago

g parameter in RegExp causes alternating true/false result on same string

Categories

(Core :: JavaScript Engine, defect)

x86
Windows 2000
defect
Not set
normal

RESOLVED DUPLICATE of bug 98409

People

(Reporter: joachim.kathmann, Unassigned)

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; de-AT; rv:1.6) Gecko/20040113
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; de-AT; rv:1.6) Gecko/20040113

If a regular expression ends with the g parameter for global search, then the
first time you execute the test() or exec() method on a given string, the
result is correct (e.g. true). Using the same expression on the same string a
second time returns the opposite result (e.g. false). The third time, the
result is the same as the first time, and so on.


Reproducible: Always
Steps to Reproduce:
1. Save the following source code as HTML file:
<html>
	<head>
		<title>BugTest</title>
	</head>
	<script language="JavaScript">
		function BugTest(sString){
			var expr=/[a-z]+/g;
			return expr.test(sString);
		}
	</script>
	<body>
		<form id="test" name="test">
			<input type="button" id="btn" name="btn" value="Click Me"
onClick="alert(BugTest('teststring'));">
		</form>
	</body>
</html>
2. Open the file in Mozilla and click the button
3. First result is true
4. Click again
5. Second result is false
6. Click again
7. Third result is true
8. Remove g from expr and save the page
9. Reload page in browser
10. Every click returns true
Actual Results:  
The result of the RegExp Method test() alternates with every click on the button.

Expected Results:  
The result should be the same all the time.

This description also applies to the gi parameter combination.
On Internet Explorer 5.5 the result is always correct whether you use g or not.
Attached file Testcase
Attaching the reporter's testcase.
Hm, this is bad. Alternating answers from a regexp.

Confirming with Mozilla 1.7a under WinXP. Removing the g parameter indeed solves
the problem.

Bug 165353 and bug 209919 sound related, but both are supposed to be fixed by
bug 85721, so I don't think they are duplicates.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Actually, JS regexps are not supposed to always return the same thing.  They
store state, so the answer will depend on what you have done with the regexp up
to now.

In particular, the algorithm in the ECMA-262 spec
(http://www.ecma-international.org/publications/files/ecma-st/ECMA-262.pdf)
gives us the following:

First invocation:

1. Let S = "teststring"
2. Let length = 10
3. Let lastIndex = 0 
4. Let i = 0 
5. Does nothing since "g" option is used.
6. Does nothing since 0 <= i <= length
7. Call to [[Match]] succeeds.  Go to step 10.
10. Let e = 10
11. Set lastIndex = 10
12. Return match

Second invocation:

1. Let S = "teststring"
2. Let length = 10
3. Let lastIndex = 10 
4. Let i = 10
5. Does nothing since /g option is used.
6. Does nothing since 0 <= i <= length
7. Call to [[Match]] fails, since there is nothing to match at position i.
   Go to step 8.
8. Let i = 11.
9. Go to step 6.
6'. i > length, so set lastIndex to 0 and return null.
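
The two traces above can be replayed with a small sketch of the step loop. This is an illustration only, not the engine's code: es3Exec and its plain state object are hypothetical names, and the sticky "y" flag (a later addition to the language, not part of the spec quoted here) is used as a stand-in for the [[Match]] call at position i.

```javascript
// Hypothetical sketch of the exec steps traced above; not engine code.
// `re` is a plain state object: { source, global, lastIndex }.
function es3Exec(re, S) {
  const length = S.length;                        // step 2
  let i = re.global ? re.lastIndex : 0;           // steps 3-5
  // Sticky flag: match only at exactly matcher.lastIndex
  // (assumed stand-in for the spec's [[Match]] at position i).
  const matcher = new RegExp(re.source, 'y');
  for (;;) {
    if (i < 0 || i > length) {                    // step 6
      re.lastIndex = 0;
      return null;
    }
    matcher.lastIndex = i;
    const m = matcher.exec(S);                    // step 7
    if (m !== null) {
      if (re.global) {
        re.lastIndex = i + m[0].length;           // steps 10-11
      }
      return m;                                   // step 12
    }
    i += 1;                                       // steps 8-9
  }
}

const state = { source: '[a-z]+', global: true, lastIndex: 0 };
const first = es3Exec(state, 'teststring');
const afterFirst = state.lastIndex;
const second = es3Exec(state, 'teststring');
const afterSecond = state.lastIndex;
console.log(first && first[0], afterFirst, second, afterSecond);
```

Under these assumptions the first call matches the whole string and leaves lastIndex at 10, and the second call returns null and resets lastIndex to 0, as in the traces.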

Note that we returned null and the next time through we will start matching at
the beginning of the string again.  Hence the alternating behavior in the
testcase.  The point is that the /g option allows you to test the regexp against
the string multiple times, each time starting where the preceding match ended,
until no more matches are left.  When that happens, null is returned to indicate
no more matches and the regexp is reset to the beginning of the string again.

So it looks like the problem is that IE has a bug in its implementation of /g
(and this bug is rather well-known if you look at the other /g-related bugs in
bugzilla).
Boris, thanks for this nice piece of education. I indeed do see the alternating
pattern occurring:

First run:
expr.global = true
expr.lastIndex = 0
match returns true

Second run:
expr.global = true
expr.lastIndex = 10
match returns false

etc.
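
The same observation can be logged directly. A minimal sketch, assuming a single RegExp object is reused for every call (the variable names expr and runs are only for illustration):

```javascript
// One RegExp object, reused for every call, as in the runs listed above.
const expr = /[a-z]+/g;
const runs = [];
for (let n = 0; n < 4; n++) {
  const before = expr.lastIndex;            // where this call starts searching
  const matched = expr.test('teststring');  // advances or resets lastIndex
  runs.push({ global: expr.global, before, matched, after: expr.lastIndex });
}
console.log(runs);
```

Each entry records lastIndex before and after the call, so the 0 / 10 / 0 / 10 pattern and the alternating result are visible side by side.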

What first of all surprises me is that the state is carried on to the next
instance of a regexpr. Apparently it is stored in the machine and not in the
instance. However, not having thoroughly studied the specs yet (it's kinda big)
I'll accept this blindly.

I'm however somewhat confused about a statement on page 101 of the specs:

15.5.4.10 String.prototype.match (regexp)

If regexp is not an object whose [[Class]] property is "RegExp", it is replaced
with the result of the expression new RegExp(regexp). Let string denote the
result of converting the this value to a string. Then do one of the following:
• If regexp.global is false: Return the result obtained by invoking
RegExp.prototype.exec (see 15.10.6.2) on regexp with string as parameter.
• If regexp.global is true: Set the regexp.lastIndex property to 0 and invoke
RegExp.prototype.exec repeatedly until there is no match. If there is a match
with an empty string (in other words, if the value of regexp.lastIndex is left
unchanged), increment regexp.lastIndex by 1. Let n be the number of matches. The
value returned is an array with the length property set to n and properties 0
through n–1 corresponding to the first elements of the results of all matching
invocations of RegExp.prototype.exec.

The spec states that "If regexp.global is true: Set the regexp.lastIndex
property to 0 and invoke RegExp.prototype.exec repeatedly until there is no
match." In the testcase expr.global always returns true which would mean, if I
understand it correctly, that the lastIndex property should always be set to 0.
In your second run you say the lastIndex = 10. Is there an explanation for this?
(I just hope I'm not looking at the wrong function here. At least I think I'm not.)

Like I said, I've only taken a very short look at it so far so don't shoot me if
I got it all wrong ;)
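
For comparison, the section quoted above describes String.prototype.match, which resets lastIndex itself before it starts calling exec. A small sketch of that difference (variable names are illustrative):

```javascript
const expr = /[a-z]+/g;

// test() on a global regexp leaves lastIndex at the end of the match.
expr.test('teststring');
const stale = expr.lastIndex;

// match() with a global regexp starts from index 0 regardless of the
// stale lastIndex, and collects every match.
const viaMatch = 'teststring'.match(expr);
const afterMatch = expr.lastIndex;

const words = 'foo bar baz'.match(/[a-z]+/g);
console.log(stale, viaMatch, afterMatch, words);
```

So the "set lastIndex to 0" rule belongs to match() only; test() and exec() start from whatever lastIndex the object currently holds.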
Hm, I guess I was looking at the wrong function. I should be looking at
RegExp.prototype.test as defined on page 145 and RegExp.prototype.exec as
defined on page 144, right? And the one I mentioned earlier doesn't apply here,
or does it?
Funny, if RegExp.test in the Netscape Javascript documentation is implemented by
RegExp.prototype.test and RegExp.exec from the Netscape docs is implemented by
RegExp.prototype.exec then why does the Netscape Javascript documentation say
the following about RegExp.test:

test
Executes the search for a match between a regular expression and a specified
string. Returns true or false.

Method of: RegExp
Implemented in: JavaScript 1.2, NES 3.0

Syntax:
regexp.test([str])

Parameters
regexp: The name of the regular expression. It can be a variable name or a literal.
str: The string against which to match the regular expression. If omitted, the
value of RegExp.input is used.

Description
When you want to know whether a pattern is found in a string use the test method
(similar to the String.search method); for more information (but slower
execution) use the exec method (similar to the String.match method).


How can exec be slower than test if test is implemented as
RegExp.prototype.exec(string) != null ?
(In reply to comment #4)
> What first of all surprises me is that the state is carried on to the next
> instance of a regexpr.

That's why I cced brendan and left the bug open.  I'm not sure whether having a
var declared like that should create a new regexp instance every time through
the function or not (it apparently does not in Mozilla).  I _think_ regexp
literals are evaluated and converted to objects at compile time, though...

(In reply to comment #5)
> Hm, I guess I was looking at the wrong function. I should be looking at
> RegExp.prototype.test as defined on page 145 and RegExp.prototype.exec as
> defined on page 144, right?

Yes.  My apologies for not including the spec section number in my comment; I
meant to and forgot.

As for the NS javascript docs, I don't really know enough about the JS engine to
know what's up there.  Brendan?
Thanks for your reply.

> How can exec be slower than test if test is implemented as
> RegExp.prototype.exec(string) != null ?

This makes me wonder: You can always bail out once you have the first match. The
answer will be correct the first time, but does that leave the machine in the
same state? Perhaps also a nice one for Brendan to answer.
There are a lot of questions here best answered by developer docs, JS books, and
the ECMA spec.  They don't constitute a bug.

See ECMA-262 Edition 3 7.8.5, first paragraph, which stipulates that a regexp
literal creates a RegExp object once per source literal, when the program is
scanned, and evaluating that scanned reference results in a reference to the
same single object.
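
Under that rule, the literal inside BugTest behaves as if it had been created once outside the function. A minimal sketch of that situation, with the sharing made explicit by hoisting the object into a variable (sharedExpr, testShared and testWithReset are illustrative names), plus one possible workaround of resetting lastIndex before each use:

```javascript
// One object shared by every call, mimicking the once-per-literal rule.
const sharedExpr = /[a-z]+/g;

function testShared(s) {
  return sharedExpr.test(s);      // starts from the leftover lastIndex
}

function testWithReset(s) {
  sharedExpr.lastIndex = 0;       // workaround: always start at index 0
  return sharedExpr.test(s);
}

const sharedResults = [
  testShared('teststring'),
  testShared('teststring'),
  testShared('teststring'),
];

const resetResults = [
  testWithReset('teststring'),
  testWithReset('teststring'),
  testWithReset('teststring'),
];
console.log(sharedResults, resetResults);
```

Dropping the g flag, as in step 8 of the report, has the same effect for a plain yes/no test, since a non-global regexp always searches from index 0.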

test is faster than exec because exec constructs and returns a match array on
match; test does not.
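
A short sketch of that difference, which also shows that both methods leave the regexp in the same state (the pattern and variable names are chosen only for illustration):

```javascript
const re = /t([a-z])/g;

// exec() builds a match array: full match, captures, index.
const viaExec = re.exec('teststring');
const indexAfterExec = re.lastIndex;

re.lastIndex = 0;                 // start over for a fair comparison

// test() returns only a boolean, but updates lastIndex the same way.
const viaTest = re.test('teststring');
const indexAfterTest = re.lastIndex;

console.log(viaExec, indexAfterExec, viaTest, indexAfterTest);
```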

The Netscape JS docs have bugs, but again, those should not be reported here in
bugzilla.mozilla.org.

Take IE bugs to Microsoft, of course.

/be
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → INVALID
*** Bug 238723 has been marked as a duplicate of this bug. ***
*** Bug 245376 has been marked as a duplicate of this bug. ***
*** Bug 303554 has been marked as a duplicate of this bug. ***
*** Bug 313591 has been marked as a duplicate of this bug. ***
*** Bug 331504 has been marked as a duplicate of this bug. ***
I know this is closed, but I don't think the answer that it's not a bug is correct:

"See ECMA-262 Edition 3 7.8.5, first paragraph, which stipulates that a regexp
literal creates a RegExp object once per source literal, when the program is
scanned, and evaluating that scanned reference results in a reference to the
same single object."

"If regexp.global is true: Set the regexp.lastIndex property to 0"


This implies that the functionality of not creating a new object each time is correct, but that the index should be reset on the existing object every time you "declare" it.

I originally noticed this problem with regex.match.  Rene Pronk quoted from the regex.match documentation, but I highly doubt this specific functionality would be different from method to method.
ECMA TG1 views the singleton per literal design decision as a mistake, and believes that fixing it incompatibly, to create a new RegExp object on each evaluation of the literal, will only make code work as intended, not break anything.

This is on our list for Edition 4 incompatible bug-fix changes.
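
For readers on current engines: the change described here was later adopted in ECMAScript 5, which creates a new RegExp object each time a literal is evaluated. A sketch, assuming an engine with ES5 or later semantics (makeExpr is a hypothetical helper; BugTest is the function from the testcase):

```javascript
function BugTest(sString) {
  var expr = /[a-z]+/g;     // a new object per call under ES5+ semantics
  return expr.test(sString);
}

// Hypothetical helper: returns whatever object the literal evaluates to.
function makeExpr() {
  return /[a-z]+/g;
}

const results = [
  BugTest('teststring'),
  BugTest('teststring'),
  BugTest('teststring'),
];
const distinct = makeExpr() !== makeExpr();
console.log(results, distinct);
```

On such an engine every call starts with lastIndex at 0, so the testcase no longer alternates. A regexp stored once and reused across calls still carries its lastIndex from one call to the next.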

Bugzilla is not the place to track ECMA stuff.

Isn't this bug a dup of a much older bug?

/be
Whiteboard: DUPEME
yep. 
Status: RESOLVED → REOPENED
Resolution: INVALID → ---

*** This bug has been marked as a duplicate of 98409 ***
Status: REOPENED → RESOLVED
Closed: 20 years ago → 18 years ago
Resolution: --- → DUPLICATE
Whiteboard: DUPEME