225094 - RegExp.exec does not return null when there are no more matches

Reporter

Description

•

22 years ago

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007 There is a problem when performing a regular expression in a loop that has only optional components; when no more matches are found the exec() does not return null, but an empty string or an array of empty strings; when you explicitly check for the null value in the loop the browser will hang in an eternal loop. Reproducible: Always Steps to Reproduce: simple testcase that will cause an eternal loop: <script type="text/javascript"> var s = 'abcdefghijklmnopqrstuvwxyz0123456789'; var re = /[a-z]?/g var match = re.exec(s); while (match != null) { document.write(match[0]+'<br />'); match = re.exec(s); } </script> Actual Results: browser hangup because of the eternal loop Expected Results: when there are no more matches, the exec() method should return the null value n/a

Boris Zbarsky [:bzbarsky]

Comment 1

•

22 years ago

> when no more matches are found That never happens, since a match is _always_ found for that regexp.... Consider the algorithm in the Ecma spec (http://www.ecma-international.org/publications/files/ecma-st/Ecma-262.pdf): RegExp.prototype.exec(string): 1. Let S be the value of ToString(string). 2. Let length be the length of S. 3. Let lastIndex be the value of the lastIndex property. 4. Let i be the value of ToInteger(lastIndex). 5. If the global property is false, let i =0. 6. If i <0 or i > length then set lastIndex to 0 and return null. 7. Call [[Match]], giving it the arguments S and i. If [[Match]] returned failure, go to step 8; otherwise let r be its State result and go to step 10. 8. Let i = i+1. 9. Go to step 6. 10. Let e be r's endIndex value. 11. If the global property is true,setlastIndex to e. 12. Let n be the length of r's captures array. (This is the same value as 15.10.2.1's NCapturingParens.) 13. Return a new array with the following properties: - The index property is set to the position of the matched substring within the complete string S. - The input property is set to S. - The length property is set to n + 1. - The 0 property is set to the matched substring (i.e. the portion of S between offset i inclusive and offset e exclusive). - For each integer i such that i >0 and i <= n, set the property named ToString(i) to the ith element of r's captures array. Now in our case we go through till we have i == length. Then we call [[Match]] in step 7. The patterm in question happily matches a zero-length string, so [[Match]] returns success. Then we go to step 10. endIndex == length, since the regexp matched but took up no chars, so lastIndex gets set to "length" and we return an array. Next time through the loop, we end up with exactly the same situation. So I believe our behavior is in fact correct -- a regexp that matches any string (such as the one provided) cannot ever return null from an exec() call.

crisp

Reporter

Comment 2

•

22 years ago

Even though it is according specs, i'd say returning nothing as a match (an empty string or array) can logically be concidered 'no match' and for the sake of preventing eternal loops just returning null might be the wiser thing to do - as Internet Explorer and Opera are also doing in this particular case...

Boris Zbarsky [:bzbarsky]

Comment 3

•

22 years ago

> Even though it is according specs, i'd say returning nothing as a match (an > empty string or array) can logically be concidered 'no match' Well, no. It's a match -- that much is clear. For example, does the regexp /^ *$/ match the empty string? Yes, it does. And if I do /^ *$/.test("") I expect it to return true. It does in Mozilla. It also does in Opera. /^ *$/.exec("") returns a non-null result in both Mozilla and Opera. Now here is the kicker. In Opera, yout testcase is actually an infinite loop (looking at Opera 7 Linux here). It just happens that in Opera you can interact with the UI even while this infinite loop is running (though it does chew up a nice amount of CPU as it runs). You can trivially test this by adding a document.write of a nonempty string or an alert inside the loop and watching what happens. I have no way to test IE, so I can't verify your claim there.

zack-weg

Comment 4

•

22 years ago

Note that for String.prototype.match() the spec does explicitly specify a different behavior. Thus I think it is intentional. Both behaviors differ from Perl, where a zero-length match forces the next match to have non-zero length (or to fail). js> "bar".match( /.??/g ).slice( 1 ).join( ":" ); :: js> "bar".split( /.??/g ).join( ":" ); b:a:r js> "bar".replace( /.??/g, "<$&>" ); <>b<>a<>r<> ~ $ perl -we 'print +(join ":", "bar" =~ /.??/g), "\n"' :b::a::r: ~ $ perl -we 'print +(join ":", split /.??/, "bar"), "\n"' ~ $ perl -we '$_= "bar"; s/.??/<$&>/g; print "$_\n"' <><b><><a><><r><> A related thing is /(.*)*/ (see bug 209919, comment 9 ff): js> "a".match( /(a|b*)*/ )[1]; a js> "a".match( /(b*)*/ )[1] === undefined; // Bug in Mozilla < 1.6a ! true ~ $ perl -we 'print +("a" =~ /(a|b*)*/)[0], "\n"' ~ $ perl -we 'print ! defined +("a" =~ /(b*)*/)[0], "\n"' Perl's behavior may be more intuitive, but ECMA has choosen to use more straight-forward rules.

Phil Schwartau

Comment 5

•

22 years ago

Attached file HTML testcase. Can be run safely in any browser - will not hang (obsolete) — Details

Phil Schwartau

Comment 6

•

22 years ago

The testcase sets re = /[a-z]?/g s = 'abc123' If you run it in IE6 and repeatedly call re.exec(s), you get this output: N = 0 match = undef re.lastIndex = 0 N = 1 match = ["a"] re.lastIndex = 1 N = 2 match = ["b"] re.lastIndex = 2 N = 3 match = ["c"] re.lastIndex = 3 N = 4 match = [""] re.lastIndex = 4 N = 5 match = [""] re.lastIndex = 5 N = 6 match = [""] re.lastIndex = 6 N = 7 match = [""] re.lastIndex = 7 N = 8 match = null re.lastIndex = 0 This shows that in IE, once re.exec(s) has consumed every character of |s|, if called again it will consume $, and move beyond the end of the string. Once beyond the string, it (as would Mozilla) returns a null match. Mozilla on the other hand is following the spec, as Boris demonstrated. I will try to get in touch with Waldemar and ask him to comment on this. As one of the authors of the spec, his insight will be invaluable here -

zack-weg

Comment 7

•

22 years ago

IE handles this as specified for a similar String.prototype.match(). Opera 7.11 is ECMA-compliant. Konqueror 3.1.4 has odd results (behaves as Perl does, but fails to stop at $): N = 0 match = undef re.lastIndex = 0 N = 1 match = ["a"] re.lastIndex = 1 N = 2 match = ["b"] re.lastIndex = 2 N = 3 match = ["c"] re.lastIndex = 3 N = 4 match = [""] re.lastIndex = 3 N = 5 match = [""] re.lastIndex = 4 N = 6 match = [""] re.lastIndex = 5 N = 7 match = [""] re.lastIndex = 6 N = 8 match = [""] re.lastIndex = 6 N = 9 match = [""] re.lastIndex = 6 ... ~ $ perl -e '$_= "abc123"; for $i (1..16) { $m= /[a-z]?/g; print "$i <$&> ${\(pos)}\n"; $m or last }' 1 <a> 1 2 <b> 2 3 <c> 3 4 <> 3 5 <> 4 6 <> 5 7 <> 6 8 <> When using match() with /g , Mozilla resets lastIndex (not changed by the regexp rewrite). Is this intentional or should I file a bug? According to ECMA it should be 7. js> var re= /[a-z]?/g, a= "abc123".match( re ); js> a.toSource() ["a", "b", "c", "", "", "", ""] js> re.lastIndex 0

zack-weg

Comment 8

•

22 years ago

I was wrong, the behavior of match() is correct. When lastIndex is 7, exec() is invoked one more time and sets it to 0.

Phil Schwartau

Comment 9

•

22 years ago

Attached file HTML testcase (better summary; left and right context added) — Details

Attachment #135096 - Attachment is obsolete: true

Phil Schwartau

Comment 10

•

22 years ago

As before, re = /[a-z]?/g s = 'abc123' The newer testcase makes it easier to see the difference between Mozilla and IE on re.exec(s). The crux of the matter: ------------------------------------- Moz ------------------------------------- N = 2 match = ["b"] lastIndex = 2 leftContext = a rightContext = c123 N = 3 match = ["c"] lastIndex = 3 leftContext = ab rightContext = 123 N = 4 match = [""] lastIndex = 3 leftContext = abc rightContext = 123 N = 5 match = [""] lastIndex = 3 leftContext = abc rightContext = 123 etc. etc. ------------------------------------- IE ------------------------------------- N = 2 match = ["b"] lastIndex = 2 leftContext = a rightContext = c123 N = 3 match = ["c"] lastIndex = 3 leftContext = ab rightContext = 123 N = 4 match = [""] lastIndex = 4 leftContext = abc rightContext = 123 N = 5 match = [""] lastIndex = 5 leftContext = abc1 rightContext = 23 The two browsers (and others, I expect) agree on the first four matches. So when N = 4, both browsers match the 0-length string between 'c' and '1'. When N = 5, IE decides to consume the '1'. The next match according to IE is therefore the 0-length string between '1' and '2'. Mozilla follows the algorithm Boris outlined above. Since the N=4 match is successful, we fall into Step 10 and 11. Since the match is of zero length, |lastIndex| remains the same. The '1' is never consumed. The next match will be the same 0-length string between 'c' and '1'.

Phil Schwartau

Comment 11

•

21 years ago

cc'ing Waldemar -

Shanti Rao

Comment 12

•

21 years ago

Perhaps re.exec() could return the null result set from the end of the string only once? Such as by setting lastIndex=-1 when it hits the end of the test string, and returning null thereafter.

Brendan Eich [:brendan]

Comment 13

•

20 years ago

Sorry, we follow ECMA, other browsers try to; everyone should stick to the spec. /be

QA Contact: pschwartau → general

Bob Clary [:bc] (inactive)

Comment 14

•

20 years ago

marking invalid since this is not a bug in the implementation.

Status: UNCONFIRMED → RESOLVED

Closed: 20 years ago

Resolution: --- → INVALID

Bugzilla

RegExp.exec does not return null when there are no more matches

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: crisp, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file, 1 obsolete file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Attachment

General

Description

File Name

Content Type

HTML testcase. Can be run safely in any browser - will not hang 22 years ago Phil Schwartau 2.78 KB, text/html		Details
HTML testcase (better summary; left and right context added) 22 years ago Phil Schwartau 3.19 KB, text/html		Details