Closed Bug 123437 Opened 23 years ago Closed 21 years ago

regexp backreferences /(a)? etc./ must hold |undefined| if not used

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

VERIFIED FIXED

People

(Reporter: pschwartau, Assigned: rogerl)

Details

(Whiteboard: [QA note: verify HTML testcase as well as the JS testcase])

From correspondence between rogerl and waldemar:

Waldemar: "The match array always has N+1 elements, where N is
the number of capturing left parentheses in the regular expression,
regardless of whether all of them are actually used or not."
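
The rule quoted above can be checked with a minimal sketch like the following. It assumes a current engine, so it uses the standard `RegExp.prototype.exec` rather than the callable-regexp form `/re/("a")` that appears in this report; the `cases` table and its `groups` counts are written by hand for illustration.

```javascript
// Each entry pairs a pattern with N, the number of capturing
// left parentheses it contains (counted by hand).
const cases = [
  { re: /(a)?a/, groups: 1 },
  { re: /a|(b)/, groups: 1 },
  { re: /(a)?(a)/, groups: 2 },
];

for (const { re, groups } of cases) {
  const m = re.exec("a");
  // The match array should have N+1 elements, and the
  // unused first group should hold undefined.
  console.log(re.source, m.length === groups + 1, m[1] === undefined);
}
```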


EXAMPLE:                     arr = /(a)?a/("a")
arr.toSource()   SHOULD BE:  ["a", undefined]
BUT IN CURRENT SPIDERMONKEY: ["a"]


EXAMPLE:                     arr = /a|(b)/("a")
arr.toSource()   SHOULD BE:  ["a", undefined]
BUT IN CURRENT SPIDERMONKEY: ["a"]


EXAMPLE:                     arr = /(a)?(a)/("a")    
arr.toSource()    SHOULD BE: ["a", undefined, "a"]
BUT IN CURRENT SPIDERMONKEY: ["a", "",        "a"]
Note in the last example,  ["a", , "a"] would also be acceptable output.
That is, as an array ["a", , "a"] is equivalent to ["a", undefined, "a"].

The same contraction cannot be made in the first two examples above.
That is, the array ["a",] is not equivalent to ["a", undefined].

Why? Trailing commas are ignored; ["a",] is equivalent to ["a"].
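
The array-literal point can be illustrated with a short sketch. The variable names are hypothetical and exist only for this example.

```javascript
// An elided middle element reads back as undefined and still counts
// toward the length.
const withHole = ["a", , "a"];
const explicit = ["a", undefined, "a"];
console.log(withHole.length === explicit.length); // both have length 3
console.log(withHole[1] === explicit[1]);         // both read as undefined

// A single trailing comma adds no element.
const trailing = ["a",];
const padded = ["a", undefined];
console.log(trailing.length, padded.length);      // lengths differ: 1 vs 2
```

One caveat: the elided form leaves index 1 without an own property, so `1 in withHole` is false while `1 in explicit` is true. Reading by index gives the same value either way.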
Note the contrast between a capturing left parenthesis "("
                  and a NON-capturing left parenthesis "(?:".
Only capturing parentheses count toward N.
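
A small sketch of that contrast, again using `RegExp.prototype.exec` in a current engine (the variable names are illustrative only):

```javascript
// One capturing group: the match array has two elements,
// and the unused group holds undefined.
const capturing = /(a)?a/.exec("a");

// Same pattern with a non-capturing group: only the full match is reported.
const nonCapturing = /(?:a)?a/.exec("a");

console.log(capturing.length, nonCapturing.length);
```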
Testcase added to JS testsuite:

            mozilla/js/tests/ecma_3/RegExp/regress-123437.js

*** Bug 156465 has been marked as a duplicate of this bug. ***

cc'ing Brendan for his advice. This bug will be fixed by the big
RegExp rewrite in bug 85721. But should we push in a fix for this
separately (since it is a small fix) to ensure this gets into JS1.5?

Here is the HTML testcase from the duplicate bug 156465:
http://bugzilla.mozilla.org/attachment.cgi?id=90721&action=view

OUTPUT IN NN4.7, IE6:
// A regexp to trim strings -
re = /^\s*(\S+(\s+\S+)*)*\s*$/g

str = " hello world! " (note single spaces on each side)
str.replace(re, "\"$1\"") = "hello world!"

str = " " (i.e. just a single space)
str.replace(re, "\"$1\"") = ""


OUTPUT IN MOZILLA (trunk 20020701xx): the latter case fails:
str = " " (i.e. just a single space)
str.replace(re, "\"$1\"") = "$1"
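
The HTML testcase reduces to roughly the following sketch, which can be run in any current engine to see the expected behavior. The variable names are illustrative, and the non-global copy of the regexp is an addition made here only to inspect the captures directly.

```javascript
// The trimming regexp from the testcase.
const re = /^\s*(\S+(\s+\S+)*)*\s*$/g;

// Wrap the first capture in double quotes.
const trimmed = " hello world! ".replace(re, "\"$1\"");
const blank = " ".replace(re, "\"$1\"");

// Non-global copy of the same pattern, used only to inspect the captures.
const captures = " ".match(/^\s*(\S+(\s+\S+)*)*\s*$/);

console.log(trimmed);
console.log(blank);
console.log(captures.length, captures[1], captures[2]);
```

When group 1 does not participate in the match, `$1` should be replaced by the empty string, so the second call yields just the two quote characters rather than a literal `$1`.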
Whiteboard: [QA note: verify HTML testcase as well as the JS testcase]
Summary: Backreferences /(a)? etc./ must hold |undefined| if not used → regexp backreferences /(a)? etc./ must hold |undefined| if not used
Note: this bug will be fixed by the patch for bug 85721;
adding that bug as a dependency -
Depends on: RegExpPerf
Blocks: 223247
The RegExp rewrite (bug 85721) has now gone in.

It has fixed this bug (the JS testcase in Comment #3 now passes).
However, the HTML testcase given in Comment #6 still fails.

Therefore I will resolve this bug as FIXED, but re-open bug 156465,
where the HTML testcase came from. Apparently bug 156465 was not a
duplicate of the current bug after all -
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Oops, I was checking the HTML testcase with an outdated build!
It now passes, too. All to the good; I will not re-open bug 156465.

Marking Verified -
Status: RESOLVED → VERIFIED