Closed Bug 57572 Opened 24 years ago Closed 24 years ago

regexp with ? matches incorrectly

Categories

(Core :: JavaScript Engine, defect, P3)

x86
Linux
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: rginda, Assigned: rogerl)

References

Details

(Keywords: js1.5)

Attachments

(2 files)

js> str = "foobar"
js> str.match (/(\S+)? ?(.*)/);
foobar,fooba,r
js> str = "foo bar"
js> str.match (/(\S+)? ?(.*)/);
foo bar,foo,bar

The first match should have been "foobar, foobar", the second match is correct. 
This used to work.  Maybe something missed in rogerl's 3.33 revision if
jsregexp.c?
*** Bug 56249 has been marked as a duplicate of this bug. ***
*** Bug 56249 has been marked as a duplicate of this bug. ***
DOTSTAR was behaving as if no match was a failed match. Here's a patch:

Index: jsregexp.c
===================================================================
RCS file: /m/pub/mozilla/js/src/jsregexp.c,v
retrieving revision 3.33
diff -u -r3.33 jsregexp.c
--- jsregexp.c  2000/09/14 22:39:21     3.33
+++ jsregexp.c  2000/10/24 15:39:35
@@ -1865,7 +1865,6 @@
           for (cp2 = cp; cp2 < cpend; cp2++)
               if (*cp2 == '\n')
                   break;
-          if (cp2 == cp) return NULL;
           while (cp2 >= cp) {
               const jschar *cp3 = matchRENodes(state, ren->next, NULL, cp2);
               if (cp3 != NULL) {
works here, r=rginda.
Keywords: js1.5
<JS_ASSERT((ts == NULL) || (ts->linebuf.limit < ts->linebuf.base + 
JS_LINE_LIMIT));
>JS_ASSERT(!ts || (ts->linebuf.limit < ts->linebuf.base + JS_LINE_LIMIT));

maybe?
Keywords: patch, review
timeless: and don't parenthesize the < expression, this ain't Pascal.

rogerl, I'll review, someone else should too.

/be
Status: NEW → ASSIGNED
Added testcase to JS test suite:

             js/tests/ecma_3/RegExp/regress-57572.js


This testcase is currently passing on SpiderMonkey, but parts of it 
are failing in Rhino. Specifically:


CASE A     pattern = /^(\S+)?( ?)(B+)$/;  //single space before second "?" 
           string = 'AAABBB';             //no spaces in given string
           actual = string.match(pattern);
           expect = Array(string, 'AAABB', cnEmptyString, 'B');


In Rhino:  string.match(pattern) === null 



CASE B     pattern = /(.+)?(!?)(!+)/;
           string = 'WOW !!! !!!';
           actual = string.match(pattern);
           expect = Array(string, 'WOW !!! !!', cnEmptyString,  '!');


In Rhino:  string.match(pattern) = ("!!!", "!!! !!!", "!",  "!!")
           This is especially strange: although matches are being produced, 
           the first term in the match array is not the given string. 



In SpiderMonkey:

All expected values are being produced. (I do not know regular expressions 
well enough to know if these are indeed what we should expect, however - )
For convenience, here is a hyperlink to the testcase (once lxr updates):

http://lxr.mozilla.org/mozilla/source/js/tests/ecma_3/RegExp/regress-57572.js
Blocks: 66234
Generated a meta bug to capture all current R.E. bugs.
Nits:

- Tabs in the patch?  See for example

+            cp += 3;
+
state->cp = cp;
+        /* Handle empty paren */
+        if (*cp == ')') {

- else after goto is a non-sequitur:

+                    goto madeit;
+                }
+                else {

- (Same area) Why not make the error case the most indented one, by testing

+                if (argc >= 2 && !JSVAL_IS_VOID(argv[1])) { /* 'flags' defined */

instead of

+                if (argc < 2 || JSVAL_IS_VOID(argv[1])) { /* 'flags' undefined */

- (Same area) Hanging indent doesn't line up with start of actual parameters:

+                    JS_ReportErrorNumber(cx, js_GetErrorMessage, NULL, 
+                                                    JSMSG_NEWREGEXP_FLAGGED);

- JSMSG_IN_NOT_OBJECT wording too different from JSMSG_BAD_INSTANCEOF_RHS. 
Compare the wording of

-MSG_DEF(JSMSG_IN_NOT_OBJECT,           27, 0, JSEXN_ERR, "target of 'in'
operator must be an object")
+MSG_DEF(JSMSG_IN_NOT_OBJECT,           27, 0, JSEXN_TYPEERR, "target of 'in'
operator must be an object")

to

-MSG_DEF(JSMSG_BAD_INSTANCEOF_RHS,      31, 1, JSEXN_ERR, "invalid instanceof
operand {0}")
+MSG_DEF(JSMSG_BAD_INSTANCEOF_RHS,      31, 1, JSEXN_TYPEERR, "invalid
instanceof operand {0}")

Also, the latter provides a decompiled version of the offending right operand. 
I think the IN message and call (in jsinterp.c) should do the same.  It's good
to say *why* the operand is invalid, and it helps to mention which operand if
possible (neither the IN nor INSTANCEOF messages do that clearly; is "target"
the right operand?).  Sorry, beating a gnat with a crowbar here.

Pick these nits and sr=brendan@mozilla.org.

/be
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Verified Fixed - ran this test suite directory on Linux, WinNT, and Mac

                      js/tests/ecma_3/RegExp/

with both debug and optimized versions of the JS shell, and got 0 errors.  
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: