Closed Bug 169497 Opened 22 years ago Closed 22 years ago

regular expression matching not compatible with Perl 5

Categories

(Core :: JavaScript Engine, defect)

x86
Windows XP
defect
Not set
normal

VERIFIED DUPLICATE of bug 85721

People

(Reporter: martin.honnen, Assigned: rogerl)

Details

Attachments

(3 files)

I am trying to write a regular expression that extracts the content of the <body> tag from a HTML string source. I have an expression that does what I want with Perl 5 but doesn't match anything with the XPCOM shell (JavaScript-C 1.5 pre-release 4a 2002-03-21) that comes with Mozilla 1.1.

Here is the Perl program:

    $html = "";
    $html .= "<html>\n";
    $html .= "<body onload=\"alert(event.type);\">\n";
    $html .= "<p>Kibology for all<\/p>\n";
    $html .= "<p>All for Kibology<\/p>\n";
    $html .= "<\/body>\n";
    $html .= "<\/html>";
    ($first, $second) = ($html =~ /<body.*>((.*\n?)*?)<\/body>/i);
    print "first submatch: $first\n";
    print "second submatch: $second\n";

When I run that with Perl (v5.6.1 built for MSWin32-x86-multi-thread) I get the following result:

    first submatch:
    <p>Kibology for all</p>
    <p>All for Kibology</p>

    second submatch: <p>All for Kibology</p>

which means the content of the <body> tag in the source is correctly extracted as the first match.

The JavaScript version looks as follows:

    var html = '';
    html += '<html>\n';
    html += '<body onload="alert(event.type);">\n';
    html += '<p>Kibology for all<\/p>\n';
    html += '<p>All for Kibology<\/p>\n';
    html += '<\/body>\n';
    html += '<\/html>';
    var bodyMatch = /<body.*>((.*\n?)*?)<\/body>/i;
    function showMatch (re) {
      var r = '';
      var match = re.exec(html);
      if (match) {
        var r = '';
        for (var i = 0; i < match.length; i++)
          r += i + ':||' + match[i] + '||\n';
      }
      print("match with " + re + ":");
      print(r);
      print("");
    }
    showMatch(bodyMatch);

When I run this with the XPCOM shell I get the following output:

    match with /<body.*>((.*\n?)*?)<\/body>/i:

that is, no match is found.
If I compile the JavaScript with jsc, the JScript.NET compiler, and then run the executable, I get the following output:

    match with /<body.*>((.*\n?)*?)<\/body>/i:
    0:||<body onload="alert(event.type);">
    <p>Kibology for all</p>
    <p>All for Kibology</p>
    </body>||
    1:||
    <p>Kibology for all</p>
    <p>All for Kibology</p>
    ||
    2:||<p>All for Kibology</p>
    ||

which means the <body> tag and its content are matched, the content of the <body> tag is the first submatch, and the second submatch is the same as with Perl too. Thus I think the SpiderMonkey implementation is wrong: it should find a match and find the submatches for the parenthesized expressions. I will upload the JavaScript file.
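For anyone re-checking this report against a current engine, here is a minimal sketch of the same testcase using `console.log` instead of the shell's `print`. The helper name `describeMatch` is hypothetical and not part of the attached file. By my reading of the pattern, a conforming engine should find a match whose first submatch is the body content and whose second submatch is the last line of that content, in line with the Perl and JScript.NET results quoted in this report.

```javascript
// Same input string as the testcase in this report.
const html = [
  '<html>',
  '<body onload="alert(event.type);">',
  '<p>Kibology for all</p>',
  '<p>All for Kibology</p>',
  '</body>',
  '</html>'
].join('\n');

// The pattern from the report, unchanged.
const bodyMatch = /<body.*>((.*\n?)*?)<\/body>/i;

// Hypothetical helper: one line per capture, with newlines made
// visible via JSON.stringify; returns null when nothing matches.
function describeMatch(re, input) {
  const match = re.exec(input);
  if (match === null) return null;
  return Array.from(match, (m, i) => i + ': ' + JSON.stringify(m));
}

const lines = describeMatch(bodyMatch, html);
console.log(lines === null ? 'no match' : lines.join('\n'));
```

Printing the captures through `JSON.stringify` makes the leading and trailing newlines of each submatch explicit, which the `||` delimiters in the original testcase only show indirectly.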
I have also tried the test case with the Rhino shell (rhino1_5R3). Rhino finds the match but doesn't set the first submatch correctly:

    match with /<body.*>((.*\n?)*?)<\/body>/i:
    0:||<body onload="alert(event.type);">
    <p>Kibology for all</p>
    <p>All for Kibology</p>
    </body>||
    1:||||
    2:||<p>All for Kibology</p>
    ||

At least that also supports the view that SpiderMonkey should find a match.
There's possibly some negative interaction between the concepts of Kibology and SpiderMonkey. However, applying the patch from bug 85721 fixes this bug so I'm thinking this should be dup'ed to that - except that Rhino already contains that fix and should work fine. When I tried Rhino I got the expected results (i.e. not what Martin saw in comment #3 - maybe you could give this a try on your build, Phil?)
Martin's testcase added to JS testsuite:

    mozilla/js/tests/ecma_3/RegExp/regress-169497.js

Roger is correct: the fix for this is covered by the big patch for bug 85721, which has already been committed to Rhino. When I run the testcase in the current SpiderMonkey shell, it produces no match, as Martin has reported. When I run the testcase in the current Rhino shell, it passes.

Martin: the RegExp fix is not in rhino1_5R3, but is contained in the current version of Rhino in the Mozilla CVS repository. If you don't normally build Rhino from the Mozilla CVS repository, the fix might be in

    ftp://ftp.mozilla.org/pub/js/rhinoLatest.zip

That is dated July 7 of this year. That should have the fix, which was checked in on 2002-06-20 (see Rhino bug 125562).

Resolving this bug as a duplicate of SpiderMonkey bug 85721 -

*** This bug has been marked as a duplicate of 85721 ***
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
Marking Verified. Martin: thank you for this report. You have been cc'ed on bug 85721 so you can follow progress on this issue. All that's left is for the patch in that bug to be reviewed and committed to the SpiderMonkey codebase. One day after that happens, the fix will be reflected in trunk builds of Mozilla -
Status: RESOLVED → VERIFIED