Closed Bug 78156 Opened 24 years ago Closed 23 years ago

m flag of regular expression does not work with $

Categories

(Core :: JavaScript Engine, defect)

x86
All
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: georg.maass, Assigned: rogerl)

Details

(Keywords: testcase, Whiteboard: [Don't try testcase in NN4.7; m flags not recognized there])

Attachments

(2 files, 1 obsolete file)

Mozilla/5.0 (Windows; U; WinNT4.0; en-US; 0.8.1) Gecko/20010319 If the regular expression contains a $ to find an item at the line end, then the m flag is ignored. With ^ instead of $ to find the item at line beginnings it works well // Problem // ------- string='hghh\njasdhgflö&lt;s bp98&gt;szf 3434 8dxfnv8yvksjgk\r\n8ewrabduzase87bxdnbsj4711\r\nksdjkjertk789h'; a=new RegExp('\\d$','m'); document.write('<h2>Regul&auml;rer Ausdruck</h2><pre>'+a.toString()+'<\/pre>'); document.write('<h2>Ergebnis regExp.test(Teststring)<\/h2><pre>'+a.test(string)+'<\/pre>'); document.write('<h2>Ergebnis Teststring.search(regExp)<\/h2><pre>'+string.search(a)+'<\/pre>'); // No problem // ----------- a=new RegExp('^\\d','m'); document.write('<h2>Regul&auml;rer Ausdruck</h2><pre>'+a.toString()+'<\/pre>'); document.write('<h2>Ergebnis regExp.test(Teststring)<\/h2><pre>'+a.test(string)+'<\/pre>'); document.write('<h2>Ergebnis Teststring.search(regExp)<\/h2><pre>'+string.search(a)+'<\/pre>');
Attached file Reduced HTML testcase
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows NT → All
Whiteboard: [Note: don't try testcase in NN4.7; it doesn't have m flag]
Keywords: testcase
Testcase added to JS test suite: /js/tests/ecma_3/RegExp/regress-78156.js
NOTE: the testcase is failing in Rhino as well as in Spidermonkey -
Whiteboard: [Note: don't try testcase in NN4.7; it doesn't have m flag] → [Don't try testcase in NN4.7; m flags not recognized there]
To run this testcase directly in the JS or Rhino shell, do this: [/d/JS_TRUNK/mozilla/js/src/WINNT4.0_DBG.OBJ] ./js js> load('../../tests/ecma_3/shell.js') js> load('../../tests/ecma_3/RegExp/shell.js') js> load('../../tests/ecma_3/RegExp/regress-78156.js')
In Perl: $string = "aaa\n789\r\nccc"; $pattern = /^\d/m; --> Perl match == '7' $pattern = /\d$/m; --> Perl match == null IF WE TAKE THE '\r' OUT OF THE STRING: $string = "aaa\n789\nccc"; $pattern = /^\d/m; --> Perl match == '7' $pattern = /\d$/m; --> Perl match == '9' IF WE HAVE ONLY '\r' IN THE STRING AND NOT '\n' $string = "aaa\r789\rccc"; $pattern = /^\d/m; --> Perl match == null $pattern = /\d$/m; --> Perl match == null
Same thing in the JS shell: if you remove all '\r' from the teststring, and use ONLY '\n' to mark a newline, the testcase passes... Thus, the issue is "How should JS treat '\r' or '\r\n' in multiline RegExps?"
From ECMA Edition 3: 7.3 Line Terminators Like white space characters, line terminator characters are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space characters, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. A line terminator cannot occur within any token, not even a string. Line terminators also affect the process of automatic semicolon insertion (section 7.8.5). The following characters are considered to be line terminators: Code Point Value Name Formal Name \u000A Line Feed <LF> \u000D Carriage Return <CR> \u2028 Line separator <LS> \u2029 Paragraph separator <PS> Syntax LineTerminator :: <LF> <CR> <LS> <PS> 15.10 RegExp (Regular Expression Objects) 15.10.2.6 Assertion The production Assertion :: ^ evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following: 1. Let e be x's endIndex. 2. If e is zero, return true. 3. If Multiline is false , return false . 4. If the character Input[e­1] is one of the line terminator characters <LF>, <CR>, <LS>, or <PS>, return true. 5. Return false . The production Assertion :: $ evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following: 1. Let e be x's endIndex. 2. If e is equal to InputLength, return true. 3. If multiline is false , return false . 4. If the character Input[e] is one of the line terminator characters <LF>, <CR>, <LS>, or <PS>, return true. 5. Return false.
Rhino fix checked in.
Status: NEW → ASSIGNED
Rhino fix verified on WinNT. Testcase passes in both rhino, rhinoi shells -
Hmm, so this still needs to be fixed in the browser? ("abcdef\rghijf").split(/$/m) returns an array with just one element for me. This breaks Venkman on Mac because of http://lxr.mozilla.org/mozilla/source/extensions/venkman/resources/content/venkman-outliners.js#436. Rob, on Mac your sourceText array contains only one line for the scripts because the regexp /$/m doesn't work for Mac line endings :/. http://lxr.mozilla.org/mozilla/source/js/src/jsregexp.c#1940 seems suspect to me. Same for line 1953 in that file.
Well, here is a proposed patch. It matches $ as \n, \r, or \r\n, regardless of platform. The remaining problem is that these match characters are not removed fromt the result array.
Comment on attachment 48472 [details] [diff] [review] patch round one; jsregexp.c changes Have to obsolete the rginda patch as there's no ECMA support for multiline line separators.
Attachment #48472 - Attachment is obsolete: true
cc'ing reviewers -
r=khanson
Comment on attachment 53661 [details] [diff] [review] Added support for all Unicode line terminators sr=brendan@mozilla.org and noting khanson's r= here via the patch manager. rogerl, you should mail drivers for a= to get this into 0.9.6. /be
Attachment #53661 - Flags: superreview+
Attachment #53661 - Flags: review+
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Verified FIXED. The standalone JS testcase now passes in both the debug and optimized JS shells on WinNT, Linux, and Mac9.1. The HTML testcase above also passes in Mozilla trunk builds 20011206xx on all three platforms. If you run the HTML testcase, note the discrepancy in behavior between String.prototype.search() and String.prototype.match(). Per the ECMA spec, the former ignores global properties of the regexp it is given. (ECMA-262 Edition 3 Final, Section 15.5.4.12) That is why you see multiple matches from Teststring.match(), but only one from Teststring.search() The multiline flag we have set in the test regexp is honored by the one method, but ignored by the other. This is according to the ECMA spec.
Status: RESOLVED → VERIFIED
Flags: testcase+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: