Closed Bug 287630 Opened 20 years ago Closed 20 years ago

In edge cases, String.prototype.split() does not behave as it should according to the ECMA standard.

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.8beta3

People

(Reporter: spiff, Assigned: mrbkap)

Details

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1

'a'.split(/()/) should return ['a'], but returns ['a','']

/()/ matches at position 0, but a zero-width match that would produce a 
zero-width field in the output is ignored (correctly). At position 1,
the end of the string has been reached, and the remaining characters should
be output ('a').

However, the current implementation seems to try to match again at that
position. In this case, the match works, and subsequently the remaining
characters (would be OK) and the captures (not OK!) are output.

A step-by-step interpretation of the algorithm given in the
ECMAScript standard (Section 15.5.4.14) shows that ['a'] is the correct
result:

1. S = 'a'                      // toString(this)
2. A = []                       // new Array()
3. lim = 2**32-1                // limit is undefined...
4. s = 1                        // s = S.length
5. p = 0
6. R = /()/                     // separator is a regex
7.                              // no jump: lim != 0
8.                              // no jump: separator != undefined
9.                              // no jump: s != 0
10. q = 0                       // q = p
11.                             // no jump: q != s
12. z = {endIndex:0, cap:['']}  // R matches at position 0
13.                             // no jump: z is not failure
14. e = 0, cap = ['']
15. goto 26                     // e == p
26. q = 1                       // q = q+1
27. goto 11
11. goto 28                     // q == s
28. T = 'a'                     // T = S.substring(p,s)
29. A. = ['a']                  // A.put(A.length, T)
30. return ['a']                // return A


Reproducible: Always

Steps to Reproduce:
Karsten's analysis appears correct to me. For comparison both MSIE6 and Opera7
return ['a'].
Status: UNCONFIRMED → NEW
Ever confirmed: true
js/tests/ecma_3/String/15.5.4.14.js checked in.
Taking and setting flags I think need to be set.
Assignee: general → mrbkap
Flags: testcase+
Attached patch patch v1Splinter Review
The extra element was coming from the parenthesis trying to append its match to
the split array. I'm not sure if this is the right behavior, but all this does
is make us ignore the trivial match at the end of the string.

Brendan, is this the right way to go?
Attachment #186533 - Flags: review?(brendan)
Status: NEW → ASSIGNED
Comment on attachment 186533 [details] [diff] [review]
patch v1

>+	    if ((size_t)i == length) {
>+                /* 
>+                 * If there was a trivial zero-length match at the end of the
>+                 * split, then we shouldn't output the matched string at the end
>+                 * of the split array. This accomplishes that.
>+                 */
>+                sep->chars = NULL;
>+	    }

This looks good to me.	Lose the last sentence in the comment, it doesn't add
any value.  Replace it with a reference to ECMA-262 Ed. 3, 15.5.4.14, Step 15.

/be
Attachment #186533 - Flags: review?(brendan)
Attachment #186533 - Flags: review+
Attachment #186533 - Flags: approval1.8b3+
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.8beta3
verified test passes in 6/19 deerpark cvs build on winxpsp2
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: