Closed Bug 287630 Opened 20 years ago Closed 20 years ago

In edge cases, String.prototype.split() does not behave as it should according to the ECMA standard.

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.8beta3

People

(Reporter: spiff, Assigned: mrbkap)

Details

Attachments

(1 file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1 'a'.split(/()/) should return ['a'], but returns ['a',''] /()/ matches at position 0, but a zero-width match that would produce a zero-width field in the output is ignored (correctly). At position 1, the end of the string has been reached, and the remaining characters should be output ('a'). However, the current implementation seems to try to match again at that position. In this case, the match works, and subsequently the remaining characters (would be OK) and the captures (not OK!) are output. A step-by-step interpretation of the algorithm given in the ECMAScript standard (Section 15.5.4.14) shows that ['a'] is the correct result: 1. S = 'a' // toString(this) 2. A = [] // new Array() 3. lim = 2**32-1 // limit is undefined... 4. s = 1 // s = S.length 5. p = 0 6. R = /()/ // separator is a regex 7. // no jump: lim != 0 8. // no jump: separator != undefined 9. // no jump: s != 0 10. q = 0 // q = p 11. // no jump: q != s 12. z = {endIndex:0, cap:['']} // R matches at position 0 13. // no jump: z is not failure 14. e = 0, cap = [''] 15. goto 26 // e == p 26. q = 1 // q = q+1 27. goto 11 11. goto 28 // q == s 28. T = 'a' // T = S.substring(p,s) 29. A. = ['a'] // A.put(A.length, T) 30. return ['a'] // return A Reproducible: Always Steps to Reproduce:
Karsten's analysis appears correct to me. For comparison both MSIE6 and Opera7 return ['a'].
Status: UNCONFIRMED → NEW
Ever confirmed: true
js/tests/ecma_3/String/15.5.4.14.js checked in.
Taking and setting flags I think need to be set.
Assignee: general → mrbkap
Flags: testcase+
Attached patch patch v1Splinter Review
The extra element was coming from the parenthesis trying to append its match to the split array. I'm not sure if this is the right behavior, but all this does is make us ignore the trivial match at the end of the string. Brendan, is this the right way to go?
Attachment #186533 - Flags: review?(brendan)
Status: NEW → ASSIGNED
Comment on attachment 186533 [details] [diff] [review] patch v1 >+ if ((size_t)i == length) { >+ /* >+ * If there was a trivial zero-length match at the end of the >+ * split, then we shouldn't output the matched string at the end >+ * of the split array. This accomplishes that. >+ */ >+ sep->chars = NULL; >+ } This looks good to me. Lose the last sentence in the comment, it doesn't add any value. Replace it with a reference to ECMA-262 Ed. 3, 15.5.4.14, Step 15. /be
Attachment #186533 - Flags: review?(brendan)
Attachment #186533 - Flags: review+
Attachment #186533 - Flags: approval1.8b3+
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.8beta3
verified test passes in 6/19 deerpark cvs build on winxpsp2
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: