Bug 273477
Opened 20 years ago
Closed 20 years ago
Javascript String.split produces incorrect output if regular expression can be empty
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
INVALID
People
(Reporter: iketo2, Unassigned)
Details
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

The JavaScript function String.split accepts a regular expression as its separator. If the regular expression can match an empty string, the array returned by String.split is not what one would expect.

For example, 'aaaabbbccbdddbbb'.split(/b+/) gives ['aaaa','cc','ddd',''], which is what one would expect. The corresponding 'aaaabbbccbdddbbb'.split(/b*/), however, gives ['a','a','a','a','c','c','d','d','d'], which is unexpected because the last '' element is missing. (In fact I'd expect an initial '' element as well, as the first match of /b*/ is at position 0, but that's subject to discussion.)

A related issue arises when parentheses occur within the regular expression. While 'aaaabbbccbdddbbb'.split(/(b+)/) does the right thing and gives ['aaaa','bbb','cc','b','ddd','bbb',''], 'aaaabbbccbdddbbb'.split(/(b*)/) behaves strangely and gives ['a','','a','','a','','a','bbb','c','','c','b','d','','d','','d','bbb']. Note that the last 'bbb' is not followed by a '', i.e., the separator is there, but the last field is lost.

At the very least, one would expect the behaviour at the beginning of the string to match the behaviour at the end. If we ask for 'bbbaaaabbbccbdddbbb'.split(/b*/), we get ['','a','a','a','a','c','c','d','d','d'], i.e., we get an empty field at the beginning but lose the one at the end.

Reproducible: Always
Steps to Reproduce: 1. 2. 3.
Actual Results: Each of the examples above has an empty string lost at the end.
Expected Results: See above.
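The cases in the description can be re-run with a short script. This is only a reproduction sketch, not part of the original report: the helper name runSplitCases is hypothetical, and the arrays printed for the /b*/ and /(b*)/ separators depend on the engine under test, so no expected output is given for them.

```javascript
// Reproduction sketch for the separators discussed in the report.
// runSplitCases is a hypothetical helper name, not an engine API.
function runSplitCases(str, separators) {
  // Split the same input with each separator and keep the results side by side.
  return separators.map((re) => ({
    separator: String(re),
    fields: str.split(re),
  }));
}

const input = 'aaaabbbccbdddbbb';
const separators = [/b+/, /b*/, /(b+)/, /(b*)/];

for (const { separator, fields } of runSplitCases(input, separators)) {
  // JSON.stringify makes empty-string fields visible as "".
  console.log(separator, JSON.stringify(fields));
}
```

Printing through JSON.stringify is a deliberate choice here, since a plain array dump can hide the empty fields that the report is about.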
Comment 1•20 years ago
> in fact I'd expect an initial '' element as well
There shouldn't be one, per Section 15.5.4.14 of ECMA-262, which says:
In this case, separator does not match the empty substring at the beginning or
end of the input string.
For the rest, the difference between + and * wrt the end of the string seems
like a bug indeed...
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 2•20 years ago
From ECMA-262 Edition 3 15.5.4.14:

The value of separator may be an empty string, an empty regular expression, or a regular expression that can match an empty string. In this case, separator does not match the empty substring at the beginning or end of the input string, nor does it match the empty substring at the end of the previous separator match. (For example, if separator is the empty string, the string is split up into individual characters; the length of the result array equals the length of the string, and each substring contains one character.) If separator is a regular expression, only the first match at a given position of the this string is considered, even if backtracking could yield a non-empty-substring match at that position. (For example, "ab".split(/a*?/) evaluates to the array ["a","b"], while "ab".split(/a*/) evaluates to the array ["","b"].)

/be
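The examples in the quoted passage can be checked directly. The snippet below is a small sketch, with splitExamples as a hypothetical helper name; it covers only the empty-string separator and the two "ab" cases that the specification text itself gives.

```javascript
// Sketch of the examples from ECMA-262 Edition 3, section 15.5.4.14.
// splitExamples is a hypothetical helper name used only for illustration.
function splitExamples() {
  return {
    // Empty-string separator: one element per character.
    emptyString: 'abc'.split(''),
    // Lazy a*? matches the empty string first, so 'a' survives as a field.
    lazyStar: 'ab'.split(/a*?/),
    // Greedy a* consumes 'a' at position 0, leaving an empty first field.
    greedyStar: 'ab'.split(/a*/),
  };
}

console.log(JSON.stringify(splitExamples()));
// prints {"emptyString":["a","b","c"],"lazyStar":["a","b"],"greedyStar":["","b"]}
```

The greedy case shows the "only the first match at a given position" rule: /a*/ takes the non-empty match at position 0, so the separator itself is removed and an empty field precedes it.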
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → INVALID