Closed Bug 505083 Opened 15 years ago Closed 13 years ago

ecma_3_1/RegExp/regress-305064.js - Is ZERO WIDTH SPACE

Categories

(Core :: JavaScript Engine, defect)

Platform: x86, All
Type: defect
Priority: Not set
Severity: normal

RESOLVED INVALID

People

(Reporter: bc, Unassigned)

Details

(Keywords: regression, testcase)

ecma_3_1/RegExp/regress-305064.js fails with the JIT only.

Is ZERO WIDTH SPACE (category Cf) a space reason: Expected value 'true', Actual value 'false'

regression changeset: 30362:b837948c1daf
user: Luke Wagner <lw@mozilla.com>
date: Thu Jul 16 17:17:35 2009 -0700
summary: Bug 406271: add quantifier support for regexp->native compiler, r=dmandelin
Flags: in-testsuite+
Perhaps I am missing something in the spec, but 15.10.2.12 says that \s matches WhiteSpace (7.2) and LineTerminator (7.3).  Together these include the Unicode category Zs (space separator), but \u200B (zero width space) is not in any of them.  According to http://www.fileformat.info/info/unicode/char/200b/index.htm, the Java Character.isSpaceChar() and Character.isWhitespace() properties are both true for this character, which is probably why the interpreter returns true.

So either (1) I'm misunderstanding the spec, (2) the spec has an omission, or (3) the test and interpreter are wrong.  What do you think?
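
A minimal sketch of the reading above, for anyone who wants to check it in a shell. The helper name matchesWhitespaceClass and the sample lists are illustrative and not taken from the test file; the explicit list follows 7.2 and 7.3 (including U+FEFF, which assumes the ES5 text), and the Zs samples are only a subset of that category.

```javascript
// Hypothetical helper: does \s match the single character with this code unit?
function matchesWhitespaceClass(codeUnit) {
  return /\s/.test(String.fromCharCode(codeUnit));
}

// Characters named explicitly by WhiteSpace (7.2) and LineTerminator (7.3).
// U+FEFF is listed on the assumption that the ES5 wording applies.
const explicitlyListed = [
  0x0009, // TAB
  0x000B, // VERTICAL TAB
  0x000C, // FORM FEED
  0x0020, // SPACE
  0x00A0, // NO-BREAK SPACE
  0xFEFF, // BYTE ORDER MARK
  0x000A, // LINE FEED
  0x000D, // CARRIAGE RETURN
  0x2028, // LINE SEPARATOR
  0x2029, // PARAGRAPH SEPARATOR
];

// A few members of Unicode category Zs (not the full category).
const zsSamples = [0x1680, 0x2000, 0x200A, 0x202F, 0x205F, 0x3000];

// The character the test is about: ZERO WIDTH SPACE, category Cf.
const zeroWidthSpace = 0x200B;

console.log("listed characters match \\s:",
            explicitlyListed.every(c => matchesWhitespaceClass(c)));
console.log("Zs samples match \\s:",
            zsSamples.every(c => matchesWhitespaceClass(c)));
console.log("U+200B matches \\s:", matchesWhitespaceClass(zeroWidthSpace));
```

Under this reading the first two lines should report true and the last one false, which is the "Actual value" the test complains about.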
Unicode 5.1 section 6.2 supports Luke's understanding of the spec:

--
One exceptional “space” character is U+200B zero width space. This character, although called a “space” in its name, does not actually have any width or visible glyph in display. It functions primarily to indicate word boundaries in writing systems that do not actually use orthographic spaces to separate words in text. It is given the General Category [gc=Cf] and is treated as a format control character, rather than as a space character, in implementations. Further discussion of U+200B zero width space, as well as other zero-width characters with special properties, can be found in Section 16.2, Layout Controls.
--

But Python 3 does what the regression test expects, treating U+200B as whitespace. Any Web compatibility or Unicode experts here?
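
The General Category quoted above can also be checked from script, assuming an engine that supports Unicode property escapes (\p{...} with the u flag, which is much newer than the engines discussed in this bug). The function name generalCategoryOf is hypothetical and it only distinguishes the two categories relevant here.

```javascript
// Hypothetical helper: classify a single character as Zs, Cf, or "other".
// Relies on Unicode property escapes, so it needs a reasonably modern engine.
function generalCategoryOf(ch) {
  if (/^\p{Zs}$/u.test(ch)) return "Zs"; // space separator
  if (/^\p{Cf}$/u.test(ch)) return "Cf"; // format control
  return "other";
}

console.log("U+200B:", generalCategoryOf("\u200B")); // ZERO WIDTH SPACE
console.log("U+00A0:", generalCategoryOf("\u00A0")); // NO-BREAK SPACE
console.log("U+2003:", generalCategoryOf("\u2003")); // EM SPACE
```

This only reports what the engine's own Unicode tables say, so it is a cross-check on the category rather than on how \s is specified.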
Not a TM bug. Still occurs with a current JS shell. FWIW, d8 shows the same behavior.
Summary: TM: ecma_3_1/RegExp/regress-305064.js - Is ZERO WIDTH SPACE → ecma_3_1/RegExp/regress-305064.js - Is ZERO WIDTH SPACE
Everyone who's commented here seems to agree the test is buggy, and I will independently continue that trend.  Moreover, at least one other engine agrees with SpiderMonkey and with our interpretation of the spec.  I think that's enough to call this bug (and test) invalid.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → INVALID