Closed Bug 514808 Opened 15 years ago Closed 13 years ago

Regex whitespace character class deficiencies

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1

People

(Reporter: kangax, Assigned: wesongathedeveloper)

References

()

Details

Attachments

(2 files)

As per ES3 (see 15.10.2.12), /\s/ should match both WhiteSpace (7.2) and LineTerminator (7.3). WhiteSpace is defined to include characters from Unicode "Zs" category (besides "standard" - x09, x0B, x0C, x20 and xA0). Firefox (contrary to, say, latest WebKit and Chrome) fails some of these characters. See <http://thinkweb2.com/projects/prototype/whitespace-deviations/> for a full results table.

A test is available online at: <http://yura.thinkweb2.com/rcornford_whitespace_test.html>

Thank you.
It's also worth mentioning that `String.prototype.trim` seems to rely on the same character set that is matched by (faulty) /\s/. I suppose, fixing /\s/ will also make `trim` fully compliant.
This bug is fixed in trunk. Should it be marked as wanted in older branches?
Which patch fixed this?
From what I gather, it could only be the patch in bug 406271. See http://hg.mozilla.org/mozilla-central/rev/b837948c1daf

This is interesting because the WhiteSpaceRanges[] constant array at http://mxr.mozilla.org/mozilla-central/source/js/src/jsregexp.cpp?&mark=3548-3571#3552 appears to be incomplete even though all the tests at <http://yura.thinkweb2.com/rcornford_whitespace_test.html> pass.
Ignore comment 2, it's broken in trunk if javascript.options.jit.content = false
Attached patch PatchSplinter Review
Assignee: general → wesongathedeveloper
Status: NEW → ASSIGNED
Attachment #400921 - Flags: review?(brendan)
Comment on attachment 400921 [details] [diff] [review]
Patch

I'm traveling and hoping to bounce this review to Dave or Luke. Dave, if you need bounce, then Luke is "it". Thanks,

/be
Attachment #400921 - Flags: review?(brendan) → review?(dmandelin)
Comment on attachment 400921 [details] [diff] [review]
Patch

Looks good. r+ with the test case added that I just attached.
Attachment #400921 - Flags: review?(dmandelin) → review+
Keywords: checkin-needed
Whiteboard: [c-n: tracemonkey]
Status: ASSIGNED → NEW
One of the shorter stays in the tracemonkey checkin-needed queue at only three months, but still: land it in tracemonkey or admit you can't be bothered and wontfix it.
(In reply to comment #10)
> One of the shorter stays in the tracemonkey checkin-needed queue at only three
> months, but still: land it in tracemonkey or admit you can't be bothered and
> wontfix it.

tone not welcome. checkin-needed noted.
http://hg.mozilla.org/tracemonkey/rev/8a44551a28c1
http://hg.mozilla.org/tracemonkey/rev/787d5fbc1f80
Status: NEW → ASSIGNED
Keywords: checkin-needed
OS: Mac OS X → All
Hardware: x86 → All
Whiteboard: [c-n: tracemonkey] → fixed-in-tracemonkey
Target Milestone: --- → mozilla1.9.3a1
Version: 1.9.1 Branch → Trunk
I backed this out to fix an orange tinderbox.
Whiteboard: fixed-in-tracemonkey
(In reply to comment #13)
> I backed this out to fix an orange tinderbox.

How do I find out which tests failed with this patch?
Hey there, it may be too late to find the logs at this point. I can send it to the tryserver[0] tomorrow and see what we get. I'll post the logs here.

[0] https://wiki.mozilla.org/Build:TryServer
(In reply to comment #15)
> Hey there, it may be too late to find the logs at this point. I can send it to
> the tryserver[0] tomorrow and see what we get. I'll post the logs here.
> 
> [0] https://wiki.mozilla.org/Build:TryServer

Would it be possible for you to do the same for bug 529386? (not sure what the protocol is for requesting these logs seeing as I don't have commit access).
Robert, any word on this bug?
This is the cause of most of the failures on http://samples.msdn.microsoft.com/ietestcenter/Javascript/ES15.5.html (one's really a test bug, rest are about this).  The test is basically this:

assertEq(("\n\r\u2028\u2029" +
          "\t\n\v\f\r \x85\xA0\u1680\u180E\u2000\u2001\u2002" +
          "\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A" +
          "\u2028\u2029\u202F\u205F\u3000\uFEFF").trim(),
         "");

That said, I'm not completely sure the test is right, because it seems to want U+0085 as a Unicode space, but at least the GNOME character map utility says it isn't.  There are other differences, too:

js> ("\t\n\v\f\r \x85\xA0\u1680\u180E\u2000\u2001\u2002\u2003" +
     "\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u2028\u2029" +
     "\u202F\u205F\u3000\uFEFF").split("").filter(function(s) {
       return s.trim() !== "";
     })
["\x85", "\u1680", "\u180E", "\u202F", "\u205F"]

U+205F is a space per the character map, so at least some of these are probably correct even if it turns out U+0085 is not.  Someone who actually knows this Unicode stuff should (ideally) investigate.

...and yet another bug forgotten.  How do we end up doing this?  :-(
U+0085 is from Cc category (Other, Control), not Zs (Separator, Space). The latter one is what ES5 considers a whitespace. 

It does look like a bug in a test.
Depends on: 652771
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: