Last Comment Bug 378738 - \d pattern matches characters other than the decimal digits 0-9 (ecma_3/RegExp/15.10.2.12.js)
: \d pattern matches characters other than the decimal digits 0-9 (ecma_3/RegEx...
Status: VERIFIED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- normal (vote)
: ---
Assigned To: Jeff Walden [:Waldo] (remove +bmo to email)
:
Mentors:
javascript: /\d/.test("\uFF11")
Depends on: 379088
Blocks:
  Show dependency treegraph
 
Reported: 2007-04-25 07:18 PDT by Martin Honnen
Modified: 2009-12-22 02:01 PST (History)
11 users (show)
bob: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Obvious fix (895 bytes, patch)
2007-04-25 10:02 PDT, Jeff Walden [:Waldo] (remove +bmo to email)
mrbkap: review+
Details | Diff | Splinter Review

Description Martin Honnen 2007-04-25 07:18:35 PDT
With Spidermonkey if you evaluate
  /\d/.test("\uFF11")
then it returns true but should return false as the ECMAScript specification clearly says:
  "15.10.2.12 CharacterClassEscape
The production CharacterClassEscape :: d evaluates by returning the ten-element set of characters containing the
characters 0 through 9 inclusive."

This is a bug in Spidermonkey (already present in Netscape 4 builds but still there in a Firefox trunk nightly). Other engines like Rhino or MS JScript return false.

Bug was reported in mozilla.dev.tech.javascript by Igor Tandetnik.
Comment 1 Simon Montagu :smontagu 2007-04-25 07:34:42 PDT
Oops, there is code in browser.js (by me, although Mano has blame) that depends on this bug: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/browser/base/content/browser.js&rev=1.778&mark=1434-1438#1421 checked in in bug 298004
Comment 2 Jeff Walden [:Waldo] (remove +bmo to email) 2007-04-25 10:02:42 PDT
Created attachment 262776 [details] [diff] [review]
Obvious fix

This doesn't fix the browser dependency on the bug; that can be a separate patch.
Comment 3 Brian Crowder 2007-04-25 10:14:05 PDT
Should we fix the REOP_NONDIGIT case as well?  And are we sure we shouldn't WONTFIX this?  Perhaps the spec is simply wrong here?
Comment 4 Brian Crowder 2007-04-25 10:15:41 PDT
It is worth noting probably that \d and \D in character _classes_ do the right thing.  So maybe we should fix this inconsistency.
Comment 5 Igor Tandetnik 2007-04-25 10:23:49 PDT
On WONTFIX: I feel it naturally follows from the principle of least surprise that, for every string s, if /^\d+$/.test(s) returns true then parseInt(s, 10) should return a value other than NaN (similarly for Number(s) ).

Currently this condition does not hold. Personally, I would be happy with either /\d/ changed to only match [0-9] (and \D to match [^0-9]), or parseInt changed to understand "fancy digits".
Comment 6 James Ross 2007-04-25 10:39:27 PDT
Please note that ChatZilla has a copy of the code in browser.js mentioned in comment 1, and other things might have copied it too.
Comment 7 Simon Montagu :smontagu 2007-04-25 10:40:44 PDT
(In reply to comment #5)
> On WONTFIX: I feel it naturally follows from the principle of least surprise
> that, for every string s, if /^\d+$/.test(s) returns true then parseInt(s, 10)
> should return a value other than NaN (similarly for Number(s) ).

I'm not sure I agree with this. Making parseInt handle digits other than in the ASCII range seems like a big can of worms. What should parseInt("१২੩") return? (that's a Devanagari 1, a Bengali 2, and a Gurmukhi 3).
Comment 8 Igor Tandetnik 2007-04-25 10:53:02 PDT
I'll be happy with it returning 123. The probability of such a situation arising in practice is very near zero, I believe. Since the string is linguistically meaningless, parsing it to 123 is as good an outcome as any other.

But again, I would be equally happy with \d restricted to [0-9] and "१২੩" failing to parse as a number. What I object to is "१২੩" looking like a number lexically, but then failing to parse. Or, rather, I object to "१२३" looking like a number but failing to parse (here, all three characters are Devanagari digits).
Comment 9 Jeff Walden [:Waldo] (remove +bmo to email) 2007-04-25 11:02:12 PDT
(In reply to comment #3)
> Should we fix the REOP_NONDIGIT case as well?

Dangit, that's twice I've done this recently (not catch an obvious variant of a bug); I'll fix that on checkin (probably this evening).

> And are we sure we shouldn't WONTFIX this?  Perhaps the spec is simply wrong
> here?

Everyone else gets it right.  Also, this contradicts most of ES3 by taking Unicode character classes into account; I'd leave it for ES4 if people want it.
Comment 10 Igor Tandetnik 2007-04-25 12:30:08 PDT
Consider also code like this:

http://lxr.mozilla.org/mozilla/source/calendar/resources/content/datetimepickers/datetimepickers.xml

It uses \d for parsing and Number for converting all over the place. It would seem the practice is quite common.
Comment 11 Jeff Walden [:Waldo] (remove +bmo to email) 2007-04-27 18:31:48 PDT
Fixed on trunk, with the browser bug filed as bug 379088.
Comment 12 Jeff Walden [:Waldo] (remove +bmo to email) 2007-04-27 18:31:56 PDT
Fixed on trunk, with the browser bug filed as bug 379088.
Comment 13 Jeff Walden [:Waldo] (remove +bmo to email) 2007-04-27 18:34:50 PDT
(I did add the fix for \D as well, to save people the trouble of checking.)
Comment 14 Bob Clary [:bc:] 2007-05-02 09:53:59 PDT
/cvsroot/mozilla/js/tests/ecma_3/RegExp/15.10.2.12.js,v  <--  15.10.2.12.js
initial revision: 1.1
Comment 15 Bob Clary [:bc:] 2007-05-05 15:22:47 PDT
verified fixed 1.9.0 linux/mac* shell.
Comment 16 WADA 2007-09-25 15:01:44 PDT
Following document seems to say about JS 1.5 RegExp spec based on test result with Mozilla family on which patch for bug 298004 is applied.
http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:RegExp
Please add description about "JS 1.9 backed out inappropriate spec alteration by bug 298004" in "New in JavaScript 1.9" of following page, when it'll be added.
  http://developer.mozilla.org/en/docs/JavaScript 

Note You need to log in before you can comment on or make changes to this bug.