With Spidermonkey if you evaluate
then it returns true but should return false as the ECMAScript specification clearly says:
The production CharacterClassEscape :: d evaluates by returning the ten-element set of characters containing the
characters 0 through 9 inclusive."
This is a bug in Spidermonkey (already present in Netscape 4 builds but still there in a Firefox trunk nightly). Other engines like Rhino or MS JScript return false.
Oops, there is code in browser.js (by me, although Mano has blame) that depends on this bug: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/browser/base/content/browser.js&rev=1.778&mark=1434-1438#1421 checked in in bug 298004
Created attachment 262776 [details] [diff] [review]
This doesn't fix the browser dependency on the bug; that can be a separate patch.
Should we fix the REOP_NONDIGIT case as well? And are we sure we shouldn't WONTFIX this? Perhaps the spec is simply wrong here?
It is worth noting probably that \d and \D in character _classes_ do the right thing. So maybe we should fix this inconsistency.
On WONTFIX: I feel it naturally follows from the principle of least surprise that, for every string s, if /^\d+$/.test(s) returns true then parseInt(s, 10) should return a value other than NaN (similarly for Number(s) ).
Currently this condition does not hold. Personally, I would be happy with either /\d/ changed to only match [0-9] (and \D to match [^0-9]), or parseInt changed to understand "fancy digits".
Please note that ChatZilla has a copy of the code in browser.js mentioned in comment 1, and other things might have copied it too.
(In reply to comment #5)
> On WONTFIX: I feel it naturally follows from the principle of least surprise
> that, for every string s, if /^\d+$/.test(s) returns true then parseInt(s, 10)
> should return a value other than NaN (similarly for Number(s) ).
I'm not sure I agree with this. Making parseInt handle digits other than in the ASCII range seems like a big can of worms. What should parseInt("१২੩") return? (that's a Devanagari 1, a Bengali 2, and a Gurmukhi 3).
I'll be happy with it returning 123. The probability of such a situation arising in practice is very near zero, I believe. Since the string is linguistically meaningless, parsing it to 123 is as good an outcome as any other.
But again, I would be equally happy with \d restricted to [0-9] and "१২੩" failing to parse as a number. What I object to is "१২੩" looking like a number lexically, but then failing to parse. Or, rather, I object to "१२३" looking like a number but failing to parse (here, all three characters are Devanagari digits).
(In reply to comment #3)
> Should we fix the REOP_NONDIGIT case as well?
Dangit, that's twice I've done this recently (not catch an obvious variant of a bug); I'll fix that on checkin (probably this evening).
> And are we sure we shouldn't WONTFIX this? Perhaps the spec is simply wrong
Everyone else gets it right. Also, this contradicts most of ES3 by taking Unicode character classes into account; I'd leave it for ES4 if people want it.
Consider also code like this:
It uses \d for parsing and Number for converting all over the place. It would seem the practice is quite common.
Fixed on trunk, with the browser bug filed as bug 379088.
(I did add the fix for \D as well, to save people the trouble of checking.)
/cvsroot/mozilla/js/tests/ecma_3/RegExp/22.214.171.124.js,v <-- 126.96.36.199.js
initial revision: 1.1
verified fixed 1.9.0 linux/mac* shell.
Following document seems to say about JS 1.5 RegExp spec based on test result with Mozilla family on which patch for bug 298004 is applied.
I added description following documents. http://developer.mozilla.org/en/docs/Firefox_3_for_developers#Notable_bugs_fixed