\d pattern matches characters other than the decimal digits 0-9 (ecma_3/RegExp/15.10.2.12.js)

VERIFIED FIXED

Status

()

Core
JavaScript Engine
VERIFIED FIXED
10 years ago
8 years ago

People

(Reporter: Martin Honnen, Assigned: Waldo)

Tracking

Trunk
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(1 attachment)

(Reporter)

Description

10 years ago
With Spidermonkey if you evaluate
  /\d/.test("\uFF11")
then it returns true but should return false as the ECMAScript specification clearly says:
  "15.10.2.12 CharacterClassEscape
The production CharacterClassEscape :: d evaluates by returning the ten-element set of characters containing the
characters 0 through 9 inclusive."

This is a bug in Spidermonkey (already present in Netscape 4 builds but still there in a Firefox trunk nightly). Other engines like Rhino or MS JScript return false.

Bug was reported in mozilla.dev.tech.javascript by Igor Tandetnik.
Oops, there is code in browser.js (by me, although Mano has blame) that depends on this bug: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/browser/base/content/browser.js&rev=1.778&mark=1434-1438#1421 checked in in bug 298004
Created attachment 262776 [details] [diff] [review]
Obvious fix

This doesn't fix the browser dependency on the bug; that can be a separate patch.
Assignee: general → jwalden+bmo
Status: NEW → ASSIGNED
Attachment #262776 - Flags: review?(mrbkap)

Updated

10 years ago
Attachment #262776 - Flags: review?(mrbkap) → review+

Comment 3

10 years ago
Should we fix the REOP_NONDIGIT case as well?  And are we sure we shouldn't WONTFIX this?  Perhaps the spec is simply wrong here?

Comment 4

10 years ago
It is worth noting probably that \d and \D in character _classes_ do the right thing.  So maybe we should fix this inconsistency.

Comment 5

10 years ago
On WONTFIX: I feel it naturally follows from the principle of least surprise that, for every string s, if /^\d+$/.test(s) returns true then parseInt(s, 10) should return a value other than NaN (similarly for Number(s) ).

Currently this condition does not hold. Personally, I would be happy with either /\d/ changed to only match [0-9] (and \D to match [^0-9]), or parseInt changed to understand "fancy digits".

Comment 6

10 years ago
Please note that ChatZilla has a copy of the code in browser.js mentioned in comment 1, and other things might have copied it too.
(In reply to comment #5)
> On WONTFIX: I feel it naturally follows from the principle of least surprise
> that, for every string s, if /^\d+$/.test(s) returns true then parseInt(s, 10)
> should return a value other than NaN (similarly for Number(s) ).

I'm not sure I agree with this. Making parseInt handle digits other than in the ASCII range seems like a big can of worms. What should parseInt("१২੩") return? (that's a Devanagari 1, a Bengali 2, and a Gurmukhi 3).

Comment 8

10 years ago
I'll be happy with it returning 123. The probability of such a situation arising in practice is very near zero, I believe. Since the string is linguistically meaningless, parsing it to 123 is as good an outcome as any other.

But again, I would be equally happy with \d restricted to [0-9] and "१২੩" failing to parse as a number. What I object to is "१২੩" looking like a number lexically, but then failing to parse. Or, rather, I object to "१२३" looking like a number but failing to parse (here, all three characters are Devanagari digits).
(In reply to comment #3)
> Should we fix the REOP_NONDIGIT case as well?

Dangit, that's twice I've done this recently (not catch an obvious variant of a bug); I'll fix that on checkin (probably this evening).

> And are we sure we shouldn't WONTFIX this?  Perhaps the spec is simply wrong
> here?

Everyone else gets it right.  Also, this contradicts most of ES3 by taking Unicode character classes into account; I'd leave it for ES4 if people want it.

Comment 10

10 years ago
Consider also code like this:

http://lxr.mozilla.org/mozilla/source/calendar/resources/content/datetimepickers/datetimepickers.xml

It uses \d for parsing and Number for converting all over the place. It would seem the practice is quite common.
Blocks: 379088
Fixed on trunk, with the browser bug filed as bug 379088.
Fixed on trunk, with the browser bug filed as bug 379088.
Status: ASSIGNED → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
(I did add the fix for \D as well, to save people the trouble of checking.)

Comment 14

10 years ago
/cvsroot/mozilla/js/tests/ecma_3/RegExp/15.10.2.12.js,v  <--  15.10.2.12.js
initial revision: 1.1
Flags: in-testsuite+

Comment 15

10 years ago
verified fixed 1.9.0 linux/mac* shell.
Status: RESOLVED → VERIFIED
Summary: \d pattern matches characters other than the decimal digits 0-9 → \d pattern matches characters other than the decimal digits 0-9 (ecma_3/RegExp/15.10.2.12.js)
Following document seems to say about JS 1.5 RegExp spec based on test result with Mozilla family on which patch for bug 298004 is applied.
http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:RegExp
Please add description about "JS 1.9 backed out inappropriate spec alteration by bug 298004" in "New in JavaScript 1.9" of following page, when it'll be added.
  http://developer.mozilla.org/en/docs/JavaScript 

Comment 17

10 years ago
I added description following documents. http://developer.mozilla.org/en/docs/Firefox_3_for_developers#Notable_bugs_fixed
http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:RegExp

OS: Windows XP → All
Hardware: x86 → All
No longer blocks: 379088
Depends on: 379088
You need to log in before you can comment on or make changes to this bug.