Closed Bug 577536 Opened 14 years ago Closed 14 years ago

parseInt() uses radix 8 if the string starts with '0' but not 0x or 0X

Categories

(Core :: JavaScript Engine, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla2.0b3

People

(Reporter: linuxed7, Assigned: Waldo)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.6) Gecko/20100628 Ubuntu/10.04 (lucid) Firefox/3.6.6
Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.6) Gecko/20100628 Ubuntu/10.04 (lucid) Firefox/3.6.6

The parseInt method doesn't successfully parse strings like '08' and '09'.  However it does seem to correctly parse the strings '01' through '07'.

Example:
alert(parseInt('01')); //returns 1
alert(parseInt('07')); //returns 7
alert(parseInt('08')); //returns 0, should return 8
alert(parseInt('09')); //returns 0, should return 9

Reproducible: Always
If the number starts with '0', radix 8 is used (octal). Thus 08 is an invalid number.

However, I can't see the 5th edition mention radix 8, so maybe for spec compliance we should drop the automatic radix 8 inference here.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: parseInt() returns 0 when parsing the strings '08' and '09' but it returns the correct integer when parsing '01' through '07' → parseInt() uses radix 8 if the string starts with '0' but not 0x or 0X
Severity: normal → enhancement
OS: Linux → All
Priority: -- → P2
Hardware: x86_64 → All
Blocks: es5
Even Internet Explorer behaves like this ...

(In reply to comment #1)

> However, I can't see the 5th edition mention radix 8, so maybe for spec
> compliance we should drop the automatic radix 8 inference here.

In the O'Reilly book : Section 3.12 and in chapter 24.
Safari supports octal, too.
7.8.3 Numeric literals: A conforming implementation, when processing strict mode code (see 10.1.1), must not extend the syntax of
NumericLiteral to include OctalIntegerLiteral as described in B.1.1.

However:

Annex E: 15.1.2.2: "The specification of the function parseInt no longer allows implementations to treat Strings beginning with a 0 character as octal values."

Thus it seems parseInt should not allow octals in either mode, but literals in mathematical expressions should. Outside of strict mode, that is!

Question is, will this break web sites? Are there web sites that use octal, but do not specify the radix explicitly as a parameter? (Precious few naturally, but still...)
Its a bit weird that ES5 removed octal but everyone implements it. Maybe our resident standardistas want to weigh in?
In strict mode, octal is forbidden per ES5 -- we allow \0 in strings, though. See bug 514559.

In non-strict mode, this is a snipe hunt. Don't break the web, don't waste time on something that stands (last time we looked) to lose market share.

/be
Assignee: general → jwalden+bmo
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla2.0b3
Attached patch Patch and testsSplinter Review
The current organization of code to implement number parsing doesn't have much relation to the spec.  It's hard to compare against the spec's text for accuracy, and it forces base-sniffing behavior (leading zero, hex) and leading-whitespace checking on all callers.  I've therefore gone and refactored it a bit to be more verifiably correct, more understandably organized, and less magical in function.  The correspondence isn't exact -- sharing implementation between the fastnative parseInt and the traceablenative parseInt(string) methods precludes it -- but with comments the correlation should be apparent to the thinking reader.

And, of course, I fixed the original concern of this bug: making implicit octal-acceptance conditional on non-strictness of the caller.  (This relied upon [and I believe validates] the aforementioned cleanup, if you happened to question its immediate value and importance.  It would have been much harder to fix this [let alone cleanly] while preserving something like the js_strtointeger does-everything-always approach.)

This depends on bug 581067 landing first, because I wrote the test with direct reference to spec and thus tripped over BOMs not being considered whitespace in ES3 but are in ES5.
Attachment #459482 - Flags: review?(jorendorff)
Good cleanups. The code needed it -- rogerl from netscape.com in the '90s was last to really own this. Looks good (nit: I saw an overlong line even with 100 column limit).

/be
Just a few preliminary comments. I'm not done yet.

In jsnum.cpp:
>+ComputeAccurateBinaryBaseInteger(JSContext *cx, const jschar *start, const jschar *end, int base)
>+{
>+    BinaryDigitReader bdr(base, start, end);
>+
>+    /* Skip leading zeroes. */
>+    intN bit;
>+    do {
>+        bit = bdr.nextDigit();
>+    } while (bit == 0);
>+
>+    if (bit == 0)
>+        return 0.0;

You mean if (bit == -1), right?

>+GetPrefixInteger(JSContext *cx, const jschar *start, const jschar *end, int base,
>+                 const jschar **endp, jsdouble *dp)
>+{
>+    JS_ASSERT(start <= end);
>+    JS_ASSERT(2 <= base && base <= 36);
>+
>+    const jschar *s = start;
>+    jsdouble d = 0.0;
>+    do {
>+        int digit;
>+        jschar c = *s;

This dereferences s, but the range might be empty (according to the assertion; the comments in jsnum.h don't really say). Maybe move the condition to the top of the loop.

>+    /* If we haven't reached the limit of integer precision, we're done. */
>+    if (d < 9007199254740992.0)
>+        return true;

Give this constant a name, maybe? It might be nice to point out in a comment
that this is 2^53 (and why).
In jsnum.h, StringToNumberType:
>     /* ECMA doesn't allow signed hex numbers (bug 273467). */
>     if (end - bp >= 2 && bp[0] == '0' && (bp[1] == 'x' || bp[1] == 'X')) {
>         /* Looks like a hex number. */
>-        if (!js_strtointeger(cx, bp, end, &ep, 16, &d) ||
>-            js_SkipWhiteSpace(ep, end) != end) {
>+        const jschar *bp, *end, *endptr;
>+        str->getCharsAndEnd(bp, end);

These two lines look wrong. I think you just want to declare endptr. It causes
this to produce the wrong answer in this case:

    assertEq(+"  0x4", 4);
In js/src/tests/ecma_5/Global/parseInt-01.js:
>+ * 4. If S is not empty and the first character of S is a minus sign -, let
>+ *    sign be −1.
Looks like UTF-8 botched by bugzilla. If that would be adequately represented
by an ASCII hyphen, let's change it.
(In reply to comment #9)
> You mean if (bit == -1), right?

Actually, since we have the fits-in-double-as-exact-integer check, bit's always going to be 1 here -- now asserting that in my local version of the patch.


> This dereferences s, but the range might be empty (according to the assertion;
> the comments in jsnum.h don't really say). Maybe move the condition to the top
> of the loop.

The range should never be empty, assertion was over-generous and comment under-precise: added "non-empty" before "range [start, end)" in the declaration comment.


> Give this constant a name, maybe? It might be nice to point out in a comment
> that this is 2^53 (and why).

Gave it a name, changed its initializer to uint64(1) << 53 to make the value absolutely clear.  The corresponding comment should make clear its derivation:

+/*
+ * The largest positive integer such that all positive integers less than it
+ * may be precisely represented using the IEEE-754 double-precision format.
+ */


(In reply to comment #10)
> These two lines look wrong.

Yikes.  Fixed, and added this to js/src/tests/ecma/TypeConversion/9.3.1-1.js:

+var ws = ["",
+          " ",
+          "  ",
+          "   "];
+
+for (var i = 0, sz = ws.length; i < sz; i++)
+{
+  var start = ws[i];
+  for (var j = 0; j < sz; j++)
+  {
+    var end = ws[j];
+    new TestCase( SECTION,  "Number( '" + start + "' +  '0xA' )",       10,    Number( start+'0xA') );
+
+    new TestCase( SECTION,  "Number( '0xA' + '" + end + "' )",        10,    Number( '0xA'+end) );
+
+    new TestCase( SECTION,  "Number( '" + start + "'  + '0xA' + '" + end + "' )",         10,    Number( start +'0xA'+end ) );
+  }
+}

The bp + 2 technically exactly skips the 0x at start, so the varying-whitespace testing isn't quite necessary, but it gives me a little more peace of mind that the + 2 won't somehow jump over leading badness or something weird.


(In reply to comment #11)
> Looks like UTF-8 botched by bugzilla. If that would be adequately represented
> by an ASCII hyphen, let's change it.

There's more UTF-8 that will also be botched, can't just change a hyphen here or there and declare victory -- need to make sure to set the patch charset correctly when uploading.

Debated uploading the updated patch, decided not to to ease review of the existing one -- I can upload it at end if you desire, or just use the revision URL post-push.
(In reply to comment #12)
> The range should never be empty, assertion was over-generous and comment
> under-precise: added "non-empty" before "range [start, end)" in the declaration
> comment.

...or not, as my tests would have quickly shown me had I run them before commenting.  Fixed locally by moving the condition to the top as you suggested.
Comment on attachment 459482 [details] [diff] [review]
Patch and tests

I found bug 582643 (`0x` parsed as 0) while reviewing this code.

In ParseInt:
>     jsdouble d;
>-
>-    str->getCharsAndEnd(bp, end);
>-    if (!js_strtointeger(cx, bp, end, &ep, 0, &d) || ep == bp)
>+    if (!ParseIntStringHelper(cx, start, end, 10, true, &d))
>         return js_NaN;

So, the way this ignores OOM is kind of gross. Preexisting, but please comment
the grossness.

In ecma_5/Global/parseInt-01.js:
>+/*
>+6.  Let R = ToInt32(radix).
>+*/
>+
>+upvar = "";
>+str =
>+  { toString: function() { if (!upvar) upvar = "string"; return "42"; } };
>+radix =
>+  { toString: function() { if (!upvar) upvar = "radix"; return "10"; } };
>+
>+assertEq(parseInt(str, radix), 42);
>+assertEq(upvar, "string");

Might as well add a few more here, since this step is so awesome:

  assertEq(parseInt("123", null), 123);
  assertEq(parseInt("123", undefined), 123);
  assertEq(parseInt("123", NaN), 123);
  assertEq(parseInt("123", -0), 123);
  assertEq(parseInt("10", 72057594037927950), 16);
  assertEq(parseInt("10", -4294967292), 4);
  assertEq(parseInt("0x10", 1e308), 16);
  assertEq(parseInt("10", 1e308), 10);

In js1_8_5/extensions/parseInt-octal.js:
>+assertEq(parseInt("08"), 0);
>+assertEq(parseInt("09"), 0);
>+
>+function strictParseInt(s) { "use strict"; return parseInt(s); }
>+
>+assertEq(strictParseInt("08"), 8);
>+assertEq(strictParseInt("09"), 9);

Could you add a trace-test for this too?
Attachment #459482 - Flags: review?(jorendorff) → review+
http://hg.mozilla.org/tracemonkey/rev/5ec5ae37827f
Whiteboard: fixed-in-tracemonkey
It seems the if-not-strict-then-parse-as-octal check is in slightly the wrong place: if an explicit radix 16 is passed, then we attempt to infer octal-ness, which is undesirable and contrary to what ES3 permitted.  I believe this reorganization cures the problem, but I'm still waiting for JS testsuite results to be absolutely certain -- I'd thought I stared at the current code hard enough to have convinced myself of this already, but a test (Linux-only, rectified in this patch) exposed my folly.
Attachment #461627 - Flags: review?(jorendorff)
Attached patch Better followupSplinter Review
A trace test (of all things) was failing with an assertion because ParseInt was marked as DOUBLE and not DOUBLE_FAIL, parseInt("0") was needlessly leaving trace to check for in-strict-mode, and maybe a tweak or two more -- check patch interdiff if interested, mind no longer remembers everything tweaked.
Attachment #461627 - Attachment is obsolete: true
Attachment #461685 - Flags: review?(jorendorff)
Attachment #461627 - Flags: review?(jorendorff)
Comment on attachment 461685 [details] [diff] [review]
Better followup

Right, much better. I don't know why I didn't catch this on review. Shame on me.

Now that ParseInt is DOUBLE_FAIL, I think it's easy to fix bug 583126:

  if (!ParseIntStringHelper(...)) {
      SetBuiltinError(cx);
      return js_NaN;
  }
Attachment #461685 - Flags: review?(jorendorff) → review+
http://hg.mozilla.org/tracemonkey/rev/0d2e095b2eda

I'll push the fix for bug 583126 as a followup, in a bit.
This was pushed to m-c, so closing to get off the ES5 dependency tree radar:

http://hg.mozilla.org/mozilla-central/rev/5ec5ae37827f
http://hg.mozilla.org/mozilla-central/rev/0d2e095b2eda

but in bug 583925 we reverted the semantic change to parse as decimal only in strict mode.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: