577536 - parseInt() uses radix 8 if the string starts with '0' but not 0x or 0X

Reporter

Description

•

15 years ago

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.6) Gecko/20100628 Ubuntu/10.04 (lucid) Firefox/3.6.6 Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.6) Gecko/20100628 Ubuntu/10.04 (lucid) Firefox/3.6.6 The parseInt method doesn't successfully parse strings like '08' and '09'. However it does seem to correctly parse the strings '01' through '07'. Example: alert(parseInt('01')); //returns 1 alert(parseInt('07')); //returns 7 alert(parseInt('08')); //returns 0, should return 8 alert(parseInt('09')); //returns 0, should return 9 Reproducible: Always

Andreas Gal :gal

Comment 1

•

15 years ago

If the number starts with '0', radix 8 is used (octal). Thus 08 is an invalid number. However, I can't see the 5th edition mention radix 8, so maybe for spec compliance we should drop the automatic radix 8 inference here.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Andreas Gal :gal

Updated

•

15 years ago

Summary: parseInt() returns 0 when parsing the strings '08' and '09' but it returns the correct integer when parsing '01' through '07' → parseInt() uses radix 8 if the string starts with '0' but not 0x or 0X

Andreas Gal :gal

Updated

•

15 years ago

Severity: normal → enhancement

OS: Linux → All

Priority: -- → P2

Hardware: x86_64 → All

Andreas Gal :gal

Updated

•

15 years ago

Blocks: es5

Jo Hermans

Comment 2

•

15 years ago

Even Internet Explorer behaves like this ... (In reply to comment #1) > However, I can't see the 5th edition mention radix 8, so maybe for spec > compliance we should drop the automatic radix 8 inference here. In the O'Reilly book : Section 3.12 and in chapter 24.

Andreas Gal :gal

Comment 3

•

15 years ago

Safari supports octal, too.

Lars Gunther

Comment 4

•

15 years ago

7.8.3 Numeric literals: A conforming implementation, when processing strict mode code (see 10.1.1), must not extend the syntax of NumericLiteral to include OctalIntegerLiteral as described in B.1.1. However: Annex E: 15.1.2.2: "The specification of the function parseInt no longer allows implementations to treat Strings beginning with a 0 character as octal values." Thus it seems parseInt should not allow octals in either mode, but literals in mathematical expressions should. Outside of strict mode, that is! Question is, will this break web sites? Are there web sites that use octal, but do not specify the radix explicitly as a parameter? (Precious few naturally, but still...)

Andreas Gal :gal

Comment 5

•

15 years ago

Its a bit weird that ES5 removed octal but everyone implements it. Maybe our resident standardistas want to weigh in?

Brendan Eich [:brendan]

Comment 6

•

15 years ago

In strict mode, octal is forbidden per ES5 -- we allow \0 in strings, though. See bug 514559. In non-strict mode, this is a snipe hunt. Don't break the web, don't waste time on something that stands (last time we looked) to lose market share. /be

Jeff Walden [:Waldo]

Assignee

Updated

•

15 years ago

Assignee: general → jwalden+bmo

Status: NEW → ASSIGNED

Target Milestone: --- → mozilla2.0b3

Jeff Walden [:Waldo]

Assignee

Comment 7

•

15 years ago

Attached patch Patch and tests — Details — Splinter Review

The current organization of code to implement number parsing doesn't have much relation to the spec. It's hard to compare against the spec's text for accuracy, and it forces base-sniffing behavior (leading zero, hex) and leading-whitespace checking on all callers. I've therefore gone and refactored it a bit to be more verifiably correct, more understandably organized, and less magical in function. The correspondence isn't exact -- sharing implementation between the fastnative parseInt and the traceablenative parseInt(string) methods precludes it -- but with comments the correlation should be apparent to the thinking reader. And, of course, I fixed the original concern of this bug: making implicit octal-acceptance conditional on non-strictness of the caller. (This relied upon [and I believe validates] the aforementioned cleanup, if you happened to question its immediate value and importance. It would have been much harder to fix this [let alone cleanly] while preserving something like the js_strtointeger does-everything-always approach.) This depends on bug 581067 landing first, because I wrote the test with direct reference to spec and thus tripped over BOMs not being considered whitespace in ES3 but are in ES5.

Attachment #459482 - Flags: review?(jorendorff)

Brendan Eich [:brendan]

Comment 8

•

15 years ago

Good cleanups. The code needed it -- rogerl from netscape.com in the '90s was last to really own this. Looks good (nit: I saw an overlong line even with 100 column limit). /be

Jason Orendorff [:jorendorff]

Comment 9

•

15 years ago

Just a few preliminary comments. I'm not done yet. In jsnum.cpp: >+ComputeAccurateBinaryBaseInteger(JSContext *cx, const jschar *start, const jschar *end, int base) >+{ >+ BinaryDigitReader bdr(base, start, end); >+ >+ /* Skip leading zeroes. */ >+ intN bit; >+ do { >+ bit = bdr.nextDigit(); >+ } while (bit == 0); >+ >+ if (bit == 0) >+ return 0.0; You mean if (bit == -1), right? >+GetPrefixInteger(JSContext *cx, const jschar *start, const jschar *end, int base, >+ const jschar **endp, jsdouble *dp) >+{ >+ JS_ASSERT(start <= end); >+ JS_ASSERT(2 <= base && base <= 36); >+ >+ const jschar *s = start; >+ jsdouble d = 0.0; >+ do { >+ int digit; >+ jschar c = *s; This dereferences s, but the range might be empty (according to the assertion; the comments in jsnum.h don't really say). Maybe move the condition to the top of the loop. >+ /* If we haven't reached the limit of integer precision, we're done. */ >+ if (d < 9007199254740992.0) >+ return true; Give this constant a name, maybe? It might be nice to point out in a comment that this is 2^53 (and why).

Jason Orendorff [:jorendorff]

Comment 10

•

15 years ago

In jsnum.h, StringToNumberType: > /* ECMA doesn't allow signed hex numbers (bug 273467). */ > if (end - bp >= 2 && bp[0] == '0' && (bp[1] == 'x' || bp[1] == 'X')) { > /* Looks like a hex number. */ >- if (!js_strtointeger(cx, bp, end, &ep, 16, &d) || >- js_SkipWhiteSpace(ep, end) != end) { >+ const jschar *bp, *end, *endptr; >+ str->getCharsAndEnd(bp, end); These two lines look wrong. I think you just want to declare endptr. It causes this to produce the wrong answer in this case: assertEq(+" 0x4", 4);

Jason Orendorff [:jorendorff]

Comment 11

•

15 years ago

In js/src/tests/ecma_5/Global/parseInt-01.js: >+ * 4. If S is not empty and the first character of S is a minus sign -, let >+ * sign be âˆ’1. Looks like UTF-8 botched by bugzilla. If that would be adequately represented by an ASCII hyphen, let's change it.

Jeff Walden [:Waldo]

Assignee

Comment 12

•

15 years ago

(In reply to comment #9) > You mean if (bit == -1), right? Actually, since we have the fits-in-double-as-exact-integer check, bit's always going to be 1 here -- now asserting that in my local version of the patch. > This dereferences s, but the range might be empty (according to the assertion; > the comments in jsnum.h don't really say). Maybe move the condition to the top > of the loop. The range should never be empty, assertion was over-generous and comment under-precise: added "non-empty" before "range [start, end)" in the declaration comment. > Give this constant a name, maybe? It might be nice to point out in a comment > that this is 2^53 (and why). Gave it a name, changed its initializer to uint64(1) << 53 to make the value absolutely clear. The corresponding comment should make clear its derivation: +/* + * The largest positive integer such that all positive integers less than it + * may be precisely represented using the IEEE-754 double-precision format. + */ (In reply to comment #10) > These two lines look wrong. Yikes. Fixed, and added this to js/src/tests/ecma/TypeConversion/9.3.1-1.js: +var ws = ["", + " ", + " ", + " "]; + +for (var i = 0, sz = ws.length; i < sz; i++) +{ + var start = ws[i]; + for (var j = 0; j < sz; j++) + { + var end = ws[j]; + new TestCase( SECTION, "Number( '" + start + "' + '0xA' )", 10, Number( start+'0xA') ); + + new TestCase( SECTION, "Number( '0xA' + '" + end + "' )", 10, Number( '0xA'+end) ); + + new TestCase( SECTION, "Number( '" + start + "' + '0xA' + '" + end + "' )", 10, Number( start +'0xA'+end ) ); + } +} The bp + 2 technically exactly skips the 0x at start, so the varying-whitespace testing isn't quite necessary, but it gives me a little more peace of mind that the + 2 won't somehow jump over leading badness or something weird. (In reply to comment #11) > Looks like UTF-8 botched by bugzilla. If that would be adequately represented > by an ASCII hyphen, let's change it. There's more UTF-8 that will also be botched, can't just change a hyphen here or there and declare victory -- need to make sure to set the patch charset correctly when uploading. Debated uploading the updated patch, decided not to to ease review of the existing one -- I can upload it at end if you desire, or just use the revision URL post-push.

Jeff Walden [:Waldo]

Assignee

Comment 13

•

15 years ago

(In reply to comment #12) > The range should never be empty, assertion was over-generous and comment > under-precise: added "non-empty" before "range [start, end)" in the declaration > comment. ...or not, as my tests would have quickly shown me had I run them before commenting. Fixed locally by moving the condition to the top as you suggested.

Jason Orendorff [:jorendorff]

Comment 14

•

15 years ago

Comment on attachment 459482 [details] [diff] [review] Patch and tests I found bug 582643 (`0x` parsed as 0) while reviewing this code. In ParseInt: > jsdouble d; >- >- str->getCharsAndEnd(bp, end); >- if (!js_strtointeger(cx, bp, end, &ep, 0, &d) || ep == bp) >+ if (!ParseIntStringHelper(cx, start, end, 10, true, &d)) > return js_NaN; So, the way this ignores OOM is kind of gross. Preexisting, but please comment the grossness. In ecma_5/Global/parseInt-01.js: >+/* >+6. Let R = ToInt32(radix). >+*/ >+ >+upvar = ""; >+str = >+ { toString: function() { if (!upvar) upvar = "string"; return "42"; } }; >+radix = >+ { toString: function() { if (!upvar) upvar = "radix"; return "10"; } }; >+ >+assertEq(parseInt(str, radix), 42); >+assertEq(upvar, "string"); Might as well add a few more here, since this step is so awesome: assertEq(parseInt("123", null), 123); assertEq(parseInt("123", undefined), 123); assertEq(parseInt("123", NaN), 123); assertEq(parseInt("123", -0), 123); assertEq(parseInt("10", 72057594037927950), 16); assertEq(parseInt("10", -4294967292), 4); assertEq(parseInt("0x10", 1e308), 16); assertEq(parseInt("10", 1e308), 10); In js1_8_5/extensions/parseInt-octal.js: >+assertEq(parseInt("08"), 0); >+assertEq(parseInt("09"), 0); >+ >+function strictParseInt(s) { "use strict"; return parseInt(s); } >+ >+assertEq(strictParseInt("08"), 8); >+assertEq(strictParseInt("09"), 9); Could you add a trace-test for this too?

Attachment #459482 - Flags: review?(jorendorff) → review+

Jeff Walden [:Waldo]

Assignee

Comment 15

•

15 years ago

http://hg.mozilla.org/tracemonkey/rev/5ec5ae37827f

Whiteboard: fixed-in-tracemonkey

Jeff Walden [:Waldo]

Assignee

Comment 16

•

15 years ago

Attached patch Make parseInt("0A", 16) work correctly (obsolete) — Details — Splinter Review

It seems the if-not-strict-then-parse-as-octal check is in slightly the wrong place: if an explicit radix 16 is passed, then we attempt to infer octal-ness, which is undesirable and contrary to what ES3 permitted. I believe this reorganization cures the problem, but I'm still waiting for JS testsuite results to be absolutely certain -- I'd thought I stared at the current code hard enough to have convinced myself of this already, but a test (Linux-only, rectified in this patch) exposed my folly.

Attachment #461627 - Flags: review?(jorendorff)

Jeff Walden [:Waldo]

Assignee

Comment 17

•

15 years ago

Attached patch Better followup — Details — Splinter Review

A trace test (of all things) was failing with an assertion because ParseInt was marked as DOUBLE and not DOUBLE_FAIL, parseInt("0") was needlessly leaving trace to check for in-strict-mode, and maybe a tweak or two more -- check patch interdiff if interested, mind no longer remembers everything tweaked.

Attachment #461627 - Attachment is obsolete: true

Attachment #461685 - Flags: review?(jorendorff)

Attachment #461627 - Flags: review?(jorendorff)

Jason Orendorff [:jorendorff]

Comment 18

•

15 years ago

Comment on attachment 461685 [details] [diff] [review] Better followup Right, much better. I don't know why I didn't catch this on review. Shame on me. Now that ParseInt is DOUBLE_FAIL, I think it's easy to fix bug 583126: if (!ParseIntStringHelper(...)) { SetBuiltinError(cx); return js_NaN; }

Attachment #461685 - Flags: review?(jorendorff) → review+

Jeff Walden [:Waldo]

Assignee

Comment 19

•

15 years ago

http://hg.mozilla.org/tracemonkey/rev/0d2e095b2eda I'll push the fix for bug 583126 as a followup, in a bit.

Jeff Walden [:Waldo]

Assignee

Comment 20

•

15 years ago

This was pushed to m-c, so closing to get off the ES5 dependency tree radar: http://hg.mozilla.org/mozilla-central/rev/5ec5ae37827f http://hg.mozilla.org/mozilla-central/rev/0d2e095b2eda but in bug 583925 we reverted the semantic change to parse as decimal only in strict mode.

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Patch and tests 15 years ago Jeff Walden [:Waldo] 27.78 KB, patch	jorendorff : review+	Details \| Diff \| Splinter Review
Make parseInt("0A", 16) work correctly 15 years ago Jeff Walden [:Waldo] 4.87 KB, patch		Details \| Diff \| Splinter Review
Better followup 15 years ago Jeff Walden [:Waldo] 5.95 KB, patch	jorendorff : review+	Details \| Diff \| Splinter Review