Closed Bug 672484 Opened 13 years ago Closed 13 years ago

JSON parse failure on unicode escape + trailing digits

Categories

(Tamarin Graveyard :: Library, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Q3 11 - Serrano

People

(Reporter: pnkfelix, Assigned: pnkfelix)

References

Details

Attachments

(5 files)

Grammar at json.org says that \uNNNN needs no trailing terminator.

But look at this:

> JSON.parse('"Spend \\u00a3 100k"')
Spend £ 100k
> JSON.parse('"Spend \\u00a3100k"')
SyntaxError: Error #1132: Invalid JSON parse input.
	at JSON$/parseCore()
	at JSON$/parse()
	at global$init()[:1]

(This was found by an attempt to parse http://api.kongregate.com/badges.json so after it is fixed we should go back and double-check whether we can parse that file.  I will attach a copy of it here as well for posterity.)
Assignee: nobody → fklockii
Oddly enough, this bug seems like a very deliberate choice in the code for JSONParser::adv_string.  Note in particular the 'u' case, where it does this:

                case  'u':
                    m_i++;
                    wchar d1, d2, d3, d4;
                    if (m_i + 3 < m_len &&
                        isHexDigit(d1 = m_textUTF8.c_str()[m_i]) &&
                        isHexDigit(d2 = m_textUTF8.c_str()[m_i+1]) &&
                        isHexDigit(d3 = m_textUTF8.c_str()[m_i+2]) &&
                        isHexDigit(d4 = m_textUTF8.c_str()[m_i+3]) &&
                        (m_i + 3 == m_len ||
                         !isHexDigit(m_textUTF8.c_str()[m_i+4])))


That last line is the probable culprit.  It was inherited from Lars's original AS3 implementation (i.e. this is not a case where C++ burned us, but rather my adaptation of existing code).  I cannot see from ECMA-262 why Lars added that condition to his code.
(adding Jeff to CC list)

Jeff: Lars is out on vacation, so I figured you're the next best person to ask for insight on this.

Can you see anything in the ECMA-262 grammar that would have led Lars to put in a check that the unicode escape sequence is not immediately followed by a hex digit?

For reference, the relevant original code is the string() function in attachment 464839 [details] from Bug 548704.

(I'm currently planning to just remove the check if I cannot find anything in the spec that says that it needs to stay.)
See Also: → 584704
Target Milestone: --- → Q3 11 - Serrano
steven: I picked you but I bet wmaddox or jodyer could also review this easily.  So if you're short on time, reassign.
Attachment #549946 - Flags: review?(stejohns)
Attachment #549946 - Flags: review?(stejohns) → review+
changeset: 6495:725209247fc3
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 672484: allow unicode escape + trailing digits (r=stejohns).

http://hg.mozilla.org/tamarin-redux/rev/725209247fc3
Ironically, presumably as part of getting 100% branch coverage, we put in a test to check this exact case, but put the wrong answer (an exception) as the expected output.

The test in question is part of JSON/Invalid.as in the acceptance test suite; its the one that checks the string literal "\u000FF "

It's easy to translate the test into a format where its testable in Firefox.  I did that, and verified that Firefox will indeed successfully parse the unicode escape with the trailing digit.  I'll post that test here in a moment.
Here's the example HTML I promised in comment 6.
Comment on attachment 549989 [details] [diff] [review]
comment out erroneous acceptance test

(adding dschaffe for super-review; I'm planning to push this but he should probably just glimpse at it when he gets back.)
Attachment #549989 - Flags: superreview?(dschaffe)
changeset: 6496:0a8dfade89a2
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 672484: comment out erroneous JSON acceptance test case (r=fklockii, sr pending=dschaffe).

http://hg.mozilla.org/tamarin-redux/rev/0a8dfade89a2
(In reply to comment #5)
> changeset: 6495:725209247fc3
> user:      Felix S Klock II <fklockii@adobe.com>
> summary:   Bug 672484: allow unicode escape + trailing digits (r=stejohns).
> 
> http://hg.mozilla.org/tamarin-redux/rev/725209247fc3

This was cherry-picked to TR-serrano in the following changeset:

changeset:   6342:16329e069968
user:        Felix S Klock II <fklockii@adobe.com>
date:        Mon Aug 01 18:40:11 2011 -0400
summary:     Bug 672484: allow unicode escape + trailing digits (r=stejohns).
(In reply to comment #10)
> changeset: 6496:0a8dfade89a2
> user:      Felix S Klock II <fklockii@adobe.com>
> summary:   Bug 672484: comment out erroneous JSON acceptance test case
> (r=fklockii, sr pending=dschaffe).
> 
> http://hg.mozilla.org/tamarin-redux/rev/0a8dfade89a2

This was cherry-picked to TR-serrano in the following changeset:

changeset:   6344:3fc738d982d2
user:        Felix S Klock II <fklockii@adobe.com>
date:        Mon Aug 01 18:39:00 2011 -0400
summary:     Bug 672484: comment out erroneous JSON acceptance test case (r=fklockii, sr pending=dschaffe).
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
(In reply to Felix S Klock II from comment #3)
> (adding Jeff to CC list)
> 
> Can you see anything in the ECMA-262 grammar that would have led Lars to put
> in a check that the unicode escape sequence is not immediately followed by a
> hex digit?

That's a bug.  It was probably inherited via thoughtless (or too thoughtful) copying from the definition of NumericLiteral; in E262-5 note the normative comment near the top of page 21, "The source character immediately following a NumericLiteral must not be an IdentifierStart or DecimalDigit."  For hex escape sequences, which are always fixed length, that does not matter at all.
Attachment #551509 - Flags: review?(dschaffe)
(In reply to Felix S Klock II from comment #14)
> Created attachment 551509 [details] [diff] [review] [diff] [details] [review]
> new regression tests

review ping.
Comment on attachment 551509 [details] [diff] [review]
new regression tests

(switching to a SR? and doing a self-review so that I can land this proactively.)
Attachment #551509 - Flags: review?(dschaffe) → superreview?(dschaffe)
changeset: 6748:c8e7164e7bd1
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 672484: regression tests (r=fklockii, sr pending=dschaffe).

http://hg.mozilla.org/tamarin-redux/rev/c8e7164e7bd1
Attachment #549989 - Flags: superreview?(dschaffe) → superreview+
Attachment #551509 - Flags: superreview?(dschaffe) → superreview+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: