Last Comment Bug 512266 - JSON.stringify serializes \n to \u000a
: JSON.stringify serializes \n to \u000a
Status: RESOLVED FIXED
fixed-in-tracemonkey
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- normal with 1 vote (vote)
: mozilla6
Assigned To: Jeff Walden [:Waldo] (remove +bmo to email)
:
:
Mentors:
http://software.hixie.ch/utilities/js...
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2009-08-24 10:23 PDT by Luke Smith
Modified: 2011-05-02 16:02 PDT (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch and test (7.37 KB, patch)
2011-03-17 18:27 PDT, Jeff Walden [:Waldo] (remove +bmo to email)
paul.biggar: review+
Details | Diff | Splinter Review
Updated for comments, plus fix a really stupid problem in the previous version (16.37 KB, patch)
2011-03-22 18:32 PDT, Jeff Walden [:Waldo] (remove +bmo to email)
no flags Details | Diff | Splinter Review
Unbitrotted a bit (16.32 KB, patch)
2011-04-13 14:35 PDT, Jeff Walden [:Waldo] (remove +bmo to email)
paul.biggar: review+
Details | Diff | Splinter Review

Description Luke Smith 2009-08-24 10:23:55 PDT
User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2

According to the latest draft version of ES 5 spec (though this has probably been in there for a while), newlines should stringify to "\n".

This is defined in step 2 of the abstract function Quote in section 15.12.3:
2. For each character C in value 
    a. If C is the double quote character or the backslash character 
        i. Let product be the concatenation of product and the backslash character. 
        ii. Let product be the concatenation of product and C.
    b. Else If C is backspace, formfeed, newline, carriage return, or tab 
        i. Let product be the concatenation of product and the backslash character.
        ii. Let abbrev be the character corresponding to the value of C as follows: 
            backspace "b" 
            formfeed "f" 
            newline "n" 
            carriage return "r" 
            tab "t" 
        iii. Let product be the concatenation of product and abbrev. 
    c. Else If C is a control character having a code unit value less than the space character 
        i. Let product be the concatenation of product and the backslash character. 
        ii. Let product be the concatenation of product and "u". 
        iii. Let hex be the result of converting the numeric code unit value of C to a String of 4 base 16 digits. 
        iv. Let product be the concatenation of product and hex. 
    d. Else 
        i. Let product be the concatenation of product and C.

Reproducible: Always

Steps to Reproduce:
JSON.stringify({x:"has\nnewline"});

Actual Results:  
{"x":"has\u000anewline"}

Expected Results:  
{"x":"has\nnewline"}
Comment 1 Jeff Walden [:Waldo] (remove +bmo to email) 2011-03-17 18:26:24 PDT
Patch anon.  It seems this is what causes us to fail, inter alia (so it appears), http://hg.ecmascript.org/tests/test262/file/ee75c6254247/test/suite/ietestcenter/chapter15/15.12/15.12.3/15.12.3-11-19.js on the test262 site.
Comment 2 Jeff Walden [:Waldo] (remove +bmo to email) 2011-03-17 18:27:11 PDT
Created attachment 520105 [details] [diff] [review]
Patch and test

This applies atop the other two JSON patches I've already pointed at you.
Comment 3 Paul Biggar 2011-03-22 15:59:15 PDT
Comment on attachment 520105 [details] [diff] [review]
Patch and test

It would be nice if we could more clearly see the spec in the code. So we see /* Steps 3-4 */ for example, but not a great deal more than that.


+            JS_ASSERT((c >> 4) < 10);
+            uint8 x = c >> 4, y = c % 16;
+            if (!sb.append('0' + x) || !sb.append(y < 10 ? '0' + y : 'a' + (y - 10)))
+                return false;

I can't follow this, though it roughly looks like what the spec says. What does 10 do? Can we have better names for x, c and y (like the exact names the spec uses)?

Ideally, I'd like to see the lines from the spec added as comments, but since the rest of the project doesn't do this, it's probably not helpful to do it only here.

I know it was there before this bug, but a comment explaining exactly what |mark| is doing would be nice.

I could be wrong, but the test looks a little weak. They test exactly this, but not a lot around it. What about things like "\\u000D", "\u000d", "\u00D"? There seems to be only success tested, not failure.


r+ with these addressed (better names required, the rest may not be needed).
Comment 4 Jeff Walden [:Waldo] (remove +bmo to email) 2011-03-22 18:32:02 PDT
Created attachment 521081 [details] [diff] [review]
Updated for comments, plus fix a really stupid problem in the previous version

In rereading I noticed a stupid bug concerning our append-maximal-nonspecial-sequences optimization -- we weren't appending that sequence if if was followed by [\b\f\n\r\t].  This refactoring fixes that, arguably even better as the maximal-sequence control flow is completely disentangled from the likely-less-predictable escaping stuff.  That problem would have caused things like |JSON.stringify("a\b") === '"\\b"'|, omitting the characters before the special character.  I figure these changes should probably get a re-review, and it won't hurt to see if I addressed your comments as desired at the same time.

(In reply to comment #3)
> +            JS_ASSERT((c >> 4) < 10);
> +            uint8 x = c >> 4, y = c % 16;
> +            if (!sb.append('0' + x) || !sb.append(y < 10 ? '0' + y : 'a' +
> (y - 10)))
> +                return false;
> 
> I can't follow this, though it roughly looks like what the spec says. What
> does 10 do? Can we have better names for x, c and y (like the exact names the
> spec uses)?

|c| is the spec name.  |x| and |y| aren't named in the spec -- "Let hex be the result of converting the numeric code unit value of C to a String of four hexadecimal digits" specifies behavior but not process.  I don't like or dislike the names, but I suspect longer names would hurt readability more than help.  The ternary picks out 0-9 or a-f as appropriate for the hex digit.

> Ideally, I'd like to see the lines from the spec added as comments, but since
> the rest of the project doesn't do this, it's probably not helpful to do it
> only here.

I've been moving that way as I make changes to existing methods, actually, and generally request the same in patches -- search for "/* Step" for examples.  dmandelin too has said he finds the pattern nice to read.  There's nothing wrong with requesting it, although optimizations can funkify step-numbering (jsnum.cpp:num_parseInt is one such case).

> I could be wrong, but the test looks a little weak. They test exactly this,
> but not a lot around it. What about things like "\\u000D", "\u000d", "\u00D"?
> There seems to be only success tested, not failure.

I added tests for JSON.stringify("\\u000D") and the like, which I believe addresses your first concern.

Assuming I understand you correctly, the second and third are things the JS scanner (jsscan.cpp) would handle when parsing the overall program, so they should be tested separately, in tests specifically targeted at the scanner/parser.
Comment 5 Jeff Walden [:Waldo] (remove +bmo to email) 2011-04-13 14:35:25 PDT
Created attachment 525820 [details] [diff] [review]
Unbitrotted a bit
Comment 6 Jeff Walden [:Waldo] (remove +bmo to email) 2011-04-26 23:22:52 PDT
http://hg.mozilla.org/tracemonkey/rev/f1751a93f665
Comment 7 Chris Leary [:cdleary] (not checking bugmail) 2011-05-02 16:02:13 PDT
cdleary-bot mozilla-central merge info:
http://hg.mozilla.org/mozilla-central/rev/f1751a93f665

Note You need to log in before you can comment on or make changes to this bug.