Last Comment Bug 657367 - eval('("\u2028")') is evaluated as valid, but, this is illegal ECMAScript source.
: eval('("\u2028")') is evaluated as valid, but, this is illegal ECMAScript sou...
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla7
Assigned To: Jeff Walden [:Waldo] (remove +bmo to email)
: Jason Orendorff [:jorendorff]
Depends on:
Blocks: es5 578216
  Show dependency treegraph
Reported: 2011-05-16 08:44 PDT by Yusuke Suzuki
Modified: 2011-06-13 11:01 PDT (History)
10 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

add EvalJSON mode to JSONParser (2.99 KB, patch)
2011-05-16 12:19 PDT, Yusuke Suzuki
no flags Details | Diff | Splinter Review
patch v2. move parsingMode condition check to out of the second loop. (3.31 KB, patch)
2011-05-16 19:48 PDT, Yusuke Suzuki
no flags Details | Diff | Splinter Review
patch v3, template based check code dispatch (6.28 KB, patch)
2011-05-26 10:35 PDT, Yusuke Suzuki
no flags Details | Diff | Splinter Review
Make eval not use the JSON parser if it finds U+2028 or U+2029 in the code string (4.53 KB, patch)
2011-06-03 16:55 PDT, Jeff Walden [:Waldo] (remove +bmo to email)
n.nethercote: review+
Details | Diff | Splinter Review

Description Yusuke Suzuki 2011-05-16 08:44:56 PDT
User-Agent:       Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.68 Safari/534.24
Build Identifier: 

JSON is not subset of ECMAScript.

"\u2028" (or "\u2029")
is valid JSON String, but invalid ECMAScript String.

but, SpiderMonkey eval parse "(...)" expression as JSON at first, so eval('("\u2028")') dosen't raise SyntaxError.

I suggests removing JSON shortcut in eval.

Reproducible: Always

Steps to Reproduce:
1. evaluate eval('"\u2028"')
2. evaluate eval('("\u2028")')

Actual Results:  
1 raises SyntaxError, but 2 doesn't raise SyntaxError.

Expected Results:  
1 and 2 should raise SyntaxError.

I saw this blog post and remembered SpiderMonkey's eval evaluates source as JSON at first.
Comment 1 Alice0775 White 2011-05-16 09:09:47 PDT
Regressed Pushlog:
5685f8de41fa	Nicholas Nethercote — Bug 578216 - Make eval(json-like string) fast. r=sayrer
Comment 2 Paul Biggar 2011-05-16 09:13:20 PDT
Related discussion:
Comment 3 Boris Zbarsky [:bz] (still a bit busy) 2011-05-16 09:19:46 PDT
I think the right way to do this is to have the json parser set a flag if it sees that char and if that happens we throw it all away even if the json parse succeeded and eval instead.  There's no reason to slow down the fast path for this edge case, and slowing down the edge case should be fine, imo.
Comment 4 Yusuke Suzuki 2011-05-16 12:19:56 PDT
Created attachment 532706 [details] [diff] [review]
add EvalJSON mode to JSONParser

> have the json parser set a flag
Yes, Good idea.

This patch adds EvalJSON flag (StrictJSON and reject "\u2028" + "\2029") to JSONSourceParser.

Comment 5 Boris Zbarsky [:bz] (still a bit busy) 2011-05-16 12:38:23 PDT
Comment on attachment 532706 [details] [diff] [review]
add EvalJSON mode to JSONParser

I can probably sr if needed, but waldo's the right reviewer here.
Comment 6 David Mandelin [:dmandelin] 2011-05-16 15:53:03 PDT
Comment on attachment 532706 [details] [diff] [review]
add EvalJSON mode to JSONParser

No super-review needed here--just regular review.
Comment 7 Nicholas Nethercote [:njn] 2011-05-16 17:51:09 PDT
Oh how I hate \u2028 and \u2029.

One comment about the patch -- that loop in readString() is very hot.  It would be better to have two versions of the loop and test EvalJSON to determine which one to use.
Comment 8 Yusuke Suzuki 2011-05-16 19:48:25 PDT
Created attachment 532845 [details] [diff] [review]
patch v2. move parsingMode condition check to out of the second loop.

> that loop in readString() is very hot
thanks, I fixed this patch.
move parsingMode condition (in second loop) to out of the second loop.

The first loop is a little big to write twice, and iterator "start" is used in the second loop too, so I only moved parsingMode cond in the second loop and didn't create function. Buf if this cost is too heavy, it might be good idea to write loop twice or, to create boolean template parameter to specialize character check function like,

template<bool AcceptLineParagraphSeparator>
inline bool isJSONStringLiteralRejectChar(jschar ch) {
  return ch <= 0x001F;

inline bool isJSONStringLiteralRejectChar<false>(jschar ch) {
  return ch <= 0x001F || ch == 0x2028 || ch == 0x2029;
Comment 9 Jeff Walden [:Waldo] (remove +bmo to email) 2011-05-25 16:34:35 PDT
I looked into what the other open-source engines do here.  v8 doesn't try to parse likely JSON passed to eval as JSON, so they don't encounter this issue.  WebKit does try to parse as JSON.  They use a parser in NonStrictJSON mode.  In that mode, they use this method to check string characters:

template <LiteralParser::ParserMode mode>
static inline bool isSafeStringCharacter(UChar c)
    return (c >= ' ' && (mode == LiteralParser::StrictJSON || c <= 0xff) && c != '\\' && c != '"') || c == '\t';

So they avoid the problem by only fast-pathing for strings containing (non-control) characters all under U+00FF, and thus for \u2028 and \u2029 they fall back to the slow path.  I wonder if something similar might perhaps be the best solution.

I'm also thinking about how to get the one-off bad-character check out of the first loop.  That's the fast path, so keeping it lean as comment 7 notes is important.

Anyway: looking at this now, thinking the patch is correct but possibly not perfection as regards performance, noodling over ideas on how to address.
Comment 10 Yusuke Suzuki 2011-05-26 10:35:32 PDT
Created attachment 535396 [details] [diff] [review]
patch v3, template based check code dispatch

Thanks for your code review and very good advice.
I read JSC "eval" code, and write patch that uses template specialization for char check function.
When using JSON.parse, isBadStringLiteralChar is specialized

static inline bool isBadStringLiteralChar(jschar c) {
    return c <= 0x001F;

not run \u2028 \u2029 check code.

And followed recommendation, add c >= 0x00FF guard against running \u2028 \u2029 check for char like ASCII.

inline bool isBadStringLiteralChar<JSONParser::EvalStringLiteral>(jschar c) {
    return c <= 0x001F || ((c >= 0x00FF) && (c == 0x2028 || c == 0x2029));
Comment 11 Jeff Walden [:Waldo] (remove +bmo to email) 2011-05-26 13:38:32 PDT
The other thing that worries me about this is that, really, the parsing mode is a boolean.  The only reason it's an enum is that enums are more readable.  Actually making it a full-fledged enum by adding a third value to it raises my hackles.

But as I said, still thinking this through.
Comment 12 Yusuke Suzuki 2011-05-27 01:08:08 PDT
Oh, sorry. I misunderstood that I was said that I should noodle this patch.
Comment 13 Jeff Walden [:Waldo] (remove +bmo to email) 2011-05-27 14:29:12 PDT
Oops, didn't mean to imply that -- just trying to decide what's the best-est course of action here, if we can't just not do this parse-as-JSON thing (by far the easiest solution, but also the slowest solution).

One idea I keep wondering about.  How much bad would it really be to just search potential candidates for JSON-parsing (i.e. start/end with '('/')') for U+2028 and U+2029, and if you find either bail on trying to use the JSON parser?  Actual programs don't usually trigger the JSON parser, so most execution would be unaffected.  If candidate strings are small enough, search time would be negligible.  And even if they're large, it'd still be a very tight loop to search the entire string.  The JSON parser wouldn't have to be touched at all, and the cost of U+2028/U+2029 checking would be paid entirely by the code that had to bear it.

Comment 14 Nicholas Nethercote [:njn] 2011-05-27 19:29:57 PDT
> Thoughts?

Try it and see.  The original motivation for the parse-eval-as-JSON optimization was string-tagcloud in Sunspider, apparently a 30%-ish (13ms) win according to bug 578216;  see if it's hurt by your 2028/2029 pre-scan.

One thing that's changed since then is that I've sped up vanilla JS parsing by another 30% or so, so the benefit of this whole optimization will have reduced.  While you're measuring, you could try turning it off altogether and see what happens.
Comment 15 Jeff Walden [:Waldo] (remove +bmo to email) 2011-06-03 16:55:58 PDT
Created attachment 537273 [details] [diff] [review]
Make eval not use the JSON parser if it finds U+2028 or U+2029 in the code string

I tested this patch, and it doesn't change SunSpider scores.  (v8 doesn't use eval in a way which could trigger JSON parsing.)  I didn't try altogether removing the JSON parser invocation; the JavaScript parser may have gotten faster, but so has the JSON parser.  And while I'm more than happy to slow down eval-of-JSON to check for U+2028 and U+2029 on the theory that modern code and libraries now use JSON.parse, and developers should adapt if they want the utmost performance, I hesitate at believing they've switched so completely to warrant removing the JSON parser hack altogether.  So for now I think this is the best solution.
Comment 16 Nicholas Nethercote [:njn] 2011-06-03 20:27:43 PDT
Comment on attachment 537273 [details] [diff] [review]
Make eval not use the JSON parser if it finds U+2028 or U+2029 in the code string

Review of attachment 537273 [details] [diff] [review]:

Seems like a good compromise.  Thanks for fixing this.
Comment 17 Jeff Walden [:Waldo] (remove +bmo to email) 2011-06-06 18:13:34 PDT
Comment on attachment 535396 [details] [diff] [review]
patch v3, template based check code dispatch

I think we're going with the eval-specific check that doesn't touch the JSON parser.  But thanks for the work on the patch, and the patience with our review process and deciding exactly how to fix this!

I should be able to land the patch tomorrow in the TM repository.
Comment 18 Yusuke Suzuki 2011-06-06 21:18:01 PDT
Thank you for your review.
I'm happy that this bug is fixed!
Comment 19 Jeff Walden [:Waldo] (remove +bmo to email) 2011-06-07 10:21:38 PDT
Comment 20 Chris Leary [:cdleary] (not checking bugmail) 2011-06-13 11:00:53 PDT
cdleary-bot mozilla-central merge info:

Note You need to log in before you can comment on or make changes to this bug.