The default bug view has changed. See this FAQ.

eval('("\u2028")') is evaluated as valid, but, this is illegal ECMAScript source.

RESOLVED FIXED in mozilla7

Status

()

Core
JavaScript Engine
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: Yusuke Suzuki, Assigned: Waldo)

Tracking

unspecified
mozilla7
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

6 years ago
User-Agent:       Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.68 Safari/534.24
Build Identifier: 

JSON is not subset of ECMAScript.

StringLiteral
"\u2028" (or "\u2029")
is valid JSON String, but invalid ECMAScript String.

but, SpiderMonkey eval parse "(...)" expression as JSON at first, so eval('("\u2028")') dosen't raise SyntaxError.

I suggests removing JSON shortcut in eval.

Reproducible: Always

Steps to Reproduce:
1. evaluate eval('"\u2028"')
2. evaluate eval('("\u2028")')

Actual Results:  
1 raises SyntaxError, but 2 doesn't raise SyntaxError.

Expected Results:  
1 and 2 should raise SyntaxError.

I saw this blog post http://timelessrepo.com/json-isnt-a-javascript-subset and remembered SpiderMonkey's eval evaluates source as JSON at first.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Blocks: 445494

Comment 1

6 years ago
Regressed Pushlog:
http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=fa851447f195&tochange=b6179390c115
Suspected:
5685f8de41fa	Nicholas Nethercote — Bug 578216 - Make eval(json-like string) fast. r=sayrer

Comment 2

6 years ago
Related discussion: http://timelessrepo.com/json-isnt-a-javascript-subset
I think the right way to do this is to have the json parser set a flag if it sees that char and if that happens we throw it all away even if the json parse succeeded and eval instead.  There's no reason to slow down the fast path for this edge case, and slowing down the edge case should be fine, imo.
Blocks: 578216
(Reporter)

Comment 4

6 years ago
Created attachment 532706 [details] [diff] [review]
add EvalJSON mode to JSONParser

> have the json parser set a flag
Yes, Good idea.

This patch adds EvalJSON flag (StrictJSON and reject "\u2028" + "\2029") to JSONSourceParser.

thanks.
Attachment #532706 - Flags: superreview?(dmandelin)
Attachment #532706 - Flags: review?(bzbarsky)
Comment on attachment 532706 [details] [diff] [review]
add EvalJSON mode to JSONParser

I can probably sr if needed, but waldo's the right reviewer here.
Attachment #532706 - Flags: review?(bzbarsky) → review?(jwalden+bmo)
Comment on attachment 532706 [details] [diff] [review]
add EvalJSON mode to JSONParser

No super-review needed here--just regular review.
Attachment #532706 - Flags: superreview?(dmandelin)
Oh how I hate \u2028 and \u2029.

One comment about the patch -- that loop in readString() is very hot.  It would be better to have two versions of the loop and test EvalJSON to determine which one to use.
(Reporter)

Comment 8

6 years ago
Created attachment 532845 [details] [diff] [review]
patch v2. move parsingMode condition check to out of the second loop.

> that loop in readString() is very hot
thanks, I fixed this patch.
move parsingMode condition (in second loop) to out of the second loop.

The first loop is a little big to write twice, and iterator "start" is used in the second loop too, so I only moved parsingMode cond in the second loop and didn't create function. Buf if this cost is too heavy, it might be good idea to write loop twice or, to create boolean template parameter to specialize character check function like,

template<bool AcceptLineParagraphSeparator>
inline bool isJSONStringLiteralRejectChar(jschar ch) {
  return ch <= 0x001F;
}

template<>
inline bool isJSONStringLiteralRejectChar<false>(jschar ch) {
  return ch <= 0x001F || ch == 0x2028 || ch == 0x2029;
}
Attachment #532706 - Attachment is obsolete: true
Attachment #532706 - Flags: review?(jwalden+bmo)
Attachment #532845 - Flags: review?(jwalden+bmo)
I looked into what the other open-source engines do here.  v8 doesn't try to parse likely JSON passed to eval as JSON, so they don't encounter this issue.  WebKit does try to parse as JSON.  They use a parser in NonStrictJSON mode.  In that mode, they use this method to check string characters:

template <LiteralParser::ParserMode mode>
static inline bool isSafeStringCharacter(UChar c)
{
    return (c >= ' ' && (mode == LiteralParser::StrictJSON || c <= 0xff) && c != '\\' && c != '"') || c == '\t';
}

So they avoid the problem by only fast-pathing for strings containing (non-control) characters all under U+00FF, and thus for \u2028 and \u2029 they fall back to the slow path.  I wonder if something similar might perhaps be the best solution.

I'm also thinking about how to get the one-off bad-character check out of the first loop.  That's the fast path, so keeping it lean as comment 7 notes is important.

Anyway: looking at this now, thinking the patch is correct but possibly not perfection as regards performance, noodling over ideas on how to address.
(Reporter)

Comment 10

6 years ago
Created attachment 535396 [details] [diff] [review]
patch v3, template based check code dispatch

Thanks for your code review and very good advice.
I read JSC "eval" code, and write patch that uses template specialization for char check function.
When using JSON.parse, isBadStringLiteralChar is specialized

static inline bool isBadStringLiteralChar(jschar c) {
    return c <= 0x001F;
}

not run \u2028 \u2029 check code.

And followed recommendation, add c >= 0x00FF guard against running \u2028 \u2029 check for char like ASCII.

inline bool isBadStringLiteralChar<JSONParser::EvalStringLiteral>(jschar c) {
    return c <= 0x001F || ((c >= 0x00FF) && (c == 0x2028 || c == 0x2029));
}
Attachment #532845 - Attachment is obsolete: true
Attachment #532845 - Flags: review?(jwalden+bmo)
Attachment #535396 - Flags: review?(jwalden+bmo)
The other thing that worries me about this is that, really, the parsing mode is a boolean.  The only reason it's an enum is that enums are more readable.  Actually making it a full-fledged enum by adding a third value to it raises my hackles.

But as I said, still thinking this through.
(Reporter)

Comment 12

6 years ago
Oh, sorry. I misunderstood that I was said that I should noodle this patch.
Oops, didn't mean to imply that -- just trying to decide what's the best-est course of action here, if we can't just not do this parse-as-JSON thing (by far the easiest solution, but also the slowest solution).

One idea I keep wondering about.  How much bad would it really be to just search potential candidates for JSON-parsing (i.e. start/end with '('/')') for U+2028 and U+2029, and if you find either bail on trying to use the JSON parser?  Actual programs don't usually trigger the JSON parser, so most execution would be unaffected.  If candidate strings are small enough, search time would be negligible.  And even if they're large, it'd still be a very tight loop to search the entire string.  The JSON parser wouldn't have to be touched at all, and the cost of U+2028/U+2029 checking would be paid entirely by the code that had to bear it.

Thoughts?
> Thoughts?

Try it and see.  The original motivation for the parse-eval-as-JSON optimization was string-tagcloud in Sunspider, apparently a 30%-ish (13ms) win according to bug 578216;  see if it's hurt by your 2028/2029 pre-scan.

One thing that's changed since then is that I've sped up vanilla JS parsing by another 30% or so, so the benefit of this whole optimization will have reduced.  While you're measuring, you could try turning it off altogether and see what happens.
Created attachment 537273 [details] [diff] [review]
Make eval not use the JSON parser if it finds U+2028 or U+2029 in the code string

I tested this patch, and it doesn't change SunSpider scores.  (v8 doesn't use eval in a way which could trigger JSON parsing.)  I didn't try altogether removing the JSON parser invocation; the JavaScript parser may have gotten faster, but so has the JSON parser.  And while I'm more than happy to slow down eval-of-JSON to check for U+2028 and U+2029 on the theory that modern code and libraries now use JSON.parse, and developers should adapt if they want the utmost performance, I hesitate at believing they've switched so completely to warrant removing the JSON parser hack altogether.  So for now I think this is the best solution.
Attachment #537273 - Flags: review?(nnethercote)
Comment on attachment 537273 [details] [diff] [review]
Make eval not use the JSON parser if it finds U+2028 or U+2029 in the code string

Review of attachment 537273 [details] [diff] [review]:
-----------------------------------------------------------------

Seems like a good compromise.  Thanks for fixing this.
Attachment #537273 - Flags: review?(nnethercote) → review+
Comment on attachment 535396 [details] [diff] [review]
patch v3, template based check code dispatch

I think we're going with the eval-specific check that doesn't touch the JSON parser.  But thanks for the work on the patch, and the patience with our review process and deciding exactly how to fix this!

I should be able to land the patch tomorrow in the TM repository.
Attachment #535396 - Flags: review?(jwalden+bmo)
(Reporter)

Comment 18

6 years ago
Thank you for your review.
I'm happy that this bug is fixed!
http://hg.mozilla.org/tracemonkey/rev/ef4d79fc60c0
Assignee: general → jwalden+bmo
Status: NEW → ASSIGNED
Whiteboard: fixed-in-tracemonkey
Target Milestone: --- → mozilla7
cdleary-bot mozilla-central merge info:
http://hg.mozilla.org/mozilla-central/rev/ef4d79fc60c0
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.