Last Comment Bug 406555 - the decompiler should not depend on JS_C_STRINGS_ARE_UTF8
: the decompiler should not depend on JS_C_STRINGS_ARE_UTF8
Status: VERIFIED FIXED
: verified1.8.1.12
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- normal (vote)
: ---
Assigned To: Igor Bukanov
:
Mentors:
Depends on:
Blocks: 397215
  Show dependency treegraph
 
Reported: 2007-12-03 03:57 PST by Igor Bukanov
Modified: 2008-02-07 05:37 PST (History)
7 users (show)
dveditz: blocking1.8.1.12-
bob: in‑testsuite+
bob: in‑litmus-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
v1 (1.97 KB, patch)
2007-12-03 04:03 PST, Igor Bukanov
brendan: review+
brendan: approval1.9+
Details | Diff | Splinter Review
v1 for 1.8.1 branch (1.96 KB, patch)
2008-01-11 14:21 PST, Igor Bukanov
dveditz: approval1.8.1.12+
Details | Diff | Splinter Review
trunk and branch diff (797 bytes, patch)
2008-01-11 14:23 PST, Igor Bukanov
no flags Details | Diff | Splinter Review

Description Igor Bukanov 2007-12-03 03:57:02 PST
Currently the decompiler includes JS_C_STRINGS_ARE_UTF8 define to avoid using \u escapes in string literals when JS_C_STRINGS_ARE_UTF8 is defined. This is wrong since that makes the decompiler output (which is UCS-2 string) depends on the details of encoding schema. Moreover, it also introduces incompatible change as the specs allows arbitrary UCS-2 chars in JS strings including ones representing invalid UTF-16 surrogate pairs. with JS_C_STRINGS_ARE_UTF8 defined such output would lead to errors.

Thus I suggest to remove that escapes avoidance under JS_C_STRINGS_ARE_UTF8 and always use \u escapes for consistency.
Comment 1 Igor Bukanov 2007-12-03 04:03:55 PST
Created attachment 291193 [details] [diff] [review]
v1

The removal of escape avoidance under JS_C_STRINGS_ARE_UTF8 in the decompiler.
Comment 2 Jeff Walden [:Waldo] (remove +bmo to email) 2007-12-03 07:41:42 PST
Note that there are some cases in which it's impossible to avoid outputting UTF-16, e.g. an XML element literal whose name contains a non-ASCII character.  I remember finding one other instance where escapes couldn't be used, but I can't remember what it was right now.  I don't know what the solution is here; I'm kind of a fan of switching the stack to use jschar on the principle that decompilation isn't used enough to worry about the excess memory consumption, but there might be an alternative sometimes.
Comment 3 Igor Bukanov 2007-12-03 08:18:09 PST
(In reply to comment #2)
> Note that there are some cases in which it's impossible to avoid outputting
> UTF-16, e.g. an XML element literal whose name contains a non-ASCII character. 

Right, one can not escape it with &# code points. But this is for another bug if SM can not reproduce it in the decompiler output. This bug relates to the output of string literals that can be escaped. 

Since the output of the decompiler gives a canonical representation of string literals, it should not depend on the details of converting jschar sequences into the char arrays. This is what this bug suggests to fix.
Comment 4 Brendan Eich [:brendan] 2007-12-03 14:24:52 PST
Comment on attachment 291193 [details] [diff] [review]
v1

Ugh, I should have caught this during review of the patch for bug 315974.

/be
Comment 6 Igor Bukanov 2007-12-04 01:42:23 PST
I suggest to have the fix on 1.8.1 branch so embeddings that use JS_C_STRINGS_ARE_UTF8 would not be hit by it. This is very safe code change as it removes the code inside "if JS_C_STRINGS_ARE_UTF8" and that JS_C_STRINGS_ARE_UTF8 is 0 in browser builds. 

Comment 7 Daniel Veditz [:dveditz] 2008-01-08 16:55:45 PST
Comment on attachment 291193 [details] [diff] [review]
v1

approved for 1.8.1.12, a=dveditz for release-drivers
Comment 8 Daniel Veditz [:dveditz] 2008-01-08 16:56:19 PST
approvable, but not blocking
Comment 9 Igor Bukanov 2008-01-11 14:21:24 PST
Created attachment 296597 [details] [diff] [review]
v1 for 1.8.1 branch

This is a trivial merge of the trunk patch.
Comment 10 Igor Bukanov 2008-01-11 14:23:11 PST
Created attachment 296598 [details] [diff] [review]
trunk and branch diff

This is a diff to confirm a triviality of merge: both patches removes the same code.
Comment 11 Igor Bukanov 2008-01-11 14:25:02 PST
Comment on attachment 296597 [details] [diff] [review]
v1 for 1.8.1 branch

Asking for an approval once again as trunk patch required a trivial but still merging by hands.
Comment 12 Daniel Veditz [:dveditz] 2008-01-14 16:20:03 PST
Comment on attachment 296597 [details] [diff] [review]
v1 for 1.8.1 branch

approved for 1.8.1.12, a=dveditz for release-drivers
Comment 13 Igor Bukanov 2008-01-15 02:04:53 PST
I checked in the patch from comment 9 to MOZILLA_1_8_BRANCH:

Checking in jsopcode.c;
/cvsroot/mozilla/js/src/jsopcode.c,v  <--  jsopcode.c
new revision: 3.89.2.74; previous revision: 3.89.2.73
done
Comment 14 Al Billings [:abillings] 2008-01-30 15:29:44 PST
I've verified that this is checked into the 1.8 branch. Is there any other verification needed here for this change?
Comment 15 Igor Bukanov 2008-01-30 15:48:18 PST
(In reply to comment #14)
> I've verified that this is checked into the 1.8 branch. Is there any other
> verification needed here for this change?
> 

To verify this one need to compile the shell with JS_C_STRINGS_ARE_UTF8 defined and then check that the following test case does not throw any exceptions:

var expect = uneval(function() {return "\uD800\uD800";});
var f = eval(expect);
var consistency_check = f();
if (consistency_check !== "\uD800\uD800")
    throw "Unexpected result: "+consistency_check;
var actual = uneval(f);
if (actual !== expect)
    throw "Bad decompilation: "+actual;


Here \uD800\uD800 is an invalid surrogate pair which is without the patch should be rejected during the decompilation. 
Comment 16 Bob Clary [:bc:] 2008-01-31 02:42:35 PST
verified fixed 1.9.0 and for 1.8.1.12 using test in comment 15. before patch, exception is thrown, after patch and with current branch no exception.
Comment 17 Igor Bukanov 2008-01-31 14:20:03 PST
(In reply to comment #16)
> verified fixed 1.9.0 and for 1.8.1.12 using test in comment 15. before patch,
> exception is thrown, after patch and with current branch no exception.

Can you add the test case to the test suite?
Comment 18 Bob Clary [:bc:] 2008-01-31 17:21:23 PST
Will it be meaningful in the general case since we don't generally test with JS_C_STRINGS_ARE_UTF8? Or would the benefit be just having it in the tree and easy to test with for the times we do run tests with JS_C_STRINGS_ARE_UTF8? But, yes, I'll add it.
Comment 19 Igor Bukanov 2008-01-31 17:39:44 PST
(In reply to comment #18)
> Will it be meaningful in the general case since we don't generally test with
> JS_C_STRINGS_ARE_UTF8? 

Since the patch for bug 397215 landed UTF8 stuff is dynamic and can be switched on via calling JS_CStringsAreUTF8(). One day the shell may well do it automatically or at least with a switch.
Comment 20 Bob Clary [:bc:] 2008-02-07 05:37:31 PST
/cvsroot/mozilla/js/tests/js1_5/decompilation/regress-406555.js,v  <--  regress-406555.js
initial revision: 1.1

Note You need to log in before you can comment on or make changes to this bug.