Closed Bug 349814 Opened 18 years ago Closed 18 years ago

Decompiler escapes line breaks and backslashes within E4X literals

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla1.8.1

People

(Reporter: jruderman, Assigned: brendan)

Details

(Keywords: testcase, verified1.8.1)

Attachments

(3 files, 2 obsolete files)

z = function ()
{ 
  a =
    <x>
      <y/>
    </x>;
}

alert(z);
alert(eval("" + z));


Result: first alert shows \n, second alert shows \\n.

Expected: both alerts should show actual line breaks.
Attached file testcase
diff -w next.

/be
Assignee: general → brendan
Status: NEW → ASSIGNED
Attachment #236587 - Flags: review?(mrbkap)
Review this for best results.

/be
Summary: function toString incorrectly escapes line breaks and backslashes within E4X literals → Decompiler escapes line breaks and backslashes within E4X literals
OS: Mac OS X 10.4 → All
Priority: -- → P2
Hardware: Macintosh → All
Target Milestone: --- → mozilla1.8.1
Attached patch fix, v2Splinter Review
Parser already selects JSOP_QNAMEPART, so the code generator tweak is unnecessary (along with SRC_UNQUOTE).

/be
Attachment #236587 - Attachment is obsolete: true
Attachment #236670 - Flags: review?(mrbkap)
Attachment #236587 - Flags: review?(mrbkap)
Attachment #236588 - Attachment is obsolete: true
Attachment #236670 - Flags: review?(mrbkap) → review+
Fixed on trunk.

/be
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Comment on attachment 236670 [details] [diff] [review]
fix, v2

Nominating for 1.8.1.  This patch is bulked up by cleanup, but the cleanup is important to avoid confusing trunk vs. branch differences, some of this cleanup will very likely be needed for the remaining important js1.7 bugfixes, and the remainder is provably safe (especially the cosmetic-only cleanup).  Here's what is going on:

>Index: jsemit.c
...
>     /* The right side of the descendant operator is implicitly quoted. */
>-    if (op == JSOP_DESCENDANTS && right->pn_op == JSOP_STRING &&
>-        js_NewSrcNote(cx, cg, SRC_UNQUOTE) < 0) {
>-        return JS_FALSE;
>-    }
>+    JS_ASSERT(op != JSOP_DESCENDANTS || right->pn_type != TOK_STRING ||
>+              right->pn_op == JSOP_QNAMEPART);

The parser already selected JSOP_QNAMEPART, so we don't have to use SRC_UNQUOTE.  This gets rid of that source note, leaving a gap we may need for destructuring decompilation (bug 346642).

>@@ -6027,21 +6025,21 @@ js_EmitTree(JSContext *cx, JSCodeGenerat
> 
> JS_FRIEND_DATA(JSSrcNoteSpec) js_SrcNoteSpec[] = {
>     {"null",            0,      0,      0},
>     {"if",              0,      0,      0},
>     {"if-else",         1,      0,      1},
>     {"while",           1,      0,      1},
>     {"for",             3,      1,      1},
>     {"continue",        0,      0,      0},
>-    {"decl",            1,      0,      0},
>+    {"decl",            1,      0,      1},

This is actually an important change, see the comment in jsemit.h highlighted below.

>Index: jsemit.h
...
>-/* Constants for the SRC_DECL source note. */
>+/*
>+ * Constants for the SRC_DECL source note.  Note that span-dependent bytecode
>+ * selection means that any SRC_DECL offset greater than SRC_DECL_LET may need
>+ * to be adjusted, but these "offsets" are too small to span a span-dependent
>+ * instruction, so can be used to denote distinct declaration syntaxes to the
>+ * decompiler.
>+ */

Without the change from 0 to 1 for the last member initializer in the "decl" row of js_SrcNoteSpec, when a very long let block has its code generated, and it is long enough to overflow 16-bit signed immediate jump offsets, the span-dependent instruction selection algorithm kicks in and extends enough jumps to 32-bit signed immediate forms for the entire block to compile correctly.  This will require the SRC_DECL source note's first offset (a different kind of immediate operand, similar to a jump offset) to be adjusted upward as jumps are extended.  But without that little 1 at the end of the row, the span-dependent instruction selector would not know to update the SRC_DECL that tells the decompiler how to find the end of the let block.

That means not only broken decompilation, but at the limit, memory safety bugs.  The decompiler, being the hardest case and most red-headed of step-children, tries to defend against bugs in its own understanding of bytecode not matching the code generators, but it may fail.  So we want this fix, even if we don't care about >64K code in let blocks.

>Index: jsopcode.c
...
>+#define QUOTE_IN_XML    0x10000
>+
> static char *
>-QuoteString(Sprinter *sp, JSString *str, jschar quote)
>+QuoteString(Sprinter *sp, JSString *str, uint32 quote)
... 
>         /* Use js_EscapeMap, \u, or \x only if necessary. */
>         if ((u = js_strchr(js_EscapeMap, c)) != NULL) {
>-            ok = Sprint(sp, "\\%c", (char)u[1]) >= 0;
>+            ok = inXML
>+                 ? Sprint(sp, "%c", (char)c) >= 0
>+                 : Sprint(sp, "\\%c", (char)u[1]) >= 0;

This is the main bug fix, to avoid escaping (and then double-escaping, etc. on round-trips through the decompiler) control characters in XML literals.  It's very safe, just communicating the inXML flag from the main decompiler function (Decompile) into the QuoteString helper, without hacking a second parameter all over the place (QuoteString is called a lot).  Hence QUOTE_IN_XML, a flag bit passed in above the valid 16 jschar bits used by the engine to communicate a quote character (which is always 0, ' or ").

> #ifdef JS_C_STRINGS_ARE_UTF8
>             /* If this is a surrogate pair, make sure to print the pair. */
>             if (c >= 0xD800 && c <= 0xDBFF) {
>                 jschar buffer[3];
>                 buffer[0] = c;
>                 buffer[1] = *++t;
>                 buffer[2] = 0;
>                 if (t == z) {
>+                    char numbuf[10];
>+                    JS_snprintf(numbuf, sizeof numbuf, "0x%x", c);
>+                    JS_ReportErrorFlagsAndNumber(cx, JSREPORT_ERROR,
>+                                                 js_GetErrorMessage, NULL,
>+                                                 JSMSG_BAD_SURROGATE_CHAR,
>+                                                 numbuf);

This JS_C_STRINGS_ARE_UTF8 code is off by default still, and the fix here is to supply a missing error report instead of just failing silently via:

>                     ok = JS_FALSE;
>                     break;

as the code would do until this patch, if it were given the first part of a surrogate pair at the end of the string to quote.

>Index: jsstr.c

The JS_C_STRINGS_ARE_UTF8 missing error case made me look at jsstr.c, where I found an error to reuse for this (arguably internal, but it shouldn't be a silent failure) case.  That revealed a bunch of style violence done when JS_C_STRINGS_ARE_UTF8 went in under shaver's r= :-P.  The fixes here are all cosmetic, in on the trunk, good for 1.8 branch -- see the diff -w attachment for proof.

/be
Attachment #236670 - Flags: approval1.8.1?
Checking in regress-349814.js;
/cvsroot/mozilla/js/tests/e4x/Regress/regress-349814.js,v  <--  regress-349814.js
initial revision: 1.1
Flags: in-testsuite+
Comment on attachment 236670 [details] [diff] [review]
fix, v2

a=beltzner on behalf of 181drivers
Attachment #236670 - Flags: approval1.8.1? → approval1.8.1+
verified fixed 1.9 20060906 windows/mac*/linux
Status: RESOLVED → VERIFIED
Fixed on the 1.8 branch.

/be
Keywords: fixed1.8.1
verified fixed 1.8 1.9 20060909 windows/mac*/linux
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: