Closed Bug 355674 Opened 18 years ago Closed 12 years ago

Japanese character in E4X literal is incorrectly escaped

Categories
Product: Core
Component: JavaScript Engine
Type: defect
Platform: PowerPC, macOS
Priority: Not set
Severity: minor

Status: RESOLVED WONTFIX

People
Reporter: jruderman
Assignee: Unassigned

Details
Keywords: testcase
Attachments: 3 files

js> f = eval("function() { return <\u3056/> }")
function () {
    return <\u3056/>;
}

js> eval("" + f)
typein:9: SyntaxError: illegal XML character:
typein:9:     return <\u3056/>;
typein:9: ............^
See also bug 349814 (fixed) and bug 352285.

Jeff, do you want this?
Monday and Tuesday next week are anti-suicide holidays at MIT (they call it Columbus Day, but we know better), so I should be able to look at this sometime over the "weekend".
Assignee: general → jwalden+bmo
Attached patch: Patch
There's only one option here: the string must contain the actual Unicode character. Per section 8.3 of the E4X spec (ECMA-357), \ does not start an escape sequence in an XML initializer, so we can't use \u3056. We could use &#x3056;, which section 8.3 specifies as the replacement for \ escapes, but character references like that aren't valid XML in element names:

data:application/xml,<&#x3056;/>

(I assume the spec doesn't intend to make that actually valid as an initializer, but I could be wrong.  I hope I'm not.)  Consequently, we're left with the literal character itself as the only way to decompile the element name.

I don't know whether "%hc", "%c", or something else is the right format here. %hc and %c both work for me, in that .charCodeAt(index) returns the right number (%hc partly because, when JS_C_STRINGS_ARE_UTF8 is not set, it falls through to the %c behavior), but I don't know whether that is architecture-dependent, for example on the size of the char type.
Attachment #242302 - Flags: review?(brendan)
Attached file Testcase (fails)
Hmm, so clearly there's something whacked with the fix, because this test fails (failure HTML to be posted momentarily).
Comment on attachment 242302 (Patch)

Debugging with a breakpoint in str_indexOf shows that when \u3056 in the decompiled version is compared with the one in the string being searched for, the decompiled version keeps only the low 8 bits and drops the high ones. I tried various workarounds: "%hs" with a jschar buf[2] = {c, 0}, "%c%c" with (c&0xFF, c>>8) and vice versa, and, if I remember correctly, "%lc" as well. All failed for one reason or another, mostly because the high-order bits were dropped.

There may be a way to do this, but I've read through dosprintf enough that I think I've tried all the plausible-looking approaches, so I don't expect to fix this without a suggestion on how to do it (other than the seemingly overkill task of changing Sprinter to use jschar* instead of char*).
Attachment #242302 - Flags: review?(brendan)
Blocks: e4x
Assignee: jwalden+bmo → general
Jeff, the testcase you posted is the same as the patch you posted.
E4X will be removed again from SpiderMonkey (bug 788293).
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX